[Linux-HA] fail count was initialized after recovering from SplitBrain
ikedaj at intellilink.co.jp
Thu Sep 27 03:07:15 MDT 2007
> On 9/13/07, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
> > > > once again something about SplitBrain...
> > > > During SplitBrain, I wrecked the resource on both nodes.
> > > > fail count was increased at this time.
> > > > But after recovering from SplitBrain, the fail count returned to
> > > > zero on both!
> > > > Is this due to the restart of crmd or pengine/tengine?
> > >
> > > Most probably. The fail count belongs to the status section which
> > > is not saved.
> > Where is the status section saved, then?
> > I thought that the CIB kept the status.
> > The cib process doesn't seem to be restarted in this case...
> it's reset whenever a node joins the cluster
Sorry to keep saying the same thing over and over,
but resetting the CIB status information whenever a node joins could cause confusion.
Besides, when I tried the following case, the return code of the start action
was not reset:
1) there are two nodes: an active node and a standby node
2) one resource is running on the active node
3) SplitBrain occurs!
4) the resource is about to start on both nodes,
so I deliberately make it fail on the standby node;
the return code of the start action then becomes -1 on standby.
(this worked as expected)
5) after recovering from SplitBrain, the return code on the standby node was "-2"...
and crm_mon on the active node also showed it as -2.
Why was it incremented?
The return code is kept in the <status> section, but it isn't reset when a node joins.
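As a side note, the contents of the <status> section can be inspected directly from the command line. This is only a sketch, assuming the Heartbeat 2.x CLI tools; the resource name `my_rsc` and node name `node2` below are placeholders, not names from this thread:

```shell
# Dump the volatile <status> section of the CIB, where fail counts
# and operation return codes are recorded (it is not saved to disk):
cibadmin -Q -o status

# Query the current fail count of one resource on one node:
crm_failcount -G -r my_rsc -U node2
```

Comparing the output before and after the node rejoins would show whether the attribute was actually cleared or just recomputed.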
More information about the Linux-HA mailing list