[Linux-HA] fail count was initialized after recovering from SplitBrain

Junko IKEDA ikedaj at intellilink.co.jp
Thu Sep 27 03:07:15 MDT 2007


> On 9/13/07, Junko IKEDA <ikedaj at intellilink.co.jp> wrote:
> > > > once again something about SplitBrain...
> > > > During SplitBrain, I wrecked the resource on the both nodes.
> > > > fail count was increased at this time.
> > > > But after recovering from SplitBrain, fail count returned to zero on both!
> > > > Is this due to the restart of crmd or pengine/tengine?
> > >
> > > Most probably. The fail count belongs to the status section which
> > > is not saved.
> >
> > Where is the status section saved at?
> > I thought that CIB kept the status.
> > cib process seems not to be restarted in this case...
> 
> it's reset whenever a node joins the cluster

Sorry to keep repeating myself,
but resetting CIB information whenever a node joins might cause confusion.
Besides, when I tried the following case, the return code of the start action
was not reset.

1) There are two nodes: an active node and a standby node.
2) One resource is running on the active node.
3) SplitBrain occurs.
4) The resource tries to start on both nodes.
   I make it fail on purpose on the standby node,
   so the return code of the start action is -1 on standby.
   (This worked as expected.)
5) After recovering from SplitBrain, the return code on the standby node was "-2",
   and crm_mon on the active node also showed it as -2.
   
Why did it change from -1 to -2?
The return code is kept in <status>, but it isn't reset when a node joins.
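
To compare what <status> holds before and after the node rejoins, a small
script along these lines might help. This is only a rough sketch: the XML
fragment is made up for illustration, and the element and attribute names
(node_state, lrm_rsc_op, operation, rc_code) as well as the resource name
"rsc1" are assumptions that should be checked against a real dump of the
status section. Only the "-2" on standby comes from the case above.

```python
import xml.etree.ElementTree as ET

# Hypothetical <status> fragment. Names and layout are assumptions;
# the "0" for the active node is a placeholder, not an observed value.
SAMPLE_STATUS = """
<status>
  <node_state uname="active">
    <lrm><lrm_resources><lrm_resource id="rsc1">
      <lrm_rsc_op id="rsc1_start_0" operation="start" rc_code="0"/>
    </lrm_resource></lrm_resources></lrm>
  </node_state>
  <node_state uname="standby">
    <lrm><lrm_resources><lrm_resource id="rsc1">
      <lrm_rsc_op id="rsc1_start_0" operation="start" rc_code="-2"/>
    </lrm_resource></lrm_resources></lrm>
  </node_state>
</status>
"""


def op_results(status_xml, operation="start"):
    """Map (node name, op id) to the recorded return code string."""
    root = ET.fromstring(status_xml)
    results = {}
    for node in root.iter("node_state"):
        for op in node.iter("lrm_rsc_op"):
            if op.get("operation") == operation:
                results[(node.get("uname"), op.get("id"))] = op.get("rc_code")
    return results


if __name__ == "__main__":
    for (node, op_id), rc in sorted(op_results(SAMPLE_STATUS).items()):
        print(node, op_id, rc)
```

Running it on a dump taken during SplitBrain and on another taken after
recovery would show whether the recorded value really differs per node.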

Thanks,
Junko



