[Linux-HA] fail count was initialized after recovering from SplitBrain

Andrew Beekhof beekhof at gmail.com
Thu Sep 13 03:34:13 MDT 2007


On 9/13/07, Yan Fitterer <yan at fitterer.org> wrote:
>
>
> Junko IKEDA wrote:
> >>> once again something about SplitBrain...
> >>> During SplitBrain, I wrecked the resource on both nodes.
> >>> The fail count was increased at this time.
> >>> But after recovering from SplitBrain, the fail count returned to
> >>> zero on both!
> >>> Is this due to the restart of crmd or pengine/tengine?
> >> Most probably. The fail count belongs to the status section, which
> >> is not saved.
> >
> > Where is the status section saved?
>
> The status section is never saved to disk. When the cluster is stopped,
> the status section disappears altogether.
>
> > I thought that CIB kept the status.
>
> Yes, it does. But the status has no meaning once the cluster is stopped,
> so it isn't kept. Hence fail counts are reset when the cluster is
> restarted. Likewise, the fail count for a specific node will be reset
> when _that_ node is restarted. How else could resources be allowed to
> start after a STONITH operation?
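
For what it's worth, fail counts can also be inspected (and reset) by
hand while the cluster is up. A minimal sketch, assuming a heartbeat 2.x
install with the CRM tools on PATH (the flags are from memory and may
vary by version; the node and resource names are made up):

    # query the current fail count for a resource on a given node
    crm_failcount -G -U node1 -r my_resource

    # reset it explicitly, rather than waiting for a restart to wipe it
    crm_failcount -D -U node1 -r my_resource

Since the value lives in the status section, whatever you see here is
gone once the node (or the whole cluster) restarts.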
>
> > The cib process seems not to be restarted in this case...
>
> there is no 'cib' process.

actually there is :-)

> If I understand things right, the crmd
> process handles all core CIB maintenance operations.

nope, all done by the CIB process
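
a quick way to see this for yourself, assuming the cluster is running
and cibadmin is on PATH (these commands talk to that live cib process;
option spellings may differ slightly between versions):

    # dump just the status section of the live CIB
    cibadmin -Q -o status

    # dump the whole CIB: configuration plus status
    cibadmin -Q

stop the cluster and start it again, and the status section comes back
empty until the nodes repopulate it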

> Try pstree -p and
> look for the group of processes where the parent is "heartbeat".
>
> HTH
> Yan
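
On a heartbeat 2.x node that might look roughly like this (the PIDs are
invented and the exact set of children varies by version and
configuration; this is only an illustration):

    $ pstree -p
    ...
    heartbeat(1234)-+-attrd(1240)
                    |-ccm(1236)
                    |-cib(1237)
                    |-crmd(1241)
                    |-lrmd(1238)
                    `-stonithd(1239)
    ...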


