[Linux-HA] fail count was initialized after recovering
beekhof at gmail.com
Thu Sep 13 03:34:13 MDT 2007
On 9/13/07, Yan Fitterer <yan at fitterer.org> wrote:
> Junko IKEDA wrote:
> >>> once again something about SplitBrain...
> >>> During SplitBrain, I wrecked the resource on both nodes.
> >>> fail count was increased at this time.
> >>> But after recovering from SplitBrain, the fail count returned to zero on both nodes!
> >>> Is this due to the restart of crmd or pengine/tengine?
> >> Most probably. The fail count belongs to the status section which
> >> is not saved.
> > Where is the status section saved?
> The status section is never saved to disk. When the cluster is stopped, the
> status section disappears altogether.
> > I thought that CIB kept the status.
> Yes it does. But status has no meaning once the cluster is stopped - so
> it isn't kept. Hence failcounts are reset when the cluster is restarted.
> As well, the failcount for a specific node will be reset when _that_
> node is restarted. How else could resources be allowed to start after a
> STONITH operation?
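To make the behaviour above concrete: fail counts are recorded as transient
node attributes (nvpairs) inside the CIB status section, so anything that
rebuilds a node's status also discards its fail count. The sketch below uses
an illustrative XML fragment - the ids and the `fail-count-myres` name are
assumptions modeled on the CIB layout, not copied from a live cluster.

```python
import xml.etree.ElementTree as ET

# Illustrative CIB status fragment; ids/names are assumed for the example.
status_xml = """
<status>
  <node_state id="node1" uname="node1">
    <transient_attributes id="node1">
      <instance_attributes id="status-node1">
        <nvpair id="status-node1-fail-count-myres"
                name="fail-count-myres" value="2"/>
      </instance_attributes>
    </transient_attributes>
  </node_state>
</status>
"""

root = ET.fromstring(status_xml)

# While the node is up, the fail count is readable from its status entry.
pair = root.find(".//nvpair[@name='fail-count-myres']")
print(pair.get("value"))  # -> 2

# A cluster (or node) restart rebuilds the status section from scratch,
# which is equivalent to dropping the transient subtree - afterwards no
# fail count survives for that node.
for node_state in root.iter("node_state"):
    for child in list(node_state):
        node_state.remove(child)
print(root.find(".//nvpair") is None)  # -> True
```

The same reasoning explains the per-node reset: STONITH followed by a node
restart gives that node a fresh (empty) status entry, so its fail counts
start again from zero.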
> > cib process seems not to be restarted in this case...
> there is no 'cib' process.
actually there is :-)
> If I understand things right, the crmd
> process handles all core CIB maintenance operations.
nope, all done by the CIB process
> Try pstree -p and
> look for the group of processes where the parent is "heartbeat".
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> See also: http://linux-ha.org/ReportingProblems