[Linux-HA] querying resource failure count fails

Andrew Beekhof beekhof at gmail.com
Wed Sep 13 02:31:42 MDT 2006


On 9/12/06, Matthias Dahl <mdmlha at designassembly.de> wrote:
> On Tuesday 12 September 2006 22:56, John R Mocho wrote:
>
> > I usually use: cibadmin -Ql -o status | grep fail-count
> > to search the whole cluster for any fail-count entries
>
> Returns nothing even though I just failed two resources.
>
> > From my experience, only failures due to monitor actions that return an
> > exit code of 1 (resource failure) will cause the fail-count to increment
> > (as opposed to exit code of 7 which simply means that the resource is not
> > running, very different than an error). Start and (aah-hem) stop errors
> > will not affect the fail-count.
>
> During my failure tests, the OCF resource agent returns OCF_ERR_GENERIC which
> is 1. Nevertheless, no failure count gets started or increased. :-(
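
The distinction drawn above between monitor exit codes 1 and 7 can be
sketched in shell. This only illustrates the behaviour John describes; it is
not the CRM's actual logic. The numeric values follow the OCF resource agent
convention, and the function name classify_monitor_rc is hypothetical.

```shell
#!/bin/sh
# OCF return codes mentioned in the thread (OCF resource agent convention).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: label a monitor exit code the way the thread describes it.
# Per John's description, only the "failure" case should bump the fail-count.
classify_monitor_rc() {
    case "$1" in
        "$OCF_SUCCESS")     echo "running" ;;
        "$OCF_ERR_GENERIC") echo "failure" ;;
        "$OCF_NOT_RUNNING") echo "not-running" ;;
        *)                  echo "other" ;;
    esac
}
```

For example, `classify_monitor_rc "$OCF_ERR_GENERIC"` prints `failure`, which
is the case Matthias reports is not being counted.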

I just fixed a bug that could be the cause here.
There were some scenarios in which the failure count was not incremented.
The fix can be found in http://hg.beekhof.net/lha/crm-stable

The exact change was:
    http://hg.beekhof.net/lha/crm-stable?cmd=changeset;node=62f1b3607975
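
To check whether fail-count entries show up after a failure test, the query
quoted above can be wrapped in a small helper. This is a minimal sketch: the
function name count_fail_entries is hypothetical, and it simply counts the
lines on stdin that mention fail-count, without assuming anything about the
surrounding XML layout.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain "fail-count".
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the helper from aborting scripts that run under "set -e".
count_fail_entries() {
    grep -c 'fail-count' || true
}
```

It would be used as `cibadmin -Ql -o status | count_fail_entries`; a result
of 0 after a failed monitor matches the symptom reported in this thread.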

>
> Could this somehow be related to my "adventurous" cib.xml? (see attached)
>
> Best regards,
> Matthias Dahl

