[Linux-HA] Fail Count problem

Zachár Balázs zachar at direkt-kfki.hu
Thu Sep 14 02:01:23 MDT 2006


First of all, thanks for the help, John!

I tried the command, but it didn't help. However, if I restart heartbeat
on a-linux after the error, I can migrate the resources back there
(they migrate back automatically, but I know where I can fine-tune this)...

I read the thread, and if I understand it correctly, the patch only makes
the fail count increment in more kinds of incidents... and I don't know how
to install it... :( But I think it is not necessary for me...


Thanks,
Balázs Zachár
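
For reference, the cleanup commands quoted below can be collected into a
small shell sketch. It uses only the crm_failcount and crm_resource
invocations that appear in this thread; the names run, reset_fail_state and
DRY_RUN are hypothetical helpers for illustration, not part of heartbeat.
The thread uses both a-node and a-linux as the node name, so pass whichever
name the cluster actually reports. The thread does not confirm that this
sequence is sufficient: the reply above reports that the resources only
moved back after heartbeat was restarted.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed;
# set DRY_RUN=0 to really run them on a cluster node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: reset the failure state of one resource on one node,
# using the same commands and options quoted in this thread.
reset_fail_state() {
    node="$1"
    rsc="$2"
    run crm_failcount -G -U "$node" -r "$rsc"   # show the current fail count
    run crm_failcount -D -U "$node" -r "$rsc"   # delete the fail count
    run crm_resource -C -r "$rsc" -H "$node"    # clean up the resource on that node
}
```

Example call, using the names from this thread:
reset_fail_state a-linux IPaddr_192_168_66_101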


John R Mocho wrote:
>
> You might check out the "querying resource failure count fails"
> thread.  I believe there was a patch mentioned there.
>
> In the short term, you can try: crm_resource -C -r IPaddr_192_168_66_101
> or more specifically: crm_resource -C -r IPaddr_192_168_66_101 -H a-linux
>
> This will get the lrmd whipped back into shape.  I usually like to run
> it without the -H option just to make sure that all of the nodes are
> cleaned up.
>
> John.
>
> On Wed, 13 Sep 2006, Zachár Balázs wrote:
>
>> Date: Wed, 13 Sep 2006 15:53:28 +0200
>> From: "Zachár Balázs" <zachar at direkt-kfki.hu>
>> Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
>> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
>> Subject: [Linux-HA] Fail Count problem
>>
>> I don't understand it... I will describe the whole problem:
>>
>> I tried to simulate resource failures with these steps:
>>
>> 1. I have an alias IPaddr on a-node on eth0, managed by heartbeat (monitored
>> with the standard OCF agent). I brought eth0 down (ifdown eth0) to test the
>> monitoring. That worked well: the resources were stopped on a-node
>> and started on b-node.
>>
>> 2. I brought eth0 back up on a-node and wanted to migrate the resources
>> back there with the crm_resource command, like this:
>> crm_resource -M -U a-linux -r group_1
>> (It didn't work. It stopped the resources on b-node but did not start
>> them on a-node.)
>>
>> 3. I read some documents on the HA web page and found that I must clear the
>> fail counters before I fail back the cluster. OK, I did this:
>> crm_failcount -G -U a-node -r IPaddr_192_168_66_101    (as far as I can
>> tell, this is the resource that detected the error first)
>> The counter value was: 1
>>
>> crm_failcount -D -U a-node -r IPaddr_192_168_66_101 (this succeeded)
>>
>> 4. Now I thought the cluster was ready to "fail back", so I ran again:
>> crm_resource -M -U a-linux -r group_1
>>
>> But it still does not work!
>>
>>
>> What could be the problem?
>>
>> Thanks for the help,
>> Balázs
>>
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA at lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>


