[Linux-HA] Fail Count problem

Zachár Balázs zachar at direkt-kfki.hu
Wed Sep 13 07:53:28 MDT 2006


I don't understand this, so I will describe the whole problem:

I have tried to simulate resource failures. I did these steps:

1., I have an alias IP address (IPaddr) on eth0 of a-node, managed by
heartbeat and monitored with the standard OCF agent. I brought eth0 down
(ifdown eth0) to test the monitoring. That worked well: the resources
were stopped on a-node and started on b-node.

2., I brought eth0 back up on a-node and wanted to migrate the resources
back there with the crm_resource command, like this:
crm_resource -M -U a-linux -r group_1
(It didn't work. It stopped the resources on b-node, but it didn't start
them on a-node.)

3., I read some documents on the HA web page and found that I must clear
the fail counters before I fail the cluster back. OK, I did this:
crm_failcount -G -U a-node -r IPaddr_192_168_66_101
(as far as I can tell, this is the resource that detected the error
first)
The counter value was: 1

crm_failcount -D -U a-node -r IPaddr_192_168_66_101
(this succeeded)

4., Now I thought the cluster was ready to fail back, and I ran again:
crm_resource -M -U a-linux -r group_1

But it still does not work!
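
For reference, the cluster commands from steps 3 and 4 can be collected
into one small script. This is only a sketch: DRY_RUN and the run()
helper are names made up for this example (they are not part of
heartbeat), and by default the script just prints the commands instead
of executing them. The crm_failcount and crm_resource lines are copied
unchanged from the steps above, including the node names exactly as I
typed them.

```shell
#!/bin/sh
# Dry-run sketch of steps 3 and 4 from this post.
# DRY_RUN and run() are hypothetical helpers for illustration only;
# the crm_* command lines are copied verbatim from the post.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only show what would be executed.
        echo "+ $*"
    else
        "$@"
    fi
}

failback_steps() {
    # Step 3: read the fail count, then delete it.
    run crm_failcount -G -U a-node -r IPaddr_192_168_66_101
    run crm_failcount -D -U a-node -r IPaddr_192_168_66_101
    # Step 4: ask the cluster to move the group back.
    run crm_resource -M -U a-linux -r group_1
}

failback_steps
```

With DRY_RUN=0 the same script would run the commands for real, so that
should only be done on a cluster node.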


What could be the problem?

Thanks for your help,
Balázs


