[Linux-HA] Trying to understand: resource cant run anywhere
matilda at grandel.de
Fri Sep 7 12:09:00 MDT 2007
>>> Dave Augustus <davea at support.kcm.org> 07.09.2007 18:28 >>>
> What log tells me more details about WHY the sshd resource failed?
I hope you have enabled logd and are logging to some log destination.
Look at the log on the node where the resources initially ran.
I guess that the monitor action returned an error, so grep for "ERROR".
After that, look at the logfile of the node that was DC at that
time. There you can see the "decision making" of crm/pengine based on the new
information (e.g. the failure of a resource) it gets.
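A rough sketch of that search, assuming your log destination is a plain file. The path /var/log/ha-log is only an assumption (use whatever logfile your ha.cf or logd.cf points to), and find_errors is just a helper name made up for this example. Run it first on the node where the resource ran, then on the node that was DC.

```shell
#!/bin/sh
# Hypothetical helper: print every line containing "ERROR", with line numbers.
find_errors() {
    grep -n "ERROR" "$1"
}

# Assumed log path; adjust to your configured log destination.
LOGFILE="${LOGFILE:-/var/log/ha-log}"

if [ -r "$LOGFILE" ]; then
    find_errors "$LOGFILE" || echo "no ERROR lines in $LOGFILE"
else
    echo "cannot read $LOGFILE, check your logging configuration" >&2
fi
```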
> If my understanding is correct, how do I correct this problem?
My guess, since I don't have the logs:
* the monitor action returned a result indicating a problem with a resource
* the failcount for that resource is incremented
* the cluster wants to move the resource to the other node
* the failure happens there again
* the failcount for this resource on the new node is also incremented
* now there's no node left to run the resource
If the monitor action of a resource says that there is something
wrong, failcount is incremented for that resource on that node.
You can query it with:
crm_failcount -G -r <resid> -H <nodename>
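To check all nodes in one go, something like the following might do. The resource id sshd is taken from your question; node1 and node2 are placeholder node names, and show_failcounts is a helper name made up here. The sketch uses only the crm_failcount options shown above.

```shell
#!/bin/sh
# Placeholder values: replace with your resource id and node names.
RESOURCE="sshd"
NODES="node1 node2"

# Query the failcount of one resource on each listed node.
show_failcounts() {
    rsc="$1"
    shift
    for node in "$@"; do
        echo "== $rsc on $node =="
        crm_failcount -G -r "$rsc" -H "$node"
    done
}

# Only run when the Heartbeat CRM tools are installed on this machine.
if command -v crm_failcount >/dev/null 2>&1; then
    show_failcounts "$RESOURCE" $NODES
fi
```

A non-zero failcount on every node would fit the sequence guessed above.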
BUT: In general this message means: the CRM tries to apply all the rules
in the CIB to the resources, and for one resource it ends up
with no node that is allowed to run it.
There are many possible reasons for that. Sometimes mistakes in the constraints
can also lead to it.
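One way to look for such mistakes might be to let crm_verify check the running configuration. This is only a sketch: it assumes crm_verify is installed with your Heartbeat version, and that -L (check the live CIB) and -V (be more verbose) are accepted there, so please confirm against the tool's own help output first. check_cib is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: verify the live CIB, e.g. to spot broken constraints.
# Assumption: crm_verify accepts -L (live CIB) and -V (verbose).
check_cib() {
    crm_verify -L -V
}

if command -v crm_verify >/dev/null 2>&1; then
    check_cib || echo "crm_verify reported problems, see its output above"
else
    echo "crm_verify not found on this node" >&2
fi
```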
Without the logs this is just a guess.