[Linux-HA] failed resources

Matthias Dahl mdmlha at designassembly.de
Tue Sep 12 14:59:56 MDT 2006


Hello...

This is actually a combined post about failed resources. I hope nobody minds,
but it is all somehow connected. :-)

First of all, I am currently looking for a way to have Heartbeat check a
failed resource from time to time to see if it is functional again and, if
it is, restart the resource. Does Heartbeat have such a feature?

I have read http://www.linux-ha.org/v2/faq/forced_failover carefully, yet,
being new to Heartbeat, I just don't get it. :) From what I understand,
Heartbeat detects a failure through, for example, the monitor functionality
of an OCF resource agent. If a failure happens, it tries once to restart the
resource and, if that fails, stops it. What the FAQ says is that I can set a
failure stickiness and that, based on some formula, a resource can fail up
to X times before it gets migrated to a new node. That's the point: how can
a resource fail several times on one node if Heartbeat tries just once to
restart it and keeps the resource stopped if that fails?

Thanks to anyone who can shed some light on this. :-)

Best regards,
Matthias Dahl

