[Linux-HA] Failover on monitor failure.

Dominik Klein dk at in-telegence.net
Thu Nov 13 01:34:34 MST 2008


Alex Balashov wrote:
> Greetings,
> 
> I am using a custom OCF RA and Heartbeat v2 + CRM/CIB for monitoring a 
> custom service at the application level in an active-passive binary 
> cluster.
> 
> When the service is detected as failing on the first node, the resource 
> manager tries to restart the service.  I've set effective service and 
> failure stickiness to almost zero so if it fails to start, it will fail 
> over all the resources to the secondary node.
> 
> What I want to know is whether it's possible to fail the service over 
> immediately the moment a single monitor procedure fails, no questions 
> asked, without any attempts to restart.  If so, what cluster property 
> sets should I set and how?

Set default-resource-failure-stickiness to -infinity.

cibadmin -U -o crm_config -X '<cluster_property_set 
id="cib-bootstrap-options"><nvpair id="someid" 
name="default-resource-failure-stickiness" 
value="-infinity"/></cluster_property_set>'

should do.
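
Since mail clients tend to wrap that one-liner, here is the same call as a small shell sketch. It uses only the cibadmin flags shown above. The function names are made up for illustration, and the nvpair id is a placeholder, just like "someid".

```shell
# Print the crm_config fragment for default-resource-failure-stickiness.
# $1 = nvpair id (placeholder, like "someid" above), $2 = value.
failure_stickiness_xml() {
    printf '<cluster_property_set id="cib-bootstrap-options"><nvpair id="%s" name="default-resource-failure-stickiness" value="%s"/></cluster_property_set>\n' "$1" "$2"
}

# Apply the fragment with the same cibadmin flags as in the command above.
set_failure_stickiness() {
    cibadmin -U -o crm_config -X "$(failure_stickiness_xml "$1" "$2")"
}

# Example:
#   set_failure_stickiness someid -infinity
```

Running failure_stickiness_xml on its own first lets you inspect the XML before it goes into the CIB.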

With that setting, any failed monitor operation makes the resource 
unrunnable on the node where it failed, so the cluster will choose 
another node and start the resource there.

To be able to run that resource on this node again, you have to reset 
the resource's failcount for that node.
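
A rough sketch of resetting the failcount, assuming the crm_failcount tool that ships with Heartbeat 2.1 and its -G (get), -D (delete), -U (node uname) and -r (resource id) options. Please check crm_failcount --help on your version. The helper names, node1 and my_service are hypothetical.

```shell
# Run a command, or only print it when DRY_RUN is set (to check it first).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current failcount of resource $2 on node $1.
show_failcount() {
    run crm_failcount -G -U "$1" -r "$2"
}

# Delete the failcount so the resource is allowed on node $1 again.
reset_failcount() {
    run crm_failcount -D -U "$1" -r "$2"
}

# Example (hypothetical node and resource names):
#   show_failcount node1 my_service
#   reset_failcount node1 my_service
```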

If you used Pacemaker 1.0, you would not have to deal with 
failure-stickiness anymore, but could use the very nice new 
"migration-threshold" feature. Set this to 1 and after 1 failure, the 
resource will fail over, regardless of its score.
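
For Pacemaker 1.0, something along these lines should set it per resource. This is a sketch only, assuming crm_resource accepts --meta together with -r (resource), -p (parameter name) and -v (value) on your build. The function names and my_service are hypothetical.

```shell
# Run a command, or only print it when DRY_RUN is set (to check it first).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set migration-threshold as a meta attribute on resource $1.
# $2 is the threshold and defaults to 1 (fail over after the first failure).
set_migration_threshold() {
    run crm_resource --meta -r "$1" -p migration-threshold -v "${2:-1}"
}

# Example (hypothetical resource name):
#   set_migration_threshold my_service 1
```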

Regards
Dominik

