[Linux-HA] Failover on monitor failure.
Dominik Klein
dk at in-telegence.net
Thu Nov 13 01:34:34 MST 2008
Alex Balashov wrote:
> Greetings,
>
> I am using a custom OCF RA and Heartbeat v2 + CRM/CIB for monitoring a
> custom service at the application level in an active-passive binary
> cluster.
>
> When the service is detected as failing on the first node, the resource
> manager tries to restart the service. I've set effective service and
> failure stickiness to almost zero so if it fails to start, it will fail
> over all the resources to the secondary node.
>
> What I want to know is whether it's possible to fail the service over
> immediately the moment a single monitor procedure fails, no questions
> asked, without any attempts to restart. If so, what cluster property
> sets should I set and how?

Set default-resource-failure-stickiness to -infinity.

cibadmin -U -o crm_config -X '<cluster_property_set id="cib-bootstrap-options">
  <nvpair id="someid" name="default-resource-failure-stickiness" value="-infinity"/>
</cluster_property_set>'

should do.
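
To check that the property actually landed in the CIB, you can query the crm_config section afterwards. This is only a sketch: the run wrapper is a hypothetical helper of mine that prints the command unless RUN=1 is set, so you can look at it on a machine without the cluster tools installed.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Query the crm_config section; the nvpair named
# default-resource-failure-stickiness should appear in the XML output.
run cibadmin -Q -o crm_config
```

Run it with RUN=1 on a cluster node to execute the query for real.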

With that set, any failed monitor operation makes the resource unrunnable
on the node where it failed, so the cluster will choose another node and
start the resource there.

Before the resource can ever run on that node again, you have to reset
its failcount for that node.
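
A minimal sketch of resetting the failcount with crm_failcount follows. The node name "node1" and resource ID "my_service" are placeholders, and the run wrapper is a hypothetical helper that only prints the commands unless RUN=1 is set. The option letters are the ones I know from the Heartbeat 2.1 tools, so check "crm_failcount --help" on your version first.

```shell
#!/bin/sh
# Placeholders (assumptions): substitute your own node name and resource ID.
NODE="node1"
RESOURCE="my_service"

# Hypothetical helper: print the command by default, execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Show the current failcount of the resource on the node.
run crm_failcount -G -U "$NODE" -r "$RESOURCE"

# Delete the failcount so the resource is allowed on that node again.
run crm_failcount -D -U "$NODE" -r "$RESOURCE"
```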

If you were using Pacemaker 1.0, you would not have to deal with
failure-stickiness anymore, but could use the very nice new
"migration-threshold" feature. Set this to 1 and after one failure the
resource will fail over, regardless of its score.
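
As a rough sketch, migration-threshold could be set as a meta attribute on the resource with crm_resource. I am assuming here that your Pacemaker 1.0 crm_resource accepts the long options shown (verify with "crm_resource --help"). "my_service" is a placeholder resource ID, and the run wrapper is a hypothetical helper that only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholder (assumption): substitute your own resource ID.
RESOURCE="my_service"

# Hypothetical helper: print the command by default, execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Set migration-threshold=1 as a meta attribute, so a single failure
# moves the resource away from the node it failed on.
run crm_resource --resource "$RESOURCE" --meta \
    --set-parameter migration-threshold --parameter-value 1
```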
Regards
Dominik