[Linux-HA] failed resources
John R Mocho
jmocho at royaldc.com
Tue Sep 12 15:30:32 MDT 2006
I have some resources that are allowed to fail (and do) up to 5 times on a
node before failing over to another node and trying there (for up to 11
nodes' worth of failures). It can be done; Heartbeat 2.0.x with the CRM is a
great combination.
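To make the arithmetic behind "fail N times, then move" concrete, here is a
minimal sketch in Python. It assumes a simplified scoring model in which each
failure adds a negative failure stickiness to the resource's score on its
current node, and the resource moves once that score drops below the best
score available on another node. The function name and the example numbers
are hypothetical illustrations, not values from my configuration, and the
real CRM scoring may differ in its details.

```python
def failures_before_failover(current_score, other_score, failure_stickiness):
    """Return how many failures it takes before the resource would move.

    Assumed model (a simplification, not the CRM's actual code):
    score after n failures = current_score + n * failure_stickiness,
    and the resource moves once that score is below other_score.
    """
    if failure_stickiness >= 0:
        # A non-negative failure stickiness never lowers the score,
        # so failures alone never trigger a move in this model.
        return None
    if current_score < other_score:
        # The other node is already preferred; no failure is needed.
        return 0
    # Smallest n with current_score + n * failure_stickiness < other_score.
    return (current_score - other_score) // (-failure_stickiness) + 1


# Hypothetical numbers: score 200 on the current node, 0 elsewhere,
# and a failure stickiness of -50.
n = failures_before_failover(200, 0, -50)
```

With these assumed numbers the score reaches 0 after four failures, which is
not yet below the other node's 0, so the fifth failure is the one that moves
the resource.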
John.
On Tue, 12 Sep 2006, Matthias Dahl wrote:
> Date: Tue, 12 Sep 2006 22:59:56 +0200
> From: Matthias Dahl <mdmlha at designassembly.de>
> Reply-To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> To: linux-ha at lists.linux-ha.org
> Subject: [Linux-HA] failed resources
>
> Hello...
>
> This is actually a combined post about failed resources. I hope nobody minds,
> but it is all somehow connected. :-)
>
> First of all, I am currently looking for a way to have Heartbeat check a
> failed resource from time to time to see if it is functional again and in
> case it is, restart the resource. Does Heartbeat have such a feature...?
>
> I have read http://www.linux-ha.org/v2/faq/forced_failover carefully, yet,
> being new to Heartbeat, I just don't get it. :) From what I understand,
> Heartbeat detects a failure through the monitor functionality of an OCF
> resource agent, for example. If a failure happens, it tries once to restart
> the resource and, if that fails, stops it. Now what the FAQ says is, I can set
> failure stickiness and based on some formula, a resource can fail up to X
> times and then gets migrated to a new node. That's the point: how can a
> resource fail several times on one node, if Heartbeat tries just once to
> restart it and keeps the resource stopped if that fails?
>
> Thanks to anyone who can shed some light on this. :-)
>
> Best regards,
> Matthias Dahl