[Linux-HA] restart x times before a failover

Andrew Beekhof beekhof at gmail.com
Thu Sep 20 04:17:29 MDT 2007


On 9/18/07, Max Hofer <max.hofer at apus.co.at> wrote:
> On Tuesday 18 September 2007, Spindler Michael wrote:
> > Hi *,
> >
> > I've got a (hopefully) simple question:
> >
> > I have a 5-node cluster running 20 resources (single processes). I would
> > like the following behavior: if a resource fails, the cluster should try
> > to restart it on the same node. This should happen at most 2 times; after
> > that, the resource should fail over to another node. The resource should
> > not fail back automatically once a failed host is up again.
> >
> > I have tried the following:
> > - default_resource_failure_stickiness set to -1
> > - resource_stickiness set to 3 (on each resource)
> > - no places or other constraints configured.
> >
> > According to http://linux-ha.org/v2/faq/forced_failover we should get:
> >
> > (stickiness) / abs(failure stickiness) = maximum number of times a
> > resource can fail before it is moved to another node.
> >
> > So in my case: 3 / abs(-1) = 3
> >
> > But my resources fail over to other nodes immediately after the first
> > failure.
> >
> >
> > Is anyone here able to help me with this failover scenario?
> First of all, always provide the file created by the pengine which led to
> the failover, so we can give you an answer ;-)  (see below for explanations).
>
> The best way to tackle this kind of error is the following method:
>
> * trigger a resource failure

> * check the ha-log and see which CIB-status file was written on the failover
> (grep "PEngine Input stored" /var/log/halog) ---> they are usually stored
> in /var/lib/heartbeat/pengine

or just run:
    cibadmin -Ql > tmp.cib.xml
and use that.  Much easier than hunting around in the logs :-)
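
For the stickiness arithmetic quoted above, here is a minimal Python sketch
of the FAQ's rule of thumb. It only encodes the formula as quoted
(stickiness / abs(failure stickiness)); it is not how the policy engine
actually scores nodes, and the function name is made up for illustration.

```python
def max_failures_before_failover(resource_stickiness, failure_stickiness):
    """Rule of thumb as quoted from the FAQ (hypothetical helper name).

    Returns stickiness / abs(failure stickiness). This ignores location
    constraints and any other scores that may affect placement.
    """
    if failure_stickiness == 0:
        # With no failure penalty the formula is undefined (division by zero).
        raise ValueError("failure stickiness must be non-zero")
    return resource_stickiness / abs(failure_stickiness)


# Values from the original post: resource_stickiness=3,
# default_resource_failure_stickiness=-1.
print(max_failures_before_failover(3, -1))  # prints 3.0
```

By this formula the posted settings should tolerate three failures, so a
failover after the first failure suggests something else is influencing the
scores. That is why the CIB dump is needed to see what is really going on.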


