[Linux-HA] restart x times before a failover

Max Hofer max.hofer at apus.co.at
Thu Sep 20 06:14:42 MDT 2007


On Thursday 20 September 2007, Andrew Beekhof wrote:
> On 9/18/07, Max Hofer <max.hofer at apus.co.at> wrote:
> > On Tuesday 18 September 2007, Spindler Michael wrote:
> > > Hi *,
> > >
> > > I've got a (hopefully) simple question:
> > >
> > > I have a 5-node cluster running 20 resources (single processes). I would
> > > like the following behavior: if a resource fails, the cluster should try
> > > to restart it on the same node. This should be done at most 2 times;
> > > then the resource should fail over to another node. The resource should
> > > not fail back automatically after a failed host is up again.
> > >
> > > I have tried the following:
> > > - default_resource_failure_stickiness set to -1
> > > - resource_stickiness set to 3 (on each resource)
> > > - no placement or other constraints configured.
> > >
> > > According to http://linux-ha.org/v2/faq/forced_failover we should get:
> > >
> > > (stickiness) / abs(failure stickiness) = maximum number of times a
> > > resource can fail before it is moved to another node.
> > >
> > > So in my case: 3 / abs(-1) = 3
> > >
> > > But my resources fail over to other nodes immediately after the
> > > first failure.
> > >
> > >
> > > Is anyone here able to help me with this failover scenario?
> >
> > First of all, always provide the file created by the pengine that led to
> > the failover - so we can give you an answer ;-)  (see below for
> > explanations).
> >
> > The best way to tackle this kind of error is the following method:
> >
> > * trigger a resource failure
> >
> > * check the ha-log and see which CIB-status file was written on the
> > failover (grep "PEngine Input stored" /var/log/halog) ---> they are
> > usually stored in /var/lib/heartbeat/pengine
>
> or just run:
>     cibadmin -Ql > tmp.cib.xml
> and use that.  much easier than hunting around in the logs :-)
I do not know what the -l option does. Can you explain it please?

The manual is quite obscure:
-l command takes effect locally (rarely used, advanced option)

It seems that with this option it matters on which node the command is
run. Is that true?

My problem is that when something triggers a resource movement, I have to
track down the CIB state in which the trigger occurred. I have used those
pengine logs to do that, because the CIB after the change is not of much
use if something changed in the cluster state afterwards.
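
A minimal sketch of the log lookup I mean, assuming the log is at
/var/log/halog as in the grep quoted above. The function name
last_pengine_input is only a hypothetical helper for illustration; it prints
the whole matching log line and assumes nothing about the rest of the line's
format.

```shell
#!/bin/sh
# Print the most recent "PEngine Input stored" line from a heartbeat log.
# The default path is the one mentioned in this thread (an assumption for
# other setups); pass a different path as the first argument if needed.
last_pengine_input() {
    log="${1:-/var/log/halog}"
    # grep selects every matching line; tail keeps only the newest one.
    grep "PEngine Input stored" "$log" | tail -n 1
}
```

Called right after the resource has moved, for example as

    last_pengine_input /var/log/halog

it should point to the file the pengine wrote for that transition, rather
than to the CIB as it looks some time later.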

kind regards Max


