[Linux-HA] Problem with restarting (or moving) failed resource

Andrew Beekhof beekhof at gmail.com
Fri Oct 5 08:48:33 MDT 2007


On 10/5/07, Andrew W. Nosenko <andrew.w.nosenko at gmail.com> wrote:
> On 10/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > On 10/4/07, Andrew W. Nosenko <andrew.w.nosenko at gmail.com> wrote:
> > > Heartbeat-2.1.2
> > > If the resource (test-daemon process) is killed too frequently, then
> > > heartbeat marks this resource/process as "failed" and doesn't try to
> > > restart the process or move it to another node.
>
> [skip]
>
> > > Logs of the full cycle (from start to stop) and "cibadmin -Q" output
> > > are attached.
> >
> > can you attach the following 2 files from awn:
> >   /var/lib/heartbeat/pengine/pe-warn-304.bz2
> >   /var/lib/heartbeat/pengine/pe-warn-305.bz2
> >
> > they contain exactly what the PE was working with at the time
>
> Sure.  Attached.

I think I may have misunderstood what you were asking previously.

What you're seeing is a bug that is triggered when the monitor action
fails on its first invocation.  If you grab one of the interim builds
you'll find the bug fixed.

The relevant patch is:
   http://hg.beekhof.net/lha/crm-stable/rev/8070a2a3d6b9
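
For anyone who wants to look inside the pe-warn files mentioned above, here is
a minimal sketch. It assumes each file is simply bz2-compressed XML, as the
.bz2 extension suggests. The function name summarize_pe_input is made up for
illustration and is not part of heartbeat. It only counts element tags and
makes no assumption about specific element or attribute names.

```python
import bz2
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_pe_input(path):
    """Return (root tag, Counter of element tags) for a bz2-compressed XML file.

    Assumption: the file is plain XML compressed with bzip2.
    """
    with bz2.open(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    # root.iter() walks the root element and all of its descendants.
    return root.tag, Counter(el.tag for el in root.iter())


if __name__ == "__main__":
    # Usage: python summarize_pe.py /var/lib/heartbeat/pengine/pe-warn-304.bz2
    for path in sys.argv[1:]:
        tag, counts = summarize_pe_input(path)
        print(path, "root:", tag)
        for name, n in counts.most_common():
            print("  %-24s %d" % (name, n))
```

Running it on two consecutive files and comparing the counts is a quick way
to see whether the status section changed between PE runs.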


