[Linux-HA] Problem with restarting (or moving) failed resource
beekhof at gmail.com
Fri Oct 5 08:48:33 MDT 2007
On 10/5/07, Andrew W. Nosenko <andrew.w.nosenko at gmail.com> wrote:
> On 10/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > On 10/4/07, Andrew W. Nosenko <andrew.w.nosenko at gmail.com> wrote:
> > > Heartbeat-2.1.2
> > > If the resource (a test-daemon process) is killed too frequently,
> > > heartbeat marks it as "failed" and doesn't try to restart it or
> > > move it to another node.
> > > Logs of the full cycle (from start to stop) and "cibadmin -Q" output
> > > are attached.
> > Can you attach the following two files from awn:
> > /var/lib/heartbeat/pengine/pe-warn-304.bz2
> > /var/lib/heartbeat/pengine/pe-warn-305.bz2
> > They contain exactly what the PE was working with at the time.
> Sure. Attached.
I think I may have misunderstood what you were asking previously.
What you're seeing is a bug that is triggered when the monitor action
fails on its first invocation. If you grab one of the interim builds
you'll find the bug fixed.
The relevant patch is: