[Linux-HA] strange monitor behaviour

Pavol Gono palo.gono at gmail.com
Wed Jan 10 15:48:19 MST 2007


On 1/10/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> On 1/10/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > But let's go to the original topic :)
> > > I installed heartbeat from sources, changeset 9934, configure options
> > > are custom like in previous posts. Distribution SLES10, nodes
> > > deboserver and pgbook. BasicSanityCheck was successful on both.
> > > I made a configuration very similar to the one in the first post,
> > > with resources IPaddr and Dummy.
> > > When I removed the directory /tmp/a on the machine where the resources
> > > were running, the same situation occurred: the Dummy resource is
> > > stopped, the IPaddr resource remains on the original node, no failover.
> > >
> > > Is this correct behaviour?
> >
> > if failure_count was not incremented for that resource on that node,
> > then this is not the expected behavior
> >
> > i will look at the logs momentarily
>
> i see:
>
> tengine[9819]: 2007/01/10_16:49:22 WARN: update_failcount: Updating
> failcount for x_Dummy on 92ba1bad-9c97-4f5d-b2f7-48492256893c after
> failed monitor: rc=7
>
> tengine[9819]: 2007/01/10_16:49:22 debug: log_data_element:
> abort_transition_graph: Cause       <nvpair
> id="status-92ba1bad-9c97-4f5d-b2f7-48492256893c-fail-count-x_Dummy"
> name="fail-count-x_Dummy" value="1"/>
>
> cib[9752]: 2007/01/10_16:49:22 debug: log_data_element: cib:diff: +
>          <nvpair
> id="status-92ba1bad-9c97-4f5d-b2f7-48492256893c-fail-count-x_Dummy"
> name="fail-count-x_Dummy" value="1"/>
>
> which would indicate that things are working as they should so far.
>
> can you also attach the following file on pgbook:
> /var/lib/heartbeat/pengine/pe-input-47.bz2

attached

>
> for some reason we consider deboserver out-of-bounds for x_Dummy:
> pengine[9820]: 2007/01/10_16:49:24 debug: native_print: Allocating:
> x_Dummy     (heartbeat::ocf:Dummy): Stopped
> pengine[9820]: 2007/01/10_16:49:24 debug: native_assign_node: Color
> x_Dummy, Node[0] pgbook: 1000000
> pengine[9820]: 2007/01/10_16:49:24 debug: native_assign_node: Color
> x_Dummy, Node[1] deboserver: -1000000
> pengine[9820]: 2007/01/10_16:49:24 debug: native_assign_node:
> Assigning pgbook to x_Dummy
> pengine[9820]: 2007/01/10_16:49:24 notice: StartRsc:  pgbook    Start x_Dummy
> pengine[9820]: 2007/01/10_16:49:24 notice: Recurring: pgbook
> x_Dummy_monitor_5000
>
> (btw. those are the node weights for the x_Dummy resource)

My intention was to force a failover whenever one of the resources fails
(on monitor or on start). Is something wrong with my configuration, or is
the out-of-bounds node the problem?
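
To make the node weights quoted above easier to compare, here is a minimal Python sketch that pulls them out of the pengine debug lines. It is not part of heartbeat: the regular expression, the function names and the unwrapped sample log are assumptions made for illustration, and the pattern only matches the line shape shown in this thread. It also assumes that the node with the highest weight is the one chosen, which is consistent with the "Assigning pgbook to x_Dummy" line.

```python
import re

# Sample input: the two weight lines from the pengine log quoted above,
# unwrapped onto single lines.
LOG = """\
pengine[9820]: 2007/01/10_16:49:24 debug: native_assign_node: Color x_Dummy, Node[0] pgbook: 1000000
pengine[9820]: 2007/01/10_16:49:24 debug: native_assign_node: Color x_Dummy, Node[1] deboserver: -1000000
"""

# Hypothetical pattern: resource name, node name, signed integer weight.
WEIGHT_RE = re.compile(
    r"native_assign_node: Color ([^,\s]+), Node\[\d+\] ([^:\s]+): (-?\d+)"
)


def node_weights(log_text):
    """Return {resource: {node: weight}} for every weight line found."""
    weights = {}
    for match in WEIGHT_RE.finditer(log_text):
        resource, node, weight = match.groups()
        weights.setdefault(resource, {})[node] = int(weight)
    return weights


def preferred_node(weights, resource):
    """Return the node with the highest weight (assumed to be the one picked)."""
    per_node = weights[resource]
    return max(per_node, key=per_node.get)


if __name__ == "__main__":
    parsed = node_weights(LOG)
    for node, weight in parsed["x_Dummy"].items():
        print(f"x_Dummy on {node}: {weight}")
    print("preferred:", preferred_node(parsed, "x_Dummy"))
```

With the lines from this thread, pgbook comes out as the preferred node for x_Dummy and deboserver carries the large negative weight, which matches the "out-of-bounds" remark.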

Palo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-47.bz2
Type: application/x-bzip2
Size: 1646 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20070110/bc5cad58/pe-input-47.bin

