[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH
alanr at unix.sh
Wed Oct 26 10:14:42 MDT 2005
Stefan Peinkofer wrote:
> Hello everybody,
> unfortunately I have new problems with the heartbeat 2.0.3 cvs version
> and stonith.
> I ran a cvs heartbeat which was checked out on 2005-10-18 and
> encountered a problem with stonithd which was killed by signal 11.
> The effects were that the stonith resources were NOT_ACTIVE and when I
> initiated a split brain no node could fence the other off.
> I thought maybe it's already fixed in cvs and checked out a version today
> (2005-10-26). But unfortunately this version seems to contain an even
> worse problem with stonith.
> After I started heartbeat on the two nodes and waited until it had started
> up completely, I initiated the split brain situation. I expected this to
> work because both stonith resources were active.
> In the logs I saw:
> Oct 26 17:30:53 spock pengine: : WARN: mask(stages.c:stage6):
> Scheduling Node sarek for STONITH
> That's what I want :)
> But then the following message appeared:
> Oct 26 17:31:03 spock tengine: : ERROR: stonithd_node_fence:
> cannot add field to ha_msg.
> And no node kills the other. They try it over and over again, but it
> always fails with the above message.
> I have attached the complete logfile of the DC. As well as my ha.cf and
> the cib.xml.
> Note that both nodes have the problem.
> My system: two RHEL 4 Update 2 Kernel 2.6.0-11ELsmp
> 2 wti_nps power switches.
IIRC we used to see the signal 11 stuff in our testing a few months ago,
but it went away - so we couldn't fix it.
Can you get us the stack trace from the core dump from this occurrence?
It's odd that the monitoring of the STONITH objects didn't detect that
they weren't running any more. Guess we'll have to look at the logs.
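For a first pass through the logs, something along these lines might help.
It is only a rough sketch in Python and not part of heartbeat: the function
name stonith_lines and the choice to match on the word "stonith" are
assumptions, based on the two messages quoted above.

```python
import re

# Assumption: every line of interest mentions "stonith" in some spelling
# (e.g. "STONITH", "stonithd_node_fence"), as the quoted messages do.
STONITH_PATTERN = re.compile(r"stonith", re.IGNORECASE)


def stonith_lines(log_lines):
    """Return the lines that mention STONITH, in their original order."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if STONITH_PATTERN.search(line)
    ]
```

Feeding it the open logfile from the DC, e.g. stonith_lines(open(path)),
should leave the scheduling and fencing messages next to each other, which
makes it easier to see where the monitor operations stop reporting.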
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William