[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH
peinkofe at fhm.edu
Wed Oct 26 10:32:19 MDT 2005
On Wed, 2005-10-26 at 10:14 -0600, Alan Robertson wrote:
> Stefan Peinkofer wrote:
> > Hello everybody,
> > unfortunately I have new problems with the heartbeat 2.0.3 cvs version
> > and stonith.
> > I ran a cvs heartbeat which was checked out on 2005-10-18 and
> > encountered a problem with stonithd which was killed by signal 11.
> > The effects were that the stonith resources were NOT_ACTIVE and when I
> > initiated a split brain no node could fence the other off.
> > I thought it might already be fixed in cvs, so I checked out a version
> > today (2005-10-26). But unfortunately this version seems to contain an
> > even worse problem with stonith.
> > After I started heartbeat on the two nodes and waited until it had
> > started up completely, I initiated the split brain situation. I expected
> > this to work because both stonith resources were active.
> > In the logs I saw:
> > Oct 26 17:30:53 spock pengine: : WARN: mask(stages.c:stage6):
> > Scheduling Node sarek for STONITH
> > That's what I want :)
> > But then the following message appeared:
> > Oct 26 17:31:03 spock tengine: : ERROR: stonithd_node_fence:
> > cannot add field to ha_msg.
> > And no node kills the other. They try it over and over again, but it
> > always fails with the above message.
> > I have attached the complete logfile of the DC, as well as my ha.cf and
> > the cib.xml.
> > Note that both nodes have the problem.
> > My system: two RHEL 4 Update 2 Kernel 2.6.0-11ELsmp
> > 2 wti_nps power switches.
> IIRC we used to see the signal 11 stuff in our testing a few months ago,
> but it went away - so we couldn't fix it.
> Can you get us the stack trace from the core dump from this occurrence?
Sorry, my problem description may be ambiguous. I'm talking about two
presumably independent problems. Problem 1 is the 'killed by signal 11'
problem. That was the reason why I updated my heartbeat to a more recent
cvs version. Unfortunately, I didn't keep the logs of this problem.
(Because I wanted to use the more recent cvs version to provide logs and
Problem 2 is the problem with 'cannot add field to ha_msg' and it
appeared with the more recent cvs version. The logs attached are for
Problem 2. I will be able to provide logs, cores and stuff for Problem 1
once Problem 2 is fixed (since it takes place before Problem 1 occurs).
I hope I did a better job this time.
Many thanks in advance.
> It's odd that the monitoring of the STONITH objects didn't detect that
> they weren't running any more. Guess we'll have to look at the logs
> more closely.