[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH

Stefan Peinkofer peinkofe at fhm.edu
Wed Oct 26 09:54:50 MDT 2005

Hello everybody,

unfortunately I have new problems with the heartbeat 2.0.3 CVS version
and STONITH.

I ran a CVS build of heartbeat that was checked out on 2005-10-18 and
found that stonithd was killed by signal 11 (SIGSEGV).
As a result the stonith resources were NOT_ACTIVE, and when I
initiated a split brain neither node could fence the other.

I thought it might already be fixed in CVS, so I checked out a new
version today (2005-10-26). Unfortunately, this version seems to contain
an even worse problem with stonith.

After starting heartbeat on both nodes and waiting until it had come up
completely, I initiated the split-brain situation. I expected fencing
to work this time because both stonith resources were active.

In the logs I saw:
Oct 26 17:30:53 spock pengine: [20031]: WARN: mask(stages.c:stage6):
Scheduling Node sarek for STONITH
That's what I want :)
But then the following message appeared:
Oct 26 17:31:03 spock tengine: [20030]: ERROR: stonithd_node_fence:
cannot add field to ha_msg.

But neither node kills the other. They try over and over again, but it
always fails with the above message.
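
A minimal sketch for counting these two events in a log, assuming each
entry sits on a single line in the syslog-style layout of the excerpts
above (they are only wrapped here by the mail client). The function name
summarize_fencing and the sample list are illustrative and not part of
heartbeat:

```python
import re

# Syslog-style layout assumed from the excerpts above:
# "Oct 26 17:30:53 spock pengine: [20031]: WARN: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>\w+): \[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def summarize_fencing(lines):
    """Return (scheduled, failed): STONITH requests from pengine and
    stonithd_node_fence errors found in the given log lines."""
    scheduled = 0
    failed = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if m.group("proc") == "pengine" and "for STONITH" in msg:
            scheduled += 1
        elif m.group("level") == "ERROR" and "stonithd_node_fence" in msg:
            failed += 1
    return scheduled, failed


# The two messages quoted above, joined back into single lines.
sample = [
    "Oct 26 17:30:53 spock pengine: [20031]: WARN: mask(stages.c:stage6): "
    "Scheduling Node sarek for STONITH",
    "Oct 26 17:31:03 spock tengine: [20030]: ERROR: stonithd_node_fence: "
    "cannot add field to ha_msg.",
]
print(summarize_fencing(sample))
```

Run against the full log of the DC (read line by line from the file),
this would show how many fencing requests were scheduled and how many
ended in the ha_msg error.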

I have attached the complete logfile of the DC, as well as my ha.cf and
the cib.xml.
Note that both nodes have the problem.

My system: two RHEL 4 Update 2 nodes, kernel 2.6.0-11ELsmp,
and two wti_nps power switches.

Many thanks in advance.

Stefan Peinkofer
Zentrum fuer angewandte Kommunikationstechnologien (ZaK)
Fachhochschule Muenchen, Munich University of Applied Sciences
URL: http://www.fhm.edu/zak/
-------------- next part --------------
############# Heartbeat configuration ##########
# Define the cluster nodes
node spock
node sarek

# Define cluster interconnects
bcast eth3
#bcast bond0
#serial /dev/ttyS1
#baud 19200
debug 1
# Define timings
deadtime 30
warntime 10
keepalive 2
#initdead 240
initdead 30
# Define ping nodes
ping gomtuu.rz.fh-muenchen.de nagilum.rz.fh-muenchen.de infotest.rz.fh-muenchen.de infotst2.rz.fh-muenchen.de

# Define failback behaviour
#auto_failback on

# Define watchdog
#watchdog /dev/watchdog

# Define asynchronous logging
use_logd yes

# Define heartbeat 2 style 
crm yes
