[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH

Peter Kruse pk at q-leap.com
Thu Oct 27 03:05:29 MDT 2005


Hello all,

Andrew Beekhof wrote:
> On 10/26/05, Stefan Peinkofer <peinkofe at fhm.edu> wrote:
> 
>>Hello Alan,
>>On Wed, 2005-10-26 at 10:14 -0600, Alan Robertson wrote:
>>
>>>Stefan Peinkofer wrote:
>>>
>>>>I ran a cvs heartbeat which was checked out on 2005-10-18 and
>>>>encountered a problem with stonithd which was killed by signal 11.
>>>>The effects were that the stonith resources were NOT_ACTIVE and when I
>>>>initiated a split brain no node could fence the other off.
>>>>

I have a similar problem, or maybe the same one.  It looks like
stonithd cannot recover if the connection to the powerswitch
is lost.  It stays in the "NOT ACTIVE" state in the output of "crm_mon -1".
In my setup there are two nodes and one powerswitch (the stonith
device).  The stonith resources are configured as clones.  The
nodes are connected via a crossover cable and
by a second link to a switch, on a network where they can
reach each other (the second heartbeat link) and the powerswitch.
When I pull the network cable of this public link of the active
node, a failover occurs (yeah!), and "crm_mon -1" shows:

Clone Set: fence1
     fence1:apc1:0 (stonith:apcmastersnmp):      NOT ACTIVE
     fence1:apc1:1 (stonith:apcmastersnmp):      ha-test-2

Unfortunately it stays like this even when I connect the
network again.  This is with current CVS; see the attached logs
and cib.xml.
So stonithd does not seem to be able to recover from
a connection loss.
What would I have to do to manually restart stonithd such
that heartbeat marks the device as "ACTIVE"?
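
To watch for this condition from a script, here is a minimal sketch in Python.
It assumes the "crm_mon -1" output keeps the line format shown above; the
function name and the regular expression are only illustrative and are not
part of heartbeat.

```python
import re

# Assumed line format, taken from the crm_mon -1 output quoted above:
#     fence1:apc1:0 (stonith:apcmastersnmp):      NOT ACTIVE
RESOURCE_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+(.+?)\s*$")


def inactive_resources(crm_mon_output):
    """Return (resource, agent) pairs whose state is 'NOT ACTIVE'."""
    inactive = []
    for line in crm_mon_output.splitlines():
        match = RESOURCE_LINE.match(line)
        # Lines such as "Clone Set: fence1" do not match and are skipped.
        if match and match.group(3) == "NOT ACTIVE":
            inactive.append((match.group(1), match.group(2)))
    return inactive


# Sample input copied from the output above.
SAMPLE = """\
Clone Set: fence1
     fence1:apc1:0 (stonith:apcmastersnmp):      NOT ACTIVE
     fence1:apc1:1 (stonith:apcmastersnmp):      ha-test-2
"""

print(inactive_resources(SAMPLE))
```

With the sample above this reports only fence1:apc1:0, the clone instance that
stays "NOT ACTIVE" after the link comes back.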

Regards,

	Peter


