[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH
pk at q-leap.com
Thu Oct 27 03:05:29 MDT 2005
Andrew Beekhof wrote:
> On 10/26/05, Stefan Peinkofer <peinkofe at fhm.edu> wrote:
>>On Wed, 2005-10-26 at 10:14 -0600, Alan Robertson wrote:
>>>Stefan Peinkofer wrote:
>>>>I ran a cvs heartbeat which was checked out on 2005-10-18 and
>>>>encountered a problem with stonithd which was killed by signal 11.
>>>>The effects were that the stonith resources were NOT_ACTIVE and when I
>>>>initiated a split brain no node could fence the other off.
I have a similar problem, or maybe the same one. It looks like
stonithd cannot recover once the connection to the power switch
is lost: the resource stays in the "NOT ACTIVE" state in the output of "crm_mon -1".
In my setup there are two nodes and one power switch (the STONITH
device). The STONITH resources are configured as clones. The
nodes are connected directly by a crossover cable, and each also has
a second link through a switch to a network on which they can
reach each other (the second heartbeat link) and the power switch.
When I unplug the network cable for this public link on the active
node, a failover occurs (yeah!), and "crm_mon -1" shows:
Clone Set: fence1
fence1:apc1:0 (stonith:apcmastersnmp): NOT ACTIVE
fence1:apc1:1 (stonith:apcmastersnmp): ha-test-2
Unfortunately it stays like this even after I reconnect the
network cable. This is with current CVS; see the attached logs.
So stonithd does not seem to be able to recover from
a loss of connection.
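To check from a script whether the clone instance comes back after the cable is reconnected, one could poll "crm_mon -1" and look for instances still reported as NOT ACTIVE. Below is a minimal sketch in Python. It assumes the output format is exactly the one in the excerpt above; the function name and the regular expression are hypothetical helpers of mine, not part of heartbeat.

```python
import re

# Matches one clone-instance line of "crm_mon -1" output, e.g.
#   fence1:apc1:0 (stonith:apcmastersnmp): NOT ACTIVE
# Assumption: the format is taken only from the excerpt in this post.
LINE_RE = re.compile(
    r"^\s*(?P<rsc>\S+)\s+\((?P<type>[^)]+)\):\s+(?P<where>.+?)\s*$"
)


def inactive_instances(crm_mon_output: str) -> list[str]:
    """Return the resource IDs whose location column reads NOT ACTIVE."""
    inactive = []
    for line in crm_mon_output.splitlines():
        m = LINE_RE.match(line)
        # Header lines such as "Clone Set: fence1" do not match the regex.
        if m and m.group("where") == "NOT ACTIVE":
            inactive.append(m.group("rsc"))
    return inactive


if __name__ == "__main__":
    sample = (
        "Clone Set: fence1\n"
        "fence1:apc1:0 (stonith:apcmastersnmp): NOT ACTIVE\n"
        "fence1:apc1:1 (stonith:apcmastersnmp): ha-test-2\n"
    )
    print(inactive_instances(sample))  # prints ['fence1:apc1:0']
```

In practice the text would come from running "crm_mon -1" (for example via subprocess) rather than from a hard-coded sample.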
What would I have to do to restart stonithd manually so
that heartbeat marks the device as "ACTIVE" again?