[Linux-HA] Failover if one Node has no power

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Dec 9 07:35:33 MST 2008


On Mon, Dec 08, 2008 at 10:02:49AM +0100, Adrian Schoene wrote:
> Hi there,
> I have a two-node cluster based on SLES 10 SP2. The cluster is
> stonith-enabled and uses IPMI to kill a dead node.
> I am now testing how the cluster behaves when a node fails.
> I used iptables to block a node's UDP packets. After a short time the
> node gets stonithed and the surviving node takes over the resources of
> the dead node.
> I tested the same thing by unplugging the power cables, with success.
> In my last test I forgot to plug the power cable back in, and the
> failover failed because the surviving node tries to reset / kill the
> dead node:
> stonithd[6810]: 2008/12/08_09:28:08 info: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi off bdmz02' returned 256
> stonithd[6810]: 2008/12/08_09:28:08 CRIT: external_reset_req: 'ipmi off' for host bdmz02 failed with rc 256
> stonithd[7151]: 2008/12/08_09:28:08 info: Failed to STONITH node bdmz02 with one local device, exitcode = 5. Will try to use the next local device.
> stonithd[7151]: 2008/12/08_09:28:29 ERROR: Failed to STONITH the node bdmz02: optype=POWEROFF, op_result=TIMEOUT
> After plugging in the cable (but not starting the server), the surviving
> node recognizes that the stonith device of the dead server is reachable
> again, and the cluster starts the failover.
> How can I handle or solve this problem? It can happen that one server
> room loses its power supply, leaving the server without power.

You can't solve it with this setup. The IPMI controller draws its
power from the server it manages, so no power means no stonith
device, no fencing, and the cluster will wait forever. Either get a
UPS-based fencing device, make sure that there's power (if you can),
or document a procedure for manual failover on power loss.
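
To illustrate the rule the cluster follows, here is a minimal sketch in
Python. It is only a model of the decision, not how stonithd is
implemented; the names FenceResult and may_take_over are made up for this
example. The point is that a fencing device that cannot be reached is not
proof that the node is dead, so the survivor must not take over.

```python
from enum import Enum


class FenceResult(Enum):
    """Outcome reported for one fencing device (hypothetical model)."""
    CONFIRMED_OFF = "confirmed_off"  # device reported the node is powered off
    FAILED = "failed"                # device answered but the operation failed
    UNREACHABLE = "unreachable"      # device did not answer (e.g. it has no power)


def may_take_over(results):
    """Return True only if at least one fencing device confirmed the kill.

    An unreachable or failed device proves nothing about the state of the
    node, so on its own it never allows a takeover.
    """
    return any(r is FenceResult.CONFIRMED_OFF for r in results)


# The node lost power: its IPMI controller is down too, so the only
# configured device cannot confirm anything and the cluster keeps waiting.
print(may_take_over([FenceResult.UNREACHABLE]))

# With a second, independently powered device (assumed here, e.g. a
# UPS-based one) the kill can be confirmed and the takeover may proceed.
print(may_take_over([FenceResult.UNREACHABLE, FenceResult.CONFIRMED_OFF]))
```

In this model, a documented manual failover amounts to an administrator
supplying the confirmation after checking that the node really is off.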



> Greetings,
> Adrian