[Linux-HA] Failover if one Node has no power
dejanmm at fastmail.fm
Tue Dec 9 07:35:33 MST 2008
On Mon, Dec 08, 2008 at 10:02:49AM +0100, Adrian Schoene wrote:
> Hi there,
> I have a SLES 10 SP2 based two-node cluster. The cluster has STONITH
> enabled and uses IPMI to kill a dead node.
> I am now testing how the cluster behaves if a node fails.
> I used iptables to block the UDP packets of a node. After a short time
> the node gets stonithed and the surviving node takes over the resources
> of the dead node.
> I tested the same thing by unplugging the power cables, with success.
> In my last test I forgot to plug the power cable back in, and the
> failover failed: the surviving node tries to reset / kill the dead node.
> stonithd: 2008/12/08_09:28:08 info: external_run_cmd: Calling
> '/usr/lib64/stonith/plugins/external/ipmi off bdmz02' returned 256
> stonithd: 2008/12/08_09:28:08 CRIT: external_reset_req: 'ipmi off'
> for host bdmz02 failed with rc 256
> stonithd: 2008/12/08_09:28:08 info: Failed to STONITH node bdmz02
> with one local device, exitcode = 5. Will try to use the next local
> stonithd: 2008/12/08_09:28:29 ERROR: Failed to STONITH the node
> bdmz02: optype=POWEROFF, op_result=TIMEOUT
> After plugging in the cable (but not starting the server), the cluster
> recognizes that the stonith device of the server is back to life
> and starts the failover.
> How can I manage or solve this problem? It can happen that one
> server room loses its power supply
> and the server therefore has no power.
You can't solve it. No power, no stonith device, no fencing, and
the cluster will wait forever. Either get a UPS-based fencing
device, or make sure that there's power (if you can), or document
a procedure for manual failover on power loss.
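
For the manual failover option, someone first has to notice that
fencing has timed out. What follows is only a minimal sketch, assuming
the stonithd ERROR line always has the shape quoted above when it is
written on a single line. The names FenceFailure, parse_fence_failure
and nodes_needing_manual_failover are made up for illustration and are
not part of heartbeat or stonithd.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set


class FenceFailure(NamedTuple):
    node: str
    optype: str
    result: str


# Pattern modelled on the ERROR line quoted in this thread (assumption:
# other stonithd versions may word it differently).
_FAILED = re.compile(
    r"ERROR: Failed to STONITH the node\s+(?P<node>\S+?):\s*"
    r"optype=(?P<optype>\w+),\s*op_result=(?P<result>\w+)"
)


def parse_fence_failure(line: str) -> Optional[FenceFailure]:
    """Return the failed fencing operation described by a log line, if any."""
    m = _FAILED.search(line)
    if m is None:
        return None
    return FenceFailure(m["node"], m["optype"], m["result"])


def nodes_needing_manual_failover(lines: Iterable[str]) -> Set[str]:
    """Collect nodes whose fencing attempt ended with op_result=TIMEOUT."""
    nodes = set()
    for line in lines:
        failure = parse_fence_failure(line)
        if failure is not None and failure.result == "TIMEOUT":
            nodes.add(failure.node)
    return nodes
```

Feeding the log through nodes_needing_manual_failover() gives the set
of nodes an operator has to check by hand. The script only reports; the
decision to confirm that the node is really down has to stay with a
human, since the cluster could not verify it.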