[Linux-HA] possible bug in hb_resource.c
Greg Dickie
greg at max-t.com
Thu Dec 14 19:56:44 MST 2006
Hi,
I'm new to the list, so apologies if this is a silly question, but I'm
seeing what seems to be a bug in the handling of a failed STONITH
event. See the log excerpt below:
heartbeat: 2006/12/14_15:49:44 WARN: node sledgehammer: is dead
heartbeat: 2006/12/14_15:49:44 info: Link sledgehammer:eth1 dead.
heartbeat: 2006/12/14_15:49:44 info: Resetting node sledgehammer with [ipmilan STONITH device]
heartbeat: 2006/12/14_15:49:54 ERROR: Host sledgehammer not reset!
heartbeat: 2006/12/14_15:49:54 WARN: Exiting STONITH sledgehammer process 6351 returned rc 1.
heartbeat: 2006/12/14_15:49:54 ERROR: STONITH of sledgehammer failed. Retrying...
heartbeat: 2006/12/14_15:49:59 info: Resetting node ^R with [ipmilan STONITH device]
heartbeat: 2006/12/14_15:49:59 ERROR: Host ^R not reset!
heartbeat: 2006/12/14_15:49:59 WARN: Exiting STONITH ^R process 6352 returned rc 1.
heartbeat: 2006/12/14_15:49:59 ERROR: STONITH of ^R failed. Retrying...
heartbeat: 2006/12/14_15:50:04 info: Resetting node b556^Q with [ipmilan STONITH device]
heartbeat: 2006/12/14_15:50:04 ERROR: Host b556^Q not reset!
heartbeat: 2006/12/14_15:50:04 WARN: Exiting STONITH b556^Q process 6409 returned rc 1.
heartbeat: 2006/12/14_15:50:04 ERROR: STONITH of b556^Q failed. Retrying...
heartbeat: 2006/12/14_15:50:09 info: Resetting node b7a8^Q with [ipmilan STONITH device]
heartbeat: 2006/12/14_15:50:09 ERROR: Host b7a8^Q not reset!
This is an endless loop.
As you can see, after the first try the name of the host to reset is
garbage, as if the string had been freed somewhere before the retry.
This is version 1.2.4, but the code seems to be the same in 1.2.5.
Unfortunately, I have other problems with 2.0.7.
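
To make the suspected failure mode concrete, here is a minimal sketch of the usual defence against it: the retry path keeps its own copy of the node name instead of a pointer into a buffer that someone else may free or reuse. This is not taken from hb_resource.c; the struct and the functions retry_new, retry_free and retry_log_line are hypothetical names invented for illustration, and the log format is simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical retry record; NOT heartbeat's actual data structure. */
struct stonith_retry {
    char *host;     /* privately owned copy of the node name */
    int   attempts; /* number of reset attempts made so far */
};

/* Copy the node name so that later retries never depend on the
 * caller's buffer, which may be freed or overwritten in the meantime. */
struct stonith_retry *retry_new(const char *host)
{
    struct stonith_retry *r;

    assert(host != NULL);
    r = malloc(sizeof(*r));
    if (r == NULL)
        return NULL;
    r->host = malloc(strlen(host) + 1);
    if (r->host == NULL) {
        free(r);
        return NULL;
    }
    strcpy(r->host, host);
    r->attempts = 0;
    return r;
}

/* Release the record together with its private copy of the name. */
void retry_free(struct stonith_retry *r)
{
    if (r == NULL)
        return;
    free(r->host);
    free(r);
}

/* Count one more attempt and format a (simplified) log line for it.
 * Returns the value of snprintf(). */
int retry_log_line(struct stonith_retry *r, char *buf, size_t len)
{
    r->attempts++;
    return snprintf(buf, len, "Resetting node %s (attempt %d)",
                    r->host, r->attempts);
}
```

With this pattern, a caller can overwrite or free its own copy of the name after calling retry_new(), and every later retry_log_line() call still reports the original node name. Whether the real code loses the name in this way is only a guess based on the log above.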
Has anyone seen this before?
Thanks a lot,
Greg
--
Greg Dickie
just a guy
Maximum Throughput