[Linux-HA] Unexplained failovers

Joe Kemp jkemp at capwin.org
Mon Jul 11 07:53:39 MDT 2005


I have a server that seems to be failing over for no apparent reason.
Everything was fine for about a year.  I rebooted the server a week ago,
and now it fails over every day or two.  There does not appear to be
anything wrong with the box.

Do the log entries below indicate that it was unable to ping the servers
in the ping group group1?  If so, how long would it have been trying to
ping them?  If deadtime is set to 30 seconds, why did it fail over 4
seconds after the warning was written to the log file?  I had a ping
running every second, and it did not fail when the box failed over.
Any ideas?

 

 

HA-LOG server01

heartbeat: 2005/07/10_00:30:12 WARN: node group1: is dead
heartbeat: 2005/07/10_00:30:12 info: Link group1:group1 dead.
heartbeat: 2005/07/10_00:30:12 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/07/10_00:30:16 info: server01 wants to go standby [all]
heartbeat: 2005/07/10_00:30:16 info: standby: server02 can take our all resources
heartbeat: 2005/07/10_00:30:16 info: give up all HA resources (standby).
heartbeat: 2005/07/10_00:30:16 info: Releasing resource group: server02 192.168.1.80 runjabber
heartbeat: 2005/07/10_00:30:16 info: Running /etc/init.d/runjabber  stop
heartbeat: 2005/07/10_00:30:18 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.80 stop
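
To put a number on the gap the question asks about, the timestamps in the
log excerpt can be compared directly.  The following is a minimal Python
sketch, assuming the "heartbeat: YYYY/MM/DD_HH:MM:SS level: message"
layout seen in these lines.  The helper names (parse_line,
seconds_between) are made up for illustration and are not part of
heartbeat.

```python
import re
from datetime import datetime

# Pattern for lines shaped like the ha-log excerpt (an assumption based
# only on those lines, not on a documented heartbeat log format).
LINE_RE = re.compile(
    r"^heartbeat: (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) (\w+): (.*)$"
)

def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y/%m/%d_%H:%M:%S")
    return stamp, m.group(2), m.group(3)

def seconds_between(lines, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the first
    later line containing second_marker; None if either is missing."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _level, message = parsed
        if start is None and first_marker in message:
            start = stamp
        elif start is not None and second_marker in message:
            return (stamp - start).total_seconds()
    return None

# Three of the entries from the excerpt, with wrapped lines joined.
LOG = [
    "heartbeat: 2005/07/10_00:30:12 WARN: node group1: is dead",
    "heartbeat: 2005/07/10_00:30:12 info: Link group1:group1 dead.",
    "heartbeat: 2005/07/10_00:30:16 info: server01 wants to go standby [all]",
]

gap = seconds_between(LOG, "is dead", "wants to go standby")
print(gap)
```

The result only measures the time between the "is dead" entry and the
standby request.  It says nothing about how long pings had been going
unanswered before that entry was written.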

 

HA-DEBUG server01

heartbeat: 2005/07/10_00:30:12 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat: 2005/07/10_00:30:17 debug: Starting /etc/init.d/runjabber stop

 

HARESOURCES

server02 192.168.1.80 runjabber

 

HA.CF

keepalive 2
deadtime 30
warntime 10
baud    19200
serial  /dev/ttyS0      # Linux
bcast   eth0            # Linux
auto_failback off
node    server01
node    server02
ping_group group1 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13
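
For the timing question, it may help to see how the three timing
directives in this file relate to each other.  Below is a small Python
sketch, assuming the values are plain seconds as written here and that
one heartbeat is expected every keepalive seconds.  parse_timings and
timing_summary are hypothetical helpers, and the sketch does not
establish whether heartbeat applies deadtime to ping groups exactly as
it does to cluster nodes.

```python
# Timing directives copied from the ha.cf in this message.
HA_CF = """\
keepalive 2
deadtime 30
warntime 10
"""

def parse_timings(text):
    """Collect the timing directives from ha.cf-style text.

    Assumes bare integer seconds, as in this file; values with a unit
    suffix would need extra handling.
    """
    wanted = {"keepalive", "deadtime", "warntime"}
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        if parts[0] in wanted and len(parts) >= 2:
            values[parts[0]] = int(parts[1])
    return values

def timing_summary(values):
    """Express warntime and deadtime as a count of keepalive intervals."""
    keepalive = values["keepalive"]
    return {
        "beats_before_warning": values["warntime"] // keepalive,
        "beats_before_dead": values["deadtime"] // keepalive,
    }

summary = timing_summary(parse_timings(HA_CF))
print(summary)
```

With the values in this file, that works out to 5 keepalive intervals
inside warntime and 15 inside deadtime.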

 

 

Joe A. Kemp
CapWIN Senior Systems Architect
6305 Ivy Lane Suite 300
Greenbelt, MD 20770
(P) 301-614-3727
(F) 301-614-0581

 

 
