[Linux-HA] Unexplained failovers

Joe Kemp jkemp at capwin.org
Mon Jul 11 11:41:32 MDT 2005


Shouldn't the WARN message appear at warntime, and the "server01 wants to
go standby" message at deadtime? (i.e. 20 seconds would elapse before
failover)

I am running version 1.2.1; does anyone remember ping issues like this in
previous versions?  It just failed over again.  I bumped DEBUG up to 2;
maybe that will give me more info.
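Going by Alan's reply quoted below, the "WARN: node group1: is dead" line is the dead declaration itself, not a warntime warning. If that reading is right, the 30-second deadtime had already run out by 00:30:12, and the 4-second gap before "wants to go standby" is only heartbeat reacting to it. A minimal Python sketch of that arithmetic follows. The helper name `outage_window` is hypothetical; the timestamps and the deadtime are the ones from the HA-LOG and ha.cf in this thread.

```python
from datetime import datetime, timedelta

# Timestamp format used in the heartbeat log lines quoted in this thread.
FMT = "%Y/%m/%d_%H:%M:%S"


def outage_window(dead_logged: str, standby_logged: str, deadtime_s: int):
    """Hypothetical helper: bound the last ping reply and measure the reaction gap.

    Assumes (per Alan's reply) that "is dead" is logged only after deadtime
    seconds have passed with no reply from any ping-group member.
    """
    dead = datetime.strptime(dead_logged, FMT)
    standby = datetime.strptime(standby_logged, FMT)
    # Latest moment at which any member of the group could last have replied.
    latest_last_reply = dead - timedelta(seconds=deadtime_s)
    # Seconds between the dead declaration and the standby request.
    reaction_s = int((standby - dead).total_seconds())
    return latest_last_reply.strftime(FMT), reaction_s


if __name__ == "__main__":
    # Values from the HA-LOG excerpt and "deadtime 30" in ha.cf.
    last_reply, gap = outage_window("2005/07/10_00:30:12", "2005/07/10_00:30:16", 30)
    print("last ping reply no later than", last_reply)
    print("standby requested", gap, "s after the dead declaration")
```

Under that assumption, group1 had been silent from 00:29:42 or earlier, so the failover did respect the 30-second deadtime.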

-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org
[mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Alan Robertson
Sent: Monday, July 11, 2005 11:01 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Unexplained failovers

Joe Kemp wrote:
> I have a server that seems to be failing over with no cause.  Everything
> was fine for about 1 year.  I rebooted the server a week ago, and now
> every day or two it fails over.  There does not appear to be anything
> wrong with the box.  Do the log entries below indicate that it was
> unable to ping the servers in the ping group group1?  If so, how long
> would it have been trying to ping them?  If deadtime is set at 30 seconds,
> why did it fail over 4 seconds after the warning was written to the log
> file?  I had a ping running every second and it did not fail when the
> box failed over.   Any ideas?
>
> HA-LOG server01
>
> heartbeat: 2005/07/10_00:30:12 WARN: node group1: is dead
> heartbeat: 2005/07/10_00:30:12 info: Link group1:group1 dead.
> heartbeat: 2005/07/10_00:30:12 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/07/10_00:30:16 info: server01 wants to go standby [all]
> heartbeat: 2005/07/10_00:30:16 info: standby: server02 can take our all resources
> heartbeat: 2005/07/10_00:30:16 info: give up all HA resources (standby).
> heartbeat: 2005/07/10_00:30:16 info: Releasing resource group: server02 192.168.1.80 runjabber
> heartbeat: 2005/07/10_00:30:16 info: Running /etc/init.d/runjabber stop
> heartbeat: 2005/07/10_00:30:18 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.80 stop
>
> HA-DEBUG server01
>
> heartbeat: 2005/07/10_00:30:12 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> heartbeat: 2005/07/10_00:30:17 debug: Starting /etc/init.d/runjabber stop
>
> HARESOURCES
>
> server02 192.168.1.80 runjabber
>
> HA.CF
>
> keepalive 2
> deadtime 30
> warntime 10
> baud    19200
> serial  /dev/ttyS0      # Linux
> bcast   eth0            # Linux
> auto_failback off
> node    server01
> node    server02
> ping_group group1 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13


This does indeed seem to indicate that it can't ping that resource.  If 
it's marked dead, that's because 'deadtime' has already passed without 
hearing any ping responses from any node in the ping group...
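To make the rule above concrete, here is a toy Python model of a ping group: the group stays alive as long as ANY member has replied within deadtime, and is declared dead only once every member has been silent for at least deadtime seconds. This is a sketch of the behaviour described here, not heartbeat's actual code; the class `PingGroup` and its methods are hypothetical, and whether heartbeat compares with `>=` or `>` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PingGroup:
    """Hypothetical model: the group is alive while ANY member replies in time."""

    members: list[str]
    deadtime: float                                    # seconds, as in ha.cf
    last_reply: dict[str, float] = field(default_factory=dict)

    def record_reply(self, member: str, now: float) -> None:
        # Replies from addresses outside the group are ignored.
        if member in self.members:
            self.last_reply[member] = now

    def is_dead(self, now: float, started: float = 0.0) -> bool:
        # Newest reply from any member; before any reply, count from start time.
        newest = max(self.last_reply.values(), default=started)
        # Assumption: dead once silence reaches deadtime (>= rather than >).
        return now - newest >= self.deadtime


if __name__ == "__main__":
    group1 = PingGroup(
        ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"],
        deadtime=30,
    )
    group1.record_reply("192.168.1.12", now=100)  # one member answering is enough
    print(group1.is_dead(now=125))
    print(group1.is_dead(now=130))
```

With one reply at t=100, the group is still alive at t=125 and dead at t=130, even though three of the four addresses never answered at all.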

Why it didn't hear any pings from any of those nodes I can't tell you...

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce
_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha



