[Linux-HA] Unexplained failovers
Alan Robertson
alanr at unix.sh
Mon Jul 11 09:00:39 MDT 2005
Joe Kemp wrote:
> I have a server that seems to be failing over with no cause. Everything
> was fine for about 1 year. I rebooted the server a week ago and now
> every day or two it fails over. There does not appear to be anything
> wrong with the box. Do the log entries below indicate that it was
> unable to ping the servers in the ping group group1? If so, how long
> would it have been trying to ping them? If deadtime is set at 30
> seconds, why did it fail over 4 seconds after the warning was written
> to the log file? I had a ping running every second and it did not fail
> when the box failed over. Any ideas?
>
> HA-LOG server01
>
> heartbeat: 2005/07/10_00:30:12 WARN: node group1: is dead
> heartbeat: 2005/07/10_00:30:12 info: Link group1:group1 dead.
> heartbeat: 2005/07/10_00:30:12 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/07/10_00:30:16 info: server01 wants to go standby [all]
> heartbeat: 2005/07/10_00:30:16 info: standby: server02 can take our all resources
> heartbeat: 2005/07/10_00:30:16 info: give up all HA resources (standby).
> heartbeat: 2005/07/10_00:30:16 info: Releasing resource group: server02 192.168.1.80 runjabber
> heartbeat: 2005/07/10_00:30:16 info: Running /etc/init.d/runjabber stop
> heartbeat: 2005/07/10_00:30:18 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.80 stop
>
> HA-DEBUG server01
>
> heartbeat: 2005/07/10_00:30:12 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> heartbeat: 2005/07/10_00:30:17 debug: Starting /etc/init.d/runjabber stop
>
> HARESOURCES
>
> server02 192.168.1.80 runjabber
>
> HA.CF
>
> keepalive 2
> deadtime 30
> warntime 10
> baud 19200
> serial /dev/ttyS0 # Linux
> bcast eth0 # Linux
> auto_failback off
> node server01
> node server02
> ping_group group1 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13
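Yes, this does indeed seem to indicate that it couldn't ping anything in
group1. If the group is marked dead, that's because 'deadtime' (30
seconds in your ha.cf) had already passed without hearing a ping
response from any node in the ping group. So the 4 seconds you noticed
is not how long it had been trying: it is only the gap between the group
being declared dead (00:30:12) and server01 asking to go standby
(00:30:16).
Why it didn't hear any pings from any of those nodes, I can't tell you...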
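
To put numbers on the timing, here is a minimal Python sketch that does the
arithmetic on the log timestamps. The three log lines are copied from the
HA-LOG excerpt and DEADTIME comes from the posted ha.cf; the helper names
(parse_timestamp, first_match) are made up for this example. It also assumes
the group is declared dead exactly 'deadtime' after the last reply, which is
only an approximation of what heartbeat really does.

```python
from datetime import datetime, timedelta

# Log lines copied from the HA-LOG excerpt in the original post.
LOG_LINES = [
    "heartbeat: 2005/07/10_00:30:12 WARN: node group1: is dead",
    "heartbeat: 2005/07/10_00:30:12 info: Link group1:group1 dead.",
    "heartbeat: 2005/07/10_00:30:16 info: server01 wants to go standby [all]",
]

# 'deadtime 30' from the posted ha.cf.
DEADTIME = timedelta(seconds=30)


def parse_timestamp(line):
    """Return the timestamp in the second whitespace-separated field."""
    stamp = line.split()[1]
    return datetime.strptime(stamp, "%Y/%m/%d_%H:%M:%S")


def first_match(lines, needle):
    """Return the timestamp of the first line containing `needle`."""
    for line in lines:
        if needle in line:
            return parse_timestamp(line)
    raise ValueError("no log line contains %r" % needle)


declared_dead = first_match(LOG_LINES, "is dead")
standby = first_match(LOG_LINES, "wants to go standby")

# Gap between the dead declaration and the standby request.
gap = standby - declared_dead

# Assumption: the last reply from any group member arrived no later than
# 'deadtime' before the group was declared dead.
latest_reply = declared_dead - DEADTIME

print("dead -> standby gap:", int(gap.total_seconds()), "seconds")
print("last reply no later than:", latest_reply.strftime("%H:%M:%S"))
```

Under those assumptions the gap is 4 seconds and the last reply from group1
would have arrived no later than 00:29:42, so that earlier window is where
to look for the cause.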
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce