[Linux-HA] ipfail on 2.6.10 or 2.6.11

Sheldon Hearn sheldonh at starjuice.net
Mon Mar 28 09:53:41 MST 2005


Hi there,

I have a firewall cluster running heartbeat-1.2.3 and
linux-2.6.9-gentoo-13.  I installed heartbeat using the Gentoo portage
system.

Today, I tried to upgrade to linux-2.6.11-gentoo-r4. I used hb_takeover
on fw2 (still running the previous kernel) to take over all IPAddr2
resources, then booted the new kernel on fw1.

With this new kernel, heartbeat fails during startup on fw1, with the
log messages shown below. Although the log reports the ping nodes as
dead, they are actually pingable during the failure.

Naturally, I'm too scared to try upgrading _both_ kernels, in case I
completely lose my cluster (which is remote).

I've also included fw1's ha.cf.

I haven't found anything relating to this in the FAQ, and the two hits
in the archives didn't produce an answer.  Do other folks have ipfail
working on 2.6.10 and 2.6.11?

Ciao,
Sheldon.

----- /var/log/messages excerpt
Mar 28 13:39:56 fw1 heartbeat[3626]: info: **************************
Mar 28 13:39:56 fw1 heartbeat[3626]: info: Configuration validated.
Starting heartbeat 1.2.3
Mar 28 13:39:56 fw1 heartbeat[3627]: info: heartbeat: version 1.2.3
Mar 28 13:39:56 fw1 heartbeat[3627]: info: Heartbeat generation: 6
Mar 28 13:39:57 fw1 heartbeat[3627]: info: Starting serial heartbeat on
tty /dev/ttyS0 (19200 baud)
Mar 28 13:39:57 fw1 heartbeat[3627]: info: ping heartbeat started.
Mar 28 13:39:57 fw1 heartbeat[3627]: info: ping heartbeat started.
Mar 28 13:39:57 fw1 heartbeat[3627]: info: pid 3627 locked in memory.
Mar 28 13:39:57 fw1 heartbeat[3627]: info: Local status now set to: 'up'
Mar 28 13:39:58 fw1 heartbeat[3752]: info: pid 3752 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3753]: info: pid 3753 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3756]: info: pid 3756 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3758]: info: pid 3758 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3755]: info: pid 3755 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3757]: info: pid 3757 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Exiting HBWRITE process 3755
killed by signal 11.
Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Core heartbeat process died!
Restarting.
Mar 28 13:39:58 fw1 heartbeat[3627]: info: Status update for node
fw2: status active
Mar 28 13:39:58 fw1 heartbeat[4044]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Mar 28 13:39:58 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
status
Mar 28 13:40:17 fw1 heartbeat[3627]: WARN: node 10.0.0.2: is dead
Mar 28 13:40:17 fw1 heartbeat[3627]: WARN: node [hidden]: is dead
Mar 28 13:40:17 fw1 heartbeat[3627]: debug: StartNextRemoteRscReq():
child count 1
Mar 28 13:40:17 fw1 heartbeat[3627]: info: Local status now set to:
'active'
Mar 28 13:40:17 fw1 heartbeat[3627]: info: Starting child client
"/usr/lib/heartbeat/ipfail" (65,65)
Mar 28 13:40:17 fw1 heartbeat[4064]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Mar 28 13:40:17 fw1 heartbeat[4065]: info: Starting
"/usr/lib/heartbeat/ipfail" as uid 65  gid 65 (pid 4065)
Mar 28 13:40:17 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
status
Mar 28 13:40:17 fw1 ipfail[4065]: debug: Signing in with heartbeat
Mar 28 13:40:17 fw1 heartbeat[4073]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Mar 28 13:40:17 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
status
Mar 28 13:40:17 fw1 heartbeat[3627]: info: remote resource transition
completed.
Mar 28 13:40:17 fw1 heartbeat[3627]: info: remote resource transition
completed.
Mar 28 13:40:17 fw1 heartbeat[3627]: info: Local Resource acquisition
completed. (none)
Mar 28 13:40:17 fw1 heartbeat[3627]: info: Initial resource acquisition
complete (T_RESOURCES(them))
Mar 28 13:40:17 fw1 heartbeat[3627]: info: Heartbeat shutdown in
progress. (3627)
Mar 28 13:40:17 fw1 heartbeat[4081]: info: Giving up all HA resources.
Mar 28 13:39:58 fw1 heartbeat[3627]: WARN: Shutdown delayed until
current resource activity finishes.
Mar 28 13:39:58 fw1 heartbeat[3754]: info: pid 3754 locked in memory.
Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Exiting HBWRITE process 3757
killed by signal 11.
Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Core heartbeat process died!
Restarting.
Mar 28 13:39:58 fw1 heartbeat[3627]: WARN: string2msg_ll: node [fw2]
failed authentication
Mar 28 13:39:58 fw1 heartbeat[3627]: info: Link fw2:/dev/ttyS0 up.
...
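
To pull the interesting events out of an excerpt like the one above, a small script can help. The following is only a minimal sketch: the function names (join_wrapped, summarize) are hypothetical, and the patterns are inferred from the message wording in this excerpt, not taken from heartbeat's source. It first rejoins lines that the mailer wrapped, then collects "killed by signal" and "is dead" events.

```python
import re

# A new log entry starts with a syslog timestamp such as "Mar 28 13:39:58 ".
STAMP_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
# Message wording as it appears in the excerpt (an assumption for other versions).
SIGNAL_RE = re.compile(r"Exiting (\w+) process (\d+) killed by signal (\d+)")
DEAD_RE = re.compile(r"node (\S+): is dead")


def join_wrapped(lines):
    """Merge continuation lines (no leading timestamp) into the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP_RE.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def summarize(lines):
    """Return (crashes, dead_nodes) found in an iterable of log lines.

    crashes is a list of (process_type, pid, signal_number) tuples;
    dead_nodes is a list of node names or addresses reported dead.
    """
    crashes, dead_nodes = [], []
    for entry in join_wrapped(lines):
        m = SIGNAL_RE.search(entry)
        if m:
            crashes.append((m.group(1), int(m.group(2)), int(m.group(3))))
        m = DEAD_RE.search(entry)
        if m:
            dead_nodes.append(m.group(1))
    return crashes, dead_nodes
```

Called as summarize(open("/var/log/messages")), this would list each crashed heartbeat child with its signal number next to the nodes declared dead, which makes it easier to see whether the crashes precede the "is dead" warnings.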
----- /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfacility	local0
keepalive 1
deadtime 10
baud	19200
serial	/dev/ttyS0	# Linux
auto_failback off
node	fw1
node	fw2
ping 10.0.0.2 [hidden]
respawn cluster /usr/lib/heartbeat/ipfail

deadping 10
apiauth	ipfail gid=cluster uid=cluster
apiauth ccm uid=cluster
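
For comparing the configuration on the two nodes, the file above can be read into a simple structure. This is a rough sketch only: parse_ha_cf and ping_nodes are hypothetical helper names, and the parsing (whitespace-separated fields, "#" starting a comment) is a simplification based on the file as shown here, not heartbeat's own parser.

```python
def parse_ha_cf(text):
    """Parse ha.cf-style text into {directive: [list of argument lists]}.

    Directives that occur more than once (e.g. "node") keep every occurrence.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # skip blank and comment-only lines
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def ping_nodes(directives):
    """Flatten all hosts named on "ping" lines into one list."""
    return [host for args in directives.get("ping", []) for host in args]
```

With the result in hand, one can check for example that every address returned by ping_nodes() answers from the node in question, or diff the dictionaries produced from fw1's and fw2's files.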




