[Linux-HA] ipfail on 2.6.10 or 2.6.11
Guochun Shi
gshi at ncsa.uiuc.edu
Mon Mar 28 10:37:56 MST 2005
There were complaints about ipfail on some 2.6 kernels.
Sun Jiang Dong <hasjd at cn.ibm.com> has tested 1.2.3 with ipfail on 2.6.7 and 2.6.10, and it works fine.
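
Note that in the log quoted below, the HBWRITE processes are reported killed by signal 11 (SIGSEGV on Linux) at 13:39:58, before ipfail is started at 13:40:17. To see how often this happens across restarts, something like the following could be used. It is only a rough sketch and not part of heartbeat: the function name find_killed and the regular expression are assumptions modeled on the "Exiting HBWRITE process ... killed by signal 11." lines in that log, and other heartbeat versions may word the message differently.

```python
import re

# Assumed pattern, modeled on the quoted log line:
#   "ERROR: Exiting HBWRITE process 3755 killed by signal 11."
KILLED = re.compile(r"Exiting (\w+) process (\d+)\s+killed by signal (\d+)")


def find_killed(lines):
    """Return (process_name, pid, signal) for each child reported killed by a signal."""
    hits = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            name, pid, sig = match.groups()
            hits.append((name, int(pid), int(sig)))
    return hits
```

It takes any iterable of log lines, so an open file works, for example find_killed(open("/var/log/messages", errors="replace")). It expects each syslog entry on a single line, as in the log file itself rather than as wrapped in mail.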
-Guochun
At 06:53 PM 3/28/2005 +0200, you wrote:
>Hi there,
>
>I have a firewall cluster running heartbeat-1.2.3 and
>linux-2.6.9-gentoo-13. I installed heartbeat using the Gentoo portage
>system.
>
>Today, I tried to upgrade to linux-2.6.11-gentoo-r4. I used hb_takeover
>on fw2 (still running the previous kernel) to take over all IPAddr2
>resources, then booted the new kernel on fw1.
>
>With this new kernel, heartbeat fails on fw1 startup, with the log
>messages shown below. Although I see messages stating that ping nodes
>are not available, they are actually pingable during the failure.
>
>Naturally, I'm too scared to try upgrading _both_ kernels, in case I
>completely lose my cluster (which is remote).
>
>I've also included fw1's ha.cf.
>
>I haven't found anything relating to this in the FAQ, and the two hits
>in the archives didn't produce an answer. Do other folks have ipfail
>working on 2.6.10 and 2.6.11?
>
>Ciao,
>Sheldon.
>
>----- /var/log/messages excerpt
>Mar 28 13:39:56 fw1 heartbeat[3626]: info: **************************
>Mar 28 13:39:56 fw1 heartbeat[3626]: info: Configuration validated.
>Starting heartbeat 1.2.3
>Mar 28 13:39:56 fw1 heartbeat[3627]: info: heartbeat: version 1.2.3
>Mar 28 13:39:56 fw1 heartbeat[3627]: info: Heartbeat generation: 6
>Mar 28 13:39:57 fw1 heartbeat[3627]: info: Starting serial heartbeat on
>tty /dev/ttyS0 (19200 baud)
>Mar 28 13:39:57 fw1 heartbeat[3627]: info: ping heartbeat started.
>Mar 28 13:39:57 fw1 heartbeat[3627]: info: ping heartbeat started.
>Mar 28 13:39:57 fw1 heartbeat[3627]: info: pid 3627 locked in memory.
>Mar 28 13:39:57 fw1 heartbeat[3627]: info: Local status now set to: 'up'
>Mar 28 13:39:58 fw1 heartbeat[3752]: info: pid 3752 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3753]: info: pid 3753 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3756]: info: pid 3756 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3758]: info: pid 3758 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3755]: info: pid 3755 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3757]: info: pid 3757 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Exiting HBWRITE process 3755
>killed by signal 11.
>Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Core heartbeat process died!
>Restarting.
>Mar 28 13:39:58 fw1 heartbeat[3627]: WARN: Shutdown delayed until
>current resource activity finishes.
>Mar 28 13:39:58 fw1 heartbeat[3754]: info: pid 3754 locked in memory.
>Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Exiting HBWRITE process 3757
>killed by signal 11.
>Mar 28 13:39:58 fw1 heartbeat[3627]: ERROR: Core heartbeat process died!
>Restarting.
>Mar 28 13:39:58 fw1 heartbeat[3627]: WARN: string2msg_ll: node [fw2]
>failed authentication
>Mar 28 13:39:58 fw1 heartbeat[3627]: info: Link fw2:/dev/ttyS0 up.
>Mar 28 13:39:58 fw1 heartbeat[3627]: info: Status update for node
>fw2: status active
>Mar 28 13:39:58 fw1 heartbeat[4044]: debug: notify_world: setting
>SIGCHLD Handler to SIG_DFL
>Mar 28 13:39:58 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
>status
>Mar 28 13:40:17 fw1 heartbeat[3627]: WARN: node 10.0.0.2: is dead
>Mar 28 13:40:17 fw1 heartbeat[3627]: WARN: node [hidden]: is dead
>Mar 28 13:40:17 fw1 heartbeat[3627]: debug: StartNextRemoteRscReq():
>child count 1
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: Local status now set to:
>'active'
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: Starting child client
>"/usr/lib/heartbeat/ipfail" (65,65)
>Mar 28 13:40:17 fw1 heartbeat[4064]: debug: notify_world: setting
>SIGCHLD Handler to SIG_DFL
>Mar 28 13:40:17 fw1 heartbeat[4065]: info: Starting
>"/usr/lib/heartbeat/ipfail" as uid 65 gid 65 (pid 4065)
>Mar 28 13:40:17 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
>status
>Mar 28 13:40:17 fw1 ipfail[4065]: debug: Signing in with heartbeat
>Mar 28 13:40:17 fw1 heartbeat[4073]: debug: notify_world: setting
>SIGCHLD Handler to SIG_DFL
>Mar 28 13:40:17 fw1 heartbeat: info: Running /etc/ha.d/rc.d/status
>status
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: remote resource transition
>completed.
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: remote resource transition
>completed.
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: Local Resource acquisition
>completed. (none)
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: Initial resource acquisition
>complete (T_RESOURCES(them))
>Mar 28 13:40:17 fw1 heartbeat[3627]: info: Heartbeat shutdown in
>progress. (3627)
>Mar 28 13:40:17 fw1 heartbeat[4081]: info: Giving up all HA resources.
>...
>----- /etc/ha.d/ha.cf
>debugfile /var/log/ha-debug
>logfacility local0
>keepalive 1
>deadtime 10
>baud 19200
>serial /dev/ttyS0 # Linux
>auto_failback off
>node fw1
>node fw2
>ping 10.0.0.2 [hidden]
>respawn cluster /usr/lib/heartbeat/ipfail
>
>deadping 10
>apiauth ipfail gid=cluster uid=cluster
>apiauth ccm uid=cluster
>
>
>
>_______________________________________________
>Linux-HA mailing list
>Linux-HA at lists.linux-ha.org
>http://lists.linux-ha.org/mailman/listinfo/linux-ha