[Linux-HA] Problem with heartbeat and ipsec / machine reboots

Samendinger, Marc marc.samendinger at sp-online.de
Tue Jul 21 10:34:56 MDT 2009


Hi all,

I have a problem with my heartbeat configuration. I have two machines
that should work in active/passive failover mode.
After a few initial problems with the heartbeat v2 configuration I
switched to the heartbeat v1 configuration to keep things as simple as
possible for a start.

When I boot the master node while the slave isn't started at all, the
machine boots up just fine, the resources get started and everything
works as expected. When I then boot the slave and shut down the master
after a while, the slave gets rebooted after taking over the resources
and logging some errors relating to ipsec.
The same may happen when the master gets rebooted and tries to take
over the resources.

To be honest I'm a bit puzzled and don't know where the errors come
from or how to debug this further, so I'm looking for help on the
mailing list.
I read on wiki.linux-ha.org that it's normal behaviour to reboot the
machine if errors occur while stopping resources. I think that's where
the reboots come from, but I don't know why the machine has problems
stopping the resource. According to the log file the resource gets
stopped correctly, but heartbeat tries over and over again to stop it.
I configured the init scripts so they comply with the rules in
http://wiki.linux-ha.org/LSBResourceAgent
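
One way to double-check that compliance by hand is to walk the init
script through the start/status/stop sequence and compare the exit
codes with what LSB expects: 0 for a successful start or stop (also
when the service is already in that state), 0 from status while
running and 3 from status while stopped. The following is only a
minimal sketch; the helper names expect_rc and lsb_check are made up
for illustration, and it really starts and stops the service, so it
should only be run on a test node with heartbeat itself stopped.

```shell
#!/bin/sh
# Sketch: exercise an init script the way the LSB rules expect.
# expect_rc and lsb_check are hypothetical helper names.

# expect_rc LABEL EXPECTED COMMAND [ARGS...]
# Runs COMMAND, reports its exit status, returns non-zero on a mismatch.
expect_rc() {
    label=$1
    expected=$2
    shift 2
    "$@" >/dev/null 2>&1
    actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "ok   $label: exit $actual"
        return 0
    fi
    echo "FAIL $label: exit $actual, expected $expected"
    return 1
}

# lsb_check SCRIPT
# Assumes the service is stopped when the check begins.
lsb_check() {
    script=$1
    failures=0
    expect_rc "start while stopped" 0 "$script" start   || failures=$((failures + 1))
    expect_rc "status while running" 0 "$script" status || failures=$((failures + 1))
    expect_rc "start while running" 0 "$script" start   || failures=$((failures + 1))
    expect_rc "stop while running" 0 "$script" stop     || failures=$((failures + 1))
    expect_rc "status while stopped" 3 "$script" status || failures=$((failures + 1))
    expect_rc "stop while stopped" 0 "$script" stop     || failures=$((failures + 1))
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}
```

Run as root, for example as `lsb_check /etc/init.d/ipsec`, it prints
one line per step. A FAIL on one of the stop steps would be one
possible explanation for heartbeat repeatedly logging a failed stop,
though it does not rule out other causes.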

I attached some configs and log snippets.

I'd be glad if anyone could help shed a little light on this ;)

TIA
Marc

Both machines run Ubuntu Server 8.04 LTS, fully patched.

Linux heartbeat-1 2.6.24-24-server #1 SMP Tue Jul 7 20:21:17 UTC 2009 i686 GNU/Linux

ii  heartbeat-2    2.1.3-2                 Subsystem for High-Availability Linux
ii  openswan       1:2.4.9+dfsg-1build1    IPSEC utilities for Openswan

/etc/ha.d/haresources
heartbeat-1     10.85.118.245 ipsec

Jul 21 16:43:50 heartbeat-1 heartbeat: [4566]: info: Link heartbeat-1:eth0 up.
Jul 21 16:43:50 heartbeat-1 harc[4656]: info: Running /etc/ha.d/rc.d/status status
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Comm_now_up(): updating status to active
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Local status now set to: 'active'
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Local Resource acquisition completed. (none)
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: heartbeat-2 wants to go standby [foreign]
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: standby: acquire [foreign] resources from heartbeat-2
Jul 21 16:43:52 heartbeat-1 heartbeat: [4673]: info: acquire local HA resources (standby).
Jul 21 16:43:52 heartbeat-1 heartbeat: [4673]: info: local HA resource acquisition completed (standby).
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: Standby resource acquisition done [foreign].
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: Initial resource acquisition complete (auto_failback)
Jul 21 16:43:53 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:44:21 heartbeat-1 kernel: [  164.698146] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Received shutdown notice from 'heartbeat-2'.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Resources being acquired from heartbeat-2.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4728]: info: acquire local HA resources (standby).
Jul 21 16:45:26 heartbeat-1 heartbeat: [4729]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys heartbeat-1] to acquire.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4728]: info: local HA resource acquisition completed (standby).
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Standby resource acquisition done [foreign].
Jul 21 16:45:26 heartbeat-1 harc[4754]: info: Running /etc/ha.d/rc.d/status status
Jul 21 16:45:26 heartbeat-1 mach_down[4768]: info: Taking over resource group 10.85.118.245
Jul 21 16:45:26 heartbeat-1 ResourceManager[4792]: info: Acquiring resource group: heartbeat-2 10.85.118.245 ipsec
Jul 21 16:45:26 heartbeat-1 IPaddr[4818]: INFO:  Resource is stopped
Jul 21 16:45:26 heartbeat-1 ResourceManager[4792]: info: Running /etc/ha.d/resource.d/IPaddr 10.85.118.245 start
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: Using calculated nic for 10.85.118.245: eth2
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: Using calculated netmask for 10.85.118.245: 255.255.0.0
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: eval ifconfig eth2:0 10.85.118.245 netmask 255.255.0.0 broadcast 10.85.255.255
Jul 21 16:45:27 heartbeat-1 IPaddr[4874]: INFO:  Success
Jul 21 16:45:27 heartbeat-1 kernel: [  230.782186] NET: Registered protocol family 17
Jul 21 16:45:27 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  start
Jul 21 16:45:27 heartbeat-1 kernel: [  231.002200] NET: Registered protocol family 15
Jul 21 16:45:27 heartbeat-1 kernel: [  231.582742] Initializing XFRM netlink socket
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: CRIT: Giving up resources due to failure of ipsec
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: info: Releasing resource group: heartbeat-2 10.85.118.245 ipsec
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:30 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:30 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:30 heartbeat-1 kernel: [  234.467984] NET: Unregistered protocol family 15
Jul 21 16:45:31 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:31 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:32 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:32 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:33 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:34 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:35 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:35 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 16:45:36 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:36 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec  stop
Jul 21 17:08:58 heartbeat-1 syslogd 1.5.0#1ubuntu1: restart.

daemon.log

Jul 21 16:43:49 heartbeat-1 logd: [4486]: info: logd started with default configuration.
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Core dumps could be lost if multiple dumps occur.
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 21 16:43:49 heartbeat-1 logd: [4490]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 21 16:43:49 heartbeat-1 logd: [4486]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 21 16:43:49 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:43:49 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:43:49 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:43:49 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:27 heartbeat-1 ipsec_setup: NETKEY on eth2 10.85.118.241/255.255.0.0 broadcast 10.85.255.255
Jul 21 16:45:28 heartbeat-1 ipsec_setup: ...Openswan IPsec started
Jul 21 16:45:28 heartbeat-1 ipsec_setup: Starting Openswan IPsec 2.4.9...
Jul 21 16:45:28 heartbeat-1 rmmod: ERROR: Module af_key is in use
Jul 21 16:45:28 heartbeat-1 rmmod: ERROR: Module xfrm_user is in use
Jul 21 16:45:28 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:28 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:29 heartbeat-1 ipsec__plutorun: 104 "sup-test" #1: STATE_MAIN_I1: initiate
Jul 21 16:45:29 heartbeat-1 ipsec__plutorun: ...could not start conn "sup-test"
Jul 21 16:45:30 heartbeat-1 rmmod: ERROR: Module xfrm_user is in use
Jul 21 16:45:30 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:30 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:30 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:30 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:31 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:31 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:31 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:31 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:32 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:32 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:32 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:32 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:34 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:34 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:34 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:34 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:35 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:35 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:35 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:35 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:36 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:36 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:36 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:36 heartbeat-1 ipsec_setup: doing cleanup anyway...


