[Linux-HA] Heartbeat, ipfail and two ethernet cards

Alexis Januskiewicz ajanuskiewicz at protego.net
Wed Mar 2 06:49:53 MST 2005


Hi everybody,

I'm trying to test Heartbeat with a Squid cluster.
I understand that Heartbeat can't monitor resources (so if Squid 
crashes, nothing happens, unless I use "mon" or "ldirectord"), and 
also that I have to use ipfail to detect network failures.
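
Since Heartbeat does not watch Squid itself, one common workaround is an external check that stops Heartbeat on the node where the service has died, so the peer takes over. A minimal sketch of that idea follows; check_and_failover is a hypothetical helper, not part of Heartbeat, and the commands passed to it are assumptions about this particular setup.

```shell
#!/bin/bash
# Sketch only: run one health check and, if it fails, run a failover action.
# check_and_failover is a hypothetical helper, not part of Heartbeat.
#   $1 = command that exits 0 while the service is healthy
#   $2 = command to run when the check fails
check_and_failover() {
	local check_cmd=$1 action_cmd=$2
	# Unquoted on purpose: the string is split into command and arguments.
	if $check_cmd >/dev/null 2>&1; then
		return 0
	fi
	echo "service check failed, running: $action_cmd"
	$action_cmd
}
```

It could be called periodically (from cron or a loop), for example as check_and_failover "/usr/local/squid/sbin/squid -k check" "/etc/init.d/heartbeat stop", assuming those paths match the installation. Tools such as "mon" do the same job more thoroughly.
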
My network looks like this:

      ______________ SW ______________
     |                                |
     |                                |
     |eth0                            |eth0
  ___|___                          ___|___
 |       |eth1                eth1|       |
 | SRV 1 |------------------------| SRV 2 |
 |_______|     Heartbeat link     |_______|

At first, I used only one Ethernet card on each server. The problem is 
that if the network link is broken, each server thinks it is the one 
that is alive, and when the link comes back up, Heartbeat restarts on 
each server. Is this normal? To me it isn't, but maybe I'm wrong.
Consequently, I chose to dedicate a second network card to Heartbeat, 
to avoid having a lot of UDP traffic on my network and also to be able 
to monitor the servers even if eth0 is down.
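
With a single heartbeat path, a cut link looks the same as a dead peer, which is why both servers act as the live node. A rough way to see how many independent paths a configuration declares is sketched below. count_hb_links is a hypothetical helper, not a Heartbeat tool, and it assumes that each interface on a "bcast" line and each "ucast", "mcast" or "serial" line counts as one path.

```shell
#!/bin/bash
# Sketch only: count the heartbeat paths declared in an ha.cf-style file.
# count_hb_links is a hypothetical helper, not a Heartbeat tool.
# Assumption: every interface on a "bcast" line is one path, and each
# "ucast", "mcast" or "serial" line is one path. Commented lines are ignored
# because their first field starts with "#".
count_hb_links() {
	awk '
		$1 == "bcast" { n += NF - 1 }
		$1 == "ucast" || $1 == "mcast" || $1 == "serial" { n += 1 }
		END { print n + 0 }
	' "$1"
}
```

For the ha.cf attached to this message ("bcast eth1 eth0"), this would report 2. A result of 1 means a single cable or switch failure is enough to cut the nodes off from each other.
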

First, without using ipfail, I unplugged eth0 from SRV1. Heartbeat still 
considered SRV1 alive and didn't move the VIP to SRV2.
After that, I decided to send heartbeats over eth0 AND eth1, but I ran 
into the same problem.
So, finally, I configured Heartbeat to use ipfail. The result is no 
better, and I really think there is a problem with my configuration.
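
The tests above are consistent with how the work is divided: Heartbeat treats a node as alive as long as any heartbeat path still delivers packets, so unplugging eth0 while eth1 works changes nothing by itself. ipfail is the piece meant to react, by comparing how many ping nodes each side can still reach. A minimal sketch of that comparison follows, assuming the counts are already known. decide_failover is a hypothetical name, and the real ipfail logic handles more cases than this.

```shell
#!/bin/bash
# Sketch only: the comparison ipfail is understood to make.
# decide_failover is a hypothetical helper, not Heartbeat code.
#   $1 = number of ping nodes this node can reach
#   $2 = number of ping nodes the peer can reach
decide_failover() {
	local mine=$1 peer=$2
	if [ "$mine" -lt "$peer" ]; then
		echo "standby"   # peer sees more of the network: give up resources
	else
		echo "keep"      # equal or better connectivity: keep resources
	fi
}
```

With the single ping node 10.0.0.1 from ha.cf, a node whose eth0 is unplugged would reach 0 ping nodes while its peer reaches 1, so decide_failover 0 1 prints "standby". This only happens if ipfail is actually running on both nodes.
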

I have two questions:
- Do you think this configuration is correct and "optimal"? If not, 
what do you recommend?
- Are the ipfail logs normal ("respawning too fast")? If not, what can 
I do?
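
On the second question, the attached ha-log shows every ipfail start ending with "returned rc 126" until Heartbeat gives up with "respawning too fast". By shell convention, exit status 126 means a command was found but could not be executed, and 127 means it was not found. Whether Heartbeat launches the client through a shell is an assumption here, but if it does, the log points at ipfail not being executable for the hacluster user rather than at a crash inside ipfail. The sketch below classifies a path along those lines; explain_exec is a hypothetical helper.

```shell
#!/bin/bash
# Sketch only: explain why running a given path would fail, using the
# shell's exit-status conventions (126 = not executable, 127 = not found).
# explain_exec is a hypothetical helper, not part of Heartbeat.
# It tests permissions for the user running it, so it is only meaningful
# when run as the account that will start the program.
explain_exec() {
	local path=$1
	if [ ! -e "$path" ]; then
		echo "missing (a shell would return 127)"
	elif [ -d "$path" ] || [ ! -x "$path" ]; then
		echo "not executable by $(id -un) (a shell would return 126)"
	else
		echo "executable"
	fi
}
```

Running explain_exec /usr/lib/heartbeat/ipfail as the hacluster user (uid 1002 in the log) would show which case applies on these servers.
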

OS: Debian Sarge, with a 2.6.10 kernel (patched with ac and grsecurity)
Heartbeat: v1.2.3 (Debian package)
The attached files are: ha.cf (exactly the same on both 
computers), haresources, the Squid initialization script and ha-log. On 
srv1, I switched off eth0 at 14:36:39.

Thanks for your help!
-- 
Alexis
-------------- next part --------------
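#!/bin/bash
#
# Minimal Squid init script, used as a Heartbeat resource.

SQUID=/usr/local/squid/sbin/squid
PIDFILE=/usr/local/squid/var/run/squid.pid

case "$1" in
	start)
		echo "Starting Squid"
		"$SQUID" -D
		;;
	stop)
		echo "Stopping Squid"
		# Only signal Squid if the pid file exists and is readable.
		if [ -r "$PIDFILE" ]; then
			kill -9 "$(cat "$PIDFILE")"
		fi
		;;
	restart)
		"$0" stop
		"$0" start
		;;
	status)
		# "squid -k check" exits 0 when a Squid process is running.
		if "$SQUID" -k check; then
			echo "Squid is running..."
		fi
		;;
	*)
		echo "Usage: squid {start|stop|restart|status}"
		exit 1
		;;
esac
exit 0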
-------------- next part --------------
bcast eth1 eth0
keepalive 1000ms
warntime 3
deadtime 5
udpport 694
auto_failback on
node srv1
node srv2
#ping GW
ping 10.0.0.1
respawn hacluster /usr/lib/heartbeat/ipfail
-------------- next part --------------
heartbeat: 2005/03/02_14:35:16 info: Neither logfile nor logfacility found.
heartbeat: 2005/03/02_14:35:16 info: Logging defaulting to /var/log/ha-log
heartbeat: 2005/03/02_14:35:16 info: **************************
heartbeat: 2005/03/02_14:35:16 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2005/03/02_14:35:16 info: heartbeat: version 1.2.3
heartbeat: 2005/03/02_14:35:16 info: Heartbeat generation: 109
heartbeat: 2005/03/02_14:35:16 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2005/03/02_14:35:16 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2005/03/02_14:35:16 info: ping heartbeat started.
heartbeat: 2005/03/02_14:35:17 info: pid 23635 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: Local status now set to: 'up'
heartbeat: 2005/03/02_14:35:17 info: pid 13484 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: pid 14546 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: pid 4570 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: Link srv1:eth1 up.
heartbeat: 2005/03/02_14:35:17 info: pid 18969 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: pid 1207 locked in memory.
heartbeat: 2005/03/02_14:35:17 info: Link srv1:eth0 up.
heartbeat: 2005/03/02_14:35:17 info: pid 18340 locked in memory.
heartbeat: 2005/03/02_14:35:18 info: pid 27211 locked in memory.
heartbeat: 2005/03/02_14:35:18 info: Link 10.0.0.1:10.0.0.1 up.
heartbeat: 2005/03/02_14:35:18 info: Status update for node 10.0.0.1: status ping
heartbeat: 2005/03/02_14:35:24 info: Link srv2:eth1 up.
heartbeat: 2005/03/02_14:35:24 info: Status update for node srv2: status up
heartbeat: 2005/03/02_14:35:24 info: Local status now set to: 'active'
heartbeat: 2005/03/02_14:35:24 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:24 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/03/02_14:35:24 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 62)
heartbeat: 2005/03/02_14:35:24 WARN: Exiting /usr/lib/heartbeat/ipfail process 62 returned rc 126.
heartbeat: 2005/03/02_14:35:24 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:24 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:25 info: Link srv2:eth0 up.
heartbeat: 2005/03/02_14:35:25 info: Status update for node srv2: status active
heartbeat: 2005/03/02_14:35:25 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/03/02_14:35:25 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 8970)
heartbeat: 2005/03/02_14:35:25 WARN: Exiting /usr/lib/heartbeat/ipfail process 8970 returned rc 126.
heartbeat: 2005/03/02_14:35:25 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:25 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:26 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 32375)
heartbeat: 2005/03/02_14:35:26 WARN: Exiting /usr/lib/heartbeat/ipfail process 32375 returned rc 126.
heartbeat: 2005/03/02_14:35:26 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:26 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:27 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 26621)
heartbeat: 2005/03/02_14:35:27 WARN: Exiting /usr/lib/heartbeat/ipfail process 26621 returned rc 126.
heartbeat: 2005/03/02_14:35:27 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:27 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:28 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 30980)
heartbeat: 2005/03/02_14:35:28 WARN: Exiting /usr/lib/heartbeat/ipfail process 30980 returned rc 126.
heartbeat: 2005/03/02_14:35:28 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:28 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:29 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 10094)
heartbeat: 2005/03/02_14:35:29 WARN: Exiting /usr/lib/heartbeat/ipfail process 10094 returned rc 126.
heartbeat: 2005/03/02_14:35:29 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:29 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:30 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 4264)
heartbeat: 2005/03/02_14:35:30 WARN: Exiting /usr/lib/heartbeat/ipfail process 4264 returned rc 126.
heartbeat: 2005/03/02_14:35:30 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:30 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:31 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 26003)
heartbeat: 2005/03/02_14:35:31 WARN: Exiting /usr/lib/heartbeat/ipfail process 26003 returned rc 126.
heartbeat: 2005/03/02_14:35:31 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:31 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:32 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 30733)
heartbeat: 2005/03/02_14:35:32 WARN: Exiting /usr/lib/heartbeat/ipfail process 30733 returned rc 126.
heartbeat: 2005/03/02_14:35:32 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:32 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:33 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 12903)
heartbeat: 2005/03/02_14:35:33 WARN: Exiting /usr/lib/heartbeat/ipfail process 12903 returned rc 126.
heartbeat: 2005/03/02_14:35:33 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:33 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:34 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 8309)
heartbeat: 2005/03/02_14:35:34 WARN: Exiting /usr/lib/heartbeat/ipfail process 8309 returned rc 126.
heartbeat: 2005/03/02_14:35:34 ERROR: Client /usr/lib/heartbeat/ipfail "respawning too fast"
heartbeat: 2005/03/02_14:35:35 info: remote resource transition completed.
heartbeat: 2005/03/02_14:35:35 info: remote resource transition completed.
heartbeat: 2005/03/02_14:35:35 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2005/03/02_14:35:37 info: Local Resource acquisition completed.
heartbeat: 2005/03/02_14:35:37 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
heartbeat: 2005/03/02_14:35:37 received ip-request-resp 10.0.0.234 OK yes
heartbeat: 2005/03/02_14:35:38 info: Acquiring resource group: srv1 10.0.0.234 squid
heartbeat: 2005/03/02_14:35:40 info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.234 start
heartbeat: 2005/03/02_14:35:44 info: /sbin/ifconfig eth0:0 10.0.0.234 netmask 255.255.255.0	broadcast 10.0.0.255
heartbeat: 2005/03/02_14:35:44 info: Sending Gratuitous Arp for 10.0.0.234 on eth0:0 [eth0]
heartbeat: 2005/03/02_14:35:44 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.0.0.234 eth0 10.0.0.234 auto 10.0.0.234 ffffffffffff
heartbeat: 2005/03/02_14:35:45 info: Running /etc/init.d/squid  start
heartbeat: 2005/03/02_14:36:39 WARN: node 10.0.0.1: is dead
heartbeat: 2005/03/02_14:36:39 info: Link srv2:eth0 dead.
heartbeat: 2005/03/02_14:36:39 info: Link 10.0.0.1:10.0.0.1 dead.
heartbeat: 2005/03/02_14:36:39 info: Running /etc/ha.d/rc.d/status status
-------------- next part --------------
heartbeat: 2005/03/02_14:34:52 info: Neither logfile nor logfacility found.
heartbeat: 2005/03/02_14:34:52 info: Logging defaulting to /var/log/ha-log
heartbeat: 2005/03/02_14:34:52 info: **************************
heartbeat: 2005/03/02_14:34:52 info: Configuration validated. Starting heartbeat 1.2.3
heartbeat: 2005/03/02_14:34:52 info: heartbeat: version 1.2.3
heartbeat: 2005/03/02_14:34:52 info: Heartbeat generation: 86
heartbeat: 2005/03/02_14:34:52 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2005/03/02_14:34:52 info: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat: 2005/03/02_14:34:52 info: ping heartbeat started.
heartbeat: 2005/03/02_14:34:53 info: pid 30241 locked in memory.
heartbeat: 2005/03/02_14:34:53 info: Local status now set to: 'up'
heartbeat: 2005/03/02_14:34:53 info: pid 16807 locked in memory.
heartbeat: 2005/03/02_14:34:53 info: pid 19358 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: pid 20813 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: Link srv1:eth1 up.
heartbeat: 2005/03/02_14:34:54 info: Status update for node srv1: status up
heartbeat: 2005/03/02_14:34:54 info: Link srv2:eth1 up.
heartbeat: 2005/03/02_14:34:54 info: Status update for node srv1: status active
heartbeat: 2005/03/02_14:34:54 info: pid 7051 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: pid 14717 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: pid 10750 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: Link srv1:eth0 up.
heartbeat: 2005/03/02_14:34:54 info: Link srv2:eth0 up.
heartbeat: 2005/03/02_14:34:54 info: pid 20508 locked in memory.
heartbeat: 2005/03/02_14:34:54 info: Link 10.0.0.1:10.0.0.1 up.
heartbeat: 2005/03/02_14:34:54 info: Status update for node 10.0.0.1: status ping
heartbeat: 2005/03/02_14:34:54 info: Local status now set to: 'active'
heartbeat: 2005/03/02_14:34:54 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:54 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 32057)
heartbeat: 2005/03/02_14:34:54 WARN: Exiting /usr/lib/heartbeat/ipfail process 32057 returned rc 126.
heartbeat: 2005/03/02_14:34:54 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:54 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:54 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/03/02_14:34:54 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/03/02_14:34:55 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 5720)
heartbeat: 2005/03/02_14:34:55 WARN: Exiting /usr/lib/heartbeat/ipfail process 5720 returned rc 126.
heartbeat: 2005/03/02_14:34:55 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:55 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:56 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 8155)
heartbeat: 2005/03/02_14:34:56 WARN: Exiting /usr/lib/heartbeat/ipfail process 8155 returned rc 126.
heartbeat: 2005/03/02_14:34:56 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:56 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:57 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 21595)
heartbeat: 2005/03/02_14:34:57 WARN: Exiting /usr/lib/heartbeat/ipfail process 21595 returned rc 126.
heartbeat: 2005/03/02_14:34:57 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:57 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:58 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 14068)
heartbeat: 2005/03/02_14:34:58 WARN: Exiting /usr/lib/heartbeat/ipfail process 14068 returned rc 126.
heartbeat: 2005/03/02_14:34:58 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:58 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:34:59 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 25181)
heartbeat: 2005/03/02_14:34:59 WARN: Exiting /usr/lib/heartbeat/ipfail process 25181 returned rc 126.
heartbeat: 2005/03/02_14:34:59 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:34:59 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:00 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 18470)
heartbeat: 2005/03/02_14:35:00 WARN: Exiting /usr/lib/heartbeat/ipfail process 18470 returned rc 126.
heartbeat: 2005/03/02_14:35:00 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:00 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:01 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 3849)
heartbeat: 2005/03/02_14:35:01 WARN: Exiting /usr/lib/heartbeat/ipfail process 3849 returned rc 126.
heartbeat: 2005/03/02_14:35:01 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:01 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:02 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 19911)
heartbeat: 2005/03/02_14:35:02 WARN: Exiting /usr/lib/heartbeat/ipfail process 19911 returned rc 126.
heartbeat: 2005/03/02_14:35:02 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:02 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:03 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 32388)
heartbeat: 2005/03/02_14:35:03 WARN: Exiting /usr/lib/heartbeat/ipfail process 32388 returned rc 126.
heartbeat: 2005/03/02_14:35:03 info: Respawning client "/usr/lib/heartbeat/ipfail":
heartbeat: 2005/03/02_14:35:03 info: Starting child client "/usr/lib/heartbeat/ipfail" (1002,104)
heartbeat: 2005/03/02_14:35:04 info: Starting "/usr/lib/heartbeat/ipfail" as uid 1002  gid 104 (pid 16275)
heartbeat: 2005/03/02_14:35:04 WARN: Exiting /usr/lib/heartbeat/ipfail process 16275 returned rc 126.
heartbeat: 2005/03/02_14:35:04 ERROR: Client /usr/lib/heartbeat/ipfail "respawning too fast"
heartbeat: 2005/03/02_14:35:04 info: local resource transition completed.
heartbeat: 2005/03/02_14:35:04 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat: 2005/03/02_14:35:04 info: remote resource transition completed.
heartbeat: 2005/03/02_14:35:06 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys srv2] to acquire.
heartbeat: 2005/03/02_14:36:08 info: Link srv1:eth0 dead.
heartbeat: 2005/03/02_14:38:27 info: Link srv1:eth0 up.
-------------- next part --------------
srv1 10.0.0.234 squid

