[Linux-HA] ipfail takes much CPU, maybe endless loop

Pavol Gono pavol_gono at yahoo.com
Wed Jul 27 03:25:18 MDT 2005


Hi

A few days ago we moved from heartbeat 1.2.0 to 1.2.3. We ran
some tests, and overnight we let heartbeat run under normal
conditions.
We use 2 machines; the most CPU-consuming services are the
poptop server and our custom application.

Heartbeat itself and the failovers worked well, but last night
we noticed that one ipfail process was consuming too much CPU;
after we shut down the other services it reached 100%.
Has anyone ever experienced something like this?


RX300S2 machine (master during the night):
Suse 9.3
kernel 2.6.11.4-21.7-smp
heartbeat 1.2.3

ha.cf:
logfacility     local0
keepalive 200ms
deadtime 3
warntime 1000ms
initdead 10
udpport 694
ucast eth3 172.16.0.1
ucast eth2 10.55.0.1
ucast eth4 10.54.0.1
auto_failback off
node    RX300S2
node    RX300
ping 10.55.0.4 10.54.0.8
respawn hacluster /usr/lib/heartbeat/ipfail


RX300 machine (slave during the night):
Suse 9.1
kernel 2.6.5-7.111-smp
heartbeat 1.2.3

ha.cf:
logfacility     local0
keepalive 200ms
deadtime 3
warntime 1000ms
initdead 10
udpport 694
ucast eth1 172.16.0.2
ucast eth2 10.54.0.2
ucast eth0 10.55.0.2
auto_failback off
node    RX300S2
node    RX300
ping 10.55.0.4 10.54.0.8
respawn hacluster /usr/lib/heartbeat/ipfail


heartbeat processes on RX300S2 (master) after the night, from
top, sorted by cumulative CPU time:
  PID PR S %CPU    TIME+  COMMAND
25202 25 R 96.4 798:26.44 /usr/lib/heartbeat/ipfail
28804 -2 S  1.9  19:24.83 heartbeat: heartbeat: master control process
28813 -2 S  0.0   1:45.63 heartbeat: heartbeat: write: ping 10.55.0.4
28815 -2 S  0.0   1:08.67 heartbeat: heartbeat: write: ping 10.54.0.8
28816 -2 S  0.0   0:41.15 heartbeat: heartbeat: read: ping 10.54.0.8
28814 -2 S  0.0   0:40.16 heartbeat: heartbeat: read: ping 10.55.0.4
28811 -2 S  0.0   0:30.67 heartbeat: heartbeat: write: ucast eth4
28809 -2 S  0.0   0:27.94 heartbeat: heartbeat: write: ucast eth2
28807 -2 S  0.0   0:24.66 heartbeat: heartbeat: write: ucast eth3
28808 -2 S  0.0   0:20.28 heartbeat: heartbeat: read: ucast eth3
28810 -2 S  0.0   0:16.42 heartbeat: heartbeat: read: ucast eth2
28812 -2 S  0.0   0:15.99 heartbeat: heartbeat: read: ucast eth4
28821 16 S  0.0   0:03.70 /usr/lib/heartbeat/ipfail
28806 -2 S  0.0   0:00.00 heartbeat: heartbeat: FIFO reader
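
To put the TIME+ column in perspective, the values can be
converted to seconds and hours. This is only a rough sketch: it
assumes top's usual minutes:seconds.hundredths layout for TIME+,
and the helper name time_plus_to_seconds is made up for
illustration.

```python
def time_plus_to_seconds(value: str) -> float:
    """Convert a top TIME+ field (assumed MM:SS.hh) to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


# Values copied from the top listing above.
samples = {
    "ipfail (PID 25202)": "798:26.44",
    "master control process (PID 28804)": "19:24.83",
    "ipfail (PID 28821)": "0:03.70",
}

for name, value in samples.items():
    secs = time_plus_to_seconds(value)
    print(f"{name}: {secs:.2f} s = {secs / 3600:.2f} h")
```

By this conversion the busy ipfail process (PID 25202) has
accumulated roughly 13.3 hours of CPU time, while the second
ipfail process (PID 28821) has used under 4 seconds.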

On the slave, none of the processes used much CPU.


This is the setup in which the problem did not occur:
Suse 9.1
kernel 2.6.5-7.111-smp
heartbeat 1.2.0
almost the same ha.cf (other IPs, other timers)


If you want more detailed logs, top outputs, or configurations,
I can provide them.

Pavol


	
		

