[Linux-HA] ipfail takes much CPU, maybe endless loop
Pavol Gono
pavol_gono at yahoo.com
Wed Jul 27 03:25:18 MDT 2005
Hi
Over the last few days we moved from heartbeat 1.2.0 to 1.2.3. We
ran some tests and left heartbeat running overnight under normal
conditions.
We use 2 machines; the most CPU-intensive services are the poptop
server and our custom application.
Heartbeat functionality and failovers worked well, but last
night we noticed that one ipfail process was consuming too much CPU;
after we shut down the other services it reached 100%.
Have you ever experienced such a thing?
RX300S2 machine (master during the night):
Suse 9.3
kernel 2.6.11.4-21.7-smp
heartbeat 1.2.3
ha.cf:
logfacility local0
keepalive 200ms
deadtime 3
warntime 1000ms
initdead 10
udpport 694
ucast eth3 172.16.0.1
ucast eth2 10.55.0.1
ucast eth4 10.54.0.1
auto_failback off
node RX300S2
node RX300
ping 10.55.0.4 10.54.0.8
respawn hacluster /usr/lib/heartbeat/ipfail
RX300 machine (slave during the night):
Suse 9.1
kernel 2.6.5-7.111-smp
heartbeat 1.2.3
ha.cf:
logfacility local0
keepalive 200ms
deadtime 3
warntime 1000ms
initdead 10
udpport 694
ucast eth1 172.16.0.2
ucast eth2 10.54.0.2
ucast eth0 10.55.0.2
auto_failback off
node RX300S2
node RX300
ping 10.55.0.4 10.54.0.8
respawn hacluster /usr/lib/heartbeat/ipfail
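To make the timer settings easier to compare, here is a small sketch of the arithmetic behind them. It is only an illustration, not part of heartbeat: the helper name to_seconds is made up for this example, and it assumes the usual ha.cf convention that a bare number means seconds and an "ms" suffix means milliseconds. Both nodes use the same four values.

```python
# Illustrative only: express the ha.cf timers from both nodes in
# seconds and in multiples of the keepalive interval.
# Assumption: a bare number is seconds, an "ms" suffix is milliseconds.

def to_seconds(value):
    """Convert an ha.cf time value such as '200ms' or '3' to seconds."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    return float(value)

# Timer values copied from the ha.cf files above.
timers = {
    "keepalive": "200ms",
    "warntime": "1000ms",
    "deadtime": "3",
    "initdead": "10",
}

seconds = {name: to_seconds(raw) for name, raw in timers.items()}
keepalive = seconds["keepalive"]

# How many heartbeat intervals fit into each of the other timers.
intervals = {
    name: round(seconds[name] / keepalive)
    for name in ("warntime", "deadtime", "initdead")
}

for name, count in intervals.items():
    print(f"{name}: {seconds[name]:g} s = {count} keepalive intervals")
```

With these settings a node is declared dead after 15 missed heartbeat intervals, and a warning is logged after 5.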
heartbeat processes on RX300S2 (master) after the night, taken from top
and sorted by cumulative CPU time:
  PID  PR  S  %CPU      TIME+  COMMAND
25202  25  R  96.4  798:26.44  /usr/lib/heartbeat/ipfail
28804  -2  S   1.9   19:24.83  heartbeat: heartbeat: master control process
28813  -2  S   0.0    1:45.63  heartbeat: heartbeat: write: ping 10.55.0.4
28815  -2  S   0.0    1:08.67  heartbeat: heartbeat: write: ping 10.54.0.8
28816  -2  S   0.0    0:41.15  heartbeat: heartbeat: read: ping 10.54.0.8
28814  -2  S   0.0    0:40.16  heartbeat: heartbeat: read: ping 10.55.0.4
28811  -2  S   0.0    0:30.67  heartbeat: heartbeat: write: ucast eth4
28809  -2  S   0.0    0:27.94  heartbeat: heartbeat: write: ucast eth2
28807  -2  S   0.0    0:24.66  heartbeat: heartbeat: write: ucast eth3
28808  -2  S   0.0    0:20.28  heartbeat: heartbeat: read: ucast eth3
28810  -2  S   0.0    0:16.42  heartbeat: heartbeat: read: ucast eth2
28812  -2  S   0.0    0:15.99  heartbeat: heartbeat: read: ucast eth4
28821  16  S   0.0    0:03.70  /usr/lib/heartbeat/ipfail
28806  -2  S   0.0    0:00.00  heartbeat: heartbeat: FIFO reader
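To put the TIME+ column in perspective, the following sketch converts a few of the values above into seconds and hours. It assumes top's usual minutes:seconds.hundredths format for TIME+; the helper name cpu_time_seconds is made up for this example.

```python
# Illustrative only: convert top's TIME+ values (assumed format
# minutes:seconds.hundredths) to seconds for three of the processes above.

def cpu_time_seconds(time_plus):
    """Convert a TIME+ string such as '798:26.44' to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)

# TIME+ values copied from the top output above.
samples = {
    "ipfail (PID 25202)": "798:26.44",
    "master control process (PID 28804)": "19:24.83",
    "ipfail (PID 28821)": "0:03.70",
}

totals = {name: cpu_time_seconds(raw) for name, raw in samples.items()}

for name, secs in totals.items():
    print(f"{name}: {secs:.2f} s ({secs / 3600:.2f} h)")
```

By this reckoning the busy ipfail process has accumulated more than 13 hours of CPU time, while the second ipfail process has used under 4 seconds.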
On the slave, the heartbeat processes did not use much CPU.
This is the setup in which the problem did not occur:
Suse 9.1
kernel 2.6.5-7.111-smp
heartbeat 1.2.0
almost the same ha.cf (other IPs, other timers)
If you want more detailed logs, top output, or configurations, I
can provide them.
Pavol