[Linux-HA] 99% CPU heartbeat & rexmit (seqno too low)

Oren Nechushtan oren at forescout.com
Mon Sep 4 06:06:27 MDT 2006


Hi,
Sometimes, after a split brain, the main heartbeat process starts consuming 99+% CPU.
The configuration is heartbeat 2.0.7 on a two-node cluster with crm enabled.
(I believe this can be reproduced with different Rexmit messages. Let me know if you need more details.)
The logs are below, followed by an strace of the main heartbeat process.
The ARP table on the problematic node shows that the IP address of the other node was not resolved.

Thanks,
Oren.

P.S.
Looking through the code, what prevents the process_rexmit() call from retransmitting the same information (in bursts) indefinitely?
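
To make the question concrete, here is a minimal sketch of the situation as I read it from the log. It is not heartbeat's actual code: can_rexmit() and serve_requests() are hypothetical names, and the only real inputs are the values taken from the ha-log (hiseq=32690, lowseq=32490, requested pkts 2184..2197). It assumes that a retransmit can only be served while the requested seqno is still inside the history window.

```python
def can_rexmit(seqno, lowseq, hiseq):
    """True only if the requested packet is still inside the history window."""
    return lowseq <= seqno <= hiseq


def serve_requests(requested, lowseq, hiseq, rounds):
    """Count failed retransmits when the peer keeps asking for the same range.

    Assumption: the window only ever slides forward, one message per round.
    """
    failures = 0
    for _ in range(rounds):
        for seqno in requested:
            if not can_rexmit(seqno, lowseq, hiseq):
                failures += 1  # corresponds to "seqno too low" in ha-log
        # New messages are sent, so the window moves further away.
        lowseq += 1
        hiseq += 1
    return failures


requested = range(2184, 2198)  # pkt 2184..2197, as requested by ha-2
failures = serve_requests(requested, lowseq=32490, hiseq=32690, rounds=3)
print(failures)  # 14 requests x 3 rounds = 42, none can ever succeed
```

Under these assumptions every round fails in full, and since lowseq only grows, repeating the request can never succeed, which is why I would expect some limit or give-up condition.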

========================= ha-log ==============

heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32690, lowseq=32490,ackseq=657,lastmsg=89
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5747]: 2006/09/04_04:02:06 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5747]: 2006/09/04_04:02:06 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5750]: 2006/09/04_04:02:06 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5750]: 2006/09/04_04:02:06 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:08 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:08 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:08 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:08 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:08 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:08 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95

...


heartbeat[1516]: 2006/09/04_12:30:00 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:00 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:00 info: hiseq =58969, lowseq=58769,ackseq=657,lastmsg=168
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25887]: 2006/09/04_12:30:04 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25887]: 2006/09/04_12:30:04 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25890]: 2006/09/04_12:30:04 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25890]: 2006/09/04_12:30:04 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

heartbeat[1516]: 2006/09/04_12:30:04 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:04 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:04 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:04 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:04 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:04 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:08 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:08 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:08 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:08 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:08 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:08 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:12 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:12 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:12 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:12 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:12 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:12 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:12 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:12 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:13 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:14 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:15 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:17 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:18 ERROR: Message hist queue is filling up (200 messages in queue)
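
The repetition in the ha-log above can be checked with a short script. This is only a sketch: the regular expression is based solely on the lines shown here, tally_rexmit() is a hypothetical helper, and the sample list is a handful of lines copied from the log rather than the full file.

```python
import re
from collections import Counter

# Matches the "Cannot rexmit" error lines as they appear in the ha-log above.
REXMIT_RE = re.compile(r"Cannot rexmit pkt (\d+) for (\S+): seqno too low")


def tally_rexmit(lines):
    """Return a Counter mapping requested seqno -> number of failed attempts."""
    counts = Counter()
    for line in lines:
        match = REXMIT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low",
    "heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657",
    "heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low",
    "heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low",
    "heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low",
]

counts = tally_rexmit(sample)
# Lowest and highest requested seqno, and how often pkt 2188 was retried.
print(min(counts), max(counts), counts[2188])
```

Run over the whole log file, the same tally should show the requests cycling through one small range (2184..2197 in the excerpt) while lowseq keeps increasing.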

============== strace heartbeat ==========

1516  12:38:25.923404 send(6, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.923610 alarm(0)          = 0
1516  12:38:25.923702 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.923764 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:25.923832 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.923888 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:25.923954 send(10, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.924121 alarm(0)          = 0
1516  12:38:25.924185 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924250 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:25.924320 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924384 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:25.924449 send(14, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.924622 alarm(0)          = 0
1516  12:38:25.924693 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924754 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:25.924821 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924877 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:25.924942 send(18, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.925104 alarm(0)          = 0
1516  12:38:25.925160 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.925226 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:25.925293 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.925356 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:25.925439 send(22, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.925602 alarm(0)          = 0
1516  12:38:25.925654 times({tms_utime=6067893, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540287
1516  12:38:25.925707 times({tms_utime=6067893, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540287
1516  12:38:26.105640 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105735 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105795 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105870 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105955 getpid()          = 1516
1516  12:38:26.106053 time(NULL)        = 1157362706
1516  12:38:26.106203 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.106321 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106379 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.106454 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106509 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.106574 send(6, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.106747 alarm(0)          = 0
1516  12:38:26.106827 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106890 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.106960 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107023 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.107087 send(10, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.107261 alarm(0)          = 0
1516  12:38:26.107316 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107382 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.107447 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107503 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.107567 send(14, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.107730 alarm(0)          = 0
1516  12:38:26.107775 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107849 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.107915 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107972 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.108036 send(18, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.108222 alarm(0)          = 0
1516  12:38:26.108279 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.108353 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.108418 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.108489 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.108555 send(22, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.108719 alarm(0)          = 0
1516  12:38:26.108773 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540306
1516  12:38:26.108827 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540306
1516  12:38:26.288359 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540323
1516  12:38:26.288457 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540323
1516  12:38:26.288518 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.288587 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288655 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288725 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288781 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288845 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288901 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288959 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289028 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289094 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289158 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289223 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289278 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289343 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289399 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289457 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289524 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289590 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289646 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289711 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289766 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289830 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289894 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289960 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290019 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290085 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290142 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290207 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290263 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290335 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290417 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290476 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290536 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290601 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290656 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290722 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290786 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290859 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290914 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290971 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291031 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291091 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291157 recv(8, 0x810db00, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291221 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291288 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291346 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291405 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291465 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291529 recv(12, 0x814a816, 1194, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291584 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291652 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291710 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291778 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291837 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.291902 recv(16, 0x8111b30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291958 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.292016 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.292074 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292133 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292201 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292266 recv(20, 0x8138e28, 1984, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.292323 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292380 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292437 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292505 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292572 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292638 recv(24, 0x8136543, 621, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.292693 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292752 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292811 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292870 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292939 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293004 recv(25, 0x8117b78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.293059 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293118 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293177 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.338534 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338618 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338699 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338770 poll([{fd=6, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=22, events=POLLIN|POLLPRI}, {fd=8, events=0}, {fd=16, events=0}, {fd=20, events=0}, {fd=12, events=0}, {fd=24, events=0}, {fd=25, events=0}], 11, 0) = 0
1516  12:38:26.338869 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338929 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339000 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339057 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.339122 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339178 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.339236 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339294 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339360 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339415 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.339479 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339534 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.339592 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339660 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339725 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339781 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.339845 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339901 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.339959 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340027 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340101 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340156 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.340220 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340276 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.340335 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340395 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340461 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340516 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.340580 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340636 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.340702 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340761 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340835 recv(8, 0x810db00, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340890 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.340950 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.341008 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341068 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341133 recv(12, 0x814a816, 1194, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341189 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.341248 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.341314 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341396 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341463 recv(16, 0x8111b30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341518 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.341579 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.341646 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341706 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341772 recv(20, 0x8138e28, 1984, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341827 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.341895 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.341954 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342014 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342079 recv(24, 0x8136543, 621, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.342135 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.342194 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.342253 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342312 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342378 recv(25, 0x8117b78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.342441 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.342500 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.342559 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.387898 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.387983 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388042 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388107 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388186 getpid()          = 1516
1516  12:38:26.388280 time(NULL)        = 1157362706
1516  12:38:26.388434 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388571 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.388633 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.388705 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.388760 poll([{fd=6, events=0}], 1, 0) = 0
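
For anyone who wants to condense a trace like the one above, here is a minimal Python sketch (not part of heartbeat) that counts how often each NS_rexmit request is written. It assumes the strace was saved to a text file with one syscall per line, and that the fields appear in the order seen above (dest, then firstseq, then lastseq). The function name count_rexmit_requests and the regular expression are only illustrative.

```python
import re
from collections import Counter

# Matches an NS_rexmit payload as strace prints it inside a send() call, e.g.
#   send(10, "...t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,..."
# Assumption: the field order is the one seen in the trace above.
REXMIT_RE = re.compile(
    r'send\((?P<fd>\d+), ".*?t=NS_rexmit,'
    r'.*?dest=(?P<dest>[^,]+),'
    r'.*?firstseq=(?P<first>\d+),'
    r'.*?lastseq=(?P<last>\d+),'
)


def count_rexmit_requests(lines):
    """Count send() calls per (dest, firstseq, lastseq) rexmit request."""
    counts = Counter()
    for line in lines:
        match = REXMIT_RE.search(line)
        if match:
            key = (match["dest"], int(match["first"]), int(match["last"]))
            counts[key] += 1
    return counts
```

Calling count_rexmit_requests(open("heartbeat.strace")) (the file name is hypothetical) returns a Counter keyed by (dest, firstseq, lastseq). A sequence number that keeps accumulating sends over a long trace would point at the same request being repeated.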

============= ha.cf ===========

use_logd yes
node ha-1
node ha-2
udpport 694
ucast eth7 172.17.2.171 #e.g. real eth7 address on host 1
ucast eth7 172.17.2.172 #e.g. real eth7 address on host 2
ucast eth6 172.17.2.71 #e.g. real eth3 address on host 1
ucast eth6 172.17.2.72 #e.g. real eth3 address on host 2
auto_failback off
autojoin none
keepalive 1
deadtime 60
ping_group routers 10.0.4.253
deadping 60
warntime 30
compression    bz2
compression_threshold 2
traditional_compression false
coredumps true
initdead 60
msgfmt netstring
watchdog /dev/watchdog
crm yes
respawn hacluster       /usr/lib/heartbeat/cibmon -d
respawn root            /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_set

