[Linux-HA] RE: 99% CPU heartbeat & rexmit (seqno too low)

Oren Nechushtan oren at forescout.com
Wed Sep 6 06:49:40 MDT 2006


Here is a potential patch by a colleague (Oded Comay) for the related rexmit issue below.
Your blessing would be appreciated :)

>See the patch below, along with a sample log. The fix drops retransmit requests for sequence numbers that are no longer required. I still don't have a good
>explanation for why this would create a high CPU load (except maybe a bug in the glib hash handling code, which I didn't look into). 

The patch:


--- heartbeat-2.0.7.orig/heartbeat/hb_rexmit.c  Thu Feb  9 21:04:04 2006
+++ heartbeat-2.0.7/heartbeat/hb_rexmit.c       Wed Sep  6 15:10:18 2006
@@ -169,6 +169,17 @@
        seqno_t seq = (seqno_t) ri->seq;
        struct node_info* node = ri->node;
        struct ha_msg*  hmsg;
+       struct seqtrack* t = &node->track;
+       int j = 0;
+
+       while (j < t->nmissing && seq != t->seqmissing[j]) {
+               ++j;
+       }
+       if (j == t->nmissing) {
+               cl_log(LOG_INFO, "%s: seq %lu from %s is no longer required",
+                       __FUNCTION__, seq, node->nodename);
+               return FALSE;
+       }

        if ((hmsg = ha_msg_new(6)) == NULL) {
                cl_log(LOG_ERR, "%s: no memory for " T_REXMIT,
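
For readers who want to try the check outside the heartbeat tree, here is a minimal standalone sketch of the same idea: before sending a retransmit request, scan the list of still-missing sequence numbers and skip the request if the number is no longer there. The field names nmissing and seqmissing come from the patch; struct seqtrack_sketch, MAX_MISSING and still_missing() are hypothetical stand-ins, not heartbeat's real definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for heartbeat's sequence number type. */
typedef unsigned long seqno_t;

#define MAX_MISSING 16  /* assumed capacity, for illustration only */

/* Hypothetical, reduced version of the per-node tracking structure. */
struct seqtrack_sketch {
    int     nmissing;                 /* number of valid entries below */
    seqno_t seqmissing[MAX_MISSING];  /* sequence numbers still outstanding */
};

/* Return true if seq is still listed as missing, i.e. a retransmit
 * request for it is still worth sending. */
bool still_missing(const struct seqtrack_sketch *t, seqno_t seq)
{
    for (int j = 0; j < t->nmissing; ++j) {
        if (t->seqmissing[j] == seq) {
            return true;
        }
    }
    return false;
}
```

A timer callback would call still_missing() first and return without sending (and without rescheduling itself) when it yields false, which is what the early "return FALSE" in the patch does.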

Sample log: 
heartbeat[1261]: 2006/09/06_15:00:02 WARN: 13 lost packet(s) for [haha-2] [782:796]
heartbeat[1261]: 2006/09/06_15:01:07 WARN: node haha-2: is dead
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 783 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 784 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 785 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 786 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 787 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 788 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 789 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 790 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 791 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 792 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 793 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 794 from haha-2 is no longer required
heartbeat[1261]: 2006/09/06_15:01:12 info: send_rexmit_request: seq 795 from haha-2 is no longer required
crmd[1300]: 2006/09/06_15:01:12 WARN: check_dead_member:ccm.c Our DC node (haha-2) left the cluster


-----Original Message-----
From: Oren Nechushtan 
Sent: Monday, September 04, 2006 7:36 PM
To: 'General Linux-HA mailing list'
Subject: RE: 99% CPU heartbeat & rexmit (seqno too low)


Hi,
It appears that the CPU usage increases slowly:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1512 root      19   0  8780 8780  1964 S    16.1  0.8   0:38 heartbeat
..
 1512 root      16   0  8780 8780  1964 S    18.4  0.8   0:48 heartbeat
..
 1512 root      18   0  8780 8780  1964 S    21.1  0.8   1:19 heartbeat
..

Here are two stack traces taken when the CPU usage was already high.

(gdb) where
#0  0x400896e0 in g_main_context_find_source_by_id () at eval.c:41
#1  0x4003bbca in ?? () at eval.c:41 from /usr/lib/libplumb.so.1
#2  0x4003bc0f in ?? () at eval.c:41 from /usr/lib/libplumb.so.1
#3  0x0806a51d in schedule_rexmit_request () at eval.c:41
#4  0x0806a411 in send_rexmit_request () at eval.c:41
#5  0x4003b70c in ?? () at eval.c:41 from /usr/lib/libplumb.so.1
#6  0x40089dce in g_main_dispatch () at eval.c:41
#7  0x4008aca7 in g_main_context_dispatch () at eval.c:41
#8  0x4008b091 in g_main_context_iterate () at eval.c:41
#9  0x4008b670 in g_main_loop_run () at eval.c:41
#10 0x0805183e in master_control_process () at eval.c:41
#11 0x08050861 in initialize_heartbeat () at eval.c:41
#12 0x080577da in main () at eval.c:41
#13 0x40120657 in __libc_start_main (main=0x8056d83 <main>, argc=1, ubp_av=0xbffffd34, init=0x804e43c <_init>, fini=0x806efc0 <_fini>,
    rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffffd2c) at ../sysdeps/generic/libc-start.c:129
(gdb)


(gdb) where
#0  0x401ea427 in __poll (fds=0x8133120, nfds=20, timeout=200) at ../sysdeps/unix/sysv/linux/poll.c:63
#1  0x4008b998 in g_main_context_poll () at eval.c:41
#2  0x4008b064 in g_main_context_iterate () at eval.c:41
#3  0x4008b670 in g_main_loop_run () at eval.c:41
#4  0x0805183e in master_control_process () at eval.c:41
#5  0x08050861 in initialize_heartbeat () at eval.c:41
#6  0x080577da in main () at eval.c:41
#7  0x40120657 in __libc_start_main (main=0x8056d83 <main>, argc=1, ubp_av=0xbffffc44, init=0x804e43c <_init>, fini=0x806efc0 <_fini>,
    rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffffc3c) at ../sysdeps/generic/libc-start.c:129


Is it possible that the code in schedule_rexmit_request is too permissive about pushing timed events into a queue?
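
To make the suspicion concrete, here is a rough model of why an ever-growing set of pending timed events could show up as CPU time in a find-by-id call. It assumes that lookup by id walks the pending sources one by one; I have not checked that against the glib source, so treat it as an assumption. All names here (timer_src, timer_list, timer_add, timer_find, timer_free_all) are made up for the illustration and are not glib or heartbeat APIs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a pending-timer list; not glib's real structure. */
struct timer_src {
    unsigned          id;
    struct timer_src *next;
};

struct timer_list {
    struct timer_src *head;
    unsigned          next_id;
};

/* Queue one more timed event; return its id, or 0 on allocation failure. */
unsigned timer_add(struct timer_list *l)
{
    struct timer_src *s = malloc(sizeof *s);
    if (s == NULL) {
        return 0;
    }
    s->id = ++l->next_id;
    s->next = l->head;
    l->head = s;
    return s->id;
}

/* Look a timer up by id, storing the number of visited nodes in *steps. */
struct timer_src *timer_find(const struct timer_list *l, unsigned id,
                             size_t *steps)
{
    *steps = 0;
    for (struct timer_src *s = l->head; s != NULL; s = s->next) {
        ++*steps;
        if (s->id == id) {
            return s;
        }
    }
    return NULL;
}

/* Release every queued timer. */
void timer_free_all(struct timer_list *l)
{
    while (l->head != NULL) {
        struct timer_src *s = l->head;
        l->head = s->next;
        free(s);
    }
}
```

With 1000 timers queued, finding the oldest one visits all 1000 nodes. If timers are added faster than they are removed, every lookup gets slower, which would match a CPU load that creeps up over time.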

Thanks,
Oren.

-----Original Message-----
From: Oren Nechushtan 
Sent: Monday, September 04, 2006 3:06 PM
To: 'General Linux-HA mailing list'
Subject: 99% CPU heartbeat & rexmit (seqno too low)


Hi,
Sometimes after a split brain the main heartbeat process starts consuming 99+% CPU.
The configuration is heartbeat 2.0.7 with a two-node cluster and crm enabled.
(I believe this can be reproduced with different Rexmit messages. Let me know if you need more details.)
Here are the logs (and strace for the main heartbeat process) below.
The ARP table on the problematic node shows that the IP address of the other node was not resolved.

Thanks,
Oren.

P.S.
Looking through the code, what prevents the process_rexmit() call from retransmitting the same information (in bursts) indefinitely?

========================= ha-log ==============

heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32690, lowseq=32490,ackseq=657,lastmsg=89
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:01 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:01 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:01 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:02 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:02 info: hiseq =32691, lowseq=32491,ackseq=657,lastmsg=90
heartbeat[1516]: 2006/09/04_04:02:02 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32692, lowseq=32492,ackseq=657,lastmsg=91
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:03 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:03 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:03 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32693, lowseq=32493,ackseq=657,lastmsg=92
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:04 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:04 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:04 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:05 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:05 info: hiseq =32694, lowseq=32494,ackseq=657,lastmsg=93
heartbeat[1516]: 2006/09/04_04:02:05 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5747]: 2006/09/04_04:02:06 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5747]: 2006/09/04_04:02:06 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5750]: 2006/09/04_04:02:06 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_04:02:06 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[5750]: 2006/09/04_04:02:06 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

heartbeat[1516]: 2006/09/04_04:02:06 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:06 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:06 info: hiseq =32695, lowseq=32495,ackseq=657,lastmsg=94
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:07 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:07 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:07 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:08 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:08 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:08 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95
heartbeat[1516]: 2006/09/04_04:02:08 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_04:02:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_04:02:08 info: hist information:
heartbeat[1516]: 2006/09/04_04:02:08 info: hiseq =32696, lowseq=32496,ackseq=657,lastmsg=95

...


heartbeat[1516]: 2006/09/04_12:30:00 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:00 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:00 info: hiseq =58969, lowseq=58769,ackseq=657,lastmsg=168
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:01 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:01 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:01 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:01 info: hiseq =58970, lowseq=58770,ackseq=657,lastmsg=169
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:02 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:02 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:02 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:02 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58971, lowseq=58771,ackseq=657,lastmsg=170
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:03 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:03 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:03 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:03 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25887]: 2006/09/04_12:30:04 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25887]: 2006/09/04_12:30:04 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25890]: 2006/09/04_12:30:04 ERROR: Cannot signon with heartbeat

lrmd[1929]: 2006/09/04_12:30:04 info: RA output: (IPaddr_private_shared1:monitor:stderr) cl_status[25890]: 2006/09/04_12:30:04 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

heartbeat[1516]: 2006/09/04_12:30:04 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:04 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:04 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:04 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:04 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:04 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:04 info: hiseq =58972, lowseq=58772,ackseq=657,lastmsg=171
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:05 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:05 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:05 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:05 info: hiseq =58973, lowseq=58773,ackseq=657,lastmsg=172
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:06 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:06 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:06 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:06 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58974, lowseq=58774,ackseq=657,lastmsg=173
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:07 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:07 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:07 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:07 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:08 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:08 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:08 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:08 ERROR: Cannot rexmit pkt 2189 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:08 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:08 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:08 info: hiseq =58975, lowseq=58775,ackseq=657,lastmsg=174
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2190 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2191 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2192 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2193 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:09 ERROR: Cannot rexmit pkt 2194 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:09 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:09 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:09 info: hiseq =58976, lowseq=58776,ackseq=657,lastmsg=175
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2195 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2196 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:10 ERROR: Cannot rexmit pkt 2197 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:10 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:10 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:10 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2184 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58977, lowseq=58777,ackseq=657,lastmsg=176
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2185 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:11 ERROR: Cannot rexmit pkt 2186 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:11 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:11 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:11 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:12 ERROR: Cannot rexmit pkt 2187 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:12 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:12 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:12 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:12 ERROR: Cannot rexmit pkt 2188 for ha-2: seqno too low
heartbeat[1516]: 2006/09/04_12:30:12 info: fromnode =ha-2, fromnode's ackseq = 657
heartbeat[1516]: 2006/09/04_12:30:12 info: hist information:
heartbeat[1516]: 2006/09/04_12:30:12 info: hiseq =58978, lowseq=58778,ackseq=657,lastmsg=177
heartbeat[1516]: 2006/09/04_12:30:13 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:14 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:15 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:17 ERROR: Message hist queue is filling up (200 messages in queue)
heartbeat[1516]: 2006/09/04_12:30:18 ERROR: Message hist queue is filling up (200 messages in queue)
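
For readers following the log above: the requested packets (2184 to 2197) are far below the sender's history window (lowseq=58776 to hiseq=58976, a span of 200 that matches the "200 messages in queue" warning), so none of them can be served. Below is a minimal sketch of that kind of window check. The struct and function names are hypothetical and this is not the actual heartbeat code; it only assumes that "seqno too low" means the request falls below lowseq.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long seqno_t;

/* Hypothetical stand-in for the sender's message history bounds
 * (the lowseq/hiseq values printed in the log). */
struct hist_window {
	seqno_t lowseq;	/* oldest sequence number still kept */
	seqno_t hiseq;	/* newest sequence number sent */
};

/* A retransmit request can only be served while the packet is still
 * inside the history window; anything below lowseq is already gone. */
static bool can_rexmit(const struct hist_window *h, seqno_t seq)
{
	assert(h->lowseq <= h->hiseq);
	return seq >= h->lowseq && seq <= h->hiseq;
}
```

With the values from the log, can_rexmit() is false for pkt 2192 against the window 58776..58976, so repeating the request cannot succeed however often it is sent.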

============== strace heartbeat ==========

1516  12:38:25.923404 send(6, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.923610 alarm(0)          = 0
1516  12:38:25.923702 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.923764 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:25.923832 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.923888 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:25.923954 send(10, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.924121 alarm(0)          = 0
1516  12:38:25.924185 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924250 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:25.924320 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924384 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:25.924449 send(14, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.924622 alarm(0)          = 0
1516  12:38:25.924693 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924754 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:25.924821 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.924877 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:25.924942 send(18, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.925104 alarm(0)          = 0
1516  12:38:25.925160 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.925226 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:25.925293 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:25.925356 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:25.925439 send(22, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2605,15:(0)lastseq=2605,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf411,8:(0)ttl=4,%%%\n42:1 9cad4f5da29af281be0ba8798f3fa90adcda87af,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:25.925602 alarm(0)          = 0
1516  12:38:25.925654 times({tms_utime=6067893, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540287
1516  12:38:25.925707 times({tms_utime=6067893, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540287
1516  12:38:26.105640 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105735 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105795 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105870 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.105955 getpid()          = 1516
1516  12:38:26.106053 time(NULL)        = 1157362706
1516  12:38:26.106203 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540305
1516  12:38:26.106321 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106379 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.106454 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106509 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.106574 send(6, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.106747 alarm(0)          = 0
1516  12:38:26.106827 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.106890 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.106960 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107023 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.107087 send(10, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.107261 alarm(0)          = 0
1516  12:38:26.107316 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107382 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.107447 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107503 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.107567 send(14, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.107730 alarm(0)          = 0
1516  12:38:26.107775 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107849 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.107915 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.107972 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.108036 send(18, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.108222 alarm(0)          = 0
1516  12:38:26.108279 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.108353 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.108418 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.108489 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.108555 send(22, "\374\0\0\0\315\253\0\0###\n14:(0)t=NS_rexmit,12:(0)dest=ha-2,16:(0)firstseq=2603,15:(0)lastseq=2603,28:(1)destuuid=\247\364\274]\30\30AX\255\336\23_\265\316\374\254,11:(0)src=ha-1,27:(1)srcuuid=\351\342\25\342D\244A6\261\f\377\313(\215Sp,14:(0)hg=44fab878,14:(0)ts=44fbf412,8:(0)ttl=4,%%%\n42:1 2f4446f0155bc1fa66992c7cc827b9b39d5ef516,", 260, MSG_DONTWAIT|0x4000) = 260
1516  12:38:26.108719 alarm(0)          = 0
1516  12:38:26.108773 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540306
1516  12:38:26.108827 times({tms_utime=6067911, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540306
1516  12:38:26.288359 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540323
1516  12:38:26.288457 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540323
1516  12:38:26.288518 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.288587 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288655 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288725 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288781 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288845 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.288901 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.288959 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289028 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289094 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289158 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289223 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289278 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289343 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289399 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.289457 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289524 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.289590 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289646 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289711 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289766 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289830 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.289894 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.289960 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290019 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290085 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290142 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290207 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290263 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290335 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290417 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.290476 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290536 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.290601 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290656 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290722 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290786 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290859 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.290914 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.290971 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291031 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291091 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291157 recv(8, 0x810db00, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291221 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291288 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.291346 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291405 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291465 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291529 recv(12, 0x814a816, 1194, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291584 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291652 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.291710 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291778 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.291837 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.291902 recv(16, 0x8111b30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.291958 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.292016 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.292074 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292133 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292201 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292266 recv(20, 0x8138e28, 1984, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.292323 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292380 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.292437 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292505 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292572 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292638 recv(24, 0x8136543, 621, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.292693 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292752 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.292811 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292870 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.292939 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293004 recv(25, 0x8117b78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.293059 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293118 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.293177 times({tms_utime=6067928, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540324
1516  12:38:26.338534 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338618 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338699 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338770 poll([{fd=6, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=22, events=POLLIN|POLLPRI}, {fd=8, events=0}, {fd=16, events=0}, {fd=20, events=0}, {fd=12, events=0}, {fd=24, events=0}, {fd=25, events=0}], 11, 0) = 0
1516  12:38:26.338869 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.338929 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339000 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339057 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.339122 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339178 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.339236 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339294 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339360 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339415 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.339479 recv(10, 0x8103e30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339534 poll([{fd=10, events=0}], 1, 0) = 0
1516  12:38:26.339592 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339660 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.339725 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339781 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.339845 recv(14, 0x8105e48, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.339901 poll([{fd=14, events=0}], 1, 0) = 0
1516  12:38:26.339959 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340027 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340101 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340156 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.340220 recv(18, 0x8107e60, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340276 poll([{fd=18, events=0}], 1, 0) = 0
1516  12:38:26.340335 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340395 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340461 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340516 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.340580 recv(22, 0x8109e78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340636 poll([{fd=22, events=0}], 1, 0) = 0
1516  12:38:26.340702 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340761 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.340835 recv(8, 0x810db00, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.340890 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.340950 poll([{fd=8, events=0}], 1, 0) = 0
1516  12:38:26.341008 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341068 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341133 recv(12, 0x814a816, 1194, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341189 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.341248 poll([{fd=12, events=0}], 1, 0) = 0
1516  12:38:26.341314 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341396 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341463 recv(16, 0x8111b30, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341518 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.341579 poll([{fd=16, events=0}], 1, 0) = 0
1516  12:38:26.341646 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341706 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.341772 recv(20, 0x8138e28, 1984, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.341827 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.341895 poll([{fd=20, events=0}], 1, 0) = 0
1516  12:38:26.341954 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342014 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342079 recv(24, 0x8136543, 621, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.342135 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.342194 poll([{fd=24, events=0}], 1, 0) = 0
1516  12:38:26.342253 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342312 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.342378 recv(25, 0x8117b78, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.342441 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.342500 poll([{fd=25, events=0}], 1, 0) = 0
1516  12:38:26.342559 times({tms_utime=6067933, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540329
1516  12:38:26.387898 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.387983 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388042 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388107 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388186 getpid()          = 1516
1516  12:38:26.388280 time(NULL)        = 1157362706
1516  12:38:26.388434 times({tms_utime=6067937, tms_stime=33483, tms_cutime=0, tms_cstime=0}) = 6540333
1516  12:38:26.388571 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.388633 poll([{fd=6, events=0}], 1, 0) = 0
1516  12:38:26.388705 recv(6, 0x8101e18, 4072, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
1516  12:38:26.388760 poll([{fd=6, events=0}], 1, 0) = 0

============= ha.cf ===========

use_logd yes
node ha-1
node ha-2
udpport 694
ucast eth7 172.17.2.171 #e.g. real eth7 address on host 1
ucast eth7 172.17.2.172 #e.g. real eth7 address on host 2
ucast eth6 172.17.2.71 #e.g. real eth3 address on host 1
ucast eth6 172.17.2.72 #e.g. real eth3 address on host 2
auto_failback off
autojoin none
keepalive 1
deadtime 60
ping_group routers 10.0.4.253
deadping 60
warntime 30
compression    bz2
compression_threshold 2
traditional_compression false
coredumps true
initdead 60
msgfmt netstring
watchdog /dev/watchdog
crm yes
respawn hacluster       /usr/lib/heartbeat/cibmon -d
respawn root            /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_set

