[Fwd: [Linux-HA] active heartbeat dying < 8 minutes after standby
alanr at unix.sh
Fri Sep 9 11:41:00 MDT 2005
Guochun Shi wrote:
> At 07:03 PM 9/8/2005 -0600, you wrote:
>>Guochun Shi wrote:
>>>This one looks like the same bug as bug 699 (http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=699)
>>>The following is the patch that I believe will make the problem go away.
>>>Of course, it hides the write on the send side; however, the receiving side will report this medium as dead, so the write failure will still
>>This leaves some really stale data in the write pipe.
>>When the network is restored, this will come out and it will generate irretrievably lost packet messages.
> why will it generate irretrievably lost packet messages?
> If there are other media and those messages have already been transmitted, then the stale messages will be ignored after the network is restored.
> If there is no other medium, then the other nodes will be declared dead.
> In either case, it will not generate irretrievably lost packet messages.
In the second case, we will be declared dead.
When someone fixes the link, then we will send a few packets which
haven't been seen yet. Or maybe the other machine will reboot. Either
And, we have hundreds (or thousands) of packets we've been sending but
no one has been hearing.
So, the recv queue in the child may have packets 100, 101, 102, and the
current sequence number is 2200, 2201, 2202, etc. So, what will be sent
to the other nodes is the following packets:
100, 101, 102, 2200, 2201, 2202, ...
And the other sides will ask for packets 103-2199. And we won't have
them (or at least some of them) to retransmit. And, even if we had
them, they're way too old to be acting on.
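To make the arithmetic concrete, here is a rough Python sketch of what the receiver sees. It is not heartbeat code: the function name and the history_size of 200 packets kept for retransmission are assumptions made purely for illustration.

```python
def missing_after_stale_burst(received, history_size):
    """Return (gap, unrecoverable) for a receiver seeing `received` in order.

    received:     sequence numbers in arrival order
    history_size: ASSUMED number of most recent packets the sender
                  still holds for retransmission (hypothetical value)
    """
    expected = received[0]
    gap = []
    for seq in received:
        if seq > expected:
            # Everything between what we expected and what arrived is missing.
            gap.extend(range(expected, seq))
        expected = max(expected, seq + 1)

    # The sender can only retransmit what is still in its history window.
    newest = max(received)
    oldest_kept = newest - history_size + 1
    unrecoverable = [s for s in gap if s < oldest_kept]
    return gap, unrecoverable


# Stale packets 100-102 drain from the pipe, then current traffic resumes.
gap, lost = missing_after_stale_burst(
    [100, 101, 102, 2200, 2201, 2202], history_size=200)
print("requested: %d-%d (%d packets)" % (gap[0], gap[-1], len(gap)))
print("cannot be retransmitted: %d packets" % len(lost))
```

With these assumed numbers the receiver asks for 103-2199, and most of that range is older than anything the sender still holds.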
When this happens, we need to flush out that queue - on both sides -
every few seconds at least.
This is what we do in the serial port case. I'm pretty sure it's necessary.
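A minimal sketch of that kind of flush, again in Python and again hypothetical: flush_stale, the (timestamp, seq) tuples, and the 3-second max_age are illustrative assumptions, not what the serial code actually does.

```python
from collections import deque


def flush_stale(queue, now, max_age):
    """Drop queued (timestamp, seq) entries older than max_age seconds.

    Assumes the queue is ordered oldest-first. Returns the number of
    entries dropped.
    """
    dropped = 0
    while queue and now - queue[0][0] > max_age:
        queue.popleft()
        dropped += 1
    return dropped


# Packets 100-102 were queued long ago while the link was down;
# 2199 and 2200 were queued within the last couple of seconds.
q = deque([(0.0, 100), (1.0, 101), (2.0, 102),
           (1099.0, 2199), (1100.0, 2200)])
dropped = flush_stale(q, now=1100.5, max_age=3.0)
remaining = [seq for _, seq in q]
print(dropped, remaining)
```

Run every few seconds on both sides, something like this would keep the ancient packets from ever reaching the wire once the link comes back.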
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William