[Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu Jan 3 06:07:27 MST 2008
Hi,
On Wed, Jan 02, 2008 at 10:57:26PM +0100, Thomas Glanzmann wrote:
> Hello Dejan,
>
> > http://developerbugs.linux-foundation.org/show_bug.cgi?id=1732
>
> Alan Robertson wrote the following:
>
> > I think this is now fixed.
>
> > If we have repeated EBADF or ENODEV errors, then the write process will exit.
>
> > If a write or read process exits, the device is reopened and the processes
> > are respawned
>
> > If the device won't open, then the reopen/respawn process is tried again
> > in 5 minutes.
>
> > If a write operation takes more than 'keepalive' ms, then it is assumed that
> > the device is hung, and the input queue to the write process is cleared out and
> > it tries again, with a bounded amount of whining. This ensures that when the
> > device eventually comes back, we don't send out packets that are days old.
>
> > If a write operation to a child write process fails, the read/write pair is
> > killed and respawned. Rules about failures are as described above...
>
> > For good measure, I made it so you can kill the fifo process as well...
>
> But for me when I type
>
> ifdown eth1; sleep 10; ifup eth1
>
> it doesn't come back after 5 minutes. I just double-checked.
Then please reopen the Bugzilla entry. Use hb_report to collect the
logs and configuration.
Thanks,
Dejan
>
> Thomas
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
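The recovery policy Alan describes in the quoted bug comment can be sketched in C, the language Heartbeat itself is written in. This is a minimal, hypothetical model, not Heartbeat's actual code: the names `hb_media`, `out_queue`, `note_write_result`, `record_reopen` and `flush_if_hung` are invented for illustration, and `MAX_FATAL_ERRORS` is an assumed value because the comment only says "repeated" errors. It covers three of the rules: exit the write child after repeated EBADF/ENODEV errors, retry a failed reopen after 5 minutes, and drop the queued packets when a write takes longer than `keepalive` ms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values: the bug comment says only "repeated" errors
 * and a 5-minute reopen retry. */
#define MAX_FATAL_ERRORS 3
#define REOPEN_RETRY_SECS (5 * 60)

/* Hypothetical per-interface state, not Heartbeat's real struct. */
typedef struct {
    int consecutive_fatal; /* current run of EBADF/ENODEV write errors */
    long next_reopen_at;   /* when to retry a failed open (seconds), 0 if none */
    bool open;             /* is the device currently open? */
} hb_media;

/* Hypothetical outbound packet queue feeding the write child. */
typedef struct {
    int queued; /* number of packets waiting to be written */
} out_queue;

/* Record the errno of one write (0 on success). Returns true when the
 * write child should exit so the parent reopens the device and
 * respawns the read/write pair. */
static bool note_write_result(hb_media *m, int err)
{
    assert(m != NULL);
    if (err == EBADF || err == ENODEV)
        m->consecutive_fatal++;
    else
        m->consecutive_fatal = 0; /* success or another error ends the run */
    return m->consecutive_fatal >= MAX_FATAL_ERRORS;
}

/* Parent side: apply the result of a reopen attempt made at time now.
 * On failure, schedule the next attempt REOPEN_RETRY_SECS later. */
static void record_reopen(hb_media *m, long now, bool opened)
{
    assert(m != NULL);
    m->open = opened;
    m->consecutive_fatal = 0;
    m->next_reopen_at = opened ? 0 : now + REOPEN_RETRY_SECS;
}

/* If the last write took longer than keepalive_ms, assume the device
 * is hung and drop everything queued so stale heartbeats are never
 * sent later. Returns the number of packets dropped. */
static int flush_if_hung(out_queue *q, long write_ms, long keepalive_ms)
{
    assert(q != NULL && keepalive_ms > 0);
    if (write_ms <= keepalive_ms)
        return 0;
    int dropped = q->queued;
    q->queued = 0;
    return dropped;
}
```

In this sketch a successful write or any other errno resets the counter, so only an unbroken run of EBADF/ENODEV results makes the write child exit; the real code may count failures differently.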