[Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Jan 3 06:07:27 MST 2008


Hi,

On Wed, Jan 02, 2008 at 10:57:26PM +0100, Thomas Glanzmann wrote:
> Hello Dejan,
> 
> > http://developerbugs.linux-foundation.org/show_bug.cgi?id=1732
> 
> Alan Robertson wrote the following:
> 
> > I think this is now fixed.
> 
> > If we have repeated EBADF or ENODEV errors, then the write process will exit
> 
> > If a write or read process exits, the device is reopened and the processes
> >    are respawned
> 
> > If the device won't open, then the reopen/respawn process is tried again
> >    in 5 minutes.
> 
> > If a write operation takes more than 'keepalive' ms, then it is assumed that
> > the device is hung, and the input queue to the write process is cleared out and
> > it tries again, with a bounded amount of whining.  This ensures that when the
> > device eventually comes back, that we don't send out packets that are days old.
> 
> > If a write operation to a child write process fails, the read/write pair is
> > killed and respawned.  Rules about failures are as described above...
> 
> > For good measure, I made it so you can kill the fifo process as well...
> 
> But for me when I type
> 
>         ifdown eth1; sleep 10; ifup eth1
> 
> it doesn't come back after 5 minutes. I just double checked.

Then please reopen the bug in Bugzilla. Use hb_report to generate the
logs and configuration, and attach them to the bug.
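
For reference, here is a rough toy model of the recovery rules quoted
above, which may help when comparing the logs against the expected
behaviour. It is not the heartbeat code (which is written in C), only
a sketch in Python. The names WriteChild, record_write and
next_reopen_delay are hypothetical, and the threshold of 3 for
"repeated" errors is an assumption, since the quoted text gives no
number. The 5 minute retry and the 'keepalive' limit come from the
quoted description.

```python
import errno
from collections import deque

REOPEN_RETRY_SECONDS = 5 * 60  # "tried again in 5 minutes" (from the quoted text)
MAX_REPEATED_ERRORS = 3        # assumption: the quoted text only says "repeated"


class WriteChild:
    """Toy model of one write process for a heartbeat medium (hypothetical)."""

    def __init__(self, keepalive_ms):
        self.keepalive_ms = keepalive_ms
        self.queue = deque()    # packets waiting to be written
        self.fatal_errors = 0   # consecutive EBADF/ENODEV failures
        self.alive = True

    def enqueue(self, packet):
        self.queue.append(packet)

    def record_write(self, elapsed_ms, err=None):
        """Apply the quoted rules to the outcome of one write attempt."""
        if err in (errno.EBADF, errno.ENODEV):
            self.fatal_errors += 1
            if self.fatal_errors >= MAX_REPEATED_ERRORS:
                # The write process exits; the parent is then expected
                # to reopen the device and respawn the processes.
                self.alive = False
            return
        if err is not None:
            return  # other errors are outside the quoted rules
        self.fatal_errors = 0
        if elapsed_ms > self.keepalive_ms:
            # Device assumed hung: drop everything queued so that no
            # stale packets go out when the device comes back.
            self.queue.clear()
        elif self.queue:
            self.queue.popleft()  # the packet went out normally


def next_reopen_delay(open_succeeded):
    """Seconds to wait before the next reopen attempt (0 = respawn now)."""
    return 0 if open_succeeded else REOPEN_RETRY_SECONDS
```

Under this model, an interface that is down for 10 seconds should be
picked up again at the next reopen attempt, at most five minutes
later. If the logs show no reopen attempt at all, that is the part
worth pointing out in the bug.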

Thanks,

Dejan

> 
>         Thomas
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

