Heartbeating two boxes

alanr at bell-labs.com
Fri Oct 16 23:14:44 MDT 1998

Dave Rynne wrote:

> Only just joined this list so I don't know what status Linux HA is at...
> Anyway, I'm currently looking into what options are available for
> heartbeating Linux boxes. Setup is two completely independent boxes
> (own disks, etc..) acting as fall back for each other. They're only
> mail servers so I don't need immediate detection at this stage -
> five minutes or so would do and I could even live with one box being
> out for up to 15 minutes. Eventually this will scale up to N boxes
> so I'll need a different setup then (redundant boxes, etc..) but for
> now I just need a two box scenario.

I am in the process of getting my company's permission to release
software which will *detect* this condition reliably.  It invokes one
script when a server becomes unavailable, and another when it comes back
up.  It should be able to detect a failure within a few seconds.  It is
not yet well tested; I hope to do more testing in the next couple of
weeks, and I should get permission to release it within 4 weeks.
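
To make the detect-and-invoke idea concrete, here is a minimal Python
sketch.  It is not the unreleased software described above; the class
name PeerMonitor, the timeout value, and the on_down/on_up hooks are all
assumptions made for illustration.  The clock is injectable so the logic
can be exercised without waiting.

```python
import time
from typing import Callable


class PeerMonitor:
    """Tracks heartbeats from one peer and fires a hook on each state change."""

    def __init__(self, timeout: float,
                 on_down: Callable[[], None],
                 on_up: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout      # seconds of silence before declaring "down"
        self.on_down = on_down      # hypothetical hook: peer became unavailable
        self.on_up = on_up          # hypothetical hook: peer came back
        self.clock = clock
        self.last_seen = clock()    # assumption: peer is alive at startup
        self.alive = True

    def heartbeat(self) -> None:
        """Call whenever a heartbeat arrives from the peer."""
        self.last_seen = self.clock()
        if not self.alive:
            self.alive = True
            self.on_up()

    def poll(self) -> None:
        """Call periodically; fires on_down once after `timeout` seconds of silence."""
        if self.alive and self.clock() - self.last_seen > self.timeout:
            self.alive = False
            self.on_down()
```

In a real setup the two hooks would probably run whatever takeover and
release scripts you write (for example via subprocess.run), and
heartbeat() would be driven by packets arriving from the other box.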

> This should even be doable in shell scripts! I'm not one for re-inventing
> the wheel and was wondering if anyone's already done this?

There's been a lot of discussion on this in the last few days.  You can read
all about it at:

> Details:
>    Two Linux x86 boxes, Intel eepro cards onto switch Ethernet. LAN
> has Intel x86 boxes, a few old FreeBSD boxes (soon to be thrown out)
> and a couple of old SPARCs (again - soon to be thrown out). All boxes
> connect to 3Com switches and the whole lot is plugged into a Cisco
> 3620. So a few different issues with ARP cache timeouts, etc... here.
> Linux boxes are RH5.1 with 2.1.1XX kernels. Not too sure how much other
> info is really relevant here.
> Scenario (rough & vague):
>    Box A falls over ... time passes (how much??) ... B sees this, picks
> up A's service IP and MAC addr and all is well. Then A comes back up and
> B drops the IP & MAC addr as A picks them back up. Opposite applies if
> B falls over.
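
The takeover/give-back sequence in that scenario can be sketched as a
small piece of bookkeeping.  This is only an illustration under stated
assumptions: FailoverNode and its method names are hypothetical, the
addresses are documentation examples, and the returned strings stand in
for whatever commands would actually move the IP and MAC address.

```python
from typing import List, Set


class FailoverNode:
    """Tracks which service addresses this box currently holds."""

    def __init__(self, own_ip: str, peer_ip: str) -> None:
        self.own_ip = own_ip
        self.peer_ip = peer_ip
        self.held: Set[str] = {own_ip}   # each box starts with its own address

    def peer_down(self) -> List[str]:
        """Peer has fallen over: pick up its service address (once)."""
        if self.peer_ip in self.held:
            return []
        self.held.add(self.peer_ip)
        return [f"acquire {self.peer_ip}"]   # placeholder for the real takeover step

    def peer_up(self) -> List[str]:
        """Peer is back: drop its address so it can pick it up again."""
        if self.peer_ip not in self.held:
            return []
        self.held.discard(self.peer_ip)
        return [f"release {self.peer_ip}"]   # placeholder for the real release step


b = FailoverNode(own_ip="192.0.2.2", peer_ip="192.0.2.1")
print(b.peer_down())   # B takes over A's address
print(b.peer_up())     # B gives it back when A returns
```

The same class works for A watching B, with the two addresses swapped.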

Hope this is of some help.

            -- Alan Robertson
               alanr at bell-labs.com

