More Linux-HA heartbeat thoughts

Paul A. Bender pbender at
Tue Mar 17 00:02:52 MST 1998

On Mon, 16 Mar 1998 alanr at wrote:

> I've been trying to think [a dangerous activity] some more about Linux-HA and
> heartbeat processes.  The results of this random rumination can be found in a
> couple of documents found on the web at:
> and
> They should probably be read in the indicated order.
> Please comment on them back to the list.  They're pretty rough at this point
> but should indicate what I had in mind.

Looks fairly well laid out... I think both connectivity schemes should
work properly. For the ring, though, what do we do when we're using modems
to connect the machines? Keeping more than 3 machines connected in a
null-modem loop is fine when all the machines are kept fairly close
together. But when you start moving the spare servers out of the building,
so they can remain operational even during a natural disaster which
destroys the primary machine(s), and there are multiple backup sets in
multiple buildings, we also have to signal the ring to rebuild itself
using different telephone numbers... something you accomplished with
switches on the null-modem ring. Therefore, each host involved will have
to be able to determine which phone number it needs to call (or should it
be a "call this list until you get a connect" scheme?) in the case of a
dropped connection on an outgoing line. (The incoming line shouldn't be a
problem if all it does is expect an incoming telephone call.)
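
As a rough illustration of the "call this list until you get a connect"
idea, here is a minimal Python sketch. Everything in it is an assumption
made for illustration: the redial() name, the dial callback (which stands
in for whatever actually drives the modem), the max_passes limit, and the
phone numbers are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def redial(numbers: Iterable[str],
           dial: Callable[[str], bool],
           max_passes: int = 3) -> Optional[str]:
    """Try each number in order until one connects.

    `dial` is a hypothetical callback that returns True on a connect.
    The whole list is retried up to `max_passes` times.
    Returns the number that connected, or None if every attempt failed.
    """
    candidates = list(numbers)
    for _ in range(max_passes):
        for number in candidates:
            if dial(number):
                return number
    return None


if __name__ == "__main__":
    # Pretend only the second backup site answers (made-up numbers).
    reachable = {"555-0102"}
    backups = ["555-0101", "555-0102", "555-0103"]
    print(redial(backups, lambda n: n in reachable))
```

The list order doubles as a priority order here, so the preferred
neighbour in the ring would go first. Whether each host keeps its own list
or the list gets handed out when the ring is rebuilt is still an open
question.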

Also, on the heartbeat protocol itself, we need to decide what kind of
timeout values constitute a failure, i.e. how often do we send a
heartbeat packet, and how many time intervals without a packet does it
take before we decide a machine is dead and reconfigure the cluster...
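
To make the two knobs concrete, here is a minimal Python sketch of that
kind of failure detection. It is only one possible shape, not a proposal
for the actual protocol: the HeartbeatMonitor class, its method names, and
the sample values (a 2 second interval, 3 missed intervals) are
assumptions chosen for the example.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Track the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, interval: float, missed_limit: int) -> None:
        self.interval = interval          # seconds between heartbeats
        self.missed_limit = missed_limit  # silent intervals before "dead"
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat packet from `node` at time `now`."""
        self.last_seen[node] = now

    def missed(self, node: str, now: float) -> int:
        """Number of whole intervals since the last packet from `node`."""
        return int((now - self.last_seen[node]) // self.interval)

    def dead_nodes(self, now: float) -> List[str]:
        """Nodes that have been silent for at least `missed_limit` intervals."""
        return sorted(n for n in self.last_seen
                      if self.missed(n, now) >= self.missed_limit)


if __name__ == "__main__":
    mon = HeartbeatMonitor(interval=2.0, missed_limit=3)
    mon.beat("alpha", now=0.0)
    mon.beat("beta", now=0.0)
    mon.beat("alpha", now=6.0)       # alpha keeps talking, beta goes quiet
    print(mon.dead_nodes(now=7.0))   # beta has missed 3 intervals by now
```

With these two values, the time to detect a failure is roughly interval
times missed_limit. A smaller product means faster failover but more false
alarms on a slow or noisy link, which matters more for modem links than
for a null-modem cable.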

Just some thoughts (sorry if I rambled a bit...)


