More Linux-HA heartbeat thoughts

alanr at bell-labs.com
Wed Mar 18 21:49:16 MST 1998


Harald Milz wrote:
> alanr at bell-labs.com wrote:
> > Second off, you do well if you choose the most reliable communication
> > system you can get for heartbeat and then think very carefully about its
> > physical security and topology. 
> 
> No. You need to heartbeat across all networks and interfaces configured
> for IP address takeover plus the non-IP networks. Otherwise the cluster
> manager has no way of figuring out which interface is down, and makes the
> wrong decisions which IP address to fail over where. 

I can't argue with what you said here, but I still claim that you want at
least one heartbeat path that runs over a very reliable communication medium.
This makes it easier to diagnose the nature of a failure reliably, and lets
the cluster transition smoothly more often.

> > communication arrangement mentioned by someone earlier.  Even so, it
> > makes sense for some kind of IP version of the heartbeat to exist, if only
> > as a backup and verification.
> 
> Ummm... you may want to read the respective section of my HA-HOWTO which
> has all the pros and cons of heartbeat topologies.

I read it, guess I'd better re-read it.  I'm only intermittently connected to
the network at the moment, so it may have to wait.  I'm also reading a good
book entitled "In Search of Clusters" (and it doesn't require a network :-))
 
> > This is an important topic for the following reason:
> > What if your network fragments into two working subnets that can't
> > communicate with each other, and that EACH KEEPS WORKING independently. 
> > Maybe they've done 
> 
> A so-called partitioned cluster. You need to have some logic which makes
> sure data isn't inconsistent. IBM's HACMP (which I happen to know best) has
> the following algo but I can also think of others. 
> 
> - if the partitions are equal in size, the alphanumerically lower half (by
>   the sum of all alphanumeric names) survives. The other half shuts down.
> - if the partitions aren't equal, the bigger half survives. 

This assumes that you are able to detect that the cluster has been partitioned.
Detection succeeds more often when you have a highly reliable heartbeat
mechanism.
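
For illustration, the rule quoted above could be sketched like this in
Python. This is not HACMP's code. In particular, "sum of all alphanumeric
names" is ambiguous, so the tie-break here is an assumption: each half's
node names are sorted and the two sorted lists are compared
lexicographically. The function and node names are hypothetical.

```python
def surviving_partition(part_a, part_b):
    """Pick which half of a partitioned cluster keeps running.

    part_a and part_b are lists of node names, one list per partition.
    Each node would evaluate this for its own view; it only gives a
    consistent answer if both halves know the full membership.
    """
    a, b = sorted(part_a), sorted(part_b)

    # Unequal partitions: the bigger half survives.
    if len(a) != len(b):
        return a if len(a) > len(b) else b

    # Equal partitions: the alphanumerically lower half survives.
    # ASSUMPTION: "lower" means the lexicographically smaller sorted list.
    return min(a, b)


print(surviving_partition(["node3"], ["node1", "node2"]))
print(surviving_partition(["nodeb", "nodec"], ["noded", "nodea"]))
```

Note that none of this runs unless the node has noticed the partition in the
first place, which is where the heartbeat comes in.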

> This strategy makes sure data doesn't get corrupted or inconsistent.
> Depending on your priorities, other algos can be thought of. 
> 
> Since network partitioning isn't that unlikely in real life, you should
> always take care to set up a non-IP heartbeat. In case of multi-host SCSI
> attachments, SCSI target mode is also fine. The HB rate should be very low,
> though, to prevent heartbeat packets from getting lost during peak load
> times and being accidentally interpreted as interface failures. This also has
> the advantage of checking that the SCSI adapters are alive :-) Another argument
> for a HB abstraction layer!

It isn't as elegant as one might want, but it's implementable in the existing
structure.  I had it in mind (even if it wasn't right up front all the time).

> > If a node sees a rapid and radical change in cluster topology, it might
> > be the smartest thing going for the applications to decide to sit down
> > and do nothing until they hear from a human being.  This can ultimately
> > improve reliability and availability (from an application perspective) --
> > strange as it seems. 
> 
> You are entirely correct. This is e.g. what the deadman switch does. It
> halts a node if it thinks the cluster manager went haywire. 
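
A minimal sketch of the "sit down and do nothing" idea, assuming membership
is tracked as a set of node names between two successive heartbeat rounds.
The function name and the 50% threshold are hypothetical, chosen only to
make the example concrete. This is not how the deadman switch itself works.

```python
def should_freeze(previous_members, current_members, max_loss_fraction=0.5):
    """Return True if the cluster shrank so abruptly that applications
    should stop and wait for an operator rather than fail over.

    max_loss_fraction is an illustrative tunable: the fraction of the
    previously known members that may vanish at once before we freeze.
    """
    previous = set(previous_members)
    if not previous:
        # Nothing was known before, so there is no change to judge.
        return False
    lost = previous - set(current_members)
    return len(lost) / len(previous) >= max_loss_fraction


print(should_freeze(["a", "b", "c", "d"], ["a"]))
print(should_freeze(["a", "b", "c", "d"], ["a", "b", "c"]))
```

Losing one node out of four looks like an ordinary failure and normal
takeover can proceed. Losing three at once looks more like a partition or a
confused node, and waiting is the safer choice.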

	-- Alan Robertson
	   alanr at bell-labs.com


