More Linux-HA heartbeat thoughts
Alan Robertson <alanr@bell-labs.com>
Thu, 19 Mar 1998 05:49:16 +0100
Harald Milz wrote:
> alanr@bell-labs.com wrote:
> > Second off, you do well if you choose the most reliable communication
> > system you can get for heartbeat and then think very carefully about its
> > physical security and topology.
>
> No. You need to heartbeat across all networks and interfaces configured
> for IP address takeover plus the non-IP networks. Otherwise the cluster
> manager has no way of figuring out which interface is down, and makes the
> wrong decisions which IP address to fail over where.
I can't argue with what you said here, but I still claim that you want one
version of the heartbeat that is based on a very reliable communication medium.
This helps diagnose the nature of the failure better and more reliably, and
allows the cluster to transition smoothly more often.
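
To make the point about diagnosis concrete, here is a minimal sketch of how
heartbeats on several paths could be combined to tell a dead path from a dead
peer. The function name, the path labels ("eth0", "serial") and the timeout
are illustrative assumptions, not part of any existing heartbeat code.

```python
def diagnose(last_seen, now, timeout):
    """Classify a peer from the last heartbeat time seen on each path.

    last_seen maps a path label (hypothetical, e.g. "eth0" or "serial")
    to the time of the most recent heartbeat received on that path.
    """
    # A path is "silent" when no heartbeat arrived within the timeout.
    silent = sorted(p for p, t in last_seen.items() if now - t > timeout)
    if not silent:
        return ("peer_ok", [])
    if len(silent) == len(last_seen):
        # Every path is quiet: the peer is down or we are cut off from it.
        return ("peer_down_or_isolated", silent)
    # At least one path still works, so the peer is alive and only the
    # silent paths (interfaces, cables, networks) are suspect.
    return ("path_failure", silent)


print(diagnose({"eth0": 100.0, "serial": 109.0}, now=110.0, timeout=5.0))
```

With only one path, the last two cases cannot be told apart, which is the
argument for heartbeating on every interface plus one very reliable medium.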
> > communication arrangement mentioned by someone earlier. Even so, it
> > makes sense for some kind of IP version of the heartbeat to exist, if only
> > as a backup and verification.
>
> Ummm... you may want to read the respective section of my HA-HOWTO which
> has all the pros and cons of heartbeat topologies.
I read it; I guess I'd better re-read it. I'm only intermittently connected to
the network at the moment, so it may have to wait. I'm also reading a good
book entitled "In Search of Clusters" (and it doesn't require a network :-))
> > This is an important topic for the following reason:
> > What if your network fragments into two working subnets that can't
> > communicate with each other, and that EACH KEEPS WORKING independently.
> > Maybe they've done
>
> A so-called partitioned cluster. You need to have some logic which makes
> sure data isn't inconsistent. IBM's HACMP (which I happen to know best) has
> the following algo but I can also think of others.
>
> - if the partitions are equal in size, the alphanumerically lower machines
> (sum of all alphanumeric names) survive. The other half shuts down.
> - if the partitions aren't equal, the bigger half survives.
This assumes that you are able to detect that the cluster has been partitioned.
You can detect it more often when you have a highly reliable heartbeat mechanism.
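
The survival rule quoted above can be sketched in a few lines. This is only an
illustration, not HACMP's actual code: the function name is made up, and the
tie-break (comparing the sorted node names of each half) is one assumed
reading of "sum of all alphanumeric names".

```python
def surviving_partition(a, b):
    """Pick which of two disjoint partitions (sets of node names) survives."""
    # Unequal sizes: the bigger half survives.
    if len(a) != len(b):
        return a if len(a) > len(b) else b
    # Equal sizes: ASSUMED tie-break, the half whose sorted node names
    # compare lower survives. Both halves must apply the same rule so
    # that they reach the same verdict without talking to each other.
    return a if sorted(a) < sorted(b) else b


print(sorted(surviving_partition({"alpha", "delta"}, {"bravo", "charlie"})))
```

Each half can only evaluate this if it knows the full membership and can work
out who is on the other side, which again depends on detecting the partition.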
> This strategy makes sure data doesn't get corrupted or inconsistent.
> Depending on your priorities, other algos can be thought of.
>
> Since network partitioning isn't that unlikely in real life, you should
> always care to set up a non-IP heartbeat. In case of multi-host SCSI
> attachments, SCSI target mode is also fine. The HB rate should be very low,
> though, to prevent heartbeat packets from getting lost during peak load
> times and accidentally interpreted as interface failures. This also has the
> advantage of checking the SCSI adapters being alive :-) Another argument
> for a HB abstraction layer!
It isn't as elegant as one might want, but it's implementable in the existing
structure. I had it in mind (even if it wasn't right up front all the time).
> > If a node sees a rapid and radical change in cluster topology, it might
> > be the smartest thing going for the applications to decide to sit down
> > and do nothing until they hear from a human being. This can ultimately
> > improve reliability and availability (from an application perspective) --
> > strange as it seems.
>
> You are entirely correct. This is e.g. what the deadman switch does. It
> halts a node if it thinks the cluster manager went haywire.
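
For illustration, a deadman switch of the kind described can be sketched as a
timer that the cluster manager must keep resetting. The class and method names
here are hypothetical, and the halt action is a stand-in callback; a real one
would stop the node rather than append to a list.

```python
class DeadmanSwitch:
    """Fires on_expire once if it is not reset within `timeout` seconds."""

    def __init__(self, timeout, on_expire):
        self.timeout = timeout
        self.on_expire = on_expire   # hypothetical halt action
        self.last_reset = None       # not armed until the first reset
        self.fired = False

    def reset(self, now):
        # Called periodically by a healthy cluster manager.
        self.last_reset = now

    def check(self, now):
        # Called from an independent timer; returns True once fired.
        if not self.fired and self.last_reset is not None:
            if now - self.last_reset > self.timeout:
                self.fired = True
                self.on_expire()
        return self.fired


events = []
switch = DeadmanSwitch(timeout=3.0, on_expire=lambda: events.append("halt"))
switch.reset(0.0)
switch.check(2.0)   # still within the timeout, nothing happens
switch.check(4.0)   # manager went quiet for too long, halt action runs
print(events)
```

Times are passed in explicitly so the sketch does not depend on a real clock.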
-- Alan Robertson
alanr@bell-labs.com