Linux-HA heart beats!

alanr at bell-labs.com
Thu Mar 19 15:35:22 MST 1998


> alanr at bell-labs.com wrote:
> > I'm not an expert on this by any stretch of the imagination, but I
> > haven't seen anything that leads me to believe that a one-size-fits-all
> > approach is the right one for cluster management. 
> 
> But that's exactly how the commercial solutions work. What is quite common
> is the following approach:
> 
> - a single cluster manager daemon which cares for the cluster status and
>   spawns failover scripts determined by the type of failure. It should
>   read a configuration database describing what nodes, networks,
>   interfaces, and applications there are and how they should fail over
>   to a surviving node. The cluster manager should not make any decisions
>   or assumptions itself. Instead, you tell it which node should restart
>   the application that ran on a failing node before. This reduces the
>   potential complexity a lot and leaves the decision to the administrator.
>   For example, Application A should run on Nodes A, B, and C in descending
>   priority, App. B should run on Nodes B, C, and A, etc. You get the picture.
>   This way, the cluster manager's failover logic can be very generic as far
>   as the number of nodes is concerned. But this is described in the
>   HA-HOWTO... 

And each commercial system does it a little differently, with different
approaches, advantages, and disadvantages.
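
As a rough illustration of the priority-list idea in the quoted text (not
any particular product's format), the lookup could be sketched like this.
The application and node names, the FAILOVER_PRIORITY table, and the
pick_node() helper are all made up for the example:

```python
# Hypothetical configuration: each application maps to the nodes that may
# run it, listed in descending priority. All names here are illustrative.
FAILOVER_PRIORITY = {
    "app_a": ["node_a", "node_b", "node_c"],
    "app_b": ["node_b", "node_c", "node_a"],
}


def pick_node(app, alive_nodes, priority=FAILOVER_PRIORITY):
    """Return the highest-priority surviving node for app, or None."""
    for node in priority.get(app, []):
        if node in alive_nodes:
            return node
    # No configured node is alive (or the application is not configured).
    return None


if __name__ == "__main__":
    # node_a has failed, so app_a falls to its second choice.
    print(pick_node("app_a", {"node_b", "node_c"}))
```

The lookup makes no decision of its own: it only walks the list the
administrator wrote, so it works unchanged for any number of nodes.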

> - an abstraction layer for heartbeats which only tells the cluster manager
>   what went wrong. The cluster manager in turn decides what to do in
>   accordance with the configuration database.

As a note, my design/code doesn't take any actions at all, except to invoke
scripts written by others, which do whatever they do :-).  The heartbeat
manager will send any kind of cluster-wide messages that it's asked to.  It
naturally performs a basic "status" communication without being told. It
doesn't know what the statuses mean, except for the special statuses "dead" and
"unknown".
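
To make the "opaque status" idea concrete, here is a minimal sketch, not
the actual code. It assumes a per-node table, a timeout I've called
deadtime, and an on_change callback standing in for "invoke somebody
else's script"; the class and parameter names are invented for the example:

```python
import time

# The only two statuses the table itself understands.
DEAD = "dead"
UNKNOWN = "unknown"


class StatusTable:
    """Track the last status heard from each node (hypothetical sketch)."""

    def __init__(self, nodes, deadtime=10.0, on_change=None,
                 clock=time.monotonic):
        self.deadtime = deadtime
        # on_change(node, old, new) stands in for running an external script.
        self.on_change = on_change or (lambda node, old, new: None)
        self.clock = clock
        self.status = {n: UNKNOWN for n in nodes}   # nothing heard yet
        self.last_seen = {n: None for n in nodes}

    def _set(self, node, new):
        old = self.status.get(node, UNKNOWN)
        if old != new:
            self.status[node] = new
            self.on_change(node, old, new)

    def heard_from(self, node, status):
        """Record a status message; the status string is not interpreted."""
        self.last_seen[node] = self.clock()
        self._set(node, status)

    def check(self):
        """Mark as dead any node that was heard once but is now silent."""
        now = self.clock()
        for node, seen in self.last_seen.items():
            if seen is not None and now - seen > self.deadtime:
                self._set(node, DEAD)
```

Any other status string ("active", "standby", whatever the scripts agree
on) just passes through to the callback untouched.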

	-- Alan Robertson
	   alanr at bell-labs.com


