drbd (was nmbd)

Alan Robertson alanr@bell-labs.com
Sun, 07 Nov 1999 15:52:02 -0700


"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Sat, 6 Nov 1999 12:13:42 +0100, Philipp Reisner <kde@ist.org> said:
> 
> > If the primary node fails, heartbeat is switching the secondary device
> > into primary state and starts the application there. (If you are using
> > it with a non-journaling FS this involves running fsck)
> 
> > If the failed node comes up again, it is a new secondary node and has
> > to synchronise its content to the primary. This, of course, will happen
> > without interruption of service in the background.
> 
> But what happened if the failure wasn't a node failure but was only a
> network failure?  In other words, what happens if both nodes see the
> other machine disappear and start running on their own?
>
> Both machines can continue to run on their local copy, but when the
> network fault is restored, there is no way to reconcile the updates
> which have been made to the two instances of the raid array.
> 
> This is the cluster partition problem, and it is one of the hardest
> parts of HA.

Stephen is exactly right.  There are *several* different approaches to this
problem, none of which is perfect.  I think Volker Wiegand posted them a couple
of weeks ago.  One technique which helps whichever approach you choose is to
make sure your lowest-level communications between the two nodes are *very*
highly reliable (by making them redundant, including other media types, etc.)
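
As a rough illustration of what redundant low-level communications buy you,
here is a minimal sketch in Python.  It is not how heartbeat is implemented;
the function name, the link names and the timeout are all made up for the
example.  The idea is simply that the peer is presumed dead only when *every*
link has gone silent, so one failed cable or card does not trigger a takeover.

```python
# Hypothetical sketch: names and values are illustrative, not heartbeat's API.
def peer_presumed_dead(last_seen, now, timeout):
    """Return True only if every link has been silent for longer than timeout.

    last_seen maps a link name (e.g. "eth0", "serial") to the time, in
    seconds, at which the last heartbeat arrived on that link.
    """
    if not last_seen:
        raise ValueError("at least one heartbeat link is required")
    # A single link that is still delivering heartbeats keeps the peer alive.
    return all(now - seen > timeout for seen in last_seen.values())


# Both Ethernet links are silent, but the serial line still carries heartbeats.
links = {"eth0": 100.0, "eth1": 100.5, "serial": 109.0}
print(peer_presumed_dead(links, now=110.0, timeout=5.0))  # prints False
```

Using a different medium for one of the links (the serial line here) matters
because it is less likely to share a failure cause with the others.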

Suppose you make the loss of every communications path between your two
machines VERY highly improbable (through redundancy), and your application
makes its important updates in response to stimulus which arrives over the LAN
(which is very common).  Then even if a total loss of communications does occur
(and a node cut off from the LAN sees little such stimulus), it is more
improbable still that one side holds a "meaningful" update that would be lost
should that side abandon its copy in favor of the other side's.

In practice, this is probably an acceptably small risk for many applications.
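
To put rough numbers on that argument, here is a small Python sketch.  The
function names and all the probabilities are invented for illustration, and it
assumes the links fail independently of one another, which is only
approximately true (and is one reason to mix media types).

```python
import math


def partition_probability(link_failure_probs):
    """Probability that all links are down at once, assuming independence."""
    return math.prod(link_failure_probs)


def lost_update_probability(link_failure_probs, p_update_given_partition):
    """Probability of a partition during which a meaningful update is made."""
    return partition_probability(link_failure_probs) * p_update_given_partition


# Illustrative figures only: three links, each down 1% of the time, and a 5%
# chance that an isolated node makes a meaningful update while partitioned.
links = [0.01, 0.01, 0.01]
print(partition_probability(links))
print(lost_update_probability(links, 0.05))
```

With these made-up figures a partition is roughly a one-in-a-million event,
and losing a meaningful update is about twenty times rarer again.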

Additionally, redundant communications minimize the chances of being bitten by
any bugs or limitations in the method you use to deal with partitioning (since
they minimize the chances of loss of communications).

So, for two or three different reasons, making the communications between the
nodes redundant is an eminently practical and good thing to do.

	-- Alan Robertson
	   alanr@bell-labs.com