drbd (was nmbd)

Stephen C. Tweedie sct at redhat.com
Sun Nov 7 10:35:03 MST 1999


Hi,

On Sat, 6 Nov 1999 12:13:42 +0100, Philipp Reisner <kde at ist.org> said:

> If the primary node fails, heartbeat switches the secondary device
> into primary state and starts the application there. (If you are using
> it with a non-journaling FS, this involves running fsck.)

> If the failed node comes up again, it is a new secondary node and has
> to synchronise its content with the primary. This, of course, will happen
> without interruption of service in the background.

But what happens if the failure wasn't a node failure but only a
network failure?  In other words, what happens if both nodes see the
other machine disappear and start running on their own?

Both machines can continue to run on their local copy, but when the
network fault is repaired, there is no way to reconcile the updates
which have been made to the two instances of the RAID array.

This is the cluster partition problem, and it is one of the hardest
parts of HA.
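
As a rough illustration of why the two copies cannot simply be merged,
here is a toy model in Python. It is only a sketch under assumptions:
the Node class, its "dirty" set and the block contents are made up for
this example and say nothing about how drbd itself tracks writes.

```python
class Node:
    """Toy replica: a dict of block number -> data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.dirty = set()  # blocks written while the peer was unreachable

    def write(self, block, data, peer=None):
        self.blocks[block] = data
        if peer is not None:
            peer.blocks[block] = data  # mirrored write, both copies agree
        else:
            self.dirty.add(block)      # partitioned: only the local copy changes


def conflicting_blocks(a, b):
    """Blocks both nodes changed during the partition, with different data."""
    return sorted(blk for blk in a.dirty & b.dirty
                  if a.blocks[blk] != b.blocks[blk])


a, b = Node("a"), Node("b")

# Normal operation: a is primary and mirrors every write to b.
a.write(0, "superblock v1", peer=b)
a.write(7, "inode table v1", peer=b)

# Network failure: each node sees the other vanish and acts as primary.
a.write(7, "inode table v2 (from a)")
b.write(7, "inode table v2 (from b)")
b.write(9, "new file data")

conflicts = conflicting_blocks(a, b)   # changed on both sides
one_sided = sorted(a.dirty ^ b.dirty)  # changed on one side only
print("conflicting blocks:", conflicts)
print("one-sided blocks:", one_sided)
```

Block 7 was changed on both sides, and nothing at the block level says
which version should win. Block 9 was changed on one side only, but
copying it across is not safe in general either, because the blocks of
a filesystem depend on one another.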

--Stephen


