[Linux-HA] 2 subnets separated by cisco 6509 switch
lmb at suse.de
Wed Mar 16 08:14:43 MST 2005
On 2005-03-15T15:50:44, Carson Gaspar <carson at taltos.org> wrote:
> This really is a solved problem, but it does require a quorum mechanism.
The quorum algorithm for DR failover is typically tied to giving the
benefit of the doubt to the primary site, and requiring manual
acknowledgement on the secondary site, because of the huge cost of
shifting operations over to a backup data center, and the impact this
has on business operations.
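To make that policy concrete, here is a minimal sketch in Python. It is not heartbeat code; every name in it (SiteView, Decision, decide, operator_ack) is hypothetical and chosen only for illustration. It assumes each site knows its own role, whether it can still reach its peer, and whether an operator has explicitly acknowledged a takeover.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    STAY_ACTIVE = "stay_active"              # primary keeps running the services
    STAND_BY = "stand_by"                    # secondary sees the primary, does nothing
    WAIT_FOR_OPERATOR = "wait_for_operator"  # secondary lost the primary, no ack yet
    TAKE_OVER = "take_over"                  # secondary lost the primary, operator acked


@dataclass(frozen=True)
class SiteView:
    """What one site currently believes (hypothetical structure)."""
    is_primary: bool
    peer_reachable: bool
    operator_ack: bool = False  # manual acknowledgement; only read on the secondary


def decide(view: SiteView) -> Decision:
    """Site-level policy: the primary gets the benefit of the doubt,
    the secondary never takes over on its own."""
    if view.is_primary:
        # Losing sight of the secondary is not a reason to give up service.
        return Decision.STAY_ACTIVE
    if view.peer_reachable:
        return Decision.STAND_BY
    if view.operator_ack:
        return Decision.TAKE_OVER
    return Decision.WAIT_FOR_OPERATOR


if __name__ == "__main__":
    for view in (
        SiteView(is_primary=True, peer_reachable=False),
        SiteView(is_primary=False, peer_reachable=True),
        SiteView(is_primary=False, peer_reachable=False),
        SiteView(is_primary=False, peer_reachable=False, operator_ack=True),
    ):
        print(view, "->", decide(view).value)
```

Note that this only decides who is allowed to run the services. It does nothing to make sure the other site has actually stopped, so an operator acknowledging a takeover while the primary is still alive would leave both sites active.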
Also, the shift-back is costly, so a wrong failover costs a whole lot
more than just switching back from one blade in the same rack to the
> should be practical to implement in heartbeat 2.x (now that it supports
> more than 2 nodes).
Yes. Some of the quorum algorithms we're discussing fit this model well.
We're working on it, and appreciate your help over there on the
Patches are being accepted; want to sign up for writing the quorum
plugins for our membership system? ;-)
> But it does guarantee that split brain causes no harm, and is no worse
> than the current STONITH failure case (data center loss taking out
> server & STONITH device).
This is a deadly wrong assumption. Quorum, as Alan says, does not
guarantee timely release of resources by the active node. Fencing is
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business