[Linux-HA] 2 subnets separated by cisco 6509 switch
lmb at suse.de
Wed Mar 16 08:20:55 MST 2005
On 2005-03-15T20:19:06, Carson Gaspar <carson at taltos.org> wrote:
> You have now solved the "who has control" problem. Replicate your data via
> SRDF, DRBD, or whatever, you won't have an unintentional active/active
> disaster (assuming properly behaved servers - STONITH has the extra
> guarantee of protecting against buggy servers).
Besides the reliance of your algorithm on timing and proper shutdown to
protect data integrity, yes.
Another reason why DR failover typically needs manual acknowledgement
is that if the links are cut, true automated fencing can't occur. Basic
impossibility 101, unless the fencing happens via telekinesis. So, the
manual override is the admin taking responsibility for this.
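To make that rule concrete, here is a minimal sketch in Python. It is not heartbeat's actual code; the names `Decision`, `may_take_over`, `fencing_confirmed` and `admin_ack` are made up for illustration. The assumption is that a DR site promotes itself only when the peer has been verifiably fenced or a human has explicitly vouched that it is down:

```python
from enum import Enum


class Decision(Enum):
    STAY_PASSIVE = "stay passive"
    TAKE_OVER = "take over"
    WAIT_FOR_ADMIN = "wait for admin"


def may_take_over(peer_heartbeat_ok: bool,
                  fencing_confirmed: bool,
                  admin_ack: bool) -> Decision:
    """Hypothetical failover rule for a DR site; illustration only."""
    if peer_heartbeat_ok:
        # Peer is alive and reachable: nothing to do.
        return Decision.STAY_PASSIVE
    if fencing_confirmed:
        # Peer was verifiably shut off (e.g. STONITH succeeded).
        return Decision.TAKE_OVER
    if admin_ack:
        # Fencing could not be confirmed (e.g. links cut); a human
        # takes responsibility for declaring the peer dead.
        return Decision.TAKE_OVER
    # No heartbeat, no fencing, no human: promoting now would risk
    # an unintentional active/active situation.
    return Decision.WAIT_FOR_ADMIN
```

With the links cut, `may_take_over(False, False, False)` yields `Decision.WAIT_FOR_ADMIN`: silence from the peer alone is never treated as proof that it is dead.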
> There is a _lot_ of literature about this - it seems sometimes that
> heartbeat is reinventing the wheel (from an algorithm/theory
<tongue in cheek>
There's an awful lot of literature and previous discussion on that. It
seems sometimes that people new to a mailing list keep reinventing the
wheel. (From an FAQ perspective.)
Good literature here IMHO would be my all-time favourite "Blueprints for
High Availability", but also "Fault Tolerance in Distributed Systems".
Please, by all means, help us design and write the code, and write up
the assorted algorithms and decision strategies for our documentation /
wiki. It's just not very nice to show up on a list and assume that people
keep reinventing the wheel for a perfectly solved problem: It's not
that we don't understand it, it's that it's a bit harder than others,
and that we have to build it one line of code at a time.
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business