[Linux-HA] 2 subnets separated by cisco 6509 switch
Carson Gaspar
carson at taltos.org
Tue Mar 15 18:19:06 MST 2005
--On Tuesday, March 15, 2005 22:09:27 +0100 Kenneth Geisshirt
<kenneth at geisshirt.dk> wrote:
> Carson Gaspar wrote:
>
>> This really is a solved problem, but it does require a quorum mechanism.
>
> A SAN connection over 20 km? Or a distributed/mirrored SAN with show
> quorums like RHCS? I don't think it's a solved problem - maybe a partially
> solved problem.
No. This has _nothing_ to do with SANs.
You just need an odd number of servers >= 3. At that point you have one of
the following states for an active/standby config (all of this assumes some
sane quorum protocol - several exist):
- Active and Standby servers can talk to each other
  - No problem
- Active and Standby servers can't talk to each other, Active server can
  reach a quorum (>50% of servers)
  - Active server stays active. Standby server, by definition, can not reach
    a quorum, and thus may _not_ become active under any circumstance short of
    human override.
- Active and Standby servers can't talk to each other, Active server can
  not reach a quorum
  - The Active server must give up all resources
  - The Standby server may acquire resources after some safety interval iff
    it has a quorum. The interval depends on how often the Active server checks
    quorum, how long it takes to shut down resources, and how much of a safety
    margin you want
  - If the Standby server _also_ has no quorum it may not take over
    resources (so the service stays down)
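
To make the cases above concrete, here is a minimal Python sketch of the decision each node would make. It is an illustration only, not heartbeat's code or any real quorum protocol: every name in it (Role, Action, has_quorum, decide, takeover_delay) is made up for this example. It assumes a clean partition, i.e. a node's count of reachable servers is trustworthy and two sides that can't see each other can't both count a majority; a real quorum protocol has to guarantee that. Adding the three delays together for the safety interval is also just one conservative choice.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class Action(Enum):
    NO_CHANGE = "no change"
    STAY_ACTIVE = "stay active"
    RELEASE_RESOURCES = "release resources"
    STAY_STANDBY = "stay standby"
    TAKE_OVER_AFTER_DELAY = "take over after safety interval"


def has_quorum(reachable_others: int, cluster_size: int) -> bool:
    """True if this node plus the servers it can reach are >50% of the cluster."""
    return 2 * (reachable_others + 1) > cluster_size


def decide(role: Role, partner_reachable: bool,
           reachable_others: int, cluster_size: int) -> Action:
    """Pick an action for one node (hypothetical helper, assumes a clean partition)."""
    if partner_reachable:
        # Active and Standby can talk to each other: no problem.
        return Action.NO_CHANGE
    quorum = has_quorum(reachable_others, cluster_size)
    if role is Role.ACTIVE:
        # Active keeps resources only while it still has a quorum.
        return Action.STAY_ACTIVE if quorum else Action.RELEASE_RESOURCES
    # Standby may take over only with a quorum, and only after the delay.
    return Action.TAKE_OVER_AFTER_DELAY if quorum else Action.STAY_STANDBY


def takeover_delay(check_period_s: float, shutdown_time_s: float,
                   margin_s: float) -> float:
    """Safety interval as a simple sum (an assumption, not a prescribed formula)."""
    return check_period_s + shutdown_time_s + margin_s
```

With three servers, an isolated Active node gets decide(Role.ACTIVE, False, 0, 3), which is RELEASE_RESOURCES, while a Standby that can still reach the third server gets decide(Role.STANDBY, False, 1, 3), which is TAKE_OVER_AFTER_DELAY.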
You have now solved the "who has control" problem. Replicate your data via
SRDF, DRBD, or whatever, and you won't have an unintentional active/active
disaster (assuming properly behaved servers; STONITH adds the extra
guarantee of protecting against buggy servers).
There is a _lot_ of literature about this - it seems sometimes that
heartbeat is reinventing the wheel (from an algorithm/theory perspective).
--
Carson