[Linux-HA] 2 subnets separated by cisco 6509 switch

Carson Gaspar carson at taltos.org
Tue Mar 15 18:19:06 MST 2005



--On Tuesday, March 15, 2005 22:09:27 +0100 Kenneth Geisshirt 
<kenneth at geisshirt.dk> wrote:

> Carson Gaspar wrote:
>
>> This really is a solved problem, but it does require a quorum mechanism.
>
> A SAN connection over 20 km? Or a distributed/mirrored SAN with show
> quorums like RHCS? I don't think it's a solved problem - maybe a
> partially solved problem.

No. This has _nothing_ to do with SANs.

You just need an odd number of servers, at least 3. At that point you are 
in one of the following states for an active/standby config (all of this 
assumes some sane quorum protocol - several exist):

- Active and Standby servers can talk to each other
	- No problem
- Active and Standby servers can't talk to each other, Active server can 
reach a quorum (>50% of servers)
	- Active server stays active. Standby server, by definition, cannot reach 
a quorum, and thus may _not_ become active under any circumstance short of 
human override.
- Active and Standby servers can't talk to each other, Active server 
cannot reach a quorum
	- The Active server must give up all resources
	- The Standby server may acquire resources after some safety interval iff 
it has a quorum. The interval depends on how often the Active server checks 
quorum, how long it takes to shut down resources, and how much of a safety 
margin you want
	- If the Standby server _also_ has no quorum it may not take over 
resources (so the service stays down)
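
The states above can be sketched as a small decision function. This is only 
an illustration, not heartbeat code: the names (Action, has_quorum, decide, 
safety_interval) are hypothetical, each node is assumed to count itself 
toward the quorum, and adding the three delays together is an assumption 
(the text only says the interval depends on them).

```python
from enum import Enum


class Action(Enum):
    STAY_ACTIVE = "stay active"
    RELEASE = "give up all resources"
    STAY_STANDBY = "stay standby"
    TAKE_OVER = "acquire resources after the safety interval"


def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """True if this node plus the peers it can reach are >50% of the cluster."""
    # Assumption: the node counts itself as one vote.
    return 2 * (reachable_peers + 1) > cluster_size


def decide(is_active: bool, sees_partner: bool,
           reachable_peers: int, cluster_size: int) -> Action:
    """Map one node's view of the cluster to the action listed above."""
    if sees_partner:
        # Active and Standby can talk to each other: no problem, no change.
        return Action.STAY_ACTIVE if is_active else Action.STAY_STANDBY
    quorum = has_quorum(reachable_peers, cluster_size)
    if is_active:
        # Active keeps running only while it can still reach a quorum.
        return Action.STAY_ACTIVE if quorum else Action.RELEASE
    # Standby may take over only with a quorum; otherwise the service stays down.
    return Action.TAKE_OVER if quorum else Action.STAY_STANDBY


def safety_interval(check_period: float, shutdown_time: float,
                    margin: float) -> float:
    """Seconds the Standby waits before acquiring resources.

    Assumption: a plain sum of how often the Active checks quorum, how long
    it takes to shut down resources, and the extra margin you want.
    """
    return check_period + shutdown_time + margin
```

In a 3-server cluster where the Active server is cut off from everyone, 
decide(True, False, 0, 3) gives Action.RELEASE, while the Standby server that 
still reaches the third server gets Action.TAKE_OVER from 
decide(False, False, 1, 3).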

You have now solved the "who has control" problem. Replicate your data via 
SRDF, DRBD, or whatever, and you won't have an unintentional active/active 
disaster (assuming properly behaved servers - STONITH has the extra 
guarantee of protecting against buggy servers).

There is a _lot_ of literature about this - it seems sometimes that 
heartbeat is reinventing the wheel (from an algorithm/theory perspective).

-- 
Carson


