Rif: Re: [Linux-HA] explanations
alanr at unix.sh
Tue Jul 19 13:22:23 MDT 2005
Carson Gaspar wrote:
> --On Tuesday, July 12, 2005 01:51:40 PM -0600 Alan Robertson
> <alanr at unix.sh> wrote:
>> Quorum does not replace fencing. Split-brain cannot be completely
>> resolved by quorum alone.
> I am _fairly_ certain that quorum is (or can be - I haven't looked at
> quorum in 2.x) sufficient, assuming that all the code actually works as
> designed. STONITH (or storage fencing) protects against kernel /
> heartbeat / storage bugs in addition to split-brain. If I'm wrong, I'd
> love to see an example of a failure scenario.
Let's take this example from real life (i.e., this really happens):
1. Heartbeat on machine A has a shared disk mounted, machine B is
in standby mode. Machine C is in standby mode.
2. Machine A is running a Red Hat 2.4.18 kernel.
3. It's now 4 AM. The kernel stops scheduling heartbeat. Heartbeat
doesn't run any more _at all_, for >= 2 minutes.
4. Machine B and C compute new membership w/o A. They have quorum.
5. Machine B mounts the shared disk, which A still has mounted. The
disk is now toast.
6. Machine A wakes up and notices it doesn't have quorum. It
unmounts the disk. Too late!
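
The six steps above can be sketched as a toy model. This is purely illustrative Python, not Heartbeat code: `Node`, `has_quorum`, and `run_scenario` are made-up names, and it assumes simple majority quorum in a three-node cluster. It shows the window in which both A and B have the disk mounted:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mounted: bool = False
    frozen: bool = False  # True while heartbeat is not being scheduled


def has_quorum(members, cluster_size):
    # Simple majority rule (an assumption for this sketch).
    return len(members) > cluster_size // 2


def run_scenario():
    a, b, c = Node("A", mounted=True), Node("B"), Node("C")
    nodes = (a, b, c)
    cluster_size = len(nodes)
    events = []

    # Step 3: heartbeat on A stops running; A still has the disk mounted.
    a.frozen = True

    # Step 4: B and C compute a new membership without A and have quorum.
    assert has_quorum([b, c], cluster_size)

    # Step 5: B mounts the disk. Quorum alone does nothing to stop A.
    b.mounted = True
    events.append(("B mounts", sorted(n.name for n in nodes if n.mounted)))

    # Step 6: A wakes up, sees it has no quorum, and unmounts. Too late.
    a.frozen = False
    if not has_quorum([a], cluster_size):
        a.mounted = False
    events.append(("A unmounts", sorted(n.name for n in nodes if n.mounted)))
    return events


if __name__ == "__main__":
    for label, mounted_on in run_scenario():
        print(f"{label}: disk mounted on {mounted_on}")
```

Between the first and second event the disk is mounted on two nodes at once, even though every quorum decision in the model was correct.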
There are numerous ways of solving this -- but they are all different
kinds of fencing, AFAIK...
- STONITH - kill node A.
- SCSI reservations - stop A from writing on the disk.
- Fibre Channel fencing - stop A from getting to the disk over FC
- "appropriate use" of a deadman timer
- self-fencing disk - which always talks exclusively to one node
in the cluster at a time (taking it over automatically severs
the connection to other servers)
Fencing is that thing which severs the ability of the errant node to get
access to cluster resources - without any cooperation on the part of the
software on the errant node. It's errant, after all; you can't trust it
to be able to do anything...
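
One way to picture that ordering is a takeover routine that refuses to mount until the fence has been confirmed. This is a minimal sketch under stated assumptions, not how Heartbeat implements it: `take_over`, `FencingFailed`, and the `fence`/`mount` callables are hypothetical names, and `fence` stands in for whichever mechanism is used (STONITH, SCSI reservation, and so on). Note that `fence` acts on the errant node from outside; it never asks that node to do anything.

```python
class FencingFailed(Exception):
    """Raised when the errant node could not be confirmed cut off."""


def take_over(resource, errant_node, fence, mount):
    # Fence first: this must succeed without any help from errant_node.
    if not fence(errant_node):
        raise FencingFailed(
            f"could not fence {errant_node}; not mounting {resource}"
        )
    # Only now is it safe to touch the shared resource.
    mount(resource)


def demo(fence_succeeds):
    log = []

    def fence(node):
        log.append(f"fence {node}")
        return fence_succeeds

    def mount(resource):
        log.append(f"mount {resource}")

    try:
        take_over("shared-disk", "A", fence, mount)
    except FencingFailed:
        log.append("takeover aborted")
    return log


if __name__ == "__main__":
    print(demo(True))
    print(demo(False))
```

In the earlier scenario this would put a confirmed fence of A between steps 4 and 5, so B never mounts the disk while A can still write to it.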
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William