Rif: Re: [Linux-HA] explanations

Carson Gaspar carson at taltos.org
Wed Jul 20 17:35:09 MDT 2005

--On Wednesday, July 20, 2005 08:17:58 PM +0200 Lars Marowsky-Bree 
<lmb at suse.de> wrote:

> On 2005-07-20T14:07:33, Carson Gaspar <carson at taltos.org> wrote:
>> If you weight node members (via whatever algorithm), then it is
>> sufficient as long as the total is odd, and all members can calculate
>> their weighting even if they lose contact with other members.
> For 2 node clusters this is totally insufficient, because one side will
> always win - and if that is the side which goes down, the service is
> gone.

True. So don't do that. Add a cluster member whose sole purpose in life is 
to play quorum arbiter - it's a very low overhead job ;-)
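
To make the arithmetic concrete, here is a minimal sketch of a weighted
majority check. The node names, the weights and the has_quorum() helper are
made up for illustration; this is not how heartbeat computes quorum.

```python
def has_quorum(weights, reachable):
    """Return True if the reachable members hold a strict majority of the total weight.

    weights   -- dict mapping member name to its (hypothetical) integer weight
    reachable -- set of member names this partition can still see, itself included
    """
    total = sum(weights.values())
    held = sum(weights[name] for name in reachable)
    return 2 * held > total


# Two nodes with an odd total: node-a always wins, so losing node-a loses the service.
lopsided = {"node-a": 2, "node-b": 1}

# Two nodes plus a quorum-only arbiter: either node can win with the arbiter's help.
with_arbiter = {"node-a": 1, "node-b": 1, "arbiter": 1}

if __name__ == "__main__":
    print(has_quorum(lopsided, {"node-b"}))                 # node-a is down
    print(has_quorum(with_arbiter, {"node-b", "arbiter"}))  # node-a is down
```

With the arbiter in place, whichever node can still reach the arbiter holds 2
of the 3 votes, so no single node failure takes the service down.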

> First, your minority _is_ using fencing, namely self-fencing - and to
> keep "t1" low, this will typically be something fast, simple and thus
> crude like node suicide.

Ah - terminology gap. I just don't think of that as fencing - it's more of 
a highway lane marker than a fence ;-)

> One must concede though that still, it is less safe than a fencing
> method which is not implicitly (by a timeout) but explicitly
> acknowledged to the survivors, like STONITH or storage fencing.

Of course - I firmly believe in a strong fencing mechanism where possible. 
Sadly, in many of my desired deployment scenarios, it just isn't.

> Node suicide is not a very good mechanism if it is only software based,
> and if you want to protect against hung kernels, broken HBAs etc, but it
> may be useful in carefully controlled environments.

Which was my original point ;-)

Of course I routinely deal with situations where my worst outages are 
building power-downs and WAN failures, so I'm biased in favour of having a 
quorum-only mechanism as an option.

Even better would be a combination of quorum and STONITH, where a STONITH 
failure isn't fatal. I can use STONITH with HP's iLO, but if the failure is 
a lack of power or network connectivity, STONITH is going to fail. In the 
cases where it fails, I'm _extremely_ confident in taking over shared 
resources if I also have quorum majority.
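
As a rough sketch of the policy I have in mind (the function name and the
three outcome strings are invented for this example, and the wait for the
minority's self-fence timeout is an assumption borrowed from the "t1"
discussion above, not an existing heartbeat feature):

```python
def takeover_decision(have_quorum, stonith_succeeded):
    """Decide what a partition should do about the shared resources.

    have_quorum       -- True if this partition holds a quorum majority
    stonith_succeeded -- True if STONITH explicitly confirmed the peer is down
    """
    if not have_quorum:
        # Minority side: never take over, whatever STONITH reports.
        return "stand-down"
    if stonith_succeeded:
        # Explicit acknowledgement from the fencing device: safest case.
        return "take-over"
    # STONITH failed (e.g. the peer's iLO has no power or no network).
    # Rely on quorum, but give the minority time to self-fence first.
    return "take-over-after-self-fence-timeout"


if __name__ == "__main__":
    for quorum in (True, False):
        for fenced in (True, False):
            print(quorum, fenced, takeover_decision(quorum, fenced))
```

The point is only that a failed STONITH becomes a slower takeover path rather
than a dead end, provided quorum is held.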


