[Linux-HA] General problems understanding split brain (quorum)

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Jul 9 04:56:33 MDT 2009


Hi,

On Wed, Jul 08, 2009 at 02:58:05PM +0200, Ehlers, Kolja wrote:
> Hello again,
> 
> I was trying to set up stonith using our APC Smart UPS 1000
> with Network Management Card (see my other mail), but I have now
> realized that these devices cannot be reached over the LAN with
> the apcsmart plugin, and the apcmaster plugin is not
> compatible with these APC UPS models. Using the serial cable I can
> only connect one node to each UPS, so this does not help either.
> But now I am again very confused about stonith, and I hope
> someone can shed some light on it. Let me try to
> explain what I do not understand.
> 
> If I configure a stonith device that sends shutdown commands to
> node(s) in split-brain situations, how does this make my
> environment any safer than it is without it? All my nodes have 2
> NICs (192.168.0.x and 10.0.0.x). My heartbeat communication runs
> through both networks. If I had such power management
> hardware, we would hook it up to the 192.168.0.x LAN. Now, in a
> situation where one node is no longer seen by the others, it is not
> on the LAN anymore either, so the power management hardware could
> not shut it down anyway. But it also cannot cause any damage, even if
> it grabs my only critical resource (a virtual IP).
> 
> The point I am making is that if my heartbeats are sent
> over the LAN, then even if a split brain happens, it cannot cause
> anything dangerous, since that node is no longer available on the
> LAN. I could configure the ssh stonith device to shut the
> other node down through the 10.0.0.x interface and everything
> would be covered.

Note that fencing (stonith) is used in other situations too, not
only for split brain (for instance, when a resource fails to stop).
But using one of the ssh stonith devices is dangerous.
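Why an ssh-based device is a poor fencing method can be illustrated with a small, hypothetical Python model. This is not Heartbeat or Pacemaker code: the `Node` class and the `ssh_fence`/`power_fence` functions are invented for this sketch. An ssh device can only power off a node that is reachable and healthy enough to run the shutdown command, which is exactly what is in doubt when fencing is needed; a power switch acts on the node from outside.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of the node that is to be fenced."""
    name: str
    powered_on: bool = True
    os_responsive: bool = True  # False if the kernel or sshd is hung
    reachable: bool = True      # False if its network links are down


def ssh_fence(victim: Node) -> bool:
    """Ask the victim to power itself off over ssh.

    Succeeds only if the victim can be reached and can still run the
    command, i.e. only if it is cooperating.
    """
    if victim.reachable and victim.os_responsive:
        victim.powered_on = False
        return True
    return False


def power_fence(victim: Node) -> bool:
    """Cut the victim's power through an out-of-band switch.

    Assumes the switch itself is reachable from the surviving node;
    the victim's own state does not matter.
    """
    victim.powered_on = False
    return True


# Failure modes in which the surviving node no longer hears its peer.
scenarios = {
    "hung OS": {"os_responsive": False},
    "links down": {"reachable": False},
}

for label, state in scenarios.items():
    ssh_ok = ssh_fence(Node("node2", **state))
    power_ok = power_fence(Node("node2", **state))
    print(f"{label}: ssh fencing={ssh_ok}, power fencing={power_ok}")
```

In both scenarios the ssh method fails, so the surviving node cannot confirm that its peer is off; a hung node that later wakes up may still be holding the virtual IP.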

> I guess I am missing something here. Please help me out.

Perhaps have a look at
http://clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf

Thanks,

Dejan

> Thanks
> 
> Kolja
> 
> 
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> Sent: Wednesday, 17 June 2009 15:34
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] General problems understanding split brain (quorum)
> 
> Hi,
> 
> On Wed, Jun 17, 2009 at 02:39:54PM +0200, Ehlers, Kolja wrote:
> > Thanks for the fast reply. 
> > 
> > If I set the no-quorum-policy to ignore, both of my critical resources
> > (2 virtual IP addresses) are bound on each node. Of course I don't want
> > this to happen. You say I must use a fencing technique/device to do
> > what? To stonith the other node(s)?
> > How exactly do I protect my virtual IPs from being bound by more than
> > one node if I do not have a remotely controllable power supply?
> 
> The only way is to use stonith. That is going to protect your resources, because in case of split brain one node won't start
> resources before it fences (i.e. kills) the other node. Besides, and this has been discussed often, configuring stonith is highly
> recommended. And there is really no excuse for not doing it, unless it is a stretch cluster.
> 
> > And again, all this theory only works if a subcluster holding a
> > majority of the nodes has quorum, right?
> 
> See above.
> 
> > So there is nothing one can do about a split brain in a two-node
> > cluster? I don't see how this is an advantage over just defining one node
> > as the master node that takes over all resources during a split brain,
> > while all other nodes stop running critical resources.
> 
> What if your master node is really down? How do you know it is a split brain?
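Dejan's question can be made concrete with a toy two-node model in Python. Everything here is hypothetical (the `run`, `static_master` and `fence_first` names are invented, the nodes are simply `A` and `B`, and the power switch is assumed always to work); it is not how Pacemaker is implemented. The key point is that the surviving node observes the same thing, silence from its peer, whether the peer crashed or only the heartbeat links failed.

```python
from typing import Callable, Dict, List, Tuple

# A policy decides whether node `me`, which no longer hears `peer`,
# may start the critical resource (e.g. the virtual IP).
Policy = Callable[[str, str, Dict[str, bool]], bool]


def static_master(me: str, peer: str, alive: Dict[str, bool]) -> bool:
    """Kolja's proposal: only the designated master (here "A") runs the
    resource when the peer is not seen; the other node never takes over."""
    return me == "A"


def fence_first(me: str, peer: str, alive: Dict[str, bool]) -> bool:
    """Fencing rule: power the peer off first and start the resource only
    after that is done. Assumes the (hypothetical) power switch never fails."""
    alive[peer] = False  # the peer is now certainly down
    return True


def run(initially_alive: Dict[str, bool], policy: Policy,
        order: Tuple[str, ...] = ("A", "B")) -> List[str]:
    """Return the nodes that end up running the resource after the nodes
    stop hearing each other. `order` is the order in which they react."""
    alive = dict(initially_alive)
    holders = []
    for me in order:
        if not alive[me]:
            continue  # crashed or already fenced: it does nothing
        peer = "B" if me == "A" else "A"
        if policy(me, peer, alive):
            holders.append(me)
    return holders


cases = {
    "master A crashed": {"A": False, "B": True},
    "heartbeat links failed": {"A": True, "B": True},
}
for label, state in cases.items():
    print(f"{label}: static master -> {run(state, static_master)}, "
          f"fence first -> {run(state, fence_first)}")
```

With a static master, a crash of the master leaves the resource running nowhere; with fencing, the resource ends up on exactly one node in both cases. The sequential `order` is a simplification: in a real race both nodes may try to fence at nearly the same time.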
> 
> Thanks,
> 
> Dejan
> 
> > 
> > Thanks
> > 
> >  
> > -----Original Message-----
> > From: linux-ha-bounces at lists.linux-ha.org
> > [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> > Sent: Wednesday, 17 June 2009 14:01
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] General problems understanding split brain (quorum)
> > 
> > Hi,
> > 
> > On Wed, Jun 17, 2009 at 12:55:41PM +0200, Ehlers, Kolja wrote:
> > > Hello everybody,
> > > 
> > > I am having problems understanding split-brain situations. If I
> > > understand correctly, when a split brain happens, the larger
> > > cluster fragment has quorum, and its members can decide to
> > > fence off resources or to stonith the cluster members that
> > > are not seen.
> > > 
> > > I have read that it is not sane to use a 2 node cluster, because in 
> > > split brain situations no one has quorum and the no-quorum-policy 
> > > decides how to deal with this.
> > 
> > There is no quorum in two-node clusters, so you set the policy to "ignore". You can use fencing to effectively replace it.
> > 
> > > But what if I
> > > have a 3-node cluster and the switch carrying the heartbeat
> > > between those members dies? Then I will again have three separate
> > > clusters, each consisting of only one node, none of which has quorum.
> > 
> > Right. Make sure that your switch doesn't die. Or use more than one switch.
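The majority rule behind these answers can be sketched in Python, assuming one vote per node (real quorum layers may weight votes differently; `has_quorum` and `quorate_partitions` are hypothetical names). A partition has quorum only if it holds strictly more than half of all votes, so neither side of a 1+1 split qualifies, and no node does when a 3-node cluster falls apart into three single nodes.

```python
from typing import List


def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """Majority quorum with one vote per node: strictly more than half."""
    return 2 * partition_size > cluster_size


def quorate_partitions(partition_sizes: List[int]) -> List[int]:
    """Return the sizes of the partitions that keep quorum after a split."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


print(quorate_partitions([2, 1]))     # 3 nodes, one isolated -> [2]
print(quorate_partitions([1, 1]))     # 2 nodes, link lost -> []
print(quorate_partitions([1, 1, 1]))  # 3 nodes, switch died -> []
```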
> > 
> > > When a split brain occurs, wouldn't the best action be to move
> > > all resources to the DC and have the other node(s) shut down? Is
> > > it not possible to configure this?
> > 
> > No.
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks for your help
> > > 
> > > Kolja
> > > 
> > > Managing Directors: Dr. Michael Fischer, Reinhard Eisebitt; Amtsgericht
> > > Köln HRB 32356
> > > Tax No.: 217/5717/0536
> > > VAT ID No.: DE 204051920
> > > --
> > > This email transmission and any documents, files or previous email 
> > > messages attached to it may contain information that is confidential 
> > > or legally privileged. If you are not the intended recipient or a 
> > > person responsible for delivering this transmission to the intended 
> > > recipient, you are hereby notified that any disclosure, copying, 
> > > printing, distribution or use of this transmission is strictly 
> > > prohibited. If you have received this transmission in error, please 
> > > immediately notify the sender by telephone or return email and 
> > > delete the original transmission and its attachments without reading or saving in any manner.
> > > 
> > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems


