[Linux-HA] General problems understanding split brain (quorum)

Ehlers, Kolja ehlers at clinresearch.com
Tue Jul 14 03:50:30 MDT 2009


Hello,

One more thing I was wondering about while reading the mailing list and the stonith documents. I understand that it is crucial to
shut down broken nodes when the cluster uses mounted file systems. But if I am not using those, isn't it quite bad to kill a node
hard when it only runs "light-weight" resources like a virtual IP? Just switching the server off could itself cause data loss,
couldn't it?

Thanks

-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
Sent: Thursday, 9 July 2009 12:57
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] General problems understanding split brain (quorum)

Hi,

On Wed, Jul 08, 2009 at 02:58:05PM +0200, Ehlers, Kolja wrote:
> Hello again,
> 
> I was trying to set up stonith using our APC Smart UPS 1000 with
> Network Management Card (see my other mail), but now I have realized
> that those devices are not accessible through the apcsmart plugin (not
> over the LAN) and that apcmaster is not compatible with those APC
> UPSes. Using the serial cable I can only connect one node to each APC,
> so that does not help either.
> But right now I am again very confused about stonith, and I hope
> someone can shed some light on it. Let me try to explain what I do
> not understand.
> 
> If I configure a stonith device which sends shutdown commands to
> node(s) in split-brain situations, how does this make my environment
> any safer than it is without it? All my nodes have 2 NICs (192.168.0.x
> and 10.0.0.x). My heartbeat communication runs through both networks.
> If I had such power management hardware, we would hook it up to the
> 192.168.0.x LAN. Now, in a situation where one node is no longer seen
> by the others, it is no longer on the LAN either, so the power
> management hardware cannot shut it down. But that node cannot cause
> any damage either, even if it grabs my only critical resource (a
> virtual IP).
> 
> My point is that if my heartbeats are sent over the LAN, then even if
> a split brain happens it cannot cause anything dangerous, since that
> node is no longer reachable on the LAN. I could configure the ssh
> stonith device to shut the other node down through the 10.0.0.x
> interface and everything would be covered.

Note that fencing (stonith) is used in other situations too, not only for split brain. However, using one of the ssh stonith
devices is dangerous.
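
To make the rule discussed further down in this thread concrete (a node does not start resources before it has fenced the other node), here is a minimal sketch in Python. It is not Pacemaker code: `take_over`, `power_switch_reachable` and `ssh_to_unreachable_peer` are hypothetical names made up for illustration. The ssh case assumes one commonly cited weakness of ssh-based fencing, namely that it needs the target node to be up and reachable.

```python
from typing import Callable, List


def take_over(resources: List[str], fence_peer: Callable[[], bool]) -> List[str]:
    """Return the resources this node may start after losing contact with its peer.

    fence_peer is a hypothetical callback that returns True only when the
    peer is confirmed to be powered off or reset.
    """
    if not fence_peer():
        # Fencing failed or could not be confirmed: the peer may still be
        # alive and holding the resources, so start nothing.
        return []
    # The peer is known to be dead, so taking over is safe.
    return list(resources)


def power_switch_reachable() -> bool:
    # Toy stand-in for a power switch that answered and cut the power.
    return True


def ssh_to_unreachable_peer() -> bool:
    # Toy stand-in for an ssh-based device (assumption): if the peer is
    # hung or its network is down, the shutdown command never arrives
    # and nothing can be confirmed.
    return False


print(take_over(["virtual-ip"], power_switch_reachable))   # ['virtual-ip']
print(take_over(["virtual-ip"], ssh_to_unreachable_peer))  # []
```

In this sketch an unconfirmed fencing attempt means the surviving node starts nothing, which is why the fencing device has to work independently of the state of the node it is supposed to kill.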

> I guess I am missing something here. Please help me out.

Perhaps have a look at
http://clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf

Thanks,

Dejan

> Thanks
> 
> Kolja
> 
> 
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dejan
> Muhamedagic
> Sent: Wednesday, 17 June 2009 15:34
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] General problems understanding split brain
> (quorum)
> 
> Hi,
> 
> On Wed, Jun 17, 2009 at 02:39:54PM +0200, Ehlers, Kolja wrote:
> > Thanks for the fast reply. 
> > 
> > If I set the no-quorum-policy to ignore, my 2 critical resources, 2
> > virtual IP addresses, are bound on each node. Of course I don't want
> > this to happen. You say I must use a fencing technique/device to do
> > what? To stonith the other node(s)?
> > Exactly how do I protect my virtual IPs from being bound by more
> > than one node if I do not have a configurable power supply?
> 
> The only way is to use stonith. That is going to protect your
> resources, because in case of split-brain one node won't start
> resources before it fences (i.e. kills) the other node. Besides, and
> that has been discussed often, it is highly recommended to configure
> stonith. And there is really no excuse for not doing that, unless it
> is a stretch cluster.
> 
> > And again all this theory only works if a majority count subcluster 
> > has quorum, right?
> 
> See above.
> 
> > So there is nothing one can do about a split brain in a two-node
> > cluster? I don't see how this is an advantage over just defining one
> > node as the master node, which takes over all resources during a
> > split brain while all other nodes stop running critical resources.
> 
> What if your master node is really down? How do you know it is a split brain?
> 
> Thanks,
> 
> Dejan
> 
> > 
> > Thanks
> > 
> >  
> > -----Original Message-----
> > From: linux-ha-bounces at lists.linux-ha.org
> > [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dejan
> > Muhamedagic
> > Sent: Wednesday, 17 June 2009 14:01
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] General problems understanding split brain
> > (quorum)
> > 
> > Hi,
> > 
> > On Wed, Jun 17, 2009 at 12:55:41PM +0200, Ehlers, Kolja wrote:
> > > Hello everybody,
> > > 
> > > I am having problems understanding split brain situations. If I
> > > understand correctly, when a split brain situation happens the
> > > larger cluster fragment has quorum, and these cluster members can
> > > decide to fence off resources or to stonith the cluster members
> > > which are not seen.
> > > 
> > > I have read that it is not sane to use a 2 node cluster, because 
> > > in split brain situations no one has quorum and the 
> > > no-quorum-policy decides how to deal with this.
> > 
> > There is no quorum in two-node clusters, so you set the policy to "ignore". You can use fencing to effectively replace it.
> > 
> > > But what if I
> > > have a 3 node cluster and the switch delivering the heartbeat 
> > > between those members dies. Then again I will have three separate 
> > > clusters each consisting of only 1 node and none having quorum.
> > 
> > Right. Make sure that your switch doesn't die. Or use more than one switch.
> > 
> > > When I think of a split brain, would the best action not be to
> > > move all resources to the DC and have the other node(s) shut
> > > down? Is this not possible to configure?
> > 
> > No.
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks for your help
> > > 
> > > Kolja
> > > 
> > > Management: Dr. Michael Fischer, Reinhard Eisebitt
> > > Amtsgericht Köln HRB 32356
> > > Tax no.: 217/5717/0536
> > > VAT ID no.: DE 204051920
> > > --
> > > This email transmission and any documents, files or previous email 
> > > messages attached to it may contain information that is 
> > > confidential or legally privileged. If you are not the intended 
> > > recipient or a person responsible for delivering this transmission 
> > > to the intended recipient, you are hereby notified that any 
> > > disclosure, copying, printing, distribution or use of this 
> > > transmission is strictly prohibited. If you have received this 
> > > transmission in error, please immediately notify the sender by 
> > > telephone or return email and delete the original transmission and its attachments without reading or saving in any manner.
> > > 
> > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems



