[Linux-HA] Is the following logic by design for the Split-Brain Case ? If yes - can i disable it ?

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Sep 6 07:54:10 MDT 2007


On Thu, Sep 06, 2007 at 04:47:46AM -0700, Harakiri wrote:
> Sorry, I haven't been able to reply yet.
> 
> We are using heartbeat-2 v. 2.0.7-2, which is the
> stable version from Debian etch.
> 
> As far as I can see, your current release is 2.1.2,
> which only differs in the minor number.
> 
> Did you fix any issues between 2.0.7-2 and 2.1.2 regarding
> the split-brain case?

Perhaps, but probably not. The membership layer hasn't seen many
changes in the last couple of years. However, as I wrote
yesterday in another post, split brain is something best
avoided.

> Given the information you have now, would you say
> that it's definitely a bug in heartbeat if the logs, as
> you said, look fine?

Yes, it's definitely a bug somewhere. And, I'd definitely
recommend upgrading.

Thanks,

Dejan

> Thanks
> 
> --- Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> 
> > Hi,
> > 
> > On Fri, Aug 31, 2007 at 09:05:54AM -0700, Harakiri wrote:
> > > Hello,
> > > 
> > > > It is most probably a bug. The cluster should be able to
> > > > recover from split brain. Please post the logs.
> > > > 
> > > > Dejan
> > > 
> > > Attached to this message are the log files.
> > > 
> > > server1_network_down.txt - the log of server1 when the
> > > network went down
> > > 
> > > server2_network_down.txt - the log of server2 when the
> > > network went down
> > > 
> > > server1_network_restored.txt - the log of server1 when the
> > > network has been restored
> > > 
> > > server2_network_restored.txt - the log of server2 when the
> > > network has been restored
> > > 
> > > resource_my_service = the service which has been
> > > configured for heartbeat
> > 
> > I read the logs and everything there looks fine. I don't know
> > why crm_mon shows the nodes as offline on that one node. In the
> > logs, the nodes claimed to have voted a DC:
> > 
> > Aug 25 04:35:53 server1 crmd: [24516]: info: update_dc:utils.c Set DC to server1 (1.0.6)
> > Aug 25 04:35:55 server2 crmd: [19821]: info: update_dc:utils.c Set DC to server1 (1.0.6)
> > 
> > which they wouldn't do unless they had established the current
> > membership.
> > 
> > BTW, this seems to be a somewhat older version of Heartbeat.
> > Perhaps you can upgrade to the latest stable (2.1.2).
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks for your help
> > > 
> > > 
> > > --- Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > On Fri, Aug 31, 2007 at 07:36:34AM -0700, Harakiri wrote:
> > > > > Hello List,
> > > > > 
> > > > > suppose I have a cluster with 2 members, let's call them
> > > > > server1 and server2.
> > > > > 
> > > > > Before a network outage, a service configured for heartbeat
> > > > > is running on server2 only, and crm_mon shows on both
> > > > > servers that both nodes are online.
> > > > > 
> > > > > Now, after the network outage, server1 starts up the same
> > > > > service as server2. This makes sense, as it is the expected
> > > > > behaviour of a failover.
> > > > > 
> > > > > During this time the service is running on both servers
> > > > > because they do not "see each other".
> > > > > 
> > > > > After a few hours, the network is restored. server1 sees
> > > > > that server2 is already (or still) running the service in
> > > > > question, so it disables the service.
> > > > > 
> > > > > server1 shows both nodes online and that the service is
> > > > > running on server2.
> > > > > 
> > > > > However, on server2 both nodes show as offline, but the
> > > > > service in question is still running and managed by
> > > > > heartbeat.
> > > > > 
> > > > > Is this the expected behaviour for a split-brain situation?
> > > > > I.e. do not activate the node (server2) after a split brain,
> > > > > to be sure not to have an inconsistency?
> > > > > 
> > > > > For the record, after restarting heartbeat on server2,
> > > > > crm_mon shows both nodes as online.
> > > > > 
> > > > > If this is the expected behaviour, can I disable it? The
> > > > > service in question can run after a split brain: no harm
> > > > > will be done, and no inconsistency can exist.
> > > > > 
> > > > > Or is it a bug that after a network outage and restoration
> > > > > of the network, server2 shows both nodes (itself and
> > > > > server1) as offline, and only a restart repairs it to the
> > > > > online status?
> > > > 
> > > > It is most probably a bug. The cluster should be able to
> > > > recover from split brain. Please post the logs.
> > > > 
> > > > Dejan
> > > > 
> > > > > Thank you for your input
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > Linux-HA mailing list
> > > > > Linux-HA at lists.linux-ha.org
> > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > See also: http://linux-ha.org/ReportingProblems
> > 
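
The check Dejan did by eye on the quoted logs (both nodes naming the same DC after the network came back) can also be done mechanically. The following is only a minimal Python sketch: it assumes the crmd log lines look exactly like the two "Set DC to" lines quoted in this thread, and the names DC_LINE, last_dc_per_node and nodes_agree are made up for the illustration. It says nothing about why crm_mon on server2 showed the nodes as offline.

```python
import re

# Matches crmd lines of the form quoted in this thread, e.g.
#   Aug 25 04:35:53 server1 crmd: [24516]: info: update_dc:utils.c Set DC to server1 (1.0.6)
# The format is an assumption based on those two lines only.
DC_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<node>\S+)\s+crmd:.*"
    r"update_dc:\S+\s+Set DC to (?P<dc>\S+)"
)


def last_dc_per_node(lines):
    """Return {reporting_node: dc_name}, keeping the last "Set DC" line per node."""
    result = {}
    for line in lines:
        match = DC_LINE.match(line.strip())
        if match:
            result[match.group("node")] = match.group("dc")
    return result


def nodes_agree(dc_by_node):
    """True if at least one node reported a DC and all nodes name the same one."""
    return len(dc_by_node) > 0 and len(set(dc_by_node.values())) == 1


# The two log lines quoted in the thread.
sample = [
    "Aug 25 04:35:53 server1 crmd: [24516]: info: update_dc:utils.c Set DC to server1 (1.0.6)",
    "Aug 25 04:35:55 server2 crmd: [19821]: info: update_dc:utils.c Set DC to server1 (1.0.6)",
]

dc_by_node = last_dc_per_node(sample)
print(dc_by_node, nodes_agree(dc_by_node))
```

For the two quoted lines, both server1 and server2 report server1 as the DC, so the nodes agree. To run it against the attached files, the lines of server1_network_restored.txt and server2_network_restored.txt could be read and concatenated before calling last_dc_per_node, assuming those files use the same line format.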


