[Linux-HA] Is the following logic by design for the Split-Brain Case ? If yes - can i disable it ?

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Sep 3 08:18:53 MDT 2007


Hi,

On Fri, Aug 31, 2007 at 09:05:54AM -0700, Harakiri wrote:
> Hello,
> 
> > It is most probably a bug. The cluster should be able to recover
> > from split brain. Please post the logs.
> > 
> > Dejan
> 
> attached to this message are the log files.
> 
> server1_network_down.txt - the log of server1 when the
> network went down
> 
> server2_network_down.txt - the log of server2 when the
> network went down
> 
> server1_network_restored.txt - the log of server1 when
> the network has been restored
> 
> server2_network_restored.txt - the log of server2 when
> the network has been restored
> 
> resource_my_service = the service which has been
> configured for heartbeat

I read the logs and everything there looks fine. I don't know why
crm_mon shows the nodes as offline on that one node. In the logs,
both nodes report having elected a DC:

Aug 25 04:35:53 server1 crmd: [24516]: info: update_dc:utils.c Set DC to server1 (1.0.6)
Aug 25 04:35:55 server2 crmd: [19821]: info: update_dc:utils.c Set DC to server1 (1.0.6)

which they wouldn't do unless they had established the current membership.
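
If you want to repeat this check on the full log files, here is a minimal
Python sketch. It is only an illustration, not a Heartbeat tool: the regular
expression is derived solely from the two crmd lines quoted above, and the
function names (last_dc_per_node, nodes_agree_on_dc) are made up for this
example. Other Heartbeat versions may format these lines differently.

```python
import re

# Assumption: "Set DC" lines look like the two crmd lines quoted above
# (syslog timestamp, reporting host, "crmd:", ..., "Set DC to <name>").
DC_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<node>\S+)\s+crmd:.*\bSet DC to (?P<dc>\S+)"
)

def last_dc_per_node(lines):
    """Return {reporting_node: dc_name}, keeping the last 'Set DC' line per node."""
    seen = {}
    for line in lines:
        m = DC_LINE.search(line)
        if m:
            seen[m.group("node")] = m.group("dc")
    return seen

def nodes_agree_on_dc(lines):
    """True if at least one node reported a DC and all reporting nodes name the same one."""
    dcs = set(last_dc_per_node(lines).values())
    return len(dcs) == 1

# The two log lines from this thread.
sample = [
    "Aug 25 04:35:53 server1 crmd: [24516]: info: update_dc:utils.c Set DC to server1 (1.0.6)",
    "Aug 25 04:35:55 server2 crmd: [19821]: info: update_dc:utils.c Set DC to server1 (1.0.6)",
]
print(last_dc_per_node(sample))   # {'server1': 'server1', 'server2': 'server1'}
print(nodes_agree_on_dc(sample))  # True
```

To use it on real logs, pass the concatenated lines of both nodes' log
files instead of the sample list.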

BTW, this seems to be a somewhat older version of Heartbeat. Perhaps
you can upgrade to the latest stable release (2.1.2).

Thanks,

Dejan

> Thanks for your help
> 
> 
> --- Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> 
> > Hi,
> > 
> > On Fri, Aug 31, 2007 at 07:36:34AM -0700, Harakiri wrote:
> > > Hello List,
> > > 
> > > suppose I have a cluster with 2 members, let's call
> > > them server1 and server2.
> > > 
> > > Before a network outage, a service configured for
> > > heartbeat is running on server2 only, and crm_mon shows
> > > that both nodes are online on both servers.
> > > 
> > > Now after the network outage, server1 starts up the
> > > same service as server2 - this makes sense, as it is the
> > > expected behaviour of a failover.
> > > 
> > > During this time the service is running on both
> > > servers because they do not "see each other".
> > > 
> > > After a few hours, the network is restored - server1
> > > sees that server2 is already (or still) running the
> > > service in question - so it disables the service.
> > > 
> > > server1 shows both nodes online and that the service
> > > is running on server2.
> > > 
> > > However, on server2 both nodes show as offline but the
> > > service in question is still running and managed by
> > > heartbeat.
> > > 
> > > Is this the expected behaviour for a split-brain
> > > situation? I.e. do not activate the node (server2)
> > > after a split brain, to be sure not to have
> > > inconsistency?
> > > 
> > > For the record, after restarting heartbeat on server2,
> > > crm_mon shows both nodes as online.
> > > 
> > > If this is the expected behaviour, can I disable it?
> > > Because the service in question can run after a split
> > > brain - no harm will be done, no inconsistency can
> > > exist.
> > > 
> > > Or is it a bug that, after a network outage and
> > > restoration of the network, server2 shows both nodes
> > > (itself, and server1) as offline and only a restart
> > > repairs it to the online status?
> > 
> > It is most probably a bug. The cluster should be able to recover
> > from split brain. Please post the logs.
> > 
> > Dejan
> > 
> > > Thank you for your input
> > > 
> > > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems


