[Linux-HA] Forcing node to rejoin cluster after Split-Brain

Max Hofer max.hofer at apus.co.at
Mon Nov 6 02:24:16 MST 2006


Heartbeat version 2.0.7.

On Wednesday 01 November 2006 15:56, Andrew Beekhof wrote:
> On 10/31/06, Max Hofer <max.hofer at apus.co.at> wrote:
> > Ok, it should not happen, but sometimes it does (especially
> > during testing periods). A split-brain occurred on one of my 4 test nodes.
> > (The network went down for a couple of minutes and the dead-time kicked in.
> > Somehow the three other nodes managed to rejoin (or maybe never left
> > the group), but management1 failed to rejoin the cluster.)
> >
> > crm_mon shows me:
> >
> > Node: routing2 (c5e1bda1-b00b-42e8-89e1-702b2d715c76): online
> > Node: routing1 (a98d68fb-807a-4b22-af3c-82e60064aa95): online
> > Node: management2 (a69b64a2-4de8-4d4a-b4ba-8107136eec4b): online
> > Node: management1 (0044b88e-c148-4269-9d39-449324bf65b8): OFFLINE
> >
> > heartbeat is running on management1 and the ha-log on that machine shows me:
> >
> > crmd[1185]: 2006/10/31_13:22:42 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> > crmd[1185]: 2006/10/31_13:22:46 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> > crmd[1185]: 2006/10/31_13:22:48 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> > crmd[1185]: 2006/10/31_13:22:48 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> > crmd[1185]: 2006/10/31_13:22:51 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> > crmd[1185]: 2006/10/31_13:22:53 WARN: do_dc_join_finalize:join_dc.c join-4: We are still in a transition.  Delaying until the TE completes.
> >
> > Is there a "simple way" to tell node "management1" to rejoin the cluster without
> > shutting down heartbeat (rebooting) on that machine?
> 
> version?
> 
> it will probably also depend on why the node is not part of the cluster
> 

-- 
Max Hofer
APUS Software G.m.b.H.
A-8074 Raaba, Bahnhofstraße 1/1
T| +43 316 401629 11
F| +43 316 401629 9
W| www.apus.co.at
E| max.hofer at apus.co.at

