[Linux-HA] Split-brain after removing network plug, how to resolve?
Bala
balaknathan at gmail.com
Fri Jul 18 04:19:35 MDT 2008
I had a similar problem and solved it by doing the following.
1) I wrote a small shell script which is started as a service (when
the system boots) and spawns another shell script which runs as a
background process and checks whether it can reach the passive node
every 5 seconds (or at a defined interval).
2) If the ping to the passive node fails, it checks whether it can
ping the ping node. If this fails as well, the general assumption
is that the active node is disconnected from the network.
3) In such a case, it stops its own heartbeat service.
4) Since the other node has already taken over from the active node, a
split-brain problem does not occur, i.e.
4.1) When the primary node is disconnected from the network, the
passive node becomes active.
4.2) The primary node stops its heartbeat service, as it is not able
to reach either the passive node or the ping node.
4.3) When the network connection is reestablished, no split-brain
problem occurs, as the cluster IP is already allocated to the other
node.
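
A rough sketch of such a watchdog is below. It is not my exact script,
only an illustration of steps 1) to 3). The addresses are borrowed from
the ha.cf quoted further down and are assumptions, as are the function
names, the "run" argument, the ping options (Linux iputils) and the
init script path; adjust all of them to your setup.

```sh
#!/bin/sh
# Hypothetical watchdog sketch: stop heartbeat when this node looks isolated.
# PEER, PING_NODE and INTERVAL are assumed values; override via environment.
PEER="${PEER:-10.65.68.18}"           # the other cluster node (assumption)
PING_NODE="${PING_NODE:-10.65.68.1}"  # an independent ping node (assumption)
INTERVAL="${INTERVAL:-5}"             # seconds between checks

# Succeeds if a single ping to the given host is answered within 2 seconds.
reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Succeeds only if BOTH the peer and the ping node are unreachable.
isolated() {
    ! reachable "$PEER" && ! reachable "$PING_NODE"
}

# Assumed init script location; it may differ on your distribution.
stop_heartbeat() {
    /etc/init.d/heartbeat stop
}

# Check every INTERVAL seconds; stop heartbeat once and exit when isolated.
watch_loop() {
    while :; do
        if isolated; then
            stop_heartbeat
            break
        fi
        sleep "$INTERVAL"
    done
}

# Only start looping when called with the argument "run".
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

The service script started at boot would then launch it in the
background, for example with "/path/to/watchdog.sh run &" (the path is
a placeholder). Note that this sketch exits after stopping heartbeat,
so heartbeat and the watchdog have to be started again by hand once
the network is back.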
This works for me. You can try this approach.
Bala
On Fri, Jul 18, 2008 at 1:40 PM, Yves Glodt <y.glodt at sitasoftware.com> wrote:
> Hello,
>
> I guess the problem I have has to be quite common, but still I failed to find
> a solution... I use heartbeat-2/drbd8.0.11 from stock Ubuntu Hardy 32bit.
>
> The situation is the following:
>
> I have two boxes in an active/passive setup. As long as I trigger a takeover
> manually with /usr/lib/heartbeat/hb_takeover it works well.
>
> When I pull the power-plug, the resource-migration works well as well.
>
> But, the problem which I face occurs as soon as I pull the network-plug of any
> of the 2 boxes, be it the active or the passive.
>
> When I pull the plug from the passive, it does not detect the active anymore,
> and it becomes active itself. When I reconnect, we have split-brain which we
> need to resolve manually.
>
> When I pull the plug from the active, it stays active, albeit disconnected,
> and the passive which is still in the network becomes active. Upon
> reconnection of the former active, we have split-brain again.
>
> I tried to mess with "ping", "ping_group", and ipfail, and it seems that
> ipfail correctly detects that the ping_group is dead, but, it does not trigger
> any action in heartbeat. IMHO it should detect the failure, and go standby in
> any case, and never keep the active state, nor change from passive to active.
>
> Any pointers on how to solve this...? :-)
>
> Here is my ha.cf:
> keepalive 1
> deadtime 10
> warntime 5
> initdead 60
> udpport 694
> ucast eth0 10.65.68.18
> #ping_group pinggroup 10.65.68.1 10.65.68.3 10.65.68.6
> ping 10.65.68.1 10.65.68.3 10.65.68.6
> auto_failback off
> node ubuntu1
> node ubuntu2
> debug 1
> respawn hacluster /usr/lib/heartbeat/ipfail
>
>
> And my haresources as well:
> ubuntu1 \
> drbddisk::r0 \
> Filesystem::/dev/drbd0::/drbd01::ext3 \
> Delay::1::0 \
> samba \
> firebird1.5-super \
> IPaddr2::10.65.68.19/24/eth0:0
>
>
> Best regards,
> Yves