[Linux-HA] DRBD - split brain - and HA is happily migrating

Thomas Glanzmann thomas at glanzmann.de
Tue Jan 1 15:42:58 MST 2008


Hello,

> I have drbd (newest version; same goes for heartbeat) running as a
> master/slave resource on the latest heartbeat release and had the
> following problem. I had a split brain situation and heartbeat still
> allowed a migration from one node to the other. I wonder how that is
> possible. How do other people handle this situation? My setup so far is
> the following:

I am not 100 percent sure, but I think this is what happened. When I had
monitor operations for the Master and Slave roles configured like this:

                <master_slave id="ms-drbd0">
                        <meta_attributes id="ma-ms-drbd0">
                                <attributes>
                                        <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
                                        <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
                                        <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
                                        <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
                                        <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
                                        <nvpair id="ma-ms-drbd0-6" name="globally_unique" value="false"/>
                                </attributes>
                        </meta_attributes>
                        <primitive id="drbd0" class="ocf" provider="heartbeat" type="drbd">
                                <instance_attributes id="ia-drbd0">
                                        <attributes>
                                                <nvpair id="ia-drbd0-1" name="drbd_resource" value="postgres"/>
                                        </attributes>
                                </instance_attributes>
                                <operations>
                                        <op id="op-ms-drbd0-1" name="monitor" interval="5s" timeout="5s" start_delay="30s" role="Master"/>
                                        <op id="op-ms-drbd0-2" name="monitor" interval="6s" timeout="5s" start_delay="30s" role="Slave"/>
                                </operations>
                        </primitive>
                </master_slave>

the master refused to start. When I dropped the monitor operations (I
don't know whether that was the reason), I could start drbd on an outdated
secondary and an outdated primary. However, I have just re-enabled the
monitor operations and am now able to shut down my entire cluster and
bring it up again without any trouble. I would be really glad if someone
with experience in this setup could enlighten me.
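
For anyone who wants to check the DRBD side independently of heartbeat,
here is a minimal sketch that parses the per-device status lines from
/proc/drbd and flags devices that are not connected to their peer. The
function names (parse_proc_drbd, looks_disconnected) are made up for this
example and are not part of DRBD or heartbeat. I am assuming the usual
"cs:... st:.../... ds:.../..." line layout (later DRBD releases label the
role field "ro:" instead of "st:"). A StandAlone or WFConnection state is
only a hint: it also appears when the peer is simply down, so it does not
prove a split brain.

```python
import re

# One device line of /proc/drbd looks roughly like:
#  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
# The role field is "st:" in DRBD 8.0/8.2 and "ro:" in later releases.
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"(?:\s+(?:st|ro):(?P<local_role>[^/\s]+)/(?P<peer_role>\S+))?"
    r"(?:\s+ds:(?P<local_disk>[^/\s]+)/(?P<peer_disk>\S+))?"
)


def parse_proc_drbd(text):
    """Return one dict per DRBD device found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            devices.append(match.groupdict())
    return devices


def looks_disconnected(device):
    """True if the device is not talking to its peer (a hint, not proof)."""
    return device["cs"] in ("StandAlone", "WFConnection")


def read_proc_drbd(path="/proc/drbd"):
    """Read and parse the status file on a node that has DRBD loaded."""
    with open(path) as handle:
        return parse_proc_drbd(handle.read())
```

On a cluster node, something like
[d for d in read_proc_drbd() if looks_disconnected(d)] would list the
devices worth a closer look before allowing a promotion or migration.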

        Thomas

