[Linux-HA] Default-resource-stickiness of infinity with DRBD not keeping Primary stuck
Daniel Stickney
dstickney at pronto.com
Sun Feb 3 22:34:06 MST 2008
Thanks so far for the feedback on my question. I have been unsuccessful
so far in finding a way to configure the master DRBD resource to stick
where it is unless there is a failure forcing it to move. We have it
working awesome with auto failback, but we don't want this
behavior. I want to make sure I am not trying to configure something
that is impossible at this time, so I should ask:
Is setting a stickiness on the master drbd resource even possible? If
so, how? A master_slave resource does not behave like a single-instance
resource such as an IP, so I am not sure the logic applies as I expect
it to. If anyone
on this list has DRBD in a Heartbeat V2 CRM mode configuration with the
master DRBD resource successfully configured to stick as master on a
node it fails over onto, even when the previous master node comes back
online, can you please reply to me with it? I would be very
appreciative. My cib.xml and ha.cf are below in my original email. I
have tried dozens of permutations of this config with many
default-resource-stickiness values, and placing resource-stickiness in
the master_slave resource in instance_attributes and meta_attributes,
and in the opening <master_slave> tag. Unfortunately, none of these
attempts has worked. If the current DRBD/Heartbeat v2 versions simply do not
support my preferred configuration logic, it would be great to know now.
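
For reference, one of the placements described above (stickiness as a meta attribute of the master_slave resource) would look roughly like the sketch below. The nvpair id "ma-ms-drbd0-8" is hypothetical, and the attribute name is spelled as it is written in this message. The other meta attributes in this cib.xml use underscores (clone_max, master_max), so the spelling resource_stickiness may be what this Heartbeat version expects; that is an unverified assumption. This fragment only illustrates the attempt and is not a known fix for the failback.

```xml
<master_slave id="ms-drbd0">
  <meta_attributes id="ma-ms-drbd0">
    <attributes>
      <!-- existing nvpairs (clone_max, master_max, notify, ...) stay unchanged -->
      <!-- hypothetical id; attribute name as written in this message -->
      <nvpair id="ma-ms-drbd0-8" name="resource-stickiness" value="INFINITY"/>
    </attributes>
  </meta_attributes>
  <!-- the DRBD primitive from the original cib.xml goes here, unchanged -->
</master_slave>
```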
Thanks for your time!
Daniel
Daniel Stickney wrote:
> Hello everyone,
>
> Our setup: CentOS 5 (kernel 2.6.18-53), Heartbeat
> heartbeat-2.1.2-3.el5.centos, DRBD drbd-8.0.6-1.el5.centos
>
> We are running into a problem with getting the master DRBD resource to
> stick on a node it has failed onto. We have a simple 2 node cluster
> for demonstration of the issue, halinux1 and halinux2, with a single
> DRBD resource. What we are seeing is halinux2 selected as the Master
> node for DRBD on heartbeat startup, halinux1 as the slave. When
> halinux2 is placed into standby, halinux1 is promoted to DRBD
> master as expected. When halinux2 is taken out of standby mode,
> halinux1 is demoted to secondary and halinux2 is promoted to master.
> We don't want this failback action. We want the DRBD master to stay on
> whatever node it is on unless there is a failure requiring it to move.
> We have default-resource-stickiness set to "infinity" in our cib.xml
> file. I repeated this experiment with a single IP address resource (no
> DRBD), and the stickiness of infinity worked exactly as expected: the
> IP stayed on whatever node it was on unless there was a failure (or
> standby mode) on the local node requiring the IP to move, so that was
> a positive confirmation that outside of our testing with DRBD, the
> stickiness of infinity works. We would very much appreciate
> suggestions on how we might go about resolving this issue.
>
> Here is the cib.xml file:
> ----------------------------------
> <cib generated="true" admin_epoch="0" have_quorum="true"
> ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" epoch="35"
> num_updates="1"
> cib-last-written="Tue Jan 29 12:36:17 2008" ccm_transition="2"
> dc_uuid="d2c440e4-9668-4a70-b7e2-de7f52834325">
> <configuration>
> <crm_config>
> <cluster_property_set id="cluster_defaults">
> <attributes>
> <nvpair name="default-resource-stickiness" id="stickiness"
> value="INFINITY"/>
> </attributes>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node uname="halinux2" type="normal"
> id="216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae">
> <instance_attributes
> id="nodes-216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae">
> <attributes>
> <nvpair name="standby"
> id="standby-216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae" value="false"/>
> </attributes>
> </instance_attributes>
> </node>
> <node uname="halinux1" type="normal"
> id="d2c440e4-9668-4a70-b7e2-de7f52834325">
> <instance_attributes
> id="nodes-d2c440e4-9668-4a70-b7e2-de7f52834325">
> <attributes>
> <nvpair name="standby"
> id="standby-d2c440e4-9668-4a70-b7e2-de7f52834325" value="false"/>
> </attributes>
> </instance_attributes>
> </node>
> </nodes>
> <resources>
> <master_slave id="ms-drbd0">
> <meta_attributes id="ma-ms-drbd0">
> <attributes>
> <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
> <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
> <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
> <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
> <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
> <nvpair id="ma-ms-drbd0-6" name="globally_unique"
> value="false"/>
> <nvpair id="ma-ms-drbd0-7" name="target_role"
> value="started"/>
> </attributes>
> </meta_attributes>
> <primitive id="DRBD" class="ocf" provider="heartbeat" type="drbd">
> <instance_attributes id="ia-DRBD">
> <attributes>
> <nvpair id="ia-DRBD-1" name="drbd_resource" value="mysql"/>
> </attributes>
> </instance_attributes>
> </primitive>
> </master_slave>
> </resources>
> <constraints/>
> </configuration>
> </cib>
> ----------------------------------
> =========================================================================
>
> Here is our ha.cf file:
> ----------------------------------
> use_logd yes
> udpport 695
> bcast eth0
> node halinux1
> node halinux2
> crm on
> ----------------------------------
> =========================================================================
>
> Here is a link to the /var/log/messages output on halinux1 starting
> from the time when halinux2 comes out of standby mode and the unwanted
> failback occurs: http://pastebin.com/m6e55f6b3
>
> Thank you in advance for your time,
> -Daniel
>
--
Daniel Stickney - Linux Systems Administrator
Email: dstickney at pronto.com