[Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

Peter Farrell peter.d.farrell at gmail.com
Tue Sep 11 10:07:14 MDT 2007


Thanks to both of you for your responses.
I've got my head round it a bit better now and have had a good fiddle
with all the crm* commands and options. Many of my troubles were due to
syntax (as in almost every niggling Unix issue I ever deal with!)

The quorum policy for a two-node cluster seems largely irrelevant in any case.
Heartbeat doesn't seem to care which no-quorum-policy value I set here.
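
For what it's worth, the two "quorum" items in the cib.xml quoted below
look like different things: have_quorum on the <cib> element appears to
be status written by the cluster, while no-quorum-policy is an nvpair
under crm_config. A minimal Python sketch, assuming a trimmed copy of
that file (CIB_SNIPPET and cluster_options are illustrative names of
mine, not part of Heartbeat):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cib.xml quoted later in this message.
CIB_SNIPPET = """
<cib have_quorum="true" epoch="5">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-symmetric-cluster"
                  name="symmetric-cluster" value="true"/>
          <nvpair id="cib-bootstrap-options-no-quorum-policy"
                  name="no-quorum-policy" value="stop"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""


def cluster_options(cib_xml):
    """Return the crm_config nvpairs as a {name: value} dict."""
    root = ET.fromstring(cib_xml)
    pairs = root.iterfind("./configuration/crm_config//nvpair")
    return {nv.get("name"): nv.get("value") for nv in pairs}


if __name__ == "__main__":
    root = ET.fromstring(CIB_SNIPPET)
    # Attribute on the top-level <cib> element (cluster status).
    print("have_quorum:", root.get("have_quorum"))
    # Configured option under crm_config (what the admin sets).
    print("no-quorum-policy:", cluster_options(CIB_SNIPPET)["no-quorum-policy"])
```

This only reads a saved copy; changing the live value is still a job
for cibadmin or crm_attribute.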

The only odd thing I didn't find anywhere grep'ing through the lists
and Google was the 'WARN: There is something wrong' message in the log
files.
If you recall, crm_mon reported ldirectord as FAILED (the monitor
action, rc=7), but ldirectord was in fact running.

Anyone have any ideas here?

Sep 11 15:29:42 dmz1 ldirectord[4298]: ldirectord for
/etc/ha.d/conf/ldirectord.cf is running with pid: 4265
Sep 11 15:29:42 dmz1 ldirectord[4298]: Exiting from ldirectord status
Sep 11 15:29:42 dmz1 lrmd: [3971]: WARN: There is something wrong: the
first line isn't read in. Maybe the heartbeat does not ouput string
correctly for status operation. Or the code (myself) is wrong.
Sep 11 15:29:42 dmz1 crmd: [3974]: info: process_lrm_event: LRM
operation ldirectord_3_monitor_120000 (call=11, rc=7) complete
Sep 11 15:29:42 dmz1 lrmd: [3971]: debug: RA output [] didn't match any pattern
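
One possible reading of the lrmd lines above: for a heartbeat-class
resource, lrmd seems to decide the result of the status operation by
matching what the script prints on stdout against some patterns. Here
it got an empty string ("RA output []"), which ends up as rc=7, i.e.
"not running" in OCF terms. The Python sketch below only illustrates
that idea; the pattern list and the status_rc function are assumptions
of mine, not lrmd's actual code.

```python
from fnmatch import fnmatchcase

# Assumed patterns for illustration; the real lrmd list may differ.
RUNNING_PATTERNS = ("*running*", "*OK*")

OCF_SUCCESS = 0       # resource is running
OCF_NOT_RUNNING = 7   # resource is cleanly stopped / not running


def status_rc(ra_stdout):
    """Map the first stdout line of a status script to a return code."""
    lines = ra_stdout.splitlines()
    first_line = lines[0] if lines else ""
    if any(fnmatchcase(first_line, pat) for pat in RUNNING_PATTERNS):
        return OCF_SUCCESS
    # Empty or unrecognised output is treated as "not running".
    return OCF_NOT_RUNNING


if __name__ == "__main__":
    print(status_rc(""))
    print(status_rc("ldirectord for /etc/ha.d/conf/ldirectord.cf "
                    "is running with pid: 4265"))
```

If that reading is right, a status script that sends its "is running"
message to syslog only, and prints nothing on stdout, would show up as
FAILED in crm_mon even while the daemon is up.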


Thanks again. (ldirectord.cf below)
-Peter Farrell

[root@dmz1 ha.d]# more conf/ldirectord.cf
# Global Directives
checktimeout=2
checkinterval=2
logfile="local7"

# heartbeat.example.com
virtual=212.140.130.37:80
        protocol=tcp
        scheduler=rr
        checktype=connect
        real=212.140.130.33:80 gate
        real=212.140.130.34:80 gate
virtual=212.140.130.38:80
        protocol=tcp
        scheduler=rr
        checktype=connect
        real=212.140.130.34:80 gate
        real=212.140.130.33:80 gate


On 11/09/2007, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Tue, Sep 11, 2007 at 01:34:51PM +0100, Peter Farrell wrote:
> > Hi all.
> >
> > I'm looking for clarification and/or direction. I've been reading
> > through the linux-ha.org site, watching Andrew's presentation in
> > Australia and experimenting with  R2 in anticipation of upgrading our
> > R1 setup.
> >
> > Versions:
> > heartbeat-stonith-2.1.2-3.el4.centos
> > heartbeat-pils-2.1.2-3.el4.centos
> > heartbeat-ldirectord-2.1.2-3.el4.centos
> > heartbeat-2.1.2-3.el4.centos
> >
> > I'm using 2 node clusters that monitor IPaddr among web servers.
> > Specifically - ldirectord is my only resource.
> >
> > The R1 setup I'm trying to test is:
> >
> > ha.cf:
> > crm yes
> > use_logd on
> > keepalive 2
> > deadtime 30
> > warntime 10
> > initdead 120
> > udpport 694
> > bcast   eth1
> > ucast eth1 10.0.0.1
> > auto_failback off
> > node    dmz1.example.com
> > node    dmz2.example.com
> >
> > haresources:
> > dmz1.example.com IPaddr::212.140.130.37 IPaddr::212.140.130.38
> > ldirectord::ldirectord.cf
> >
> >
> > My main 'issue' is creating / updating the cib.xml.
> > I tried various iterations with the python converter - but it's always
> > either blown away across reboots or missing parameters.
> >
> > I managed to use 'cibadmin' and add strings (from the xml file I
> > wanted to use) to the 'live' cib.xml. It works, after a fashion and is
> > being replicated between the two nodes.
> >
> > crm_mon shows me:
> > ============
> > Last updated: Tue Sep 11 13:01:34 2007
> > Current DC: dmz2.example.com (4ffd8d6f-adaa-4fdb-888e-dcf520cf2189)
> > 2 Nodes configured.
> > 1 Resources configured.
> > ============
> >
> > Node: dmz2.example.com (4ffd8d6f-adaa-4fdb-888e-dcf520cf2189): online
> > Node: dmz1.example.com (d9ffeb49-3151-48fc-a976-0edfb39494f9): online
> >
> > Resource Group: group_1
> >     IPaddr_212_140_130_37       (heartbeat::ocf:IPaddr):
> > Started dmz1.scarceskills.com
> >     IPaddr_212_140_130_38       (heartbeat::ocf:IPaddr):
> > Started dmz1.scarceskills.com
> >     ldirectord_3        (heartbeat:ldirectord): Started dmz1.example.com FAILED
> >
> > Failed actions:
> >     ldirectord_3_monitor_120000 (node=dmz1.example.com, call=11, rc=7): complete
> >
> >
> > ----------------
> > Questions:
> > ----------------
> > 1. Is there a way or 'procedure' for maintaining a standard cib.xml file?
> > I mean, the 'cibadmin' seems really clumsy to me, copying in strings,
> > etc. Is this the "normal" way to manage it? For example - if I wanted
> > to create a new environment at another site - do I `cat
> > /var/lib/heartbeat/crm/cib.xml > save_this.xml` and then create a new
> > 'blank' cluster and sort of feed the saved file in via the cibadmin
> > tool?
>
> There are a bunch of tools to deal with individual
> attributes/properties. They all start with crm_ and crm_resource
> is probably the most used one. Check these pages for some
> documentation:
>
> http://www.novell.com/documentation/sles10/heartbeat/index.html
> http://www.linux-ha.org/v2/AdminTools/
>
> > BTW: it took me a day and half to actually figure out that I was meant
> > to be using the cibadmin tool at all... I tried in vain to copy it
> > directly to the /var/lib path to no avail.
>
> That could work too, but you have to do it before starting the
> cluster and set the permissions too. cibadmin is preferred.
>
> > 2. The logs show ldirectord being restarted multiple times, ps shows
> > it running, and ifconfig -a shows me the correct virtual interfaces
> > have been configured. When I reboot either node - the other node picks
> > up as it's meant to. So I trust it's working (although I've not tested
> > sending request to the test page at the end of that address yet)
> >
> > What then do I make of the output from crm_mon?
> > ldirectord_3        (heartbeat:ldirectord): Started dmz1.example.com FAILED
> >
> > Failed actions:
> > ldirectord_3_monitor_120000 (node=dmz1.example.com, call=11, rc=7): complete
>
> The monitor action on that resource failed. The logs should
> hopefully say why.
>
> > 3. Do I need any additional statements in my ha.cf for this type of setup?
>
> No. You should also set up /etc/logd.cf.
>
> > 4. (Last one!) Although I can feed in the XML strings to construct the
> > file I want - I can't shake off the default 'quorum=true' bit in the
> > first line of the cib.xml:
> > <cib generated="true" admin_epoch="0" have_quorum="true"
> > ignore_dtd="false" num_peers="2" ccm_transition="10"
> > cib_feature_revision="1.3"
> > dc_uuid="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189" epoch="5"
> > num_updates="130" cib-last-written="Tue Sep 11 13:00:58 2007">
> >
> > I want:
> > name="symmetric-cluster" value="true"/>
> >            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> >
> > Does this matter? Can I safely ignore that first line? Or is there a
> > way to remove it?
>
> Don't get this one, but you probably want to keep the have_quorum
> at true. Otherwise, you can manage attributes using
> crm_attribute.
>
> Thanks.
>
> Dejan
>
> > Any pointers would be greatly appreciated.
> >
> > Best regards;
> >
> > -Peter Farrell
> > Cardiff, Wales - UK
> >
> >
> > cib.xml:
> > -----------
> > [root@dmz1 ha.d]# cat /var/lib/heartbeat/crm/cib.xml
> >  <cib generated="true" admin_epoch="0" have_quorum="true"
> > ignore_dtd="false" num_peers="2" ccm_transition="10"
> > cib_feature_revision="1.3"
> > dc_uuid="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189" epoch="5"
> > num_updates="130" cib-last-written="Tue Sep 11 13:00:58 2007">
> >    <configuration>
> >      <crm_config>
> >        <cluster_property_set id="cib-bootstrap-options">
> >          <attributes>
> >            <nvpair id="cib-bootstrap-options-symmetric-cluster"
> > name="symmetric-cluster" value="true"/>
> >            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> > name="no-quorum-policy" value="stop"/>
> >            <nvpair
> > id="cib-bootstrap-options-default-resource-stickiness"
> > name="default-resource-stickiness" value="0"/>
> >            <nvpair
> > id="cib-bootstrap-options-default-resource-failure-stickiness"
> > name="default-resource-failure-stickiness" value="0"/>
> >            <nvpair id="cib-bootstrap-options-stonith-enabled"
> > name="stonith-enabled" value="false"/>
> >            <nvpair id="cib-bootstrap-options-stonith-action"
> > name="stonith-action" value="reboot"/>
> >            <nvpair id="cib-bootstrap-options-stop-orphan-resources"
> > name="stop-orphan-resources" value="true"/>
> >            <nvpair id="cib-bootstrap-options-stop-orphan-actions"
> > name="stop-orphan-actions" value="true"/>
> >            <nvpair id="cib-bootstrap-options-remove-after-stop"
> > name="remove-after-stop" value="false"/>
> >            <nvpair id="cib-bootstrap-options-short-resource-names"
> > name="short-resource-names" value="true"/>
> >            <nvpair id="cib-bootstrap-options-transition-idle-timeout"
> > name="transition-idle-timeout" value="5min"/>
> >            <nvpair id="cib-bootstrap-options-default-action-timeout"
> > name="default-action-timeout" value="15s"/>
> >            <nvpair id="cib-bootstrap-options-is-managed-default"
> > name="is-managed-default" value="true"/>
> >          </attributes>
> >        </cluster_property_set>
> >      </crm_config>
> >      <nodes>
> >        <node id="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189"
> > uname="dmz2.example.com" type="normal"/>
> >        <node id="d9ffeb49-3151-48fc-a976-0edfb39494f9"
> > uname="dmz1.example.com" type="normal"/>
> >      </nodes>
> >      <resources>
> >        <group id="group_1">
> >          <primitive class="ocf" id="IPaddr_212_140_130_37"
> > provider="heartbeat" type="IPaddr">
> >            <operations>
> >              <op id="IPaddr_212_140_130_37_mon" interval="5s"
> > name="monitor" timeout="5s"/>
> >            </operations>
> >            <instance_attributes id="IPaddr_212_140_130_37_inst_attr">
> >              <attributes>
> >                <nvpair id="IPaddr_212_140_130_37_attr_0" name="ip"
> > value="212.140.130.37"/>
> >              </attributes>
> >            </instance_attributes>
> >          </primitive>
> >          <primitive class="ocf" id="IPaddr_212_140_130_38"
> > provider="heartbeat" type="IPaddr">
> >            <operations>
> >              <op id="IPaddr_212_140_130_38_mon" interval="5s"
> > name="monitor" timeout="5s"/>
> >            </operations>
> >            <instance_attributes id="IPaddr_212_140_130_38_inst_attr">
> >              <attributes>
> >                <nvpair id="IPaddr_212_140_130_38_attr_0" name="ip"
> > value="212.140.130.38"/>
> >              </attributes>
> >            </instance_attributes>
> >          </primitive>
> >          <primitive class="heartbeat" id="ldirectord_3"
> > provider="heartbeat" type="ldirectord">
> >            <operations>
> >              <op id="ldirectord_3_mon" interval="120s" name="monitor"
> > timeout="60s"/>
> >            </operations>
> >            <instance_attributes id="ldirectord_3_inst_attr">
> >              <attributes>
> >                <nvpair id="ldirectord_3_attr_1" name="1" value="ldirectord.cf"/>
> >              </attributes>
> >            </instance_attributes>
> >          </primitive>
> >        </group>
> >      </resources>
> >      <constraints>
> >        <rsc_location id="rsc_location_group_1" rsc="group_1">
> >          <rule id="prefered_location_group_1" score="100">
> >            <expression attribute="#uname"
> > id="prefered_location_group_1_expr" operation="eq"
> > value="dmz1.example.com"/>
> >          </rule>
> >        </rsc_location>
> >      </constraints>
> >    </configuration>
> >  </cib>
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems

