[Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Sep 12 04:38:45 MDT 2007


Hi,

On Tue, Sep 11, 2007 at 05:07:14PM +0100, Peter Farrell wrote:
> Thanks to both of you for your responses.
> I've got my head round it a bit better now and have had a good fiddle
> with all the crm* commands and options. Many of my troubles were due to
> syntax (as in almost every niggling Unix issue I ever deal with!)
> 
> The quorum policy for a two node cluster is largely irrelevant in any case.
> Heartbeat doesn't seem to care one way or the other, whatever options I set here.

Indeed. On a two node cluster there's always a quorum.

> The only odd thing I didn't find anywhere grep'ing through the lists
> and Google was the 'WARN: There is something wrong' message in the log
> files.
> If you recall, crm_mon said that 'ldirectord start' was failing -
> but it wasn't true.

Well, crm_mon just relays whatever the RA reported. If the RA is
buggy, then what it says may not reflect reality.

> Anyone have any ideas here?
> 
> Sep 11 15:29:42 dmz1 ldirectord[4298]: ldirectord for
> /etc/ha.d/conf/ldirectord.cf is running with pid: 4265
> Sep 11 15:29:42 dmz1 ldirectord[4298]: Exiting from ldirectord status
> Sep 11 15:29:42 dmz1 lrmd: [3971]: WARN: There is something wrong: the
> first line isn't read in. Maybe the heartbeat does not ouput string
> correctly for status operation. Or the code (myself) is wrong.

The RA did not print anything, though it was expected to.
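
If you want to see what the lrmd sees, you could run the RA by hand. A
rough sketch, assuming the heartbeat-class script lives in
/etc/ha.d/resource.d and takes the config file name as its argument:

  /etc/ha.d/resource.d/ldirectord ldirectord.cf status
  echo "exit code: $?"

The lrmd matches that output against a couple of known patterns, so an
empty status line would explain the warning above.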

> Sep 11 15:29:42 dmz1 crmd: [3974]: info: process_lrm_event: LRM
> operation ldirectord_3_monitor_120000 (call=11, rc=7) complete
> Sep 11 15:29:42 dmz1 lrmd: [3971]: debug: RA output [] didn't match any pattern
> 
> 
> Thanks again. (ldirectord.conf below)
> -Peter Farrell
> 
> [root at dmz1 ha.d]# more conf/ldirectord.cf
> # Global Directives
> checktimeout=2
> checkinterval=2
> logfile="local7"

Hmm, just a wild guess: perhaps you should remove the logfile
reference so that the RA prints its status to stdout/stderr. That
output is then captured by the lrmd and put into the logs. However,
I have absolutely no experience with ldirectord. Does anybody here
know about it?
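
Just to sketch what I mean in the global section of your ldirectord.cf
(untested, so treat it as a guess):

  # Global Directives
  checktimeout=2
  checkinterval=2
  #logfile="local7"   # commented out so status output goes to stdout/stderr

Then trigger the monitor again and see whether the lrmd captures anything.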

Thanks,

Dejan

> 
> # heartbeat.example.com
> virtual=212.140.130.37:80
>         protocol=tcp
>         scheduler=rr
>         checktype=connect
>         real=212.140.130.33:80 gate
>         real=212.140.130.34:80 gate
> virtual=212.140.130.38:80
>         protocol=tcp
>         scheduler=rr
>         checktype=connect
>         real=212.140.130.34:80 gate
>         real=212.140.130.33:80 gate
> 
> 
> On 11/09/2007, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > Hi,
> >
> > On Tue, Sep 11, 2007 at 01:34:51PM +0100, Peter Farrell wrote:
> > > Hi all.
> > >
> > > I'm looking for clarification and/or direction. I've been reading
> > > through the linux-ha.org site, watching Andrew's presentation in
> > > Australia and experimenting with  R2 in anticipation of upgrading our
> > > R1 setup.
> > >
> > > Versions:
> > > heartbeat-stonith-2.1.2-3.el4.centos
> > > heartbeat-pils-2.1.2-3.el4.centos
> > > heartbeat-ldirectord-2.1.2-3.el4.centos
> > > heartbeat-2.1.2-3.el4.centos
> > >
> > > I'm using 2 node clusters that monitor IPaddr among web servers.
> > > Specifically - ldirectord is my only resource.
> > >
> > > The R1 setup I'm trying to test is:
> > >
> > > ha.cf:
> > > crm yes
> > > use_logd on
> > > keepalive 2
> > > deadtime 30
> > > warntime 10
> > > initdead 120
> > > udpport 694
> > > bcast   eth1
> > > ucast eth1 10.0.0.1
> > > auto_failback off
> > > node    dmz1.example.com
> > > node    dmz2.example.com
> > >
> > > haresources:
> > > dmz1.example.com IPaddr::212.140.130.37 IPaddr::212.140.130.38
> > > ldirectord::ldirectord.cf
> > >
> > >
> > > My main 'issue' is creating / updating the cib.xml.
> > > I tried various iterations with the python converter - but it's always
> > > either blown away across reboots or missing parameters.
> > >
> > > I managed to to use 'cibadmin' and add strings (from the xml file I
> > > wanted to use) to the 'live' cib.xml. It works, after a fashion and is
> > > being replicated between the two nodes.
> > >
> > > crm_mon shows me:
> > > ============
> > > Last updated: Tue Sep 11 13:01:34 2007
> > > Current DC: dmz2.example.com (4ffd8d6f-adaa-4fdb-888e-dcf520cf2189)
> > > 2 Nodes configured.
> > > 1 Resources configured.
> > > ============
> > >
> > > Node: dmz2.example.com (4ffd8d6f-adaa-4fdb-888e-dcf520cf2189): online
> > > Node: dmz1.example.com (d9ffeb49-3151-48fc-a976-0edfb39494f9): online
> > >
> > > Resource Group: group_1
> > >     IPaddr_212_140_130_37       (heartbeat::ocf:IPaddr):
> > > Started dmz1.scarceskills.com
> > >     IPaddr_212_140_130_38       (heartbeat::ocf:IPaddr):
> > > Started dmz1.scarceskills.com
> > >     ldirectord_3        (heartbeat:ldirectord): Started dmz1.example.com FAILED
> > >
> > > Failed actions:
> > >     ldirectord_3_monitor_120000 (node=dmz1.example.com, call=11, rc=7): complete
> > >
> > >
> > > ----------------
> > > Questions:
> > > ----------------
> > > 1. Is there a way or 'procedure' for maintaining a standard cib.xml file?
> > > I mean, the 'cibadmin' seems really clumsy to me, copying in strings,
> > > etc. Is this the "normal" way to manage it? For example - if I wanted
> > > to create a new environment at another site - do I `cat
> > > /var/lib/heartbeat/crm/cib.xml > save_this.xml` and then create a new
> > > 'blank' cluster and sort of feed the saved file in via the cibadmin
> > > tool?
> >
> > There are a bunch of tools to deal with individual
> > attributes/properties. They all start with crm_ and crm_resource
> > is probably the most used one. Check these pages for some
> > documentation:
> >
> > http://www.novell.com/documentation/sles10/heartbeat/index.html
> > http://www.linux-ha.org/v2/AdminTools/
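
For the save/restore part of your question, cibadmin can dump and load
the configuration as a whole. A rough sketch (check the man page for
your version):

  cibadmin -Q > save_this.xml             # dump the live CIB
  cibadmin -R -x save_this.xml            # replace the CIB from a saved file
  cibadmin -C -o resources -x group.xml   # or add just one section

crm_resource -L lists the configured resources, and
crm_resource -r ldirectord_3 -W shows where a resource is running.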
> >
> > > BTW: it took me a day and a half to actually figure out that I was meant
> > > to be using the cibadmin tool at all... I tried copying it
> > > directly to the /var/lib path, to no avail.
> >
> > That could work too, but you have to do it before starting the
> > cluster and set the permissions correctly. cibadmin is preferred.
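
If you ever do go the copy route, roughly something like this, with
heartbeat stopped on both nodes (paths and ownership from memory, so
double-check them):

  cp my-cib.xml /var/lib/heartbeat/crm/cib.xml
  rm -f /var/lib/heartbeat/crm/cib.xml.sig     # drop any stale signature
  chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml
  chmod 600 /var/lib/heartbeat/crm/cib.xml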
> >
> > > 2. The logs show ldirectord being restarted multiple times, ps shows
> > > it running, and ifconfig -a shows me the correct virtual interfaces
> > > have been configured. When I reboot either node - the other node picks
> > > up as it's meant to. So I trust it's working (although I've not tested
> > > sending requests to the test page at the end of that address yet).
> > >
> > > What then do I make of the output from crm_mon?
> > > ldirectord_3        (heartbeat:ldirectord): Started dmz1.example.com FAILED
> > >
> > > Failed actions:
> > > ldirectord_3_monitor_120000 (node=dmz1.example.com, call=11, rc=7): complete
> >
> > The monitor action on that resource failed. The logs should
> > hopefully say why.
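
Something like this usually turns up the relevant lines (assuming
syslog ends up in /var/log/messages on your CentOS boxes):

  grep -E 'lrmd|crmd|ldirectord' /var/log/messages | tail -n 50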
> >
> > > 3. Do I need any additional statements in my ha.cf for this type of setup?
> >
> > No. You should also set up /etc/logd.cf.
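
A minimal /etc/logd.cf sketch (the package ships an example you can
start from; these are the common directives):

  debugfile /var/log/ha-debug
  logfile   /var/log/ha-log
  logfacility daemon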
> >
> > > 4. (Last one!) Although I can feed in the XML strings to construct the
> > > file I want - I can't shake off the default 'quorum=true' bit in the
> > > first line of the cib.xml:
> > > <cib generated="true" admin_epoch="0" have_quorum="true"
> > > ignore_dtd="false" num_peers="2" ccm_transition="10"
> > > cib_feature_revision="1.3"
> > > dc_uuid="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189" epoch="5"
> > > num_updates="130" cib-last-written="Tue Sep 11 13:00:58 2007">
> > >
> > > I want:
> > > name="symmetric-cluster" value="true"/>
> > >            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> > >
> > > Does this matter? Can I safely ignore that first line? Or is there a
> > > way to remove it?
> >
> > I don't quite get this one, but you probably want to keep
> > have_quorum at true. Otherwise, you can manage attributes using
> > crm_attribute.
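
For example, to change no-quorum-policy without editing the XML by
hand (hedged; check crm_attribute(8) for the exact options in 2.1.2):

  crm_attribute -t crm_config -n no-quorum-policy -v ignore   # set it
  crm_attribute -t crm_config -n no-quorum-policy -G          # read it back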
> >
> > Thanks.
> >
> > Dejan
> >
> > > Any pointers would be greatly appreciated.
> > >
> > > Best regards;
> > >
> > > -Peter Farrell
> > > Cardiff, Wales - UK
> > >
> > >
> > > cib.xml:
> > > -----------
> > > [root at dmz1 ha.d]# cat /var/lib/heartbeat/crm/cib.xml
> > >  <cib generated="true" admin_epoch="0" have_quorum="true"
> > > ignore_dtd="false" num_peers="2" ccm_transition="10"
> > > cib_feature_revision="1.3"
> > > dc_uuid="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189" epoch="5"
> > > num_updates="130" cib-last-written="Tue Sep 11 13:00:58 2007">
> > >    <configuration>
> > >      <crm_config>
> > >        <cluster_property_set id="cib-bootstrap-options">
> > >          <attributes>
> > >            <nvpair id="cib-bootstrap-options-symmetric-cluster"
> > > name="symmetric-cluster" value="true"/>
> > >            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> > > name="no-quorum-policy" value="stop"/>
> > >            <nvpair
> > > id="cib-bootstrap-options-default-resource-stickiness"
> > > name="default-resource-stickiness" value="0"/>
> > >            <nvpair
> > > id="cib-bootstrap-options-default-resource-failure-stickiness"
> > > name="default-resource-failure-stickiness" value="0"/>
> > >            <nvpair id="cib-bootstrap-options-stonith-enabled"
> > > name="stonith-enabled" value="false"/>
> > >            <nvpair id="cib-bootstrap-options-stonith-action"
> > > name="stonith-action" value="reboot"/>
> > >            <nvpair id="cib-bootstrap-options-stop-orphan-resources"
> > > name="stop-orphan-resources" value="true"/>
> > >            <nvpair id="cib-bootstrap-options-stop-orphan-actions"
> > > name="stop-orphan-actions" value="true"/>
> > >            <nvpair id="cib-bootstrap-options-remove-after-stop"
> > > name="remove-after-stop" value="false"/>
> > >            <nvpair id="cib-bootstrap-options-short-resource-names"
> > > name="short-resource-names" value="true"/>
> > >            <nvpair id="cib-bootstrap-options-transition-idle-timeout"
> > > name="transition-idle-timeout" value="5min"/>
> > >            <nvpair id="cib-bootstrap-options-default-action-timeout"
> > > name="default-action-timeout" value="15s"/>
> > >            <nvpair id="cib-bootstrap-options-is-managed-default"
> > > name="is-managed-default" value="true"/>
> > >          </attributes>
> > >        </cluster_property_set>
> > >      </crm_config>
> > >      <nodes>
> > >        <node id="4ffd8d6f-adaa-4fdb-888e-dcf520cf2189"
> > > uname="dmz2.example.com" type="normal"/>
> > >        <node id="d9ffeb49-3151-48fc-a976-0edfb39494f9"
> > > uname="dmz1.example.com" type="normal"/>
> > >      </nodes>
> > >      <resources>
> > >        <group id="group_1">
> > >          <primitive class="ocf" id="IPaddr_212_140_130_37"
> > > provider="heartbeat" type="IPaddr">
> > >            <operations>
> > >              <op id="IPaddr_212_140_130_37_mon" interval="5s"
> > > name="monitor" timeout="5s"/>
> > >            </operations>
> > >            <instance_attributes id="IPaddr_212_140_130_37_inst_attr">
> > >              <attributes>
> > >                <nvpair id="IPaddr_212_140_130_37_attr_0" name="ip"
> > > value="212.140.130.37"/>
> > >              </attributes>
> > >            </instance_attributes>
> > >          </primitive>
> > >          <primitive class="ocf" id="IPaddr_212_140_130_38"
> > > provider="heartbeat" type="IPaddr">
> > >            <operations>
> > >              <op id="IPaddr_212_140_130_38_mon" interval="5s"
> > > name="monitor" timeout="5s"/>
> > >            </operations>
> > >            <instance_attributes id="IPaddr_212_140_130_38_inst_attr">
> > >              <attributes>
> > >                <nvpair id="IPaddr_212_140_130_38_attr_0" name="ip"
> > > value="212.140.130.38"/>
> > >              </attributes>
> > >            </instance_attributes>
> > >          </primitive>
> > >          <primitive class="heartbeat" id="ldirectord_3"
> > > provider="heartbeat" type="ldirectord">
> > >            <operations>
> > >              <op id="ldirectord_3_mon" interval="120s" name="monitor"
> > > timeout="60s"/>
> > >            </operations>
> > >            <instance_attributes id="ldirectord_3_inst_attr">
> > >              <attributes>
> > >                <nvpair id="ldirectord_3_attr_1" name="1" value="ldirectord.cf"/>
> > >              </attributes>
> > >            </instance_attributes>
> > >          </primitive>
> > >        </group>
> > >      </resources>
> > >      <constraints>
> > >        <rsc_location id="rsc_location_group_1" rsc="group_1">
> > >          <rule id="prefered_location_group_1" score="100">
> > >            <expression attribute="#uname"
> > > id="prefered_location_group_1_expr" operation="eq"
> > > value="dmz1.example.com"/>
> > >          </rule>
> > >        </rsc_location>
> > >      </constraints>
> > >    </configuration>
> > >  </cib>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems


