[Linux-HA] Re: Failover of resource

Andrew Beekhof beekhof at gmail.com
Tue Jul 10 09:26:49 MDT 2007


On 7/10/07, Taldevkar, Chetan <chetan.taldevkar at patni.com> wrote:
> Message: 3
> Date: Mon, 9 Jul 2007 17:20:43 +0200
> From: "Andrew Beekhof" <beekhof at gmail.com>
> Subject: Re: [Linux-HA] RE: Re: Failover of resource
> To: "General Linux-HA mailing list" <linux-ha at lists.linux-ha.org>
> Message-ID:
>         <26ef5e70707090820p4d3b9ccbt7ae65f5f507ddaba at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 7/9/07, Taldevkar, Chetan <chetan.taldevkar at patni.com> wrote:
> >
> > Thanks Andrew,
> >
> > I have modified the configuration as per the details given under
> > http://www.linux-ha.org/v2/faq/forced_failover  link.
> >
> > I am using resource of class "heartbeat". The script which is doing
> > check on the database is copied into /etc/ha.d/resource.d folder. I am
> > using two operation one is start and other I have tried 'status' and
> > 'monitor'.
>
>
> monitor is the correct name
> (the LRM will magically change the action to status for heartbeat and
> lsb scripts)
>
> > Both options are not working. They continue to execute the
> > script even though it returned "stopped".
>
> more information?
>
> <Chetan> What I mean is that when the monitor operation hits an error,
> my script runs echo "Error: stopped" followed by exit 9.

it's not a good idea to make up return codes.
please read:
   http://linux-ha.org/LSBResourceAgent
and in particular follow the link through to the specification of how
an LSB resource is required to behave.
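
to give a rough idea of what that means for a status/monitor action, here
is a minimal sketch. it assumes the LSB convention that status exits 0 when
the service is running and 3 when it is not. db_is_up, tt_status and
TT_PIDFILE are made-up names for illustration, not anything taken from
ttmgr.sh, and the pid-file test is only a stand-in for a real database probe.

```shell
#!/bin/sh
# Hypothetical probe: replace with whatever check the real script performs.
# Assumption: the existence of a pid file means the datastore is up.
db_is_up() {
    [ -f "${TT_PIDFILE:-/var/run/ttmgr.pid}" ]
}

# LSB status convention: exit 0 = running, exit 3 = not running.
# The echoed text is only for humans; the exit code carries the result.
tt_status() {
    if db_is_up; then
        echo "running"
        return 0
    fi
    echo "stopped"
    return 3
}

# Dispatch only the actions discussed here; start/stop are omitted.
case "${1:-}" in
    status|monitor)
        tt_status
        exit $?
        ;;
esac
```

the point is that the failure is signalled by the exit status (3 here), not
by an arbitrary code such as 9 or by the text that is printed.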

>
> My understanding is that after seeing the 'stopped' string, linux-ha will
> trigger on_fail='stop' and invoke the stop part of my script.

no, only return codes count.  see above page.

> After invoking
> the stop operation of the script, I expect linux-ha to start the
> failover on the other node and invoke the start operation on the
> resource.
> But the actual behavior is as follows:
> 1. The monitor operation continues on the same node for around 20
> seconds (it varies), and then the resource is started on the other
> node.
>
> Is it possible to avoid this? Can I achieve failover on the first
> error during the monitor operation?

http://linux-ha.org/v2/faq/forced_failover
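
for a rough feel of the numbers in your CIB, here is a small sketch. it
assumes a simplified scoring model that is not taken from the heartbeat
source: the resource stays on its current node while (location score +
resource stickiness + failures * failure stickiness) is at least the
location score of the other node. group handling and tie-breaking may
differ in practice. failures_until_failover is a made-up helper name.

```shell
#!/bin/sh
# Assumed (simplified) model, for illustration only:
#   stay on the current node while
#   score_here + stickiness + n * failure_stickiness >= score_other
# $1 = location score on the current node
# $2 = resource stickiness
# $3 = failure stickiness (must be negative)
# $4 = location score on the other node
failures_until_failover() {
    if [ "$3" -ge 0 ]; then
        echo "failure stickiness must be negative" >&2
        return 1
    fi
    n=0
    total=$(( $1 + $2 ))
    # Count failures until the current node scores below the other node.
    while [ "$total" -ge "$4" ]; do
        n=$(( n + 1 ))
        total=$(( total + $3 ))
    done
    echo "$n"
}

# Values from the posted CIB: scores 1500 (node1) and 1000 (node2),
# default_resource_stickiness 500, and the two failure stickiness
# values that appear there (-100 and -1500).
failures_until_failover 1500 500 -100 1000    # prints 11
failures_until_failover 1500 500 -1500 1000   # prints 1
```

under this model, -100 would need 11 failed monitors before the resource
moves, while -1500 would move it after the first one. which of the two
values in your CIB is actually honoured, and whether this explains the
delay you see, is only a guess.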

>
> One more observation: if I use on_fail='fence' without stonith enabled,
> failover occurs in less time.
>
> Will the resource type (OCF, heartbeat) give different results?
>
> My requirement is as below :
>
> There is no shared device between the two nodes. Each node has a
> TimesTen datastore; one is active and the other is in standby mode. If
> the active TimesTen instance fails, scripts need to be executed on the
> standby node to make it active.
>
> </Chetan>
>
>
> at startup, we will call your script to check that it is not already
> running in the cluster.
> is this what you are talking about or something else?
>
>
> > Am I wrong in choosing resource type?
> >
> > What should I give on_fail as? (I tried stop, restart, block.)
>
>
>
> It depends on what you're trying to achieve.
>
>
> > I am not using fence because, as I understand it, it will reboot the
> > failed machine, which I don't want. Or is there an option not to reboot?
> >
> > What option should I use with on_fail to stop the monitor/status
> > operation in case it fails in first instance?
>
>
> on_fail is irrelevant here. as the page i referred you to indicates, you
> need to set default_resource_failure_stickiness
>
> > Thanks again,
> > Chetan
> >
> > --
> >
> > <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2"
> > cib_feature_revision="1.3" generated="true" epoch="9" num_updates="439"
> > cib-last-written="Mon Jul  9 19:43:26 2007" ccm_transition="2"
> > dc_uuid="5426e37c-9469-40a3-813c-eebeb0b7c6a0">
> >    <configuration>
> >      <crm_config>
> >        <cluster_property_set id="cib-bootstrap-options">
> >          <attributes>
> >            <nvpair id="symmetric_cluster" name="symmetric_cluster"
> > value="true"/>
> >            <nvpair id="no_quorum_policy" name="no_quorum_policy"
> > value="stop"/>
> >            <nvpair id="default_resource_stickiness"
> > name="default_resource_stickiness" value="500"/>
> >            <nvpair id="default_resource_failure_stickiness"
> > name="default_resource_failure_stickiness" value="-100"/>
> >            <nvpair
> > id="cib-bootstrap-options-default-resource-failure-stickiness"
> > name="default-resource-failure-stickiness" value="-1500"/>
> >            <nvpair name="last-lrm-refresh"
> > id="cib-bootstrap-options-last-lrm-refresh" value="1183985435"/>
> >          </attributes>
> >        </cluster_property_set>
> >      </crm_config>
> >      <nodes>
> >        <node id="5426e37c-9469-40a3-813c-eebeb0b7c6a0" uname="node1"
> > type="normal"/>
> >        <node id="1c3fdfbd-ee55-47e3-a8c2-52f34a5c5553" uname="node2"
> > type="normal"/>
> >      </nodes>
> >      <resources>
> >        <group id="group_org" collocated="true" ordered="true"
> > multiple_active="stop_start">
> >          <primitive class="ocf" id="IPaddr_1" provider="heartbeat"
> > type="IPaddr">
> >            <operations>
> >              <op id="1" interval="1s" name="monitor" timeout="2s"/>
> >            </operations>
> >            <instance_attributes id="i1">
> >              <attributes>
> >                <nvpair id="id1" name="ip" value="172.20.1.94"/>
> >                <nvpair id="mask1" name="netmask" value="24"/>
> >                <nvpair id="nic1" name="nic" value="eth0"/>
> >              </attributes>
> >            </instance_attributes>
> >          </primitive>
> >          <primitive class="heartbeat" type="ttmgr.sh"
> > provider="heartbeat" id="resource_tt">
> >            <instance_attributes id="resource_tt_instance_attrs">
> >              <attributes/>
> >            </instance_attributes>
> >            <operations>
> >              <op id="tt_start_1" name="start" description="begin op"
> > timeout="5" start_delay="0" disabled="false" role="Started"
> > prereq="nothing" on_fail="stop"/>
> >              <op description="check state" interval="2s" timeout="3s"
> > start_delay="0" disabled="false" role="Started" prereq="nothing"
> > on_fail="restart" id="tt_status_1" name="monitor"/>
> >            </operations>
> >          </primitive>
> >        </group>
> >      </resources>
> >      <constraints>
> >        <rsc_location id="place_testconfig" rsc="group_org">
> >          <rule id="prefered_place_testconfig" score="1500">
> >            <expression attribute="#uname"
> > id="0480539b-f4a5-4380-b573-86ab4fc2c0c6" operation="eq" value="node1"/>
> >          </rule>
> >        </rsc_location>
> >        <rsc_location id="place_wl1" rsc="group_org">
> >          <rule id="prefered_place_wl1" score="1000">
> >            <expression attribute="#uname"
> > id="df9404c3-ac5d-4968-b1c9-e4cb8b7ef566" operation="eq" value="node2"/>
> >          </rule>
> >        </rsc_location>
> >      </constraints>
> >    </configuration>
> > </cib>
> >
> >
> > ---
> >
> > Hello,
> > >
> > >
> > >
> > > I am new to cluster HA and I am trying to failover a service between
> > > two nodes. I am using heartbeat 2.0.8 on redhat linux (64 bits).
> > >
> > >
> > >
> > > The monitor operation on the resource does not trigger failover after
> > > encountering an error. It keeps running the
> > > script on the same node and does not fail over to another node.
> >
> >
> > http://www.linux-ha.org/v2/faq/forced_failover
> >
> >
> >
> >
> >
> > http://www.patni.com
> > World-Wide Partnerships. World-Class Solutions.
> > _____________________________________________________________________
> >
> > This e-mail message may contain proprietary, confidential or legally
> > privileged information for the sole use of the person or entity to
> > whom this message was originally addressed. Any review, e-transmission
> > dissemination or other use of or taking of any action in reliance upon
> > this information by persons or entities other than the intended
> > recipient is prohibited. If you have received this e-mail in error
> > kindly delete  this e-mail from your records. If it appears that this
> > mail has been forwarded to you without proper authority, please notify
> > us immediately at netadmin at patni.com and delete this mail.
> > _____________________________________________________________________
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
>
>
> ------------------------------
>
> End of Linux-HA Digest, Vol 44, Issue 35
> ****************************************

