[Linux-HA] Simple Active-Standby failover configuration.

Andrew Beekhof beekhof at gmail.com
Tue Oct 24 09:14:35 MDT 2006


On 10/22/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> Hi Peter,
>
> Try adding on_fail="fence" to your monitor operation.

a brutal but reliable method.  hopefully we can get the graceful
method working for Peter instead :-)

>
> e.g.
>          <primitive id="resource_sinfids3A_vip" class="ocf" type="IPaddr" provider="heartbeat">
>            <instance_attributes id="resource_sinfids3A_vip_instance_attrs">
>              <attributes>
>                <nvpair id="resource_sinfids3A_vip_ip" name="ip" value="128.103.2.11"/>
>              </attributes>
>            </instance_attributes>
>            <operations>
>              <op id="IPaddr_sinfids3A_vip_mon" interval="60s" name="monitor" timeout="15s" on_fail="fence"/>
>            </operations>
>          </primitive>
>
> Some commands I have been using to check the failure count and the node weightings (thanks to Andrew)
>
> [root@sinfids3b1 hb2]# crm_verify -L -V -V -V -V -V 2>&1 | grep fail-count
> crm_verify[19453]: 2006/10/22_18:35:59 debug: unpack_lrm_rsc_state: fail-count-resource_sinfids3B_vip: 1
>
>
> [root@sinfids3b1 hb2]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[0] sinfids3b1: 110098
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[2] sinfids3a2: -1000000
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3B_vip
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[0] sinfids3a2: 70093
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_aims
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oralsnr
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oracle
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_smb
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_fs
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_drbd
> ...
>  sinfids3a2 to resource_sinfids3A_vip
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[0] sinfids3b1: 13099
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[1] sinfids3a2: 9100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[2] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3_vip
>
> To be honest though how these numbers are generated is still a MYSTERY !

recent changes should demystify this a little (less of the weird
multipliers and some better logging)
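
In the meantime, the scores are easier to compare once tabulated per
resource. A rough sketch (not part of heartbeat), assuming the debug
lines keep the "native_assign_node: Color <resource>, Node[n] <node>:
<score>" format quoted above. The function names are made up for
illustration:

```python
import re
from collections import defaultdict

# Matches ptest debug lines of the form quoted above, e.g.
#   ... native_assign_node: Color resource_x, Node[0] nodeA: 110098
SCORE_RE = re.compile(
    r"native_assign_node: Color (?P<rsc>[^,\s]+), "
    r"Node\[\d+\] (?P<node>[^:\s]+): (?P<score>-?\d+)"
)


def parse_scores(lines):
    """Return {resource: {node: score}} for every score line found."""
    scores = defaultdict(dict)
    for line in lines:
        m = SCORE_RE.search(line)
        if m:
            scores[m.group("rsc")][m.group("node")] = int(m.group("score"))
    return dict(scores)


def preferred_node(node_scores):
    """Node with the highest score (ties resolved arbitrarily)."""
    return max(node_scores, key=node_scores.get)


# Sample lines copied from the output quoted above.
sample = [
    "ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: "
    "Color resource_sinfids3B_vip, Node[0] sinfids3b1: 110098",
    "ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: "
    "Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000",
    "ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: "
    "Assigning sinfids3b1 to resource_sinfids3B_vip",
]

table = parse_scores(sample)
for rsc, node_scores in table.items():
    print(rsc, node_scores, "->", preferred_node(node_scores))
```

Feeding it the full ptest output (for example via sys.stdin) should show,
for each resource, which node wins and by how much. It only reads the
log; it does not explain how the scores were computed.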

> Similarly, you can FAIL a resource (e.g. unmount a filesystem) and the failure count doesn't change.

There is a bug in 2.0.7 (since fixed) where failcount was not always
being incremented.
Hopefully you're not running with the patch that fixes this.

If you are, can you send me your logs and I'll do my best to fix it promptly.
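
To check whether the count changes between two runs, something like the
following could be used on the crm_verify output. This is only a sketch
that assumes the "fail-count-<resource>: <n>" format shown in the quoted
log line; parse_fail_counts is a hypothetical helper, not an existing
tool:

```python
import re

# Matches the tail of lines such as:
#   ... unpack_lrm_rsc_state: fail-count-resource_sinfids3B_vip: 1
FAILCOUNT_RE = re.compile(r"fail-count-(?P<rsc>[^:\s]+): (?P<count>\d+)")


def parse_fail_counts(lines):
    """Return {resource: fail_count}; a later line overrides an earlier one."""
    counts = {}
    for line in lines:
        m = FAILCOUNT_RE.search(line)
        if m:
            counts[m.group("rsc")] = int(m.group("count"))
    return counts


# Sample line copied from the crm_verify output quoted above.
sample = [
    "crm_verify[19453]: 2006/10/22_18:35:59 debug: unpack_lrm_rsc_state: "
    "fail-count-resource_sinfids3B_vip: 1",
]

counts = parse_fail_counts(sample)
print(counts)
```

Running it before and after forcing a failure and comparing the two
dictionaries shows whether the count was incremented. Note that the
quoted line does not name the node, so this sketch cannot tell per-node
counts apart.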

>  Getting to grips with HB2 is proving a serious challenge - unfortunately it looks like it could be too much effort. :-(
>
>
>
> Peter Wong <peter.wong at mobidia.com> wrote:
> Greetings:
>
> Yes, I was following the example at
> http://www.linux-ha.org/v2/faq/forced_failover
> when I created the original cib.xml file that I sent out a couple
> of days ago.
>
> I have specified in the  section:
> ---
>
>
> ...
>
> name="default_resource_failure_stickiness" value="-INFINITY"/>
> ---
>
> I thought that by setting "default_resource_failure_stickiness"
> to "-INFINITY", the resource would be failed over right away,
> as soon as the first failure occurred.
>
> But this didn't happen, and the failure caused the resource to
> restart on the same node.
>
> Then I tried modifying the cib.xml to include the following line:
> ---
>
>
>
> resource_stickiness="0" resource_failure_stickiness="-INFINITY">
> ---
>
> But setting "resource_failure_stickiness" to "-INFINITY" didn't
> change the situation. The failure caused the resource to restart,
> but only on the local node.
>
> Maybe the Resource Agent is not doing the right thing?
>
> Here are more questions:
> ---
> 1. When Heartbeat first starts, is it true that it'll invoke the
>    "monitor" action first?
>
> 2. In the implementation of the RA, when the "monitor" action is called,
>    should it distinguish between the call made when Heartbeat is first
>    started and the regular periodic calls of the "monitor" action?
>
> 3. In the /usr/lib/ocf/resource.d directory, which RA is a
>    good example to follow if I just want to do a simple Active-Standby
>    immediate failover setup?
> ---
>
> Thanks!
>
> Peter.
>
> > -----Original Message-----
> > From: linux-ha-bounces at lists.linux-ha.org
> > [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of
> > Andrew Beekhof
> > Sent: Friday, October 20, 2006 12:21 AM
> > To: General Linux-HA mailing list
> > Subject: Re: RE: [Linux-HA] Simple Active-Standby failover
> > configuration.
> >
> > On 10/19/06, Peter Wong wrote:
> > > Greetings:
> > >
> > > Thanks for replying.
> > >
> > > > you need the failure_stickiness attribute
> > > Where exactly do I specify this attribute?
> > > What value do I need to use (like INFINITY or -INFINITY)?
> > >
> > > I would really appreciate if you can give me an
> > > exact example in XML.
> >
> > this is probably the best page to look at:
> >
> > http://www.linux-ha.org/v2/faq/forced_failover
> >
> > it's set exactly the same way as the other type of stickiness
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
>

