[Linux-HA] Simple Active-Standby failover configuration.
Andrew Beekhof
beekhof at gmail.com
Tue Oct 24 09:18:47 MDT 2006
On 10/23/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> I've been puzzling over the assign values ... My first goal is to fail the system over on failure of a resource assigned to a group.
>
> my setup
>
> [root at sinfids3b1 hb2]# crm_verify -L -V -V -V -V
> ...
> crm_verify[32386]: 2006/10/23_14:59:57 debug: unpack_config: Default stickiness: 9999
> crm_verify[32386]: 2006/10/23_14:59:57 debug: unpack_config: Default failure stickiness: -99999
>
> my score for a node is 100 when it matches a location.
> ...
> Note the resource stickiness is 9999 with the failure being -99999; this is used in the calculation for node weight.
>
> So, the last resource in the group_sinfids3A chain has 100 + (6 * 9999) as its current Color value (6 being the number of resources currently assigned to the group). When something fails within the group, 99999 is subtracted, making the host's score negative, which forces the failover!
Is this good or bad? It's certainly what you told it to do :-)
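
The arithmetic can be checked with a minimal sketch. It assumes the score is simply the location score, plus the stickiness once per resource active on the node, plus the failure stickiness once per recorded failure. The function name node_score is made up for illustration, and the real policy engine does more than this. Under those assumptions it reproduces the 60094 and -99899 values in the ptest output further down, which is consistent with the one recorded smb failure being on sinfids3a1:

```python
def node_score(location, stickiness, active_resources,
               failure_stickiness, fail_count):
    """Simplified node weight: an assumed model, not the actual policy engine code."""
    return (location
            + stickiness * active_resources
            + failure_stickiness * fail_count)


# sinfids3a2 runs all 6 resources of group_sinfids3A and has no failures.
print(node_score(100, 9999, 6, -99999, 0))   # 60094

# sinfids3a1 runs none of them and has one recorded failure.
print(node_score(100, 9999, 0, -99999, 1))   # -99899

# One failure on the active node pushes its score below zero.
print(node_score(100, 9999, 6, -99999, 1))   # -39905
```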
>
> I got confused with failure counts. The failcount doesn't appear to be a cumulative number of failures, i.e. you can fail the vip many times but the failcount doesn't change.
See previous email.
It should be cumulative. It won't be in 2.0.7 but will be in 2.0.8.
> It just means that this resource failed on this node, and that the resource will not return to this node until the failcount is cleared. Clearing it is a manual process using the crm_failcount command, which enables the node to run the resource again. Personally I would prefer that this wasn't manual and that the system just bounced between the nodes.
That's exactly what we're trying to avoid.
>
>
> [root at sinfids3b1 hb2]# crm_mon -1
> ============
> Last updated: Mon Oct 23 15:01:55 2006
> Current DC: sinfids3b1 (338afa76-8997-4d66-8381-fc36ec4b456b)
> 3 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: sinfids3b1 (338afa76-8997-4d66-8381-fc36ec4b456b): online
> Node: sinfids3a2 (ec74bd17-2016-4d32-a694-0f6983121cd9): online
> Node: sinfids3a1 (b757aece-0e47-41e5-92b7-6a80b4f3eea7): online
>
> Resource Group: group_sinfids3
> resource_sinfids3_vip (heartbeat::ocf:IPaddr): Started sinfids3b1
> Resource Group: group_sinfids3A
> resource_sinfids3A_vip (heartbeat::ocf:IPaddr): Started sinfids3a2
> resource_sinfids3A_drbd (heartbeat:drbddisk): Started sinfids3a2
> resource_sinfids3A_fs (heartbeat::ocf:Filesystem): Started sinfids3a2
> resource_sinfids3A_smb (lsb:smb): Started sinfids3a2
> resource_sinfids3A_oracle (heartbeat::ocf:oracle): Started sinfids3a2
> resource_sinfids3A_oralsnr (heartbeat::ocf:oralsnr): Started sinfids3a2
> Resource Group: group_sinfids3B
> resource_sinfids3B_vip (heartbeat::ocf:IPaddr): Started sinfids3b1
>
>
> [root at sinfids3b1 hb2]# crm_verify -L -V -V -V -V -V 2>&1 | grep fail-count
> crm_verify[32310]: 2006/10/23_14:54:40 debug: unpack_lrm_rsc_state: fail-count-resource_sinfids3A_smb: 1
>
>
> [root at sinfids3b1 hb2]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3B_vip, Node[0] sinfids3b1: 10099
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3B_vip, Node[2] sinfids3a2: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3B_vip
>
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[0] sinfids3a2: 60094
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[1] sinfids3a1: -99899
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oralsnr
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[0] sinfids3a2: 1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[1] sinfids3a1: -99899
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oracle
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_smb, Node[0] sinfids3a2: 1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_smb, Node[1] sinfids3a1: -99899
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_smb, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_smb
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_fs, Node[0] sinfids3a2: 1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_fs, Node[1] sinfids3a1: 100
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_fs, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_fs
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[0] sinfids3a2: 1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[1] sinfids3a1: 100
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_drbd
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_vip, Node[0] sinfids3a2: 1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_vip, Node[1] sinfids3a1: 100
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3A_vip, Node[2] sinfids3b1: -1000000
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_vip
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[0] sinfids3b1: 13099
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[1] sinfids3a2: 9100
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[2] sinfids3a1: 100
> ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3_vip
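
In every case shown above, the node that gets the resource is the one with the highest score. As a rough sketch only, the "Color" lines can be collected per resource and compared. The helper names parse_scores and preferred_nodes are hypothetical, and the regular expression assumes the exact log layout quoted in this message:

```python
import re

# Matches e.g. "native_assign_node: Color resource_x, Node[0] host1: 13099"
COLOR_RE = re.compile(
    r"native_assign_node: Color ([^,\s]+), Node\[\d+\] ([^:\s]+): (-?\d+)"
)


def parse_scores(log_text):
    """Return {resource: {node: score}} from ptest debug output."""
    scores = {}
    for line in log_text.splitlines():
        match = COLOR_RE.search(line)
        if match:
            resource, node, score = match.groups()
            scores.setdefault(resource, {})[node] = int(score)
    return scores


def preferred_nodes(scores):
    """Pick the highest-scoring node for each resource."""
    return {rsc: max(nodes, key=nodes.get) for rsc, nodes in scores.items()}


SAMPLE = """\
ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[0] sinfids3b1: 13099
ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[1] sinfids3a2: 9100
ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Color resource_sinfids3_vip, Node[2] sinfids3a1: 100
ptest[32587]: 2006/10/23_15:21:30 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3_vip
"""

print(preferred_nodes(parse_scores(SAMPLE)))
```

Run against the sample lines, this picks sinfids3b1 for resource_sinfids3_vip, the same node named in the "Assigning" line.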
>
>
> To test failover on failure:
> 1. Confirm the system is running as expected.
> 2. Check for failure counts.
> 3. Remove any failure counts for resources.
> 4. Fail a resource and, hopefully, watch the system fail over.
> 5. Confirm the failure count on the failed resource.
> 6. Repeat!
>
>
> Alex and Gill Strachan <asgks at yahoo.com> wrote:
>
> Hi Peter,
>
> Try adding on_fail="fence" to your monitor operation.
>
> e.g. [XML example not preserved in the archive]
>
> Some commands I have been using to check the failure count and the node weightings (thanks to Andrew)
>
> [root at sinfids3b1 hb2]# crm_verify -L -V -V -V -V -V 2>&1 | grep fail-count
> crm_verify[19453]: 2006/10/22_18:35:59 debug: unpack_lrm_rsc_state: fail-count-resource_sinfids3B_vip: 1
>
>
> [root at sinfids3b1 hb2]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[0] sinfids3b1: 110098
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Color resource_sinfids3B_vip, Node[2] sinfids3a2: -1000000
> ptest[19455]: 2006/10/22_18:36:03 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3B_vip
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[0] sinfids3a2: 70093
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_aims, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_aims
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oralsnr, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oralsnr
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_oracle, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_oracle
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_smb, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_smb
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_fs, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_fs
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[0] sinfids3a2: 1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[1] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3A_drbd, Node[2] sinfids3b1: -1000000
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3a2 to resource_sinfids3A_drbd
> ...
> sinfids3a2 to resource_sinfids3A_vip
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[0] sinfids3b1: 13099
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[1] sinfids3a2: 9100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Color resource_sinfids3_vip, Node[2] sinfids3a1: 100
> ptest[19455]: 2006/10/22_18:36:04 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3_vip
>
> To be honest, though, how these numbers are generated is still a MYSTERY! Similarly, you can FAIL a resource, e.g. unmount a filesystem, and the failure count doesn't change. Getting to grips with HB2 is proving a serious challenge. Unfortunately it looks like it could be too much effort. :-(
>
>
>
> Peter Wong wrote:
>
> Greetings:
>
> Yes, I was following the example at
> http://www.linux-ha.org/v2/faq/forced_failover
> when I created the original cib.xml file that I sent out a couple
> of days ago.
>
> I have specified in the section:
> ---
>
>
> ...
>
> name="default_resource_failure_stickiness" value="-INFINITY"/>
> ---
>
> I thought that by setting "default_resource_failure_stickiness"
> to "-INFINITY", the resource would be failed over right away,
> as soon as the first failure occurs.
>
> But this didn't happen, and the failure caused the resource to
> restart on the same node.
>
> Then I tried modifying the cib.xml to include the following line:
> ---
>
>
>
> resource_stickiness="0" resource_failure_stickiness="-INFINITY">
> ---
>
> But setting "resource_failure_stickiness" to "-INFINITY" didn't
> change the situation. The failure caused the resource to restart,
> but only on the local node.
>
> Maybe the Resource Agent is not doing the right thing?
>
> Here are more questions:
> ---
> 1. When Heartbeat first starts, is it true that it'll invoke the
> "monitor" action first?
>
> 2. In the implementation of the RA, when the "monitor" action is called,
> should it distinguish between the call made when Heartbeat is first
> started and the regular periodic calls of the "monitor" action?
>
> 3. In the /usr/lib/ocf/resource.d directory, which RA is a
> good example to follow if I just want to do a simple Active-Standby
> immediate-failover setup?
> ---
>
> Thanks!
>
> Peter.
>
> > -----Original Message-----
> > From: linux-ha-bounces at lists.linux-ha.org
> > [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of
> > Andrew Beekhof
> > Sent: Friday, October 20, 2006 12:21 AM
> > To: General Linux-HA mailing list
> > Subject: Re: RE: [Linux-HA] Simple Active-Standby failover
> > configuration.
> >
> > On 10/19/06, Peter Wong wrote:
> > > Greetings:
> > >
> > > Thanks for replying.
> > >
> > > > You need the failure_stickiness attribute.
> > > Where exactly do I specify this attribute?
> > > What value do I need to use (like INFINITY or -INFINITY)?
> > >
> > > I would really appreciate if you can give me an
> > > exact example in XML.
> >
> > This is probably the best page to look at:
> >
> > http://www.linux-ha.org/v2/faq/forced_failover
> >
> > It's set exactly the same way as the other type of stickiness.
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems