[Linux-HA] Re: bug in failcount handling?

Alan Robertson alanr at unix.sh
Tue Oct 30 10:30:05 MDT 2007


Andrew Beekhof wrote:
> On Oct 30, 2007, at 2:44 AM, Alan Robertson wrote:
> 
>> Hi,
>>
>> I've been working with a customer - trying to get them up and running 
>> on version 2.1.2.  I got everything to work except for one thing:  
>> They require that their web server fail over on the 3rd failure.  I 
>> read the documentation on the failcount stuff on the web site here: 
>> http://www.linux-ha.org/v2/faq/forced_failover
>>
>> I think I understood it, and I created a CIB to match.  In the CIB I 
>> created, I believe it should fail over on the 3rd failure.  In 
>> practice it fails over reliably on the 9th iteration instead.  We had 
>> been doing a "killall httpd" to fail the web server.
> 
> 9th is correct.
> 
> As has been explained here on the list a number of times, the group's 
> stickiness is N * default-resource-stickiness, where N is the number of 
> resources in the group.
> 
> Including the rsc_location constraint, the group stickiness is therefore: 
> 4 * 20 + 1 = 81
> So clearly apache is going to need to fail 9 times (9 * 
> default-resource-failure-stickiness = -90) before the group is moved.
> 
> 
> Of course it all starts getting even more complicated when one starts 
> creating rsc_colocation constraints with other groups and primitives.
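
To check the arithmetic quoted above, here is a minimal Python sketch.
The function and parameter names are hypothetical and are not part of
Heartbeat or the CIB.  It assumes the group moves once its score on the
current node drops below zero (that is, the other node scores 0), and it
uses the values implied by the quoted numbers: 4 resources, a
default-resource-stickiness of 20, a location score of 1, and a
default-resource-failure-stickiness of -10.

```python
def failures_until_move(n_resources, default_stickiness,
                        location_score, failure_stickiness):
    """Return the smallest failure count at which the group's score
    on its current node drops below zero (assumed move condition)."""
    if failure_stickiness >= 0:
        raise ValueError("failure stickiness must be negative")
    # Group stickiness as described in the quoted reply:
    # N * default-resource-stickiness, plus the rsc_location score.
    score = n_resources * default_stickiness + location_score
    failures = 0
    while score + failures * failure_stickiness >= 0:
        failures += 1
    return failures


# Values taken from the quoted arithmetic: 4 * 20 + 1 = 81, -10 per failure.
print(failures_until_move(4, 20, 1, -10))
```

Under the same assumptions, calling it with a failure stickiness of -30
returns 3, since 81 - 60 is still positive while 81 - 90 is not.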

Can I specify the resource-failure-stickiness of a group either 
explicitly or implicitly?

Since I'm writing this up for the web site, I want to make sure I have 
this absolutely clear so I can write it up correctly:

Do you mean that you sum up the stickiness values for each resource in 
the group, or did you really mean that it always uses n * default 
stickiness? (I'm asking about both failure stickiness and resource 
stickiness.)

If I have a location constraint of 'p' points for a group, does that 
then distribute across the group of 'n' nodes so that we get a group 
preference of 'p' * 'n' points?  Or is it just a total of 'p' 
points for the group as a whole?

My current attempt to document this can be found here:
	http://linux-ha.org/v2/faq/forced_failover

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce

