[Linux-HA] Re: bug in failcount handling?
sergeyfd at gmail.com
Tue Oct 30 10:48:13 MDT 2007
On 10/30/07, Alan Robertson <alanr at unix.sh> wrote:
> Andrew Beekhof wrote:
> > On Oct 30, 2007, at 2:44 AM, Alan Robertson wrote:
> >> Hi,
> >> I've been working with a customer - trying to get them up and running
> >> on version 2.1.2. I got everything to work except for one thing:
> >> They require that their web server fail over on the 3rd failure. I
> >> read the documentation on the failcount stuff on the web site here:
> >> http://www.linux-ha.org/v2/faq/forced_failover
> >> I think I understood it, and I created a CIB to match. In the CIB I
> >> created, I believe it should fail over on the 3rd failure. In
> >> practice it fails over reliably on the 9th iteration instead. We had
> >> been doing a "killall httpd" to fail the web server.
> > 9th is correct.
> > As has been explained here on the list a number of times, the group's
> > stickiness is N * default-resource-stickiness, where N is the number of
> > resources in the group.
> > Including the rsc_location constraint, the group stickiness is therefore:
> > 4 * 20 + 1 = 81
> > So clearly apache is going to need to fail 9 times (9 *
> > default-resource-failure-stickiness = -90) before the group is moved.
> > Of course it all starts getting even more complicated when one starts
> > creating rsc_colocation constraints with other groups and primitives.
> Can I specify the resource-failure-stickiness of a group either
> explicitly or implicitly?
> Since I'm writing this up for the web site, I want to make sure I have
> this absolutely clear so I can write it up correctly:
> Do you mean that you sum up the stickiness values for each resource in
> the group, or did you really mean that it always uses n * default
> stickiness? (I'm asking for both failure stickiness and resource
> stickiness.)
> If I have a locational constraints for a group of 'p' points, does that
> then distribute across the group of 'n' nodes so that we get a group
> preference of 'p' * 'n' points? Or is it just a total of 'p'
> points for the group as a whole?
> My current attempt to document this can be found here:
I always wondered why this is so complex. Why can't you guys implement
one more resource attribute that would simply specify after how many
failures the resource has to be moved off the failing node?
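
For what it's worth, here is a rough sketch of the arithmetic Andrew
describes above, just to make the "9th failure" result easy to check. It
is not how the CRM actually computes scores. The function names are made
up, the failure stickiness of -10 is inferred from "9 * ... = -90", and I
am assuming the group moves as soon as its score on the node drops below
zero (what happens on an exact tie at zero is a guess).

```python
def group_stickiness(n_resources, default_stickiness, location_score=0):
    """Group stickiness as described in the thread:
    N * default-resource-stickiness, plus the rsc_location score."""
    return n_resources * default_stickiness + location_score


def failures_until_move(total_stickiness, failure_stickiness):
    """Count failures until the accumulated failure penalty outweighs
    the stickiness. Assumption: the group moves once the score < 0."""
    if failure_stickiness >= 0:
        raise ValueError("failure stickiness must be negative")
    failures = 0
    score = total_stickiness
    while score >= 0:
        failures += 1
        score += failure_stickiness
    return failures


# Values from the thread: 4 resources, stickiness 20, location score 1.
total = group_stickiness(4, 20, 1)       # 4 * 20 + 1 = 81
print(failures_until_move(total, -10))   # 9
```

Under the same assumptions, a failure stickiness of -28 would give a move
on the 3rd failure (2 * -28 = -56 leaves 25, 3 * -28 = -84 goes negative),
but that has to be recomputed whenever the group or its constraints change.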