[Linux-HA] Ipfail support for heartbeat 2.0.x
Alan Robertson
alanr at unix.sh
Thu Oct 20 08:52:22 MDT 2005
Lars Marowsky-Bree wrote:
> On 2005-10-19T08:38:02, Andrew Beekhof <beekhof at gmail.com> wrote:
>
> (Moving discussion to -dev)
>
>>>In this case, having it in the one common place for all variables is a
>>>simplification - since it can be done once instead of many times.
>>>
>>So what do people want? I'm not getting it...
>
>>If the PE detects a resource should be moved it will move it then and
>>there - not at some arbitrary point in the future.
>
> OK. I'll try to summarize again, maybe a more formal description of the
> problem will help us all to get to a common understanding. The good part
> is that this is clearly simpler than multi-state resources ;-)
>
> We have a metric "N"; say, number of paths to the storage, number of
> external nodes which can be reached, whatever.
>
> This metric is monitored on each node n -> N(n). As nodes are not
> running in lock-step, it's also observed at a specific time -> N(n,t).
>
> Requirements:
>
> 1. We want to be able to specify a dependency on running where N is
> maximal; so that our webserver runs on the node with the best
> connectivity, for example, or that, if the storage of a node has
> failed, we do a pro-active switch to a node which still has >=2 paths
> at least.
>
> 2. We want to _minimize_ switching resources, because otherwise we
> create more downtime (ie, by switching to a node which doesn't
> provide us any benefit, so that we have to switch again) than if we had
> done nothing.
>
> These requirements are slightly conflicting.
>
> R.1 is easy: just feed the attribute into the CIB raw, and let the PE do
> its thing right away - selecting a node based on a maximum value for a
> given node attribute isn't difficult. In fact, this will _converge_ on a
> correct solution, yet we violate R.2.
>
> For example, for ping nodes to monitor external connectivity, it is
> quite likely that not all of them will be reachable all the time; it's
> expected they fluctuate. If we bounce resources every time a single ping
> node hiccups for a few seconds, the admin will not be happy - the
> switch-over caused unneeded downtime.
If that's the case, it will be equally visible to all nodes in the
cluster (given the note below). And we already have this type of
hysteresis in the ping nodes' "deadtime" computation - so this isn't needed.
> Or, if the ping node goes down for
> real, all nodes will eventually see that - so it's silly to bounce
> resources because n1 has already noticed while n2 hasn't _yet_.
This part is exactly what ipfail currently does. And, without any
special effort for the previous case, there have been no complaints
about it.
> So, R.2 requires that we dampen the events, and just trigger the PE
> after the situation had a chance to stabilize. (Or, as any update to the
> CIB triggers the PE, this for us means to not update the CIB before it
> has stabilized a bit.)
>
> I think we a) need to average the N(n,t) metric over a configurable
> history - this will dampen at a per-node level and prevent minor hiccups
> from a single node to bounce resources, unless the error re-occurs
> frequently.
I disagree on this approach. It's more complicated than needed and only
works with integer values.
When someone tells you to set an attribute value with hysteresis, then
go ahead and set it internally now, but delay a specified amount before
notifying the CRM/PE that it has changed. This is what we do in
reporting newly-added nodes (and it's basically what ipfail does):
Here is how it might work:
    Hysteresis set to 3 time units

    time 1:  node A changes the value of its node attribute to 2
    time 2:  node B changes the value of its node attribute to 2
    time 4:  CIB reports the CIB update to the CRM/PE.
             No action is taken, because the values are now
             the same.
    (ping device recovers)
    time 21: node A changes the value of its node attribute to 3
    time 22: node B changes the value of its node attribute to 3
    time 24: CIB reports the CIB update to the CRM/PE.
             No action is taken, because the values are now
             the same.
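
A minimal Python sketch of the delayed-notification idea above. It is
not heartbeat code: the class name, the `set`/`tick` methods and the
`notify` callback are all hypothetical, and time is modelled as plain
integers. It assumes the timer starts at the first change in a quiet
period and is not restarted by later changes, which is what the
timeline above implies (changes at times 1 and 2, report at time 4).

```python
class DampenedAttributes:
    """Record attribute changes at once, but report them in delayed batches."""

    def __init__(self, delay, notify):
        self.delay = delay      # hysteresis, in abstract time units
        self.notify = notify    # hypothetical stand-in for "tell the CRM/PE"
        self.values = {}        # current values, always up to date internally
        self.deadline = None    # time at which the pending batch is reported

    def set(self, now, node, value):
        if self.values.get(node) == value:
            return              # nothing changed, so no timer is started
        self.values[node] = value
        if self.deadline is None:
            # The first change starts the timer; later changes join the
            # same batch instead of restarting it.
            self.deadline = now + self.delay

    def tick(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.notify(now, dict(self.values))


reports = []
attrs = DampenedAttributes(delay=3,
                           notify=lambda t, snap: reports.append((t, snap)))
attrs.values.update({"A": 3, "B": 3})   # assumed starting state

# Replay the timeline: the ping device fails, then recovers.
events = {1: ("A", 2), 2: ("B", 2), 21: ("A", 3), 22: ("B", 3)}
for t in range(1, 26):
    if t in events:
        attrs.set(t, *events[t])
    attrs.tick(t)

for t, snap in reports:
    print(t, snap)
```

Replaying the four changes produces only two reports, at times 4 and 24,
and in both of them nodes A and B carry the same value, so a placement
decision based on the maximum would have nothing to move.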
Without this change, here's what happens (worst case):
    time 1:  node A changes the attribute to 2
             CIB reports change to CRM/PE
             CRM->PE->TE->move resources around
    time 2:  node B changes the attribute to 2
             CIB reports change to CRM/PE
             CRM->PE->TE->move resources back where they were
    (ping device recovers)
    time 21: node A changes the value of its node attribute to 3
             CIB reports change to CRM/PE
             CRM->PE->TE->move resources around
    time 22: node B changes the value of its node attribute to 3
             CIB reports change to CRM/PE
             CRM->PE->TE->move resources around
In this case up to 4 outages were incurred when none was needed.
This was a cause of major complaints with early versions of
ipfail.
Of course, given the difficulties you've seen with CIB consistency, this
may be an evil choice. If so, then by all means, say so...
There's probably another way...
> But, this isn't enough; it's still a black-or-white decision whether
> that value is higher or smaller than some other nodes.
>
> So, b), we can't do a black-or-white decision, but we need to be able to
> say "N(n_1) greater than all other N(n_x) by d".
Take care of this yourself - in the attribute values. For example,
don't report temperatures in tenths of a degree. Report them as being
OK, too warm, and way too warm with just 3 values. Unlike the
hysteresis, this is easily done by the monitor processes.
Or you could report them as:
"green"
"yellow"
"red"
and write corresponding rules with arbitrary weights...
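
As a rough illustration of doing this inside a monitor process, here is
a Python sketch. The thresholds, the weights, the node names and the
readings are all made up for the example, and the weight table is only
a stand-in for whatever rules you would actually write; it is not
heartbeat rule syntax.

```python
def classify(temp_c, warm=60.0, hot=75.0):
    """Collapse a raw temperature into one of three coarse values.

    The 60/75 degree thresholds are illustrative assumptions.
    """
    if temp_c >= hot:
        return "red"
    if temp_c >= warm:
        return "yellow"
    return "green"


# Hypothetical weights a rule might attach to each reported value.
WEIGHTS = {"green": 0, "yellow": -50, "red": -1000}

# Made-up raw readings, in degrees Celsius, one per node.
readings = {"n1": 58.3, "n2": 59.9, "n3": 61.2, "n4": 80.0}

status = {node: classify(temp) for node, temp in readings.items()}
scores = {node: WEIGHTS[value] for node, value in status.items()}

for node in sorted(readings):
    print(node, readings[node], status[node], scores[node])
```

Nodes n1 and n2 differ by more than a degree but report the same value,
so a small drift in the raw reading never changes the attribute and
gives the PE no reason to move anything.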
> This is, I think, sufficient to achieve the desired effect.
I believe that you can already do this by giving a rule whose weight is
an attribute value.
Hysteresis is something the monitor agents can only do by themselves
with great effort. Everything else you proposed is easily built into the
monitoring agents.
I would suggest putting nothing in the CRM that the monitoring agents
can easily do themselves.
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce