[Linux-HA] ipfail in V2
alanr at unix.sh
Fri Oct 21 07:16:16 MDT 2005
Andrew Beekhof wrote:
> On 10/21/05, Alan Robertson <alanr at unix.sh> wrote:
>>Andrew Beekhof wrote:
>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>Andrew Beekhof wrote:
>>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>>>Andrew Beekhof wrote:
>>>>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>>>>if you are the current winner, you should always update the CIB with
>>>>>>>your changed stable value (so that the previous step is always correct)
>>>>>>This is of no help. It does not provide hysteresis at all. Averaging
>>>>>>is NOT delay - and it only works for integers anyway...
>>>>>i dont recall saying average
>>>>Lars said "running average" and you said "rolling history window" - and
>>>>I mistakenly equated them.
>>>np. the idea behind my history window was to avoid updating the CIB
>>>with N if you were probably going to update it with N+/-1 in some time
>>But, it's the notification that needs to be delayed, not the updating,
>>>one thing that i will say (just between you, me and whoever is still
>>>paying attention) is that there is a back door in all this.
>>>I haven't advertised it because frankly it's evil.
>>>So evil that even the CRM doesn't use it anymore.
>>>There is an option to say: "make this update, but don't tell anyone".
>>>The update still gets distributed (so consistency is preserved), but
>>>the CIB clients aren't told it happened.
>>>The TE is a client of the CIB.
>>>If the TE doesn't know the change happened, it's not going to poke the PE.
>>>If no-one pokes the PE, nothing gets moved.
>>>I'll let you fill in the rest.
>>>In the long term, I think there is no substitute for an ipfail daemon.
>>Yes. But, this hysteresis updating would allow it to be very simple
>>indeed, and would set a nice simple precedent for other ipfail things.
>>>Don't think I'm trying to replace that, I'm really not.
>>>But in the meantime, this combined with some of the things I've
>>>previously mentioned may be enough for people to "get by" with.
>>>>But delaying each node separately by the same amount (which is what this
>>>>will do) doesn't change anything.
>>>well i was hoping to do more than just delay the updates. i was
>>>hoping to aggregate them and thus reduce the number of times we dip
>>>into the PE.
>>The hysteresis would in effect aggregate them - from the point of view
>>of the PE, like this:
>> t1: node 1 updates (hysteresis timer set for t4)
>> t2: node 2 updates (hysteresis timer unchanged)
>> t3: node 3 updates (hysteresis timer unchanged)
>> t4: hysteresis timer pops - all clients get notified
>> of the new CIB.
>>This aggregates the notification, and reduces the number of times you
>>dip into the PE, but doesn't aggregate the CIB updates.
>>You may think it is evil. It may _be_ evil. But, it is really useful ;-)
>>The question is, after thinking about it, _is_ it really evil?
>>Could you add a timer to it to notify people in a delayed fashion
>>(rather than never notifying them)? Of course, if you could, then you
>>could change that attribute updating command to use it, and the CIB/CRM
>>work would be completely done.
>>Writing the new ipfail and friends would be really easy then.
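A minimal sketch of the t1..t4 timeline above, in Python. This is not Heartbeat or CRM code: the class name, the dict standing in for the CIB, and the explicit `now` argument (used instead of a real timer so the example is deterministic) are all assumptions made for illustration. It only shows the idea that updates are applied right away while clients get a single delayed notification.

```python
class CoalescingNotifier:
    """Apply updates immediately; notify clients once per hysteresis window."""

    def __init__(self, delay):
        self.delay = delay          # hysteresis interval, arbitrary time units
        self.deadline = None        # when the pending notification is due
        self.state = {}             # hypothetical stand-in for the CIB
        self.notifications = []     # (time, snapshot) pairs sent to clients

    def update(self, now, key, value):
        self.state[key] = value     # the update itself is never delayed
        if self.deadline is None:   # only the first update arms the timer
            self.deadline = now + self.delay

    def tick(self, now):
        """Deliver one notification if the timer has popped."""
        if self.deadline is not None and now >= self.deadline:
            self.notifications.append((now, dict(self.state)))
            self.deadline = None


notifier = CoalescingNotifier(delay=3)
notifier.update(1, "node1", 2)   # t1: timer set for t4
notifier.update(2, "node2", 1)   # t2: timer unchanged
notifier.update(3, "node3", 2)   # t3: timer unchanged
notifier.tick(3)                 # timer has not popped yet
notifier.tick(4)                 # t4: one notification covering all updates
```

Three updates produce one notification here, which is the "aggregates the notification, but doesn't aggregate the CIB updates" behaviour described above.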
> there are a couple of problems with a timer like you're proposing.
> the first is that it may be nullified by an unrelated update (ie. from
> a failed monitor) that _must_ be acted on straight away.
Understood that this is going to happen from time to time - but rarely -
since I presume that one only updates values when something has changed
- like the number of visible ping nodes. The moral of the story for
this is _always_ use a fairly coarse granularity measure.
In general _any_ kind of recovery action is supposed to be rare.
> the second is that ping node access isn't the only thing you'd want to
> monitor in this fashion. so you also potentially have multiple,
> unrelated, hysteresi tripping over each other.
If one sticks to the principle that updates only happen when something
interesting (like # of ping nodes) changes, "event tripping" should be
rare. And, then the shortest hysteresis interval will effectively win -
so you don't need to have more than one of these timers active. One
should suffice. If another event comes in with a shorter interval than
the amount remaining on the current one, replace it with the shorter
one. Otherwise ignore it.
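A rough sketch of that single-timer rule, in Python. The class and method names are hypothetical, time is passed in explicitly rather than read from a clock, and the intervals in the usage lines (60 for temperature, 5 for ping nodes) are made-up values for illustration only.

```python
class HysteresisTimer:
    """One shared timer; a new request only wins if it would pop sooner."""

    def __init__(self):
        self.deadline = None  # absolute time at which the timer pops

    def request(self, now, interval):
        """Arm the timer, or shorten it. Returns True if the deadline changed."""
        candidate = now + interval
        if self.deadline is None or candidate < self.deadline:
            self.deadline = candidate   # shorter than what remains: replace
            return True
        return False                    # otherwise ignore the new request

    def pop(self, now):
        """Return True (and disarm) if the timer is due at time `now`."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False


timer = HysteresisTimer()
timer.request(now=0, interval=60)    # temperature event: due at t=60
timer.request(now=10, interval=5)    # ping event: 5 < 50 remaining, due at t=15
timer.request(now=12, interval=30)   # would pop at t=42, later than t=15: ignored
```

Switching to a "longest interval wins" policy would only mean flipping the comparison in `request`.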
The hysteresis interval for ping nodes is "keepalive" time - which is
typically short - minimizing this danger _for ping nodes_. For
temperature, the hysteresis interval might be a minute or maybe even more.
But, if this happens it means it's getting hotter and ping nodes have
both gone out at the same time... [Sounds hinky to me].
> third, it sounds a lot like work :-)
Does having a single "repoke" interval like I described make it any
easier? From what you say below, it may be much closer to being done
than I had thought. Minor changes to the repoke interval (or a clone of
the code) might be just what the doctor ordered.
> there is also the ability to set a "repoke" interval for the PE -
> would that be sufficient (again, only as a short-term option)?
Is this a one-shot timer? What happens when you get conflicting
"repokes"? Does the last one win? That might be OK - at least for
events with similar hysteresis intervals.
> the other option is to later trigger the change with an extra update
> that doesn't use the super-top-secret flag.
By the way, I'm not 100% sure that having this interval be the shortest
is always the best choice. Having it be the longest might be a better
choice in some circumstances.
If this is true, there are circumstances when the optimal choice is
undecidable. But, since this is rare - we probably shouldn't worry
about it _that_ much :-).
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William