[Linux-HA] ipfail in V2

Alan Robertson alanr at unix.sh
Fri Oct 21 07:16:16 MDT 2005

Andrew Beekhof wrote:
> On 10/21/05, Alan Robertson <alanr at unix.sh> wrote:
>>Andrew Beekhof wrote:
>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>Andrew Beekhof wrote:
>>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>>>Andrew Beekhof wrote:
>>>>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>[big snip]
>>>>>>>if you are the current winner, you should always update the CIB with
>>>>>>>your changed stable value (so that the previous step is always correct)
>>>>>>This is of no help.  It does not provide hysteresis at all.  Averaging
>>>>>>is NOT delay - and it only works for integers anyway...
>>>>>i dont recall saying average
>>>>Mea culpa.
>>>>Lars said "running average" and you said "rolling history window" - and
>>>>I mistakenly equated them.
>>>np. the idea behind my history window was to avoid updating the CIB
>>>with N if you were probably going to update it with N+/-1 in some time
>>But, it's the notification that needs to be delayed, not the updating,
>>I'm afraid.
>>>one thing that i will say (just between you, me and whoever is still
>>>paying attention) is that there is a back door in all this.
>>>I haven't advertised it because frankly its evil.
>>>So evil that even the CRM doesn't use it anymore.
>>>There is an option to say: "make this update, but dont tell anyone".
>>>The update still gets distributed (so consistency is preserved), but
>>>the CIB clients aren't told it happened.
>>>The TE is a client of the CIB.
>>>If the TE doesn't know the change happened, its not going to poke the PE.
>>>If no-one pokes the PE, nothing gets moved.
>>>I'll let you fill in the rest.
>>>In the long term, I think there is no substitute for an ipfail daemon.
>>Yes.  But, this hysteresis updating would allow it to be very simple
>>indeed, and would set a nice simple precedent for other ipfail things.
>>>Don't think I'm trying to replace that, I'm really not.
>>>But in the meantime, this combined with some of the things I've
>>>previously mentioned may be enough for people to "get by" with.
>>>>But delaying each node separately by the same amount (which is what this
>>>>will do) doesn't change anything.
>>>well i was hoping to do more than just delay the updates.  i was
>>>hoping to aggregate them and thus reduce the number of times we dip
>>>into the PE.
>>The hysteresis would in effect aggregate them - from the point of view
>>of the PE, like this:
>>        t1:     node 1 updates (hysteresis timer set for t4)
>>        t2:     node 2 updates (hysteresis timer unchanged)
>>        t3:     node 3 updates (hysteresis timer unchanged)
>>        t4:     hysteresis timer pops - all clients get notified
>>                of the new CIB.
>>This aggregates the notification, and reduces the number of times you
>>dip into the PE, but doesn't aggregate the CIB updates.
>>You may think it is evil.  It may _be_ evil.  But, it is really useful ;-)
>>The question is, after thinking about it, _is_ it really evil?
>>Could you add a timer to it to notify people in a delayed fashion
>>(rather than never notifying them)?  Of course, if you could, then you
>>could change that attribute updating command to use it, and the CIB/CRM
>>work would be completely done.
>>Writing the new ipfail and friends would be really easy then.
> there are a couple of problems with a timer like you're proposing.
> the first is that it may be nullified by an unrelated update (ie. from
> a failed monitor) that _must_ be acted on straight away.

Understood.  This is going to happen from time to time, but rarely,
since I presume that one only updates values when something has changed,
like the number of visible ping nodes.  The moral of the story here is
to _always_ use a fairly coarse-grained measure.

In general _any_ kind of recovery action is supposed to be rare.

> the second is that ping node access isnt the only thing you'd want to
> monitor in this fashion.  so you also potentially have multiple,
> unrelated, hysteresi tripping over each other.

If one sticks to the principle that updates only happen when something 
interesting (like # of ping nodes) changes, "event tripping" should be 
rare.  And, then the shortest hysteresis interval will effectively win - 
so you don't need to have more than one of these timers active.  One 
should suffice.  If another event comes in with a shorter interval than 
the amount remaining on the current one, replace it with the shorter 
one.  Otherwise ignore it.
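
A minimal Python sketch of the single-timer rule just described, under
stated assumptions: the class and method names (HysteresisNotifier,
update, tick) are hypothetical and are not part of the CIB or CRM code,
and time is passed in as a plain number rather than read from a real
clock.  It only illustrates the "one timer, the shorter deadline wins"
idea, and batches the notification rather than the updates themselves.

```python
class HysteresisNotifier:
    """Hypothetical illustration: one one-shot timer delays notification."""

    def __init__(self):
        self.deadline = None  # absolute time the timer pops, None if idle
        self.pending = []     # updates already applied but not yet announced

    def update(self, now, name, interval):
        """Record an update made at `now` with its own hysteresis interval."""
        self.pending.append(name)
        candidate = now + interval
        # Start the timer if idle.  Otherwise replace it only when the new
        # interval is shorter than the time remaining on the current one.
        if self.deadline is None or candidate < self.deadline:
            self.deadline = candidate

    def tick(self, now):
        """Return the batch to announce if the timer has popped, else []."""
        if self.deadline is None or now < self.deadline:
            return []
        batch, self.pending, self.deadline = self.pending, [], None
        return batch


# The t1..t4 timeline from earlier in the thread, with interval 3:
n = HysteresisNotifier()
n.update(1, "node1", 3)   # timer set for t4
n.update(2, "node2", 3)   # timer unchanged
n.update(3, "node3", 3)   # timer unchanged
early = n.tick(3)         # [] - nothing announced yet
batch = n.tick(4)         # ["node1", "node2", "node3"] - one notification
```

With a long interval pending (say temperature at 60) and a ping node
update arriving with interval 2, the timer would be replaced by the
shorter one, which is the behaviour argued for above.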

The hysteresis interval for ping nodes is "keepalive" time - which is 
typically short - minimizing this danger _for ping nodes_.  For 
temperature, the hysteresis interval might be a minute or maybe even more.

But, if this happens it means it's getting hotter and ping nodes have 
both gone out at the same time...  [Sounds hinky to me].

> third, it sounds a lot like work :-)

Does having a single "repoke" interval like I described make it any 
easier?  From what you say below, it may be much closer to being done 
than I had thought.  Minor changes to the repoke interval (or a clone of 
the code) might be just what the doctor ordered.

> there is also the ability to set a "repoke" interval for the PE -
> would that be sufficient (again, only as a short-term option)?

Is this a one-shot timer?  What happens when you get conflicting 
"repokes"?  Does the last one win?  That might be OK - at least for 
events with similar hysteresis intervals.

> the other option is to later trigger the change with an extra update
> that doesnt use the super-top-secret flag.

By the way, I'm not 100% sure that having this interval be the shortest 
is always the best choice.  Having it be the longest might be a better 
choice in some circumstances.

If this is true, there are circumstances when the optimal choice is 
undecidable.  But, since this is rare - we probably shouldn't worry 
about it _that_ much :-).

     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
