[Linux-HA] ipfail in V2

Alan Robertson alanr at unix.sh
Fri Oct 21 10:50:57 MDT 2005


Andrew Beekhof wrote:
> On 10/21/05, Alan Robertson <alanr at unix.sh> wrote:
> 
> [snip]

[more-snip]

>>Is this a one-shot timer?  What happens when you get conflicting
>>"repokes"?  Does the last one win?  That might be OK - at least for
>>events with similar hysteresis intervals.
> 
> 
> the repoke is controlled by the DC.
> it is started when the DC enters the idle state and cancelled if it
> ever moves out of it.
> so there is never a conflict - because there is only 1 timer and only
> 1 node running it.
> 
> what you're thinking of is a timer running in the CIB.

Yes.

> you'd need to indicate somehow that this change should start/extend a timer.

Almost.  Start or shorten.  Never extend (AFAIK).

The algorithm is this:
	Is there a delayed notification timer running?
	  If yes, see how much time is left on it.
	    If the current update is accompanied by a timer, and that
	    timer is shorter than the time remaining, cancel the current
	    delayed notification timeout and start a new timer with the
	    timeout that came with the update.
	  If no timer is running, start a new timer with the value that
	  came with the update (if any).
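For illustration, here is a minimal Python sketch of the start-or-shorten rule above. The class and method names (DelayedNotifier, on_update, remaining, fire_if_due) are hypothetical and not part of any Heartbeat or CRM API. Time is passed in as plain seconds instead of coming from a real timer, so the logic can be checked on its own.

```python
from typing import Optional


class DelayedNotifier:
    """Tracks one cluster-wide delayed-notification deadline.

    Hypothetical sketch: callers pass the current time (seconds) in
    explicitly instead of the class reading a real clock.
    """

    def __init__(self) -> None:
        self.deadline: Optional[float] = None  # absolute time the timer fires

    def remaining(self, now: float) -> Optional[float]:
        """Time left on the running timer, or None if no timer is running."""
        if self.deadline is None:
            return None
        return max(0.0, self.deadline - now)

    def on_update(self, now: float, delay: Optional[float]) -> None:
        """Apply the start-or-shorten rule for an update carrying `delay`."""
        if delay is None:
            return  # update carries no timer: leave any running timer alone
        left = self.remaining(now)
        if left is None or delay < left:
            # No timer yet, or the new one is shorter: (re)start it.
            # A longer delay never extends the existing deadline.
            self.deadline = now + delay

    def fire_if_due(self, now: float) -> bool:
        """Return True (and clear the timer) once the deadline has passed."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False
```

An update arriving with a longer delay than the time remaining simply leaves the existing deadline in place, which is the "never extend" property.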

The idea is that the command for updating the attribute value would 
take a flag specifying the delayed notification value, for example:
	crm_attribute -d -n node --set --attribute foo --value bar
		(or whatever it is)

> you'd also need to keep track of which updates have been sent out,
> otherwise you'll start confusing clients because the order will be all
> messed up.

I don't see this as an issue.  The order in which they were updated is 
irrelevant.  They would all appear to the clients to be updated 
simultaneously - which would be perfect.  Or possibly I misunderstood 
you here.

The idea is that you set the delayed notification value to the "settling 
time" for the events which change the value of this attribute.  It's 
kind of like debouncing a switch in software (except in this case it's 
at an even higher level - for the whole cluster).
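A rough Python sketch of that debouncing idea, under a few assumptions: each attribute write is recorded at once, the notification is held back until the settling time elapses, and every pending change then goes out to clients as one batch, so they all appear to change simultaneously. A common software debounce restarts its timer on every bounce; this sketch instead keeps the start-or-shorten rule from the algorithm earlier in this message. AttributeDebouncer, update, tick and the attribute names in the usage note are hypothetical, and the rule that a later write to the same attribute replaces an earlier pending one is also an assumption.

```python
from typing import Dict, List, Optional


class AttributeDebouncer:
    """Hypothetical sketch of cluster-level debouncing of attribute updates.

    Writes are recorded immediately, but clients are only notified once the
    settling time has elapsed, and then receive every pending change in a
    single batch. Time (seconds) is passed in explicitly.
    """

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}          # attribute -> latest value
        self.deadline: Optional[float] = None      # when the batch goes out
        self.delivered: List[Dict[str, str]] = []  # batches sent to clients

    def update(self, now: float, name: str, value: str, settle: float) -> None:
        # Assumption: a later write to the same attribute replaces the
        # earlier pending one, so clients only see the settled value.
        self.pending[name] = value
        # Start-or-shorten: a shorter settling time may pull the deadline
        # in; a longer one never pushes it out.
        if self.deadline is None or now + settle < self.deadline:
            self.deadline = now + settle

    def tick(self, now: float) -> None:
        """Deliver all pending changes together once the deadline passes."""
        if self.deadline is not None and now >= self.deadline:
            self.delivered.append(dict(self.pending))
            self.pending.clear()
            self.deadline = None
```

For example, three writes to a hypothetical ping_count attribute (values "0", "1", "2" at t=0, 1, 2) plus a write of "down" to link at t=3, all with a 5-second settling time, produce one batch {"ping_count": "2", "link": "down"} when tick(5.0) runs.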

> you could even have a situation where the update doesn't even exist
> anymore because the whole CIB was replaced in the meantime.

Help me with this one please.  I don't follow this.


[more snip]

>>By the way, I'm not 100% sure that having this interval be the shortest
>>is always the best choice.  Having it be the longest might be a better
>>choice in some circumstances.
>>
>>If this is true, there are circumstances when the optimal choice is
>>undecidable.  But, since this is rare - we probably shouldn't worry
>>about it _that_ much :-).
> 
> 
> sorry, you lost me here.

No worries.  Not that important.  Just getting carried away with 
possibilities.

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce

