[Linux-HA] ipfail in V2

Alan Robertson alanr at unix.sh
Thu Oct 20 11:50:52 MDT 2005


Andrew Beekhof wrote:
> On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>Andrew Beekhof wrote:
>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>Andrew Beekhof wrote:
>>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
>>>>>>Andrew Beekhof wrote:
>>>>>>>On 10/20/05, Lars Marowsky-Bree <lmb at suse.de> wrote:
>>>>>>>>On 2005-10-20T13:22:30, Simon Rowe <srowe at cambridgebroadband.com> wrote:
>>>>>>>>
>>>>>>>>>>(OK, or wait for someone on the linux-ha team to write it.)
>>>>>>>>>I don't have the time for either. I'll have to dump V2 and see if V1
>>>>>>>>>provides sufficient working functionality.
>>>>>>>>I suppose this being Open Source, you could find a developer who'd take
>>>>>>>>a bribe and implement this feature for you for less than you'd have paid
>>>>>>>>for a single node license of a commercial clustering product...
>>>>>>>While in principle I personally would never take bribes, in practice I
>>>>>>>have no principles :-)
>>>>>>>
>>>>>>>With what we have now in CVS, we can get a reasonably close
>>>>>>>approximation of ipfail.
>>>>>>>You won't get true damping - so some ping-pong'ing of resources may still occur.
>>>>>>>It may also not be as efficient as we'd like it to be.
>>>>>>>
>>>>>>>lmb: random thought... if the "ipfail RA" checked the values on the
>>>>>>>other nodes before it updated the CIB, then we could avoid dipping
>>>>>>>into the PE.  the check would be trivial, just specify a different
>>>>>>Writing the code to do this is not trivial...
>>>>>>
>>>>>>
>>>>>>Let's see ... you need to write a join protocol, and then you need to
>>>>>>exchange votes with everyone else, and...
>>>>>>
>>>>>>Right now, they have NO communication with other nodes at all.
>>>>>you don't need any of that.  query the CIB:
>>>>>
>>>>>crm_attribute -G -U some_host -n the_attribute -t nodes
>>>>How does this help with hysteresis?
>>>>
>>>>You need to know the value they have of the attribute _before they
>>>>update it_.  Because, once they update it, boom, you're off on a
>>>>reconfigure-the-cluster mission.  It's too late then.
>>>>
>>>this was the quick sketch that i outlined to lmb.
>>>
>>>you'd do it as part of a monitor-like repeating action.
>>>
>>>so first it would wait until its value had stabilized (the RA could
>>>store a rolling history window - not trivial but also not rocket
>>>science).
>>>
>>>then once it's stabilized, you check if your value exceeds the existing
>>>"winner"'s by some threshold.
>>This is irrelevant.  You only know the old winner's value.  Irrelevant
>>to the current situation.
>>
>>>if you are the current winner, you should always update the CIB with
>>>your changed stable value (so that the previous step is always correct)
>>This is of no help.  It does not provide hysteresis at all.  Averaging
>>is NOT delay - and it only works for integers anyway...
> 
> i don't recall saying average

Mea culpa.

Lars said "running average" and you said "rolling history window" - and 
I mistakenly equated them.
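Because the difference between a running average and a rolling history window matters here, a minimal Python sketch of both may help. It assumes the samples are integer scores and that "stable" means the last `n` samples are identical. `running_average` and `stable_value` are hypothetical helpers, not Heartbeat code.

```python
from collections import deque
from typing import Iterable, List, Optional


def running_average(samples: Iterable[float], n: int) -> List[float]:
    """Mean of the last n samples at each step. It smooths the value but
    still reports a changed number immediately, so it adds no real delay."""
    window: deque = deque(maxlen=n)
    out: List[float] = []
    for s in samples:
        window.append(s)
        out.append(sum(window) / len(window))
    return out


def stable_value(samples: Iterable[int], n: int) -> List[Optional[int]]:
    """Rolling history window: report a value only once the last n samples
    all agree; until then report None ("not yet stable")."""
    window: deque = deque(maxlen=n)
    out: List[Optional[int]] = []
    for s in samples:
        window.append(s)
        if len(window) == n and len(set(window)) == 1:
            out.append(s)
        else:
            out.append(None)
    return out


samples = [3, 3, 3, 1, 3, 1, 1, 1]
```

On this sample sequence with `n=3`, `stable_value` returns `[None, None, 3, None, None, None, None, 1]`, while the running average starts drifting away from 3 as soon as the first 1 arrives.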

But delaying each node separately by the same amount (which is what this 
will do) doesn't change anything: each node still updates the CIB on its 
own, so the same ping-ponging just happens later.
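The next Python sketch is a small simulation of why an equal per-node delay changes nothing. It relies on simplified assumptions: each node's score is an integer, the highest published score wins (ties go to the lower node name), and every node publishes its score `delay` ticks late. `winners` and `failovers` are hypothetical helpers, not Heartbeat code.

```python
from typing import Dict, List


def winners(scores: Dict[str, List[int]], delay: int) -> List[str]:
    """Winner at each tick when every node publishes its score to the CIB
    `delay` ticks late, each node delayed independently by the same amount."""
    ticks = len(next(iter(scores.values())))
    result: List[str] = []
    for t in range(ticks):
        src = max(t - delay, 0)  # before any delayed sample arrives, hold tick 0
        # Highest published score wins; sorted() makes ties go to the lower name.
        result.append(max(sorted(scores), key=lambda n: scores[n][src]))
    return result


def failovers(seq: List[str]) -> int:
    """How many times the resource moves (the winner changes)."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


scores = {"node1": [2, 1, 2, 1, 2, 2, 2, 2],
          "node2": [1, 2, 1, 2, 1, 1, 1, 1]}
undelayed, delayed = winners(scores, 0), winners(scores, 2)
```

Both runs move the resource four times. The delayed run contains the same ping-pong sequence, shifted two ticks later.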

>>You MUST know what everyone else's values are - before they update the
>>CRM.
> 
> for the perfect algorithm, yes.  but i never said this was trying to be.

I should have been more specific - basically what you need to do is 
change the values immediately, but delay acting on them.

But, once you decide to act then _at that moment_ you need to have all 
the changed values available.

If you put them in the CIB in advance, then it's the CIB that has to 
delay acting on them.
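The following is a minimal Python sketch of "change the values immediately, but delay acting on them", placed on the consumer side (the variant where the CIB has to delay acting). `HoldDownDecider`, the integer scores and the `hold` tick count are all hypothetical. This is not how the CRM or its policy engine actually works.

```python
from typing import Dict, Optional


class HoldDownDecider:
    """Consumer-side hysteresis (hypothetical, not the CRM's policy engine).

    Node values are recorded the moment they change, but moving the resource
    is deferred until a different node has been the strict best for `hold`
    consecutive ticks. When the move happens, every node's latest value is
    already available in self.values.
    """

    def __init__(self, hold: int) -> None:
        self.hold = hold
        self.values: Dict[str, int] = {}
        self.active: Optional[str] = None      # node currently running the resource
        self.challenger: Optional[str] = None  # node that would win right now
        self.streak = 0                        # consecutive ticks the challenger has led

    def update(self, node: str, value: int) -> None:
        # Change the value immediately...
        self.values[node] = value

    def tick(self) -> Optional[str]:
        # ...but delay acting on it.
        if not self.values:
            return self.active
        best = max(sorted(self.values), key=self.values.__getitem__)
        if self.active is not None and self.values[self.active] >= self.values[best]:
            best = self.active  # a tie never moves the resource
        if self.active is None:
            self.active = best
        elif best == self.active:
            self.challenger, self.streak = None, 0
        else:
            if best == self.challenger:
                self.streak += 1
            else:
                self.challenger, self.streak = best, 1
            if self.streak >= self.hold:
                self.active, self.challenger, self.streak = best, None, 0
        return self.active
```

With `hold=3`, a one-tick blip on another node does not move the resource. A change that persists for three ticks does.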

If you don't put them in the CIB but put the delay in the measuring 
apparatus, then you need to be able to trigger updating them in the CIB 
more-or-less simultaneously - which sounds like a bit of a trick (but an 
awesome test case).

This requires that instances of the application be able to communicate 
with each other.  (A) below is the simplest version I can think of (it 
may be too simple).

(A) When any observer sees that its hysteresis window has elapsed, it 
tells the other observers "update the CIB now", and they all commit any 
uncommitted changes more-or-less at once.

This seems a bit risky...
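Here is an in-process Python sketch of option (A). It makes several assumptions: observers are plain objects that call each other directly instead of using a real messaging layer, time advances in discrete ticks, and the `commit` callback stands in for a CIB write. `Observer` and its methods are hypothetical.

```python
from typing import Callable, List, Optional


class Observer:
    """Hypothetical per-node observer for option (A).

    A new measurement is held back as "pending". When any observer's pending
    change has waited `window` ticks, it tells every observer (itself
    included) to commit whatever it has pending, so all changes land at
    more-or-less the same time.
    """

    def __init__(self, node: str, window: int,
                 commit: Callable[[str, int], None]) -> None:
        self.node = node
        self.window = window
        self.commit = commit  # stand-in for a CIB write
        self.peers: List["Observer"] = []
        self.committed: Optional[int] = None
        self.pending: Optional[int] = None
        self.age = 0  # ticks the pending change has waited

    def measure(self, value: int) -> None:
        if value == self.committed:
            # The change reverted before it was ever committed.
            self.pending = None
        elif value != self.pending:
            # A new change restarts this observer's window.
            self.pending, self.age = value, 0

    def tick(self) -> None:
        if self.pending is None:
            return
        self.age += 1
        if self.age >= self.window:
            # "update the CIB now", sent to everyone
            for obs in [self] + self.peers:
                obs.flush()

    def flush(self) -> None:
        if self.pending is not None:
            self.commit(self.node, self.pending)
            self.committed, self.pending, self.age = self.pending, None, 0
```

In this sketch, node2's change is committed together with node1's even though node2's own window has not elapsed. Each `flush` is still a separate write, so a consumer could in principle react between them. That may be part of what makes this option risky.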

(B) Another, more complex (but more likely to work) option is to choose 
a particular node to do all the updates, and have it give the CIB a 
single piece of XML containing all the updated values at once.
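The Python sketch below illustrates option (B). The coordinator rule (lowest node name wins) is a hypothetical choice, and the XML element and attribute names are illustrative only. They are not the real CIB schema. `the_attribute` is the placeholder name used in the `crm_attribute` example earlier in the thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def pick_coordinator(members: Iterable[str]) -> str:
    # Hypothetical rule: the node with the lowest name does all the updates.
    return min(members)


def build_batch_update(attribute: str, values: Dict[str, int]) -> str:
    """One XML fragment carrying every node's new value, so the CIB gets a
    single update. Element and attribute names are illustrative only; they
    are not the real CIB schema."""
    root = ET.Element("nodes")
    for uname in sorted(values):
        node = ET.SubElement(root, "node", uname=uname)
        ET.SubElement(node, "nvpair", name=attribute, value=str(values[uname]))
    return ET.tostring(root, encoding="unicode")
```

The coordinator would collect the other nodes' values, call `build_batch_update("the_attribute", ...)` once, and hand the resulting fragment to the CIB in a single operation.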

If we choose either of these two options (A or B), I'd suggest writing 
such a daemon once and for all, and letting the various updaters talk 
to these daemons rather than directly to the CIB.
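The next Python sketch shows the interface such a shared daemon might offer to updaters. It is written under stated assumptions: `AttributeBroker`, `submit`, and the `flush` callback are hypothetical names, time advances in ticks, and the last submitted value for a given (node, attribute) pair wins. Submissions are coalesced and flushed as one batch once the oldest unflushed submission has waited `window` ticks.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (node, attribute)


class AttributeBroker:
    """Hypothetical daemon: updaters submit attribute values here instead of
    writing to the CIB. Submissions are coalesced (last value wins) and
    flushed together once `window` ticks have passed since the oldest
    unflushed submission."""

    def __init__(self, window: int,
                 flush: Callable[[Dict[Key, str]], None]) -> None:
        self.window = window
        self.flush = flush  # stand-in for one batched CIB update
        self.pending: Dict[Key, str] = {}
        self.age = 0

    def submit(self, node: str, attribute: str, value: str) -> None:
        self.pending[(node, attribute)] = value

    def tick(self) -> None:
        if not self.pending:
            return
        self.age += 1
        if self.age >= self.window:
            self.flush(dict(self.pending))
            self.pending.clear()
            self.age = 0
```

Putting the hysteresis window in one place like this means each updater (an ipfail-like RA, for example) would not need to reimplement it.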

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce


