[Linux-HA] ipfail in V2

Andrew Beekhof beekhof at gmail.com
Thu Oct 20 14:19:17 MDT 2005


On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> Andrew Beekhof wrote:
> > On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>Andrew Beekhof wrote:
> >>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>>>Andrew Beekhof wrote:
> >>>>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>>>>>Andrew Beekhof wrote:
> >>>>>>>On 10/20/05, Lars Marowsky-Bree <lmb at suse.de> wrote:
> >>>>>>>>On 2005-10-20T13:22:30, Simon Rowe <srowe at cambridgebroadband.com> wrote:
> >>>>>>>>
> >>>>>>>>>>(OK, or wait for someone on the linux-ha team to write it.)
> >>>>>>>>>I don't have the time for either. I'll have to dump V2 and see if V1
> >>>>>>>>>provides sufficient working functionality.
> >>>>>>>>I suppose this being Open Source, you could find a developer who'd take
> >>>>>>>>a bribe and implement this feature for you for less than you'd have paid
> >>>>>>>>for a single node license of a commercial clustering product...
> >>>>>>>While in principle I personally would never take bribes, in practice I
> >>>>>>>have no principles :-)
> >>>>>>>
> >>>>>>>With what we have now in CVS, we can get a reasonably close
> >>>>>>>approximation of ipfail.
> >>>>>>>You won't get true damping, so some ping-ponging of resources may still occur.
> >>>>>>>It may also not be as efficient as we'd like it to be.
> >>>>>>>
> >>>>>>>lmb: random thought... If the "ipfail RA" checked the values on the
> >>>>>>>other nodes before it updated the CIB, then we could avoid dipping
> >>>>>>>into the PE. The check would be trivial, just specify a different
> >>>>>>Writing the code to do this is not trivial...
> >>>>>>
> >>>>>>
> >>>>>>Let's see ... you need to write a join protocol, and then you need to
> >>>>>>exchange votes with everyone else, and...
> >>>>>>
> >>>>>>Right now, they have NO communication with other nodes at all.
> >>>>>You don't need any of that. Query the CIB:
> >>>>>
> >>>>>crm_attribute -G -U some_host -n the_attribute -t nodes
> >>>>How does this help with hysteresis?
> >>>>
> >>>>You need to know the value they have of the attribute _before they
> >>>>update it_.  Because, once they update it, boom, you're off on a
> >>>>reconfigure-the-cluster mission.  It's too late then.
> >>>>
> >>>This was the quick sketch that I outlined to lmb.
> >>>
> >>>You'd do it as part of a monitor-like repeating action.
> >>>
> >>>So first it would wait until its value had stabilized (the RA could
> >>>store a rolling history window - not trivial, but also not rocket
> >>>science).
> >>>
> >>>Then, once it's stabilized, you check whether your value exceeds the
> >>>existing "winner's" value by some threshold.
> >>This is irrelevant.  You only know the old winner's value.  Irrelevant
> >>to the current situation.
> >>
> >>>If you are the current winner, you should always update the CIB with
> >>>your changed stable value (so that the previous step is always correct).
> >>This is of no help.  It does not provide hysteresis at all.  Averaging
> >>is NOT delay - and it only works for integers anyway...
> >
> > I don't recall saying average.
>
> Mea culpa.
>
> Lars said "running average" and you said "rolling history window" - and
> I mistakenly equated them.

np. The idea behind my history window was to avoid updating the CIB
with N if you were probably going to update it with N+/-1 within a
short time interval.
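A minimal Python sketch of that history window, assuming a hypothetical `HistoryWindow` class that a monitor-like action feeds with one integer sample per interval (say, how many ping targets are reachable). It stays quiet until the last `size` samples agree within `tolerance`, and then only reports the stable value if it differs from the last reported one by more than `tolerance`, so N is never rewritten as N+/-1. Actually writing the value to the CIB is left to the caller.

```python
from collections import deque


class HistoryWindow:
    """Hypothetical per-node damping helper: suppress CIB updates that
    would soon be followed by an N+/-1 correction."""

    def __init__(self, size=5, tolerance=1):
        self.samples = deque(maxlen=size)  # rolling window of recent samples
        self.tolerance = tolerance         # allowed wobble while still "stable"
        self.published = None              # last value handed to the caller

    def add(self, value):
        """Record one sample; return a value worth publishing, or None."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return None                    # not enough history yet
        if max(self.samples) - min(self.samples) > self.tolerance:
            return None                    # still fluctuating
        stable = self.samples[-1]
        if self.published is not None and abs(stable - self.published) <= self.tolerance:
            return None                    # only an N+/-1 wobble: skip the update
        self.published = stable
        return stable
```

With `size=3` and `tolerance=1`, the samples 2, 2, 2, 3, 2, 0, 0, 0 yield one update (2) once the window fills and a second (0) only after three consecutive zeros; the 2/3 wobble never reaches the CIB. This is per-node damping only and does not address the cross-node hysteresis problem.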


One thing that I will say (just between you, me and whoever is still
paying attention) is that there is a back door in all this.

I haven't advertised it because, frankly, it's evil.
So evil that even the CRM doesn't use it anymore.
There is an option to say: "make this update, but don't tell anyone".

The update still gets distributed (so consistency is preserved), but
the CIB clients aren't told it happened.

The TE is a client of the CIB.
If the TE doesn't know the change happened, it's not going to poke the PE.
If no-one pokes the PE, nothing gets moved.

I'll let you fill in the rest.
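To make the chain above concrete, here is a toy Python model. `ToyCIB`, `ToyTE`, `update()` and the `quiet` flag are hypothetical stand-ins invented for illustration; the real CIB option's name and interface are not given in this message. The model assumes one possible reading of "the rest": observers write their values quietly as they change, and a later normal update makes the TE (and hence the PE) look at all of them in a single pass.

```python
class ToyCIB:
    """Hypothetical stand-in for the CIB: stores attributes on every node
    and notifies registered clients unless the update is quiet."""

    def __init__(self, node_names):
        # One replica per node, so every update is "distributed".
        self.replicas = {name: {} for name in node_names}
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def update(self, key, value, quiet=False):
        for replica in self.replicas.values():
            replica[key] = value            # consistency preserved everywhere
        if not quiet:
            for client in self.clients:
                client.notify(key, value)   # only normal updates are announced


class ToyTE:
    """Hypothetical stand-in for the TE: each notification pokes the PE."""

    def __init__(self):
        self.pe_runs = 0

    def notify(self, key, value):
        self.pe_runs += 1                   # one poke means one PE computation


cib = ToyCIB(["node1", "node2"])
te = ToyTE()
cib.register(te)
cib.update("ping_count_node1", 2, quiet=True)  # distributed, TE not told
cib.update("ping_count_node2", 3, quiet=True)
cib.update("ping_count_node1", 1)              # normal update: one PE run
```

After these three updates every replica holds the latest value of both attributes, but the TE has been poked only once.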

In the long term, I think there is no substitute for an ipfail daemon.
Don't think I'm trying to replace that, I'm really not.
But in the meantime, this combined with some of the things I've
previously mentioned may be enough for people to "get by" with.
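One of those previously mentioned pieces is the winner/threshold check from earlier in the thread. A minimal sketch of just that decision rule, assuming the winner's value has already been read from the CIB (for instance with the `crm_attribute` query shown above) and that `threshold` is a site-chosen integer; `should_publish` is a hypothetical name. Note that it only compares against the winner's last published value.

```python
def should_publish(my_value, winner_value, i_am_winner, threshold):
    """Hypothetical decision rule for a local value that has stabilized."""
    if i_am_winner:
        # The current winner always refreshes its stable value, so the
        # comparison the other nodes make stays correct.
        return True
    if winner_value is None:
        # Assumption: with no winner recorded yet, publish.
        return True
    # Otherwise publish only if we beat the winner by more than the threshold.
    return my_value > winner_value + threshold
```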

>
> But delaying each node separately by the same amount (which is what this
> will do) doesn't change anything.

Well, I was hoping to do more than just delay the updates. I was
hoping to aggregate them and thus reduce the number of times we dip
into the PE.
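A minimal sketch of that kind of aggregation, assuming a hypothetical `UpdateBatcher` used by a single observer: it keeps only the latest value per (node, attribute) pair and releases them as one batch once the oldest pending change is `hold_seconds` old, so the PE would be consulted once per batch rather than once per change. Time is passed in explicitly to keep the sketch deterministic.

```python
class UpdateBatcher:
    """Hypothetical coalescer: collect attribute updates and release them
    as one batch instead of one CIB write per change."""

    def __init__(self, hold_seconds=10.0):
        self.hold = hold_seconds
        self.pending = {}       # (node, attribute) -> latest value
        self.first_seen = None  # arrival time of the oldest pending update

    def add(self, now, node, attribute, value):
        if self.first_seen is None:
            self.first_seen = now
        self.pending[(node, attribute)] = value  # newer values overwrite older

    def flush_due(self, now):
        """Return the batch to write if the hold time has elapsed, else None."""
        if self.first_seen is None or now - self.first_seen < self.hold:
            return None
        batch, self.pending, self.first_seen = self.pending, {}, None
        return batch
```

On its own this batches only one observer's updates; combining batches from several nodes still needs something like option (B) below.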

>
> >>You MUST know what everyone else's values are - before they update the
> >>CRM.
> >
> > For the perfect algorithm, yes. But I never said this was trying to be.
>
> I should have been more specific - basically what you need to do is
> change the values immediately, but delay acting on them.
>
> But, once you decide to act then _at that moment_ you need to have all
> the changed values available.
>
> If you put them in the CIB in advance, then it's the CIB that has to
> delay acting on them.
>
> If you don't put them in the CIB but put the delay in the measuring
> apparatus, then you need to be able to trigger updating them in the CIB
> more-or-less simultaneously - which sounds like a bit of a trick (but an
> awesome test case).
>
> This requires that instances of the applications can communicate with
> each other.  (A) below is the simplest version I can think of (it may be
> too simple).
>
> (A) Like, when any observer sees that their hysteresis window has
> elapsed, then they tell the other observers "update the CIB now", and
> they would all commit any uncommitted changes more-or-less at once.
>
> This seems a bit risky...
>
> (B) Another more complex (but more likely to work) option is to choose a
> particular node to do all the updates at once, and they give the CIB a
> single piece of XML which contains all the updated values at once.
>
> If we choose either of these last two options (A or B), I'd suggest
> writing such a daemon once and for all and let the various updaters talk
> to these daemons rather than directly to the CIB.
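A rough sketch of option (B), assuming a hypothetical `Coordinator` running on the chosen node: observers report values to it, and on commit it builds one XML fragment holding every pending value. The element and attribute names (`node_attribute_batch`, `node`, `attribute`) are invented for illustration and are not the real CIB schema; how observers reach the coordinator, and how the fragment is handed to the CIB, are left out.

```python
import xml.etree.ElementTree as ET


def build_batch_xml(updates):
    """Build one XML fragment from {node: {attribute: value}}.
    Element names here are hypothetical, not the CIB schema."""
    root = ET.Element("node_attribute_batch")
    for node in sorted(updates):  # deterministic ordering
        node_el = ET.SubElement(root, "node", {"uname": node})
        for name in sorted(updates[node]):
            ET.SubElement(node_el, "attribute",
                          {"name": name, "value": str(updates[node][name])})
    return ET.tostring(root, encoding="unicode")


class Coordinator:
    """Hypothetical designated node for option (B)."""

    def __init__(self):
        self.pending = {}

    def report(self, node, attribute, value):
        # Later reports for the same attribute replace earlier ones.
        self.pending.setdefault(node, {})[attribute] = value

    def commit(self):
        fragment = build_batch_xml(self.pending)
        self.pending = {}
        return fragment  # a real daemon would make one CIB update with this
```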
>
> --
>      Alan Robertson <alanr at unix.sh>
>
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>


