[Linux-HA] ipfail in V2
Andrew Beekhof
beekhof at gmail.com
Thu Oct 20 10:17:46 MDT 2005
On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> Andrew Beekhof wrote:
> > On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>Andrew Beekhof wrote:
> >>>On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>>>Andrew Beekhof wrote:
> >>>>>On 10/20/05, Lars Marowsky-Bree <lmb at suse.de> wrote:
> >>>>>>On 2005-10-20T13:22:30, Simon Rowe <srowe at cambridgebroadband.com> wrote:
> >>>>>>
> >>>>>>>>(OK, or wait for someone on the linux-ha team to write it.)
> >>>>>>>I don't have the time for either. I'll have to dump V2 and see if V1
> >>>>>>>provides sufficient working functionality.
> >>>>>>I suppose this being Open Source, you could find a developer who'd take
> >>>>>>a bribe and implement this feature for you for less than you'd have paid
> >>>>>>for a single node license of a commercial clustering product...
> >>>>>While in principle I personally would never take bribes, in practice I
> >>>>>have no principles :-)
> >>>>>
> >>>>>With what we have now in CVS, we can get a reasonably close
> >>>>>approximation of ipfail.
> >>>>>You won't get true damping, so some ping-ponging of resources may still occur.
> >>>>>It may also not be as efficient as we'd like it to be.
> >>>>>
> >>>>>lmb: random thought... if the "ipfail RA" checked the values on the
> >>>>>other nodes before it updated the CIB, then we could avoid dipping
> >>>>>into the PE. the check would be trivial, just specify a different
> >>>>Writing the code to do this is not trivial...
> >>>>
> >>>>
> >>>>Let's see ... you need to write a join protocol, and then you need to
> >>>>exchange votes with everyone else, and...
> >>>>
> >>>>Right now, they have NO communication with other nodes at all.
> >>>you don't need any of that. query the CIB:
> >>>
> >>>crm_attribute -G -U some_host -n the_attribute -t nodes
> >>How does this help with hysteresis?
> >>
> >>You need to know the value they have of the attribute _before they
> >>update it_. Because, once they update it, boom, you're off on a
> >>reconfigure-the-cluster mission. It's too late then.
> >>
> >
> > this was the quick sketch that i outlined to lmb.
> >
> > you'd do it as part of a monitor-like repeating action.
> >
> > so first it would wait until its value had stabilized (the RA could
> > store a rolling history window - not trivial but also not rocket
> > science).
> >
> > then once it's stabilized, you check if your value exceeds the existing
> > "winner"'s by some threshold.
>
> This is irrelevant. You only know the old winner's value. Irrelevant
> to the current situation.
>
> >
> > if you are the current winner, you should always update the CIB with
> > your changed stable value (so that the previous step is always correct)
>
> This is of no help. It does not provide hysteresis at all. Averaging
> is NOT delay - and it only works for integers anyway...
i don't recall saying average
>
> You MUST know what everyone else's values are - before they update the
> CRM.
for the perfect algorithm, yes. but i never said this was trying to be.
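
to make the quick sketch quoted above a bit more concrete, here's roughly what one pass of the monitor-like action could look like. this is only an illustration under assumptions: the names (StableValue, should_update), the window size and the threshold are all made up, "stable" is taken to mean "every reading in the window agrees", and the actual CIB read/write (e.g. via crm_attribute) is left out entirely.

```python
from collections import deque


class StableValue:
    """Rolling history window for one node's locally observed value.

    Assumption: a value is "stable" once the window is full and every
    entry in it is identical. No averaging is involved.
    """

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def observe(self, value):
        """Record a reading; return it if stable, otherwise None."""
        self.history.append(value)
        full = len(self.history) == self.history.maxlen
        if full and len(set(self.history)) == 1:
            return value
        return None


def should_update(stable, winner_value, is_winner, last_published, threshold=1):
    """Decide whether this node should write its value to the CIB.

    stable         -- result of StableValue.observe() (None if unstable)
    winner_value   -- the current winner's value as read from the CIB
    is_winner      -- True if this node is the current winner
    last_published -- the value this node last wrote to the CIB
    """
    if stable is None:
        return False  # still fluctuating: leave the CIB alone
    if is_winner:
        # the winner always publishes a changed stable value so that
        # the comparison made by the other nodes stays meaningful
        return stable != last_published
    # challengers only publish when they beat the winner by the threshold
    return stable - winner_value >= threshold
```

as written this only ever compares against the winner's last published value, which is exactly the limitation being argued about in this thread.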
> Knowing the old value is of no help at all.
>
> You can't know the value in another node unless you communicate outside
> of the CRM. Which gets back to my previous note about join protocol, etc.
>
> If this can't be done in the CIB (which is what it's beginning to sound
> like), then we'll have to create an update hysteresis daemon which deals
> with this. Basically, a pre-update-CIB for certain attribute values.
>
> What the ipfail code does, and this proposal does NOT do is:
>
> observe updates
> delay a while
> collect ALL current updated values
> act on them
>
> Averages actually make this worse. If I have a value like 0/1, and want
> to average it, and present an integer value, then all you've done is
> delay the exact same events by the exact same amounts.
>
> So... If the events were observed in this way:
> t0: everyone sees 3 ping nodes
> t1: node 1 sees 2 ping nodes
> t2: node 2 sees 2 ping nodes
>
> You've changed that to:
> t0: everyone sees 3 ping nodes
> t4: node 1 sees 2 ping nodes
> t5: node 2 sees 2 ping nodes
>
> This is of no help at all.
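
for comparison, the four steps listed above (observe, delay, collect, act) could be sketched like this. again an illustration only, not the real ipfail code: the function names, the (time, node, value) event tuples and the delay are assumptions, and time is simulated with plain integers.

```python
def damped_decisions(events, initial, delay):
    """Observe updates, delay a while, collect ALL current values, act.

    events  -- iterable of (time, node, value) tuples
    initial -- dict mapping node -> value before any event
    delay   -- how long to wait after the first update in a burst

    Returns a list of (decision_time, snapshot) pairs. Updates arriving
    up to and including the deadline are folded into the same snapshot.
    """
    current = dict(initial)
    decisions = []
    deadline = None
    for t, node, value in sorted(events):
        if deadline is not None and t > deadline:
            # the wait expired before this update arrived: act first
            decisions.append((deadline, dict(current)))
            deadline = None
        current[node] = value
        if deadline is None:
            deadline = t + delay  # first update of a new burst
    if deadline is not None:
        decisions.append((deadline, dict(current)))
    return decisions


def undamped_decisions(events, initial):
    """React to every single update as soon as it is seen."""
    current = dict(initial)
    decisions = []
    for t, node, value in sorted(events):
        current[node] = value
        decisions.append((t, dict(current)))
    return decisions


initial = {"node1": 3, "node2": 3}
events = [(1, "node1", 2), (2, "node2", 2)]    # t1 and t2 from the example
shifted = [(4, "node1", 2), (5, "node2", 2)]   # same events, merely delayed

damped = damped_decisions(events, initial, delay=3)
immediate = undamped_decisions(events, initial)
late = undamped_decisions(shifted, initial)
```

with the t1/t2 events and a delay of 3, the damped version acts once, at t4, on a snapshot where both nodes see 2 ping nodes. the undamped version reacts twice, and shifting the same events to t4/t5 leaves it with the same two snapshots, just later.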
>
>
> --
> Alan Robertson <alanr at unix.sh>
>
> "Openness is the foundation and preservative of friendship... Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>