[Linux-HA] ipfail in V2

Andrew Beekhof beekhof at gmail.com
Thu Oct 20 09:43:26 MDT 2005


On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> Andrew Beekhof wrote:
> > On 10/20/05, Alan Robertson <alanr at unix.sh> wrote:
> >>Andrew Beekhof wrote:
> >>>On 10/20/05, Lars Marowsky-Bree <lmb at suse.de> wrote:
> >>>>On 2005-10-20T13:22:30, Simon Rowe <srowe at cambridgebroadband.com> wrote:
> >>>>
> >>>>>>(OK, or wait for someone on the linux-ha team to write it.)
> >>>>>I don't have the time for either. I'll have to dump V2 and see if V1
> >>>>>provides sufficient working functionality.
> >>>>I suppose this being Open Source, you could find a developer who'd take
> >>>>a bribe and implement this feature for you for less than you'd have paid
> >>>>for a single node license of a commercial clustering product...
> >>>While in principle I personally would never take bribes, in practice I
> >>>have no principles :-)
> >>>
> >>>With what we have now in CVS, we can get a reasonably close
> >>>approximation of ipfail.
> >>>You won't get true damping - so some ping-pong'ing of resources may still occur.
> >>>It may also be not as efficient as we'd like it to be.
> >>>
> >>>lmb: random thought... if the "ipfail RA" checked the values on the
> >>>other nodes before it updated the CIB, then we could avoid dipping
> >>>into the PE.  the check would be trivial, just specify a different
> >>Writing the code to do this is not trivial...
> >>
> >>
> >>Let's see ... you need to write a join protocol, and then you need to
> >>exchange votes with everyone else, and...
> >>
> >>Right now, they have NO communication with other nodes at all.
> >
> > you don't need any of that.  query the CIB:
> >
> > crm_attribute -G -U some_host -n the_attribute -t nodes
>
> How does this help with hysteresis?
>
> You need to know the value they have of the attribute _before they
> update it_.  Because, once they update it, boom, you're off on a
> reconfigure-the-cluster mission.  It's too late then.
>

this was the quick sketch that i outlined to lmb.

you'd do it as part of a monitor-like repeating action.

so first it would wait until its value had stabilized (the RA could
store a rolling history window - not trivial but also not rocket
science).

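then once it's stabilized, you check if your value exceeds the existing
"winner"'s by some threshold.  only if it does do you update the CIB;
otherwise you leave the CIB alone, so the PE is never invoked.

if you are the current winner, you should always update the CIB with
your changed stable value (so that the check in the previous step is
always made against a correct value).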
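
as a rough illustration of the steps above, here is a minimal sketch in
Python. it is not code from heartbeat or from any existing RA: the class
name, the window/tolerance/threshold parameters and the exact meaning of
"stable" (all samples in the window within a tolerance of each other) and
of "exceeds by some threshold" (strictly greater than winner + threshold)
are assumptions made for the example. reading the other nodes' values and
writing your own to the CIB are deliberately left out; reading could be
done with the crm_attribute query quoted above.

```python
from collections import deque


class DampedScore:
    """Hypothetical helper: damp a connectivity score before publishing it."""

    def __init__(self, window=5, tolerance=0, threshold=1):
        # Rolling history of the most recent samples (oldest dropped first).
        self.history = deque(maxlen=window)
        self.tolerance = tolerance  # max spread still counted as "stable"
        self.threshold = threshold  # margin needed to displace the winner

    def record(self, value):
        """Store one sample; call once per monitor-like repeating action."""
        self.history.append(value)

    def stable_value(self):
        """Return the latest sample if the window is full and steady, else None."""
        if len(self.history) < self.history.maxlen:
            return None
        if max(self.history) - min(self.history) > self.tolerance:
            return None
        return self.history[-1]

    def should_update(self, published, winner_value, is_winner):
        """Decide whether the stable value should be written to the CIB.

        published    -- the value this node last wrote to the CIB
        winner_value -- the current winner's value as read from the CIB
        is_winner    -- True if this node is the current winner
        """
        value = self.stable_value()
        if value is None or value == published:
            return False  # still fluctuating, or nothing new to publish
        if is_winner:
            return True  # the winner always publishes its changed stable value
        # Assumption: "exceeds by some threshold" means strictly greater
        # than the winner's value plus the threshold.
        return value > winner_value + self.threshold
```

with window=3 and threshold=1, a node that has seen 2, 2, 2 would stay
quiet while the winner is at 2, and would only publish once the winner's
value is below 1 (or if it is the winner itself and its own value has
changed).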



I'm not trying to say this is perfect, but it gets you pretty close.


