[Linux-HA] Automatic Cleanup of certain resources
Andrew Beekhof
beekhof at gmail.com
Wed Feb 6 01:09:32 MST 2008
On Jan 31, 2008, at 3:53 PM, Hildebrand, Nils, 232 wrote:
> Hi Andrew,
>
>> who said anything about re-inventing?
>> "re-implementing" sure, but there aren't exactly many FOSS
>> cluster projects to be borrowing code from.
>
> Thus I tried to outline the ideas - since I just used these clusters
> for a couple of years.
>
>> granted not everything about our implementations is always
>> perfect^, but there's no need to get all preachy about it.
>
> Sorry if that came over as an insult. But I sometimes have a feeling
> that Heartbeat v2 is too general in some points. This makes it
> potentially better suited for any use case (even those no one can
> imagine yet) - but it also adds complexity to some simple tasks.
true, we need to work on that.
we've made some steps in that direction already (e.g. for location
constraints you can now simply write
<rsc_location rsc="group1" node="nodeX" score="1000"/>
so non-"power users" can skip the complexity and verbosity of rules)
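
To illustrate the difference, here is a rough sketch of the two forms side by side. The id values are hypothetical (the CIB expects an id on each element, so they are added here), and the rule-based version is an assumed equivalent written from the general rule syntax, not something taken from this post:

```xml
<!-- Short form: prefer nodeX for group1 with a score of 1000.
     The id is a made-up example value. -->
<rsc_location id="loc-group1-short" rsc="group1" node="nodeX" score="1000"/>

<!-- Assumed rule-based equivalent: the same preference expressed as a
     rule matching the node name. All ids are made-up example values. -->
<rsc_location id="loc-group1-rule" rsc="group1">
  <rule id="loc-group1-rule-1" score="1000">
    <expression id="loc-group1-expr-1" attribute="#uname"
                operation="eq" value="nodeX"/>
  </rule>
</rsc_location>
```

The short form covers the common "prefer this node" case; rules would still be needed for anything conditional on other node attributes.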
also, part of the cleanup i'm doing for Pacemaker 1.0 is to streamline
the configuration instead of the perl-like 100-ways-to-do-everything
situation we have now.
hopefully the combination will make people's lives a little easier
>
>
>> ^ you rightly point out the fail-count code, to which I'd
>> reply that we've been planning to do something about it for
>> some time but lack the resources. [...]
>
>>> So if you start with timing in lrm - please think about the above.
>
> I just wanted to transfer the idea to the right minds. ;-)
:)
> But back to the original question:
>
>>>>> Is there a way to tell Linux-HA to retry a failed resource after a
>>>>> certain amount of time again? [...]
>
> The mentioned cluster had also a feature called "auto-clear" which
> would clear the faulted-state after some time.
> I personally dislike this idea - while I think the idea of a
> confidence-interval, which clears the fail-count if a resource has not
> faulted and is online again, is a good one.
is it not essentially the same thing but with a more complicated
formula?