[Linux-HA] Parallel execution of RAs

Andrew Beekhof beekhof at gmail.com
Thu Mar 8 08:01:36 MST 2007


On 3/8/07, Alan Robertson <alanr at unix.sh> wrote:
> Pavol Gono wrote:
> > Hi
> >
> > For a long time I believed that all executions of a resource agent
> > with unique instance parameters are serialized, and that there is no
> > way to have two such agents (processes) running at the same time.
> > (I am not talking about RAs that serve different resource
> > instances; there, parallelism is expected.)
> >
> > But tests show that during periodic monitor executions a stop
> > request can suddenly arrive. The RA is then executed with the stop
> > parameter even while the RA with the monitor parameter is still
> > running. After some time the monitor RA is killed with SIGKILL.
>
> If this is happening on the same resource at the same time, it's a bug.
>  Please supply the appropriate logs.  I'd be VERY interested in seeing
> this, because there is quite a bit of code in the LRM which tries to
> keep this from happening.
>
> I just reread the code for handling timeouts.  I'm not convinced that it
> even _tries_ to kill an operation which has timed out.  I need to read
> it again, but at first glance, it looks like it just pretends it
> finished and goes on its merry way...

really?  surely not.

Dejan, can you look into this please? If true, this would also be a
rather serious bug.

>
> The filename is lrm/lrmd/lrmd.c, and the function is
> on_op_timeout_expired() which in my copy is near line 1780.  If you had
> an operation that didn't complete soon enough, and if my reading of the
> code is correct, this could result in parallel execution :-(.
>
> Could this be your situation?
>
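To make the concern concrete, here is a minimal sketch (Python, not lrmd code) of a timeout path that really does kill and reap the operation before reporting it as finished. The helper name run_op and its return shape are assumptions made up for illustration; nothing here reflects what on_op_timeout_expired() actually does.

```python
import subprocess
import sys


def run_op(argv, timeout_s):
    """Run one operation as a child process (hypothetical helper).

    Returns (exit_code, timed_out). On timeout the child is killed and
    reaped *before* returning, so the caller never starts the next
    operation while the old one is still alive.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.kill()                 # SIGKILL on POSIX systems
        return proc.wait(), True    # reap the child, then report


if __name__ == "__main__":
    # Stand-in for a monitor operation that hangs.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    exit_code, timed_out = run_op(slow, 0.5)
    print("timed out:", timed_out, "exit code:", exit_code)
```

If the timeout handler only marked the operation as done without the kill() and the second wait(), the next operation on the same resource could start while the first was still running, which is the overlap described above.
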
> > Could you add a big fat warning to the documentation (e.g. to
> > http://www.linux-ha.org/ResourceAgent) that parallel execution of
> > RAs is possible, and that RAs can also be killed before their
> > timeout expires?
> > I designed my RA badly because of this :(
> >
> > My questions are:
> > Are there other possible cases of parallelism? E.g. start&monitor,
> > start&stop, start&start, stop&stop, monitor&monitor? The last case
> > may be normal when timeout>interval.
> > Are there other ways an RA can be killed before
> > timeout/default-action-timeout expires?
>
> From re-reading the code, it looks like it kills a monitor operation
> with SIGKILL if someone has requested that the monitoring be stopped,
> and a monitor is currently running.  If the resource is deleted without
> being stopped, then _any_ operation can be SIGKILLed.
>
>
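Since SIGKILL cannot be caught, an agent that wants to guard against overlapping runs needs a lock that disappears by itself when the process dies. A minimal sketch, assuming a POSIX system and a per-resource lock file whose path the agent chooses (try_lock and the path are hypothetical, not part of any Linux-HA API): the kernel drops an flock() lock when the holding process exits, even if it was killed.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file descriptor holding the lock, or None if some
    other run already holds it. The kernel releases the lock when the
    holder closes the descriptor or dies, SIGKILL included.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    # Hypothetical per-resource lock file.
    fd = try_lock("/tmp/my-resource.lock")
    if fd is None:
        print("another operation on this resource is still running")
    else:
        print("lock acquired; safe to proceed")
        os.close(fd)  # closing the descriptor releases the lock
```

An agent written this way can at least detect that a monitor is still running when stop is invoked, instead of assuming the operations are serialized for it.
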
> --
>     Alan Robertson <alanr at unix.sh>
>
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce

