[Linux-HA] implementing sleep() between RA's start/stop operations
Andrew Beekhof
beekhof at gmail.com
Thu Sep 14 01:52:20 MDT 2006
On 9/13/06, konrad rzentarzewski <konrad.rzentarzewski at artegence.com> wrote:
> On Wed, Sep 13, 2006 at 02:54:44AM +0200, Dejan Muhamedagic wrote:
>
> > If the RA can't stop/start the resource, then it should fail. In
> > your case (and you really have a resilient resource there), the
> > apache RA should have failed. But this should be fixed. Perhaps
> > something like the attached patch. Please let us know if it
> > works.
>
> Your patch still introduces a 5-second delay and may fail after that (if
> cleanup after the process is not done yet). I would opt for an indefinite loop
> waiting for the PID to disappear, and define a timeout for the stop action that
> would kill the RA and set the resource unmanaged if it loops longer than a minute.
Right, that is the approach I would recommend.
Generally, RAs shouldn't impose artificial timeouts and should instead
rely on the CRM/LRM to enforce them.
The one problem with that, however, is that if STONITH is configured, the
node will be shot as a consequence.
It might be nice to have a stop action and an escalated "force_stop" action,
and only start shooting if force_stop also fails.
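A minimal sketch of the split described above, written in the RA's own shell. Assumptions, loudly: "force_stop" is only a proposal here and is not an OCF or LRM action, and the function names (pid_alive, wait_for_exit, ra_stop, ra_force_stop) and the 1-second poll are hypothetical. ra_stop sends SIGTERM and then waits with no internal timeout, leaving the upper bound to the stop operation's timeout (60s in the cib.xml quoted below); ra_force_stop does the same with SIGKILL.

```shell
#!/bin/sh
# Hypothetical sketch: "stop" waits with no internal timeout, and
# "force_stop" (a proposed action, not part of OCF) escalates to SIGKILL.

# Succeeds while the PID exists (similar in spirit to ProcessRunning).
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Poll until the PID disappears. No timeout here: the caller (the LRM)
# is expected to enforce one through the operation's timeout attribute.
wait_for_exit() {
    while pid_alive "$1"; do
        sleep 1
    done
}

# Ask politely, then wait as long as it takes.
ra_stop() {
    pid_alive "$1" || return 0          # already stopped counts as success
    kill -s TERM "$1" 2>/dev/null
    wait_for_exit "$1"
    return 0
}

# Escalated variant, meant to run only after ra_stop has failed.
ra_force_stop() {
    pid_alive "$1" || return 0
    kill -s KILL "$1" 2>/dev/null
    wait_for_exit "$1"
    return 0
}
```

The point of the design is that neither function decides when to give up; a hung stop is ended only by the cluster's operation timeout, and only then would fencing come into play.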
>
> Is this right?
>
> apache RA:
>
> --- apache.dist 2006-08-29 13:41:11.701562750 +0200
> +++ apache 2006-08-29 13:42:18.161716250 +0200
> @@ -343,8 +343,15 @@
> [ $tries -lt 10 ]
> do
> sleep 1
> - kill $ApachePID >/dev/null 2>&1
> - ocf_log info "Killing apache PID $ApachePID"
> + if
> + [ $tries -ge 9 ]
> + then
> + kill -9 $ApachePID >/dev/null 2>&1
> + ocf_log info "Slaughtering apache PID $ApachePID after 9 unsuccessful tries"
> + else
> + kill $ApachePID >/dev/null 2>&1
> + ocf_log info "Killing apache PID $ApachePID"
> + fi
> tries=`expr $tries + 1`
> done
> else
> --- apache~ 2006-09-12 17:18:29.607037933 +0200
> +++ apache 2006-09-13 11:35:30.327221687 +0200
> @@ -354,6 +354,12 @@
> fi
> tries=`expr $tries + 1`
> done
> + while
> + ProcessRunning $ApachePID
> + do
> + ocf_log info "Apache PID $ApachePID still running, waiting for quit..."
> + sleep 1
> + done
> else
> ocf_log warn "Killing apache PID $ApachePID FAILED."
> fi
>
> cib.xml:
>
> <primitive class="ocf" id="apache" provider="heartbeat" type="apache">
> <operations>
> <op id="apache_mon-start" name="start" timeout="30s" prereq="nothing" on_fail="restart"/>
> <op id="apache_mon-stop" name="stop" timeout="60s" prereq="nothing" on_fail="restart"/>
> <op id="apache_mon" interval="120s" name="monitor" timeout="30s"/>
> </operations>
> <instance_attributes id="apache-defaults" /><!-- normal config here... -->
> </primitive>
>
> --
> konrad rzentarzewski
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
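Taken together, the two apache RA patches quoted above make stop resend SIGTERM once a second, switch to SIGKILL on the last try, and then wait for the PID to vanish, leaving the 60s stop timeout in cib.xml as the only upper bound. A minimal standalone sketch of that sequence, assuming the try counter starts at 0, with a plain PID argument in place of $ApachePID, kill -0 standing in for ProcessRunning, and echo standing in for ocf_log. The function name, the max_tries parameter, and the non-zero return on the initial kill failure are hypothetical.

```shell
#!/bin/sh
# Sketch of the patched apache RA stop sequence (standalone, hypothetical
# names). kill -0 stands in for ProcessRunning, echo for ocf_log.
# Note: plain POSIX sh has no "local", so pid/tries/max_tries are global.

stop_with_escalation() {
    pid=$1
    max_tries=${2:-10}          # the patch hard-codes 10 tries

    # Initial SIGTERM; the return code on failure is an assumption.
    if ! kill "$pid" 2>/dev/null; then
        echo "warn: Killing PID $pid FAILED." >&2
        return 1
    fi

    # First patch: retry SIGTERM, and use SIGKILL on the last try.
    tries=0
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt "$max_tries" ]; do
        sleep 1
        if [ "$tries" -ge $((max_tries - 1)) ]; then
            kill -9 "$pid" 2>/dev/null
            echo "info: Slaughtering PID $pid after $((max_tries - 1)) unsuccessful tries"
        else
            kill "$pid" 2>/dev/null
            echo "info: Killing PID $pid"
        fi
        tries=$((tries + 1))
    done

    # Second patch: wait with no internal timeout; the stop op timeout
    # configured in the CIB is what caps this loop.
    while kill -0 "$pid" 2>/dev/null; do
        echo "info: PID $pid still running, waiting for quit..."
        sleep 1
    done
    return 0
}
```

Because the final loop is unbounded, a process stuck in the kernel after SIGKILL keeps the RA waiting until the LRM's stop timeout fires, which is exactly the STONITH trade-off raised earlier in the thread.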