[Linux-HA] implementing sleep() between RA's start/stop operations

Andrew Beekhof beekhof at gmail.com
Thu Sep 14 01:52:20 MDT 2006


On 9/13/06, konrad rzentarzewski <konrad.rzentarzewski at artegence.com> wrote:
> On Wed, Sep 13, 2006 at 02:54:44AM +0200, Dejan Muhamedagic wrote:
>
> > If the RA can't stop/start the resource then it should fail. In
> > your case (and you really have a resilient resource there), the
> > apache RA should have failed. But this should be fixed. Perhaps
> > something like the patch attached. Please let us know if it
> > worked.
>
> your patch still introduces a 5-second delay and after that may fail (if
> cleanup after the process is not done yet). i would opt for an indefinite
> loop waiting for the PID to disappear, and define a timeout for the stop
> action that would kill the RA and set the resource unmanaged if it loops
> longer than a minute.

right - that is the approach i would recommend
generally RAs shouldn't impose artificial timeouts and should instead
rely on the CRM/LRM to deal with that.
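
a rough sketch of what that could look like in the RA, assuming a
hypothetical helper called wait_for_exit (it is not in the apache RA, which
has its own ProcessRunning check). the loop has no timeout of its own; the
timeout on the stop op in the cib is the only thing that bounds it.

```shell
#!/bin/sh
# Sketch only: wait_for_exit is a hypothetical helper, not part of the
# apache RA.

# Return 0 once the process with the given PID no longer exists.
# There is deliberately no internal timeout: the stop operation's timeout
# configured in the CIB is what bounds the wait.
wait_for_exit() {
    pid=$1
    # kill -0 delivers no signal; it only tests whether the PID can
    # still be signalled, i.e. whether the process still exists.
    while kill -0 "$pid" 2>/dev/null
    do
        sleep 1
    done
    return 0
}
```

the stop action would send its signal first and then call
wait_for_exit "$ApachePID" before reporting success.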

the one problem with that, however, is that a stop which times out counts
as a failed stop, and if stonith is configured the node will be shot as a
consequence.

it might be nice to have a stop and an escalated "force_stop" action -
and only if force_stop fails do we start shooting.
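
purely as illustration of the RA side of that idea: force_stop does not
exist as an action today, and both function names below are made up. how
the CRM would escalate from one to the other is not shown. stop asks
politely and waits, force_stop uses SIGKILL.

```shell
#!/bin/sh
# Sketch only: neither function exists in the apache RA, and "force_stop"
# is the action proposed above, not an existing OCF action.

# stop: ask politely with SIGTERM, then wait for the PID to go away.
# Any time limit is left to the caller (the operation timeout).
ra_stop() {
    pid=$1
    kill "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null
    do
        sleep 1
    done
}

# force_stop: SIGKILL cannot be caught or ignored by the process.
ra_force_stop() {
    pid=$1
    kill -9 "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null
    do
        sleep 1
    done
}
```

with something like this, a stop that hits its timeout would lead to
ra_force_stop being run, and only a failure there would lead to stonith.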

>
> is this right?
>
> apache RA:
>
> --- apache.dist 2006-08-29 13:41:11.701562750 +0200
> +++ apache      2006-08-29 13:42:18.161716250 +0200
> @@ -343,8 +343,15 @@
>          [ $tries -lt 10 ]
>        do
>          sleep 1
> -        kill $ApachePID >/dev/null 2>&1
> -        ocf_log info "Killing apache PID $ApachePID"
> +       if
> +         [ $tries -ge 9 ]
> +       then
> +         kill -9 $ApachePID >/dev/null 2>&1
> +         ocf_log info "Slaughtering apache PID $ApachePID after 9 unsuccessful tries"
> +       else
> +          kill $ApachePID >/dev/null 2>&1
> +          ocf_log info "Killing apache PID $ApachePID"
> +       fi
>          tries=`expr $tries + 1`
>        done
>      else
> --- apache~     2006-09-12 17:18:29.607037933 +0200
> +++ apache      2006-09-13 11:35:30.327221687 +0200
> @@ -354,6 +354,12 @@
>         fi
>          tries=`expr $tries + 1`
>        done
> +      while
> +       ProcessRunning $ApachePID
> +      do
> +       ocf_log info "Apache PID $ApachePID still running, waiting for quit..."
> +       sleep 1
> +      done
>      else
>        ocf_log warn "Killing apache PID $ApachePID FAILED."
>      fi
>
> cib.xml:
>
> <primitive class="ocf" id="apache" provider="heartbeat" type="apache">
>   <operations>
>     <op id="apache_mon-start" name="start" timeout="30s" prereq="nothing" on_fail="restart"/>
>     <op id="apache_mon-stop" name="stop" timeout="60s" prereq="nothing" on_fail="restart"/>
>     <op id="apache_mon" interval="120s" name="monitor" timeout="30s"/>
>   </operations>
>   <instance_attributes id="apache-defaults" /><!-- normal config here... -->
> </primitive>
>
> --
>  konrad rzentarzewski

