[Linux-HA] Service not starting?
Alan Robertson
alanr at unix.sh
Mon Mar 15 06:34:58 MST 2004
John Hearns wrote:
> On Sun, 14 Mar 2004, Alan Robertson wrote:
>
>
>>>I'm a bit puzzled. Is there a timeout set for services started
>>>by heartbeat? Am I just lost in space?
>>
>>
>>
>>In general, sometimes scripts which start fine by hand don't start when
>>started by programs. I've had this happen with cron jobs, and other things
>>as well...
>>
>
>
> Alan, thanks for your help.
> Use tried and trusted debugging tools before suspecting anything more
> exotic.
>
> It looks as if the problem is to do with Gridengine stopping.
> When the rcsge script is called with "stop", it kills off the SGE
> processes. So far, so good. One of these processes is sge_schedd,
> which takes a bit of time to die. The process has a PID file on disk.
> When the failover occurs and the shared disk is mounted by the other
> host, it looks like the PID file is still there. So rcsge on the new
> primary doesn't start up.
>
> The fix will be to put in a sleep after the process kill,
> plus maybe a deletion of the PID file or a hard kill.
>
> This is of course no reflection on the excellent quality of Gridengine,
> in case anyone should take it that I'm slinging mud.
> It's just a consequence of issuing a kill and yanking the disk away before
> the process has time to complete and tidy up.
>
> So absolutely no criticism of Gridengine, just a bit of advice for anyone
> else trying the same thing.
OK. Glad to hear you found your problem.
I would suggest that no "stop" procedure should say it's stopped until it's
really stopped. Although this is a minor bug, I would suggest filing a bug
report against Gridengine: its stop action should wait until its processes
die before saying "stop" was complete.
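
A minimal sketch of a stop routine that waits, for anyone working around this in a wrapper script. This is not the rcsge code; the function name stop_and_wait, the 30 second default timeout and the one second polling interval are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of rcsge: signal a daemon, then wait for it
# to exit before reporting that "stop" is complete.
# Usage: stop_and_wait PIDFILE [TIMEOUT_SECONDS]
stop_and_wait() {
    pidfile=$1
    timeout=${2:-30}    # assumed default grace period

    # No pid file: nothing to stop.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")

    # Ask the process to shut down cleanly.
    kill -TERM "$pid" 2>/dev/null

    # Poll with kill -0, which sends no signal and only tests for existence.
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Grace period is over: fall back to a hard kill.
            kill -KILL "$pid" 2>/dev/null
            sleep 1
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Only now remove the pid file and report "stopped".
    rm -f "$pidfile"
    return 0
}
```

Called as, say, stop_and_wait /path/to/schedd.pid 60 (the path is made up), it returns only after the process is gone or has been hard-killed, so the shared disk is not unmounted underneath a daemon that is still tidying up.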
Also, the start procedure shouldn't be quite so fussy about the existence
of the pid file, or it won't work when you have a real failover due to a
crash. A kill -0 `cat pidfile` is the usual test for whether the process is
still alive; if it isn't, the pid file is stale and can be removed.
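
As a rough sketch, a start script could check for a stale pid file like this before refusing to start. It relies on kill -0, which sends no signal and only reports whether the pid exists. The function name clear_stale_pidfile is made up for illustration and is not taken from rcsge.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when it is safe to start the service,
# removing a stale pid file along the way; return 1 if it is still running.
# Usage: clear_stale_pidfile PIDFILE
clear_stale_pidfile() {
    pidfile=$1

    # No pid file at all: safe to start.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile")

    # kill -0 delivers no signal; it only checks that the pid exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "already running as pid $pid" >&2
        return 1
    fi

    # Stale file left behind by a crash or an interrupted stop.
    rm -f "$pidfile"
    return 0
}
```

One caveat: after a failover the pid in the file was assigned on the other host, so the same number may happen to belong to an unrelated process on this one. A more careful script would also check the process name before deciding the service is already running.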
So, I see your problem as two bugs - one of which you've worked around.
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce