[Linux-HA] Problem with Resource not starting after killing process.

ktomlinson at scarpgold.co.uk ktomlinson at scarpgold.co.uk
Tue Oct 30 10:16:24 MDT 2007


I don't know where to start with this one, but I have a resource that is driving me mad, and I am struggling to see the wood for the trees.

The script is in our OCF directory and implements all the required actions (start, stop, status, monitor, meta-data, etc.). It is just a modified version of the Heartbeat apache one.
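
For anyone not familiar with OCF agents, the monitor action in a script of this kind usually has roughly the shape below. This is only an illustrative sketch, not the actual script: the function name dm_monitor and the pid file path are made up for the example. The one part that is standard is the set of OCF exit codes (0 for running, 7 for cleanly stopped, 1 for a generic error), which is how the cluster tells a stopped resource from a failed one.

```shell
#!/bin/sh
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical pid file location (an assumption for this sketch);
# it can be overridden through the environment.
DM_PIDFILE="${DM_PIDFILE:-/var/run/dm_httpd.pid}"

# Hypothetical monitor action: report whether the daemon is alive.
dm_monitor() {
    # No pid file at all: report the resource as cleanly stopped.
    [ -f "$DM_PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$DM_PIDFILE" 2>/dev/null)
    # Empty pid file: nothing to check, so report not running.
    [ -n "$pid" ] || return $OCF_NOT_RUNNING

    # kill -0 sends no signal; it only tests that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pid file: the process is gone.
    return $OCF_NOT_RUNNING
}
```

Calling dm_monitor by hand straight after killing httpd and checking $? is one quick way to see what the cluster would be told at that moment.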

Anyway, the problem. This is a 2-node cluster.

Everything starts up and works perfectly.
I can place a node into standby, and the resource moves, starts, and can fail back.
But for some reason it sometimes fails to start. The GUI reports Failed and ha.log shows the status as failed, but the start-up script never gets called.
I have checked the failcounts, and this can happen while they are still 0 on both nodes.

Once it is running, I just do a 'dm stop' (our script), which kills httpd.
Sometimes the cluster notices this and restarts httpd, and the failcount goes to 1.
If I then do a stop again, the restart may or may not work. When it does not, the resource fails and the count is still 1.

If I then run 'crm_resource -U -r dm_grp -t group -H nodeb' on the same node, the resource is restarted and I can see it call the start-up script.

I am just asking for any pointers on where to look, so that I can try to stop it from failing in this manner.

Thanks,
Kevin.



