[Linux-HA] Score Calculation

Max Hofer max.hofer at apus.co.at
Fri Sep 14 03:11:33 MDT 2007


On Friday 14 September 2007, Dominik Klein wrote:
> >> Yes I meant the resource is running first and crashes later on, so that
> >> monitor reports "not running".
> >
> > generally, one shouldn't report "not running" in such cases
>
> Okay, maybe I should have read this more precisely.
> http://www.linux-ha.org/OCFResourceAgent
> "monitor - monitor the health of a resource. Exit 0 if the resource is
> running, 7 if it is stopped and anything else if it is failed"
>
> Okay. Good to know. But how can I (my RA) know whether Linux-HA expects
> my resource to run or not to run when it calls the monitor script?
> Iirc it calls "monitor" on probe and on monitor action. Is there a way
> to determine what it expects to get? Because the way I understand it
> now, I have to return OCF_NOT_RUNNING in case "monitor" is called by
> probe and the resource is not running and return OCF_ERR_GENERIC (or
> some other non-0 and non-7 value) if "monitor" is called by monitor and
> the resource is not running.
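The RA is assumed to be dumb: it cannot tell whether "monitor" was called as a
probe or as a recurring monitor, and it does not need to.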

You have two choices:
a) you keep track of whether the resource was started/stopped (with a lock
file, for example), and when the resource has "disappeared" you return ERROR
b) you don't track start/stop at all: you return not-running when it is not
running, and if it is running but you are able to detect that something is
wrong, you return ERROR.
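
A minimal sketch of the two monitor strategies, written in Python only for
illustration (real RAs are usually shell scripts). The exit codes follow the
OCF convention quoted above: 0 = running (OCF_SUCCESS), 7 = stopped
(OCF_NOT_RUNNING), and 1 (OCF_ERR_GENERIC) as one of the "anything else"
failure codes. The function names and the boolean inputs (`running`,
`lock_present`, `healthy`) are hypothetical stand-ins for whatever checks a
real RA performs.

```python
# OCF monitor exit codes: 0 and 7 as quoted above; 1 is OCF_ERR_GENERIC.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_with_lock(running: bool, lock_present: bool, healthy: bool) -> int:
    """Option a): a lock file records that start succeeded and stop has not run."""
    if not running:
        # Lock still present: we were started and never stopped, so the
        # resource disappeared on its own -> report an error.
        return OCF_ERR_GENERIC if lock_present else OCF_NOT_RUNNING
    # Running: report success unless a health check detected a problem.
    return OCF_SUCCESS if healthy else OCF_ERR_GENERIC


def monitor_stateless(running: bool, healthy: bool) -> int:
    """Option b): no start/stop bookkeeping; only report what is observed."""
    if not running:
        return OCF_NOT_RUNNING
    return OCF_SUCCESS if healthy else OCF_ERR_GENERIC
```

Option b) leaves the question of whether "not running" is unexpected entirely
to the cluster manager.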

In both cases heartbeat handles it the same way, because the CRM keeps track of
whether the resource was successfully started, so it notices when monitor
returns "not-running" even though no stop operation was called.
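
The point about the CRM can be pictured with a simplified, hypothetical model:
the cluster manager remembers whether it successfully started the resource and
compares each monitor result against that. This is not Heartbeat's actual
code; the function name `interpret_monitor` and the returned labels are
illustrative only.

```python
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def interpret_monitor(expected_running: bool, rc: int) -> str:
    """Simplified model: classify a monitor result against the recorded state."""
    if expected_running:
        # Started successfully and never stopped: anything but 0 is a
        # failure, including 7 ("not running").
        return "ok" if rc == OCF_SUCCESS else "failed"
    # Not started here (e.g. a probe): 7 is the expected answer.
    if rc == OCF_NOT_RUNNING:
        return "ok"
    if rc == OCF_SUCCESS:
        return "running unexpectedly"
    return "failed"
```

In this model a resource that crashed after a successful start ends up as
"failed" whether the RA returns 1 (option a) or 7 (option b), which is why the
choice between them does not change how the cluster reacts.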

In case a) you just have to make sure your lock file is deleted by the stop
operation, and that it does not survive a hard shutdown of the node
(power-off etc.).
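
A sketch of the lock-file bookkeeping for option a), assuming a hypothetical
lock path passed in by the caller. Putting the lock under a directory that is
emptied at boot (on many systems /var/run, or a tmpfs) is one way to keep a
hard power-off from leaving a stale lock behind; that placement is an
assumption, not something Heartbeat mandates. The helper names
`mark_started`, `mark_stopped` and `was_started` are hypothetical.

```python
import os


def mark_started(lock_path: str) -> None:
    """Call at the end of a successful start: create (or truncate) the lock file."""
    open(lock_path, "w").close()


def mark_stopped(lock_path: str) -> None:
    """Call from stop: remove the lock; a missing lock is fine (stop must be idempotent)."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def was_started(lock_path: str) -> bool:
    """Used by monitor: True if start succeeded and stop has not run since."""
    return os.path.exists(lock_path)
```

Monitor would then pass `was_started(lock_path)` as `lock_present` when the
resource itself is found not running.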

As far as I remember, heartbeat (>2.1.0) provides a kind of lock-file framework
for RAs. Use it and you will do fine. Check the existing RAs and the files
they include.


