[Linux-HA] Score Calculation

Max Hofer max.hofer at apus.co.at
Thu Sep 13 09:42:34 MDT 2007


On Thursday 13 September 2007, Dominik Klein wrote:
> > http://www.linux-ha.org/v2/faq
> >
> > That's where you'll find one example of calculating scores. Check
> > also the list archives---quite a few times this issue came up.
>
> Of course I read that.
>
> What I want to know is: does the cluster distinguish between a
> monitor result of "error" and a monitor result of "not running"? From
> what I saw, it looked like the decision was different.
>
> error: stickiness is still applied, then calculated
>
> not running: stickiness value is removed from current scores, then a
> decision is made where to run the resource and *then* the stickiness
> value is applied again.
>
> That's what I wanted to know and I did not come across an answer yet.
In general:
- the score calculation does not care whether monitor reported "error" or
"not running". All it cares about is the resource's fail-count on each node.

The score for a resource R on a node N is:

score(R,N) = resource_stickiness(R) + (failcount(R,N) * failure_stickiness(R))
             + constraint_score(R,N)

where resource_stickiness(R) counts as 0 on every node where R is not
currently running.
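
To make the formula concrete, here is a minimal sketch in Python. It is only
an illustration of the arithmetic above, not heartbeat code: the function name,
the parameter names and all numbers are made up, failure_stickiness is assumed
to be configured as a negative value, and +/-INFINITY handling is not modelled.

```python
def score(constraint_score, failcount, failure_stickiness,
          resource_stickiness, running_here):
    """Score of resource R on node N, following the formula above.

    All names are hypothetical; this is not a heartbeat API.
    """
    # Stickiness only counts on the node where R is currently running.
    stickiness = resource_stickiness if running_here else 0
    return stickiness + failcount * failure_stickiness + constraint_score


# Made-up example: R runs on node1 and has location constraints
# of 200 (node1) and 150 (node2).
node1_ok = score(200, 0, -30, 100, running_here=True)       # 100 + 0 + 200
node2 = score(150, 0, -30, 100, running_here=False)         # 0 + 0 + 150
node1_failing = score(200, 6, -30, 100, running_here=True)  # 100 - 180 + 200
```

With these numbers node1 wins as long as R has not failed there (300 vs. 150),
but after six failures its score (120) drops below node2's, so R would be
expected to move.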

So I bet your next question is: when is the fail-count increased? ;-)

And here it is (for version 2.1.1):

a) * precondition: R was running on N, which means the following happened:
- R was probed on N as not running
- R was started on N
- the start operation for R was called and returned OK
* if this precondition is met, heartbeat does not distinguish between a monitor
return of "not running" and "error". This means that if monitor returns "not
running" without(!) a preceding resource stop, it is treated as an "error"
--> the failcount for R on N increases by 1

b) * condition: failed start, which means:
- R was probed on N as not running
- R was started on N
- R start returned ERROR (not sure if monitor is called just after start)
* regardless of what monitor reports (error/not running), the fail-count does
not go up because R never really started (this behaviour is currently under
discussion, AFAIK)

NOTE: be aware that after a failed resource start heartbeat sets the score of
that resource on this node to -INFINITY until you clean up the resource on the
node with "crm_resource -C -r resource -H node". Since -INFINITY overrules all
other scores, the failcount of R on N doesn't really matter anymore.
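
Cases a) and b) plus the NOTE can be summarised as a small state model. Again
this is just a sketch of the rules as I described them for 2.1.1, not
heartbeat code: every name is hypothetical, and the assumption that a cleanup
leaves the fail-count untouched is mine.

```python
from dataclasses import dataclass

NEG_INFINITY = float("-inf")


@dataclass
class NodeState:
    """Hypothetical bookkeeping for one resource R on one node N."""
    failcount: int = 0
    start_failed: bool = False  # True after a start operation returned an error


def record_start(state, start_ok):
    # Case b): a failed start does not raise the fail-count,
    # it only marks the node as unusable for R.
    if not start_ok:
        state.start_failed = True


def record_monitor(state, started_ok, result):
    # Case a): once R has started OK, "not running" (without a preceding
    # stop) and "error" are treated the same: fail-count goes up by 1.
    if started_ok and result in ("not running", "error"):
        state.failcount += 1


def cleanup(state):
    # Models "crm_resource -C": lifts the -INFINITY ban.
    # Assumption: the fail-count itself is left as it is.
    state.start_failed = False


def effective_score(state, base_score):
    # -INFINITY overrules whatever the score formula produced.
    return NEG_INFINITY if state.start_failed else base_score
```

Here base_score stands for the result of the score formula given earlier, so a
node with a failed start stays at -INFINITY no matter how large its constraint
score or stickiness is.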

And here a hint ...

The best way is always to try it out: set up the configuration with a Dummy
resource. You are faster with testing than with writing the email. Worst of
all, these things change over time, and most developers know only the
"current state of the art" (which does not help you if you use version
x.y.z).
You have to test it with the real cluster anyway.

Kind regards, Max




