[Linux-HA] Question about monitoring operations in Heartbeat

Michael Alger linux-ha at mm.quex.org
Thu Jul 10 06:38:15 MDT 2008


On Thu, Jul 10, 2008 at 01:22:16PM +0200, Lukas Pecha wrote:
> I would like to know, if I can limit Heartbeat monitor operations
> to run on some nodes in cluster only?

I think heartbeat will run the monitor script regardless, because it
wants to make sure each resource is running in only one place. To do
that it needs to check where it is actually running when linux-ha
first starts up.

> And here is the problem: when I start up the main two nodes, it's
> all behaving pretty well. But when I start the last node, the
> cluster goes mad and starts to stop my resources - because it is
> trying to start the automatic monitoring operations on the last
> node and it can't find the init scripts for drbd and iscsi-target.
> But the last node was never intended to run those resources ever,
> so why it is trying to start monitoring operations for these
> resources? I don't even have those operations defined in cib.xml.

It'll run a one-off monitor operation (a probe) on each node
automatically, just like it automatically defines the start and stop
operations. You only need to define operations yourself if you wish
to change some of the parameters it uses or, as far as I know, if you
want a recurring monitor.

Have you tried running the showscores script to see how it's scoring
your nodes?

I'm no expert, but it looks like your constraint here:

> here is my cib.xml:
> 
>     <constraints>
>       <rsc_location id="run_ha_storage" rsc="ha_storage">
>         <rule score="-INFINITY" boolean_op="and" 
>          id="run_ha_storage_only_there">

is using the "and" operation, so in order for this rule to match (and
apply a score of -INFINITY), the node's uname must not be:

>           <expression attribute="#uname" operation="ne" 
>            value="node_target2" id="expr_run_ha_storage1"/>

node_target2, and it must also not be:

>           <expression attribute="#uname" operation="ne" 
>            value="node_target1" id="expr_run_ha_storage2"/>

node_target1. Both expressions use "ne", so that only matches a node
which is neither of the two, i.e. your third node, and the rule looks
right to me as it stands. Changing it to "or" would break it: every
node differs from at least one of those names, so the rule would
match everywhere. Alternatively, change it to a single "eq"
expression that only matches the one node you don't want to run it
on, which seems simpler.
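
A quick way to double-check how "and" and "or" combine the two "ne"
expressions is to evaluate them by hand for each node. The sketch
below does that in plain shell. It is not how the CRM computes
scores, and node_target3 is a made-up name for the third node, since
its real name isn't shown here.

```shell
#!/bin/sh
# Hand evaluation of the two "ne" expressions from the quoted rule.
# Illustration only: node_target3 is a hypothetical third node name.

# Succeeds (exit status 0) when the rule would match.
# $1 = boolean_op ("and" or "or"), $2 = the node's uname
rule_matches() {
    if [ "$1" = "and" ]; then
        [ "$2" != "node_target2" ] && [ "$2" != "node_target1" ]
    else
        [ "$2" != "node_target2" ] || [ "$2" != "node_target1" ]
    fi
}

for node in node_target1 node_target2 node_target3; do
    for op in and or; do
        if rule_matches "$op" "$node"; then
            echo "$node $op: match"
        else
            echo "$node $op: no match"
        fi
    done
done
```

With "and", only the node that is neither node_target1 nor
node_target2 matches. With "or", every node matches, because each
node differs from at least one of the two names.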

But, even if it really is scoring the resource at -INFINITY, it'll
probably still want to check its status to make sure it's not
running where it's not supposed to be.

Your dummy script should be returning 7 (OCF_NOT_RUNNING) in
response to the "monitor" command. Otherwise, heartbeat will think
it's started on that node, and issue a stop command. Returning 0 in
response to the stop is correct, though.
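
Something along these lines is what I mean for the dummy script. It's
only a rough sketch covering the two actions discussed above; the
function name dummy_agent is made up, and returning 3
(OCF_ERR_UNIMPLEMENTED) for everything else is my own assumption. A
real OCF agent would need more than this (meta-data, for one).

```shell
#!/bin/sh
# Placeholder agent for a node that must never run the resource.
# Sketch only: handles just "monitor" and "stop".

OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

dummy_agent() {
    case "$1" in
        monitor)
            # Tell the cluster the resource is not running here.
            return $OCF_NOT_RUNNING
            ;;
        stop)
            # Nothing to stop, so stopping always succeeds.
            return $OCF_SUCCESS
            ;;
        *)
            # Assumption: refuse every other action on this node.
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

# Dispatch only when called with an action, as the cluster would do.
if [ $# -gt 0 ]; then
    dummy_agent "$@"
    exit $?
fi
```

Running it by hand with "monitor" and then checking $? should give 7,
and "stop" should give 0.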

>         </rule>
>         <rule score="600" id="run_it_here">
>           <expression attribute="#uname" operation="eq" 
>            value="node_target2" id="expr_run_it_here1"/>
>         </rule>
>       </rsc_location>
>    </constraints>

