[Linux-HA] Question about monitoring operations in Heartbeat

Andrew Beekhof beekhof at gmail.com
Fri Jul 11 00:43:10 MDT 2008


On Jul 10, 2008, at 2:38 PM, Michael Alger wrote:

> On Thu, Jul 10, 2008 at 01:22:16PM +0200, Lukas Pecha wrote:
>> I would like to know whether I can limit Heartbeat monitor
>> operations to run only on some nodes in the cluster.
>
> I think heartbeat will run the monitor script regardless, because it
> wants to make sure each resource is running in only one place. To do
> that it needs to check where it is actually running when linux-ha
> first starts up.

correct

>
>
>> And here is the problem: when I start up the main two nodes, it's
>> all behaving pretty well. But when I start the last node, the
>> cluster goes mad and starts to stop my resources - because it is
>> trying to start the automatic monitoring operations on the last
>> node and it can't find the init scripts for drbd and iscsi-target.
>> But the last node was never intended to run those resources,
>> so why is it trying to start monitoring operations for these
>> resources? I don't even have those operations defined in cib.xml.
>
> It'll define a monitor operation automatically,

A non-recurring one, that is. The automatic one only runs at node
startup.

> just like it
> automatically defines the start and stop operations. You only need
> to define them yourself if you wish to change some of the parameters
> it uses.
>
> Have you tried running the showscores script to see how it's scoring
> your nodes?
>
> I'm no expert, but it looks like your constraint here:
>
>> here is my cib.xml:
>>
>>    <constraints>
>>      <rsc_location id="run_ha_storage" rsc="ha_storage">
>>        <rule score="-INFINITY" boolean_op="and"
>>         id="run_ha_storage_only_there">
>
> is using the "and" operation, so it means in order for this rule to
> match (and apply a -INFINITY score), the node's uname must be:
>
>>          <expression attribute="#uname" operation="ne"
>>           value="node_target2" id="expr_run_ha_storage1"/>
>
> node_target2, and it must also be:
>
>>          <expression attribute="#uname" operation="ne"
>>           value="node_target1" id="expr_run_ha_storage2"/>
>

that is the correct interpretation

> node_target1, which strikes me as being a little bit unlikely.
> Shouldn't this rule be using an "or" so that if either expression
> matches, the score is applied?  Or alternatively, change it to only
> match the one node you don't want to run it on, which seems simpler.
>
> But, even if it really is scoring the resource at -INFINITY, it'll
> probably still want to check its status to make sure it's not
> running where it's not supposed to be.
>
> Your dummy script should be returning 7 (OCF_NOT_RUNNING) in
> response to the "monitor" command. Otherwise, heartbeat will think
> it's started on that node, and issue a stop command. Returning 0 in
> response to the stop is correct, though.
>
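
The return codes described above can be sketched as follows. This is a minimal Python illustration, not a real resource agent: `dummy_agent` is a hypothetical name, the values 7 (OCF_NOT_RUNNING) and 0 (success) are the ones mentioned in the message, and returning 3 (OCF_ERR_UNIMPLEMENTED) for every other action is an assumption made for this sketch.

```python
# Exit codes following the OCF convention referred to in the message.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def dummy_agent(action):
    """Return the exit code a placeholder script could give for an action.

    Intended for a node that must never run the resource.
    """
    if action == "monitor":
        # Report "cleanly stopped" so the cluster does not think the
        # resource is active here and issue a stop.
        return OCF_NOT_RUNNING
    if action == "stop":
        # Nothing is running, so stopping trivially succeeds.
        return OCF_SUCCESS
    # Assumption for this sketch: any other action is not supported.
    return OCF_ERR_UNIMPLEMENTED
```

A wrapper script on the third node could call `sys.exit(dummy_agent(sys.argv[1]))` so that a probe reports the resource as stopped rather than running.
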
>>        </rule>
>>        <rule score="600" id="run_it_here">
>>          <expression attribute="#uname" operation="eq"
>>           value="node_target2" id="expr_run_it_here1"/>
>>        </rule>
>>      </rsc_location>
>>   </constraints>
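
To check how the first rule quoted above combines its two expressions, here is a minimal Python sketch. It is not Heartbeat code. It assumes that `ne` means "not equal", that `boolean_op="and"` requires every expression to be true, and that `"or"` requires at least one. `node3` is a hypothetical name for the third node, which the original post does not name.

```python
def expression_matches(uname, operation, value):
    """Evaluate one <expression> against a node's #uname."""
    if operation == "eq":
        return uname == value
    if operation == "ne":
        return uname != value
    raise ValueError("unsupported operation: %s" % operation)


def rule_matches(uname, boolean_op, expressions):
    """Combine the expression results the way boolean_op asks for."""
    results = [expression_matches(uname, op, value) for op, value in expressions]
    return all(results) if boolean_op == "and" else any(results)


# The two expressions of the -INFINITY rule, as (operation, value) pairs.
RULE_EXPRESSIONS = [("ne", "node_target2"), ("ne", "node_target1")]

# node3 is a hypothetical stand-in for the unnamed third node.
NODES = ["node_target1", "node_target2", "node3"]


def matching_nodes(boolean_op, nodes):
    """List the nodes on which the rule would apply its score."""
    return [n for n in nodes if rule_matches(n, boolean_op, RULE_EXPRESSIONS)]


print(matching_nodes("and", NODES))  # ['node3']
print(matching_nodes("or", NODES))   # ['node_target1', 'node_target2', 'node3']
```

Under those assumptions, the "and" rule matches only a node that is neither node_target1 nor node_target2, so the -INFINITY score would land on the third node alone. With "or", every node differs from at least one of the two names, so the rule would match everywhere.
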


