[Linux-HA] Resource failover problems with pingd

Andrew Beekhof beekhof at gmail.com
Wed Dec 27 14:29:12 MST 2006


This was fixed a while back in http://hg.linux-ha.org/dev, ready for 2.0.8.
If you search the archives, you'll find the link to the patch that fixed it.

On 12/27/06, ranyhere at bnb.gov.br <ranyhere at bnb.gov.br> wrote:
>         Hello,
>
>         I have an active/active two-node cluster running Red Hat Enterprise Linux ES v.4 and Heartbeat 2.0.7.
>
>         There are two resource groups in the cluster:
>
>         RESOURCE_GROUP1 on NODE1
>         RESOURCE_GROUP2 on NODE2
>
>         Each cluster node has two network adapters: one for heartbeat communications (through a cross-over cable) and the other for corporate network communications.
>
>         A pingd RA clone is configured to monitor network availability. When NODE2 loses its connection to the corporate router (the cable is disconnected), Heartbeat doesn't move RESOURCE_GROUP2 to NODE1. Instead, it starts RESOURCE_GROUP1 on NODE2 without stopping it on NODE1 and considers NODE1 off-line, although NODE1's connection to the corporate router is normal.
>
>         crm_mon shows the output below:
>
>
> ============
> Last updated: Wed Dec 27 15:14:41 2006
> Current DC: node2.localdomain (46445383-5102-4fba-a517-502abad59095)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: node2.localdomain (46445383-5102-4fba-a517-502abad59095): online
> Node: node1.localdomain (606d9d99-5bab-4427-92d4-06cf5a7b1a8a): OFFLINE
>   .
>   .
>   .
> Clone Set: pingd
>     ping-child:0        (heartbeat::ocf:pingd): Started node2.localdomain
>     ping-child:1        (heartbeat::ocf:pingd): Stopped
>
>
>         However, when I test network availability on NODE1 by disconnecting its network cable, everything works fine and RESOURCE_GROUP1 is moved to NODE2.
>
>         Below are the configurations used in the ha.cf and cib.xml files. Is there something wrong with them?
>
> ha.cf (NODE1)
>
> logfile /var/log/ha-log
> logfacility     local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast   eth1 eth0
> ucast eth1 10.0.0.2
> node    node1.localdomain
> node    node2.localdomain
> ping 172.16.0.254
> crm yes
> apiauth cibmon uid=hacluster
> respawn hacluster /usr/lib/heartbeat/cibmon -d
>
> ha.cf (NODE2)
>
> logfile /var/log/ha-log
> logfacility     local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast   eth1 eth0
> ucast eth1 10.0.0.1
> node    node1.localdomain
> node    node2.localdomain
> ping 172.16.0.254
> crm yes
> apiauth cibmon uid=hacluster
> respawn hacluster /usr/lib/heartbeat/cibmon -d
>
> cib.xml
>
> <clone id="pingd">
>   <instance_attributes id="pingd_attr">
>     <attributes>
>       <nvpair id="pingd-clone_max" name="clone_max" value="2"/>
>       <nvpair id="pingd-clone_node_max" name="clone_node_max" value="1"/>
>     </attributes>
>   </instance_attributes>
>   <primitive class="ocf" id="ping-child" provider="heartbeat" type="pingd">
>     <instance_attributes id="pingd-child_attr">
>       <attributes>
>         <nvpair id="pingd-child-dampen" name="dampen" value="5s"/>
>         <nvpair id="pingd-child-multiplier" name="multiplier" value="1000"/>
>         <nvpair id="pingd-child-user" name="user" value="root"/>
>         <nvpair id="pingd-child-pidfile" name="pidfile" value="/tmp/pingd.pid"/>
>       </attributes>
>     </instance_attributes>
>     <operations>
>       <op id="pingd-child_monitor" interval="10s" name="monitor" timeout="5s" prereq="nothing"/>
>       <op id="pingd-child_start" name="start" prereq="nothing"/>
>     </operations>
>   </primitive>
> </clone>
>
>
> <rsc_location id="RESOURCE_GROUP1_NODE1" rsc="RESOURCE_GROUP1">
>   <rule id="RESOURCE_GROUP1_NODE1_SCORE" score="150" boolean_op="and">
>     <expression attribute="#uname" id="RESOURCE_GROUP1_NODE1_SCORE_EXPR" operation="eq" value="node1.localdomain"/>
>   </rule>
>   <rule id="RESOURCE_GROUP1_PINGD_RULE" score_attribute="pingd" boolean_op="and">
>     <expression id="RESOURCE_GROUP1_PINGD_RULE_EXPR01" attribute="pingd" operation="defined"/>
>     <expression id="RESOURCE_GROUP1_PINGD_RULE_EXPR02" attribute="pingd" operation="gt" value="0"/>
>   </rule>
> </rsc_location>
>
>
> <rsc_location id="RESOURCE_GROUP2_NODE2" rsc="RESOURCE_GROUP2">
>   <rule id="RESOURCE_GROUP2_NODE2_SCORE" score="150" boolean_op="and">
>     <expression attribute="#uname" id="RESOURCE_GROUP2_NODE2_SCORE_EXPR" operation="eq" value="node2.localdomain"/>
>   </rule>
>   <rule id="RESOURCE_GROUP2_PINGD_RULE" score_attribute="pingd" boolean_op="and">
>     <expression id="RESOURCE_GROUP2_PINGD_RULE_EXPR01" attribute="pingd" operation="defined"/>
>     <expression id="RESOURCE_GROUP2_PINGD_RULE_EXPR02" attribute="pingd" operation="gt" value="0"/>
>   </rule>
> </rsc_location>
>
>
>
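
For reference, a rough sketch of how the rsc_location rules quoted above are meant to score each node. It assumes pingd sets the "pingd" node attribute to the multiplier times the number of reachable ping nodes, and it ignores resource stickiness and everything else the policy engine weighs. The function names are made up for illustration and are not part of Heartbeat.

```python
# Illustrative model only: not Heartbeat code, names are hypothetical.
MULTIPLIER = 1000   # value of pingd-child-multiplier in the quoted cib.xml
UNAME_SCORE = 150   # score of the #uname rule in the quoted cib.xml


def pingd_value(reachable_ping_nodes, multiplier=MULTIPLIER):
    """Assumed pingd attribute: reachable ping nodes times the multiplier."""
    return reachable_ping_nodes * multiplier


def location_score(node, preferred_node, pingd):
    """Sum the two rules of one rsc_location constraint for a single node."""
    score = 0
    # Rule 1: #uname eq preferred node -> fixed score of 150.
    if node == preferred_node:
        score += UNAME_SCORE
    # Rule 2: pingd defined and greater than 0 -> score_attribute="pingd".
    if pingd is not None and pingd > 0:
        score += pingd
    return score


def placement(preferred_node, reachable):
    """Return (best node, per-node scores) for one resource group.

    `reachable` maps node name -> number of ping nodes that node can reach.
    """
    scores = {
        node: location_score(node, preferred_node, pingd_value(count))
        for node, count in reachable.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # NODE2 has lost the corporate router, NODE1 still reaches it.
    best, scores = placement(
        "node2.localdomain",
        {"node1.localdomain": 1, "node2.localdomain": 0},
    )
    print(best, scores)
```

Under these assumptions, RESOURCE_GROUP2 scores higher on node1 when only node2 loses the router, so the rules themselves ask for a move to node1, not for node1 to be declared off-line.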

