[Linux-HA] Resource failover problems with pingd
Andrew Beekhof
beekhof at gmail.com
Wed Dec 27 14:29:12 MST 2006
This was fixed a while back in http://hg.linux-ha.org/dev, ready for 2.0.8.
If you search the archives, you'll find the link to the patch that fixed it.
On 12/27/06, ranyhere at bnb.gov.br <ranyhere at bnb.gov.br> wrote:
> Hello,
>
> I have an active/active cluster with two nodes running Red Hat Enterprise Linux ES v.4 and Heartbeat 2.0.7.
>
> There are two resource groups in the cluster:
>
> RESOURCE_GROUP1 on NODE1
> RESOURCE_GROUP2 on NODE2
>
> Each cluster node uses two network adapters: one for heartbeat communications (through a cross-over cable) and the other for corporate network communications.
>
> A pingd RA clone is configured to monitor network availability. When NODE2 loses its connection to the corporate router (cable disconnected), Heartbeat does not move RESOURCE_GROUP2 to NODE1. Instead, it starts RESOURCE_GROUP1 on NODE2 without stopping it on NODE1, and marks NODE1 as offline, although NODE1's connection to the corporate router is normal.
>
> crm_mon shows the output below:
>
>
> ============
> Last updated: Wed Dec 27 15:14:41 2006
> Current DC: node2.localdomain (46445383-5102-4fba-a517-502abad59095)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: node2.localdomain (46445383-5102-4fba-a517-502abad59095): online
> Node: node1.localdomain (606d9d99-5bab-4427-92d4-06cf5a7b1a8a): OFFLINE
> .
> .
> .
> Clone Set: pingd
> ping-child:0 (heartbeat::ocf:pingd): Started node2.localdomain
> ping-child:1 (heartbeat::ocf:pingd): Stopped
>
>
> However, when I test network availability on NODE1 by disconnecting its network cable, everything works fine and RESOURCE_GROUP1 is moved to NODE2.
>
> Below are the configurations used in the ha.cf and cib.xml files. Is there something wrong with them?
>
> ha.cf (NODE1)
>
> logfile /var/log/ha-log
> logfacility local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast eth1 eth0
> ucast eth1 10.0.0.2
> node node1.localdomain
> node node2.localdomain
> ping 172.16.0.254
> crm yes
> apiauth cibmon uid=hacluster
> respawn hacluster /usr/lib/heartbeat/cibmon -d
>
> ha.cf (NODE2)
>
> logfile /var/log/ha-log
> logfacility local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast eth1 eth0
> ucast eth1 10.0.0.1
> node node1.localdomain
> node node2.localdomain
> ping 172.16.0.254
> crm yes
> apiauth cibmon uid=hacluster
> respawn hacluster /usr/lib/heartbeat/cibmon -d
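The timing directives shared by both ha.cf files can be read as thresholds on how long a peer has been silent. The sketch below is a hypothetical Python illustration (the function names are not Heartbeat's), and it assumes each threshold is inclusive; Heartbeat's exact comparisons may differ. Assuming eth1 is the cross-over link (its `ucast` peers are on 10.0.0.x), `bcast eth1 eth0` keeps heartbeats flowing over eth1 when only the corporate cable is pulled, so NODE1 should not reach `deadtime` from that alone.

```python
# Hypothetical model of the ha.cf timing directives shown above.
# Not Heartbeat code; thresholds are assumed to be inclusive.
KEEPALIVE = 2   # keepalive: seconds between heartbeats
WARNTIME = 10   # warntime: warn about a late heartbeat after this long
DEADTIME = 30   # deadtime: declare the peer dead after this long
INITDEAD = 120  # initdead: deadtime used when heartbeat is first started


def heartbeats_missed(silence: float) -> int:
    """Number of keepalive intervals that passed without hearing the peer."""
    return int(silence // KEEPALIVE)


def peer_state(silence: float, starting_up: bool = False) -> str:
    """Classify a peer that has been silent for `silence` seconds."""
    dead_after = INITDEAD if starting_up else DEADTIME
    if silence >= dead_after:
        return "dead"
    if silence >= WARNTIME:
        return "late"
    return "alive"


if __name__ == "__main__":
    for t in (1, 15, 30):
        print(t, heartbeats_missed(t), peer_state(t))
```

With these values, a peer is only declared dead after about 15 consecutive missed heartbeats.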
>
> cib.xml
>
> <clone id="pingd">
> <instance_attributes id="pingd_attr">
> <attributes>
> <nvpair id="pingd-clone_max" name="clone_max" value="2"/>
> <nvpair id="pingd-clone_node_max" name="clone_node_max" value="1"/>
> </attributes>
> </instance_attributes>
> <primitive class="ocf" id="ping-child" provider="heartbeat" type="pingd">
> <instance_attributes id="pingd-child_attr">
> <attributes>
> <nvpair id="pingd-child-dampen" name="dampen" value="5s"/>
> <nvpair id="pingd-child-multiplier" name="multiplier" value="1000"/>
> <nvpair id="pingd-child-user" name="user" value="root"/>
> <nvpair id="pingd-child-pidfile" name="pidfile" value="/tmp/pingd.pid"/>
> </attributes>
> </instance_attributes>
> <operations>
> <op id="pingd-child_monitor" interval="10s" name="monitor" timeout="5s" prereq="nothing"/>
> <op id="pingd-child_start" name="start" prereq="nothing"/>
> </operations>
> </primitive>
> </clone>
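With a single ping node and `multiplier` set to 1000, the `pingd` node attribute used by the location rules can take only two values. A minimal Python sketch, assuming pingd publishes the multiplier times the number of reachable ping nodes (as its `multiplier` parameter describes); the function name is hypothetical and the 5s `dampen` delay is not modelled:

```python
from __future__ import annotations

# Hypothetical sketch of the value the pingd clone publishes as the "pingd"
# node attribute: multiplier x number of reachable ping nodes.
# dampen="5s" (which only delays when the update is written) is not modelled.
MULTIPLIER = 1000              # nvpair pingd-child-multiplier
PING_NODES = ["172.16.0.254"]  # the `ping` directive in ha.cf


def pingd_attribute(reachable: set[str]) -> int:
    """Attribute value for a node that can currently reach `reachable` hosts."""
    connected = sum(1 for host in PING_NODES if host in reachable)
    return MULTIPLIER * connected
```

Under this assumption, NODE2's attribute should drop from 1000 to 0 shortly after its corporate cable is disconnected.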
>
>
> <rsc_location id="RESOURCE_GROUP1_NODE1" rsc="RESOURCE_GROUP1">
> <rule id="RESOURCE_GROUP1_NODE1_SCORE" score="150" boolean_op="and">
> <expression attribute="#uname" id="RESOURCE_GROUP1_NODE1_SCORE_EXPR" operation="eq" value="node1.localdomain"/>
> </rule>
> <rule id="RESOURCE_GROUP1_PINGD_RULE" score_attribute="pingd" boolean_op="and">
> <expression id="RESOURCE_GROUP1_PINGD_RULE_EXPR01" attribute="pingd" operation="defined"/>
> <expression id="RESOURCE_GROUP1_PINGD_RULE_EXPR02" attribute="pingd" operation="gt" value="0"/>
> </rule>
> </rsc_location>
>
>
> <rsc_location id="RESOURCE_GROUP2_NODE2" rsc="RESOURCE_GROUP2">
> <rule id="RESOURCE_GROUP2_NODE2_SCORE" score="150" boolean_op="and">
> <expression attribute="#uname" id="RESOURCE_GROUP2_NODE2_SCORE_EXPR" operation="eq" value="node2.localdomain"/>
> </rule>
> <rule id="RESOURCE_GROUP2_PINGD_RULE" score_attribute="pingd" boolean_op="and">
> <expression id="RESOURCE_GROUP2_PINGD_RULE_EXPR01" attribute="pingd" operation="defined"/>
> <expression id="RESOURCE_GROUP2_PINGD_RULE_EXPR02" attribute="pingd" operation="gt" value="0"/>
> </rule>
> </rsc_location>
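The placement the poster expects can be worked out from these two constraints. The Python sketch below is a simplified, hypothetical model, not the CRM's allocation code: it assumes the scores of all matching rules are summed per node, that resource stickiness is 0, and that nothing else (colocation, fail counts) affects placement.

```python
from __future__ import annotations

# Hypothetical model of the two rsc_location constraints above.
# Assumptions: scores of all matching rules are added, resource
# stickiness is 0, and each group runs on its highest-scoring node.
NODES = ["node1.localdomain", "node2.localdomain"]
PREFERRED = {
    "RESOURCE_GROUP1": "node1.localdomain",
    "RESOURCE_GROUP2": "node2.localdomain",
}


def location_score(group: str, node: str, pingd: dict[str, int]) -> int:
    """Sum of the matching rules for `group` on `node`."""
    score = 0
    # *_SCORE rule: score="150" when #uname equals the preferred node.
    if node == PREFERRED[group]:
        score += 150
    # *_PINGD_RULE: score_attribute="pingd", only if defined and gt 0.
    value = pingd.get(node)
    if value is not None and value > 0:
        score += value
    return score


def best_node(group: str, pingd: dict[str, int]) -> str:
    """Node with the highest score (the first node wins a tie)."""
    return max(NODES, key=lambda node: location_score(group, node, pingd))


if __name__ == "__main__":
    healthy = {"node1.localdomain": 1000, "node2.localdomain": 1000}
    node2_cut = {"node1.localdomain": 1000, "node2.localdomain": 0}
    for group in PREFERRED:
        print(group, best_node(group, healthy), best_node(group, node2_cut))
```

Under these assumptions, with both nodes connected each group stays on its preferred node (1150 vs 1000). With NODE2's `pingd` at 0, RESOURCE_GROUP2 scores 150 on NODE2 and 1000 on NODE1, so it should move to NODE1 while RESOURCE_GROUP1 stays put. The observed behavior does not match this, which fits the reply's point that it was a bug.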
>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>