[Linux-HA] Re: pingd problem

Achim Stumpf hakim.news at googlemail.com
Thu Apr 3 04:20:08 MDT 2008


Hi,

I had some problems with my subscription and e-mail, so I am sending my 
first mail again here:

I am using Heartbeat 2.0.8 on Fedora 7.

It's a two-node Active/Passive cluster with crm activated.


The cluster starts up fine, but if I use iptables to block the ping nodes 
on the active node, the resources are not started on the other node.

In the logs of the active node I see:

Apr  2 10:28:16 sputnik mysql[11916]: [11925]: INFO: MySQL monitor succeded
Apr  2 10:28:17 sputnik apache[11926]: [11986]: INFO: 10:28:17 
URL:http://127.0.0.1/server-status/ [1765/1765] -> "-" [1]
Apr  2 10:28:55 sputnik heartbeat: [11577]: WARN: node routers: is dead
Apr  2 10:28:55 sputnik heartbeat: [11577]: info: Link routers:routers dead.
Apr  2 10:28:55 sputnik crmd: [11592]: notice: crmd_ha_status_callback: 
Status update: Node routers now has status [dead]
Apr  2 10:28:55 sputnik pingd: [11594]: notice: pingd_nstatus_callback: 
Status update: Ping node routers now has status [dead]
Apr  2 10:28:55 sputnik pingd: [11594]: info: send_update: 0 active ping 
nodes
Apr  2 10:28:55 sputnik pingd: [11594]: notice: pingd_lstatus_callback: 
Status update: Ping node routers now has status [dead]
Apr  2 10:28:55 sputnik pingd: [11594]: notice: pingd_nstatus_callback: 
Status update: Ping node routers now has status [dead]
Apr  2 10:28:55 sputnik pingd: [11594]: info: send_update: 0 active ping 
nodes
Apr  2 10:28:56 sputnik crmd: [11592]: WARN: get_uuid: Could not 
calculate UUID for routers
Apr  2 10:28:56 sputnik crmd: [11592]: info: crmd_ha_status_callback: 
Ping node routers is dead
Apr  2 10:29:16 sputnik mysql[12017]: [12026]: INFO: MySQL monitor succeded
Apr  2 10:29:18 sputnik apache[12027]: [12087]: INFO: 10:29:18 
URL:http://127.0.0.1/server-status/ [1766/1766] -> "-" [1]
Apr  2 10:29:25 sputnik attrd: [11591]: info: attrd_timer_callback: 
Sending flush op to all hosts for: pingd
Apr  2 10:29:26 sputnik attrd: [11591]: info: attrd_ha_callback: flush 
message from sputnik.test
Apr  2 10:29:26 sputnik attrd: [11591]: info: attrd_ha_callback: Sent 
update 5: pingd=0
Apr  2 10:29:27 sputnik cib: [11588]: info: cib_diff_notify: Update 
(client: 11591, call:5): 0.1.28 -> 0.1.29 (ok)
Apr  2 10:29:27 sputnik cib: [11588]: info: cib_diff_notify: Update 
(client: 7817, call:5): 0.1.29 -> 0.1.30 (ok)
Apr  2 10:29:27 sputnik cib: [12088]: info: write_cib_contents: Wrote 
version 0.1.30 of the CIB to disk (digest: f337be40ad9bed30d302a4af031b5cf1)
Apr  2 10:30:16 sputnik mysql[12115]: [12124]: INFO: MySQL monitor succeded
Apr  2 10:30:18 sputnik apache[12125]: [12185]: INFO: 10:30:18 
URL:http://127.0.0.1/server-status/ [1766/1766] -> "-" [1]
Apr  2 10:31:16 sputnik mysql[12210]: [12219]: INFO: MySQL monitor succeded
Apr  2 10:31:18 sputnik apache[12220]: [12280]: INFO: 10:31:18 
URL:http://127.0.0.1/server-status/ [1767/1767] -> "-" [1]
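
The pingd lines in this log can be cross-checked against the pingd options 
in ha.cf (-m 500 -d 30s). The following is only a minimal sketch, assuming 
that pingd publishes "multiplier x number of reachable ping nodes", that the 
whole ping_group counts as a single ping node, and that the dampening delay 
starts at the last status change. The function names are made up for 
illustration and are not part of Heartbeat.

```python
from datetime import datetime, timedelta

MULTIPLIER = 500      # pingd -m 500 (from ha.cf)
DAMPEN_SECONDS = 30   # pingd -d 30s (from ha.cf)


def pingd_value(multiplier, active_ping_nodes):
    """Assumed attribute value: multiplier times reachable ping nodes."""
    return multiplier * active_ping_nodes


def flush_time(last_change, dampen_seconds):
    """Assumed time at which the dampened update is written to the CIB."""
    return last_change + timedelta(seconds=dampen_seconds)


# One ping_group ("routers") is treated here as one ping node (assumption).
connected = pingd_value(MULTIPLIER, 1)
isolated = pingd_value(MULTIPLIER, 0)

# Last "send_update: 0 active ping nodes" line in the log above.
last_update = datetime(2008, 4, 2, 10, 28, 55)
expected_flush = flush_time(last_update, DAMPEN_SECONDS)

print(connected, isolated, expected_flush.time())
```

Under these assumptions the isolated node ends up with pingd=0, and the 
flush falls at 10:29:25, which matches the attrd "Sending flush op" line and 
the following "pingd=0" update in the log.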


ha.cf:

use_logd yes

udpport 695

keepalive 1
deadtime 10
warntime 5
initdead 30 # depend on your hardware

bcast bond0
watchdog /dev/watchdog

ping_group routers 10.14.0.10 10.14.0.11 10.14.0.12 10.14.0.13

crm yes
node    sputnik.test
node    sputnik1.fra

respawn root /usr/lib/heartbeat/pingd -m 500 -d 30s -a pingd


cib.xml:

<?xml version="1.0" ?>
<cib admin_epoch="0" epoch="0" num_updates="0">
         <configuration>
                 <crm_config>
                         <cluster_property_set id="cib-bootstrap-options">
                                 <attributes>
                                         <nvpair 
id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" 
value="true"/>
                                         <nvpair 
id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" 
value="stop"/>
                                         <nvpair 
id="cib-bootstrap-options-default-resource-stickiness" 
name="default-resource-stickiness" value="0"/>
                                         <nvpair 
id="cib-bootstrap-options-default-resource-failure-stickiness" 
name="default-resource-failure-stickiness" value="0"/>
                                         <nvpair 
id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" 
value="false"/>
                                         <nvpair 
id="cib-bootstrap-options-stonith-action" name="stonith-action" 
value="reboot"/>
                                         <nvpair 
id="cib-bootstrap-options-stop-orphan-resources" 
name="stop-orphan-resources" value="true"/>
                                         <nvpair 
id="cib-bootstrap-options-stop-orphan-actions" 
name="stop-orphan-actions" value="true"/>
                                         <nvpair 
id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" 
value="false"/>
                                         <nvpair 
id="cib-bootstrap-options-short-resource-names" 
name="short-resource-names" value="true"/>
                                         <nvpair 
id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="5min"/>
                                         <nvpair 
id="cib-bootstrap-options-is-managed-default" name="is-managed-default" 
value="true"/>
                                         <nvpair 
id="cib-bootstrap-options-default-action-timeout" 
name="default-action-timeout" value="120s"/>
                                         <nvpair 
id="cib-bootstrap-options-dc_deadtime" name="dc_deadtime" value="10s"/>
                                         <nvpair 
id="cib-bootstrap-options-cluster_recheck_interval" 
name="cluster_recheck_interval" value="0"/>
                                         <nvpair 
id="cib-bootstrap-options-election_timeout" name="election_timeout" 
value="120s"/>
                                         <nvpair 
id="cib-bootstrap-options-shutdown_escalation" 
name="shutdown_escalation" value="7min"/>
                                         <nvpair 
id="cib-bootstrap-options-crmd-integration-timeout" 
name="crmd-integration-timeout" value="3min"/>
                                         <nvpair 
id="cib-bootstrap-options-crmd-finalization-timeout" 
name="crmd-finalization-timeout" value="5min"/>
                                         <nvpair 
id="cib-bootstrap-options-pe-error-series-max" 
name="pe-error-series-max" value="-1"/>
                                         <nvpair 
id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" 
value="-1"/>
                                         <nvpair 
id="cib-bootstrap-options-pe-input-series-max" 
name="pe-input-series-max" value="-1"/>
                                         <nvpair 
id="cib-bootstrap-options-startup-fencing" name="startup-fencing" 
value="true"/>
                                 </attributes>
                         </cluster_property_set>
                 </crm_config>
                 <nodes/>
                 <resources>
                         <group id="group_1">
                                 <primitive class="heartbeat" 
id="drbddisk_1" provider="heartbeat" type="drbddisk">
                                         <operations>
                                                 <op id="drbddisk_1_mon" 
interval="30s" name="monitor" timeout="25s"/>
                                         </operations>
                                         <instance_attributes 
id="drbddisk_1_inst_attr">
                                                 <attributes>
                                                         <nvpair 
id="drbddisk_1_attr_1" name="1" value="data1"/>
                                                 </attributes>
                                         </instance_attributes>
                                 </primitive>
                                 <primitive class="ocf" 
id="Filesystem_1" provider="heartbeat" type="Filesystem">
                                         <operations>
                                                 <op 
id="Filesystem_1_mon" interval="30s" name="monitor" timeout="25s"/>
                                         </operations>
                                         <instance_attributes 
id="Filesystem_1_inst_attr">
                                                 <attributes>
                                                         <nvpair 
id="Filesystem_1_attr_0" name="device" value="/dev/drbd0"/>
                                                         <nvpair 
id="Filesystem_1_attr_1" name="directory" value="/data1"/>
                                                         <nvpair 
id="Filesystem_1_attr_2" name="fstype" value="ext3"/>
                                                 </attributes>
                                         </instance_attributes>
                                 </primitive>
                                 <primitive class="ocf" id="mysql_1" 
provider="interactivedata" type="mysql">
                                         <operations>
                                                 <op id="mysql_1_mon" 
interval="60s" name="monitor" timeout="55s"/>
                                         </operations>
                                         <instance_attributes 
id="mysql_1_inst_attr">
                                                 <attributes>
                                                         <nvpair 
id="mysql_1_attr_0" name="test_table" value="heartbeat.test"/>
                                                         <nvpair 
id="mysql_1_attr_1" name="test_user" value="heartbeat"/>
                                                         <nvpair 
id="mysql_1_attr_2" name="test_passwd" value="test"/>
                                                         <nvpair 
id="mysql_1_attr_3" name="binary" value="/usr/bin/mysqld_safe"/>
                                                         <nvpair 
id="mysql_1_attr_4" name="pid" value="/var/run/mysqld/mysqld.pid"/>
                                                         <nvpair 
id="mysql_1_attr_5" name="OCF_CHECK_LEVEL" value="10"/>
                                                 </attributes>
                                         </instance_attributes>
                                 </primitive>
                                 <primitive class="ocf" id="apache2_1" 
provider="heartbeat" type="apache">
                                         <operations>
                                                 <op id="apache2_1_mon" 
interval="60s" name="monitor" timeout="55s"/>
                                         </operations>
                                         <instance_attributes 
id="apache_2_1_inst_attr">
                                                 <attributes>
                                                         <nvpair 
id="apache_2_1_attr_0" name="configfile" 
value="/etc/httpd/conf/httpd.conf"/>
                                                         <nvpair 
id="apache_2_1_attr_1" name="statusurl" 
value="http://127.0.0.1/server-status/"/>
                                                         <nvpair 
id="apache_2_1_attr_2" name="options" value="-DSSL"/>
                                                 </attributes>
                                         </instance_attributes>
                                 </primitive>
                                 <instance_attributes 
id="group_1_instance_attrs">
                                         <attributes>
                                                 <nvpair 
id="group_1_target_role" name="target_role" value="started"/>
                                                 <nvpair 
id="group_1_resource_stickiness" name="resource_stickiness" value="200"/>
                                         </attributes>
                                 </instance_attributes>
                         </group>
                 </resources>
                 <constraints>
                         <rsc_location id="rsc_location_group_1" 
rsc="group_1">
                                 <rule id="prefered_location_group_1" 
score="100">
                                         <expression attribute="#uname" 
id="prefered_location_group_1_expr" operation="eq" value="sputnik.test"/>
                                 </rule>
                         </rsc_location>
                         <rsc_location id="group_1:connected" rsc="group_1">
                                 <rule id="group_1:connected:rule" 
score_attribute="pingd" >
                                         <expression 
id="group_1:connected:expr:defined" attribute="pingd" operation="defined"/>
                                 </rule>
                         </rsc_location>
                 </constraints>
         </configuration>
         <status/>
</cib>


It would be nice to actually get this setup working with scores, but as 
the logs show, it does not work: there is no failover to the other node.

Any hints on how I could get the score-based setup above working?


Thanks,

Achim




Dominik Klein schrieb:
> Achim Stumpf wrote:
>> Hi,
>>
>> Now it works. I have changed in cib.xml:
>>
>>                         <rsc_location id="group_1:connected" 
>> rsc="group_1">
>>                                 <rule id="group_1:connected:rule" 
>> score_attribute="pingd" >
>>                                         <expression 
>> id="group_1:connected:expr:defined" attribute="pingd" 
>> operation="defined"/>
>>                                 </rule>
>>                         </rsc_location>
>>
>>
>> to
>>
>>
>>                         <rsc_location id="group_1:connected" 
>> rsc="group_1">
>>                                 <rule id="group_1:connected:rule" 
>> score="-INFINITY" boolean_op="or">
>>                                         <expression 
>> id="group_1:connected:expr:undefined" attribute="pingd" 
>> operation="not_defined"/>
>>                                         <expression 
>> id="group_1:connected:expr:zero" attribute="pingd" operation="lte" 
>> value="0"/>
>>                                 </rule>
>>                         </rsc_location>
>>
>>
>> Now it works as expected. Those two setups were described on:
>>
>> http://www.linux-ha.org/pingd
>>
>> But it would still be nice to get ping nodes working with scores, as 
>> in my first example or as described on that page in "Quickstart - 
>> Run my resource on the node with the best connectivity".
>>
>> Does anyone have any hints how to get that stuff working?
> 
> That should work - what did you see (and what did you expect)? You 
> probably need to adjust the pingd multiplier. showscores.sh helps a lot 
> here to figure out what's not as expected (see 
> http://www.linux-ha.org/ScoreCalculation )
> 
> Regards
> Dominik
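
To illustrate why the multiplier may matter, here is a rough sketch of the 
score arithmetic using the numbers from the posted cib.xml and ha.cf. It is 
a simplified model, not the policy engine's real algorithm. In particular it 
assumes that resource_stickiness is counted once per group member (four 
primitives in group_1), that the standby node still reaches the ping group, 
and that a ping_group counts as one ping node. group_score is a hypothetical 
helper; showscores.sh is the place to check the real values.

```python
# Figures taken from the posted configuration.
LOCATION_PREFERENCE = 100   # rule prefered_location_group_1 (sputnik.test)
RESOURCE_STICKINESS = 200   # group_1_resource_stickiness
GROUP_MEMBERS = 4           # drbddisk_1, Filesystem_1, mysql_1, apache2_1
PINGD_MULTIPLIER = 500      # pingd -m 500


def group_score(preferred, running_here, reachable_ping_nodes,
                multiplier=PINGD_MULTIPLIER, sticky_per_member=True):
    """Simplified score of group_1 on one node (assumed model)."""
    score = LOCATION_PREFERENCE if preferred else 0
    if running_here:
        # Assumption: stickiness applies once per primitive in the group.
        factor = GROUP_MEMBERS if sticky_per_member else 1
        score += RESOURCE_STICKINESS * factor
    # score_attribute="pingd" adds the pingd attribute value as the score.
    score += multiplier * reachable_ping_nodes
    return score


# Active node: preferred, running the group, ping nodes blocked.
active = group_score(preferred=True, running_here=True,
                     reachable_ping_nodes=0)
# Standby node: not preferred, not running the group, ping group reachable.
standby = group_score(preferred=False, running_here=False,
                      reachable_ping_nodes=1)
# Same standby node with a larger, purely illustrative multiplier.
standby_tuned = group_score(preferred=False, running_here=False,
                            reachable_ping_nodes=1, multiplier=1000)

print("active:", active, "standby:", standby, "tuned:", standby_tuned)
```

With these assumptions the active node keeps 900 points against 500 on the 
standby node, so the group stays put even though pingd has dropped to 0. A 
multiplier large enough to outweigh preference plus stickiness (1000 in this 
sketch) would tip the balance, and so would a lower stickiness. If 
stickiness were counted only once, the active node would score 300 and the 
failover should already happen, so comparing against the showscores.sh 
output would tell which model applies.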

