[Linux-HA] Re: pingd problem
Dominik Klein
dk at in-telegence.net
Thu Apr 3 04:46:51 MDT 2008
Hi
Achim Stumpf wrote:
> Hi,
>
> I had some problems with my subscription and e-mail, so I am sending my
> first mail again here:
>
> I am using Heartbeat 2.0.8 on Fedora 7.
>
> It's a two-node Active/Passive cluster with crm activated.
>
>
> The cluster starts up fine, but if I block the ping nodes with iptables
> on the active node, it does not start the resources on the other node.
>
> In the logs of the active node I see:
>
> Apr 2 10:28:16 sputnik mysql[11916]: [11925]: INFO: MySQL monitor succeded
> Apr 2 10:28:17 sputnik apache[11926]: [11986]: INFO: 10:28:17
> URL:http://127.0.0.1/server-status/ [1765/1765] -> "-" [1]
> Apr 2 10:28:55 sputnik heartbeat: [11577]: WARN: node routers: is dead
> Apr 2 10:28:55 sputnik heartbeat: [11577]: info: Link routers:routers
> dead.
> Apr 2 10:28:55 sputnik crmd: [11592]: notice: crmd_ha_status_callback:
> Status update: Node routers now has status [dead]
> Apr 2 10:28:55 sputnik pingd: [11594]: notice: pingd_nstatus_callback:
> Status update: Ping node routers now has status [dead]
> Apr 2 10:28:55 sputnik pingd: [11594]: info: send_update: 0 active ping
> nodes
> Apr 2 10:28:55 sputnik pingd: [11594]: notice: pingd_lstatus_callback:
> Status update: Ping node routers now has status [dead]
> Apr 2 10:28:55 sputnik pingd: [11594]: notice: pingd_nstatus_callback:
> Status update: Ping node routers now has status [dead]
> Apr 2 10:28:55 sputnik pingd: [11594]: info: send_update: 0 active ping
> nodes
> Apr 2 10:28:56 sputnik crmd: [11592]: WARN: get_uuid: Could not
> calculate UUID for routers
> Apr 2 10:28:56 sputnik crmd: [11592]: info: crmd_ha_status_callback:
> Ping node routers is dead
> Apr 2 10:29:16 sputnik mysql[12017]: [12026]: INFO: MySQL monitor succeded
> Apr 2 10:29:18 sputnik apache[12027]: [12087]: INFO: 10:29:18
> URL:http://127.0.0.1/server-status/ [1766/1766] -> "-" [1]
> Apr 2 10:29:25 sputnik attrd: [11591]: info: attrd_timer_callback:
> Sending flush op to all hosts for: pingd
> Apr 2 10:29:26 sputnik attrd: [11591]: info: attrd_ha_callback: flush
> message from sputnik.test
> Apr 2 10:29:26 sputnik attrd: [11591]: info: attrd_ha_callback: Sent
> update 5: pingd=0
> Apr 2 10:29:27 sputnik cib: [11588]: info: cib_diff_notify: Update
> (client: 11591, call:5): 0.1.28 -> 0.1.29 (ok)
> Apr 2 10:29:27 sputnik cib: [11588]: info: cib_diff_notify: Update
> (client: 7817, call:5): 0.1.29 -> 0.1.30 (ok)
> Apr 2 10:29:27 sputnik cib: [12088]: info: write_cib_contents: Wrote
> version 0.1.30 of the CIB to disk (digest:
> f337be40ad9bed30d302a4af031b5cf1)
> Apr 2 10:30:16 sputnik mysql[12115]: [12124]: INFO: MySQL monitor succeded
> Apr 2 10:30:18 sputnik apache[12125]: [12185]: INFO: 10:30:18
> URL:http://127.0.0.1/server-status/ [1766/1766] -> "-" [1]
> Apr 2 10:31:16 sputnik mysql[12210]: [12219]: INFO: MySQL monitor succeded
> Apr 2 10:31:18 sputnik apache[12220]: [12280]: INFO: 10:31:18
> URL:http://127.0.0.1/server-status/ [1767/1767] -> "-" [1]
>
>
> ha.cf:
>
> use_logd yes
>
> udpport 695
>
> keepalive 1
> deadtime 10
> warntime 5
> initdead 30 # depend on your hardware
>
> bcast bond0
> watchdog /dev/watchdog
>
> ping_group routers 10.14.0.10 10.14.0.11 10.14.0.12 10.14.0.13
>
> crm yes
> node sputnik.test
> node sputnik1.fra
>
> respawn root /usr/lib/heartbeat/pingd -m 500 -d 30s -a pingd
This will give you a pingd score of 500. A ping_group is treated as a
single ping host score-wise.
If you want each ping host's connectivity to count, you should have
ping 10.14.0.10
ping 10.14.0.11
ping 10.14.0.12
ping 10.14.0.13
instead. With all four hosts reachable, this would give a pingd score of
2000 (4 * 500) and make your setup work score-wise.
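To make the arithmetic explicit, here is a minimal sketch in Python. It is
not pingd code; the function name pingd_score is made up for illustration,
and it only assumes what is said above: the value is the -m multiplier times
the number of reachable ping entries, with a whole ping_group counting as one
entry.

```python
def pingd_score(multiplier: int, reachable_entries: int) -> int:
    """Hypothetical helper: multiplier times the number of reachable entries."""
    return multiplier * reachable_entries


# "ping_group routers ..." counts as a single entry, however many IPs it lists.
group_score = pingd_score(500, 1)

# Four separate "ping" directives count as four entries.
separate_score = pingd_score(500, 4)

print(group_score, separate_score)
```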
>
> cib.xml:
>
> <?xml version="1.0" ?>
> <cib admin_epoch="0" epoch="0" num_updates="0">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <attributes>
> <nvpair
> id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster"
> value="true"/>
> <nvpair
> id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy"
> value="stop"/>
> <nvpair
> id="cib-bootstrap-options-default-resource-stickiness"
> name="default-resource-stickiness" value="0"/>
> <nvpair
> id="cib-bootstrap-options-default-resource-failure-stickiness"
> name="default-resource-failure-stickiness" value="0"/>
> <nvpair
> id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled"
> value="false"/>
> <nvpair
> id="cib-bootstrap-options-stonith-action" name="stonith-action"
> value="reboot"/>
> <nvpair
> id="cib-bootstrap-options-stop-orphan-resources"
> name="stop-orphan-resources" value="true"/>
> <nvpair
> id="cib-bootstrap-options-stop-orphan-actions"
> name="stop-orphan-actions" value="true"/>
> <nvpair
> id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop"
> value="false"/>
> <nvpair
> id="cib-bootstrap-options-short-resource-names"
> name="short-resource-names" value="true"/>
> <nvpair
> id="cib-bootstrap-options-cluster-delay" name="cluster-delay"
> value="5min"/>
> <nvpair
> id="cib-bootstrap-options-is-managed-default" name="is-managed-default"
> value="true"/>
> <nvpair
> id="cib-bootstrap-options-default-action-timeout"
> name="default-action-timeout" value="120s"/>
> <nvpair
> id="cib-bootstrap-options-dc_deadtime" name="dc_deadtime" value="10s"/>
> <nvpair
> id="cib-bootstrap-options-cluster_recheck_interval"
> name="cluster_recheck_interval" value="0"/>
> <nvpair
> id="cib-bootstrap-options-election_timeout" name="election_timeout"
> value="120s"/>
> <nvpair
> id="cib-bootstrap-options-shutdown_escalation"
> name="shutdown_escalation" value="7min"/>
> <nvpair
> id="cib-bootstrap-options-crmd-integration-timeout"
> name="crmd-integration-timeout" value="3min"/>
> <nvpair
> id="cib-bootstrap-options-crmd-finalization-timeout"
> name="crmd-finalization-timeout" value="5min"/>
> <nvpair
> id="cib-bootstrap-options-pe-error-series-max"
> name="pe-error-series-max" value="-1"/>
> <nvpair
> id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max"
> value="-1"/>
> <nvpair
> id="cib-bootstrap-options-pe-input-series-max"
> name="pe-input-series-max" value="-1"/>
> <nvpair
> id="cib-bootstrap-options-startup-fencing" name="startup-fencing"
> value="true"/>
> </attributes>
> </cluster_property_set>
> </crm_config>
> <nodes/>
> <resources>
> <group id="group_1">
> <primitive class="heartbeat"
> id="drbddisk_1" provider="heartbeat" type="drbddisk">
> <operations>
> <op id="drbddisk_1_mon"
> interval="30s" name="monitor" timeout="25s"/>
> </operations>
> <instance_attributes
> id="drbddisk_1_inst_attr">
> <attributes>
> <nvpair
> id="drbddisk_1_attr_1" name="1" value="data1"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" id="Filesystem_1"
> provider="heartbeat" type="Filesystem">
> <operations>
> <op
> id="Filesystem_1_mon" interval="30s" name="monitor" timeout="25s"/>
> </operations>
> <instance_attributes
> id="Filesystem_1_inst_attr">
> <attributes>
> <nvpair
> id="Filesystem_1_attr_0" name="device" value="/dev/drbd0"/>
> <nvpair
> id="Filesystem_1_attr_1" name="directory" value="/data1"/>
> <nvpair
> id="Filesystem_1_attr_2" name="fstype" value="ext3"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" id="mysql_1"
> provider="interactivedata" type="mysql">
> <operations>
> <op id="mysql_1_mon"
> interval="60s" name="monitor" timeout="55s"/>
> </operations>
> <instance_attributes
> id="mysql_1_inst_attr">
> <attributes>
> <nvpair
> id="mysql_1_attr_0" name="test_table" value="heartbeat.test"/>
> <nvpair
> id="mysql_1_attr_1" name="test_user" value="heartbeat"/>
> <nvpair
> id="mysql_1_attr_2" name="test_passwd" value="test"/>
> <nvpair
> id="mysql_1_attr_3" name="binary" value="/usr/bin/mysqld_safe"/>
> <nvpair
> id="mysql_1_attr_4" name="pid" value="/var/run/mysqld/mysqld.pid"/>
> <nvpair
> id="mysql_1_attr_5" name="OCF_CHECK_LEVEL" value="10"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" id="apache2_1"
> provider="heartbeat" type="apache">
> <operations>
> <op id="apache2_1_mon"
> interval="60s" name="monitor" timeout="55s"/>
> </operations>
> <instance_attributes
> id="apache_2_1_inst_attr">
> <attributes>
> <nvpair
> id="apache_2_1_attr_0" name="configfile"
> value="/etc/httpd/conf/httpd.conf"/>
> <nvpair
> id="apache_2_1_attr_1" name="statusurl"
> value="http://127.0.0.1/server-status/"/>
> <nvpair
> id="apache_2_1_attr_2" name="options" value="-DSSL"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <instance_attributes
> id="group_1_instance_attrs">
> <attributes>
> <nvpair
> id="group_1_target_role" name="target_role" value="started"/>
> <nvpair
> id="group_1_resource_stickiness" name="resource_stickiness" value="200"/>
> </attributes>
> </instance_attributes>
Apart from the fact that these attributes should be "meta_attributes"
instead of "instance_attributes", this will give you a score of 4 * 200
= 800 (four resources in the group, stickiness 200 each) for the node the
group is currently running on.
So with ping working, you should have scores of
800 + 500 = 1300 for node1
500 for node2
Now you block ICMP on node1. You will have:
800 + 0 = 800 on node1
500 on node2
node1 still has the higher score, so why should the cluster move any resource?
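The same comparison as a rough Python sketch. This is a simplified model of
the numbers in this mail, not the real score calculation: it assumes only
stickiness (200 per group member, 4 members) plus the pingd value
(multiplier 500), and ignores everything else in the CIB. The names
node_score and should_move are invented for the example.

```python
STICKINESS = 200      # resource_stickiness from the group
GROUP_MEMBERS = 4     # drbddisk, Filesystem, mysql, apache
MULTIPLIER = 500      # pingd -m 500


def node_score(runs_group: bool, reachable_entries: int) -> int:
    """Simplified score: stickiness (only where the group runs) plus pingd value."""
    stickiness = STICKINESS * GROUP_MEMBERS if runs_group else 0
    return stickiness + MULTIPLIER * reachable_entries


def should_move(active: int, standby: int) -> bool:
    """The group only moves if the standby node scores strictly higher."""
    return standby > active


# ping_group (one entry), ICMP blocked on the active node
group_active = node_score(True, 0)
group_standby = node_score(False, 1)

# four separate ping entries, ICMP blocked on the active node
separate_active = node_score(True, 0)
separate_standby = node_score(False, 4)

print(group_active, group_standby, should_move(group_active, group_standby))
print(separate_active, separate_standby, should_move(separate_active, separate_standby))
```

In this model the ping_group setup leaves the active node ahead, while four
separate ping entries put the standby node ahead.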
Regards
Dominik
> </group>
> </resources>
> <constraints>
> <rsc_location id="rsc_location_group_1"
> rsc="group_1">
> <rule id="prefered_location_group_1"
> score="100">
> <expression attribute="#uname"
> id="prefered_location_group_1_expr" operation="eq" value="sputnik.test"/>
> </rule>
> </rsc_location>
> <rsc_location id="group_1:connected" rsc="group_1">
> <rule id="group_1:connected:rule"
> score_attribute="pingd" >
> <expression
> id="group_1:connected:expr:defined" attribute="pingd" operation="defined"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> <status/>
> </cib>
>
>
> It would be nice to actually get this setup working with scores, but it
> does not work, as you can see in the logs. There is no failover to the
> other node.
>
> Any hints on how I could get the setup working with scores as above?
>
>
> Thanks,
>
> Achim
>
>
>
>
> Dominik Klein wrote:
>> Achim Stumpf wrote:
>>> Hi,
>>>
>>> Now it works. I have changed in cib.xml:
>>>
>>> <rsc_location id="group_1:connected"
>>> rsc="group_1">
>>> <rule id="group_1:connected:rule"
>>> score_attribute="pingd" >
>>> <expression
>>> id="group_1:connected:expr:defined" attribute="pingd"
>>> operation="defined"/>
>>> </rule>
>>> </rsc_location>
>>>
>>>
>>> to
>>>
>>>
>>> <rsc_location id="group_1:connected"
>>> rsc="group_1">
>>> <rule id="group_1:connected:rule"
>>> score="-INFINITY" boolean_op="or">
>>> <expression
>>> id="group_1:connected:expr:undefined" attribute="pingd"
>>> operation="not_defined"/>
>>> <expression
>>> id="group_1:connected:expr:zero" attribute="pingd" operation="lte"
>>> value="0"/>
>>> </rule>
>>> </rsc_location>
>>>
>>>
>>> Now it works as expected. Those two setups were described on:
>>>
>>> http://www.linux-ha.org/pingd
>>>
>>> But it would still be nice to get ping nodes working with scores, as
>>> in my first example or as described on that page in "Quickstart -
>>> Run my resource on the node with the best connectivity".
>>>
>>> Does anyone have any hints how to get that stuff working?
>>
>> That should work - what did you see (and what did you expect)? You
>> probably need to adjust the pingd multiplier. showscores.sh helps a
>> lot here to figure out what is not as expected (see
>> http://www.linux-ha.org/ScoreCalculation )
>>
>> Regards
>> Dominik