[Linux-HA] pingd and resources
Phil Manuel
phil at zomojo.com
Tue Oct 2 00:00:23 MDT 2007
Hi,
I have a two-node cluster configured with a group of IPaddr2 resources:
four IP addresses, each on a separate interface. Each resource starts
successfully, and if the heartbeat service or the box fails, the
resources move across to the other node. If I manually take down an
interface using ipdown <interface>, heartbeat recognises that the
interface is down and restarts it.
The only issue is that when the Ethernet cable is removed, heartbeat
does not notice, and the resources stay running on the main node.
To work around this I tried to configure pingd; the relevant extract
from cib.xml is below:
<primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
  <instance_attributes id="pingd:connected_instance_attrs">
    <attributes>
      <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
      <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
      <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
      <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
    </attributes>
  </instance_attributes>
</primitive>
<rsc_location id="group_1:connected" rsc="group_1">
  <rule id="group_1:connected:rule" score_attribute="pingd:connected">
    <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
  </rule>
</rsc_location>
The cluster is just as happy with the situation as before, even though
the node with the failed network connection cannot ping any of those
hosts. The log from the first node shows:
Oct 2 15:57:06 sydgw1 lrmd: [32694]: info: RA output: (pingd:connected:start:stdout) Adding ping host carbonAdding ping host dubniumAdding ping host sydsw1
Oct 2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation pingd:connected_start_0 (call=16, rc=0) complete
Oct 2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update: Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
Oct 2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element: build_operation_update: digest:source <parameters multiplier="100" name="pingd:connected:id" host_list="carbon dubnium sydsw1" pidfile="/tmp/ha_pingd_pid"/>
Oct 2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
Oct 2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the list of configured nodes
Oct 2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating hash entry for pingd:connected:id
Oct 2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct 2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd:connected:id
Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush message from sydgw1.zomojo.com
Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
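As a cross-check on the configuration above, the attribute name that pingd publishes can be compared with the name the location rule reads. The following is a minimal Python sketch, not something taken from the running cluster. It embeds a trimmed copy of the cib.xml extract; the <configuration> wrapper element, the helper function names, and the fallback name "pingd" used when no "name" parameter is set are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cib.xml extract. The <configuration> wrapper is an
# assumption added only so the fragment parses as a single document.
CIB_FRAGMENT = """
<configuration>
  <primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
    <instance_attributes id="pingd:connected_instance_attrs">
      <attributes>
        <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
        <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <rsc_location id="group_1:connected" rsc="group_1">
    <rule id="group_1:connected:rule" score_attribute="pingd:connected">
      <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
    </rule>
  </rsc_location>
</configuration>
"""


def pingd_attribute_names(root):
    """Collect the attribute names that pingd primitives are configured to set."""
    names = set()
    for prim in root.iter("primitive"):
        if prim.get("type") != "pingd":
            continue
        params = {nv.get("name"): nv.get("value") for nv in prim.iter("nvpair")}
        # Assumption: fall back to "pingd" when no "name" parameter is given.
        names.add(params.get("name", "pingd"))
    return names


def rule_attribute_names(root):
    """Collect the attribute names that location rules refer to."""
    names = set()
    for rule in root.iter("rule"):
        if rule.get("score_attribute"):
            names.add(rule.get("score_attribute"))
        for expr in rule.iter("expression"):
            if expr.get("attribute"):
                names.add(expr.get("attribute"))
    return names


root = ET.fromstring(CIB_FRAGMENT)
published = pingd_attribute_names(root)
referenced = rule_attribute_names(root)
# Names the rules read that no pingd primitive is configured to write.
unmatched = referenced - published

print("published: ", sorted(published))
print("referenced:", sorted(referenced))
print("unmatched: ", sorted(unmatched))
```

On this extract the pingd resource is configured to publish pingd:connected:id, the same name attrd reports in the log, while the rule refers to pingd:connected, so that difference may be worth looking at.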
What have I missed?
Thanks for your help
Phil.