[Linux-HA] pingd and resources

Phil Manuel phil at zomojo.com
Tue Oct 2 00:00:23 MDT 2007


Hi,

I have a two-node cluster configured with a group of IPaddr2 resources: 
four IP addresses, each on a separate interface.  Each resource starts 
successfully, and if the heartbeat service or the box fails, they all 
move across to the other node.  If I manually take down an interface 
using ifdown <interface>, heartbeat recognises that the interface is 
down and restarts it.

The only issue I have is that when the ethernet cable is removed, 
heartbeat does not notice and leaves the resources running on the main 
node.

To overcome this I tried to configure pingd; the relevant extract from 
cib.xml is below:-

       <primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
         <instance_attributes id="pingd:connected_instance_attrs">
           <attributes>
             <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
             <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
             <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
             <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
           </attributes>
         </instance_attributes>
       </primitive>

       <rsc_location id="group_1:connected" rsc="group_1">
         <rule id="group_1:connected:rule" score_attribute="pingd:connected">
           <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
         </rule>
       </rsc_location>

With this configuration heartbeat is just as happy with the situation as 
before: the resources stay where they are, even though the node with the 
failed network connection cannot ping any of those hosts.

In the log from the first node:-
Oct  2 15:57:06 sydgw1 lrmd: [32694]: info: RA output: (pingd:connected:start:stdout) Adding ping host carbonAdding ping host dubniumAdding ping host sydsw1
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation pingd:connected_start_0 (call=16, rc=0) complete
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update: Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element: build_operation_update: digest:source <parameters multiplier="100" name="pingd:connected:id" host_list="carbon dubnium sydsw1" pidfile="/tmp/ha_pingd_pid"/>
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
Oct  2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the list of configured nodes
Oct  2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating hash entry for pingd:connected:id
Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct  2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd:connected:id
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush message from sydgw1.zomojo.com
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
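
One detail stands out when the extract is compared with the log: the 
primitive's "name" parameter is pingd:connected:id, and that is the 
attribute attrd updates, whereas the rule refers to pingd:connected.  As 
an untested sketch, assuming the rule has to name the attribute exactly 
as pingd publishes it, the constraint with the names aligned would look 
like this:

```xml
<rsc_location id="group_1:connected" rsc="group_1">
  <!-- Assumption: score_attribute and attribute must match the value of
       the primitive's "name" parameter (pingd:connected:id) -->
  <rule id="group_1:connected:rule" score_attribute="pingd:connected:id">
    <expression id="group_1:connected:expr:defined" attribute="pingd:connected:id" operation="defined"/>
  </rule>
</rsc_location>
```

This only addresses the name mismatch; it does not explain why pingd 
reports 0 active ping nodes in the log.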


What have I missed?

Thanks for your help

Phil.

