[Linux-HA] pingd and resources

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Oct 2 04:35:31 MDT 2007


Hi,

On Tue, Oct 02, 2007 at 04:00:23PM +1000, Phil Manuel wrote:
> Hi,
> 
> I have a two-node cluster configured with a group of IPaddr2 resources, 
> four IP addresses, each on a separate interface.  Each resource successfully 
> starts, and if the heartbeat service fails or the box fails they 
> transition across to the other node.  If I manually take down an 
> interface using ifdown <interface>, then heartbeat recognises the 
> interface is down and restarts it.
> 
> The only issue I have is when the ethernet cable is removed, heartbeat 
> just doesn't notice, leaving the resources running on the main node.
> 
> In order to overcome this I tried to configure pingd; here is the extract 
> from cib.xml:-
> 
>         <primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
>           <instance_attributes id="pingd:connected_instance_attrs">
>             <attributes>
>               <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
>               <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
>               <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>

This name (pingd:connected:id) ...

>               <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
> 
>       <rsc_location id="group_1:connected" rsc="group_1">
>         <rule id="group_1:connected:rule" score_attribute="pingd:connected">

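... does not match this one (pingd:connected). pingd publishes its value
under the attribute given in its "name" parameter, so this rule refers to
an attribute that is never set and the constraint has no effect.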
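
A minimal sketch of a consistent constraint, assuming you keep the pingd
"name" parameter as it is (the attrd lines in your log show pingd
publishing pingd:connected:id): point both score_attribute and the
"defined" expression at that same name. Renaming the pingd "name"
parameter to pingd:connected would work just as well; what matters is
that the two agree.

```xml
<!-- score_attribute and the expression must both use the attribute
     name that pingd sets, i.e. the value of its "name" parameter -->
<rsc_location id="group_1:connected" rsc="group_1">
  <rule id="group_1:connected:rule" score_attribute="pingd:connected:id">
    <expression id="group_1:connected:expr:defined"
                attribute="pingd:connected:id" operation="defined"/>
  </rule>
</rsc_location>
```

With multiplier=100 and three ping hosts, pingd should report 300 on a
node that reaches all of them and 0 on one that reaches none, so the node
with the pulled cable gets the lowest score. Whether group_1 actually
moves still depends on resource stickiness being smaller than that
difference.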

Thanks,

Dejan

>           <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
>         </rule>
>       </rsc_location>
> 
> The cluster behaves exactly as before, even though the node with the 
> failed network connection cannot reach any of those hosts.
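
If you would rather forbid group_1 outright on a node that can reach none
of the ping hosts, instead of only preferring the better-connected node,
a -INFINITY rule along these lines should do it. The ids below are made
up for the example; the attribute name again has to match the pingd
"name" parameter.

```xml
<!-- Hypothetical ids; attribute must equal the pingd "name" parameter -->
<rsc_location id="group_1:connectivity" rsc="group_1">
  <!-- Ban group_1 where pingd has not set the attribute at all,
       or where it reports zero reachable ping hosts -->
  <rule id="group_1:connectivity:rule" score="-INFINITY" boolean_op="or">
    <expression id="group_1:connectivity:expr:undefined"
                attribute="pingd:connected:id" operation="not_defined"/>
    <expression id="group_1:connectivity:expr:zero"
                attribute="pingd:connected:id" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

The trade-off is that if every node loses its ping hosts at once, group_1
can run nowhere and is stopped on all nodes.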
> 
> In the log from the first node:-
> Oct  2 15:57:06 sydgw1 lrmd: [32694]: info: RA output: (pingd:connected:start:stdout) Adding ping host carbonAdding ping host dubniumAdding ping host sydsw1
> Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation pingd:connected_start_0 (call=16, rc=0) complete
> Oct  2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update: Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
> Oct  2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element: build_operation_update: digest:source <parameters multiplier="100" name="pingd:connected:id" host_list="carbon dubnium sydsw1" pidfile="/tmp/ha_pingd_pid"/>
> Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
> Oct  2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the list of configured nodes
> Oct  2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating hash entry for pingd:connected:id
> Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
> Oct  2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
> Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd:connected:id
> Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush message from sydgw1.zomojo.com
> Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
> 
> 
> What have I missed?
> 
> Thanks for your help
> 
> Phil.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
