[Linux-HA] heartbeat 2.0.8: pingd failover

greno at verizon.net greno at verizon.net
Sat Feb 3 23:44:08 MST 2007


>From: Andreas Kurz <andreas.kurz at gmail.com>
>Date: 2007/02/03 Sat PM 04:53:10 CST
>To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
>Subject: Re: [Linux-HA] heartbeat 2.0.8: pingd failover

>Try to add a monitor operation to the ip_resource to check it regularly.
>
>Is the "pingd_score" completely removed from the cib, in the case of
>an unplugged interface, or is the value of the disconnected node zero?
>... This would explain why the constraint is not working correctly ...
>then a constraint depending on the value of pingd_score would be more
>helpful.
>
>Regards,
>Andreas
>

OK, I changed things around to try this with a resource group, since that is what I really need to end up with.
Same ha.cf as before, with a modified cib.xml:

==================================
 <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="17" num_updates="213" cib-last-written="Sun Feb  4 01:00:05 2007" ccm_transition="4" dc_uuid="29626f17-db1f-4139-aa33-5a6b4110da51">
   <configuration>
     <crm_config/>
     <nodes>
       <node id="29626f17-db1f-4139-aa33-5a6b4110da51" uname="server2" type="normal"/>
       <node id="67b0bfa7-0165-4a8c-9c0f-ec82e0ae2c91" uname="server1" type="normal"/>
     </nodes>
     <resources>
       <group id="GRP_webserver_RG">
         <primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
           <instance_attributes id="GRP_webserver_RA">
             <attributes>
               <nvpair id="GRP_webserverip_RA_ip" name="ip" value="192.168.1.215"/>
             </attributes>
           </instance_attributes>
         </primitive>
       </group>
     </resources>
     <constraints>
       <rsc_location id="run_GRP_webserver_RG" rsc="GRP_webserver_RG">
         <rule id="pref_run_GRP_webserver_RG" score="100">
           <expression id="pref_run_GRP_webserver_RG_expr1" attribute="#uname" operation="eq" value="server1"/>
         </rule>
       </rsc_location>
       <rsc_location id="GRP_webserver_RG:not_connected" rsc="GRP_webserver_RG">
         <rule id="GRP_webserver_RG:not_connected:rule" score="-INFINITY">
           <expression id="GRP_webserver_RG:not_connected:expr" attribute="pingd_score" operation="not_defined"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>

==================================
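
For reference, Andreas's two suggestions could be sketched in the CIB roughly as follows. This is a minimal sketch, not a tested configuration: the new id values, the 10s interval and the 20s timeout are placeholders chosen for illustration, and it assumes pingd really is publishing its result under the attribute name "pingd_score", as the constraint above expects.

```xml
<!-- Suggestion 1: a recurring monitor operation on the IP resource.
     The op id, interval and timeout are illustrative placeholders. -->
<primitive id="GRP_webserverip_R" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
    <op id="GRP_webserverip_R_monitor" name="monitor" interval="10s" timeout="20s"/>
  </operations>
  <instance_attributes id="GRP_webserver_RA">
    <attributes>
      <nvpair id="GRP_webserverip_RA_ip" name="ip" value="192.168.1.215"/>
    </attributes>
  </instance_attributes>
</primitive>

<!-- Suggestion 2: keep the group off a node when pingd_score is either
     missing or present with a value of zero or less.
     The expression ids ending in :undefined and :zero are placeholders. -->
<rsc_location id="GRP_webserver_RG:not_connected" rsc="GRP_webserver_RG">
  <rule id="GRP_webserver_RG:not_connected:rule" score="-INFINITY" boolean_op="or">
    <expression id="GRP_webserver_RG:not_connected:expr:undefined" attribute="pingd_score" operation="not_defined"/>
    <expression id="GRP_webserver_RG:not_connected:expr:zero" attribute="pingd_score" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

The second rule covers both cases Andreas asks about: the attribute being removed from the CIB entirely, and the attribute staying in place with a value of zero.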

What I get now is that the resource fails over from server1 to server2.  Then, when connectivity is restored to server1, the log on server2 shows "Link server1:eth0 up." and "Started server2", but the resource actually started on server1, which I confirmed by logging into the webserver virtual IP.  So I got the failover, but with auto_failback set to no I expected the IP to remain on server2.  The log told me the resource was started on server2, but it is actually on server1.

So now, with the IP on server1, I pulled the network cable.  The log on server2 shows that server1:eth0 is dead and says Starting server1.  And if I try to log in to the IP from server1 itself (which is disconnected at this point), it is on server1.  This makes no sense.
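
One thing that may be relevant to the unexpected failback: as far as I understand it, when the CRM is enabled the auto_failback directive in ha.cf is not what decides this, and failback is instead governed by resource stickiness weighed against the location scores. With the empty crm_config above, the score of 100 for server1 would be enough to pull the group back once server1 is eligible again. A minimal sketch of setting a cluster-wide stickiness, assuming the underscore spelling of the option used by this heartbeat release (the set id and nvpair id are placeholders):

```xml
<!-- Replaces the empty <crm_config/> element.
     INFINITY means "stay where you are unless the node becomes ineligible";
     the -INFINITY pingd rule still forces a move off a disconnected node. -->
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="cib-bootstrap-options-default_resource_stickiness" name="default_resource_stickiness" value="INFINITY"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```

Any stickiness value above 100 should outweigh the server1 preference in the constraints section; INFINITY is used here only as the simplest illustration.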
