[Linux-HA] Resources not starting on other node

justin.kinney at academy.com
Thu Oct 25 15:46:15 MDT 2007


Hello all,

Like many others, I've read and re-read the website and searched the
mailing list for the past week and a half, and I still can't get the
cluster to behave the way I want.

I'm working with a two-node cluster whose configuration details are below.
To produce the attached logs, I performed the following:
1. started heartbeat on both nodes
2. started all resources
3. unplugged the 100-subnet network cable (eth0) on node1
4. waited exactly 5 minutes
5. plugged the 100-subnet network cable back into node1

First, all of my resources are in the "EnterpriseSprayer" group, and are
ordered and collocated.  The startup order is always correct and the
resources are always started on the same node.  Using the GUI, I can
manually put the first node into standby and all of the resources are
transitioned perfectly.  The problem is that if I simulate a network failure
on the 100 subnet (by unplugging the cable), the resources never transition.

My desired behavior is:

1. Only start the resources on a node where the gateway is reachable.
2. Keep monitoring the gateway and transition the resources if the gateway 
becomes unreachable.
3. If any of the resources go down, restart them as necessary.
4. I don't care where the resources run, as long as they are running.  (I 
don't need them to stick to one node or the other)
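For point 4, a minimal sketch of what the currently empty crm_config
section of the cib.xml could contain. It assumes the Heartbeat 2.x cluster
option default_resource_stickiness (some releases may spell it with dashes
instead of underscores) and uses an nvpair id invented purely for
illustration. A value of 0 is meant to say that resources have no
preference for staying on the node where they currently run. This is
untested and not part of the configuration that produced the logs:

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- 0 = no preference to stay on the current node (hypothetical id) -->
      <nvpair id="option-default-stickiness"
              name="default_resource_stickiness" value="0"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```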

Thanks,
Justin

Network details:
The 100 subnet (10.1.100.x) is a DMZ VLAN
The 66 subnet (10.1.66.x) is a heartbeat VLAN

machine1:
sles10 sp1 x86_64
eth0 - 10.1.100.177
eth1 - 10.1.66.177

machine2:
sles10 sp1 x86_64
eth0 - 10.1.100.178
eth1 - 10.1.66.178

heartbeat 2.1.2
resources:
3 virtual aliases (ocf)
pound (lsb)
apache2 (ocf)

ha.cf:
deadtime 5
deadping 5
initdead 60 
warntime 5 
autojoin any 
crm true
udpport 3636
ucast eth0 10.1.100.177         # 10.1.100.178 on the second node
ucast eth1 10.1.66.177      # 10.1.66.178 on the second node
respawn root /usr/lib64/heartbeat/mgmtd         # Enable GUI management tool
ping 10.1.100.1

cib.xml:
<cib generated="true" admin_epoch="0" have_quorum="true" 
ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" 
num_updates="1" epoch="44" cib-last-written="Wed Oct 24 16:43:25 2007" 
ccm_transition="2" dc_uuid="dd9f8237-50c4-482e-9b98-924c0b878a04">
   <configuration>
     <crm_config/>
     <nodes>
       <node uname="plspgen01" type="normal" 
id="dd9f8237-50c4-482e-9b98-924c0b878a04">
       </node>
       <node uname="plspgen02" type="normal" 
id="d3d4fb5d-8cf0-4c60-90cd-b5ec9b24c980">
       </node>
     </nodes>

     <resources>
       <group ordered="true" collocated="true" id="EnterpriseSprayer">
         <primitive id="ip_10-1-100-180" class="ocf" type="IPaddr" 
provider="heartbeat" is_managed="true" description="HA Address 
10.1.100.180">
           <instance_attributes id="ip_10-1-100-180_instance_attributes">
             <attributes>
               <nvpair id="9593fec4-fc97-4e7d-a74d-2ffd06a7be5e" name="ip" 
value="10.1.100.180"/>
               <nvpair id="ip_10-1-100-180_target_role" name="target_role" 
value="started"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive id="ip_10-1-100-181" class="ocf" type="IPaddr" 
provider="heartbeat" is_managed="true" description="HA Address 
10.1.100.181">
           <instance_attributes id="ip_10-1-100-181_instance_attributes">
             <attributes>
               <nvpair id="4f3c49f9-dcd9-4f46-81b8-eb13cb38f8d4" name="ip" 
value="10.1.100.181"/>
               <nvpair id="ip_10-1-100-181_target_role" name="target_role" 
value="started"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive id="ip_10-1-100-182" class="ocf" type="IPaddr" 
provider="heartbeat" is_managed="true" description="HA Address 
10.1.100.182">
           <instance_attributes id="ip_10-1-100-182_instance_attributes">
             <attributes>
               <nvpair id="b474dc00-41b7-4982-8eda-4a20105dd706" name="ip" 
value="10.1.100.182"/>
               <nvpair id="ip_10-1-100-182_target_role" name="target_role" 
value="started"/>
             </attributes>
           </instance_attributes>
         </primitive>

         <primitive id="apache_process" class="lsb" type="apache2" 
provider="heartbeat" description="Apache process running on the HA 
addresses">
           <instance_attributes id="apache_process_instance_attrs">
             <attributes>
               <nvpair id="apache_process_target_role" name="target_role" 
value="started"/>
             </attributes>
           </instance_attributes>
           <operations>
              <op id="b1" name="stop" timeout="3s"/>
              <op id="b2" name="start" timeout="5s"/>
              <op id="b3" name="monitor" interval="10s" timeout="3s"/>
           </operations>
         </primitive>

         <primitive class="lsb" type="pound2" provider="heartbeat" 
description="Pound process running on the HA addresses" 
id="pound_process">
           <instance_attributes id="pound_process_instance_attrs">
             <attributes>
               <nvpair name="target_role" id="pound_process_target_role" 
value="started"/>
             </attributes>
           </instance_attributes>
         </primitive>

         <instance_attributes id="EnterpriseSprayer_attributes">
           <attributes>
             <nvpair id="EnterpriseSprayer_target_role" name="target_role" 
value="started"/>
           </attributes>
         </instance_attributes>
       </group>

       <clone id="pingd">
          <instance_attributes id="pingd">
             <attributes>
                <nvpair id="pingd-clone_max" name="clone_max" value="2"/>
                <nvpair id="pingd-clone_node_max" name="clone_node_max" 
value="1"/>
             </attributes>
          </instance_attributes>
          <primitive id="gateway" class="ocf" type="pingd" 
provider="heartbeat">
             <operations>
                <op id="gateway:child-monitor" name="monitor" 
interval="20s" timeout="40s" prereq="nothing"/>
                <op id="gateway:child-start" name="start" 
prereq="nothing"/>
             </operations>
             <instance_attributes id="pingd_inst_attrs">
                <attributes>
                   <nvpair id="pingd-dampen" name="dampen" value="5s"/>
                   <nvpair id="pingd-multiplier" name="multiplier" 
value="100"/>
                </attributes>
             </instance_attributes>
          </primitive>
        </clone>
     </resources>

     <constraints>
       <rsc_colocation id="colocation_EnterpriseSprayer" 
from="EnterpriseSprayer" to="EnterpriseSprayer" score="INFINITY"/>
       <rsc_location id="gateway:connected" rsc="EnterpriseSprayer">
         <rule id="gateway:connected:rule" score="-INFINITY" 
boolean_op="or">
           <expression id="gateway:connected:expr:undefined" 
attribute="pingd" operation="not_defined"/>
           <expression id="gateway:connected:expr:zero" attribute="pingd" 
operation="lte" value="0"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
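
In the CIB above, only apache_process and the pingd clone define a monitor
operation, so as far as I can tell the cluster has no way to notice a
failed IP alias or pound process (point 3 of the desired behavior). A
minimal sketch of adding a recurring monitor to one of the IPaddr
primitives follows. The op id, interval and timeout are assumptions chosen
for illustration and are untested; the other attributes are copied from
the existing primitive:

```xml
<primitive id="ip_10-1-100-180" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
    <!-- hypothetical id and timings: recurring health check for the alias -->
    <op id="ip_10-1-100-180_monitor" name="monitor"
        interval="10s" timeout="20s"/>
  </operations>
  <instance_attributes id="ip_10-1-100-180_instance_attributes">
    <attributes>
      <nvpair id="9593fec4-fc97-4e7d-a74d-2ffd06a7be5e" name="ip"
              value="10.1.100.180"/>
    </attributes>
  </instance_attributes>
</primitive>
```

The same pattern would presumably apply to the other two IPaddr primitives
and to pound_process, each with its own unique op id.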
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1_log.out
Type: application/octet-stream
Size: 14919 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20071025/790a9f7c/node1_log-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2_log.out
Type: application/octet-stream
Size: 18170 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20071025/790a9f7c/node2_log-0001.obj

