[Linux-HA] Resources not starting on other node
justin.kinney at academy.com
Thu Oct 25 15:46:15 MDT 2007
Hello all,
Like many others I've read and re-read the webpage and searched the
mailing list for the past week and a half, and I'm still not getting where
I want to be.
I'm working with a two-node cluster whose configuration details are below.
To produce the logs, I performed the following:
1. started heartbeat on both nodes
2. started all resources
3. unplugged the network cable for the 100 subnet on node1
4. waited exactly 5 minutes
5. plugged the 100 subnet cable back into node1
First, all of my resources are in the "EnterpriseSprayer" group, which is
ordered and collocated. The startup order is always correct, and the
resources are always started on the same node. Using the GUI, I can
manually put the first node into standby, and all of the resources get
transitioned perfectly. The problem is that if I simulate a network failure
on the 100 subnet (by unplugging the cable), the resources never transition.
My desired behavior is:
1. Only start the resources on a node where the gateway is reachable.
2. Keep monitoring the gateway and transition the resources if the gateway
becomes unreachable.
3. If any of the resources go down, restart them as necessary.
4. I don't care where the resources run, as long as they are running (I
don't need them to stick to one node or the other).
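For reference, goals 1 and 2 map onto a location rule on the "pingd" node attribute. The cib.xml below uses the "-INFINITY when pingd is undefined or zero" form. The Linux-HA pingd notes also describe a form that takes the attribute's value as the score, so the node with better connectivity is preferred. A sketch of that second form, assuming the same "pingd" attribute name; the "prefer_connected" ids are hypothetical:

```xml
<!-- Sketch only: ids are hypothetical. The rule's score is read from the
     "pingd" node attribute, so a node that reaches more ping nodes
     gets a higher score for the whole group. -->
<rsc_location id="EnterpriseSprayer:prefer_connected" rsc="EnterpriseSprayer">
  <rule id="EnterpriseSprayer:prefer_connected:rule" score_attribute="pingd">
    <expression id="EnterpriseSprayer:prefer_connected:expr:defined"
                attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
```

With multiplier set to 100 and a single ping node (10.1.100.1), a node that can reach the gateway would get a score of 100, while a node that cannot would get 0 or no pingd attribute at all. This expresses a preference rather than the hard ban in the existing rule.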
Thanks,
Justin
Network details:
100 subnet is a DMZ VLAN
66 subnet is a heartbeat VLAN
machine1:
sles10 sp1 x86_64
eth0 - 10.1.100.177
eth1 - 10.1.66.177
machine2:
sles10 sp1 x86_64
eth0 - 10.1.100.178
eth1 - 10.1.66.178
heartbeat 2.1.2
resources:
3 virtual aliases (ocf)
pound (lsb)
apache2 (lsb)
ha.cf:
deadtime 5
deadping 5
initdead 60
warntime 5
autojoin any
crm true
udpport 3636
ucast eth0 10.1.100.177 # 10.1.100.178 on the second node
ucast eth1 10.1.66.177 # 10.1.66.178 on the second node
respawn root /usr/lib64/heartbeat/mgmtd # Enable GUI management tool
ping 10.1.100.1
cib.xml:
<cib generated="true" admin_epoch="0" have_quorum="true"
ignore_dtd="false" num_peers="2" cib_feature_revision="1.3"
num_updates="1" epoch="44" cib-last-written="Wed Oct 24 16:43:25 2007"
ccm_transition="2" dc_uuid="dd9f8237-50c4-482e-9b98-924c0b878a04">
<configuration>
<crm_config/>
<nodes>
<node uname="plspgen01" type="normal"
id="dd9f8237-50c4-482e-9b98-924c0b878a04">
</node>
<node uname="plspgen02" type="normal"
id="d3d4fb5d-8cf0-4c60-90cd-b5ec9b24c980">
</node>
</nodes>
<resources>
<group ordered="true" collocated="true" id="EnterpriseSprayer">
<primitive id="ip_10-1-100-180" class="ocf" type="IPaddr"
provider="heartbeat" is_managed="true" description="HA Address 10.1.100.180">
<instance_attributes id="ip_10-1-100-180_instance_attributes">
<attributes>
<nvpair id="9593fec4-fc97-4e7d-a74d-2ffd06a7be5e" name="ip"
value="10.1.100.180"/>
<nvpair id="ip_10-1-100-180_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="ip_10-1-100-181" class="ocf" type="IPaddr"
provider="heartbeat" is_managed="true" description="HA Address 10.1.100.181">
<instance_attributes id="ip_10-1-100-181_instance_attributes">
<attributes>
<nvpair id="4f3c49f9-dcd9-4f46-81b8-eb13cb38f8d4" name="ip"
value="10.1.100.181"/>
<nvpair id="ip_10-1-100-181_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="ip_10-1-100-182" class="ocf" type="IPaddr"
provider="heartbeat" is_managed="true" description="HA Address 10.1.100.182">
<instance_attributes id="ip_10-1-100-182_instance_attributes">
<attributes>
<nvpair id="b474dc00-41b7-4982-8eda-4a20105dd706" name="ip"
value="10.1.100.182"/>
<nvpair id="ip_10-1-100-182_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="apache_process" class="lsb" type="apache2"
provider="heartbeat" description="Apache process running on the HA addresses">
<instance_attributes id="apache_process_instance_attrs">
<attributes>
<nvpair id="apache_process_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
<operations>
<op id="b1" name="stop" timeout="3s"/>
<op id="b2" name="start" timeout="5s"/>
<op id="b3" name="monitor" interval="10s" timeout="3s"/>
</operations>
</primitive>
<primitive class="lsb" type="pound2" provider="heartbeat"
description="Pound process running on the HA addresses"
id="pound_process">
<instance_attributes id="pound_process_instance_attrs">
<attributes>
<nvpair name="target_role" id="pound_process_target_role"
value="started"/>
</attributes>
</instance_attributes>
</primitive>
<instance_attributes id="EnterpriseSprayer_attributes">
<attributes>
<nvpair id="EnterpriseSprayer_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
</group>
<clone id="pingd">
<instance_attributes id="pingd">
<attributes>
<nvpair id="pingd-clone_max" name="clone_max" value="2"/>
<nvpair id="pingd-clone_node_max" name="clone_node_max"
value="1"/>
</attributes>
</instance_attributes>
<primitive id="gateway" class="ocf" type="pingd"
provider="heartbeat">
<operations>
<op id="gateway:child-monitor" name="monitor"
interval="20s" timeout="40s" prereq="nothing"/>
<op id="gateway:child-start" name="start"
prereq="nothing"/>
</operations>
<instance_attributes id="pingd_inst_attrs">
<attributes>
<nvpair id="pingd-dampen" name="dampen" value="5s"/>
<nvpair id="pingd-multiplier" name="multiplier"
value="100"/>
</attributes>
</instance_attributes>
</primitive>
</clone>
</resources>
<constraints>
<rsc_colocation id="colocation_EnterpriseSprayer"
from="EnterpriseSprayer" to="EnterpriseSprayer" score="INFINITY"/>
<rsc_location id="gateway:connected" rsc="EnterpriseSprayer">
<rule id="gateway:connected:rule" score="-INFINITY"
boolean_op="or">
<expression id="gateway:connected:expr:undefined"
attribute="pingd" operation="not_defined"/>
<expression id="gateway:connected:expr:zero" attribute="pingd"
operation="lte" value="0"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
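In the configuration above, only apache_process defines a monitor operation; the three IPaddr primitives and pound_process have none, so a failure of those resources would not be caught by periodic checks (goal 3). A sketch of an operations block that could go inside the ip_10-1-100-180 primitive, modeled on the apache one. The op id and the interval/timeout values are assumptions, not tested settings:

```xml
<!-- Sketch only: place inside <primitive id="ip_10-1-100-180" ...>,
     next to its <instance_attributes>. The op id and the
     interval/timeout values are hypothetical. -->
<operations>
  <op id="ip_10-1-100-180_mon" name="monitor" interval="10s" timeout="20s"/>
</operations>
```

The same pattern, with unique op ids, would apply to the other two IPaddr primitives and to pound_process.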
-------------- next part --------------
Attached logs (scrubbed from the archive; available at the URLs below):
node1_log.out (application/octet-stream, 14919 bytes)
http://lists.community.tummy.com/pipermail/linux-ha/attachments/20071025/790a9f7c/node1_log-0001.obj
node2_log.out (application/octet-stream, 18170 bytes)
http://lists.community.tummy.com/pipermail/linux-ha/attachments/20071025/790a9f7c/node2_log-0001.obj