[Linux-HA] Problems with resources failing over and other little
problems.
Chris Gallo
chrisagallo at gmail.com
Fri Sep 1 16:18:13 MDT 2006
First, my setup.
I have 2 servers, both running Heartbeat (HA) 2.0.6 on RHEL4u3 on i386 (Intel).
I'm using an OCF version of an ldap script that I hacked together to
conform with http://www.linux-ha.org/OCFResourceAgent
What I really need is for my resource to fail over to the other node
when it fails, which currently does not happen. Heartbeat notices when
my resource goes down and restarts it just fine on the same node, which
is good. However, if Heartbeat fails to restart my resource (for
example, it fails to start because I broke the config file), it just
sits there. Neither node does anything.
Here is my cib.xml file, which I hope will be helpful. I am pretty new
to Heartbeat, so I don't know all the fancy stuff yet. This was pretty
much pieced together from the output of haresources2cib.py and the
getting started guides.
=============================================
<cib generated="true" admin_epoch="0" have_quorum="true"
     num_peers="1" ccm_transition="1" cib_feature_revision="1.2"
     crm_feature_set="1.0.4" debug_source="create_node_entry"
     dc_uuid="3afea6dc-2c7b-480b-85f4-44e080dffff2"
     last_written="Fri Sep 1 14:38:09 2006" epoch="63" num_updates="766">
  <configuration>
    <crm_config>
      <nvpair id="default_resource_stickiness"
              name="default_resource_stickiness" value="100"/>
      <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
      <nvpair id="transition_idle_timeout"
              name="transition_idle_timeout" value="20s"/>
      <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
      <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
      <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
    </crm_config>
    <nodes>
      <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1"
            type="normal"/>
      <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2"
            type="normal"/>
      <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2"
            type="normal"/>
    </nodes>
    <resources>
      <primitive id="ip_resource_1" class="ocf" type="IPaddr"
                 provider="heartbeat" resource_stickiness="0">
        <instance_attributes>
          <attributes>
            <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
          </attributes>
        </instance_attributes>
      </primitive>
      <primitive id="ldap" class="ocf" type="ldap" provider="ldap"
                 resource_stickiness="0" multiple_active="block">
        <instance_attributes id="resource_ldap_instance_attrs">
          <attributes>
            <nvpair id="resource_apache_target_role"
                    name="target_role" value="started"/>
          </attributes>
        </instance_attributes>
        <operations>
          <op id="1" name="stop" timeout="8s"/>
          <op id="2" name="start" timeout="9s" on_fail="fence"/>
          <op id="3" name="monitor" interval="15s" timeout="4s"/>
        </operations>
      </primitive>
    </resources>
    <constraints>
      <rsc_location id="run_ldap" rsc="ldap">
        <rule id="pref_run_ldap_on_ldap1" score="100">
          <expression attribute="#uname" operation="eq"
                      value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
        </rule>
      </rsc_location>
      <rsc_location id="run_ldap2" rsc="ldap">
        <rule id="pref_run_ldap_on_ldap2" score="100">
          <expression attribute="#uname" operation="eq"
                      value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
        </rule>
      </rsc_location>
      <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
        <rule id="pref_run_on_ldap1" score="100">
          <expression attribute="#uname" operation="eq"
                      value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
        </rule>
        <rule id="pref_run_on_ldap2" score="0">
          <expression attribute="#uname" operation="eq"
                      value="ldap-2" id="578d2e59-3508-4c40-b932-9a5ba9117265"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>
======================================
I don't know why there are 2 node entries for ldap-2; those get generated
automatically.
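A quick way to confirm duplicate node entries like the two ldap-2 ones is to group the node ids by uname. This is only a minimal sketch: the function name duplicate_unames is made up for illustration, the XML is the nodes section copied from the cib.xml above, and it only reports the duplicates without explaining where they come from.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The <nodes> section copied from the cib.xml above.
NODES_XML = """
<nodes>
  <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1" type="normal"/>
  <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2" type="normal"/>
  <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2" type="normal"/>
</nodes>
"""


def duplicate_unames(xml_text):
    """Return {uname: [node ids]} for every uname that has more than one node id.

    Hypothetical helper, not part of Heartbeat.
    """
    ids_by_uname = defaultdict(list)
    for node in ET.fromstring(xml_text).iter("node"):
        ids_by_uname[node.get("uname")].append(node.get("id"))
    return {uname: ids for uname, ids in ids_by_uname.items() if len(ids) > 1}


if __name__ == "__main__":
    for uname, ids in duplicate_unames(NODES_XML).items():
        print(uname, "is listed under", len(ids), "ids:", ", ".join(ids))
```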
Also, I have to start one server, wait for it to bring all the resources
up, then start the other. If I start them both at the same time, each
seems to think the other is in charge and neither does anything.
Now, what I'm aiming for here is to have the IP fail over if ldap
on ldap-1 fails (or ldap-1 fails in general), but to have ldap always
running on both servers. Right now, when I start the second server, it
doesn't start ldap as I expect it to.
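To compare that goal against what the constraints section actually says, it can help to flatten the rsc_location rules into resource/node/score rows. This is a minimal sketch under a few assumptions: location_scores is a made-up helper name, the XML is the constraints section from the cib.xml above with the expression ids left out for brevity, and it only handles numeric scores on "#uname eq" expressions. It reads the configuration only and says nothing about how Heartbeat weighs these scores.

```python
import xml.etree.ElementTree as ET

# The <constraints> section from the cib.xml above (expression ids omitted).
CONSTRAINTS_XML = """
<constraints>
  <rsc_location id="run_ldap" rsc="ldap">
    <rule id="pref_run_ldap_on_ldap1" score="100">
      <expression attribute="#uname" operation="eq" value="ldap-1"/>
    </rule>
  </rsc_location>
  <rsc_location id="run_ldap2" rsc="ldap">
    <rule id="pref_run_ldap_on_ldap2" score="100">
      <expression attribute="#uname" operation="eq" value="ldap-2"/>
    </rule>
  </rsc_location>
  <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
    <rule id="pref_run_on_ldap1" score="100">
      <expression attribute="#uname" operation="eq" value="ldap-1"/>
    </rule>
    <rule id="pref_run_on_ldap2" score="0">
      <expression attribute="#uname" operation="eq" value="ldap-2"/>
    </rule>
  </rsc_location>
</constraints>
"""


def location_scores(xml_text):
    """Return {(resource, node uname): score} for "#uname eq" rules.

    Hypothetical helper; assumes every score is a plain integer.
    """
    scores = {}
    for location in ET.fromstring(xml_text).iter("rsc_location"):
        for rule in location.iter("rule"):
            for expr in rule.iter("expression"):
                if (expr.get("attribute") == "#uname"
                        and expr.get("operation") == "eq"):
                    key = (location.get("rsc"), expr.get("value"))
                    scores[key] = int(rule.get("score"))
    return scores


if __name__ == "__main__":
    for (resource, node), score in sorted(location_scores(CONSTRAINTS_XML).items()):
        print(resource, node, score)
```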
Here is the log for ldap-1, from starting up by itself to ldap-2 coming online.
http://isthesuck.com/ha-log
And here is the log for ldap-2 coming online while ldap-1 is already up.
http://isthesuck.com/ha-log2
And here is the log from when I stop ldap and it gets restarted, then I
make some bad changes to the ldap conf and stop ldap again, and Heartbeat
fails to either restart it or fail it over.
http://isthesuck.com/ha-log3
I hope that is enough information. I've been muddling through this
problem for about a week now and have had no luck with this part :(
Any help would really be appreciated.
I have been all over the http://www.linux-ha.org website but I just
can't seem to figure out how to get heartbeat to act like I need it
to.
-Thanks