[Linux-HA] Problems with resources failing over and other little
problems.
Serge Dubrouski
sergeyfd at gmail.com
Fri Sep 1 19:09:54 MDT 2006
> First, my setup.
> I have 2 servers, both running HA 2.0.6 on RHE4u3 on an i386 (Intel).
> I'm using an ocf version of an ldap script that I hacked together to
> conform with http://www.linux-ha.org/OCFResourceAgent
>
> The thing that I really need working is if my resource fails, to fail
> over to the other node, which currently does not happen. Heartbeat
> knows when my resource goes down and is able to restart it just fine
> on the node which is good. However if Heartbeat fails to restart my
> resource (it fails to start because I broke the config file) it just
> sits there. Both nodes do nothing.
>
> Here is my cib.xml file which I hope will be helpful. I am pretty new
> to heartbeat so I don't know all the fancy stuff yet. This was pretty
> much pieced together from the haresources2cib.py and the getting
> started guides.
>
> =============================================
> <cib generated="true" admin_epoch="0" have_quorum="true"
> num_peers="1" ccm_transition="1" cib_feature_revision="1.2"
> crm_feature_set="1.0.4" debug_source="create_node_entry"
> dc_uuid="3afea6dc-2c7b-480b-85f4-44e080dffff2" last_written="Fri Sep
> 1 14:38:09 2006" epoch="63" num_updates="766">
> <configuration>
> <crm_config>
> <nvpair id="default_resource_stickiness"
> name="default_resource_stickiness" value="100"/>
> <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
> <nvpair id="transition_idle_timeout"
> name="transition_idle_timeout" value="20s"/>
> <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
> <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
> <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
> </crm_config>
> <nodes>
> <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1"
> type="normal"/>
> <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2"
> type="normal"/>
> <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2"
> type="normal"/>
> </nodes>
> <resources>
> <primitive id="ip_resource_1" class="ocf" type="IPaddr"
> provider="heartbeat" resource_stickiness="0">
> <instance_attributes>
> <attributes>
> <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive id="ldap" class="ocf" type="ldap" provider="ldap"
> resource_stickiness="0" multiple_active="block">
> <instance_attributes id="resource_ldap_instance_attrs">
> <attributes>
> <nvpair id="resource_apache_target_role"
> name="target_role" value="started"/>
> </attributes>
> </instance_attributes>
> <operations>
> <op id="1" name="stop" timeout="8s"/>
> <op id="2" name="start" timeout="9s" on_fail="fence"/>
> <op id="3" name="monitor" interval="15s" timeout="4s"/>
> </operations>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="run_ldap" rsc="ldap">
> <rule id="pref_run_ldap_on_ldap1" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
> </rule>
> </rsc_location>
> <rsc_location id="run_ldap2" rsc="ldap">
> <rule id="pref_run_ldap_on_ldap2" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
> </rule>
> </rsc_location>
> <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
> <rule id="pref_run_on_ldap1" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
> </rule>
> <rule id="pref_run_on_ldap2" score="0">
> <expression attribute="#uname" operation="eq"
> value="ldap-2" id="578d2e59-3508-4c40-b932-9a5ba9117265"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> </cib>
> ======================================
>
> I don't know why there are 2 nodes for ldap2, those get generated automatically.
> Also, I have to start one server, wait for it to get all the resources
> up, then start the other. If I start them both at the same time they
> both kinda think the other is in charge and don't do anything.
>
> Now, what I'm aiming for here is to have the IP be failed over if ldap
> on ldap1 fails (or ldap1 fails in general) but have ldap always
> running on both servers. Right now when I start the second server it
> doesn't start ldap like I think it should.
>
> Here is the log for ldap1, from starting up by itself to ldap2 coming online.
> http://isthesuck.com/ha-log
> And here is the log for ldap2 coming online while ldap1 is already up
> http://isthesuck.com/ha-log2
> And here is the log for when I stop ldap, it gets restarted, then I
> make some bad changes to the ldap conf, stop ldap and heartbeat fails
> to either restart, or fail it over.
> http://isthesuck.com/ha-log3
>
> I hope that is enough information. I've been muddling through this
> problem for about a week now and getting no luck with this part :( and
> any help would really be appreciated.
>
> I have been all over the http://www.linux-ha.org website but I just
> can't seem to figure out how to get heartbeat to act like I need it
> to.
>
> -Thanks
First, it would be good to take a look at your ha.cf file to figure out
why you have ldap-2 twice in the nodes section. I think the problem is
there.
Second. Set "symmetric_cluster" to true. You want both nodes to be
able to run your resources, right?
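
A minimal sketch of that change, reusing the nvpair from the posted
crm_config section; only the value differs from the original:

```xml
<!-- Allow resources to run on any node unless a constraint says otherwise -->
<nvpair id="symmetric_cluster" name="symmetric_cluster" value="true"/>
```
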
Third. Set different rsc_location scores for the two nodes. The node
with the higher score will be the primary node.
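
As a rough sketch, the two ldap location constraints from the posted
file could look like this. The score of 50 for ldap-2 is an arbitrary
illustrative value, not a recommendation; the ids are the ones already
in the posted cib.xml:

```xml
<!-- ldap-1 keeps score 100 and so is preferred -->
<rsc_location id="run_ldap" rsc="ldap">
  <rule id="pref_run_ldap_on_ldap1" score="100">
    <expression attribute="#uname" operation="eq"
                value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
  </rule>
</rsc_location>
<!-- ldap-2 gets a lower score (50 is an assumed value) and acts as fallback -->
<rsc_location id="run_ldap2" rsc="ldap">
  <rule id="pref_run_ldap_on_ldap2" score="50">
    <expression attribute="#uname" operation="eq"
                value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
  </rule>
</rsc_location>
```
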
Fourth. I would combine your IP and LDAP resources into a group. It
doesn't make sense to fail LDAP over without failing IP. In your
current configuration, if LDAP fails but IP doesn't, only LDAP will be
failed over.
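
A sketch of what such a group might look like, built from the two
primitives in the posted file with some attributes left out for
brevity. The group id "ldap_group" is a made-up name for illustration,
and the location constraints would presumably then need to reference
the group id instead of the individual resources:

```xml
<!-- "ldap_group" is a hypothetical id. Group members are started in the
     order listed and are kept together on the same node. -->
<group id="ldap_group">
  <primitive id="ip_resource_1" class="ocf" type="IPaddr" provider="heartbeat">
    <instance_attributes>
      <attributes>
        <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="ldap" class="ocf" type="ldap" provider="ldap">
    <operations>
      <op id="1" name="stop" timeout="8s"/>
      <op id="2" name="start" timeout="9s" on_fail="fence"/>
      <op id="3" name="monitor" interval="15s" timeout="4s"/>
    </operations>
  </primitive>
</group>
```
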
Fifth. It looks like you first set up Apache as a resource in your
cluster and then replaced it with LDAP (the target_role nvpair id still
says "resource_apache_target_role"). Your config looks kind of
strange. I would start over from scratch :-)
Best wishes.