[Linux-HA] Re:Re:Problems with resources failing over and other
little problems.
Chris Gallo
chrisagallo at gmail.com
Sat Sep 2 18:28:43 MDT 2006
> First, my setup.
> I have 2 servers, both running HA 2.0.6 on RHEL4u3 on i386 (Intel).
> I'm using an ocf version of an ldap script that I hacked together to
> conform with http://www.linux-ha.org/OCFResourceAgent
>
> The thing that I really need working is if my resource fails, to fail
> over to the other node, which currently does not happen. Heartbeat
> knows when my resource goes down and is able to restart it just fine
> on the node which is good. However if Heartbeat fails to restart my
> resource (it fails to start because I broke the config file) it just
> sits there. Both nodes do nothing.
>
> Here is my cib.xml file which I hope will be helpful. I am pretty new
> to heartbeat so I don't know all the fancy stuff yet. This was pretty
> much pieced together from the haresources2cib.py and the getting
> started guides.
>
> =============================================
> <cib generated="true" admin_epoch="0" have_quorum="true"
> num_peers="1" ccm_transition="1" cib_feature_revision="1.2"
> crm_feature_set="1.0.4" debug_source="create_node_entry"
> dc_uuid="3afea6dc-2c7b-480b-85f4-44e080dffff2" last_written="Fri Sep
> 1 14:38:09 2006" epoch="63" num_updates="766">
> <configuration>
> <crm_config>
> <nvpair id="default_resource_stickiness"
> name="default_resource_stickiness" value="100"/>
> <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
> <nvpair id="transition_idle_timeout"
> name="transition_idle_timeout" value="20s"/>
> <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
> <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
> <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
> </crm_config>
> <nodes>
> <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1"
> type="normal"/>
> <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2"
> type="normal"/>
> <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2"
> type="normal"/>
> </nodes>
> <resources>
> <primitive id="ip_resource_1" class="ocf" type="IPaddr"
> provider="heartbeat" resource_stickiness="0">
> <instance_attributes>
> <attributes>
> <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive id="ldap" class="ocf" type="ldap" provider="ldap"
> resource_stickiness="0" multiple_active="block">
> <instance_attributes id="resource_ldap_instance_attrs">
> <attributes>
> <nvpair id="resource_apache_target_role"
> name="target_role" value="started"/>
> </attributes>
> </instance_attributes>
> <operations>
> <op id="1" name="stop" timeout="8s"/>
> <op id="2" name="start" timeout="9s" on_fail="fence"/>
> <op id="3" name="monitor" interval="15s" timeout="4s"/>
> </operations>
> </primitive>
> </resources>
> <constraints>
> <rsc_location id="run_ldap" rsc="ldap">
> <rule id="pref_run_ldap_on_ldap1" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
> </rule>
> </rsc_location>
> <rsc_location id="run_ldap2" rsc="ldap">
> <rule id="pref_run_ldap_on_ldap2" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
> </rule>
> </rsc_location>
> <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
> <rule id="pref_run_on_ldap1" score="100">
> <expression attribute="#uname" operation="eq"
> value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
> </rule>
> <rule id="pref_run_on_ldap2" score="0">
> <expression attribute="#uname" operation="eq"
> value="ldap-2" id="578d2e59-3508-4c40-b932-9a5ba9117265"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> </cib>
> ======================================
>
> I don't know why there are 2 nodes for ldap2; those get generated automatically.
> Also, I have to start one server, wait for it to get all the resources
> up, then start the other. If I start them both at the same time they
> both kinda think the other is in charge and don't do anything.
>
> Now, what I'm aiming for here is to have the IP be failed over if ldap
> on ldap1 fails (or ldap1 fails in general) but have ldap always
> running on both servers. Right now when I start the second server it
> doesn't start ldap like I think it should.
>
> Here is the log for ldap1, from starting up by itself to ldap2 coming online.
> http://isthesuck.com/ha-log
> And here is the log for ldap2 coming online while ldap1 is already up
> http://isthesuck.com/ha-log2
> And here is the log for when I stop ldap, it gets restarted, then I
> make some bad changes to the ldap conf, stop ldap and heartbeat fails
> to either restart, or fail it over.
> http://isthesuck.com/ha-log3
>
> I hope that is enough information. I've been muddling through this
> problem for about a week now and getting no luck with this part :( and
> any help would really be appreciated.
>
> I have been all over the http://www.linux-ha.org website but I just
> can't seem to figure out how to get heartbeat to act like I need it
> to.
>
> -Thanks
>First it would be good to take a look at your ha.cf file to figure out
>why you have ldap-2 twice in the nodes section. I think the problem is
>there.
Here it is; this is pretty much what was in the walkthrough.
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility syslog
keepalive 2
deadtime 7
warntime 8
initdead 15
baud 19200
serial /dev/ttyS0 # Linux
bcast eth1 # Linux
watchdog /dev/watchdog
node ldap-1.ev1servers.net
node ldap-2.ev1servers.net
ping 207.218.204.193
crm yes
>Second. Set "symmetric_cluster" to true. You want both nodes to be
>able to run your resources, right?
Yes, I think I was under the impression that if it was symmetric the
resource would never go back to the preferred node. Anyway, I set that
to true. However, ldap2 will not start up the ldap service if the
other node is already up. It's kind of like when ldap2 sees another HA
node online it just stops and waits for it to fail.
>Third. Set different scores for rsc_location for different nodes. Node
>with the higher score will be primary node.
Well, I wanted ldap to run on both nodes at once (so the database will
get updated on both), which is why the score is the same for both nodes.
For the ip address, however, the primary is 100 and the secondary is 0 so
that it would go back to the primary when the primary comes back up, but
that is not what actually happens.
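
For reference, Heartbeat 2 has a clone resource type meant for running
one resource on several nodes at once. Below is a minimal sketch of what
wrapping the ldap primitive in a clone might look like. The ids
ldap_clone, ldap_clone_attrs, ldap_clone_max and ldap_clone_node_max are
made up for illustration, and the clone_max / clone_node_max attribute
names are assumed to match the crm.dtd shipped with 2.0.6, so check them
against the installed DTD before trying this.

```xml
<!-- Sketch only: ids on the clone are hypothetical; clone_max and
     clone_node_max are assumed to be valid for this Heartbeat version. -->
<clone id="ldap_clone">
  <instance_attributes id="ldap_clone_attrs">
    <attributes>
      <!-- run at most two copies in the cluster, one per node -->
      <nvpair id="ldap_clone_max" name="clone_max" value="2"/>
      <nvpair id="ldap_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <!-- the existing ldap primitive, moved inside the clone -->
  <primitive id="ldap" class="ocf" type="ldap" provider="ldap">
    <operations>
      <op id="1" name="stop" timeout="8s"/>
      <op id="2" name="start" timeout="9s" on_fail="fence"/>
      <op id="3" name="monitor" interval="15s" timeout="4s"/>
    </operations>
  </primitive>
</clone>
```

With a clone in place, the two rsc_location constraints that reference
rsc="ldap" would presumably need to be dropped or pointed at the clone
id instead.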
>Fourth. I would combine your IP and LDAP resources into a group. It
>doesn't make sense to fail LDAP over without failing IP. In your
>current configuration if LDAP fails but IP doesn't only LDAP will be
>failed over.
Ideally ldap should be running on both, so I don't really want it
failed over. However, nothing gets failed over when ldap isn't running.
If the whole node goes down, both ldap and the ip come up on the
other node, but this only happens when the whole node dies or goes
offline.
>Fifth. It looks like you messed up with Apache as a resource in your
>cluster and then replaced it with LDAP. Your config looks kind of
>strange. I would start it over from scratch :-)
Well, I pretty much just copied the active/active example from the
website which uses apache. Are there any other examples you know of
that I could learn from perhaps?
>Best wishes.
Thanks for the help so far.