[Linux-HA] Re:Re:Problems with resources failing over and other little problems.

Serge Dubrouski sergeyfd at gmail.com
Sat Sep 2 19:45:12 MDT 2006


On 9/2/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> > First, my setup.
> > I have two servers, both running Heartbeat 2.0.6 on RHEL 4 Update 3 (i386, Intel).
> > I'm using an OCF version of an LDAP script that I hacked together to
> > conform to http://www.linux-ha.org/OCFResourceAgent
> >
> > What I really need is for my resource to fail over to the other node
> > when it fails, which currently does not happen. Heartbeat knows when
> > my resource goes down and restarts it on the same node just fine,
> > which is good. However, if Heartbeat cannot restart my resource (it
> > fails to start because I broke the config file), it just sits there.
> > Both nodes do nothing.
> >
> > Here is my cib.xml file which I hope will be helpful. I am pretty new
> > to heartbeat so I don't know all the fancy stuff yet. This was pretty
> > much pieced together from haresources2cib.py and the getting
> > started guides.
> >
> > =============================================
> >  <cib generated="true" admin_epoch="0" have_quorum="true"
> >       num_peers="1" ccm_transition="1" cib_feature_revision="1.2"
> >       crm_feature_set="1.0.4" debug_source="create_node_entry"
> >       dc_uuid="3afea6dc-2c7b-480b-85f4-44e080dffff2"
> >       last_written="Fri Sep 1 14:38:09 2006" epoch="63" num_updates="766">
> >   <configuration>
> >     <crm_config>
> >       <nvpair id="default_resource_stickiness"
> >               name="default_resource_stickiness" value="100"/>
> >       <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
> >       <nvpair id="transition_idle_timeout"
> >               name="transition_idle_timeout" value="20s"/>
> >       <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
> >       <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
> >       <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
> >     </crm_config>
> >     <nodes>
> >       <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1"
> >             type="normal"/>
> >       <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2"
> >             type="normal"/>
> >       <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2"
> >             type="normal"/>
> >     </nodes>
> >     <resources>
> >       <primitive id="ip_resource_1" class="ocf" type="IPaddr"
> >                  provider="heartbeat" resource_stickiness="0">
> >         <instance_attributes>
> >           <attributes>
> >             <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
> >           </attributes>
> >         </instance_attributes>
> >       </primitive>
> >       <primitive id="ldap" class="ocf" type="ldap" provider="ldap"
> >                  resource_stickiness="0" multiple_active="block">
> >         <instance_attributes id="resource_ldap_instance_attrs">
> >           <attributes>
> >             <nvpair id="resource_apache_target_role"
> >                     name="target_role" value="started"/>
> >           </attributes>
> >         </instance_attributes>
> >         <operations>
> >           <op id="1" name="stop" timeout="8s"/>
> >           <op id="2" name="start" timeout="9s" on_fail="fence"/>
> >           <op id="3" name="monitor" interval="15s" timeout="4s"/>
> >         </operations>
> >       </primitive>
> >     </resources>
> >     <constraints>
> >       <rsc_location id="run_ldap" rsc="ldap">
> >         <rule id="pref_run_ldap_on_ldap1" score="100">
> >           <expression attribute="#uname" operation="eq"
> >                       value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
> >         </rule>
> >       </rsc_location>
> >       <rsc_location id="run_ldap2" rsc="ldap">
> >         <rule id="pref_run_ldap_on_ldap2" score="100">
> >           <expression attribute="#uname" operation="eq"
> >                       value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
> >         </rule>
> >       </rsc_location>
> >       <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
> >         <rule id="pref_run_on_ldap1" score="100">
> >           <expression attribute="#uname" operation="eq"
> >                       value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
> >         </rule>
> >         <rule id="pref_run_on_ldap2" score="0">
> >           <expression attribute="#uname" operation="eq"
> >                       value="ldap-2" id="578d2e59-3508-4c40-b932-9a5ba9117265"/>
> >         </rule>
> >       </rsc_location>
> >     </constraints>
> >   </configuration>
> >  </cib>
> > ======================================
> >
> > I don't know why there are two node entries for ldap-2; those were generated automatically.
> > Also, I have to start one server, wait for it to get all the resources
> > up, then start the other. If I start them both at the same time, they
> > both kind of think the other is in charge and don't do anything.
> >
> > Now, what I'm aiming for here is to have the IP fail over if ldap
> > on ldap1 fails (or ldap1 fails in general), but have ldap always
> > running on both servers. Right now when I start the second server, it
> > doesn't start ldap like I think it should.
> >
> > Here is the log for ldap1, from starting up by itself to ldap2 coming online.
> > http://isthesuck.com/ha-log
> > And here is the log for ldap2 coming online while ldap1 is already up
> > http://isthesuck.com/ha-log2
> > And here is the log for when I stop ldap and it gets restarted; then I
> > make some bad changes to the ldap config, stop ldap, and heartbeat neither
> > restarts it nor fails it over.
> > http://isthesuck.com/ha-log3
> >
> > I hope that is enough information. I've been muddling through this
> > problem for about a week now and having no luck with this part :( Any
> > help would really be appreciated.
> >
> > I have been all over the http://www.linux-ha.org website but I just
> > can't seem to figure out how to get heartbeat to act like I need it
> > to.
> >
> > -Thanks
>
> >First it would be good to take a look at your ha.cf file to figure out
> >why you have ldap-2 twice in the nodes section. I think the problem is
> >there.
>
> Here it is; this is pretty much what was in the walkthrough.
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility syslog
> keepalive 2
> deadtime 7
> warntime 8
> initdead 15
> baud    19200
> serial  /dev/ttyS0      # Linux
> bcast   eth1            # Linux
> watchdog /dev/watchdog
> node    ldap-1.ev1servers.net
> node    ldap-2.ev1servers.net

Really strange. The node names in cib.xml and in ha.cf are
different. Try stopping the cluster, removing the nodes section from cib.xml,
replacing it with <nodes/>, removing the *sig files, and starting heartbeat again.
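
A sketch of that edit, assuming cib.xml is changed by hand while heartbeat is stopped on both nodes; heartbeat is expected to add the node entries back as the nodes join. A related thing worth checking (an assumption, not confirmed in this thread): the names in the ha.cf `node` lines should match `uname -n`, and the `#uname` expressions in the location rules compare against that same name, so `value="ldap-1"` would not match a node that reports itself as `ldap-1.ev1servers.net`.

```xml
<!-- Replaces the whole <nodes>...</nodes> section of cib.xml.
     The three old entries (two of them for ldap-2) are dropped. -->
<nodes/>

<!-- If the nodes rejoin as ldap-1.ev1servers.net / ldap-2.ev1servers.net,
     the location rules presumably need the same names (assumed FQDN): -->
<rsc_location id="run_ldap" rsc="ldap">
  <rule id="pref_run_ldap_on_ldap1" score="100">
    <!-- value must equal the node's uname as heartbeat sees it -->
    <expression attribute="#uname" operation="eq"
                value="ldap-1.ev1servers.net"
                id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
  </rule>
</rsc_location>
```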

> ping 207.218.204.193
> crm yes
>
>
>
> >Second. Set "symmetric_cluster" to true. You want both nodes to be
> >able to run your resources, right?
>
> Yes, I think I was under the impression that if it was symmetric, the
> resource would never go back to the preferred node. Anyway, I set that
> to true. However, ldap2 will not start the ldap service if the
> other node is already up. It's kind of like when ldap2 sees another HA
> node online, it just stops and waits for it to fail.
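
For reference, this is the posted crm_config with only that one value changed; everything else is copied as-is. With symmetric_cluster set to true, a resource may run on any node unless a constraint forbids it, so the rsc_location rules act as preferences. This alone will not get ldap running on both nodes at once: a plain primitive is only ever active on one node, which matches what you see on ldap2.

```xml
<crm_config>
  <nvpair id="default_resource_stickiness"
          name="default_resource_stickiness" value="100"/>
  <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
  <nvpair id="transition_idle_timeout"
          name="transition_idle_timeout" value="20s"/>
  <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
  <!-- changed from "false": resources may run on any node
       unless a constraint says otherwise -->
  <nvpair id="symmetric_cluster" name="symmetric_cluster" value="true"/>
  <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
</crm_config>
```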
>
> >Third. Set different scores in rsc_location for different nodes. The node
> >with the higher score will be the primary node.
>
> Well, I wanted ldap to run on both nodes at once (so the database will
> get updated on both), which is why it's the same for both nodes. However,
> for the IP address the primary has 100 and the secondary 0, so I
> expected it to move back to the primary when that node comes back up,
> but that is not what happens.

Take a look at clones: http://www.linux-ha.org/v2/Concepts/Clones
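
A rough sketch of wrapping the ldap primitive in a clone, assuming the Heartbeat 2.0.x convention of setting clone_max and clone_node_max as instance attributes of the clone (worth checking against the crm.dtd shipped with 2.0.6). The ids ldap_clone, ldap_clone_attrs, ldap_clone_max, ldap_clone_node_max and the op ids are hypothetical names. Constraints that currently use rsc="ldap" would presumably need to reference the clone id instead.

```xml
<!-- Hypothetical clone: one copy of the ldap agent on each of the two nodes -->
<clone id="ldap_clone">
  <instance_attributes id="ldap_clone_attrs">
    <attributes>
      <!-- at most two copies in the whole cluster -->
      <nvpair id="ldap_clone_max" name="clone_max" value="2"/>
      <!-- at most one copy per node -->
      <nvpair id="ldap_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="ldap" class="ocf" type="ldap" provider="ldap">
    <operations>
      <op id="ldap_stop" name="stop" timeout="8s"/>
      <!-- on_fail left at its default here; "fence" relies on STONITH,
           which this cluster has disabled -->
      <op id="ldap_start" name="start" timeout="9s"/>
      <op id="ldap_monitor" name="monitor" interval="15s" timeout="4s"/>
    </operations>
  </primitive>
</clone>
```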

>
> >Fourth. I would combine your IP and LDAP resources into a group. It
> >doesn't make sense to fail LDAP over without failing over the IP. In your
> >current configuration, if LDAP fails but the IP doesn't, only LDAP will be
> >failed over.
>
> Ideally ldap should be running on both, so I don't really want it
> failed over. However, nothing gets failed over when ldap isn't running.
> If the whole node goes down, both ldap and the IP come up on the
> other node, but this only happens when the whole node dies or goes
> offline.

Clones again.
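
With ldap running as a clone, the IP no longer has to move together with ldap as a group; it can instead be tied to a node where a copy of ldap is running. A sketch, assuming ldap has been wrapped in a clone with the hypothetical id ldap_clone, and using the from/to/score form of rsc_colocation from the 2.0.x CIB. Whether 2.0.6 handles colocation against a clone the way you want is worth confirming in the logs before relying on it.

```xml
<constraints>
  <!-- Place ip_resource_1 ("from") only where ldap_clone ("to") is active;
       ldap_clone is an assumed id, not part of the posted config -->
  <rsc_colocation id="ip_with_ldap" from="ip_resource_1" to="ldap_clone"
                  score="INFINITY"/>
  <!-- the existing IP preference for ldap-1 is kept unchanged -->
  <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
    <rule id="pref_run_on_ldap1" score="100">
      <expression attribute="#uname" operation="eq"
                  value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
    </rule>
  </rsc_location>
</constraints>
```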

>
> >Fifth. It looks like you messed around with Apache as a resource in your
> >cluster and then replaced it with LDAP. Your config looks kind of
> >strange. I would start it over from scratch :-)
>
> Well, I pretty much just copied the active/active example from the
> website, which uses Apache. Do you know of any other examples
> I could learn from?
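
One visible leftover from the Apache example is the nvpair id resource_apache_target_role inside the ldap primitive. A tidied version of the posted primitive might look like this; ldap_target_role is only an illustrative name, and everything else, including on_fail="fence", is copied unchanged. Fencing normally depends on STONITH, which is disabled in this configuration, so that setting is worth revisiting too.

```xml
<primitive id="ldap" class="ocf" type="ldap" provider="ldap"
           resource_stickiness="0" multiple_active="block">
  <instance_attributes id="resource_ldap_instance_attrs">
    <attributes>
      <!-- renamed from resource_apache_target_role (hypothetical new id) -->
      <nvpair id="ldap_target_role" name="target_role" value="started"/>
    </attributes>
  </instance_attributes>
  <operations>
    <op id="1" name="stop" timeout="8s"/>
    <op id="2" name="start" timeout="9s" on_fail="fence"/>
    <op id="3" name="monitor" interval="15s" timeout="4s"/>
  </operations>
</primitive>
```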
>
> >Best wishes.
>
> Thanks for the help so far.

