[Linux-HA] Re:Re:Problems with resources failing over and other little problems.

Andrew Beekhof beekhof at gmail.com
Mon Sep 4 03:52:16 MDT 2006


On 9/3/06, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> On 9/2/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> > > First, my setup.
> > > I have 2 servers, both running HA 2.0.6 on RHEL4u3, i386 (Intel).
> > > I'm using an ocf version of an ldap script that I hacked together to
> > > conform with http://www.linux-ha.org/OCFResourceAgent
> > >
> > > The thing that I really need working is if my resource fails, to fail
> > > over to the other node, which currently does not happen. Heartbeat
> > > knows when my resource goes down and is able to restart it just fine
> > > on the node which is good. However if Heartbeat fails to restart my
> > > resource (it fails to start because I broke the config file) it just
> > > sits there. Both nodes do nothing.
> > >
> > > Here is my cib.xml file which I hope will be helpful. I am pretty new
> > > to heartbeat so I don't know all the fancy stuff yet. This was pretty
> > > much pieced together from the haresources2cib.py and the getting
> > > started guides.
> > >
> > > =============================================
> > >  <cib generated="true" admin_epoch="0" have_quorum="true"
> > > num_peers="1" ccm_transition="1" cib_feature_revision="1.2"
> > > crm_feature_set="1.0.4" debug_source="create_node_entry"
> > > dc_uuid="3afea6dc-2c7b-480b-85f4-44e080dffff2" last_written="Fri Sep
> > > 1 14:38:09 2006" epoch="63" num_updates="766">
> > >   <configuration>
> > >     <crm_config>
> > >       <nvpair id="default_resource_stickiness"
> > > name="default_resource_stickiness" value="100"/>
> > >       <nvpair id="transition_timeout" name="transition_timeout" value="10s"/>
> > >       <nvpair id="transition_idle_timeout"
> > > name="transition_idle_timeout" value="20s"/>
> > >       <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
> > >       <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
> > >       <nvpair id="no_quorum_policy" name="no_quorum_policy" value="ignore"/>
> > >     </crm_config>
> > >     <nodes>
> > >       <node id="3afea6dc-2c7b-480b-85f4-44e080dffff2" uname="ldap-1"
> > > type="normal"/>
> > >       <node id="daedf2f2-95df-457d-a534-c53787dc8eb0" uname="ldap-2"
> > > type="normal"/>
> > >       <node id="03c973d1-2652-41a3-87f8-1ad6f23a5e3c" uname="ldap-2"
> > > type="normal"/>
> > >     </nodes>
> > >     <resources>
> > >       <primitive id="ip_resource_1" class="ocf" type="IPaddr"
> > > provider="heartbeat" resource_stickiness="0">
> > >         <instance_attributes>
> > >           <attributes>
> > >             <nvpair name="ip" value="207.218.204.194" id="floating_ip"/>
> > >           </attributes>
> > >         </instance_attributes>
> > >       </primitive>
> > >       <primitive id="ldap" class="ocf" type="ldap" provider="ldap"
> > > resource_stickiness="0" multiple_active="block">
> > >         <instance_attributes id="resource_ldap_instance_attrs">
> > >           <attributes>
> > >             <nvpair id="resource_apache_target_role"
> > > name="target_role" value="started"/>
> > >           </attributes>
> > >         </instance_attributes>
> > >         <operations>
> > >           <op id="1" name="stop" timeout="8s"/>
> > >           <op id="2" name="start" timeout="9s" on_fail="fence"/>
> > >           <op id="3" name="monitor" interval="15s" timeout="4s"/>
> > >         </operations>
> > >       </primitive>
> > >     </resources>
> > >     <constraints>
> > >       <rsc_location id="run_ldap" rsc="ldap">
> > >         <rule id="pref_run_ldap_on_ldap1" score="100">
> > >           <expression attribute="#uname" operation="eq"
> > > value="ldap-1" id="a8dc2d64-c848-42c4-baf5-3c2ec79323f6"/>
> > >         </rule>
> > >       </rsc_location>
> > >       <rsc_location id="run_ldap2" rsc="ldap">
> > >         <rule id="pref_run_ldap_on_ldap2" score="100">
> > >           <expression attribute="#uname" operation="eq"
> > > value="ldap-2" id="35b13bff-cd78-481f-97ef-27053ac8aa4b"/>
> > >         </rule>
> > >       </rsc_location>
> > >       <rsc_location id="run_ip_resource_1" rsc="ip_resource_1">
> > >         <rule id="pref_run_on_ldap1" score="100">
> > >           <expression attribute="#uname" operation="eq"
> > > value="ldap-1" id="6aa68f52-2eb5-430b-910b-435ffefb55bf"/>
> > >         </rule>
> > >         <rule id="pref_run_on_ldap2" score="0">
> > >           <expression attribute="#uname" operation="eq"
> > > value="ldap-2" id="578d2e59-3508-4c40-b932-9a5ba9117265"/>
> > >         </rule>
> > >       </rsc_location>
> > >     </constraints>
> > >   </configuration>
> > >  </cib>
> > > ======================================
> > >
> > > I don't know why there are 2 nodes for ldap2; those get generated automatically.
> > > Also, I have to start one server, wait for it to get all the resources
> > > up, then start the other. If I start them both at the same time they
> > > both kinda think the other is in charge and don't do anything.
> > >
> > > Now, what I'm aiming for here is to have the IP be failed over if ldap
> > > on ldap1 fails (or ldap1 fails in general) but have ldap always
> > > running on both servers. Right now when I start the second server it
> > > doesn't start ldap like I think it should.
> > >
> > > Here is the log for ldap1, from starting up by itself to ldap2 coming online.
> > > http://isthesuck.com/ha-log
> > > And here is the log for ldap2 coming online while ldap1 is already up
> > > http://isthesuck.com/ha-log2
> > > And here is the log for when I stop ldap, it gets restarted, then I
> > > make some bad changes to the ldap conf, stop ldap and heartbeat fails
> > > to either restart, or fail it over.
> > > http://isthesuck.com/ha-log3
> > >
> > > I hope that is enough information. I've been muddling through this
> > > problem for about a week now and having no luck with this part :( and
> > > any help would really be appreciated.
> > >
> > > I have been all over the http://www.linux-ha.org website but I just
> > > can't seem to figure out how to get heartbeat to act like I need it
> > > to.
> > >
> > > -Thanks
> >
> > > >First it would be good to take a look at your ha.cf file to figure out
> > >why you have ldap-2 twice in the nodes section. I think the problem is
> > >there.
> >
> > Here it is, this is pretty much what was in the walkthrough.
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility syslog
> > keepalive 2
> > deadtime 7
> > warntime 8
> > initdead 15
> > baud    19200
> > serial  /dev/ttyS0      # Linux
> > bcast   eth1            # Linux
> > watchdog /dev/watchdog
> > node    ldap-1.ev1servers.net
> > node    ldap-2.ev1servers.net
>
> Really strange. The node names in cib.xml and in ha.cf are different.
> Try stopping the cluster, replacing the nodes section of cib.xml with
> <nodes/>, removing the *sig files and starting heartbeat again.

Heartbeat's hostcache files can also be a factor here.

>
> > ping 207.218.204.193
> > crm yes
> >
> >
> >
> > >Second. Set "symmetric_cluster" to true. You want both nodes to be
> > >able to run your resources, right?
> >
> > Yes, I think I was under the impression that if it was symmetric the
> > resource would never go back to the preferred node. Anyway, I set that
> > to true. However, ldap2 will not start up the ldap service if the
> > other node is already up. It's kind of like when ldap2 sees another HA
> > node online it just stops and waits for it to fail.
> >
> > >Third. Set different scores for rsc_location for different nodes. Node
> > >with the higher score will be primary node.
> >
> > Well, I wanted ldap to run on both nodes at once (so the database will
> > get updated on both), which is why the score is the same for both nodes.
> > For the IP address, though, the primary is 100 and the secondary is 0 so
> > that it would go back to the primary when it comes back up. However, this
> > is not the case.
>
>  Take a look at clones: http://www.linux-ha.org/v2/Concepts/Clones
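
As a rough illustration of the clone suggestion, and not a tested
configuration, the existing ldap primitive could be wrapped along these
lines. This assumes the same Heartbeat 2.0.x CIB layout as the cib.xml
quoted above. The ids ldap_clone, ldap_clone_attrs, ldap_clone_max,
ldap_clone_node_max and ldap_monitor are made up for the example, and
clone_max / clone_node_max are the option names as I recall them from the
Clones page, so check them against your version.

```xml
<!-- Sketch only: run one copy of ldap on each of the two nodes. -->
<clone id="ldap_clone">
  <instance_attributes id="ldap_clone_attrs">
    <attributes>
      <!-- Assumed value: two copies in total, one per node. -->
      <nvpair id="ldap_clone_max" name="clone_max" value="2"/>
      <!-- Assumed value: never more than one copy on a single node. -->
      <nvpair id="ldap_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <!-- Same agent as in the posted cib.xml (class/type/provider unchanged). -->
  <primitive id="ldap" class="ocf" type="ldap" provider="ldap">
    <operations>
      <op id="ldap_monitor" name="monitor" interval="15s" timeout="4s"/>
    </operations>
  </primitive>
</clone>
```

With something like this in place, the two rsc_location entries that name
rsc="ldap" would probably need to be revisited, since the clone rather than
the primitive becomes the thing being placed.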
>
> >
> > >Fourth. I would combine your IP and LDAP resources into a group. It
> > >doesn't make sense to fail LDAP over without failing IP. In your
> > >current configuration if LDAP fails but IP doesn't only LDAP will be
> > >failed over.
> >
> > Ideally ldap should be running on both, so I don't really want it
> > failed over. However nothing gets failed over when ldap isn't running.
> > But if the whole node goes down both ldap and the ip come up on the
> > other node. However this only happens when the whole node dies/ goes
> > offline.
>
> Clones again.
>
> >
> > >Fifth. It looks like you messed up with Apache as a resource in your
> > >cluster and then replaced it with LDAP. Your config looks kind of
> > >strange. I would start it over from scratch :-)
> >
> > Well, I pretty much just copied the active/active example from the
> > website which uses apache.  Are there any other examples you know of
> > that I could learn from perhaps?
> >
> > >Best wishes.
> >
> > Thanks for the help so far.

