[Linux-HA] Re: Re:Re:Re:Problems with resources failing over and other little problems

Serge Dubrouski sergeyfd at gmail.com
Tue Sep 5 13:43:29 MDT 2006


On 9/5/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> On 9/5/06, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > On 9/5/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> > > Alright, here is my new cib file http://isthesuck.com/cib.xml
> >
> > There is still something wrong with the nodes section. You shouldn't
> > have 3 nodes there. You probably need to remove the hostcache file and
> > restart heartbeat.
>
> Yep, that fixed it. The hostcache was different on the 2 machines;
> removing both hostcache files seems to have worked.
>
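For anyone hitting the same mismatch, here is a rough sketch of how the two
hostcache files could be compared before deleting them. It assumes a copy of
each node's file has been fetched to a local path (the file names in the usage
note are hypothetical), and it treats every non-empty line as an opaque entry
rather than assuming anything about the file format.

```python
def read_entries(path):
    """Return the set of non-empty, stripped lines in a file."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def hostcache_diff(path_a, path_b):
    """Return (entries only in path_a, entries only in path_b), both sorted."""
    entries_a = read_entries(path_a)
    entries_b = read_entries(path_b)
    return sorted(entries_a - entries_b), sorted(entries_b - entries_a)
```

Calling hostcache_diff("hostcache.ldap-1", "hostcache.ldap-2") returns two
lists; if either is non-empty, the nodes disagree about cluster membership.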
> > >
> > > and my ha.cf has remained the same
> > > > > Here it is, this is pretty much what was in the walkthrough.
> > > > > debugfile /var/log/ha-debug
> > > > > logfile /var/log/ha-log
> > > > > logfacility syslog
> > > > > keepalive 2
> > > > > deadtime 7
> > > > > warntime 8
> > > > > initdead 15
> mcast eth1 225.0.0.1 694 1 0
> udpport 694
> > > > > watchdog /dev/watchdog
> > > > > node    ldap-1.ev1servers.net
> > > > > node    ldap-2.ev1servers.net
> > > > > crm yes
> > >
> >
> > No need for ping here. It's not supported this way in 2.0.x
>
> What would be a better way to make sure I have internet connectivity?
> The guides are a little unclear on where 1.X and 2.X end and begin.

Take a look at this: http://www.linux-ha.org/v2/faq/pingd though I've
never used it myself.

>
> >
> > >
> > > > > >Third. Set different scores for rsc_location for different nodes. Node
> > > > > >with the higher score will be primary node.
> > > > >
> > > > > Well, I wanted ldap to run on both nodes at once (so the database will
> > > > > get updated on both), which is why it's the same for both nodes. However,
> > > > > for the IP address the primary is 100 and the secondary is 0, so it
> > > > > should go back to the primary when it comes back up. That is not
> > > > > what happens.
> > > >
> > > >  Take a look at clones: http://www.linux-ha.org/v2/Concepts/Clones
> > >
> > > I put that in the cib, however my problem continues. ldap1 starts up
> > > fine and brings my resources up. But when I bring ldap2 up ldap2 just
> > > sits there. This is all that ldap2 generates in the logs when it
> > > starts up.
> > >
> > > heartbeat[22466]: 2006/09/05_10:16:13 info: Configuration validated.
> > > Starting heartbeat 2.0.4
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: heartbeat: version 2.0.4
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: Heartbeat generation: 60
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> > > Added signal manual handler
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> > > Added signal manual handler
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: Removing
> > > /var/run/heartbeat/rsctmp failed, recreating.
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: Starting serial
> > > heartbeat on tty /dev/ttyS0 (19200 baud)
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> > > heartbeat started on port 694 (694) interface eth1
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> > > heartbeat closed on port 694 interface eth1 - Status: 1
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: ping heartbeat started.
> > > heartbeat[22467]: 2006/09/05_10:16:13 ERROR: Cannot open watchdog
> > > device: /dev/watchdog
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_SignalHandler:
> > > Added signal handler for signal 17
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: Local status now set to: 'up'
> > > heartbeat[22467]: 2006/09/05_10:16:13 info: Exiting
> > > write_hostcachedata process 22477 returned rc 0.
> > > heartbeat[22467]: 2006/09/05_10:16:14 info: Link
> > > ldap-1.ev1servers.net:/dev/ttyS0 up.
> > > heartbeat[22467]: 2006/09/05_10:16:14 info: Status update for node
> > > ldap-1.ev1servers.net: status active
> > > heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-1.ev1servers.net:eth1 up.
> > > heartbeat[22467]: 2006/09/05_10:16:15 info: Link
> > > 207.218.204.193:207.218.204.193 up.
> > > heartbeat[22467]: 2006/09/05_10:16:15 info: Status update for node
> > > 207.218.204.193: status ping
> > > heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-2.ev1servers.net:eth1 up.
> > >
> > > and then it just waits for ldap1 to die or lose connection. My main
> > > problem is: why doesn't ldap2 start up anything or read its config like
> > > ldap1 does? One thing I have noticed is that when both nodes are up
> > > and have been up for a while, the cib.xml file still shows
> > > num_peers=1. Shouldn't this be 2?
> >
> > There were some problems with serial connections between HA nodes. I
> > personally never used them. Could you switch to UDP, just for testing?
>
> Doing that did fix the problem. I updated the ha.cf above. Also, I
> noticed that having bcast and mcast on at the same time doesn't work. Is
> this expected? Having just mcast seems to work fine though, so no
> worries.

Right, if you have 2 communication methods configured (MCAST and BCAST),
only one of them will work. I don't know which one.
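
A quick way to spot that situation in an ha.cf is to list the communication
directives it declares. The following is only a sketch, not part of heartbeat:
the function names are made up for illustration, and it assumes the usual
ha.cf layout of one directive per line with "#" starting a comment.

```python
# ha.cf directive names that declare a heartbeat communication medium.
MEDIA_DIRECTIVES = {"bcast", "mcast", "ucast", "serial"}


def media_directives(ha_cf_text):
    """Return (directive, arguments) pairs for each communication line."""
    found = []
    for raw in ha_cf_text.splitlines():
        # Drop comments and surrounding whitespace, skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] in MEDIA_DIRECTIVES:
            found.append((parts[0], parts[1:]))
    return found


def mixes_bcast_and_mcast(ha_cf_text):
    """True when the text declares both a bcast and an mcast medium."""
    kinds = {name for name, _ in media_directives(ha_cf_text)}
    return {"bcast", "mcast"} <= kinds
```

Running mixes_bcast_and_mcast() over the contents of ha.cf on each node would
flag the combination reported above.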

>
>
> > >
> > >
> > >
> > > Another concern, although not quite as important: how would I go
> > > about decreasing the time between these 2 log entries?
> > > crmd[22491]: 2006/09/05_10:41:25 info: mask(utils.c:crm_timer_popped):
> > > Wait Timer (I_NULL) just popped!
> > > crmd[22491]: 2006/09/05_10:42:25 info: mask(utils.c:crm_timer_popped):
> > > Election Trigger (I_DC_TIMEOUT) just popped!
> > >
> > > When the node starts up, it waits for 60s after starting the HA
> > > services before starting my services. I can't seem to find anything
> > > on decreasing this time. Is it possible?
> >
> > There is no way to do that.
>
> Well, now that I got the nodes to talk to each other when they are
> both up, this isn't so much of a problem, so no worries here either.
>
> So right now I am stress testing it to hell and back to make sure
> everything works like I expect. So you might be hearing back from me
> with more questions :)
>

Congratulations!

> Thanks for all the help.

