[Linux-HA] Re: Re:Re:Re:Problems with resources failing over and other little problems

Serge Dubrouski sergeyfd at gmail.com
Tue Sep 5 10:09:38 MDT 2006


On 9/5/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> Alright, here is my new cib file http://isthesuck.com/cib.xml

There is still something wrong with the nodes section. You shouldn't
have 3 nodes there. You probably need to remove the hostcache file and
restart heartbeat.
>
> and my ha.cf has remained the same
> > > Here it is, this is pretty much what was in the walkthrough.
> > > debugfile /var/log/ha-debug
> > > logfile /var/log/ha-log
> > > logfacility syslog
> > > keepalive 2
> > > deadtime 7
> > > warntime 8
> > > initdead 15
> > > baud    19200
> > > serial  /dev/ttyS0      # Linux
> > > bcast   eth1            # Linux
> > > watchdog /dev/watchdog
> > > node    ldap-1.ev1servers.net
> > > node    ldap-2.ev1servers.net
> > > ping 207.218.204.193
> > > crm yes
>

There is no need for ping here. It's not supported this way in 2.0.x.

>
> > > >Third. Set different scores for rsc_location for different nodes. Node
> > > >with the higher score will be primary node.
> > >
> > > Well, I wanted ldap to run on both nodes at once (so the database will
> > > get updated on both), which is why it's the same for both nodes. However,
> > > for the IP address the primary is 100 and the secondary is 0, so it
> > > should go back to the primary when it comes back up, but this is not
> > > the case.
> >
> >  Take a look at clones: http://www.linux-ha.org/v2/Concepts/Clones
>
> I put that in the cib, however my problem continues. ldap1 starts up
> fine and brings my resources up. But when I bring ldap2 up ldap2 just
> sits there. This is all that ldap2 generates in the logs when it
> starts up.
>
> heartbeat[22466]: 2006/09/05_10:16:13 info: Configuration validated.
> Starting heartbeat 2.0.4
> heartbeat[22467]: 2006/09/05_10:16:13 info: heartbeat: version 2.0.4
> heartbeat[22467]: 2006/09/05_10:16:13 info: Heartbeat generation: 60
> heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> Added signal manual handler
> heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> Added signal manual handler
> heartbeat[22467]: 2006/09/05_10:16:13 info: Removing
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[22467]: 2006/09/05_10:16:13 info: glib: Starting serial
> heartbeat on tty /dev/ttyS0 (19200 baud)
> heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> heartbeat started on port 694 (694) interface eth1
> heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> heartbeat closed on port 694 interface eth1 - Status: 1
> heartbeat[22467]: 2006/09/05_10:16:13 info: glib: ping heartbeat started.
> heartbeat[22467]: 2006/09/05_10:16:13 ERROR: Cannot open watchdog
> device: /dev/watchdog
> heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> heartbeat[22467]: 2006/09/05_10:16:13 info: Local status now set to: 'up'
> heartbeat[22467]: 2006/09/05_10:16:13 info: Exiting
> write_hostcachedata process 22477 returned rc 0.
> heartbeat[22467]: 2006/09/05_10:16:14 info: Link
> ldap-1.ev1servers.net:/dev/ttyS0 up.
> heartbeat[22467]: 2006/09/05_10:16:14 info: Status update for node
> ldap-1.ev1servers.net: status active
> heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-1.ev1servers.net:eth1 up.
> heartbeat[22467]: 2006/09/05_10:16:15 info: Link
> 207.218.204.193:207.218.204.193 up.
> heartbeat[22467]: 2006/09/05_10:16:15 info: Status update for node
> 207.218.204.193: status ping
> heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-2.ev1servers.net:eth1 up.
>
> and then it just waits for ldap1 to die or lose connection. My main
> problem is: why doesn't ldap2 start up anything or read its config like
> ldap1 does? One thing I have noticed is that when both nodes are up
> and have been up for a while, the cib.xml file still shows
> num_peers=1. Shouldn't this be 2?

There were some problems with serial connections between HA nodes. I
personally have never used them. Could you switch to UDP, just for testing?
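
For the test, a minimal sketch of ha.cf could look like the fragment
below. It only comments out lines from the file you posted; I am
assuming eth1 is a working link between the two nodes, and every
directive not shown stays as it is.

```
# serial link disabled for the UDP-only test
#baud    19200
#serial  /dev/ttyS0      # Linux
bcast   eth1            # Linux
node    ldap-1.ev1servers.net
node    ldap-2.ev1servers.net
# ping directive dropped, as noted earlier in this reply
#ping 207.218.204.193
crm yes
```

Make the same change on both nodes and restart heartbeat on each one.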

>
>
>
> Another concern, although not quite as important: how would I go
> about decreasing the time between these 2 log entries?
> crmd[22491]: 2006/09/05_10:41:25 info: mask(utils.c:crm_timer_popped):
> Wait Timer (I_NULL) just popped!
> crmd[22491]: 2006/09/05_10:42:25 info: mask(utils.c:crm_timer_popped):
> Election Trigger (I_DC_TIMEOUT) just popped!
>
> When the node starts up it waits for 60s after starting the HA
> services, and then starts my services. I can't seem to find anything
> on decreasing this time. Is it possible?

No, there is no way to do that.

>
>
> I really appreciate all the help so far. I feel that I have this
> ALMOST working..so close.
>
> -Chris

