[Linux-HA] Re: Re:Re:Re:Problems with resources failing over and other little problems

Andrew Beekhof beekhof at gmail.com
Wed Sep 6 02:40:58 MDT 2006


On 9/5/06, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> On 9/5/06, Chris Gallo <chrisagallo at gmail.com> wrote:
> > Alright, here is my new cib file http://isthesuck.com/cib.xml
>
> There is still something wrong with the nodes section. You shouldn't
> have 3 nodes there. You probably need to remove the hostcache file and
> restart heartbeat.
> >
> > and my ha.cf has remained the same
> > > > Here it is, this is pretty much what was in the walkthrough.
> > > > debugfile /var/log/ha-debug
> > > > logfile /var/log/ha-log
> > > > logfacility syslog
> > > > keepalive 2
> > > > deadtime 7
> > > > warntime 8
> > > > initdead 15
> > > > baud    19200
> > > > serial  /dev/ttyS0      # Linux
> > > > bcast   eth1            # Linux
> > > > watchdog /dev/watchdog
> > > > node    ldap-1.ev1servers.net
> > > > node    ldap-2.ev1servers.net
> > > > ping 207.218.204.193
> > > > crm yes
> >
>
> No need for ping here. It's not supported this way in 2.0.x

Actually it is... http://www.linux-ha.org/v2/faq/pingd
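
A rough sketch of the kind of constraint that page describes, written
from memory, so please check it against the FAQ before copying. It
assumes pingd is running and publishing a node attribute named "pingd";
the resource name "my_ip" and all the ids are placeholders, not taken
from your cib.xml.

```xml
<!-- Hypothetical location constraint: keep my_ip off any node that
     cannot reach a ping node. "my_ip" and every id are placeholders. -->
<rsc_location id="my_ip_connected" rsc="my_ip">
  <rule id="my_ip_connected_rule" score="-INFINITY" boolean_op="or">
    <!-- pingd has not set the attribute on this node at all -->
    <expression id="my_ip_connected_undef" attribute="pingd" operation="not_defined"/>
    <!-- pingd is running but no ping node is reachable -->
    <expression id="my_ip_connected_zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

The ping line stays in ha.cf; how pingd itself gets started depends on
your version, so take that part from the FAQ rather than from me.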

>
> >
> > > > >Third. Set different scores for rsc_location for different nodes. Node
> > > > >with the higher score will be primary node.
> > > >
> > > > Well, I wanted ldap to run on both nodes at once (so the database
> > > > gets updated on both), which is why the score is the same for both
> > > > nodes. For the IP address, though, the primary is 100 and the
> > > > secondary is 0, so it should go back to the primary when the primary
> > > > comes back up. That is not what happens.
> > >
> > >  Take a look at clones: http://www.linux-ha.org/v2/Concepts/Clones
> >
> > I put that in the cib, however my problem continues. ldap1 starts up
> > fine and brings my resources up. But when I bring ldap2 up, ldap2 just
> > sits there. This is all that ldap2 generates in the logs when it
> > starts up.
> >
> > heartbeat[22466]: 2006/09/05_10:16:13 info: Configuration validated.
> > Starting heartbeat 2.0.4
> > heartbeat[22467]: 2006/09/05_10:16:13 info: heartbeat: version 2.0.4
> > heartbeat[22467]: 2006/09/05_10:16:13 info: Heartbeat generation: 60
> > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> > Added signal manual handler
> > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_TriggerHandler:
> > Added signal manual handler
> > heartbeat[22467]: 2006/09/05_10:16:13 info: Removing
> > /var/run/heartbeat/rsctmp failed, recreating.
> > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: Starting serial
> > heartbeat on tty /dev/ttyS0 (19200 baud)
> > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> > heartbeat started on port 694 (694) interface eth1
> > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: UDP Broadcast
> > heartbeat closed on port 694 interface eth1 - Status: 1
> > heartbeat[22467]: 2006/09/05_10:16:13 info: glib: ping heartbeat started.
> > heartbeat[22467]: 2006/09/05_10:16:13 ERROR: Cannot open watchdog
> > device: /dev/watchdog
> > heartbeat[22467]: 2006/09/05_10:16:13 info: G_main_add_SignalHandler:
> > Added signal handler for signal 17
> > heartbeat[22467]: 2006/09/05_10:16:13 info: Local status now set to: 'up'
> > heartbeat[22467]: 2006/09/05_10:16:13 info: Exiting
> > write_hostcachedata process 22477 returned rc 0.
> > heartbeat[22467]: 2006/09/05_10:16:14 info: Link
> > ldap-1.ev1servers.net:/dev/ttyS0 up.
> > heartbeat[22467]: 2006/09/05_10:16:14 info: Status update for node
> > ldap-1.ev1servers.net: status active
> > heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-1.ev1servers.net:eth1 up.
> > heartbeat[22467]: 2006/09/05_10:16:15 info: Link
> > 207.218.204.193:207.218.204.193 up.
> > heartbeat[22467]: 2006/09/05_10:16:15 info: Status update for node
> > 207.218.204.193: status ping
> > heartbeat[22467]: 2006/09/05_10:16:15 info: Link ldap-2.ev1servers.net:eth1 up.
> >
> > and then it just waits for ldap1 to die or lose connection. My main
> > problem is: why doesn't ldap2 start up anything or read its config like
> > ldap1 does? One thing I have noticed is that when both nodes are up
> > and have been up for a while, the cib.xml file still shows
> > num_peers=1. Shouldn't this be 2?
>
> There were some problems with serial connections between HA nodes. I
> have personally never used them. Could you switch to UDP, just for testing?
>
> >
> >
> >
> > Another concern, although not quite as important, is how I would go
> > about decreasing the time between these two log entries:
> > crmd[22491]: 2006/09/05_10:41:25 info: mask(utils.c:crm_timer_popped):
> > Wait Timer (I_NULL) just popped!
> > crmd[22491]: 2006/09/05_10:42:25 info: mask(utils.c:crm_timer_popped):
> > Election Trigger (I_DC_TIMEOUT) just popped!
> >
> > When the node starts up, it waits for 60s after starting the HA
> > services before starting my services. I can't seem to find anything
> > on decreasing this time. Is it possible?
>
> There is no way to do that.

With 2.0.7 we support the "dc_deadtime" cluster option which IIRC controls this.
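
An untested sketch of how that might look in the crm_config section of
the CIB. The ids and the 10s value are placeholders picked for
illustration, and the layout assumes the 2.0.7 CIB format, so check it
against the DTD shipped with your version before using it.

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- dc_deadtime is the option named above; 10s is only an example
           value, chosen to be shorter than the 60s wait in the logs -->
      <nvpair id="option-dc-deadtime" name="dc_deadtime" value="10s"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```

This will not help on 2.0.4, which is what the logs above show; you
would need to upgrade first.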

>
> >
> >
> > I really appreciate all the help so far. I feel that I have this
> > ALMOST working..so close.
> >
> > -Chris
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >