[Linux-HA] pingd does not work for connection loss
Andrew Beekhof
beekhof at gmail.com
Wed Aug 9 06:29:45 MDT 2006
I'll know more once you sort out the logging.
It's not 100% obvious, but you're attaching the wrong files :-)
If you specify: use_logd on
Then these entries have no effect:
logfile /var/log/ha_log
debugfile /var/log/ha_dlog
However, because ha.cf is parsed sequentially, you'll still get *some*
logs in those files before it switches over to ha_logd. Hence the
confusion.
http://linux-ha.org/ha.cf#use_logd - you've got case 1b.
Quote: If the logging daemon is used, logfile/debugfile/logfacility in
this file are not meaningful any longer. You should check the config
file for logging daemon (the default is /etc/logd.cf).
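If it helps, here is a rough sketch of a check for this situation. It is not part of heartbeat: the function name and the sample text are made up for illustration, and it only mimics the rule quoted above rather than heartbeat's real parser. It also assumes that on/yes/true/1 are the spellings that enable use_logd.

```python
# Directives that stop being meaningful once the logging daemon is in use,
# per the ha.cf documentation quoted above.
LOG_DIRECTIVES = {"logfile", "debugfile", "logfacility"}


def ignored_log_directives(ha_cf_text):
    """Return the logging directives in ha_cf_text that use_logd overrides.

    Hypothetical helper for illustration only; it does not reproduce
    heartbeat's real parser.
    """
    use_logd = False
    found = []
    for raw in ha_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "use_logd":
            # Assumption: these spellings mean "enabled".
            use_logd = value.lower() in ("on", "yes", "true", "1")
        elif key in LOG_DIRECTIVES:
            found.append((key, value))
    # The directives only lose their effect when the logging daemon is on.
    return found if use_logd else []


sample = """\
logfile /var/log/ha_log
debugfile /var/log/ha_dlog
use_logd on
"""

for key, value in ignored_log_directives(sample):
    print(f"{key} {value}: ignored, see /etc/logd.cf instead")
```

Anything it reports is a file you should not expect complete logs in; look at whatever /etc/logd.cf points to instead.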
On 8/7/06, Yann Zurcher <yann.zurcher at gmail.com> wrote:
> Hi,
> 1. OK, the ping node was wrong, but that wasn't the problem... I have since
> changed it [192.168.1.1].
> 2. So I commented out the line hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s
> in the ha.cf file
> 3. Do I have to configure anything other than crm=yes in ha.cf and all
> the definitions in the cib.xml file? Is there a separate CRM logfile?
> 4. I increased it to 20s
>
> Result:
> - pingd is no longer started as a process (ps -A) --> I restored 2.
> - crm_mon still shows a stopped pingd resource (clone) as above
> - with ethereal I can see the pings to 192.168.1.1 and the successful ICMP
> answer (step 2 makes no difference)
> - ha_log file shows me an error which I don't understand
> heartbeat[10374]: 2006/08/07_10:06:58 info: respawn directive:
> hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> heartbeat[10444]: 2006/08/07_10:07:21 info: respawn directive:
> hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> heartbeat[1865]: 2006/08/07_10:07:26 ERROR: socket_resume_io_read:
> unknown recv error, peerpid=1842: Bad file descriptor
> heartbeat[1865]: 2006/08/07_10:07:26 info: Current resources: -R -C
> none
> heartbeat[1865]: 2006/08/07_10:07:26 info: respawn directive: hacluster
> /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> - ha_dlog file shows me the same error:
> heartbeat[10444]: 2006/08/07_10:07:21 debug: add_option(autojoin,none)
> heartbeat[10444]: 2006/08/07_10:07:21 debug: add_option(node,
> ha80.netsapiens.com)
> heartbeat[10444]: 2006/08/07_10:07:21 debug: add_option(node,
> ha81.netsapiens.com)
> heartbeat[10444]: 2006/08/07_10:07:21 debug: add_option(ping,
> 192.168.1.1)
> heartbeat[10444]: 2006/08/07_10:07:21 info: respawn directive:
> hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> heartbeat[10444]: 2006/08/07_10:07:21 debug: add_option(use_logd,on)
> heartbeat[1865]: 2006/08/07_10:07:26 ERROR: socket_resume_io_read:
> unknown recv error, peerpid=1842: Bad file descriptor
> heartbeat[1865]: 2006/08/07_10:07:26 info: Current resources: -R -C
> none
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(deadtime,10)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(deadping,20)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(initdead,120)
> heartbeat[1865]: 2006/08/07_10:07:26 debug:
> add_option(auto_failback,on)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(autojoin,none)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(node,
> ha80.netsapiens.com)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(node,
> ha81.netsapiens.com)
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(ping,192.168.1.1)
> heartbeat[1865]: 2006/08/07_10:07:26 info: respawn directive: hacluster
> /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
> heartbeat[1865]: 2006/08/07_10:07:26 debug: add_option(use_logd,on)
> pingd[10507]: 2006/08/07_10:07:28 debug:
> crm_set_env_options:utils.cHA_use_logd = on
> attrd[10512]: 2006/08/07_10:07:28 debug:
> crm_set_env_options:utils.cHA_use_logd = on
> cib[10509]: 2006/08/07_10:07:28 debug:
> crm_set_env_options:utils.cHA_use_logd = on
> crmd[10513]: 2006/08/07_10:07:28 debug:
> crm_set_env_options:utils.cHA_use_logd = on
> tengine[12604]: 2006/08/07_10:20:11 debug:
> crm_set_env_options:utils.cHA_use_logd = on
> pengine[12605]: 2006/08/07_10:20:11 debug:
> crm_set_env_options:utils.cHA_use_logd = on
>
> - Failover after disconnecting the LAN cable still doesn't work.
>
> Any other ideas...? Thank you very much...
> Regards,
> Yann
>
> On 8/6/06, Andrew Beekhof <beekhof at gmail.com> wrote:
> >
> > So I found a couple of things in the logs.
> > First heartbeat doesn't seem to like your ping-node directive:
> > heartbeat[3013]: 2006/08/03_18:47:33 ERROR: Illegal ping [ping
> > membership] in config file [192.168.1.1.]
> >
> > Secondly, I see:
> > heartbeat[3045]: 2006/08/03_18:48:14 info: respawn directive:
> > hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s
> > So you're actually telling heartbeat to start it, and the CRM to start
> > it... pick just one.
> >
> > The third thing I'm noticing is that there are no log entries from the
> > CRM... very odd.
> >
> > And the fourth thing:
> > <nvpair id="cib-bootstrap-options-default_action_timeout"
> > name="default_action_timeout" value="5s"/>
> >
> > That's really quite short, you might want to increase it to 20s or 30s.
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
>
>
>
> --
> Yann Zurcher, NetSapiens Inc., 3914 Murphy Canyon Rd., San Diego, CA 92130
> Phone:858-764-5213
> Email: yzurcher at netsapiens.com