[Linux-HA] ERROR: Message hist queue is filling up

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Nov 26 15:13:11 MST 2007


On Sun, Nov 25, 2007 at 02:50:34PM -0500, Scott Mann wrote:
> Hi,
> 
> I started getting this message on 1 system in a 2 node hb
> cluster AFTER installing 2.1.2 via the fc8 rpms (yum install
> heartbeat*, so both heartbeat and heartbeat-devel). I actually
> installed the rpms on two freshly installed FC8 systems. Also
> installed: libnet and glib-devel. I basically did the same
> thing a few weeks ago when these systems were FC7 (but got hb
> 2.0.8 via the rpms).
> 
> I found an earlier email from Alan R regarding this and 2.0.5,
> but could find no resolution. I'm certainly a newbie with this
> product and it may be something I'm doing. I've written an app
> to the API that seems to be working on 2.0.8. It uses
> "azClient" as its "signon" name. The problem didn't appear on
> wiley-coyote until after I'd started the app (although, it
> could be that I simply did not see the messages until after the
> app started). The problem DID NOT and still does not appear on
> the other node, beauregard. I ran the app on it also, and it
> signed on properly, etc.
> 
> Having said all that, here are the messages in the log file when heartbeat starts:
> 
> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Version 2 support: no
> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: **************************
> Nov 25 12:31:59 wiley-coyote heartbeat: [26165]: info: Configuration validated. Starting heartbeat 2.1.2
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: heartbeat: version 2.1.2
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Heartbeat generation: 1196015782
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound send socket to device: eth0
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: bound receive socket to device: eth0
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: glib: ucast: started on port 694 interface eth0 to 192.168.0.11
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 25 12:31:59 wiley-coyote heartbeat: [26166]: info: Local status now set to: 'up'
> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Link beauregard:eth0 up.
> Nov 25 12:32:00 wiley-coyote heartbeat: [26166]: info: Status update for node beauregard: status active
> Nov 25 12:32:00 wiley-coyote harc[26173]: info: Running /etc/ha.d/rc.d/status status
> Nov 25 12:33:04 wiley-coyote heartbeat: [26166]: info: all clients are now paused
> Nov 25 12:33:37 wiley-coyote heartbeat: [26166]: ERROR: Message hist queue is filling up (151 messages in queue)
> <above ERROR message continues to repeat>
> 
> It is also worth noting that when I execute "cl_status nodestatus wiley-coyote" on wiley-coyote I get:
> 
> cl_status[26192]: 2007/11/25_12:33:22 ERROR: Cannot signon with heartbeat
> cl_status[26192]: 2007/11/25_12:33:22 ERROR: REASON: hb_api_signon: Can't initiate connection  to heartbeat

Strange case. Did you check permissions? HB clients typically
connect through /var/run/heartbeat/register, but that is a
Unix domain socket and is created dynamically. Anyway, it may be
worth comparing the permissions on both systems.
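One way to do that comparison is a short Python script run on each node, with the outputs compared side by side. This is a minimal sketch. It assumes the socket lives at /var/run/heartbeat/register as described above, and the helper name `describe_path` is hypothetical. The parent directory is reported as well because, on Linux, a client needs search permission on every directory in the path and write permission on the socket itself in order to connect.

```python
import grp
import os
import pwd
import stat


def describe_path(path):
    """Return type, mode string, owner and group for path, or None if it is missing.

    Hypothetical helper: run it on both nodes and compare the results.
    """
    try:
        st = os.lstat(path)  # lstat so a symlink is reported as itself
    except FileNotFoundError:
        return None
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid with no group entry
        group = str(st.st_gid)
    if stat.S_ISSOCK(st.st_mode):
        kind = "socket"
    elif stat.S_ISFIFO(st.st_mode):
        kind = "fifo"
    elif stat.S_ISDIR(st.st_mode):
        kind = "dir"
    else:
        kind = "other"
    return {
        "path": path,
        "kind": kind,
        "mode": stat.filemode(st.st_mode),  # e.g. 'srwxrwx---'
        "owner": owner,
        "group": group,
    }


if __name__ == "__main__":
    # Paths assumed from the discussion above; missing paths print None.
    for p in ("/var/run/heartbeat", "/var/run/heartbeat/register"):
        print(describe_path(p))
```

A missing register socket on one node (None) points at a different problem than a socket that exists with a different owner or mode.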

> which seems to indicate a problem with the socket? Or pipe?
> BTW, this command works correctly on beauregard, returning
> "alive" for beauregard and "dead" for wiley-coyote.

Could you also try running cl_status under strace?
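After capturing a trace with something like `strace -f -o cl_status.trace cl_status nodestatus wiley-coyote`, the interesting lines are usually failed `connect()` calls on Unix domain sockets, where an EACCES or ENOENT on the register socket would appear if the failure happens at the socket level. The Python sketch below filters for those. The regex is an assumption about strace's output format, which varies by version (older releases print `AF_FILE` and `path=`, newer ones `AF_UNIX` and `sun_path=`), and `failed_unix_connects` is a hypothetical name.

```python
import re
import sys

# Matches failed AF_UNIX connect() lines in both old and new strace formats.
CONNECT_FAIL = re.compile(
    r'connect\(\d+, \{sa_family=AF_(?:UNIX|LOCAL|FILE), (?:sun_)?path="([^"]*)"\}'
    r'.*?\) = -1 (E[A-Z0-9]+)'
)


def failed_unix_connects(lines):
    """Yield (socket_path, errno_name) for each failed AF_UNIX connect() call."""
    for line in lines:
        m = CONNECT_FAIL.search(line)
        if m:
            yield m.group(1), m.group(2)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_connects.py cl_status.trace
    with open(sys.argv[1]) as fh:
        for path, err in failed_unix_connects(fh):
            print(f"{path}: {err}")
```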

Thanks,

Dejan


> 
> Anyway, please point me to whatever you think appropriate for
> me to look at (especially source as I'd like to learn more). My
> config file is simple and is below (comments mostly removed).
> Also, the only resource I'm managing is an IP address. I'm not
> using CRM, so I've got an haresources file which contains
> exactly:
> 
> wiley-coyote    192.168.0.98/24/eth0
> 
> 
> Any help would be greatly appreciated!
> TIA
> 
> Scott Mann
> Sr Software Engineer
> Aztek Networks
> 
> ha.cf (identical on both systems except for the change in ucast)
> ----------------------------------------------------------------
> #       Facility to use for syslog()/logger 
> #
> logfacility     local0
> #
> #
> keepalive 2
> #
> #
> deadtime 30
> #
> #
> warntime 10
> #
> #
> initdead 120
> #
> #
> udpport 694
> #
> # beauregard
> ucast eth0 192.168.0.11
> # wiley-coyote
> #ucast eth0 192.168.0.31
> #
> #
> #auto_failback on
> 
> auto_failback off
> 
> #
> 
> node wiley-coyote
> node beauregard
> #
> #apiauth client-name gid=gidlist uid=uidlist
> #apiauth ipfail gid=haclient uid=hacluster
> apiauth azClient uid=root,smann
> 
> #
> #compression_threshold 2
> crm no
> 
> 
> <end>
> 
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
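Since the two ha.cf files are meant to be identical apart from the ucast line, a small Python sketch can parse the directives (ignoring comments), list what differs between the nodes, and sanity-check the timing values. The timing rules checked here are common recommendations, not something heartbeat is guaranteed to enforce: keepalive and warntime shorter than deadtime, and initdead at least twice deadtime. All function names are hypothetical, and treating an `ms` suffix as milliseconds is an assumption.

```python
TIMING_DIRECTIVES = ("keepalive", "deadtime", "warntime", "initdead")


def parse_ha_cf(text):
    """Return (directive, value) pairs, ignoring comments and blank lines."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)  # directive, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        pairs.append((parts[0], value))
    return pairs


def to_seconds(value):
    """Convert a time value such as '30' or '500ms' to seconds (suffix handling assumed)."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    return float(value)


def timing_warnings(pairs):
    """Flag timing combinations that break the usual recommendations."""
    t = {d: to_seconds(v) for d, v in pairs if d in TIMING_DIRECTIVES}
    warnings = []
    if "deadtime" in t:
        dead = t["deadtime"]
        if t.get("keepalive", 0) >= dead:
            warnings.append("keepalive should be shorter than deadtime")
        if t.get("warntime", 0) >= dead:
            warnings.append("warntime should be shorter than deadtime")
        if "initdead" in t and t["initdead"] < 2 * dead:
            warnings.append("initdead should be at least twice deadtime")
    return warnings


def differing_directives(a_pairs, b_pairs):
    """Return (only_in_a, only_in_b) as sorted lists of (directive, value) pairs."""
    a, b = set(a_pairs), set(b_pairs)
    return sorted(a - b), sorted(b - a)
```

Run against each node's /etc/ha.d/ha.cf, the only difference expected for the files posted here is the ucast line, and the posted timings (keepalive 2, deadtime 30, warntime 10, initdead 120) produce no warnings.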

