[Linux-HA] ERROR: Message hist queue is filling up (200 messages in queue)!!!!
davea at support.kcm.org
Wed Sep 26 18:59:04 MDT 2007
On Wed, 2007-09-26 at 23:07 +0200, Andrew Beekhof wrote:
> On 9/26/07, Dave Augustus <davea at support.kcm.org> wrote:
> > Hello All,
> > Thanks for your help up to this point. We now have a 6-node cluster
> > running in test mode. The DC is r6, and the load balancer resource group
> > is NOT running, but my 5 clones are. So I started updating my CIB and
> > found that I couldn't.
> > I got this message instead:
> > "No messages received in 30 seconds.. aborting"
> > Looking at the logs, I found that they are just filling up with entries
> > like this:
> > How can I get control of my cluster from this error ?
> > Dave
> Usually there is a firewall or some other comms-related failure
> involved; try starting there.
There was no firewall; these machines are on their own LAN segment, with
nothing in the middle but a switch. I could ssh into each server, and I
ended up stopping heartbeat on the troublesome host. After I restarted
heartbeat, the problem never reappeared. It was scary because the
machine that could not be reached was the DC, so management of the
cluster was hosed until I intervened. Not a pretty sight!!!