[Linux-HA] Heartbeat Serial problem "kernel: ttyS: 1 input overrun(s)"
Steve Wilson
linux-ha at aliceclarke.co.uk
Sun Mar 27 01:52:17 MST 2005
Hello Alan,
Thanks for your response. The other HA systems I have implemented are
all based on RHEL3; however, I installed the Ultramonkey collection on
those systems and have had no problems.
Because I did not need LVS for this implementation, I decided to
install only heartbeat.
This RHEL3 system was already up to date and running the latest kernel,
2.4.21-27.0.2.ELsmp. I have now installed the UM RHEL3 kernel to see if
the problem still happens.
Can you give me some more details of the scheduler bug you mention?
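To check whether the overruns come back under the new kernel, a small
script could count the kernel message in syslog. This is only a minimal
sketch in Python, assuming the messages land in /var/log/messages (the
usual syslog file on Red Hat systems) and that the message format matches
the one quoted below; count_overruns is just an illustrative name.

```python
import re
import sys

# Matches the kernel message reported on p-svc:
#   kernel: ttyS: 1 input overrun(s)
OVERRUN_RE = re.compile(r"kernel: ttyS: (\d+) input overrun\(s\)")


def count_overruns(lines):
    """Return (matching_lines, total_overruns) for an iterable of log lines."""
    hits = 0
    total = 0
    for line in lines:
        match = OVERRUN_RE.search(line)
        if match:
            hits += 1
            total += int(match.group(1))
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Path is given on the command line, e.g. /var/log/messages (assumed location).
    with open(sys.argv[1], errors="replace") as log:
        hits, total = count_overruns(log)
    print(f"{hits} overrun message(s), {total} overrun(s) in total")
```

Run against the log before and after the kernel change, this would show
whether the count keeps growing.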
Thanks
Steve Wilson
On Fri, 2005-03-25 at 12:00 -0700, linux-ha-request at lists.linux-ha.org
wrote:
>
> Message: 3
> Date: Fri, 25 Mar 2005 09:32:44 -0700
> From: Alan Robertson <alanr at unix.sh>
> Subject: Re: [Linux-HA] Heartbeat Serial problem "kernel: ttyS: 1
> input overrun(s)"
> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> Message-ID: <42443D2C.8010500 at unix.sh>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Steve Wilson wrote:
> > I have recently implemented HA failover for an Exim mailrelay I control.
> >
> > The master (p-svc) is an up-to-date RHEL 3.0 server running on standard
> > Intel hardware (Dual Xeon); the slave is an RH 9.0 server, again running
> > on standard Intel hardware (P4).
> >
> > The setup uses only a serial connection for heartbeat, and I seem to be
> > having a problem with this, but have no idea why.
> >
> > The cluster will run fine for an indeterminate period of time, and then I
> > get the following error on p-svc: 'kernel: ttyS: 1 input overrun(s)'.
> > Then it all goes horribly wrong: the slave takes over the cluster
> > (while the master still has the IP address and Exim is working fine).
> > Eventually p-svc restarts heartbeat, the serial line seems to start
> > working again, and p-svc takes control of the cluster again.
> >
> > I have attached logs for each machine during the time this happened;
> > note that the clocks on both servers are synchronized.
> >
> > I am fairly new to HA; however, I have already implemented 2 clusters,
> > both using serial connections.
>
> I think this is likely a bug in the kernel you're using. It stopped
> reading characters from the serial port, and it stopped heartbeating for 28
> seconds - on the *idle* machine.
>
> This sounds pretty much exactly like our favorite Red Hat scheduler bug. I
> don't know how this bug mapped into RHEL3 kernels, but trying the latest
> RHEL3 kernel will likely make it go away.
>
>
> --
> Alan Robertson <alanr at unix.sh>
>
> "Openness is the foundation and preservative of friendship... Let me claim
> from you at all times your undisguised opinions." - William Wilberforce