[Linux-HA] Heartbeat Serial problem "kernel: ttyS: 1 input overrun(s)"

Steve Wilson linux-ha at aliceclarke.co.uk
Sun Mar 27 01:52:17 MST 2005


Hello Alan,

Thanks for your response. The other HA systems I have implemented are
all based on RHEL3; however, I installed the Ultramonkey collection on
those systems and have had no problems.

For this implementation I did not need LVS, so I decided to install
only heartbeat.

This RHEL3 system was already up to date and running the latest kernel,
2.4.21-27.0.2.ELsmp. I have now installed the UM RHEL3 kernel to see if
the problem still happens.

Can you give me some more details about the scheduler bug you mention?

Thanks

Steve Wilson

On Fri, 2005-03-25 at 12:00 -0700, linux-ha-request at lists.linux-ha.org
wrote:

> 
> Message: 3
> Date: Fri, 25 Mar 2005 09:32:44 -0700
> From: Alan Robertson <alanr at unix.sh>
> Subject: Re: [Linux-HA] Heartbeat Serial problem "kernel: ttyS: 1 input overrun(s)"
> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> Message-ID: <42443D2C.8010500 at unix.sh>
> Content-Type: text/plain; charset=us-ascii; format=flowed
> 
> Steve Wilson wrote:
> > I have recently implemented HA failover for an Exim mailrelay I control.
> > 
> > The master (p-svc) is an up-to-date RHEL 3.0 server running on standard
> > Intel hardware (Dual Xeon). The slave is an RH 9.0 server, again running
> > on standard Intel hardware (P4).
> > 
> > The setup uses only a serial connection for heartbeat, and I seem to be
> > having a problem with this, but have no idea why.
> > 
> > The cluster runs fine for an indeterminate period of time, and then I
> > get the following error on p-svc: 'kernel: ttyS: 1 input overrun(s)'.
> > Then it all goes horribly wrong: the slave takes over the cluster
> > (while the master still has the IP address and Exim is working fine).
> > Eventually p-svc restarts heartbeat, the serial line seems to start
> > working again, and p-svc takes control of the cluster again.
> > 
> > I have attached logs for each machine covering the time this happened.
> > Note that the clocks on both servers are synchronized.
> > 
> > I am fairly new to HA; however, I have already implemented 2 clusters,
> > both using serial connections.
> 
> I think this is likely a bug in the kernel you're using.  It stopped 
> reading characters from the serial port, and it stopped heartbeating for 28 
> seconds - on the *idle* machine.
> 
> This sounds pretty much exactly like our favorite Red Hat scheduler bug.  I
> don't know how this bug mapped into RHEL3 kernels, but trying the latest
> RHEL3 kernel will likely make it go away.
> 
> 
> -- 
>      Alan Robertson <alanr at unix.sh>
> 
> "Openness is the foundation and preservative of friendship...  Let me claim 
> from you at all times your undisguised opinions." - William Wilberforce



