[Linux-ha-dev] Re: Error in serial code of heartbeat?
Fri, 21 Apr 2000 08:55:39 -0600
Holger Kiehl wrote:
> There seems to be a bug in heartbeat's serial code. I have been using
> heartbeat for a very long time and have had no problems. But since I moved
> the machine and put a higher constant load on it, I am getting
> the following errors every hour:
> TTY write timeout on [/dev/ttyS1] (no connection?)
> At first I was running version 0.4.6c when these errors popped up. I
> rebooted both nodes several times, but this did not help. The error
> always popped up again. I then ran strace on the heartbeat process
> handling the serial port and could see that it reads from the serial fd
> only once every two seconds, although the serial buffer was full of
> data! I could verify this by simply disconnecting the serial connection
> and the heartbeat process was still reading data from the serial
> port for about 5 - 10 minutes before the buffer was empty! Connecting
> it again, this time with a serial analyser between the two, one
> could see the buffer fill up until it was full again and the RTS
> signal dropped.
> It seems that heartbeat reads just one record every two seconds
> instead of everything in the buffer. So if the process writing to
> the port writes faster than that, the buffer always fills up, and
> heartbeat will NOT detect that the other node has crashed until the
> buffer is empty, 5 - 10 minutes later.
> Two days ago I decided to upgrade to 0.4.7 and everything seemed to
> be running. However looking at the log files this morning I see that
> the same messages appear in my log files on both nodes.
> As I said, this all started to happen when I moved the nodes from one
> room to another and now have more processes running, causing a
> higher load on the active node:
> 9:06am up 1 day, 22:19, 5 users, load average: 0.87, 0.61, 0.44
> There are about 195 processes now running on the active node. Before
> the move, the load average was always around zero.
What OS are you running this on? Which version of heartbeat were you
using before, when it worked fine?
-- Alan Robertson
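
For illustration only, the behaviour suspected in the quoted report (one record consumed per two-second wake-up while the buffer stays full) can be contrasted with a reader that drains everything available each time it wakes. The sketch below is not heartbeat's code. It assumes newline-delimited records, uses a pipe as a stand-in for /dev/ttyS1, and the name drain_records is hypothetical.

```python
import os
import select


def drain_records(fd, pending=b"", chunk_size=4096):
    """Read everything currently available on fd without blocking.

    Returns (complete_records, leftover_bytes). Records are assumed to be
    newline-delimited; this framing is an assumption of the sketch.
    """
    os.set_blocking(fd, False)
    data = pending
    while True:
        # Zero timeout: only ask whether data is ready right now.
        readable, _, _ = select.select([fd], [], [], 0)
        if not readable:
            break
        try:
            chunk = os.read(fd, chunk_size)
        except BlockingIOError:
            break
        if not chunk:  # EOF: the writer closed its end
            break
        data += chunk
    # Everything before the last newline is complete; the tail is kept
    # so it can be joined with the next read.
    *records, leftover = data.split(b"\n")
    return records, leftover


if __name__ == "__main__":
    # A pipe stands in for the serial line in this demo.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"hb 1\nhb 2\nhb 3\nhb")
    records, leftover = drain_records(read_fd)
    print(records, leftover)
```

With a reader of this shape, a backlog is consumed on the next wake-up rather than one record at a time, so stale heartbeats would not mask a dead peer for minutes. Whether this matches the actual cause of the problem here is not established by the report.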