[Linux-HA] two node firewall using heartbeat v2 problems [SOLVED]
Matt Zagrabelny
mzagrabe at d.umn.edu
Mon Oct 1 12:56:46 MDT 2007
On Mon, 2007-10-01 at 14:37 +0200, Andrew Beekhof wrote:
> On 10/1/07, Dejan Muhamedagic <dejanmm at fast
[...]
> > > (there will be no <status/> element in the following file, I believe
> > > that this is due to me manually 'kill -9'ing the processes after they
> > > would not stop nicely)
> >
> > No, the status section is never saved to a file. It only exists
> > in running nodes.
I know that the actual status doesn't get written out, but doesn't the "<status/>" tag get written out when the processes exit?
> >
> > > Here are some snippets from the log files; I am not sure which pieces
> > > are valuable and which are not. The files themselves are long (600 and
> > > 900 lines for the primary and backup servers). The (almost complete)
> > > log files are located at:
> > >
> > > http://www.d.umn.edu/~mzagrabe/ha-log.cody.txt
> > > http://www.d.umn.edu/~mzagrabe/ha-log.tim.txt
> >
> > From cody:
> >
> > heartbeat[18326]: 2007/09/28_11:29:27 WARN: string2msg_ll: node [tim] failed authentication
> >
> > This one's interesting. It shouldn't be happening.
> >
> > heartbeat[18326]: 2007/09/28_11:29:27 WARN: 6 lost packet(s) for [tim] [253:260]
> > heartbeat[18326]: 2007/09/28_11:29:27 WARN: Late heartbeat: Node tim: interval 3000 ms
> >
> > Flaky network?
> >
> > heartbeat[18330]: 2007/09/28_11:29:29 WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
> >
> > Problems with serial?
>
> one of the nice things about v2 is that it keeps the resource config
> in sync between nodes. however this also includes the status section
> and means that the data being transferred could quite conceivably
> max out a serial connection.
>
> a second NIC and a crossover cable is usually a good alternative
I am already using a pair of NICs (between the nodes) for heartbeat, in
addition to the serial link. Are you suggesting using two NICs per node
to send heartbeat messages?

Are the status messages sent across both links? (i.e., do they go across
both the serial link and the ethernet link between the nodes?) I would
assume they do, but I thought I would ask for clarification.
> > heartbeat[18326]: 2007/09/28_11:29:56 CRIT: Cluster node tim returning after partition.
> >
> > The node is leaving and coming back. Looks like the
> > network/serial connection doesn't deliver what we expect. Perhaps
> > you could try some other combinations:
> >
> > - without serial/higher baud
Yes! Both of these solutions fix the problem. Should the default baud
rate for a serial line be higher than 19200? What baud rate do others
use for v2 heartbeat configurations? The reason I ask is that currently
I have it set to 115200 and I am wondering if I am just above the
threshold of saturating the serial link. Perhaps I will run some tests
as well to see when the serial link gets saturated and report the
findings.
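
As a rough starting point for those tests, here is a back-of-the-envelope
sketch of what a serial link can carry at a given baud rate. It assumes 8N1
framing (10 bits on the wire per byte) and ignores any heartbeat protocol
overhead. The function names and the 20,000-byte payload are hypothetical,
chosen only for illustration; they are not measurements from cody or tim.

```python
# Rough serial throughput estimate, assuming 8N1 framing:
# 1 start bit + 8 data bits + 1 stop bit = 10 bits on the wire per byte.
BITS_PER_BYTE_8N1 = 10

# Hypothetical payload size, NOT measured from the cluster in this thread.
EXAMPLE_PAYLOAD_BYTES = 20_000


def bytes_per_second(baud: int) -> float:
    """Best-case payload bytes per second at the given baud rate."""
    return baud / BITS_PER_BYTE_8N1


def transfer_seconds(payload_bytes: int, baud: int) -> float:
    """Best-case time to push payload_bytes across the link."""
    return payload_bytes / bytes_per_second(baud)


if __name__ == "__main__":
    for baud in (19200, 38400, 57600, 115200):
        rate = bytes_per_second(baud)
        secs = transfer_seconds(EXAMPLE_PAYLOAD_BYTES, baud)
        print(f"{baud:>6} baud: {rate:7.0f} B/s, "
              f"{secs:5.2f} s for {EXAMPLE_PAYLOAD_BYTES} bytes")
```

Under those assumptions, 19200 baud carries at most 1920 bytes per second,
so a 20,000-byte update would occupy the line for roughly 10.4 seconds,
versus roughly 1.7 seconds at 115200. Real traffic would need to be measured
to know how close 115200 is to the limit.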
--
Matt Zagrabelny - mzagrabe at d.umn.edu - (218) 726 8844
University of Minnesota Duluth
Information Technology Systems & Services
PGP key 1024D/84E22DA2 2005-11-07
Fingerprint: 78F9 18B3 EF58 56F5 FC85 C5CA 53E7 887F 84E2 2DA2
He is not a fool who gives up what he cannot keep to gain what he cannot
lose.
-Jim Elliot