[Linux-HA] two node firewall using heartbeat v2 problems [SOLVED]

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Oct 1 13:20:40 MDT 2007


Hi,

On Mon, Oct 01, 2007 at 01:56:46PM -0500, Matt Zagrabelny wrote:
> 
> On Mon, 2007-10-01 at 14:37 +0200, Andrew Beekhof wrote:
> > On 10/1/07, Dejan Muhamedagic <dejanmm at fast
> 
> [...]
> 
> > > > (there will be no <status/> element in the following file; I believe
> > > > this is due to my manually 'kill -9'ing the processes after they
> > > > would not stop nicely)
> > >
> > > No, the status section is never saved to a file. It only exists
> > > in running nodes.
> 
> I know that the actual status doesn't get written out, but doesn't the "<status/>" tag get written out when the processes exit?
> 
> > >
> > > > Here are some snippets from the log files. I am not sure which pieces
> > > > are valuable and which are not. The files themselves are long (600 and
> > > > 900 lines for the primary and backup servers). The (almost complete)
> > > > log files are at:
> > > >
> > > > http://www.d.umn.edu/~mzagrabe/ha-log.cody.txt
> > > > http://www.d.umn.edu/~mzagrabe/ha-log.tim.txt
> > >
> > > From cody:
> > >
> > > heartbeat[18326]: 2007/09/28_11:29:27 WARN: string2msg_ll: node [tim] failed authentication
> > >
> > > This one's interesting. It shouldn't be happening.
> > >
> > > heartbeat[18326]: 2007/09/28_11:29:27 WARN: 6 lost packet(s) for [tim] [253:260]
> > > heartbeat[18326]: 2007/09/28_11:29:27 WARN: Late heartbeat: Node tim: interval 3000 ms
> > >
> > > Flaky network?
> > >
> > > heartbeat[18330]: 2007/09/28_11:29:29 WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
> > >
> > > Problems with serial?
> > 
> > One of the nice things about v2 is that it keeps the resource config
> > in sync between nodes. However, this also includes the status section,
> > which means that the data being transferred could quite conceivably
> > max out a serial connection.
> > 
> > A second NIC and a crossover cable is usually a good alternative.
> 
> I am already using a pair of NICs (between the nodes) for heartbeat, in
> addition to the serial link. Are you suggesting using two NICs per node
> to send heartbeat messages? 

No, it's just that you are better off with some redundancy in
communication links.
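To make the redundancy point concrete, here is a simplified model. It is an assumption for illustration, not heartbeat's actual code: the peer is treated as alive as long as any one link has delivered a heartbeat within `deadtime` seconds, so a single failing link (for example a serial line hitting TTY write timeouts) does not by itself make the peer look dead. The `Link` class, the `peer_is_alive` helper, the link name "eth1" and the timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One communication path to the peer (hypothetical model, e.g. a crossover NIC or serial line)."""
    name: str
    last_heartbeat: float  # time (seconds) the last heartbeat arrived on this link


def peer_is_alive(links: list[Link], now: float, deadtime: float) -> bool:
    """Simplified model: the peer counts as alive while ANY link has
    delivered a heartbeat within `deadtime` seconds of `now`."""
    return any(now - link.last_heartbeat <= deadtime for link in links)


if __name__ == "__main__":
    links = [
        Link("eth1", last_heartbeat=99.0),        # hypothetical Ethernet link, still fresh
        Link("/dev/ttyS0", last_heartbeat=60.0),  # serial link that has gone quiet
    ]
    print(peer_is_alive(links, now=100.0, deadtime=30.0))      # True: eth1 is fresh
    print(peer_is_alive(links[1:], now=100.0, deadtime=30.0))  # False: only the stale serial link
```

With only the serial link left, a 40-second silence exceeds the 30-second deadtime and the peer would be declared dead; with a second, healthy link it stays alive.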

> Are the status messages sent across both links? (i.e., do they go across
> the serial link and the Ethernet link between the nodes?) I would assume
> they would, but I thought I would ask for clarification.

No, I don't think so. The heartbeats go over both links, but the
messages only over one.

> > > heartbeat[18326]: 2007/09/28_11:29:56 CRIT: Cluster node tim returning after partition.
> > >
> > > The node is leaving and coming back. Looks like the
> > > network/serial connection doesn't deliver what we expect. Perhaps
> > > you could try some other combinations:
> > >
> > > - without serial/higher baud
> 
> Yes! Both of these solutions fix the problem. Should the default baud
> rate for a serial line be higher than 19200? What baud rate do others
> use for v2 heartbeat configurations? The reason I ask is that currently
> I have it set to 115200 and I am wondering if I am just above the
> threshold of saturating the serial link. Perhaps I will run some tests
> as well to see when the serial link gets saturated and report the
> findings.

Yes, that would be interesting. And we should probably print a
warning for v2 configurations and low-speed serial links.
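A rough back-of-the-envelope sketch of why a slow serial line can saturate. It assumes 8N1 framing (10 bits on the wire per byte) and ignores heartbeat's own message overhead, so the line carries at most baud/10 bytes per second. The 57,600-byte payload and the 10-second budget are hypothetical numbers, not measurements from this cluster. `needs_warning` is only one possible shape for the warning suggested above.

```python
# Assumption: 8N1 framing, i.e. 1 start bit + 8 data bits + 1 stop bit per byte.
BITS_PER_BYTE_8N1 = 10


def serial_bytes_per_second(baud: int) -> float:
    """Upper bound on payload bytes per second over an 8N1 serial line."""
    return baud / BITS_PER_BYTE_8N1


def transfer_seconds(payload_bytes: int, baud: int) -> float:
    """Minimum time to send payload_bytes at the given baud (no protocol overhead)."""
    return payload_bytes / serial_bytes_per_second(baud)


def needs_warning(payload_bytes: int, baud: int, budget_seconds: float) -> bool:
    """Hypothetical check: flag the link if one full transfer exceeds the time budget."""
    return transfer_seconds(payload_bytes, baud) > budget_seconds


if __name__ == "__main__":
    CIB_BYTES = 57_600  # hypothetical size of one synced config + status update
    BUDGET_S = 10.0     # hypothetical time budget
    for baud in (19200, 38400, 115200):
        secs = transfer_seconds(CIB_BYTES, baud)
        flag = "WARN" if needs_warning(CIB_BYTES, baud, BUDGET_S) else "ok"
        print(f"{baud:>6} baud: {secs:5.1f} s  {flag}")
    # 19200 -> 30.0 s WARN, 38400 -> 15.0 s WARN, 115200 -> 5.0 s ok
```

Under these assumptions the same update takes six times longer at 19200 baud than at 115200 baud, which is consistent with a higher baud rate (or dropping the serial link) making the problem go away.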

Thanks,

Dejan

> -- 
> Matt Zagrabelny - mzagrabe at d.umn.edu - (218) 726 8844
> University of Minnesota Duluth
> Information Technology Systems & Services
> PGP key 1024D/84E22DA2 2005-11-07
> Fingerprint: 78F9 18B3 EF58 56F5 FC85  C5CA 53E7 887F 84E2 2DA2
> 
> He is not a fool who gives up what he cannot keep to gain what he cannot
> lose.
> -Jim Elliot


