[Linux-HA] Re: heartbeat inside xen
mcousin at sigma.fr
Mon Mar 7 05:40:53 MST 2005
On Monday 07 March 2005 12:54, Lars Marowsky-Bree wrote:
> On 2005-03-07T10:16:38, Marc Cousin <mcousin at sigma.fr> wrote:
> (Changing subject etc, this is a different issue.)
> > I'm reacting about 2.6.10 ...
> > We just tried to use heartbeat inside Xen hosts ...
> > Heartbeat fails here with 2.6.10
> > heartbeat: ERROR: No local heartbeat. Forcing restart,
> > and then
> > heartbeat: WARN: Late heartbeat: Node fwautocom-2: interval 41580 ms
> > , and not with 2.4.29 (rock stable). Is anybody having the same kind of
> > problems with 2.6.10 ?
> > I must mention we have no load whatsoever on the failing nodes with
> > 2.6.10, and that the failure is systematic.
> > We're using XEN 2.0.4, Debian unstable, heartbeat 1.2.3 (packaged with
> > debian).
> > Of course, the kernels are patched to use Xen ...
> If Xen internally does not allocate the timeslices for the guests well
> enough (like the above: this indicates heartbeat hasn't been run for
> ~42s!), this can happen. I'm not sure how well Xen scales, but the
> question to ask about it is the scheduler fairness and the timeslices
> the guests get.
I don't think the problem comes directly from Xen ... we're hosting several
OSes on the same machine (5 on each for the moment).
The machines using 2.4.29 are OK.
The machines using 2.6.10 show the strange behaviour.
I don't think there is a Xen scheduling problem, as I'm working via ssh on the
hosted 2.6.10 guest when the problem occurs, and there's no freeze.
The problem with 2.6.10 (split cluster) occurs a few minutes after booting,
and only once, it seems.
If there are any tests I can do, let me know, as creating a new environment
isn't painful at all...
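
One test that could be run inside a 2.6.10 guest, as a rough sketch only: a
small script that sleeps for a fixed interval in a loop and records how late
each wakeup is. This is not part of heartbeat; the function names, the 500 ms
threshold and the intervals are all made up for illustration. It uses a
monotonic clock so that wall-clock adjustments are not counted as delays. If
the guest really is starved of CPU for tens of seconds, a long delay should
show up here as well.

```python
import time


def measure_wakeup_delays(interval_s=0.1, samples=50,
                          sleep=time.sleep, clock=time.monotonic):
    """Sleep `samples` times; return how late each wakeup was, in ms.

    `sleep` and `clock` are parameters only so they can be swapped out
    (hypothetical helper, not a heartbeat API).
    """
    delays_ms = []
    for _ in range(samples):
        start = clock()
        sleep(interval_s)
        elapsed = clock() - start
        # Lateness beyond the requested interval, never negative.
        delays_ms.append(max(0.0, (elapsed - interval_s) * 1000.0))
    return delays_ms


def late_wakeups(delays_ms, threshold_ms=500.0):
    """Return the delays above an (arbitrary) lateness threshold."""
    return [d for d in delays_ms if d > threshold_ms]


if __name__ == "__main__":
    # Short demo run; for a real test use a longer run that covers the
    # first minutes after boot, when the problem shows up.
    delays = measure_wakeup_delays(interval_s=0.05, samples=20)
    late = late_wakeups(delays)
    print("max delay: %.1f ms, late wakeups: %d" % (max(delays), len(late)))
```

Running it for the first few minutes after boot (for example interval_s=1.0,
samples=600) on both a 2.4.29 and a 2.6.10 guest would give something to
compare. If no late wakeups show up while heartbeat still logs a late
heartbeat, that would point away from guest scheduling.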
> Lars Marowsky-Brée <lmb at suse.de>