[Linux-HA] Coredump on active node when other node joins in
Andrew Beekhof
beekhof at gmail.com
Tue May 15 03:32:22 MDT 2007
Hi Bernhard,
Can you create a Bugzilla entry for this, please?
I'm no expert on this part of the code, but even _if_ you're doing
something wrong, we shouldn't be dumping core.
On 5/14/07, Bernhard Limbach <bernhard.limbach at gmx.de> wrote:
> Hi,
>
> Update:
>
> The runlevel change was not a solution (I had assumed some kind of race condition and thought the deferred start of heartbeat would solve it); the core dump happened again.
>
> I now suspect that it is caused by the missing hb_generation file on the freshly installed server. After re-reading the docs, I wonder why I didn't run into the replay attack protection anyway.
>
> I have now tried "hbgenmethod time", and my reinstall procedure succeeded without a core dump (at least on one try; I'm getting tired of repeating this installation over and over...).
>
> Still, I am a little concerned that a newly joining node, whether legitimate or not and whether well-behaved or not, can cause my running heartbeat to fail so dramatically...
>
> Regards,
> Bernhard
>
>
>
> -------- Original Message --------
> Date: Fri, 11 May 2007 13:07:28 +0200
> From: "Bernhard Limbach" <bernhard.limbach at gmx.de>
> To: linux-ha at lists.linux-ha.org
> Subject: [Linux-HA] Coredump on active node when other node joins in
>
> > Hi,
> >
> > I'm currently practicing the reinstallation of one cluster node
> > (maintenance procedure to replace a server), while the other node is running and
> > providing the services.
> >
> > When the freshly installed node comes up, heartbeat on the primary node
> > dumps core and does an emergency shutdown.
> >
> > Freshly installed means that, apart from the config files, the only file
> > I restored in /var/lib/heartbeat and below is hb_uuid. Everything else
> > there should be regenerated automatically, as far as I understand the
> > concepts...
> >
> > The error happened (reproducibly) when heartbeat was started in runlevel
> > 2.
> >
> > When started in runlevel 5 it did not happen (that's now my current
> > workaround).
> >
> >
> > The error also did not happen when one of the nodes was rebooted normally,
> > i.e. after it had already been online in the cluster.
> >
> >
> > The setup is a simple 2-node cluster with:
> > - heartbeat-2.0.8 compiled from the tarball that is available on the
> > download page.
> > - Fedora Core 5 with kernel 2.6.20-1.2316.fc5smp
> >
> >
> > Attached you will find: ha.cf, cib.xml, the logs of both nodes and the
> > backtrace of the core-dump (if I managed to extract it correctly...).
> >
> > Please note also that after the emergency shutdown two heartbeat processes
> > still were running:
> >
> > DMM1:/root # ps -ef |grep heartbeat
> > root 17535 1 0 07:31 ? 00:00:00 /usr/lib/heartbeat/lrmd -r
> > 17 17537 1 0 07:31 ? 00:00:00 /usr/lib/heartbeat/attrd
> >
> >
> > Since starting a freshly installed server in runlevel 5 is a workable
> > workaround for me, I merely wanted to inform you about this error; maybe it
> > helps to track down another of those little bugs...
> >
> > Best regards,
> > Bernhard
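For reference, the workaround Bernhard describes in his update comes down to a single ha.cf directive. The fragment below is only a sketch of that one setting, not his actual configuration (his ha.cf was an attachment and is not reproduced here); the file path in the comment is an assumption based on the usual install location.

```
# /etc/ha.d/ha.cf (fragment; path assumed, rest of the file omitted)
# Derive the heartbeat generation number from the system time instead of
# the hb_generation file, which is missing on a freshly installed node.
hbgenmethod time
```

This only reflects what was reported to work in one attempt; it does not explain why the running node dumped core.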