[Linux-HA] heartbeat waits for initdead even after all nodes have joined

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jan 15 09:53:30 MST 2010


On Fri, Jan 15, 2010 at 01:22:45AM -0500, David Sickmiller wrote:
> > > I don't have autojoin in my ha.cf, and I believe it defaults to
> > > "autojoin none", so that wouldn't explain why heartbeat keeps
> > > waiting after all nodes have joined.
> >
> > True. That should be fixed. Can you please open a bugzilla for
> > this issue,
> 
> Thanks for your help!  I've filed this as Bug 2311
> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2311)

Again:

Maybe this is because dc-timeout defaults to initdead.
These are independent values, so you can configure them independently.

Please try configuring an explicit dc-timeout
and see if that improves things.

The ha.cf initdead setting applies to the heartbeat cluster
communication layer and the CCM.

dc-timeout is configured in the CIB, using cibadmin, the crm shell, or ...

You could, for example, have an initdead of 900
and a dc-timeout of 40.
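
A rough sketch of that split, not taken from the original thread: the
first value lives in ha.cf, the second is a cluster property stored in
the CIB. Pacemaker's documentation calls its startup-wait option
dc-deadtime, so treating that as the "dc-timeout" meant here is an
assumption, as are the file path and the "s" unit suffix.

```
# /etc/ha.d/ha.cf (fragment; path assumed)
# Startup timeout for the heartbeat communication layer, in seconds.
initdead 900

# Cluster property in the CIB, set from a shell on one node.
# "dc-deadtime" is an assumed match for the "dc-timeout" named above:
#   crm configure property dc-deadtime=40s
```

Running "crm configure show" afterwards should list the property if it
was accepted.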

For AIS-based clusters there seems to be no real "initdead";
I think that is why dc-timeout was changed to default to
the ha.cf initdead setting.

IIRC, there have been improvements to this startup behaviour in Pacemaker
at some point, but I don't remember the details.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

