[Linux-HA] Start up issues with heartbeat and 2.6. series kernel.
lee.insik at gmail.com
Fri Feb 10 15:36:50 MST 2006
On 2/10/06, prosolutions at gmx.net <prosolutions at gmx.net> wrote:
> I have 2 separate sets of directors, and both of them exhibit this
> behavior. Approximately 3 1/2 min. after the primary node is
> restarted it detects itself as dead and causes a resource migration away
> from and then back to itself. With respect to the other comment someone
> made about timer frequency, on these systems it is set to 100 Hz. I
> will change it and do some experiments.
It's quite an annoying problem, isn't it? I was at it for quite a while
doing some kernel testing, and after about 7 or 8 trials I was able to
pinpoint it to the HZ timer setting. Setting it to 1000 should fix
the problem for you. I had asked people on the list if they could
confirm and test my findings, and I'm just waiting to see if I get any
results.

For me, heartbeat hasn't restarted on the test servers we have, and
they've been running for 2 days straight. Now it's a matter of testing
the other components with the higher HZ setting to see if they act up
or not. Task preemption is one thing to watch for with a higher HZ.
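
To see which HZ value a kernel was built with before and after
rebuilding, something like the following rough sketch may help. It
assumes the distribution installs the kernel config as
/boot/config-<release> and that the kernel is new enough to have a
CONFIG_HZ option; neither is guaranteed, and the function names are
only illustrative.

```python
import platform
import re


def parse_config_hz(config_text):
    """Return the integer CONFIG_HZ value found in kernel config text, or None."""
    # Match only the numeric option, not CONFIG_HZ_100=y and similar lines.
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_running_kernel_hz():
    """Read CONFIG_HZ for the running kernel, or None if it cannot be found."""
    # Assumption: the config file lives at /boot/config-<kernel release>.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as config_file:
            return parse_config_hz(config_file.read())
    except OSError:
        return None


if __name__ == "__main__":
    hz = read_running_kernel_hz()
    print("CONFIG_HZ:", hz if hz is not None else "unknown")
```

If the config file is not there, the value has to come from wherever
the kernel build configuration is kept on that system.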
> Here's a log excerpt similar to above:
> heartbeat: 2006/02/10_04:23:07 info: remote resource transition completed.
> heartbeat: 2006/02/10_04:26:25 WARN: node dir01: is dead
> heartbeat: 2006/02/10_04:26:25 ERROR: No local heartbeat. Forcing restart.
> heartbeat: 2006/02/10_04:26:25 info: Heartbeat shutdown in progress. (1424)
> heartbeat: 2006/02/10_04:26:25 WARN: node dir02: is dead
> heartbeat: 2006/02/10_04:26:25 info: Link dir02:bond0 dead.
> heartbeat: 2006/02/10_04:26:25 WARN: Late heartbeat: Node dir01: interval 41160 ms
> Here are a few settings from ha.cf:
> keepalive 1
> deadtime 30
> initdead 250
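
As a sanity check on the numbers in the excerpt and settings above,
here is a small sketch that measures the gap between two heartbeat log
timestamps and compares the reported late-heartbeat interval against
deadtime. It assumes the ha.cf deadtime value is in seconds (no unit
suffix is given), and the helper names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the heartbeat log lines quoted above.
LOG_TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def seconds_between(first, second):
    """Return the number of seconds from the first log timestamp to the second."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(second, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


def exceeds_deadtime(interval_ms, deadtime_s):
    """Return True if a heartbeat interval in ms is longer than deadtime in s."""
    # Assumption: deadtime is given in seconds, as in "deadtime 30".
    return interval_ms > deadtime_s * 1000


# Time from "remote resource transition completed" to "node dir01: is dead".
gap = seconds_between("2006/02/10_04:23:07", "2006/02/10_04:26:25")
print(gap)  # 198.0

# The late heartbeat interval from the log against deadtime 30.
print(exceeds_deadtime(41160, 30))  # True
```

The 198 second gap is a little under the "3 1/2 min." mentioned, and
the 41160 ms interval is well past a 30 second deadtime, which fits
the node being declared dead.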