Tips? Children take a long time to go up and down
peter.smith at UTSouthwestern.edu
Tue Mar 19 09:28:59 MST 2002
Thank you both for the information. I discovered a few problems that I
think we (on my side) may have caused through misconfiguration.
Firstly, the startup script put the slow-starting app in the
background (with a `&`) and returned immediately with no error.
Secondly, my haresources file differed between the two machines,
which I think messed things up quite a lot. Thirdly, I was using
nice_failback. I've now removed all `&`-backgrounded
scripts/apps, so when they report completion, they really have completed. I've fixed
haresources so that the master really is the master, and I am now running
without nice_failback, so the master should always be the master. I've also
adjusted initdead, deadtime, warntime, and keepalive so that they
will hopefully keep Heartbeat from doing a quick flip-flop of ownership (too
quick for the apps to start or stop).
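The four timing directives mentioned above live in ha.cf and govern how soon a silent node is considered dead. Below is a minimal Python sketch for sanity-checking their relative sizes. It assumes each directive is a bare integer number of seconds; the ordering rules it checks (keepalive < warntime < deadtime, and initdead at least twice deadtime) are common guidance along the lines of the comments in Heartbeat's sample ha.cf, not limits enforced by Heartbeat 0.4.9.1. The sample values and the helper names `parse_timings` and `check_timings` are illustrative only.

```python
# Illustrative ha.cf excerpt: these values are examples, not Peter's settings.
SAMPLE_HA_CF = """
keepalive 2
warntime 10
deadtime 30
initdead 120
"""

TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_timings(text):
    """Collect the four timing directives as integer seconds.

    Assumes each value is a bare integer; anything after '#' is ignored.
    """
    timings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in TIMING_KEYS:
            timings[parts[0]] = int(parts[1])
    return timings


def check_timings(t):
    """Return a list of warnings about the relative sizes of the timers."""
    missing = [k for k in TIMING_KEYS if k not in t]
    if missing:
        return ["missing: " + ", ".join(missing)]
    problems = []
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead is less than twice deadtime")
    return problems


print(check_timings(parse_timings(SAMPLE_HA_CF)))  # prints []
```

Raising these timers makes Heartbeat slower to declare a peer dead; it does not by itself make a start or stop script finish sooner, so it complements rather than replaces scripts that wait for the application.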
Linux Systems Administrator
University of Texas Southwestern Medical Center at Dallas
(USA) 214 648 3111
peter.smith at utsouthwestern.edu
Alan Robertson wrote:
> Peter Smith wrote:
>> I have 2 nodes in an HA arrangement using Heartbeat 0.4.9.1. The
>> software whose startup and shutdown Heartbeat handles takes ~5
>> minutes to start up and another ~5 minutes to shut down. Weird
>> things can happen with respect to Heartbeat during this time. Does
>> anyone have any tips on Heartbeat timing for this type of
>> situation? Imagine starting a script that, at startup, does a sleep
>> 300 (5 minutes) and then just loops. Then, a shutdown is done in the
>> same manner, doing a sleep 300 (5 minutes) and then completing.
>> Heartbeat has to be able to check its status and not cut its hand off
>> while trying to close the lid on the cookie jar, so to speak. I've
>> been running into the "dual-brain" problem a lot with this.
> If you are able to run in nice_failback mode, this problem should
> disappear. In nice_failback mode it's a lot more sophisticated about
> these kinds of things.
> The usual problem isn't the time to start, but the time to stop. It
> tends to think resources stop immediately. There are some relatively
> simple workarounds for this situation if you're doing active-active
> and really can't live with nice_failback.
> -- Alan Robertson
> alanr at unix.sh
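Peter's removal of the `&` and Alan's remark that Heartbeat tends to assume resources stop immediately both point toward the same rule of thumb: a start or stop script should return only once the resource has actually reached that state. Below is a minimal Python sketch of that idea, not Heartbeat's own mechanism. The slow application, its "ready" line on stdout, and the helper names `slow_app_cmd`, `start_and_wait`, and `stop_and_wait` are all hypothetical stand-ins.

```python
import subprocess
import sys


def slow_app_cmd(delay):
    """Hypothetical stand-in for the slow application.

    It becomes usable after `delay` seconds (the thread's example used ~300),
    prints 'ready', then keeps running like a daemon until told to stop.
    """
    code = (
        "import time\n"
        f"time.sleep({delay})\n"
        "print('ready', flush=True)\n"
        "while True:\n"
        "    time.sleep(1)\n"
    )
    return [sys.executable, "-c", code]


def start_and_wait(cmd):
    """Launch the app and return only once it reports that it is ready.

    Contrast with `cmd &` in a shell script, which returns immediately
    and tells the caller nothing about whether startup succeeded.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    line = proc.stdout.readline()  # blocks until the first line or EOF
    if line.strip() != "ready":
        proc.kill()  # no-op if the process has already exited
        proc.wait()
        proc.stdout.close()
        raise RuntimeError("application failed to start")
    return proc


def stop_and_wait(proc, timeout=30.0):
    """Ask the app to exit and block until it really has exited."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort if a clean shutdown takes too long
        proc.wait()
    proc.stdout.close()
    return proc.returncode


if __name__ == "__main__":
    app = start_and_wait(slow_app_cmd(0.5))
    stop_and_wait(app)
```

In a real resource script the readiness check would be whatever the application offers (a status command, a port that accepts connections); the stdout line is only a placeholder, and a production version would also want a timeout on startup.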