Tips? Children take a long time to go up and down

Peter Smith peter.smith at UTSouthwestern.edu
Tue Mar 19 09:28:59 MST 2002


Thank you both for the information. I discovered a few problems that I 
think we (on my side) may have caused through misconfiguration.

First, the startup script put the slow-starting app in the
background (with a `&`) and returned immediately with no error.
Second, my haresources files differed between the two machines,
which I think confused things quite a lot. Third, I was using
nice_failback. I have now removed the `&` from the scripts/apps, so
when a script returns, the app has actually finished starting or
stopping. I've fixed haresources so that it is consistent on both
machines and the master really is the master, and I am now running
with nice_failback off, so the master should always be the master.
I've also adjusted initdead, deadtime, warntime, and keepalive so
that, hopefully, Heartbeat won't flip-flop ownership faster than
the apps can start and stop.
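
For illustration, here is one way to keep a start or stop script from
returning before the app has really changed state. It is a minimal
sketch, not the actual script from this setup: the helper name
`wait_until`, the 600-second timeout, and the pid-file path in the
usage comment are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: poll a check command until it succeeds or a
# timeout (in seconds) expires.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout=$1
    shift
    elapsed=0
    # Re-run the check once per second until it succeeds.
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: report failure to the caller
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0            # the check succeeded within the timeout
}

# Example use inside a start action (path is a placeholder):
#   wait_until 600 test -e /var/run/myapp.pid || exit 1
```

The point of the design is that the script's exit status and return
time reflect the app's real state, so whatever called the script does
not assume a 5-minute start or stop finished instantly.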

Peter Smith
Linux Systems Administrator
University of Texas Southwestern Medical Center at Dallas
(USA) 214 648 3111
peter.smith at utsouthwestern.edu


Alan Robertson wrote:

> Peter Smith wrote:
>
>> I have 2 nodes in a HA arrangement using Heartbeat 0.4.9.1 . The 
>> software that Heartbeat handles the startup and shutdown of takes ~5 
>> minutes to complete a startup and to complete a shutdown. Weird 
>> things can happen with respect to Heartbeat during this time. Does
>> anyone have any tips for timing on Heartbeat for this type of 
>> situation? Imagine starting a script that, at startup, does a sleep 
>> 300 (5 minutes) and then just loops. Then, a shutdown is done in the 
>> same manner, doing a sleep 300 (5 minutes) and then completion. 
>> Heartbeat has to be able to check its status and not cut its hand off 
>> while trying to close the lid on the cookie jar so to speak. I've 
>> been running into the "dual-brain" problem a lot with this.
>
>
>
> If you are able to run in nice_failback mode, this problem should
> disappear. In nice_failback mode Heartbeat is a lot more
> sophisticated about these kinds of things.
>
> The usual problem isn't the time to start, but the time to stop.
> Heartbeat tends to assume resources stop immediately. There are some
> relatively simple workarounds for this situation if you're doing
> active-active and really can't live with nice_failback.
>
>
> -- Alan Robertson
> alanr at unix.sh
>
