[Linux-HA] Initial dead time is smaller than deadtime

Andrew Beekhof beekhof at gmail.com
Wed Apr 9 08:41:58 MDT 2008


On Apr 8, 2008, at 8:18 PM, Bernd Schubert wrote:

> On Tuesday 08 April 2008 19:32:58 Bernd Schubert wrote:
>> Hello,
>>
>> I need to set a rather huge dead time of 1200s, but the initial dead time
>> is supposed to be 120s or less. However, heartbeat tries to be
>> schoolmasterly and doesn't want to accept my settings:
>>
>> deadtime 1200 # time to declare a node dead
>> initdead 120  # time to declare a node dead on heartbeat startup
>> keepalive 120 # how often to send keepalive packets
>>
>>
>> heartbeat[6523]: 2008/04/08_19:23:16 ERROR: Initial dead time [120000] is smaller than deadtime [1200000]
>> heartbeat[6523]: 2008/04/08_19:23:16 ERROR: Configuration error, heartbeat not started.
>>
>>
>> Well, heartbeat is not started automatically here, and the nodes are not
>> even powered on automatically after a hard reset. So when I start heartbeat
>> I'm actively monitoring everything, and there is absolutely no need to make
>> me wait at least 20min on startup. I'm not even convinced a deadtime of
>> 20min is sufficient, since this is for a Lustre cluster and Lustre
>> sometimes manages to create such a high load that nothing other than the
>> Lustre and related kernel threads gets to run on the system...
>>
>> So pretty please, is there a setting that allows overriding this ridiculous
>> initdead time check?
>
> Doesn't look like the error can be overridden:
>
>        /* Check deadtime parameters */
>        if (config->initial_deadtime_ms < config->deadtime_ms) {
>                ha_log(LOG_ERR
>                ,       "Initial dead time [%ld] is smaller than"
>                " deadtime [%ld]"
>                ,       config->initial_deadtime_ms, config->deadtime_ms);
>                ++errcount;
>        }else if (config->initial_deadtime_ms < 10000) {

Have you tried compiling a version with the "++errcount;" line commented out?
It seems like a strange thing to treat as fatal, unless the internal
algorithms make crappy assumptions.
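
For illustration, here is a minimal standalone sketch of what that change
amounts to. It is not the heartbeat source: struct hb_config,
check_deadtimes() and the "strict" flag are hypothetical names made up for
this example, and it assumes that a non-zero errcount is what makes heartbeat
refuse to start.

```c
#include <stdio.h>

/* Hypothetical stand-in for heartbeat's config structure; only the two
 * fields used by the quoted check are modelled here. */
struct hb_config {
    long initial_deadtime_ms;   /* initdead, in milliseconds */
    long deadtime_ms;           /* deadtime, in milliseconds */
};

/* Returns the number of fatal configuration errors found.
 * strict != 0: initdead < deadtime counts as an error (the quoted behaviour).
 * strict == 0: the message is still logged, but errcount is left alone,
 *              which is the effect of commenting out "++errcount;". */
int check_deadtimes(const struct hb_config *config, int strict)
{
    int errcount = 0;

    if (config->initial_deadtime_ms < config->deadtime_ms) {
        fprintf(stderr,
                "%s: Initial dead time [%ld] is smaller than deadtime [%ld]\n",
                strict ? "ERROR" : "WARN",
                config->initial_deadtime_ms, config->deadtime_ms);
        if (strict) {
            ++errcount;
        }
    }
    return errcount;
}
```

With initdead 120 and deadtime 1200 (120000 and 1200000 ms), the strict
variant reports one error and the relaxed variant reports none. Whether the
rest of heartbeat copes with initdead being smaller than deadtime is exactly
the open question above, so treat a patched build as an experiment.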

