[Linux-HA] cluster-delay and action timeouts

Keisuke MORI kskmori at intellilink.co.jp
Fri Feb 23 06:14:24 MST 2007


Hi,

If I configure a very long action timeout,
my guess is that cluster-delay (or transition_idle_timeout in 2.0.7)
must be set to a larger value than that timeout.
Is this correct? (And is it documented somewhere?)

I tried to induce a start timeout with the following parameters,
expecting the node would be fenced.

     <crm_config>
          <nvpair id="cluster-delay" name="cluster-delay" value="60s"/>

       <primitive id="dummy0" class="ocf" type="Dummy-test" provider="heartbeat" >
           <op id="d0_start" name="start" timeout="70s" on_fail="fence"/>

But what actually happened is that the action timed out after 60s
(not 70s) and 'start' was executed again and again.

If I set cluster-delay to 120s, it works as expected,
so I assume that my guess is correct.
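
To illustrate the guess above (it is only a guess, not confirmed behaviour), here is a small Python sketch that flags any op whose timeout is not smaller than cluster-delay. The <config> wrapper element, the function names, and the handful of time suffixes handled are assumptions made for this example; they are not the real CIB layout or a Heartbeat API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: the <config> root is only here to make
# the XML well-formed; it is NOT the real CIB structure.
SAMPLE = """
<config>
  <nvpair id="cluster-delay" name="cluster-delay" value="60s"/>
  <primitive id="dummy0" class="ocf" type="Dummy-test" provider="heartbeat">
    <op id="d0_start" name="start" timeout="70s" on_fail="fence"/>
  </primitive>
</config>
"""

def to_seconds(value):
    """Convert values such as '70s' or '2min' to seconds.

    Assumption: only a few suffixes are handled, and a bare number
    is treated as seconds.
    """
    m = re.fullmatch(r"\s*(\d+)\s*(ms|s|sec|m|min|h)?\s*", value)
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    factor = {None: 1, "s": 1, "sec": 1, "ms": 0.001,
              "m": 60, "min": 60, "h": 3600}[m.group(2)]
    return int(m.group(1)) * factor

def ops_not_covered(xml_text):
    """Return (op id, timeout) pairs whose timeout is >= cluster-delay."""
    root = ET.fromstring(xml_text)
    delay = None
    for nv in root.iter("nvpair"):
        if nv.get("name") == "cluster-delay":
            delay = to_seconds(nv.get("value"))
    if delay is None:
        raise ValueError("no cluster-delay nvpair found")
    return [(op.get("id"), op.get("timeout"))
            for op in root.iter("op")
            if op.get("timeout") is not None
            and to_seconds(op.get("timeout")) >= delay]

print(ops_not_covered(SAMPLE))
```

With the values from my test (cluster-delay 60s, start timeout 70s) the start op is reported; with cluster-delay raised to 120s the list is empty.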


The log at the point where 60 seconds had elapsed looks like this:
----8<-------8<-------8<-------8<-------8<-------8<-------8<-------8<-------8<---
tengine[20793]: 2007/02/23_20:42:11 WARN: global_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[20793]: 2007/02/23_20:42:11 info: unconfirmed_actions: Waiting on 1 unconfirmed actions
tengine[20793]: 2007/02/23_20:42:11 WARN: global_timer_callback: Transition abort timeout reached... marking transition complete.
tengine[20793]: 2007/02/23_20:42:11 WARN: global_timer_callback: Writing 1 unconfirmed actions to the CIB
tengine[20793]: 2007/02/23_20:42:11 ERROR: unconfirmed_actions: Action 4 unconfirmed from peer
tengine[20793]: 2007/02/23_20:42:11 WARN: cib_action_update: rsc_op 4: dummy0_start_0 on pacifica timed out
----8<-------8<-------8<-------8<-------8<-------8<-------8<-------8<-------8<---


Thanks,


Keisuke MORI
NTT DATA Intellilink Corporation

