[Linux-HA] start-delay for monitor operation not implemented?

Andrew Beekhof beekhof at gmail.com
Mon Oct 9 02:34:32 MDT 2006


On 10/9/06, Max Hofer <max.hofer at apus.co.at> wrote:
> I just want to point out that all resource agents delivered by heartbeat
> have "start-delay" in their meta-data and not "start_delay".
>
> So I assume the RAs are patched to "start_delay".

start_delay is not for the RA, it's used by the LRM
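
A rough, untested sketch of how the two hints in this thread might combine in the quoted CIB: the nvpair is named "start_delay" (underscore) and the value carries an explicit unit, as Dejan suggested. It keeps the nvpair layout and ids from Max's configuration. Whether the LRM picks the value up from this position is an assumption, not something confirmed here.

```xml
<!-- Untested sketch: same op as in the quoted CIB, with two assumed changes -->
<op id="dummy_monitor" interval="10s" name="monitor" timeout="15s">
  <instance_attributes>
    <attributes>
      <!-- assumption: name uses an underscore, value has an explicit unit -->
      <nvpair id="dummy_monitor_start_delay" name="start_delay" value="20s"/>
    </attributes>
  </instance_attributes>
</op>
```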

>
> regards Max
>
> On Wednesday 04 October 2006 18:14, Dejan Muhamedagic wrote:
> > On Wed, Oct 04, 2006 at 11:36:59AM +0200, Max Hofer wrote:
> > > On Tuesday 03 October 2006 14:20, Dejan Muhamedagic wrote:
> > > > On Tue, Oct 03, 2006 at 02:10:01PM +0200, Andrew Beekhof wrote:
> > > > > On 10/2/06, Max Hofer <max.hofer at apus.co.at> wrote:
> > > > > > >All RAs provided by "heartbeat" have a value "start-delay" in their
> > > > > > > monitor actions.
> > > > > > >
> > > > > > >I assumed that's a time period after which the monitor operation is
> > > > > > > called for the
> > > > > > >first time after a resource start. I tested it and it seems this
> > > > > > > value is ignored, because the first "monitor" operation is
> > > > > > > called right after the successful start.
> > > > > > >
> > > > > > >Is the following functionality implemented in heartbeat-2:
> > > > > >
> > > > > >"Start monitoring resource X seconds after successful start" - where
> > > > > >successful start means the start operation returned OCF_SUCCESS.
> > > > >
> > > > > I believe one of the IBM guys tested this a week or so ago and
> > > > > concluded it was working.  Maybe he did not test monitor.
> > > >
> > > > Yes, it did work for the monitor operation.
> > >
> > > > And how do I enable "start-delay"?
> > > >
> > > > At first I thought defining those values in my RA meta-data would be
> > > > enough. But it seems the meta-data output is just used for the GUI and
> > > > nowhere else (please correct me if I'm wrong).
> > > >
> > > > As far as I can tell, the <op> in the CIB does not have an attribute
> > > > called "start-delay". And setting it via an attribute did not resolve my
> > > > problem either.
> >
> > True that it's not in the DTD, but it is implemented. To pacify
> > crm_verify, you can apply the attached dtd.patch.
> >
> > > > Attached is the CIB test with the Dummy resource provided by heartbeat
> > > > (please do not confuse the attribute "start-delay" with the
> > > > OCF_RESKEY_start_delay used by the Dummy RA - which just simulates a
> > > > resource delay when starting the dummy resource).
> > > >
> > > > And an excerpt of the ha-log when starting up the cluster.
> > > >
> > > > As you can see, the "monitor" operation is called right after a
> > > > successful start operation.
> > >
> > > Any suggestions?
> >
> > > I think that a value specified without a unit is interpreted as
> > > milliseconds. Probably you should say "20s" instead of "20".
> > Otherwise, the configuration looks good to me.
> >
> > Cheers,
> >
> > Dejan
> >
> > > regards Max
> > >
> > > <?xml version="1.0" ?>
> > > <cib>
> > >     <configuration>
> > >             <crm_config>
> > >                     <cluster_property_set id="default">
> > >                             <attributes>
> > >                                     <nvpair id="symmetric_cluster" name="symmetric_cluster"
> > > value="true"/> <nvpair id="transition_idle_timeout"
> > > name="transition_idle_timeout" value="120s"/> <nvpair
> > > id="no_quorum_policy" name="no_quorum_policy" value="ignore"/> <nvpair
> > > id="default_resource_stickiness" name="default_resource_stickiness"
> > > value="INFINITY"/> <nvpair id="default_resource_failure_stickiness"
> > > name="default_resource_failure_stickiness" value="-INFINITY"/> <nvpair
> > > id="short_resource_names" name="short_resource_names" value="true"/>
> > > </attributes>
> > >                     </cluster_property_set>
> > >             </crm_config>
> > >             <nodes/>
> > >
> > >             <resources>
> > >                     <!--
> > >                             Start monitoring the resource with a delay.
> > >                     -->
> > >                     <primitive class="ocf" id="dummy_resource" provider="heartbeat"
> > > type="Dummy"> <operations>
> > >                                     <op id="dummy_monitor" interval="10s" name="monitor" timeout="15s">
> > >                                             <instance_attributes>
> > >                                                     <attributes>
> > >                                                             <nvpair id="dummy_monitor_start_delay" name="start-delay"
> > > value="20"/> </attributes>
> > >                                             </instance_attributes>
> > >                                     </op>
> > >                             </operations>
> > >                     </primitive>
> > >
> > >             </resources>
> > >
> > >             <constraints/>
> > >
> > >     </configuration>
> > >
> > >     <status/>
> > >
> > > </cib>
> > >
> > >
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: default_action_timeout
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Default stickiness: 1000000
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Default failure stickiness: -1000000
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: stonith_enabled
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c STONITH of failed nodes is disabled
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: stonith_action
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c STONITH will reboot nodes
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > > pengine[13948]: 2006/10/04_11:27:39 notice: unpack_config:unpack.c On loss of CCM Quorum: Ignore
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: stop_orphan_resources
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Orphan resources are stopped
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: stop_orphan_actions
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Orphan resource actions are stopped
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: remove_after_stop
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c Stopped resources are removed from the status section: false
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: unpack_config:unpack.c No value specified for cluster preference: is_managed_default
> > > > pengine[13948]: 2006/10/04_11:27:39 info: unpack_config:unpack.c By default resources are managed
> > > > pengine[13948]: 2006/10/04_11:27:39 info: determine_online_status:unpack.c Node management1 is online
> > > > pengine[13948]: 2006/10/04_11:27:39 info: dummy_resource      (heartbeat::ocf:Dummy): Stopped
> > > > pengine[13948]: 2006/10/04_11:27:39 notice: native_create_probe:native.c management1: Created probe for dummy_resource
> > > > pengine[13948]: 2006/10/04_11:27:39 notice: StartRsc:native.c  management1      Start dummy_resource
> > > > pengine[13948]: 2006/10/04_11:27:39 notice: Recurring:native.c management1    dummy_resource_monitor_10000
> > > > pengine[13948]: 2006/10/04_11:27:39 notice: stage8:allocate.c Created transition graph 0.
> > > > pengine[13948]: 2006/10/04_11:27:39 WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max
> > > > pengine[13948]: 2006/10/04_11:27:39 info: process_pe_message:pengine.c Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-373.bz2
> > > > crmd[13941]: 2006/10/04_11:27:39 info: do_state_transition:fsa.c management1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > tengine[13947]: 2006/10/04_11:27:39 info: unpack_graph:unpack.c Unpacked transition 0: 5 actions in 5 synapses
> > > > tengine[13947]: 2006/10/04_11:27:39 info: send_rsc_command:actions.c Initiating action 3: dummy_resource_monitor_0 on management1
> > > > crmd[13941]: 2006/10/04_11:27:39 info: do_lrm_rsc_op:lrm.c Performing op monitor on dummy_resource (interval=0ms, key=0:c3cec89f-7e5b-4027-b4cb-1c5c690890c6)
> > > > crmd[13941]: 2006/10/04_11:27:49 info: process_lrm_event:lrm.c LRM operation (2) monitor_0 on dummy_resource Error: (7) not running
> > > > cib[13937]: 2006/10/04_11:27:49 info: activateCibXml:io.c CIB size is 42068 bytes (was 38308)
> > > > cib[13937]: 2006/10/04_11:27:49 info: cib_diff_notify:notify.c Update (client: 13941, call:19): 0.179.4378 -> 0.179.4379 (ok)
> > > > cib[13957]: 2006/10/04_11:27:49 info: write_cib_contents:io.c Wrote version 0.179.4379 of the CIB to disk (digest: f0787767711318dcc20b79bdfcdb4dc0)
> > > > tengine[13947]: 2006/10/04_11:27:49 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.179.4378 -> 0.179.4379
> > > > tengine[13947]: 2006/10/04_11:27:49 info: match_graph_event:events.c Action dummy_resource_monitor_0 (3) confirmed
> > > > tengine[13947]: 2006/10/04_11:27:49 info: send_rsc_command:actions.c Initiating action 2: probe_complete on management1
> > > > tengine[13947]: 2006/10/04_11:27:49 info: te_pseudo_action:actions.c Pseudo action 1 confirmed
> > > > tengine[13947]: 2006/10/04_11:27:49 info: send_rsc_command:actions.c Initiating action 4: dummy_resource_start_0 on management1
> > > > crmd[13941]: 2006/10/04_11:27:49 info: do_lrm_rsc_op:lrm.c Performing op start on dummy_resource (interval=0ms, key=0:c3cec89f-7e5b-4027-b4cb-1c5c690890c6)
> > > > cib[13937]: 2006/10/04_11:27:49 info: activateCibXml:io.c CIB size is 43464 bytes (was 42068)
> > > > cib[13937]: 2006/10/04_11:27:49 info: cib_diff_notify:notify.c Update (client: 13941, call:20): 0.179.4379 -> 0.179.4380 (ok)
> > > > tengine[13947]: 2006/10/04_11:27:49 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.179.4379 -> 0.179.4380
> > > > tengine[13947]: 2006/10/04_11:27:49 info: extract_event:events.c Aborting on transient_attributes changes
> > > > tengine[13947]: 2006/10/04_11:27:49 info: update_abort_priority:utils.c Abort priority upgraded to 1000000
> > > > tengine[13947]: 2006/10/04_11:27:49 info: update_abort_priority:utils.c Abort action 0 superceeded by 2
> > > > cib[13961]: 2006/10/04_11:27:49 info: write_cib_contents:io.c Wrote version 0.179.4380 of the CIB to disk (digest: 1bd1671fa30d4ddb31130b839f6f07cb)
> > > > crmd[13941]: 2006/10/04_11:27:59 info: process_lrm_event:lrm.c LRM operation (3) start_0 on dummy_resource complete
> > > > cib[13937]: 2006/10/04_11:27:59 info: activateCibXml:io.c CIB size is 45756 bytes (was 43464)
> > > > cib[13937]: 2006/10/04_11:27:59 info: cib_diff_notify:notify.c Update (client: 13941, call:21): 0.179.4380 -> 0.179.4381 (ok)
> > > > tengine[13947]: 2006/10/04_11:27:59 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.179.4380 -> 0.179.4381
> > > > tengine[13947]: 2006/10/04_11:27:59 info: match_graph_event:events.c Action dummy_resource_start_0 (4) confirmed
> > > > tengine[13947]: 2006/10/04_11:27:59 info: run_graph:graph.c ====================================================
> > > > tengine[13947]: 2006/10/04_11:27:59 notice: run_graph:graph.c Transition 0: (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0)
> > > > crmd[13941]: 2006/10/04_11:27:59 info: do_state_transition:fsa.c management1: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > > crmd[13941]: 2006/10/04_11:27:59 info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
> > > > pengine[13948]: 2006/10/04_11:27:59 info: process_pe_message: [generation] <cib generated="true" admin_epoch="0" have_quorum="true" num_peers="1" cib_feature_revision="1.3" epoch="179" num_updates="4381" cib-last-written="Wed Oct  4 11:26:56 2006" ccm_transition="1" dc_uuid="0044b88e-c148-4269-9d39-449324bf65b8"/>
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: default_action_timeout
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Default stickiness: 1000000
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Default failure stickiness: -1000000
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: stonith_enabled
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c STONITH of failed nodes is disabled
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: stonith_action
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c STONITH will reboot nodes
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > > > pengine[13948]: 2006/10/04_11:27:59 notice: unpack_config:unpack.c On loss of CCM Quorum: Ignore
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: stop_orphan_resources
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Orphan resources are stopped
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: stop_orphan_actions
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Orphan resource actions are stopped
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: remove_after_stop
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c Stopped resources are removed from the status section: false
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: unpack_config:unpack.c No value specified for cluster preference: is_managed_default
> > > > pengine[13948]: 2006/10/04_11:27:59 info: unpack_config:unpack.c By default resources are managed
> > > > pengine[13948]: 2006/10/04_11:27:59 info: determine_online_status:unpack.c Node management1 is online
> > > > pengine[13948]: 2006/10/04_11:27:59 info: dummy_resource      (heartbeat::ocf:Dummy): Started management1
> > > > pengine[13948]: 2006/10/04_11:27:59 notice: NoRoleChange:native.c Leave resource dummy_resource     (management1)
> > > > pengine[13948]: 2006/10/04_11:27:59 notice: Recurring:native.c management1         dummy_resource_monitor_10000
> > > > pengine[13948]: 2006/10/04_11:27:59 notice: stage8:allocate.c Created transition graph 1.
> > > > pengine[13948]: 2006/10/04_11:27:59 WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max
> > > > pengine[13948]: 2006/10/04_11:27:59 info: process_pe_message:pengine.c Transition 1: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-374.bz2
> > > > crmd[13941]: 2006/10/04_11:27:59 info: do_state_transition:fsa.c management1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > tengine[13947]: 2006/10/04_11:27:59 info: unpack_graph:unpack.c Unpacked transition 1: 1 actions in 1 synapses
> > > > tengine[13947]: 2006/10/04_11:27:59 info: send_rsc_command:actions.c Initiating action 5: dummy_resource_monitor_10000 on management1
> > > > crmd[13941]: 2006/10/04_11:27:59 info: do_lrm_rsc_op:lrm.c Performing op monitor on dummy_resource (interval=10000ms, key=1:c3cec89f-7e5b-4027-b4cb-1c5c690890c6)
> > > > cib[13964]: 2006/10/04_11:27:59 info: write_cib_contents:io.c Wrote version 0.179.4381 of the CIB to disk (digest: 48fbe112a7a5d3ed0edb1fb43d267324)
> > > > cib[13937]: 2006/10/04_11:28:10 info: activateCibXml:io.c CIB size is 48048 bytes (was 45756)
> > > > crmd[13941]: 2006/10/04_11:28:10 info: process_lrm_event:lrm.c LRM operation (4) monitor_10000 on dummy_resource complete
> > > > cib[13937]: 2006/10/04_11:28:10 info: cib_diff_notify:notify.c Update (client: 13941, call:23): 0.179.4381 -> 0.179.4382 (ok)
> > > > cib[13969]: 2006/10/04_11:28:10 info: write_cib_contents:io.c Wrote version 0.179.4382 of the CIB to disk (digest: 93233eaeb8d65d1cae312e07b97c7bd8)
> > > > crmd[13941]: 2006/10/04_11:28:10 info: do_state_transition:fsa.c management1: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > tengine[13947]: 2006/10/04_11:28:10 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.179.4381 -> 0.179.4382
> > > > tengine[13947]: 2006/10/04_11:28:10 info: match_graph_event:events.c Action dummy_resource_monitor_10000 (5) confirmed
> > > > tengine[13947]: 2006/10/04_11:28:10 info: run_graph:graph.c Transition 1: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> > > > tengine[13947]: 2006/10/04_11:28:10 info: notify_crmd:actions.c Transition 1 status: te_complete - (null)
>
> --
> Max Hofer
> APUS Software G.m.b.H.
> A-8074 Raaba, Bahnhofstraße 1/1
> T| +43 316 401629 11
> F| +43 316 401629 9
> W| www.apus.co.at
> E| max.hofer at apus.co.at
>

