Resource doesn't start (was: [Linux-HA] Version 2 resource groups and more)
Jack Forester, Jr.
linux_wiz at verizon.net
Sun Jul 10 14:43:47 MDT 2005
Hm... as soon as I modified my hb_res script to include an id field for
the <nvpair> tags, the resources started. I guess I need to go
back and read the DTD a little more closely. It seems a few things have
changed between 1.99.5 and 1.99.6 :)
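
For anyone hitting the same thing, here is a rough sketch of the kind of fix-up that worked for me, written as a standalone Python snippet rather than my actual hb_res script. The function name ensure_nvpair_ids and the id naming scheme are made up for illustration; per Andrew's note below, the only requirement is that every <nvpair> has an "id" and that the ids are unique. Checking against every id in the tree, not just other nvpairs, is a conservative assumption on my part.

```python
import xml.etree.ElementTree as ET


def ensure_nvpair_ids(root, prefix="nvpair"):
    """Give every <nvpair> without an id a unique one; return how many were added."""
    # Collect ids already present anywhere in the tree so new ones cannot collide.
    used = {el.get("id") for el in root.iter() if el.get("id")}
    added = 0
    counter = 0
    for nv in root.iter("nvpair"):
        if nv.get("id"):
            continue
        # Hypothetical naming scheme: <prefix>-<name>-<n>; only uniqueness matters.
        while True:
            counter += 1
            candidate = f"{prefix}-{nv.get('name', 'attr')}-{counter}"
            if candidate not in used:
                break
        nv.set("id", candidate)
        used.add(candidate)
        added += 1
    return added


# The primitive from the cib.xml quoted below, with its id-less nvpairs.
SAMPLE = """
<primitive id="WebServerIP" class="ocf" type="IPaddr" provider="heartbeat">
  <instance_attributes>
    <attributes>
      <nvpair name="nic" value="eth0"/>
      <nvpair name="ip" value="192.168.1.100"/>
    </attributes>
  </instance_attributes>
</primitive>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(ensure_nvpair_ids(root), "id(s) added")
    print(ET.tostring(root, encoding="unicode"))
```

Running it a second time on the same tree should add nothing, which makes it safe to apply to a file that is already partly fixed.
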
On Sun, 2005-07-10 at 10:49 +0200, Andrew Beekhof wrote:
> Can you include the complete logs (not just the PE) please?
>
> It looks like the PE thinks it should start the resource and the TE
> does too. Indeed the TE makes the first step towards starting it but
> then... well, I don't know, because I need the logs.
>
> Before you send them though, can you make sure all <nvpair> tags have
> an "id" field. It doesn't really matter what they contain, as long as
> they are unique.
>
> Andrew
>
> On 7/10/05, Jack Forester, Jr. <linux_wiz at verizon.net> wrote:
> > Sorry it's taken me so long to get back around to this. I've grabbed
> > the latest bits off of CVS and built it. The delete is working for me
> > now. Thanks, Andrew!
> >
> > Unfortunately, now I'm unable to get heartbeat to start a resource.
> > I've been looking at logs and code and the closest I can determine is
> > that heartbeat is marking the resource un-runnable. I'm only running on
> > one cluster node at the moment, but I get the same result if I have both
> > nodes up.
> >
> > I don't see where the resource script is ever invoked. Any pointers to
> > where I should look?
> >
> > Here's my cib.xml:
> >
> > <cib generated="true" cib_feature_revision="1" admin_epoche="0"
> > epoche="1" num_updates="10" have_quorum="true"
> > last_written="Sat Jul 9 16:19:40 2005" ccm_transition="1" origin="persian"
> > num_peers="1" dc_uuid="21510af9-f998-420d-8fe8-7258329ccb4a"
> > debug_source="finalize_join">
> > <configuration>
> > <crm_config/>
> > <nodes>
> > <node id="21510af9-f998-420d-8fe8-7258329ccb4a" uname="persian"
> > type="member"/>
> > </nodes>
> > <resources>
> > <primitive id="WebServerIP" class="ocf" type="IPaddr"
> > provider="heartbeat" on_stopfail="ignore">
> > <instance_attributes>
> > <attributes>
> > <nvpair name="nic" value="eth0"/>
> > <nvpair name="ip" value="192.168.1.100"/>
> > </attributes>
> > </instance_attributes>
> > </primitive>
> > </resources>
> > <constraints/>
> > </configuration>
> > <status>
> > <node_state id="21510af9-f998-420d-8fe8-7258329ccb4a"
> > uname="persian" in_ccm="true" join="member" origin="do_lrm_query"
> > crmd="online" ha="active" expected="member">
> > <lrm>
> > <lrm_resources>
> > <lrm_resource id="WebServerIP" rsc_state="starting"
> > op_status="-1" rc_code="-1" last_op="start">
> > <lrm_rsc_op id="WebServerIP_start_0" operation="start"
> > op_status="-1" rc_code="-1" origin="cib_action_update"
> > transition_id="4" transition_magic="4:-1"/>
> > </lrm_resource>
> > </lrm_resources>
> > </lrm>
> > </node_state>
> > </status>
> > </cib>
> >
> >
> > Here's some of my ha-debug log:
> >
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(unpack.c:unpack_nodes):
> > Begining unpack... nodes
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(unpack.c:unpack_resources): Begining unpack... resources
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(unpack.c:unpack_resources [2]): Begining unpack... primitive
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > Processing resource input...:
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > <primitive id="WebServerIP" class="ocf" type="IPaddr"
> > provider="heartbeat" on_stopfail="ignore">
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > <instance_attributes>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > <attributes>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > <nvpair name="nic" value="eth0"/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > <nvpair name="ip" value="192.168.1.100"/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > </attributes>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > </instance_attributes>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(common_unpack [2]):
> > </primitive>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(complex.c:common_unpack
> > [2]): Options for WebServerIP
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(complex.c:common_unpack
> > [2]): Dependancy restart handling: ignore
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(complex.c:common_unpack
> > [2]): Multiple running resource recovery: stop/start
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(complex.c:common_unpack
> > [2]): Placement: optimal (default)
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > contents %s:
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm_resources/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > </lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > found in %s:
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <node_state id="21510af9-f998-420d-8fe8-7258329ccb4a"
> > uname="persian" in_ccm="true" join="member" origin="do_lrm_query"
> > crmd="online" ha="active" expected="member">
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm_resources/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > </lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > </node_state>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > contents %s:
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm_resources/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > found in %s:
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > <lrm_resources/>
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(find_xml_node [5]):
> > </lrm>
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(unpack.c:determine_online_status [2]): Node persian is online
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(unpack.c:unpack_constraints): Begining unpack... constraints
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(stages.c:choose_node_from_list [2]): Could not allocate a node for
> > color 0
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(stages.c:stage4 [2]): No
> > node available for color 0
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(utils.c:custom_action
> > [2]): Creating action 1: start for WebServerIP on persian
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(utils.c:find_rsc_op_entry [2]): No matching for WebServerIP_start_0
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(utils.c:unpack_operation
> > [2]): Action start requires: quorum (default)
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(utils.c:unpack_operation
> > [2]): start failure results in: resource stop (default)
> > pengine[32492]: 2005/07/09_16:16:55 info:
> > mask(native.c:native_create_actions): Start resource WebServerIP
> > (persian)
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(native.c:create_recurring_actions [2]): Creating recurring actions
> > for WebServerIP
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(graph.c:update_action_states [2]): Updating 1 actions
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(pengine.c:do_calculations [2]): =#=#=#=#= Summary =#=#=#=#=
> > pengine[32492]: 2005/07/09_16:16:55 debug:
> > mask(pengine.c:do_calculations [2]): ========= All Actions =========
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(utils.c:log_action
> > [2]): : (Provisional) Action 1: start WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:16:55 debug: mask(utils.c:log_action [3]):
> > ====== Preceeding Actions
> > pengine[32492]: 2005/07/09_16:16:56 debug: mask(utils.c:log_action [3]):
> > ====== Subsequent Actions
> > pengine[32492]: 2005/07/09_16:16:56 debug: mask(utils.c:log_action [3]):
> > ====== End
> > pengine[32492]: 2005/07/09_16:16:56 debug:
> > mask(pengine.c:do_calculations [2]): ========= Set -1
> > (Un-runnable) =========
> > pengine[32492]: 2005/07/09_16:16:56 info: mask(stages.c:stage8):
> > Creating transition graph 1.
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(unpack.c:determine_online_status [2]): Node persian is online
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(unpack.c:unpack_rsc_op
> > [2]): Unpacking task WebServerIP/start (-1) on persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:custom_action
> > [2]): Creating action 1: start for WebServerIP on <NULL>
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(utils.c:find_rsc_op_entry [2]): No matching for WebServerIP_start_0
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:unpack_operation
> > [2]): Action start requires: quorum (default)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:unpack_operation
> > [2]): start failure results in: resource stop (default)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:custom_action
> > [2]): Action 1 marked manditory
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(unpack.c:unpack_constraints): Begining unpack... constraints
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > rsc action: : !!Non-Startable!! Action 1:
> > start WebServerIP @ (null)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > rsc action: : !!Non-Startable!! Action 1:
> > start WebServerIP @ (null)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > rsc action: : !!Non-Startable!! Action 1:
> > start WebServerIP @ (null)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > rsc action: : !!Non-Startable!! Action 1:
> > start WebServerIP @ (null)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > rsc action: : !!Non-Startable!! Action 1:
> > start WebServerIP @ (null)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(stages.c:choose_node_from_list [2]): Could not allocate a node for
> > color 0
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(stages.c:stage4 [2]): No
> > node available for color 0
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(native.c:native_create_actions [2]): Stop and restart of
> > WebServerIP
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:custom_action
> > [2]): Creating action 2: stop for WebServerIP on persian
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(utils.c:find_rsc_op_entry [2]): No matching for WebServerIP_stop_0
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:unpack_operation
> > [2]): Action stop requires: nothing (default)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:unpack_operation
> > [2]): stop failure results in: nothing
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:find_actions
> > [2]): While looking for WebServerIP_start_0 action on
> > persian, found an unallocated one. Assigning it to the requested
> > node...
> > pengine[32492]: 2005/07/09_16:17:39 info:
> > mask(native.c:native_create_actions): Leave resource WebServerIP
> > (persian)
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(native.c:create_recurring_actions [2]): Creating recurring actions
> > for WebServerIP
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(complex.c:order_actions
> > [2]): Ordering 1: Action 2 before 1
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [4]):
> > LH (order_actions): Optional Action 2: stop WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [4]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [4]):
> > RH (order_actions): (Provisional) Action 1: start
> > WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [4]):
> > (seen=0, before=0, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(graph.c:update_action_states [2]): Updating 2 actions
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(pengine.c:do_calculations [2]): =#=#=#=#= Summary =#=#=#=#=
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(pengine.c:do_calculations [2]): ========= All Actions =========
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action
> > [2]): : (Provisional) Action 1: start WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== Preceeding Actions
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action
> > [3]): : Optional Action 2: stop WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=0, after=1)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== Subsequent Actions
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== End
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action
> > [2]): : Optional Action 2: stop WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== Preceeding Actions
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== Subsequent Actions
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action
> > [3]): : (Provisional) Action 1: start WebServerIP @ persian
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > (seen=0, before=1, after=0)
> > pengine[32492]: 2005/07/09_16:17:39 debug: mask(utils.c:log_action [3]):
> > ====== End
> > pengine[32492]: 2005/07/09_16:17:39 debug:
> > mask(pengine.c:do_calculations [2]): ========= Set -1
> > (Un-runnable) =========
> >
> >
> > On Mon, 2005-06-27 at 10:45 +0200, Andrew Beekhof wrote:
> > > Hi Jack,
> > >
> > > I've just finished testing and fixing delete (/me makes a note to add
> > > regression tests for cibadmin one day). If you're confident in using
> > > CVS, I would recommend giving it a try. This page will explain how to
> > > obtain it: http://wwnew.linux-ha.org/CVS
> > >
> > > Thanks for pointing out the problem!
> >
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >