[Linux-HA] ERROR: EvmsSCC: vs4 (local) not on active list!
John Lange
john.lange at open-it.ca
Wed Jan 31 13:53:24 MST 2007
I hate to be a pain here but does anyone have any suggestions?
We have a cluster that is running only on one node at the moment while
we attempt to get heartbeat+evms+ocfs2+nfs in working order.
Any suggestions would be much appreciated.
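
For reference, here is a minimal sketch of how a longer start timeout could be expressed on the EvmsSCC primitive. The log quoted below shows the start operation timing out at 20000ms, while a manual evms_activate takes about 2 1/2 minutes. The sketch reuses the <operations>/<op> syntax from the earlier clone definition quoted further down. The op id and the 300s value are illustrative assumptions, not tested settings.

```xml
<!-- Sketch only: the op id and timeout value are assumptions and untested -->
<clone id="evmscloneset" notify="true" globally_unique="false">
  <instance_attributes id="evmscloneset">
    <attributes>
      <nvpair id="evmscloneset-01" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="evmsclone" class="ocf" type="EvmsSCC" provider="heartbeat">
    <operations>
      <!-- start currently times out at 20000ms; allow more than the roughly 150s a manual run takes -->
      <op id="evmsclone-start" name="start" timeout="300s"/>
    </operations>
  </primitive>
</clone>
```

Whether a longer timeout alone resolves the lock contention shown in the evms_activate output is not established by the logs.
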
Thanks,
John Lange
On Mon, 2007-01-29 at 14:18 -0600, John Lange wrote:
> Ok, thanks for the suggestions.
>
> I've made the suggested change to the EvmsSCC cloneset and indeed it
> seems to be getting closer to working, but it still will not start.
>
> For the sake of completeness I have included my entire cib.xml and below
> that is what I hope is the relevant portion of the log file.
>
> From what I can see the evms_activate command is timing out. I think
> this then causes heartbeat to bounce the evms_activate around to the
> other nodes causing them to lock each other out.
>
> Below the log files you can see the result of "evms_activate" when I run
> it manually on vs4 after heartbeat has tried to start evms.
>
> This is a rather large evms setup and even when I run evms_activate
> manually it takes 2 1/2 minutes to start normally so perhaps we need a
> larger timeout value?
>
> I have one other concern about EVMS: is it even viable for redundancy?
> When a node is down, evms will not start, complaining that it can't
> contact one of the nodes. How do you make it work in a cluster when one
> or more of the nodes may be down? Perhaps it makes more sense to back
> out of EVMS and go with straight LVM on the shared storage with OCFS2 on
> top of that?
>
> ------
>
> <cib admin_epoch="0" have_quorum="true" num_peers="4" cib_feature_revision="1.3" generated="true" ccm_transition="89" dc_uuid="21a514da-4a8c-49a8-bd78-79179418a3f5" epoch="114" num_updates="4922" cib-last-written="Mon Jan 29 13:46:42 2007">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <attributes>
> <nvpair id="cib-bootstrap-options-transition_idle_timeout" name="transition_idle_timeout" value="60"/>
> <nvpair id="cib-bootstrap-options-default_resource_stickiness" name="default_resource_stickiness" value="INFINITY"/>
> <nvpair id="cib-bootstrap-options-default_resource_failure_stickiness" name="default_resource_failure_stickiness" value="-500"/>
> <nvpair id="cib-bootstrap-options-stonith_enabled" name="stonith_enabled" value="False"/>
> <nvpair id="cib-bootstrap-options-stonith_action" name="stonith_action" value="reboot"/>
> <nvpair id="cib-bootstrap-options-symmetric_cluster" name="symmetric_cluster" value="true"/>
> <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="ignore"/>
> <nvpair id="cib-bootstrap-options-stop_orphan_resources" name="stop_orphan_resources" value="true"/>
> <nvpair id="cib-bootstrap-options-stop_orphan_actions" name="stop_orphan_actions" value="true"/>
> <nvpair id="cib-bootstrap-options-is_managed_default" name="is_managed_default" value="true"/>
> <nvpair name="last-lrm-refresh" id="cib-bootstrap-options-last-lrm-refresh" value="1170099970"/>
> </attributes>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="21a514da-4a8c-49a8-bd78-79179418a3f5" uname="vs1" type="normal"/>
> <node id="9ba549a0-8f53-46fe-9946-02d1ea6acc2d" uname="vs2" type="normal"/>
> <node id="4cb9baf7-747c-46dc-9d8c-debb00225d84" uname="vs3" type="normal"/>
> <node id="f6ed8bf2-eb64-4fa0-8bab-c7e990193876" uname="vs4" type="normal">
> <instance_attributes id="nodes-f6ed8bf2-eb64-4fa0-8bab-c7e990193876">
> <attributes>
> <nvpair id="standby-f6ed8bf2-eb64-4fa0-8bab-c7e990193876" name="standby" value="off"/>
> </attributes>
> </instance_attributes>
> </node>
> </nodes>
> <resources>
> <primitive class="ocf" type="IPaddr" provider="heartbeat" id="vs2vip" resource_stickiness="#default">
> <instance_attributes id="vs2vip_instance_attrs">
> <attributes>
> <nvpair name="target_role" id="vs2vip_target_role" value="started"/>
> <nvpair id="c7e3b680-d5a5-4fd9-be12-55b34e5ad71b" name="ip" value="142.160.197.59"/>
> <nvpair id="8d68ab51-3fe9-47ea-8945-4dd65a2558a4" name="nic" value="eth0"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" type="IPaddr" provider="heartbeat" id="vs1vip">
> <instance_attributes id="vs1vip_instance_attrs">
> <attributes>
> <nvpair name="target_role" id="vs1vip_target_role" value="started"/>
> <nvpair id="c41cd38b-dec5-49e2-8394-f487e50f77d3" name="ip" value="142.160.197.58"/>
> <nvpair id="c716fb30-af32-4b78-9af6-1536beac6469" name="nic" value="eth0"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive class="ocf" type="IPaddr" provider="heartbeat" id="vs3vip">
> <instance_attributes id="vs3vip_instance_attrs">
> <attributes>
> <nvpair name="target_role" id="vs3vip_target_role" value="started"/>
> <nvpair id="abca92c6-d079-49e0-a5b1-5c0473ff648a" name="ip" value="142.160.197.61"/>
> <nvpair id="93e30e85-e929-4e49-931b-2a1e1cc7389f" name="nic" value="eth0"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive id="vs4vip" class="ocf" type="IPaddr" provider="heartbeat">
> <instance_attributes id="vs4vip_instance_attrs">
> <attributes>
> <nvpair id="vs4vip_target_role" name="target_role" value="started"/>
> <nvpair id="26690918-cb87-429b-a5e0-439cc2100834" name="ip" value="142.160.197.62"/>
> <nvpair id="414a8489-e7d8-4e50-ad04-b3606b30c687" name="nic" value="eth0"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <clone id="evmscloneset" notify="true" globally_unique="false">
> <instance_attributes id="evmscloneset">
> <attributes>
> <nvpair id="evmscloneset-01" name="clone_node_max" value="1"/>
> </attributes>
> </instance_attributes>
> <primitive id="evmsclone" class="ocf" type="EvmsSCC" provider="heartbeat"/>
> </clone>
> </resources>
> <constraints>
> <rsc_location id="vip1" rsc="vs1vip">
> <rule id="prefered_vip1" score="100">
> <expression attribute="#uname" id="2be02610-4149-439b-8426-c4f0ab73b6f8" operation="eq" value="vs1"/>
> </rule>
> </rsc_location>
> <rsc_location id="vip2" rsc="vs2vip">
> <rule id="prefered_vip2" score="100">
> <expression attribute="#uname" id="4c2d30c0-6d35-4d61-b06a-c5906ff433a2" operation="eq" value="vs2"/>
> </rule>
> </rsc_location>
> <rsc_location id="vip3" rsc="vs3vip">
> <rule id="prefered_vip3" score="100">
> <expression attribute="#uname" id="2af01f2a-c221-4a4f-90b0-185cb6422dfd" operation="eq" value="vs3"/>
> </rule>
> </rsc_location>
> <rsc_location id="vip4" rsc="vs4vip">
> <rule id="prefered_vip4" score="100">
> <expression attribute="#uname" id="35353cb8-7b17-4ac8-924c-95d56b78e637" operation="eq" value="vs4"/>
> </rule>
> </rsc_location>
> </constraints>
> </configuration>
> </cib>
>
>
>
>
> Jan 29 13:46:13 vs4 crmd: [2849]: info: do_lrm_rsc_op: Performing op=evmsclone:2_start_0 key=21:938:40444442-4fd8-4af3-b7df-981c9a6fed63)
> Jan 29 13:46:13 vs4 lrmd: [2846]: info: RA output: (evmsclone:2:start:stdout) 2851
> Jan 29 13:46:13 vs4 EvmsSCC[3076]: [3080]: DEBUG: EvmsSCC: Start: starting node(s): vs4 .
> Jan 29 13:46:13 vs4 EvmsSCC[3076]: [3081]: DEBUG: EvmsSCC: Start_Notify: I am node vs4.
> Jan 29 13:46:13 vs4 EvmsSCC[3076]: [3085]: DEBUG: EvmsSCC: Start_Notify: First node in starting list is vs4.
> Jan 29 13:46:13 vs4 EvmsSCC[3076]: [3086]: DEBUG: EvmsSCC: Start_Notify: I am running evms_activate.
> Jan 29 13:46:14 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 2849, call:36): 0.114.4903 -> 0.114.4904 (ok)
> Jan 29 13:46:14 vs4 cib: [3093]: info: write_cib_contents: Wrote version 0.114.4904 of the CIB to disk (digest: 4b3533d4711344dcd552434fb7196ee5)
> Jan 29 13:46:16 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4931, call:286): 0.114.4904 -> 0.114.4905 (ok)
> Jan 29 13:46:16 vs4 cib: [3118]: info: write_cib_contents: Wrote version 0.114.4905 of the CIB to disk (digest: f3d39dd34623ebff64d917380b644791)
> Jan 29 13:46:33 vs4 lrmd: [2846]: WARN: on_op_timeout_expired: TIMEOUT: operation start[17] on ocf::EvmsSCC::evmsclone:2 for client 2849, its parameters: CRM_meta_notify_active_resource=[evmsclone:0 evmsclone:1 evmsclone:3 ] CRM_meta_notify_start_resource=[evmsclone:2 ] CRM_meta_notify_active_uname=[vs2 vs3 vs1 ] CRM_meta_ti.
> Jan 29 13:46:33 vs4 crmd: [2849]: ERROR: process_lrm_event: LRM operation evmsclone:2_start_0 (17) Timed Out (timeout=20000ms)
> Jan 29 13:46:33 vs4 crmd: [2849]: info: append_restart_list: Resource evmsclone:2 does not support reloads
> Jan 29 13:46:34 vs4 crmd: [2849]: info: do_lrm_rsc_op: Performing op=evmsclone:2_notify_0 key=43:939:40444442-4fd8-4af3-b7df-981c9a6fed63)
> Jan 29 13:46:34 vs4 crmd: [2849]: info: process_lrm_event: LRM operation evmsclone:2_notify_0 (call=18, rc=0) complete
> Jan 29 13:46:34 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 2849, call:37): 0.114.4905 -> 0.114.4906 (ok)
> Jan 29 13:46:34 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4930, call:1349): 0.114.4906 -> 0.114.4907 (ok)
> Jan 29 13:46:34 vs4 cib: [3126]: info: write_cib_contents: Wrote version 0.114.4907 of the CIB to disk (digest: beb04f6da8be4afbd6b82659bec972c8)
> Jan 29 13:46:35 vs4 crmd: [2849]: info: do_lrm_rsc_op: Performing op=evmsclone:2_stop_0 key=1:939:40444442-4fd8-4af3-b7df-981c9a6fed63)
> Jan 29 13:46:35 vs4 crmd: [2849]: info: process_lrm_event: LRM operation evmsclone:2_stop_0 (call=19, rc=0) complete
> Jan 29 13:46:35 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4676, call:82): 0.114.4907 -> 0.114.4908 (ok)
> Jan 29 13:46:35 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 2849, call:38): 0.114.4908 -> 0.114.4909 (ok)
> Jan 29 13:46:35 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4608, call:81): 0.114.4909 -> 0.114.4910 (ok)
> Jan 29 13:46:35 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4930, call:1350): 0.114.4910 -> 0.114.4911 (ok)
> Jan 29 13:46:35 vs4 cib: [3130]: info: write_cib_contents: Wrote version 0.114.4911 of the CIB to disk (digest: deaa7050cb5e42b386bf2f2e1e3c9d19)
> Jan 29 13:46:36 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 2849, call:39): 0.114.4911 -> 0.114.4912 (ok)
> Jan 29 13:46:36 vs4 cib: [3131]: info: write_cib_contents: Wrote version 0.114.4912 of the CIB to disk (digest: 5d6a95ee315d55c12e2167c439d117a9)
> Jan 29 13:46:37 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4676, call:83): 0.114.4912 -> 0.114.4913 (ok)
> Jan 29 13:46:37 vs4 cib: [3132]: info: write_cib_contents: Wrote version 0.114.4913 of the CIB to disk (digest: a5e99073be8aa9d506b85f989ef5e6d8)
> Jan 29 13:46:38 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4608, call:82): 0.114.4913 -> 0.114.4914 (ok)
> Jan 29 13:46:38 vs4 cib: [3133]: info: write_cib_contents: Wrote version 0.114.4914 of the CIB to disk (digest: fc2662f3af88a8f78fefbec8f41ae9a8)
> Jan 29 13:46:39 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4676, call:84): 0.114.4914 -> 0.114.4915 (ok)
> Jan 29 13:46:39 vs4 cib: [3134]: info: write_cib_contents: Wrote version 0.114.4915 of the CIB to disk (digest: de897537e15c320e37dadd3488e37fb4)
> Jan 29 13:46:40 vs4 crmd: [2849]: info: do_lrm_rsc_op: Performing op=evmsclone:3_start_0 key=21:939:40444442-4fd8-4af3-b7df-981c9a6fed63)
> Jan 29 13:46:40 vs4 lrmd: [2846]: info: RA output: (evmsclone:3:start:stdout) 2851
> Jan 29 13:46:40 vs4 EvmsSCC[3135]: [3139]: DEBUG: EvmsSCC: Start: starting node(s): vs1 vs4 .
> Jan 29 13:46:40 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4608, call:83): 0.114.4915 -> 0.114.4916 (ok)
> Jan 29 13:46:40 vs4 crmd: [2849]: info: process_lrm_event: LRM operation evmsclone:3_start_0 (call=20, rc=0) complete
> Jan 29 13:46:40 vs4 EvmsSCC[3135]: [3140]: DEBUG: EvmsSCC: Start_Notify: I am node vs4.
> Jan 29 13:46:40 vs4 EvmsSCC[3135]: [3144]: DEBUG: EvmsSCC: Start_Notify: First node in starting list is vs1.
> Jan 29 13:46:40 vs4 crmd: [2849]: info: append_restart_list: Resource evmsclone:3 does not support reloads
> Jan 29 13:46:40 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4930, call:1351): 0.114.4916 -> 0.114.4917 (ok)
> Jan 29 13:46:40 vs4 cib: [3145]: info: write_cib_contents: Wrote version 0.114.4917 of the CIB to disk (digest: 15daf39892a99345584557c6c679b88f)
> Jan 29 13:46:41 vs4 crmd: [2849]: info: do_lrm_rsc_op: Performing op=evmsclone:3_notify_0 key=44:939:40444442-4fd8-4af3-b7df-981c9a6fed63)
> Jan 29 13:46:41 vs4 crmd: [2849]: info: process_lrm_event: LRM operation evmsclone:3_notify_0 (call=21, rc=0) complete
> Jan 29 13:46:41 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 2849, call:40): 0.114.4917 -> 0.114.4918 (ok)
> Jan 29 13:46:41 vs4 cib: [2845]: info: cib_diff_notify: Update (client: 4930, call:1352): 0.114.4918 -> 0.114.4919 (ok)
>
> =====
>
> # evms_activate
> From node vs3: Engine: The EVMS Engine is currently in use by process
> 7983 (/sbin/evmsd_worker).
> The process has locked the Engine on behalf of node vs4.
> From node vs1: Engine: The EVMS Engine is currently in use by process
> 8448 (/sbin/evmsd_worker).
> The process has locked the Engine on behalf of node vs4.
>
>
>
> On Mon, 2007-01-29 at 10:30 +0100, Andrew Beekhof wrote:
> > On 1/26/07, John Lange <john.lange at open-it.ca> wrote:
> > > Just some additional information. After much digging I determined that
> > > the EvmsSCC script determines its active list from the variable
> > >
> > > OCF_RESKEY_CRM_meta_notify_active_uname
> > >
> > > This led me to add the parameter "meta_notify_active_uname"
> >
> > this is an automatically generated field (we add CRM_meta_ to such
> > things to keep them out of the regular namespace) so you shouldn't be
> > setting them yourself.
> >
> > > in the clone
> > > set and add value of the node there with the resulting resource looking
> > > like this:
> > >
> > > <clone id="evmscloneset">
> > > <instance_attributes id="evmscloneset_instance_attrs">
> > > <attributes>
> > > <nvpair id="evmscloneset_clone_max" name="clone_max" value="4"/>
> > > <nvpair id="evmscloneset_clone_node_max" name="clone_node_max" value="1"/>
> > > <nvpair name="target_role" id="evmscloneset_target_role" value="stopped"/>
> > > </attributes>
> > > </instance_attributes>
> > > <primitive class="ocf" type="EvmsSCC" provider="heartbeat" id="evms">
> > > <instance_attributes id="evms_instance_attrs">
> > > <attributes>
> > > <nvpair name="target_role" id="evms_target_role" value="started"/>
> > > </attributes>
> > > <attributes>
> > > <nvpair name="meta_notify_active_uname" id="evms_meta_notify_active_uname" value="vs4"/>
> > > </attributes>
> > > </instance_attributes>
> > > <operations>
> > > <op id="78df7ddc-5a7c-4262-a203-82b5ad7c995d" name="notify" interval="20s" timeout="120s"/>
> > > </operations>
> > > </primitive>
> > > </clone>
> > >
> > >
> > > Alas, the nodes still do not start giving the same error as before.
> > >
> > > I'm almost positive it's just a matter of setting the correct attribute
> > > values but I just can't seem to figure out their names.
> >
> > it's possibly a problem in the RA or in the CRM itself...
> > if you can send through your logs i'd be happy to help you get this up
> > and running
> >
> > >
> > > John
> > >
> > > On Fri, 2007-01-26 at 11:09 -0600, John Lange wrote:
> > > > I believe the reason why we can't start our EVMS nodes is because we are
> > > > missing some parameter for the resource.
> > > >
> > > > It shows the following error in the logs:
> > > >
> > > > ERROR: EvmsSCC: vs4 (local) not on active list!
> > > >
> > > > Can anyone shed some light on how to fix this issue?
> > > >
> > > > Regards,
> > > >
> > > > John
> > > >
> > > >
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > Linux-HA at lists.linux-ha.org
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems