[Linux-HA] do_lrm_invoke: Bad command and crm_failcount
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Oct 6 07:00:08 MDT 2006
Hi,
Did you resolve this issue? Last night I had a very similar
occurence:
Oct 6 06:43:09 rbxtc01 crmd: [8070]: ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
Oct 6 06:43:09 n01 crmd: [8070]: ERROR: do_lrm_invoke:lrm.c Invalid resource definition
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-5450">
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <primitive id="gs" long-id="gs"/>
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.6"/>
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command </rsc_op>
There was nothing else in the logs.
crm_verify found the configuration good.
The resource wouldn't start. It's a clone:
<clone id="gs" globally_unique="false">
<instance_attributes id="gs_inst_attr">
<attributes>
<nvpair id="gs_attr_1" name="clone_max" value="3"/>
<nvpair id="gs_attr_2" name="clone_node_max" value="1"/>
</attributes>
</instance_attributes>
<primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
<instance_attributes id="Filesystem_1_inst_attr">
<attributes>
<nvpair id="Filesystem_1_attr_0" name="device" value="a02:/gs"/>
<nvpair id="Filesystem_1_attr_1" name="directory" value="/gs"/>
<nvpair id="Filesystem_1_attr_2" name="fstype" value="nfs"/>
<nvpair id="Filesystem_1_attr_3" name="options" value="proto=udp" />
</attributes>
</instance_attributes>
</primitive>
<instance_attributes id="gs">
<attributes/>
</instance_attributes>
</clone>
After restarting heartbeat everything was back to normal.
Cheers,
Dejan
On Mon, Sep 18, 2006 at 04:59:11PM +0300, Oren Nechushtan wrote:
> We get the next warning/errors
> ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> WARN: do_lrm_invoke: Bad command
>
> The cib.xml seems OK.
> Here is a caption of the ha-log,ha.cf and cib.xml
>
> This started out of the blue and continues every ~10 minutes.
>
> Oren
>
> P.S.
> Special node: we use a crontab script that is called periodically (every 10m!)
> crm_resource -C -r ..
> crm_failcount -D -r ..
> Without this, the system breaks down after reboots/long test periods; so I guess the script got called at the wrong moment..
> --------------------------
>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <cib num_updates="1705">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <status>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <node_state id="4f1373b2-98ba-46fb-bc3f-7
> a800b9dd9a9">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm id="4f1373b2-98ba-46fb-bc3f-7a800b
> 9dd9a9">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_resources>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_resource id="IPaddr_private_sh
> ared1">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_rsc_op transition_magic="4:7
> ;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843" rc_code="7" op_status="4" id="IPaddr_private_shared1_moni
> tor_15000"/>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm_resource>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm_resources>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </node_state>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </status>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </cib>
> cib[17260]: 2006/09/17_15:50:00 info: write_cib_contents:io.c Wrote version 0.6.1705 of the CIB to
> disk (digest: d109daaa481346a3a72bf71b62bc2527)
> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message: [generation] <cib admin_epoch="0" hav
> e_quorum="true" generated="true" ccm_transition="6" num_peers="2" cib_feature_revision="1.3" dc_uui
> d="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9" epoch="6" num_updates="1705"/>
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default stickiness: 15
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default failure stickiness: -10
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH of failed nodes is disable
> d
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH will reboot nodes
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Cluster is symmetric - resources c
> an run anywhere by default
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL re
> sources
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resources are stopped
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resource actions are stoppe
> d
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Stopped resources are removed from
> the status section: false
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c By default resources are managed
> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-1 is online
> pengine[30505]: 2006/09/17_15:50:00 WARN: unpack_rsc_op:unpack.c Processing failed op (IPaddr_priva
> te_shared1_monitor_15000) for IPaddr_private_shared1 on mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-2 is onlin
> e
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared1 (heartbeat::ocf:IPaddr3):
> Started mgmnt-ha-1 FAILED
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared2 (heartbeat::ocf:IPaddr3):
> Started mgmnt-ha-2
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_cluster_mg (heartbeat::ocf:IPaddr3): Sta
> rted mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 info: HA_CounterACT (lsb:HA_CounterACT): Started mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Recover resource IPaddr_private_s
> hared1 (mgmnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: Recurring:native.c mgmnt-ha-1 IPaddr_private_s
> hared1_monitor_15000
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_private_sha
> red2 (mgmnt-ha-2)
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_cluster_mg
> (mgmnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource HA_CounterACT (mg
> mnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: stage8:allocate.c Created transition graph 27.
> pengine[30505]: 2006/09/17_15:50:00 WARN: process_pe_message:pengine.c No value specified for clust
> er preference: pe-input-series-max
> crmd[1932]: 2006/09/17_15:50:00 info: do_state_transition:fsa.c mgmnt-ha-1: State transition S_POLI
> CY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:pengine.c Transition 27: PEngine Input
> stored in: /var/lib/heartbeat/pengine/pe-input-167.bz2
> tengine[30504]: 2006/09/17_15:50:00 info: unpack_graph:unpack.c Unpacked transition 27: 3 actions in 3 synapses
> tengine[30504]: 2006/09/17_15:50:00 info: send_rsc_command:actions.c Initiating action 4: IPaddr_pr
> ivate_shared1_stop_0 on mgmnt-ha-1
> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_private_shar
> ed1 (interval=0ms, key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843)
> crmd[1932]: 2006/09/17_15:50:00 WARN: process_lrm_event:lrm.c LRM operation (115) monitor_15000 on
> IPaddr_private_shared1 Cancelled
> ccm[1927]: 2006/09/17_15:50:00 info: client (pid=17274) removed from ccm
> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c Removing resource IPaddr_private_shared1
> from the LRM
> crmd[1932]: 2006/09/17_15:50:00 info: send_direct_ack:lrm.c ACK'ing resource op: delete for IPaddr_
> private_shared1
> crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:7
> 83 : class != NULL
> crmd[1932]: 2006/09/17_15:50:01 ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resour
> ce-4088">
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <primitive id="IPaddr_private_sh
> ared1" long-id="IPaddr_private_shared1"/>
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0
> .6"/>
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command </rsc_op>
> cib[1928]: 2006/09/17_15:50:02 info: cib_diff_notify:notify.c Update (client: 1932, call:20): 0.6.1
> 705 -> 0.6.1706 (ok)
> cibmon[1934]: 2006/09/17_15:50:02 info: cibmon_diff:cibmon.c [cib_diff_notify] cib_update confirmed
> tengine[30504]: 2006/09/17_15:50:02 info: te_update_diff:callbacks.c Processing diff (cib_update):
> 0.6.1705 -> 0.6.1706
> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: --- 0.6.1705
> tengine[30504]: 2006/09/17_15:50:02 WARN: process_graph_event:events.c Event not found.
> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: +++ 0.6.1706
>
> ********************************************************************
>
> <primitive class="ocf" id="IPaddr_private_shared1" provider="heartbeat" t
> ype="IPaddr3">
> <operations>
> <op id="IPaddr_private_shared1_mon" interval="15s" name="monitor" tim
> eout="15s"/>
> </operations>
> <instance_attributes id="IPaddr_private_shared1_inst_attr">
> <attributes>
> <nvpair id="IPaddr_private_shared1_attr_0" name="ip" value="172.17.
> 2.201"/>
> <nvpair id="IPaddr_private_shared1_attr_1" name="netmask" value="28
> "/>
> <nvpair id="IPaddr_private_shared1_attr_2" name="nic" value="eth1
> fallback eth1"/>
> <nvpair id="IPaddr_private_shared1_attr_3" name="link_status_node"
> value="haha-2"/>
> </attributes>
> </instance_attributes>
> </primitive>
>
> **********************************
>
> #debug 10 #enabling this slows down the interactions with cib, et. al.
> #debug 1 #enabling this slows down the interactions with cib, et. al.
> #debugfile /var/log/ha-debug
> #logfile /var/log/ha-log
> #logfacility local7
> use_logd yes
> node haha-1
> node haha-2
> udpport 694
> ucast eth0 10.0.4.181 #e.g. real eth0 address on host 1
> ucast eth0 10.0.4.182 #e.g. real eth0 address on host 2
> ucast eth1 172.17.2.171 #e.g. real eth7 address on host 1
> ucast eth1 172.17.2.172 #e.g. real eth7 address on host 2
> #bcast eth1
> serial /dev/ttyS0
> baud 19200
> auto_failback off
> autojoin none
> luster
> keepalive 1
> deadtime 60
> ping_group routers 10.0.4.1 10.0.4.253
> deadping 60
> warntime 30
> compression bz2
> compression_threshold 2
> traditional_compression false
> coredumps true
> initdead 60
> msgfmt netstring
> watchdog /dev/watchdog
> max_rexmit_delay 250 # set the maximum rexmit delay time
> CPU
> hbgenmethod time # Workaround against HB rexmit generation
> spoofing errors
> crm yes
> respawn hacluster /usr/lib/heartbeat/cibmon -d
> respawn root /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_s
> et
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
More information about the Linux-HA
mailing list