[Linux-HA] do_lrm_invoke: Bad command and crm_failcount
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Oct 9 03:05:52 MDT 2006
On Sat, Oct 07, 2006 at 09:43:19AM +0200, Andrew Beekhof wrote:
> On 10/6/06, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> >Hi,
> >
> >Did you resolve this issue?
>
> you didn't see the patch?
No. Can you pass a link?
> >Last night I had a very similar
> >occurrence:
> >
> >Oct 6 06:43:09 rbxtc01 crmd: [8070]: ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> >Oct 6 06:43:09 n01 crmd: [8070]: ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> >Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-5450">
> >Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <primitive id="gs" long-id="gs"/>
> >Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.6"/>
> >Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command </rsc_op>
> >
> >There was nothing else in the logs.
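The "Bad command" warnings above are one XML message logged a line at a time. A minimal sketch, assuming the fragments belong together in the order logged, that stitches them back into one element and lists which attributes the primitive is missing. The helper names are made up for illustration, and the link between the missing "class" attribute and the "class != NULL" assert is my reading of the log, not something confirmed in the source.

```python
import xml.etree.ElementTree as ET

# Log lines copied from the report above, with mail wrapping undone.
LOG = """\
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-5450">
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <primitive id="gs" long-id="gs"/>
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.6"/>
Oct 6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command </rsc_op>
"""

MARKER = "do_lrm_invoke: Bad command "


def rebuild_bad_command(log_text):
    """Join the XML fragments that follow the 'Bad command' marker and parse them."""
    parts = []
    for line in log_text.splitlines():
        pos = line.find(MARKER)
        if pos != -1:
            parts.append(line[pos + len(MARKER):].strip())
    return ET.fromstring("".join(parts))


def missing_attrs(rsc_op, required=("class",)):
    """Return the required attributes that the <primitive> child does not carry.

    The default 'required' tuple is an assumption based on the assert text.
    """
    prim = rsc_op.find("primitive")
    return [name for name in required if prim is None or prim.get(name) is None]


rsc_op = rebuild_bad_command(LOG)
print(rsc_op.get("transition_key"))              # key of the request as logged
print(sorted(rsc_op.find("primitive").attrib))   # attributes actually present
print(missing_attrs(rsc_op))                     # attributes the request lacks
```

The primitive as logged carries only id and long-id, which would at least be consistent with the assert firing on a NULL class.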
> >
> >crm_verify found the configuration good.
> >
> >The resource wouldn't start. It's a clone:
> >
> > <clone id="gs" globally_unique="false">
> >   <instance_attributes id="gs_inst_attr">
> >     <attributes>
> >       <nvpair id="gs_attr_1" name="clone_max" value="3"/>
> >       <nvpair id="gs_attr_2" name="clone_node_max" value="1"/>
> >     </attributes>
> >   </instance_attributes>
> >   <primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
> >     <instance_attributes id="Filesystem_1_inst_attr">
> >       <attributes>
> >         <nvpair id="Filesystem_1_attr_0" name="device" value="a02:/gs"/>
> >         <nvpair id="Filesystem_1_attr_1" name="directory" value="/gs"/>
> >         <nvpair id="Filesystem_1_attr_2" name="fstype" value="nfs"/>
> >         <nvpair id="Filesystem_1_attr_3" name="options" value="proto=udp" />
> >       </attributes>
> >     </instance_attributes>
> >   </primitive>
> >   <instance_attributes id="gs">
> >     <attributes/>
> >   </instance_attributes>
> > </clone>
> >
> >After restarting heartbeat everything was back to normal.
> >
> >Cheers,
> >
> >Dejan
> >
> >On Mon, Sep 18, 2006 at 04:59:11PM +0300, Oren Nechushtan wrote:
> >> We get the following warnings/errors:
> >> ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> >> WARN: do_lrm_invoke: Bad command
> >>
> >> The cib.xml seems OK.
> >> Here is an excerpt of the ha-log, ha.cf and cib.xml
> >>
> >> This started out of the blue and continues every ~10 minutes.
> >>
> >> Oren
> >>
> >> P.S.
> >> Special note: we use a crontab script that runs periodically (every 10 minutes!):
> >> crm_resource -C -r ..
> >> crm_failcount -D -r ..
> >> Without this, the system breaks down after reboots/long test periods, so
> >> I guess the script got called at the wrong moment.
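One way to check the guess that the 10-minute cron job is involved is to measure the spacing between the asserts in the ha-log. A rough sketch, assuming the "daemon[pid]: YYYY/MM/DD_HH:MM:SS level: ..." line layout seen in this report. The function names are hypothetical, and the only input shown is the single assert line quoted in this thread.

```python
from datetime import datetime

ASSERT_TEXT = "Triggered non-fatal assert"


def assert_times(lines):
    """Return the timestamps of all lines that report the non-fatal assert."""
    times = []
    for line in lines:
        if ASSERT_TEXT not in line:
            continue
        # Second whitespace-separated field is the timestamp in this log layout.
        stamp = line.split()[1]
        times.append(datetime.strptime(stamp, "%Y/%m/%d_%H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the gaps between consecutive timestamps, in minutes."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# The one assert line quoted in this thread, with mail wrapping undone.
line = ("crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c "
        "Triggered non-fatal assert at lrm.c:783 : class != NULL")
print(assert_times([line]))
```

Feeding it the lines of the full ha-log and passing the result to gaps_in_minutes() should show gaps close to 10 if the cron job is what triggers the assert.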
> >> --------------------------
> >>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <cib num_updates="1705">
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <status>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <node_state id="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9">
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm id="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9">
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_resources>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_resource id="IPaddr_private_shared1">
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <lrm_rsc_op transition_magic="4:7;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843" rc_code="7" op_status="4" id="IPaddr_private_shared1_monitor_15000"/>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm_resource>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm_resources>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </lrm>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </node_state>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </status>
> >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </cib>
> >> cib[17260]: 2006/09/17_15:50:00 info: write_cib_contents:io.c Wrote version 0.6.1705 of the CIB to disk (digest: d109daaa481346a3a72bf71b62bc2527)
> >> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message: [generation] <cib admin_epoch="0" have_quorum="true" generated="true" ccm_transition="6" num_peers="2" cib_feature_revision="1.3" dc_uuid="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9" epoch="6" num_updates="1705"/>
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default stickiness: 15
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default failure stickiness: -10
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH of failed nodes is disabled
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH will reboot nodes
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resources are stopped
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resource actions are stopped
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Stopped resources are removed from the status section: false
> >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c By default resources are managed
> >> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-1 is online
> >> pengine[30505]: 2006/09/17_15:50:00 WARN: unpack_rsc_op:unpack.c Processing failed op (IPaddr_private_shared1_monitor_15000) for IPaddr_private_shared1 on mgmnt-ha-1
> >> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-2 is online
> >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared1 (heartbeat::ocf:IPaddr3): Started mgmnt-ha-1 FAILED
> >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared2 (heartbeat::ocf:IPaddr3): Started mgmnt-ha-2
> >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_cluster_mg (heartbeat::ocf:IPaddr3): Started mgmnt-ha-1
> >> pengine[30505]: 2006/09/17_15:50:00 info: HA_CounterACT (lsb:HA_CounterACT): Started mgmnt-ha-1
> >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Recover resource IPaddr_private_shared1 (mgmnt-ha-1)
> >> pengine[30505]: 2006/09/17_15:50:00 notice: Recurring:native.c mgmnt-ha-1 IPaddr_private_shared1_monitor_15000
> >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_private_shared2 (mgmnt-ha-2)
> >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_cluster_mg (mgmnt-ha-1)
> >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource HA_CounterACT (mgmnt-ha-1)
> >> pengine[30505]: 2006/09/17_15:50:00 notice: stage8:allocate.c Created transition graph 27.
> >> pengine[30505]: 2006/09/17_15:50:00 WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max
> >> crmd[1932]: 2006/09/17_15:50:00 info: do_state_transition:fsa.c mgmnt-ha-1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> >> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:pengine.c Transition 27: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-167.bz2
> >> tengine[30504]: 2006/09/17_15:50:00 info: unpack_graph:unpack.c Unpacked transition 27: 3 actions in 3 synapses
> >> tengine[30504]: 2006/09/17_15:50:00 info: send_rsc_command:actions.c Initiating action 4: IPaddr_private_shared1_stop_0 on mgmnt-ha-1
> >> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_private_shared1 (interval=0ms, key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843)
> >> crmd[1932]: 2006/09/17_15:50:00 WARN: process_lrm_event:lrm.c LRM operation (115) monitor_15000 on IPaddr_private_shared1 Cancelled
> >> ccm[1927]: 2006/09/17_15:50:00 info: client (pid=17274) removed from ccm
> >> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c Removing resource IPaddr_private_shared1 from the LRM
> >> crmd[1932]: 2006/09/17_15:50:00 info: send_direct_ack:lrm.c ACK'ing resource op: delete for IPaddr_private_shared1
> >> crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> >> crmd[1932]: 2006/09/17_15:50:01 ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-4088">
> >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <primitive id="IPaddr_private_shared1" long-id="IPaddr_private_shared1"/>
> >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.6"/>
> >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command </rsc_op>
> >> cib[1928]: 2006/09/17_15:50:02 info: cib_diff_notify:notify.c Update (client: 1932, call:20): 0.6.1705 -> 0.6.1706 (ok)
> >> cibmon[1934]: 2006/09/17_15:50:02 info: cibmon_diff:cibmon.c [cib_diff_notify] cib_update confirmed
> >> tengine[30504]: 2006/09/17_15:50:02 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.6.1705 -> 0.6.1706
> >> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: --- 0.6.1705
> >> tengine[30504]: 2006/09/17_15:50:02 WARN: process_graph_event:events.c Event not found.
> >> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: +++ 0.6.1706
> >>
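In the log above the resource is removed from the LRM at 15:50:00 and the "Bad command" for the same resource id follows at 15:50:01. A small sketch that looks for that ordering in a log, assuming the two message formats stay as quoted here. The function name and regular expressions are made up for illustration, and whether the removal actually causes the assert is only a guess.

```python
import re

# Three lines copied from the log above, with mail wrapping undone.
LOG = [
    "crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c "
    "Removing resource IPaddr_private_shared1 from the LRM",
    "crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c "
    "Triggered non-fatal assert at lrm.c:783 : class != NULL",
    "crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command "
    '<primitive id="IPaddr_private_shared1" long-id="IPaddr_private_shared1"/>',
]

REMOVED = re.compile(r"Removing resource (\S+) from the LRM")
BAD_PRIMITIVE = re.compile(r'Bad command <primitive id="([^"]+)"')


def removed_then_rejected(lines):
    """Return ids of resources that were removed from the LRM and later
    show up as the primitive of a 'Bad command' warning."""
    removed = set()
    hits = []
    for line in lines:
        match = REMOVED.search(line)
        if match:
            removed.add(match.group(1))
            continue
        match = BAD_PRIMITIVE.search(line)
        if match and match.group(1) in removed:
            hits.append(match.group(1))
    return hits


print(removed_then_rejected(LOG))
```

If every assert in a longer log is preceded by such a removal, that would support the idea that the cleanup from the cron job races with the transition.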
> >> ********************************************************************
> >>
> >> <primitive class="ocf" id="IPaddr_private_shared1" provider="heartbeat" type="IPaddr3">
> >>   <operations>
> >>     <op id="IPaddr_private_shared1_mon" interval="15s" name="monitor" timeout="15s"/>
> >>   </operations>
> >>   <instance_attributes id="IPaddr_private_shared1_inst_attr">
> >>     <attributes>
> >>       <nvpair id="IPaddr_private_shared1_attr_0" name="ip" value="172.17.2.201"/>
> >>       <nvpair id="IPaddr_private_shared1_attr_1" name="netmask" value="28"/>
> >>       <nvpair id="IPaddr_private_shared1_attr_2" name="nic" value="eth1 fallback eth1"/>
> >>       <nvpair id="IPaddr_private_shared1_attr_3" name="link_status_node" value="haha-2"/>
> >>     </attributes>
> >>   </instance_attributes>
> >> </primitive>
> >>
> >> **********************************
> >>
> >> #debug 10 #enabling this slows down the interactions with cib, et. al.
> >> #debug 1 #enabling this slows down the interactions with cib, et. al.
> >> #debugfile /var/log/ha-debug
> >> #logfile /var/log/ha-log
> >> #logfacility local7
> >> use_logd yes
> >> node haha-1
> >> node haha-2
> >> udpport 694
> >> ucast eth0 10.0.4.181 #e.g. real eth0 address on host 1
> >> ucast eth0 10.0.4.182 #e.g. real eth0 address on host 2
> >> ucast eth1 172.17.2.171 #e.g. real eth7 address on host 1
> >> ucast eth1 172.17.2.172 #e.g. real eth7 address on host 2
> >> #bcast eth1
> >> serial /dev/ttyS0
> >> baud 19200
> >> auto_failback off
> >> autojoin none
> >> luster
> >> keepalive 1
> >> deadtime 60
> >> ping_group routers 10.0.4.1 10.0.4.253
> >> deadping 60
> >> warntime 30
> >> compression bz2
> >> compression_threshold 2
> >> traditional_compression false
> >> coredumps true
> >> initdead 60
> >> msgfmt netstring
> >> watchdog /dev/watchdog
> >> max_rexmit_delay 250 # set the maximum rexmit delay time
> >> CPU
> >> hbgenmethod time # Workaround against HB rexmit generation spoofing errors
> >> crm yes
> >> respawn hacluster /usr/lib/heartbeat/cibmon -d
> >> respawn root /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_set