[Linux-HA] do_lrm_invoke: Bad command and crm_failcount

Andrew Beekhof beekhof at gmail.com
Mon Oct 9 03:24:34 MDT 2006


On 10/9/06, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> On Sat, Oct 07, 2006 at 09:43:19AM +0200, Andrew Beekhof wrote:
> > On 10/6/06, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > >Hi,
> > >
> > >Did you resolve this issue?
> >
> > you didnt see the patch?
>
> no. can you pass a link?

ah, my bad - the patch email only went to Oren.

http://hg.linux-ha.org/dev?cs=d0f8d4c45eab

>
> > >Last night I had a very similar
> > >occurence:
> > >
> > >Oct  6 06:43:09 rbxtc01 crmd: [8070]: ERROR: get_lrm_resource:lrm.c
> > >Triggered non-fatal assert at lrm.c:783 : class != NULL
> > >Oct  6 06:43:09 n01 crmd: [8070]: ERROR: do_lrm_invoke:lrm.c Invalid
> > >resource definition
> > >Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op
> > >transition_key="crm_resource-5450">
> > >Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command
> > ><primitive id="gs" long-id="gs"/>
> > >Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command
> > ><attributes crm_feature_set="1.0.6"/>
> > >Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command
> > ></rsc_op>
> > >
> > >There was nothing else in the logs.
> > >
> > >crm_verify found the configuration good.
> > >
> > >The resource wouldn't start. It's a clone:
> > >
> > >       <clone id="gs" globally_unique="false">
> > >         <instance_attributes id="gs_inst_attr">
> > >           <attributes>
> > >             <nvpair id="gs_attr_1" name="clone_max" value="3"/>
> > >             <nvpair id="gs_attr_2" name="clone_node_max" value="1"/>
> > >           </attributes>
> > >         </instance_attributes>
> > >         <primitive class="ocf" id="Filesystem_1" provider="heartbeat"
> > >         type="Filesystem">
> > >           <instance_attributes id="Filesystem_1_inst_attr">
> > >             <attributes>
> > >               <nvpair id="Filesystem_1_attr_0" name="device"
> > >               value="a02:/gs"/>
> > >               <nvpair id="Filesystem_1_attr_1" name="directory"
> > >               value="/gs"/>
> > >               <nvpair id="Filesystem_1_attr_2" name="fstype" value="nfs"/>
> > >               <nvpair id="Filesystem_1_attr_3" name="options"
> > >               value="proto=udp" />
> > >             </attributes>
> > >           </instance_attributes>
> > >         </primitive>
> > >         <instance_attributes id="gs">
> > >           <attributes/>
> > >         </instance_attributes>
> > >       </clone>
> > >
> > >After restarting heartbeat everything was back to normal.
> > >
> > >Cheers,
> > >
> > >Dejan
> > >
> > >On Mon, Sep 18, 2006 at 04:59:11PM +0300, Oren Nechushtan wrote:
> > >> We get the next warning/errors
> > >> ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 :
> > >class != NULL
> > >> WARN: do_lrm_invoke: Bad command
> > >>
> > >> The cib.xml seems OK.
> > >> Here is a caption of the ha-log,ha.cf and cib.xml
> > >>
> > >> This started out of the blue and continues every ~10 minutes.
> > >>
> > >> Oren
> > >>
> > >> P.S.
> > >> Special node: we use a crontab script that is called periodically (every
> > >10m!)
> > >>  crm_resource -C -r ..
> > >>  crm_failcount -D -r ..
> > >> Without this, the system breaks down after reboots/long test periods; so
> > >I guess the script got called at the wrong moment..
> > >> --------------------------
> > >>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <cib
> > >num_updates="1705">
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   <status>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     <node_state
> > >id="4f1373b2-98ba-46fb-bc3f-7
> > >> a800b9dd9a9">
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       <lrm
> > >id="4f1373b2-98ba-46fb-bc3f-7a800b
> > >> 9dd9a9">
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +
> > ><lrm_resources>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +
> > ><lrm_resource id="IPaddr_private_sh
> > >> ared1">
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +
> > ><lrm_rsc_op transition_magic="4:7
> > >> ;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843" rc_code="7" op_status="4"
> > >id="IPaddr_private_shared1_moni
> > >> tor_15000"/>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +
> > ></lrm_resource>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +
> > ></lrm_resources>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       </lrm>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     </node_state>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   </status>
> > >> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </cib>
> > >> cib[17260]: 2006/09/17_15:50:00 info: write_cib_contents:io.c Wrote
> > >version 0.6.1705 of the CIB to
> > >> disk (digest: d109daaa481346a3a72bf71b62bc2527)
> > >> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:
> > >[generation] <cib admin_epoch="0" hav
> > >> e_quorum="true" generated="true" ccm_transition="6" num_peers="2"
> > >cib_feature_revision="1.3" dc_uui
> > >> d="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9" epoch="6" num_updates="1705"/>
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default
> > >stickiness: 15
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default
> > >failure stickiness: -10
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH
> > >of failed nodes is disable
> > >> d
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH
> > >will reboot nodes
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Cluster
> > >is symmetric - resources c
> > >> an run anywhere by default
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c On loss
> > >of CCM Quorum: Stop ALL re
> > >> sources
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan
> > >resources are stopped
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan
> > >resource actions are stoppe
> > >> d
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Stopped
> > >resources are removed from
> > >>  the status section: false
> > >> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c By
> > >default resources are managed
> > >> pengine[30505]: 2006/09/17_15:50:00 info:
> > >determine_online_status:unpack.c Node mgmnt-ha-1 is online
> > >> pengine[30505]: 2006/09/17_15:50:00 WARN: unpack_rsc_op:unpack.c
> > >Processing failed op (IPaddr_priva
> > >> te_shared1_monitor_15000) for IPaddr_private_shared1 on mgmnt-ha-1
> > >> pengine[30505]: 2006/09/17_15:50:00 info:
> > >determine_online_status:unpack.c Node mgmnt-ha-2 is onlin
> > >> e
> > >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared1
> > >(heartbeat::ocf:IPaddr3):
> > >>         Started mgmnt-ha-1 FAILED
> > >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared2
> > >(heartbeat::ocf:IPaddr3):
> > >>         Started mgmnt-ha-2
> > >> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_cluster_mg
> > >(heartbeat::ocf:IPaddr3):       Sta
> > >> rted mgmnt-ha-1
> > >> pengine[30505]: 2006/09/17_15:50:00 info: HA_CounterACT
> > >(lsb:HA_CounterACT):    Started mgmnt-ha-1
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c
> > >Recover resource IPaddr_private_s
> > >> hared1  (mgmnt-ha-1)
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: Recurring:native.c
> > >mgmnt-ha-1          IPaddr_private_s
> > >> hared1_monitor_15000
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave
> > >resource IPaddr_private_sha
> > >> red2    (mgmnt-ha-2)
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave
> > >resource IPaddr_cluster_mg
> > >>         (mgmnt-ha-1)
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave
> > >resource HA_CounterACT  (mg
> > >> mnt-ha-1)
> > >> pengine[30505]: 2006/09/17_15:50:00 notice: stage8:allocate.c Created
> > >transition graph 27.
> > >> pengine[30505]: 2006/09/17_15:50:00 WARN: process_pe_message:pengine.c
> > >No value specified for clust
> > >> er preference: pe-input-series-max
> > >> crmd[1932]: 2006/09/17_15:50:00 info: do_state_transition:fsa.c
> > >mgmnt-ha-1: State transition S_POLI
> > >> CY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> > >cause=C_IPC_MESSAGE origin=route_message ]
> > >> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:pengine.c
> > >Transition 27: PEngine Input
> > >>  stored in: /var/lib/heartbeat/pengine/pe-input-167.bz2
> > >> tengine[30504]: 2006/09/17_15:50:00 info: unpack_graph:unpack.c Unpacked
> > >transition 27: 3 actions in 3 synapses
> > >> tengine[30504]: 2006/09/17_15:50:00 info: send_rsc_command:actions.c
> > >Initiating action 4: IPaddr_pr
> > >> ivate_shared1_stop_0 on mgmnt-ha-1
> > >> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_rsc_op:lrm.c Performing op
> > >stop on IPaddr_private_shar
> > >> ed1 (interval=0ms, key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843)
> > >> crmd[1932]: 2006/09/17_15:50:00 WARN: process_lrm_event:lrm.c LRM
> > >operation (115) monitor_15000 on
> > >> IPaddr_private_shared1 Cancelled
> > >> ccm[1927]: 2006/09/17_15:50:00 info: client (pid=17274) removed from ccm
> > >> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c Removing
> > >resource IPaddr_private_shared1
> > >> from the LRM
> > >> crmd[1932]: 2006/09/17_15:50:00 info: send_direct_ack:lrm.c ACK'ing
> > >resource op: delete for IPaddr_
> > >> private_shared1
> > >> crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c Triggered
> > >non-fatal assert at lrm.c:7
> > >> 83 : class != NULL
> > >> crmd[1932]: 2006/09/17_15:50:01 ERROR: do_lrm_invoke:lrm.c Invalid
> > >resource definition
> > >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <rsc_op
> > >transition_key="crm_resour
> > >> ce-4088">
> > >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command
> > ><primitive id="IPaddr_private_sh
> > >> ared1" long-id="IPaddr_private_shared1"/>
> > >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command
> > ><attributes crm_feature_set="1.0
> > >> .6"/>
> > >> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command
> > ></rsc_op>
> > >> cib[1928]: 2006/09/17_15:50:02 info: cib_diff_notify:notify.c Update
> > >(client: 1932, call:20): 0.6.1
> > >> 705 -> 0.6.1706 (ok)
> > >> cibmon[1934]: 2006/09/17_15:50:02 info: cibmon_diff:cibmon.c
> > >[cib_diff_notify] cib_update confirmed
> > >> tengine[30504]: 2006/09/17_15:50:02 info: te_update_diff:callbacks.c
> > >Processing diff (cib_update):
> > >> 0.6.1705 -> 0.6.1706
> > >> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: --- 0.6.1705
> > >> tengine[30504]: 2006/09/17_15:50:02 WARN: process_graph_event:events.c
> > >Event not found.
> > >> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: +++ 0.6.1706
> > >>
> > >> ********************************************************************
> > >>
> > >>       <primitive class="ocf" id="IPaddr_private_shared1"
> > >provider="heartbeat" t
> > >> ype="IPaddr3">
> > >>          <operations>
> > >>            <op id="IPaddr_private_shared1_mon" interval="15s"
> > >name="monitor" tim
> > >> eout="15s"/>
> > >>          </operations>
> > >>          <instance_attributes id="IPaddr_private_shared1_inst_attr">
> > >>            <attributes>
> > >>              <nvpair id="IPaddr_private_shared1_attr_0" name="ip"
> > >value="172.17.
> > >> 2.201"/>
> > >>              <nvpair id="IPaddr_private_shared1_attr_1" name="netmask"
> > >value="28
> > >> "/>
> > >>              <nvpair id="IPaddr_private_shared1_attr_2" name="nic"
> > >value="eth1
> > >> fallback eth1"/>
> > >>              <nvpair id="IPaddr_private_shared1_attr_3"
> > >name="link_status_node"
> > >> value="haha-2"/>
> > >>            </attributes>
> > >>          </instance_attributes>
> > >>        </primitive>
> > >>
> > >> **********************************
> > >>
> > >> #debug 10 #enabling this slows down the interactions with cib, et. al.
> > >> #debug 1 #enabling this slows down the interactions with cib, et. al.
> > >> #debugfile /var/log/ha-debug
> > >> #logfile /var/log/ha-log
> > >> #logfacility local7
> > >> use_logd yes
> > >> node haha-1
> > >> node haha-2
> > >> udpport 694
> > >> ucast eth0 10.0.4.181 #e.g. real eth0 address on host 1
> > >> ucast eth0 10.0.4.182 #e.g. real eth0 address on host 2
> > >> ucast eth1 172.17.2.171 #e.g. real eth7 address on host 1
> > >> ucast eth1 172.17.2.172 #e.g. real eth7 address on host 2
> > >> #bcast eth1
> > >> serial /dev/ttyS0
> > >> baud 19200
> > >> auto_failback off
> > >> autojoin none
> > >> luster
> > >> keepalive 1
> > >> deadtime 60
> > >> ping_group routers 10.0.4.1 10.0.4.253
> > >> deadping 60
> > >> warntime 30
> > >> compression    bz2
> > >> compression_threshold 2
> > >> traditional_compression false
> > >> coredumps true
> > >> initdead 60
> > >> msgfmt netstring
> > >> watchdog /dev/watchdog
> > >> max_rexmit_delay        250     #       set the maximum rexmit delay time
> > >> CPU
> > >> hbgenmethod time                 #       Workaround against HB rexmit
> > >generation
> > >>  spoofing errors
> > >> crm yes
> > >> respawn hacluster       /usr/lib/heartbeat/cibmon -d
> > >> respawn root            /usr/lib/heartbeat/pingd -m 1000 -d 5s -a
> > >default_ping_s
> > >> et
> > >> _______________________________________________
> > >> Linux-HA mailing list
> > >> Linux-HA at lists.linux-ha.org
> > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > >> See also: http://linux-ha.org/ReportingProblems
> > >
>


More information about the Linux-HA mailing list