[Linux-HA] do_lrm_invoke: Bad command and crm_failcount

Andrew Beekhof beekhof at gmail.com
Sat Oct 7 01:43:19 MDT 2006


On 10/6/06, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> Did you resolve this issue?

You didn't see the patch?

> Last night I had a very similar
> occurrence:
>
> Oct  6 06:43:09 rbxtc01 crmd: [8070]: ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> Oct  6 06:43:09 n01 crmd: [8070]: ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-5450">
> Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command   <primitive id="gs" long-id="gs"/>
> Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command   <attributes crm_feature_set="1.0.6"/>
> Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command </rsc_op>
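A simplified model of why this request is rejected: the primitive inside the rsc_op carries only id and long-id, with no class. Assuming crmd can only fall back on those attributes when the resource is not already registered under that id with the local LRM (which fits the September log below, where the rejected command arrives right after "Removing resource IPaddr_private_shared1 from the LRM"), the class != NULL assert has nothing to check. The function check_rsc_op and the known_to_lrm set are hypothetical; this is a sketch, not the code in lrm.c.

```python
import xml.etree.ElementTree as ET

# The command crmd rejected, copied from the do_lrm_invoke warnings above.
BAD_RSC_OP = """
<rsc_op transition_key="crm_resource-5450">
  <primitive id="gs" long-id="gs"/>
  <attributes crm_feature_set="1.0.6"/>
</rsc_op>
"""


def check_rsc_op(xml_text, known_to_lrm):
    """Hypothetical, simplified model of the lookup that asserts class != NULL.

    A resource already registered with the LRM is found by id alone.
    Otherwise the request must carry the resource class (e.g. "ocf")
    so it can be registered; without it the request is rejected.
    """
    op = ET.fromstring(xml_text.strip())
    prim = op.find("primitive")
    if prim is None:
        return "Invalid resource definition: no primitive"
    rsc_id = prim.get("id")
    if rsc_id in known_to_lrm:
        return "ok: %s is already known to the LRM" % rsc_id
    if prim.get("class") is None:
        return "Invalid resource definition: %s has no class" % rsc_id
    return "ok: %s can be registered" % rsc_id


# Assumption: "gs" is not registered with the LRM under that id at this moment.
print(check_rsc_op(BAD_RSC_OP, known_to_lrm=set()))
```

Run as-is it prints "Invalid resource definition: gs has no class"; passing {"gs"} as known_to_lrm takes the lookup-by-id path instead.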
>
> There was nothing else in the logs.
>
> crm_verify found the configuration good.
>
> The resource wouldn't start. It's a clone:
>
>        <clone id="gs" globally_unique="false">
>          <instance_attributes id="gs_inst_attr">
>            <attributes>
>              <nvpair id="gs_attr_1" name="clone_max" value="3"/>
>              <nvpair id="gs_attr_2" name="clone_node_max" value="1"/>
>            </attributes>
>          </instance_attributes>
>          <primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
>            <instance_attributes id="Filesystem_1_inst_attr">
>              <attributes>
>                <nvpair id="Filesystem_1_attr_0" name="device" value="a02:/gs"/>
>                <nvpair id="Filesystem_1_attr_1" name="directory" value="/gs"/>
>                <nvpair id="Filesystem_1_attr_2" name="fstype" value="nfs"/>
>                <nvpair id="Filesystem_1_attr_3" name="options" value="proto=udp" />
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <instance_attributes id="gs">
>            <attributes/>
>          </instance_attributes>
>        </clone>
>
> After restarting heartbeat everything was back to normal.
>
> Cheers,
>
> Dejan
>
> On Mon, Sep 18, 2006 at 04:59:11PM +0300, Oren Nechushtan wrote:
> > We get the following warnings/errors:
> > ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> > WARN: do_lrm_invoke: Bad command
> >
> > The cib.xml seems OK.
> > Here are excerpts from the ha-log, ha.cf and cib.xml.
> >
> > This started out of the blue and continues every ~10 minutes.
> >
> > Oren
> >
> > P.S.
> > Special note: we use a crontab script that is called periodically (every 10 minutes!):
> >  crm_resource -C -r ..
> >  crm_failcount -D -r ..
> > Without this, the system breaks down after reboots/long test periods, so I guess the script got called at the wrong moment.
> > --------------------------
> >
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <cib num_updates="1705">
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   <status>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     <node_state id="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9">
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       <lrm id="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9">
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +         <lrm_resources>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +           <lrm_resource id="IPaddr_private_shared1">
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +             <lrm_rsc_op transition_magic="4:7;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843" rc_code="7" op_status="4" id="IPaddr_private_shared1_monitor_15000"/>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +           </lrm_resource>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +         </lrm_resources>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       </lrm>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     </node_state>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   </status>
> > cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </cib>
> > cib[17260]: 2006/09/17_15:50:00 info: write_cib_contents:io.c Wrote version 0.6.1705 of the CIB to disk (digest: d109daaa481346a3a72bf71b62bc2527)
> > pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message: [generation] <cib admin_epoch="0" have_quorum="true" generated="true" ccm_transition="6" num_peers="2" cib_feature_revision="1.3" dc_uuid="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9" epoch="6" num_updates="1705"/>
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default stickiness: 15
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default failure stickiness: -10
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH of failed nodes is disabled
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH will reboot nodes
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Cluster is symmetric - resources can run anywhere by default
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resources are stopped
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resource actions are stopped
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Stopped resources are removed from the status section: false
> > pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c By default resources are managed
> > pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-1 is online
> > pengine[30505]: 2006/09/17_15:50:00 WARN: unpack_rsc_op:unpack.c Processing failed op (IPaddr_private_shared1_monitor_15000) for IPaddr_private_shared1 on mgmnt-ha-1
> > pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-2 is online
> > pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared1        (heartbeat::ocf:IPaddr3):        Started mgmnt-ha-1 FAILED
> > pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared2        (heartbeat::ocf:IPaddr3):        Started mgmnt-ha-2
> > pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_cluster_mg     (heartbeat::ocf:IPaddr3):       Started mgmnt-ha-1
> > pengine[30505]: 2006/09/17_15:50:00 info: HA_CounterACT (lsb:HA_CounterACT):    Started mgmnt-ha-1
> > pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Recover resource IPaddr_private_shared1  (mgmnt-ha-1)
> > pengine[30505]: 2006/09/17_15:50:00 notice: Recurring:native.c mgmnt-ha-1          IPaddr_private_shared1_monitor_15000
> > pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_private_shared2    (mgmnt-ha-2)
> > pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_cluster_mg        (mgmnt-ha-1)
> > pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource HA_CounterACT  (mgmnt-ha-1)
> > pengine[30505]: 2006/09/17_15:50:00 notice: stage8:allocate.c Created transition graph 27.
> > pengine[30505]: 2006/09/17_15:50:00 WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max
> > crmd[1932]: 2006/09/17_15:50:00 info: do_state_transition:fsa.c mgmnt-ha-1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:pengine.c Transition 27: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-167.bz2
> > tengine[30504]: 2006/09/17_15:50:00 info: unpack_graph:unpack.c Unpacked transition 27: 3 actions in 3 synapses
> > tengine[30504]: 2006/09/17_15:50:00 info: send_rsc_command:actions.c Initiating action 4: IPaddr_private_shared1_stop_0 on mgmnt-ha-1
> > crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_private_shared1 (interval=0ms, key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843)
> > crmd[1932]: 2006/09/17_15:50:00 WARN: process_lrm_event:lrm.c LRM operation (115) monitor_15000 on IPaddr_private_shared1 Cancelled
> > ccm[1927]: 2006/09/17_15:50:00 info: client (pid=17274) removed from ccm
> > crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c Removing resource IPaddr_private_shared1 from the LRM
> > crmd[1932]: 2006/09/17_15:50:00 info: send_direct_ack:lrm.c ACK'ing resource op: delete for IPaddr_private_shared1
> > crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
> > crmd[1932]: 2006/09/17_15:50:01 ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> > crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-4088">
> > crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command   <primitive id="IPaddr_private_shared1" long-id="IPaddr_private_shared1"/>
> > crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command   <attributes crm_feature_set="1.0.6"/>
> > crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command </rsc_op>
> > cib[1928]: 2006/09/17_15:50:02 info: cib_diff_notify:notify.c Update (client: 1932, call:20): 0.6.1705 -> 0.6.1706 (ok)
> > cibmon[1934]: 2006/09/17_15:50:02 info: cibmon_diff:cibmon.c [cib_diff_notify] cib_update confirmed
> > tengine[30504]: 2006/09/17_15:50:02 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.6.1705 -> 0.6.1706
> > cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: --- 0.6.1705
> > tengine[30504]: 2006/09/17_15:50:02 WARN: process_graph_event:events.c Event not found.
> > cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: +++ 0.6.1706
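The transition_magic attribute on the failed monitor can be unpacked the same way when reading these logs. The field layout below is inferred from this log, not from documentation: the leading 4:7 matches op_status="4" and rc_code="7", and the part after the semicolon has the same transition:uuid shape as key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843 used for the stop action in transition 27. parse_transition_magic is a hypothetical helper; the return-code names come from the OCF resource agent API, where 7 means "not running".

```python
# OCF resource agent exit codes; 7 (OCF_NOT_RUNNING) is what the monitor returned.
OCF_RC = {
    0: "success", 1: "generic error", 2: "invalid arguments",
    3: "unimplemented feature", 4: "insufficient privileges",
    5: "not installed", 6: "not configured", 7: "not running",
}


def parse_transition_magic(magic):
    """Split 'op_status:rc_code;transition:uuid' (layout inferred from the log)."""
    status_rc, key = magic.split(";", 1)
    op_status, rc_code = (int(part) for part in status_rc.split(":"))
    transition, te_uuid = key.split(":", 1)
    return {
        "op_status": op_status,
        "rc_code": rc_code,
        "rc_meaning": OCF_RC.get(rc_code, "unknown"),
        "transition": int(transition),
        "te_uuid": te_uuid,
    }


fields = parse_transition_magic("4:7;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843")
print(fields["rc_meaning"], "in transition", fields["transition"])
```

Read this way, the monitor started by transition 17 reported "not running", which would explain why the policy engine scheduled "Recover resource IPaddr_private_shared1" in transition 27.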
> >
> > ********************************************************************
> >
> >       <primitive class="ocf" id="IPaddr_private_shared1" provider="heartbeat" type="IPaddr3">
> >          <operations>
> >            <op id="IPaddr_private_shared1_mon" interval="15s" name="monitor" timeout="15s"/>
> >          </operations>
> >          <instance_attributes id="IPaddr_private_shared1_inst_attr">
> >            <attributes>
> >              <nvpair id="IPaddr_private_shared1_attr_0" name="ip" value="172.17.2.201"/>
> >              <nvpair id="IPaddr_private_shared1_attr_1" name="netmask" value="28"/>
> >              <nvpair id="IPaddr_private_shared1_attr_2" name="nic" value="eth1 fallback eth1"/>
> >              <nvpair id="IPaddr_private_shared1_attr_3" name="link_status_node" value="haha-2"/>
> >            </attributes>
> >          </instance_attributes>
> >        </primitive>
> >
> > **********************************
> >
> > #debug 10 #enabling this slows down the interactions with cib, et. al.
> > #debug 1 #enabling this slows down the interactions with cib, et. al.
> > #debugfile /var/log/ha-debug
> > #logfile /var/log/ha-log
> > #logfacility local7
> > use_logd yes
> > node haha-1
> > node haha-2
> > udpport 694
> > ucast eth0 10.0.4.181 #e.g. real eth0 address on host 1
> > ucast eth0 10.0.4.182 #e.g. real eth0 address on host 2
> > ucast eth1 172.17.2.171 #e.g. real eth7 address on host 1
> > ucast eth1 172.17.2.172 #e.g. real eth7 address on host 2
> > #bcast eth1
> > serial /dev/ttyS0
> > baud 19200
> > auto_failback off
> > autojoin none
> > luster
> > keepalive 1
> > deadtime 60
> > ping_group routers 10.0.4.1 10.0.4.253
> > deadping 60
> > warntime 30
> > compression    bz2
> > compression_threshold 2
> > traditional_compression false
> > coredumps true
> > initdead 60
> > msgfmt netstring
> > watchdog /dev/watchdog
> > max_rexmit_delay        250     #       set the maximum rexmit delay time
> > CPU
> > hbgenmethod time                 #       Workaround against HB rexmit generation spoofing errors
> > crm yes
> > respawn hacluster       /usr/lib/heartbeat/cibmon -d
> > respawn root            /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_set
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
>

