[Linux-HA] do_lrm_invoke: Bad command and crm_failcount

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Oct 6 07:00:08 MDT 2006


Hi,

Did you resolve this issue? Last night I had a very similar
occurence:

Oct  6 06:43:09 rbxtc01 crmd: [8070]: ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL
Oct  6 06:43:09 n01 crmd: [8070]: ERROR: do_lrm_invoke:lrm.c Invalid resource definition
Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resource-5450">
Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command   <primitive id="gs" long-id="gs"/>
Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command   <attributes crm_feature_set="1.0.6"/>
Oct  6 06:43:09 n01 crmd: [8070]: WARN: do_lrm_invoke: Bad command </rsc_op>

There was nothing else in the logs.

crm_verify found the configuration good.

The resource wouldn't start. It's a clone:

       <clone id="gs" globally_unique="false">
         <instance_attributes id="gs_inst_attr">
           <attributes>
             <nvpair id="gs_attr_1" name="clone_max" value="3"/>
             <nvpair id="gs_attr_2" name="clone_node_max" value="1"/>
           </attributes>
         </instance_attributes>
         <primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
           <instance_attributes id="Filesystem_1_inst_attr">
             <attributes>
               <nvpair id="Filesystem_1_attr_0" name="device" value="a02:/gs"/>
               <nvpair id="Filesystem_1_attr_1" name="directory" value="/gs"/>
               <nvpair id="Filesystem_1_attr_2" name="fstype" value="nfs"/>
               <nvpair id="Filesystem_1_attr_3" name="options" value="proto=udp" />
             </attributes>
           </instance_attributes>
         </primitive>
         <instance_attributes id="gs">
           <attributes/>
         </instance_attributes>
       </clone>

After restarting heartbeat everything was back to normal.

Cheers,

Dejan

On Mon, Sep 18, 2006 at 04:59:11PM +0300, Oren Nechushtan wrote:
> We get the next warning/errors
> ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:783 : class != NULL 
> WARN: do_lrm_invoke: Bad command 
> 
> The cib.xml seems OK.
> Here is a caption of the ha-log,ha.cf and cib.xml
> 
> This started out of the blue and continues every ~10 minutes.
> 
> Oren
> 
> P.S.
> Special node: we use a crontab script that is called periodically (every 10m!)
>  crm_resource -C -r ..
>  crm_failcount -D -r ..
> Without this, the system breaks down after reboots/long test periods; so I guess the script got called at the wrong moment..
> --------------------------
> 
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + <cib num_updates="1705">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   <status>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     <node_state id="4f1373b2-98ba-46fb-bc3f-7
> a800b9dd9a9">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       <lrm id="4f1373b2-98ba-46fb-bc3f-7a800b
> 9dd9a9">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +         <lrm_resources>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +           <lrm_resource id="IPaddr_private_sh
> ared1">
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +             <lrm_rsc_op transition_magic="4:7
> ;17:4e622eb8-e354-42c4-b53f-7eef9e1dc843" rc_code="7" op_status="4" id="IPaddr_private_shared1_moni
> tor_15000"/>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +           </lrm_resource>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +         </lrm_resources>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +       </lrm>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +     </node_state>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: +   </status>
> cibmon[1934]: 2006/09/17_15:50:00 info: cib_update: + </cib>
> cib[17260]: 2006/09/17_15:50:00 info: write_cib_contents:io.c Wrote version 0.6.1705 of the CIB to
> disk (digest: d109daaa481346a3a72bf71b62bc2527)
> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message: [generation] <cib admin_epoch="0" hav
> e_quorum="true" generated="true" ccm_transition="6" num_peers="2" cib_feature_revision="1.3" dc_uui
> d="4f1373b2-98ba-46fb-bc3f-7a800b9dd9a9" epoch="6" num_updates="1705"/>
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default stickiness: 15
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Default failure stickiness: -10
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH of failed nodes is disable
> d
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c STONITH will reboot nodes
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Cluster is symmetric - resources c
> an run anywhere by default
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL re
> sources
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resources are stopped
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Orphan resource actions are stoppe
> d
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c Stopped resources are removed from
>  the status section: false
> pengine[30505]: 2006/09/17_15:50:00 info: unpack_config:unpack.c By default resources are managed
> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-1 is online
> pengine[30505]: 2006/09/17_15:50:00 WARN: unpack_rsc_op:unpack.c Processing failed op (IPaddr_priva
> te_shared1_monitor_15000) for IPaddr_private_shared1 on mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 info: determine_online_status:unpack.c Node mgmnt-ha-2 is onlin
> e
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared1        (heartbeat::ocf:IPaddr3):
>         Started mgmnt-ha-1 FAILED
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_private_shared2        (heartbeat::ocf:IPaddr3):
>         Started mgmnt-ha-2
> pengine[30505]: 2006/09/17_15:50:00 info: IPaddr_cluster_mg     (heartbeat::ocf:IPaddr3):       Sta
> rted mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 info: HA_CounterACT (lsb:HA_CounterACT):    Started mgmnt-ha-1
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Recover resource IPaddr_private_s
> hared1  (mgmnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: Recurring:native.c mgmnt-ha-1          IPaddr_private_s
> hared1_monitor_15000
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_private_sha
> red2    (mgmnt-ha-2)
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource IPaddr_cluster_mg
>         (mgmnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: NoRoleChange:native.c Leave resource HA_CounterACT  (mg
> mnt-ha-1)
> pengine[30505]: 2006/09/17_15:50:00 notice: stage8:allocate.c Created transition graph 27.
> pengine[30505]: 2006/09/17_15:50:00 WARN: process_pe_message:pengine.c No value specified for clust
> er preference: pe-input-series-max
> crmd[1932]: 2006/09/17_15:50:00 info: do_state_transition:fsa.c mgmnt-ha-1: State transition S_POLI
> CY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> pengine[30505]: 2006/09/17_15:50:00 info: process_pe_message:pengine.c Transition 27: PEngine Input
>  stored in: /var/lib/heartbeat/pengine/pe-input-167.bz2
> tengine[30504]: 2006/09/17_15:50:00 info: unpack_graph:unpack.c Unpacked transition 27: 3 actions in 3 synapses
> tengine[30504]: 2006/09/17_15:50:00 info: send_rsc_command:actions.c Initiating action 4: IPaddr_pr
> ivate_shared1_stop_0 on mgmnt-ha-1
> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_private_shar
> ed1 (interval=0ms, key=27:4e622eb8-e354-42c4-b53f-7eef9e1dc843)
> crmd[1932]: 2006/09/17_15:50:00 WARN: process_lrm_event:lrm.c LRM operation (115) monitor_15000 on
> IPaddr_private_shared1 Cancelled
> ccm[1927]: 2006/09/17_15:50:00 info: client (pid=17274) removed from ccm
> crmd[1932]: 2006/09/17_15:50:00 info: do_lrm_invoke:lrm.c Removing resource IPaddr_private_shared1
> from the LRM
> crmd[1932]: 2006/09/17_15:50:00 info: send_direct_ack:lrm.c ACK'ing resource op: delete for IPaddr_
> private_shared1
> crmd[1932]: 2006/09/17_15:50:01 ERROR: get_lrm_resource:lrm.c Triggered non-fatal assert at lrm.c:7
> 83 : class != NULL
> crmd[1932]: 2006/09/17_15:50:01 ERROR: do_lrm_invoke:lrm.c Invalid resource definition
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command <rsc_op transition_key="crm_resour
> ce-4088">
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command   <primitive id="IPaddr_private_sh
> ared1" long-id="IPaddr_private_shared1"/>
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command   <attributes crm_feature_set="1.0
> .6"/>
> crmd[1932]: 2006/09/17_15:50:01 WARN: do_lrm_invoke: Bad command </rsc_op>
> cib[1928]: 2006/09/17_15:50:02 info: cib_diff_notify:notify.c Update (client: 1932, call:20): 0.6.1
> 705 -> 0.6.1706 (ok)
> cibmon[1934]: 2006/09/17_15:50:02 info: cibmon_diff:cibmon.c [cib_diff_notify] cib_update confirmed
> tengine[30504]: 2006/09/17_15:50:02 info: te_update_diff:callbacks.c Processing diff (cib_update):
> 0.6.1705 -> 0.6.1706
> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: --- 0.6.1705
> tengine[30504]: 2006/09/17_15:50:02 WARN: process_graph_event:events.c Event not found.
> cibmon[1934]: 2006/09/17_15:50:02 info: cib_update: Diff: +++ 0.6.1706
> 
> ********************************************************************
> 
>       <primitive class="ocf" id="IPaddr_private_shared1" provider="heartbeat" t
> ype="IPaddr3">
>          <operations>
>            <op id="IPaddr_private_shared1_mon" interval="15s" name="monitor" tim
> eout="15s"/>
>          </operations>
>          <instance_attributes id="IPaddr_private_shared1_inst_attr">
>            <attributes>
>              <nvpair id="IPaddr_private_shared1_attr_0" name="ip" value="172.17.
> 2.201"/>
>              <nvpair id="IPaddr_private_shared1_attr_1" name="netmask" value="28
> "/>
>              <nvpair id="IPaddr_private_shared1_attr_2" name="nic" value="eth1
> fallback eth1"/>
>              <nvpair id="IPaddr_private_shared1_attr_3" name="link_status_node"
> value="haha-2"/>
>            </attributes>
>          </instance_attributes>
>        </primitive>
> 
> **********************************
> 
> #debug 10 #enabling this slows down the interactions with cib, et. al.
> #debug 1 #enabling this slows down the interactions with cib, et. al.
> #debugfile /var/log/ha-debug
> #logfile /var/log/ha-log
> #logfacility local7
> use_logd yes
> node haha-1
> node haha-2
> udpport 694
> ucast eth0 10.0.4.181 #e.g. real eth0 address on host 1
> ucast eth0 10.0.4.182 #e.g. real eth0 address on host 2
> ucast eth1 172.17.2.171 #e.g. real eth7 address on host 1
> ucast eth1 172.17.2.172 #e.g. real eth7 address on host 2
> #bcast eth1
> serial /dev/ttyS0
> baud 19200
> auto_failback off
> autojoin none
> luster
> keepalive 1
> deadtime 60
> ping_group routers 10.0.4.1 10.0.4.253
> deadping 60
> warntime 30
> compression    bz2
> compression_threshold 2
> traditional_compression false
> coredumps true
> initdead 60
> msgfmt netstring
> watchdog /dev/watchdog
> max_rexmit_delay        250     #       set the maximum rexmit delay time
> CPU
> hbgenmethod time                 #       Workaround against HB rexmit generation
>  spoofing errors
> crm yes
> respawn hacluster       /usr/lib/heartbeat/cibmon -d
> respawn root            /usr/lib/heartbeat/pingd -m 1000 -d 5s -a default_ping_s
> et
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems


More information about the Linux-HA mailing list