[Linux-HA] crm_abort: get_lrm_resource: Triggered non-fatal assert at lrm.c:864

Sebastian Reitenbach sebastia at l00-bugdead-prods.de
Thu Mar 1 07:07:14 MST 2007


Hi list,

I had a cluster running with 5 nodes and 5 stonith resources, and these
were working great so far. Then I added 21 ocfs clone resources, but these
stayed "unused" (white lamp in the GUI). I then stopped the stonith
resources and started them again, and they stayed white too.

After restarting heartbeat on all nodes, with all but one in standby, the
one node still active took over the stonith resources and the ocfs
resources too.

This is what I saw in the log file after adding the ocfs2 resources via cibadmin:


Mar  1 14:22:27 ppsdb102 sudo: ppsadmin : TTY=pts/6 ; PWD=/tmp/from_dol ; USER=root ; COMMAND=/bin/bash
Mar  1 14:24:48 ppsdb102 cib: [7259]: info: cib_diff_notify: Update (client: 10549, call:2): 0.5.88 -> 0.5.89 (ok)
Mar  1 14:24:48 ppsdb102 haclient: on_event:evt:cib_changed
Mar  1 14:24:48 ppsdb102 cib: [10550]: info: write_cib_contents: Wrote version 0.5.89 of the CIB to disk (digest: 3025b3d515c63168a77c5240516c57db)
Mar  1 14:25:18 ppsdb102 mgmtd: [7264]: info: on_set_target_role:<clone id="Clone_PPS_CACHE_1"><primitive id="PPS_CACHE_1"><instance_attributes id="PPS_CACHE_1_instance_attrs"><attributes><nvpair id="PPS_CACHE_1:0_target_role" name="target_role" value="started"/></attributes></instance_attributes></primitive></clone>
Mar  1 14:25:18 ppsdb102 cib: [7259]: info: cib_diff_notify: Update (client: 7264, call:94): 0.5.89 -> 0.5.90 (ok)
Mar  1 14:25:18 ppsdb102 cib: [10635]: info: write_cib_contents: Wrote version 0.5.90 of the CIB to disk (digest: edbe9b27df26aa27b3b3cbd6918c53f5)
Mar  1 14:25:18 ppsdb102 haclient: on_event: from message queue: evt:cib_changed
Mar  1 14:25:46 ppsdb102 crmd: [7263]: info: verify_stopped: Checking for active resources before exit
Mar  1 14:25:46 ppsdb102 crmd: [7263]: ERROR: crm_abort: get_lrm_resource: Triggered non-fatal assert at lrm.c:864 : class != NULL
Mar  1 14:25:46 ppsdb102 crmd: [7263]: ERROR: do_lrm_invoke: Invalid resource definition
Mar  1 14:25:46 ppsdb102 crmd: [7263]: WARN: log_data_element: do_lrm_invoke: Bad command <rsc_op transition_key="mgmtd-7264">
Mar  1 14:25:46 ppsdb102 crmd: [7263]: WARN: log_data_element: do_lrm_invoke: Bad command   <primitive id="PPS_CACHE_1:0"/>
Mar  1 14:25:46 ppsdb102 crmd: [7263]: WARN: log_data_element: do_lrm_invoke: Bad command   <attributes crm_feature_set="1.0.8"/>
Mar  1 14:25:46 ppsdb102 crmd: [7263]: WARN: log_data_element: do_lrm_invoke: Bad command </rsc_op>
Mar  1 14:25:46 ppsdb102 crmd: [7263]: info: verify_stopped: Checking for active resources before exit
Mar  1 14:25:46 ppsdb102 crmd: [7263]: info: append_restart_list: Resource Stonith:0 does not support reloads
Mar  1 14:25:46 ppsdb102 crmd: [7263]: info: do_lrm_invoke: Forcing a local LRM refresh
Mar  1 14:25:46 ppsdb102 crmd: [7263]: info: verify_stopped: Checking for active resources before exit
Mar  1 14:25:46 ppsdb102 cib: [7259]: info: cib_diff_notify: Update (client: 7263, call:94): 0.5.90 -> 0.5.91 (ok)
Mar  1 14:25:46 ppsdb102 cib: [10717]: info: write_cib_contents: Wrote version 0.5.91 of the CIB to disk (digest: fc4cfb29da8b0fd01a3979e9a07d9ec3)
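
For illustration only, the fragment logged under "Bad command" can be
reassembled and checked for the attribute named in the assert
(class != NULL). This is a rough sketch in Python, not the crmd code; the
function name missing_class is made up here, and the idea that the
primitive in the logged rsc_op is where the class is expected is an
assumption based only on the log above.

```python
import xml.etree.ElementTree as ET

# Fragment reassembled from the four "Bad command" log lines above.
BAD_COMMAND = """
<rsc_op transition_key="mgmtd-7264">
  <primitive id="PPS_CACHE_1:0"/>
  <attributes crm_feature_set="1.0.8"/>
</rsc_op>
""".strip()


def missing_class(rsc_op_xml):
    """Return the ids of <primitive> elements that carry no class attribute.

    Hypothetical helper: it only mirrors the "class != NULL" condition
    quoted in the log, not the actual logic in lrm.c.
    """
    root = ET.fromstring(rsc_op_xml)
    return [p.get("id") for p in root.iter("primitive")
            if p.get("class") is None]


if __name__ == "__main__":
    print(missing_class(BAD_COMMAND))
```

On the logged fragment this reports PPS_CACHE_1:0, i.e. the primitive in
the request carries only an id and no class.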


I found this thread, which in my eyes describes the same problem:
http://lists.community.tummy.com/pipermail/linux-ha/2006-October/021987.html
It later links to this patch:
http://hg.linux-ha.org/dev?cs=d0f8d4c45eab

However, the assert in lrm.c is on a different line there.

Is this related to my problem? Is the patch included in heartbeat 2.0.8?

I use heartbeat 2.0.8, on SLES 10, x86_64.

Kind regards
Sebastian


