[Linux-HA] troubles with external/riloe plugin.

Dave Blaschke debltc at us.ibm.com
Mon Jul 2 11:27:20 MDT 2007


Edward Clay wrote:
> I am looking for some help with the external/riloe stonith plugin.  I have been working with the one that ships with heartbeat 2.0.8-0.19 in SLES 10 SP1.  I used the following XML to create the clone resource. 
>
> <clone id="CL_stonithset_node1"> 
>    <instance_attributes id="CL_stonithset_node1"> 
>     <attributes> 
>       <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/> 
>     </attributes> 
>   </instance_attributes> 
>   <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat"> 
>     <operations> 
>       <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/> 
>       <op name="start" timeout="60s" id="CL_stonith_node1_start"/> 
>     </operations> 
>     <instance_attributes id="CL_stonith_node1"> 
>       <attributes> 
>         <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/> 
>         <nvpair id="CL_stonith_node1_RI_HOSTRI" name="RI_HOSTRI" value="il-node1"/> 
>         <nvpair id="CL_stonith_node1_RI_LOGIN" name="RI_LOGIN" value="Administrator"/> 
>         <nvpair id="CL_stonith_node1_RI_PASSWORD" name="RI_PASSWORD" value="password"/> 
>       </attributes> 
>     </instance_attributes> 
>   </primitive> 
> </clone> 
>
> Sample errors in the messages log. 
> Jun 27 11:30:41 node1 haclient: on_event:evt:cib_changed 
> Jun 27 11:30:41 node1 stonithd: [5318]: info: Cannot get parameter hostname from StonithNVpair 
> Jun 27 11:30:41 node1 stonithd: [5318]: ERROR: Invalid config info for external/riloe device. 
> Jun 27 11:30:41 node1 lrmd: [12035]: ERROR: sending stonithRA op to stonithd failed. 
> Jun 27 11:30:41 node1 cib: [12048]: info: write_cib_contents: Wrote version 0.46.2095 of the CIB to disk 
>   
This problem has already been fixed under Novell bug 266551, so you 
should be able to get the fix from them (sorry, I can't be any help 
there).  A less-preferred but quicker alternative (only in a 
non-production environment) is to apply the patch at 
http://hg.linux-ha.org/dev/rev/48477653f995 directly to your system to 
see if you get farther.
> This error shows up a couple of times in a row also. 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: couldnt find attr_name 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: ="ilo_hostname" uniq 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before:    <longdesc lang=en 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: >  <parameter name=" 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: c> <parameters>  <pa 
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:75 : root != NULL 
>
> The resource is created OK, but I can't start it.  It gives an error that it can't run anywhere.  I also see errors about not being able to find hostname.  So I did some digging in the riloe file, and it shows the RI_ entries as legacy.  Lower in the file it shows some ilo_ values.  So I tried creating the same resource as above with the new ilo_ equivalents. 
>
>
>
> <clone id="CL_stonithset_node1"> 
>    <instance_attributes id="CL_stonithset_node1"> 
>     <attributes> 
>       <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/> 
>     </attributes> 
>   </instance_attributes> 
>   <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat"> 
>     <operations> 
>       <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/> 
>       <op name="start" timeout="60s" id="CL_stonith_node1_start"/> 
>     </operations> 
>     <instance_attributes id="CL_stonith_node1"> 
>       <attributes> 
>         <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/> 
>         <nvpair id="CL_stonith_node1_ilo_hostname" name="ilo_hostname" value="il-node1"/> 
>         <nvpair id="CL_stonith_node1_ilo_user" name="ilo_user" value="Administrator"/> 
>         <nvpair id="CL_stonith_node1_ilo_password" name="ilo_password" value="password"/> 
>         <nvpair id="CL_stonith_node1_ilo_protocol" name="ilo_protocol" value="1.2"/> 
>       </attributes> 
>     </instance_attributes> 
>   </primitive> 
> </clone> 
>
> Same results: the resource is created but doesn't start.  I can ping both node1 and its ilo hostname, il-node1, from all boxes.  I am able to ssh and https to the ilo card and log in with the admin account.  I have attached the riloe plugin that I am trying to use. 
>
> The hardware is a dl350 running ilo firmware 1.22. 
>
> Does anyone know what type of connection the plugin makes to the ilo card? 
>
> Do I need to have the ilo2 device at a certain firmware version? 
>
> Do I need a driver loaded for the ilo card to work or does it communicate to it through ssh or https? 
>   
It communicates over HTTPS.
> What can I do to troubleshoot this problem? 
>   
You can add "debug 1" to your ha.cf or, for even more detailed debug 
info, run the stonith command with the -d option:

export RI_HOSTRI=il-node1
::
stonith -d -t external/riloe hostlist=node1 -S

to verify your stonith environment.
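
If you end up repeating that check, a small wrapper can save some typing. 
The following is only a rough sketch, not something that ships with 
heartbeat: the function name check_riloe and its argument order are made 
up for illustration, and it assumes the variables elided by "::" above are 
RI_LOGIN and RI_PASSWORD, as in your first XML. It sets the legacy RI_ 
variables for the single stonith invocation only, so nothing is left 
exported in your shell afterwards.

```shell
#!/bin/sh
# check_riloe is a hypothetical helper name, not part of heartbeat.
# Usage: check_riloe ILO_HOST LOGIN PASSWORD NODE
check_riloe() {
    if [ "$#" -ne 4 ]; then
        echo "usage: check_riloe ILO_HOST LOGIN PASSWORD NODE" >&2
        return 2
    fi
    # Bail out early if the stonith command-line tool is not installed.
    if ! command -v stonith >/dev/null 2>&1; then
        echo "stonith command not found in PATH" >&2
        return 127
    fi
    # Assumption: RI_LOGIN and RI_PASSWORD are the remaining legacy
    # variables, taken from the nvpairs in the original configuration.
    # The assignments apply to this one command only.
    RI_HOSTRI=$1 RI_LOGIN=$2 RI_PASSWORD=$3 \
        stonith -d -t external/riloe hostlist="$4" -S
}
```

With your values that would be invoked as 
"check_riloe il-node1 Administrator password node1". Keep in mind that a 
password given on the command line ends up in your shell history.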
>
> TIA 
> Edward 
>   



