[Linux-HA] troubles with external/riloe plugin.

Edward Clay eclay at novell.com
Sun Jul 1 20:35:42 MDT 2007


I am looking for some help with the external/riloe stonith plug-in. I have been working with the version that ships in SLES 10 SP1 (heartbeat 2.0.8-0.19). I used the following XML to create the clone resource:

<clone id="CL_stonithset_node1"> 
  <instance_attributes id="CL_stonithset_node1">
    <attributes> 
      <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/> 
    </attributes> 
  </instance_attributes> 
  <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat"> 
    <operations> 
      <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/> 
      <op name="start" timeout="60s" id="CL_stonith_node1_start"/> 
    </operations> 
    <instance_attributes id="CL_stonith_node1"> 
      <attributes> 
        <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/> 
        <nvpair id="CL_stonith_node1_RI_HOSTRI" name="RI_HOSTRI" value="il-node1"/> 
        <nvpair id="CL_stonith_node1_RI_LOGIN" name="RI_LOGIN" value="Administrator"/> 
        <nvpair id="CL_stonith_node1_RI_PASSWORD" name="RI_PASSWORD" value="password"/> 
      </attributes> 
    </instance_attributes> 
  </primitive> 
</clone> 

Sample errors from the messages log:
Jun 27 11:30:41 node1 haclient: on_event:evt:cib_changed 
Jun 27 11:30:41 node1 stonithd: [5318]: info: Cannot get parameter hostname from StonithNVpair 
Jun 27 11:30:41 node1 stonithd: [5318]: ERROR: Invalid config info for external/riloe device. 
Jun 27 11:30:41 node1 lrmd: [12035]: ERROR: sending stonithRA op to stonithd failed. 
Jun 27 11:30:41 node1 cib: [12048]: info: write_cib_contents: Wrote version 0.46.2095 of the CIB to disk 

This sequence of errors also shows up a couple of times in a row:
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: couldnt find attr_name 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: ="ilo_hostname" uniq 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before:    <longdesc lang=en 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: >  <parameter name=" 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: c> <parameters>  <pa 
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:75 : root != NULL 
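The `parse_xml` errors are logged by crmd, and the fragments they quote (`<parameters>`, `<parameter name="`, `<longdesc lang=en`) look like the plug-in's parameter metadata. An unquoted `lang=en` would not be well-formed XML. One way to check is to run the plug-in's `getinfo-xml` action by hand and feed the output to a strict parser. This is a minimal sketch that assumes the plug-in, like other external stonith plug-ins, prints its metadata for `getinfo-xml`. The path `/usr/lib/stonith/plugins/external/riloe` is a hypothetical location, so adjust it for the actual install.

```python
import os
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical plug-in location; the real path depends on distribution and architecture.
PLUGIN = "/usr/lib/stonith/plugins/external/riloe"


def check_metadata(text):
    """Return None if the metadata parses as XML, else the parser's error message."""
    body = text.strip()
    if body.startswith("<?xml"):
        # Drop an XML declaration so the fragment can be wrapped below.
        body = body[body.index("?>") + 2:]
    try:
        # The plug-in may print a fragment, so wrap it in a dummy root element.
        ET.fromstring("<root>" + body + "</root>")
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    if os.path.exists(PLUGIN):
        out = subprocess.run([PLUGIN, "getinfo-xml"], capture_output=True, text=True).stdout
        print(check_metadata(out) or "metadata is well-formed")
    else:
        print("plug-in not found at", PLUGIN)
```

If the parser reports an error, its line and column point at the spot in the metadata that crmd is probably choking on.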

The resource is created OK, but I can't start it; it gives an error that it can't run anywhere. I also see errors about not being able to find the hostname. So I did some digging in the riloe file, which marks the RI_ parameters as legacy. Further down, the file lists some ilo_ parameters, so I recreated the configuration above with the new ilo_ equivalents:



<clone id="CL_stonithset_node1"> 
  <instance_attributes id="CL_stonithset_node1">
    <attributes> 
      <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/> 
    </attributes> 
  </instance_attributes> 
  <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat"> 
    <operations> 
      <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/> 
      <op name="start" timeout="60s" id="CL_stonith_node1_start"/> 
    </operations> 
    <instance_attributes id="CL_stonith_node1"> 
      <attributes> 
        <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/> 
        <nvpair id="CL_stonith_node1_ilo_hostname" name="ilo_hostname" value="il-node1"/> 
        <nvpair id="CL_stonith_node1_ilo_user" name="ilo_user" value="Administrator"/> 
        <nvpair id="CL_stonith_node1_ilo_password" name="ilo_password" value="password"/> 
        <nvpair id="CL_stonith_node1_ilo_protocol" name="ilo_protocol" value="1.2"/> 
      </attributes> 
    </instance_attributes> 
  </primitive> 
</clone> 

Same result: the resource is created but doesn't start. I can ping both node1 and its iLO hostname, il-node1, from all boxes, and I can reach the iLO card over ssh and https and log in with the admin account. I have attached the riloe plug-in that I am trying to use.
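The message `Cannot get parameter hostname from StonithNVpair` suggests stonithd asked for a parameter name that the CIB did not supply. External stonith plug-ins report the names they expect through their `getconfignames` action. One check is to compare that list with the `nvpair` names in the clone definition. This is a sketch under assumptions: the plug-in path `/usr/lib/stonith/plugins/external/riloe` and the file name `stonith_clone.xml` are both hypothetical.

```python
import os
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical plug-in location; adjust for the actual installation.
PLUGIN = "/usr/lib/stonith/plugins/external/riloe"


def cib_param_names(resource_xml):
    """Collect every nvpair name found inside a <clone> or <primitive> element."""
    root = ET.fromstring(resource_xml)
    return {nv.get("name") for nv in root.iter("nvpair")}


def missing_params(expected, supplied):
    """Names the plug-in asks for that the CIB does not provide, in plug-in order."""
    return [name for name in expected if name not in supplied]


def plugin_config_names(plugin=PLUGIN):
    """Run the plug-in's getconfignames action and split its output on whitespace."""
    out = subprocess.run([plugin, "getconfignames"], capture_output=True, text=True).stdout
    return out.split()


if __name__ == "__main__":
    if os.path.exists(PLUGIN):
        with open("stonith_clone.xml") as fh:  # hypothetical file holding the <clone> XML
            supplied = cib_param_names(fh.read())
        print("missing:", missing_params(plugin_config_names(), supplied))
```

An empty `missing` list means the CIB names match what the plug-in says it wants. Any name it prints (for example `hostname`) is a candidate for the lookup failure in the log.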

The hardware is a DL350 running iLO firmware 1.22.

Does anyone know what type of connection the plug-in makes to the iLO card?

Does the iLO 2 device need to be at a certain firmware version?

Do I need a driver loaded for the iLO card to work, or does the plug-in communicate with it through ssh or https?

What can I do to troubleshoot this problem?
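HP's Lights-Out scripting interface (RIBCL) accepts XML commands over an SSL connection to port 443 on the iLO. This sketch assumes, without confirming it against the attached script, that the riloe plug-in uses that channel rather than ssh or a local driver. It sends a read-only `GET_HOST_POWER_STATUS` request and returns the raw reply. That can help separate iLO problems from heartbeat configuration problems. One caveat: a modern TLS stack may refuse the older protocol versions offered by early iLO firmware, in which case the handshake itself fails.

```python
import socket
import ssl
from xml.sax.saxutils import escape


def build_ribcl(user, password):
    """Build a read-only RIBCL request asking for the host power state."""
    attr = {'"': "&quot;"}  # also escape double quotes inside attribute values
    return (
        '<?xml version="1.0"?>\r\n'
        '<RIBCL VERSION="2.0">\r\n'
        f'<LOGIN USER_LOGIN="{escape(user, attr)}" PASSWORD="{escape(password, attr)}">\r\n'
        '<SERVER_INFO MODE="read"><GET_HOST_POWER_STATUS/></SERVER_INFO>\r\n'
        '</LOGIN>\r\n'
        '</RIBCL>\r\n'
    )


def query_ilo(host, user, password, port=443, timeout=20):
    """Send the request over SSL and return everything the iLO sends back."""
    ctx = ssl.create_default_context()
    # iLO cards normally present a self-signed certificate, so skip verification.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as conn:
            conn.sendall(build_ribcl(user, password).encode("utf-8"))
            chunks = []
            # Read until the iLO closes the connection; a stall raises socket.timeout.
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

Usage, with the host and credentials from the CIB above: `print(query_ilo("il-node1", "Administrator", "password"))`. A reply containing a power state means the card answers RIBCL. A handshake or login error points at the iLO side rather than at the CIB.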


TIA 
Edward 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riloe
Type: application/octet-stream
Size: 6261 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20070701/da9e2abc/riloe.obj

