[Linux-HA] troubles with external/riloe plugin.
Dave Blaschke
debltc at us.ibm.com
Mon Jul 2 11:27:20 MDT 2007
Edward Clay wrote:
> I am looking for some help with the external/riloe stonith plugin. I have been working with the one that ships in SLES 10 SP1 (heartbeat 2.0.8-0.19). I used the following XML to create the clone resource.
>
> <clone id="CL_stonithset_node1">
> <instance_attributes id="CL_stonithset_node1">
> <attributes>
> <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/>
> </attributes>
> </instance_attributes>
> <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat">
> <operations>
> <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/>
> <op name="start" timeout="60s" id="CL_stonith_node1_start"/>
> </operations>
> <instance_attributes id="CL_stonith_node1">
> <attributes>
> <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/>
> <nvpair id="CL_stonith_node1_RI_HOSTRI" name="RI_HOSTRI" value="il-node1"/>
> <nvpair id="CL_stonith_node1_RI_LOGIN" name="RI_LOGIN" value="Administrator"/>
> <nvpair id="CL_stonith_node1_RI_PASSWORD" name="RI_PASSWORD" value="password"/>
> </attributes>
> </instance_attributes>
> </primitive>
> </clone>
>
> Sample errors in the messages log.
> Jun 27 11:30:41 node1 haclient: on_event:evt:cib_changed
> Jun 27 11:30:41 node1 stonithd: [5318]: info: Cannot get parameter hostname from StonithNVpair
> Jun 27 11:30:41 node1 stonithd: [5318]: ERROR: Invalid config info for external/riloe device.
> Jun 27 11:30:41 node1 lrmd: [12035]: ERROR: sending stonithRA op to stonithd failed.
> Jun 27 11:30:41 node1 cib: [12048]: info: write_cib_contents: Wrote version 0.46.2095 of the CIB to disk
>
This problem has already been fixed under Novell bug 266551, so you
should be able to get the fix from them (sorry, I can't be of any help
there). A less-preferred but quicker alternative (only in a
non-production environment) is to apply the patch at
http://hg.linux-ha.org/dev/rev/48477653f995 directly to your system and
see if you get further.
> This error shows up a couple of times in a row also.
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: couldnt find attr_name
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: ="ilo_hostname" uniq
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: <longdesc lang=en
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: > <parameter name="
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: c> <parameters> <pa
> Jun 27 11:30:41 node1 crmd: [5320]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:75 : root != NULL
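The parse_xml errors above quote pieces of the plugin's parameter description ("ilo_hostname", "<longdesc lang=en"), so it may be worth looking at the metadata the plugin itself prints. The following is only a rough sketch: the plugin directories are assumptions that vary by distribution, and it assumes xmllint is installed. Note that xmllint only checks well-formedness; crmd uses its own parser, so a clean result here does not prove crmd will accept the same text.

```shell
#!/bin/sh
# Candidate plugin locations. These paths are assumptions, not taken from
# the thread; override PLUGIN_DIRS if your install differs.
PLUGIN_DIRS="${PLUGIN_DIRS:-/usr/lib64/stonith/plugins/external /usr/lib/stonith/plugins/external}"

# Print the first executable riloe plugin found, or return non-zero.
find_riloe() {
    for dir in $PLUGIN_DIRS; do
        if [ -x "$dir/riloe" ]; then
            printf '%s\n' "$dir/riloe"
            return 0
        fi
    done
    return 1
}

if plugin=$(find_riloe) && command -v xmllint >/dev/null 2>&1; then
    # getinfo-xml asks an external stonith plugin for its parameter metadata.
    if "$plugin" getinfo-xml | xmllint --noout -; then
        echo "metadata is well-formed XML"
    else
        echo "metadata failed to parse; compare with the crmd parse_xml errors"
    fi
else
    echo "riloe plugin or xmllint not found; nothing checked"
fi
```

If the metadata does not parse here either, the problem is more likely in the plugin script than in your CIB.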
>
> The resource is created OK, but I can't start it. It gives an error that it can't run anywhere. I also see errors about not being able to find the hostname. So I did some digging in the riloe file, and it shows the RI_ entries as legacy. Lower down in the file it shows some ilo_ values. So I tried creating the same resource as above with the new ilo_ equivalents.
>
>
>
> <clone id="CL_stonithset_node1">
> <instance_attributes id="CL_stonithset_node1">
> <attributes>
> <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/>
> </attributes>
> </instance_attributes>
> <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat">
> <operations>
> <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/>
> <op name="start" timeout="60s" id="CL_stonith_node1_start"/>
> </operations>
> <instance_attributes id="CL_stonith_node1">
> <attributes>
> <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/>
> <nvpair id="CL_stonith_node1_ilo_hostname" name="ilo_hostname" value="il-node1"/>
> <nvpair id="CL_stonith_node1_ilo_user" name="ilo_user" value="Administrator"/>
> <nvpair id="CL_stonith_node1_ilo_password" name="ilo_password" value="password"/>
> <nvpair id="CL_stonith_node1_ilo_protocol" name="ilo_protocol" value="1.2"/>
> </attributes>
> </instance_attributes>
> </primitive>
> </clone>
>
> Same result: the resource is created but doesn't start. I can ping both the hostname (node1) and the iLO hostname (il-node1) from all boxes. I am able to ssh and https to the iLO card and log in with the admin account. I have attached the riloe plugin that I am trying to use.
>
> The hardware is a dl350 running ilo firmware 1.22.
>
> Does anyone know what type of connection the plug in makes to the ilo card?
>
> Do I need to have the ilo2 device at a certain firmware version?
>
> Do I need a driver loaded for the ilo card to work or does it communicate to it through ssh or https?
>
HTTPS
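Since the connection is HTTPS, one quick sanity check is whether the cluster node itself (not just your workstation browser) can open an HTTPS connection to the iLO. A minimal sketch, assuming curl is installed; the -k option is there on the assumption that the iLO presents a self-signed certificate, and the script and function names are made up for illustration:

```shell
#!/bin/sh
# Build the HTTPS URL for a given iLO hostname.
ilo_url() {
    printf 'https://%s/\n' "$1"
}

# Try an HTTPS request and report the HTTP status code, or a failure message.
check_ilo() {
    if ! command -v curl >/dev/null 2>&1; then
        echo "curl not found; cannot test $(ilo_url "$1")"
        return 1
    fi
    # -k: skip certificate verification (assumes a self-signed iLO cert).
    curl -k -sS -o /dev/null --connect-timeout 5 --max-time 15 \
         -w 'HTTP status: %{http_code}\n' "$(ilo_url "$1")" \
        || { echo "could not reach $(ilo_url "$1")"; return 1; }
}

# Only run the check when a hostname is given, e.g.: sh check_ilo.sh il-node1
if [ -n "${1:-}" ]; then
    check_ilo "$1"
fi
```

Run it on each node that may need to fence node1, for example as `sh check_ilo.sh il-node1`.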
> What can I do to troubleshoot this problem?
>
You can add "debug 1" to your ha.cf or, for even more detailed debug
info, run the stonith command with the -d option:
export RI_HOSTRI=il-node1
::
stonith -d -t external/riloe hostlist=node1 -S
to verify your stonith environment.
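Spelled out as a script, the commands above might look like the sketch below. It assumes the lines elided above export the remaining legacy RI_* variables; the names and values are copied from the first CIB snippet in this thread, so substitute your real credentials.

```shell
#!/bin/sh
# Legacy environment variables for the riloe plugin. Names and values come
# from the first CIB snippet in this thread; that the elided lines set
# RI_LOGIN and RI_PASSWORD is an assumption.
export RI_HOSTRI=il-node1
export RI_LOGIN=Administrator
export RI_PASSWORD=password

# Command line from the reply: -d debug output, -t device type, -S status.
STONITH_CMD="stonith -d -t external/riloe hostlist=node1 -S"

if command -v stonith >/dev/null 2>&1; then
    # Intentionally unquoted so the string splits into separate arguments.
    $STONITH_CMD
else
    echo "stonith not found in PATH; would run: $STONITH_CMD"
fi
```

Run this as root on a cluster node; the debug output should show which parameter the plugin is missing.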
>
> TIA
> Edward
>