[Linux-HA] CRM and STONITH questions

Spam Filter Spam at citadelcomputer.com.au
Fri Oct 19 04:31:07 MDT 2007


OK, I have a better understanding now.

Attached is my small stonith script (one class file is missing, but it
is not needed to show how the script works).
It implements all the required methods, but at the beginning it checks
that the "hostlist" environment variable is set and exits with 1 if it
is not. I don't know whether this variable is set every time the script
is called or only when a stonith action (on|off|reset) is needed.

From the information you gave below, I think my script does what it
should, except maybe for the environment handling I mentioned above.

I originally had it so that you could call it like this:
./powerio castor reset
./powerio castor off
Etc.

But after reading about the hostlist, I thought it would be better if it
could handle multiple hosts and be compatible with the way HA passes
parameters through the environment. *shrugs*

Now on to loading the stonith resource into the CIB: do I specifically
need to list all host names that can be stonithed in the parameter
below? I would have thought that all members of the HA cluster
would/could be stonithed. If I do need to list them, then I assume I
have to add names to this list as more members join, correct?

I assume the modified example below would suffice for my script. Is
there an easy way to implement this on a live system and test that HA
can stonith? It should be a test run with the plugs unplugged, so that
it doesn't actually kill systems, and it should not touch any other
resources. I only want enough to ensure that stonith is working and can
be called by HA, rather than just testing from the command line.

Sorry for sounding dumb if it's the obvious, getting old ;)

<clone id="DoFencing">
  <instance_attributes>
    <attributes>
      <nvpair name="clone_max" value="2"/>
      <nvpair name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="child_DoFencing" class="stonith"
             type="external/powerio" provider="heartbeat">
    <operations>
      <op name="monitor" interval="5s" timeout="20s" prereq="nothing"/>
      <op name="start" timeout="20s" prereq="nothing"/>
    </operations>
    <instance_attributes>
      <attributes>
        <nvpair name="hostlist" value="castor,pollux"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>


George
-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org
[mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of matilda
matilda
Sent: Friday, 19 October 2007 5:43 PM
To: General Linux-HA mailing list
Subject: Re: RE: [Linux-HA] CRM and STONITH questions

>>> "Spam Filter" <Spam at citadelcomputer.com.au> 19.10.2007 04:36 >>>
> Hi,

Also hi, hi all,

> Is the nvpair for clone_max and clone_node_max a HA parameter or meant
>for my script? If HA, how do I know if I need the example settings or
>changed for a 2 node fail over system?

The stonith plugin for HAv2 has to be configured like a normal resource
(single resource or clone resource). The configuration example in the
wiki article uses a clone configuration. In the example you have a
two node cluster, therefore the example states
'<nvpair name="clone_max" value="2"/>'
because a maximum of 2 clones is requested. Without any other
configuration these 2 clones can both run on one node if requested. But
this doesn't make much sense if exactly this node goes crazy (has to be
stonithed). Because of this there is the config snippet
'<nvpair name="clone_node_max" value="1"/>'
saying that at most 1 clone may run on any one node.
These two config snippets together lead to a situation (in normal
circumstances) where exactly one stonith clone runs on every node.
Whether a node shoots the other node or itself is NOT specified
by this configuration.
Short answer to your question: clone_max and clone_node_max are
configuration parameters for HA (for stonithd in the end), not for
your script.

> What exactly does the "monitor" do, is it just a status check as my
> device is a webpage and passing a 'status' returns a success if it can
> reach the website to stonith the nodes?
The monitor action does the same as with a normal resource: it checks
whether this resource is operational. If you have configured the monitor
action, stonithd calls the external stonith plugin with the argument
'status'. If the external stonith plugin returns with return code 0,
everything is fine. If it returns with something different, stonithd
assumes a failure of the plugin (stonith channel) and propagates this
failure to the deciding instance of HA (lrm->crm->pengine).
In an error case the failcount of this stonith resource is incremented.
Failover behaviour is the same as for normal resources (gurus out there:
please correct me if I'm saying something wrong).
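
As an illustration of the 'status' call described above, here is a
minimal sketch in Python (not the PHP of the attached script). It
assumes the device is a web page, as in the question. DEVICE_URL, the
function names and the use of exit code 1 for failure are all
hypothetical; the text above only says that 0 means healthy and
anything else means failure.

```python
import urllib.error
import urllib.request

# Hypothetical address of the power switch's web interface.
DEVICE_URL = "http://powerio.example/status"


def device_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the device's web page answers successfully."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, HTTP error, ...
        return False


def status(url=DEVICE_URL, probe=device_reachable):
    """Map reachability onto an exit code: 0 is healthy, 1 is failed."""
    return 0 if probe(url) else 1


# A real plugin would finish with something like:
#     sys.exit(status())
# when it is called with 'status' as its first argument.
```

The probe is passed in as a parameter only so that the mapping to exit
codes can be exercised without a real device.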

> What does the start and timeout meant for as well?
The same as for normal resources.

> For the parm1 and parm2 attributes, if my script uses the "hostlist"
> environment variable do I need to pass this in here or is it
> automatically set when the stonith is called.etc.etc.
'etc. etc.' is a little bit unspecific, don't you think?

To the first part of your question: if a stonith plugin needs
parameters, these parameters are passed as environment variables. The
snippet in the example:
    <instance_attributes>
      <attributes>
        <nvpair name="parm1-name" value="parm1-value"/>
        <nvpair name="parm2-name" value="parm2-value"/>
        <!-- ... -->
      </attributes>
    </instance_attributes>
defines two parameters 'parm1-name' and 'parm2-name' and
the associated values. If you configure the stonith plugin that way,
the stonith plugin is called with these environment variables set.
(Caution: This is not true for ALL of the calls to the stonith plugin.
Only to those which need this information (on, off, reset))
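
A minimal sketch of how a plugin might read such a parameter from the
environment, assuming (as the caution above says) that the variable may
be absent on calls other than on, off and reset. It is written in
Python rather than the PHP of the attached script. The function names
are hypothetical, and accepting both commas and spaces as separators is
an assumption.

```python
import os

# Actions that act on a node. Following the caution above, only these
# are assumed to require the configured parameters.
ACTIONS_NEEDING_HOSTLIST = {"on", "off", "reset"}


def parse_hostlist(raw):
    """Split 'castor,pollux' (or 'castor pollux') into node names."""
    return raw.replace(",", " ").split()


def hostlist_for(action, env=os.environ):
    """Return the configured node names for this call.

    Raises ValueError only when a fencing action is requested without
    a hostlist. Other calls simply get whatever is configured, which
    may be an empty list.
    """
    hosts = parse_hostlist(env.get("hostlist", ""))
    if action in ACTIONS_NEEDING_HOSTLIST and not hosts:
        raise ValueError("hostlist must be set for action " + action)
    return hosts
```

Written this way, the script does not exit with 1 at the very beginning
just because "hostlist" is missing. It fails only when it is asked to
act on a node without knowing which nodes it controls.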

Now to the 'hostlist': the stonith plugin can be one that can stonith
more than one node, like a stonith machine gun ;-)
In the startup phase of the stonith plugin the plugin is called with
the first argument 'gethosts' (see documentation). The stonith plugin
has to answer with exactly one node name (aka hostname) per line. But
it's o.k. to send more than one line to state that the plugin is able to
shoot more nodes. After that stonithd (or someone else in the machinery)
knows whom to ask when a node has to be shot.

When the external stonith plugin is called to shoot a node (the first
parameter is 'reset'), then the second parameter is the name of the node
to shoot. (By the way, I have to correct my last published stonith
plugin, arghhh)
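
The two calls just described can be sketched as follows, in Python. This
is only an illustration under stated assumptions: the host-listing
action is spelled 'gethosts' here (the name used by the external plugin
interface as far as I know), power_cycle() is a hypothetical stand-in
for the real device call, and refusing nodes that are not in "hostlist"
is a design choice of this sketch, not something the interface requires.

```python
def power_cycle(node):
    """Hypothetical stand-in for the real device call; pretends to succeed."""
    return True


def run(argv, env, fence=power_cycle):
    """Return (exit_code, stdout_text) for 'gethosts' and 'reset <node>'."""
    hosts = env.get("hostlist", "").replace(",", " ").split()
    action = argv[1] if len(argv) > 1 else ""

    if action == "gethosts":
        # One node name per line tells the caller whom this plugin can shoot.
        return 0, "".join(name + "\n" for name in hosts)

    if action == "reset":
        # The action comes first, the node to shoot is the second argument.
        if len(argv) < 3 or argv[2] not in hosts:
            return 1, ""
        return (0 if fence(argv[2]) else 1), ""

    # Other actions (on, off, status, getinfo-*) are left out of this sketch.
    return 1, ""
```

A real plugin would write the returned text to standard output and exit
with the returned code. Note the argument order: it is 'reset castor',
not 'castor reset' as in the original command-line form of the script.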

The other interface calls (getconfignames, getinfo-devid,
getinfo-devname, getinfo-devdescr, getinfo-devurl, getinfo-xml) are
calls to the external stonith plugin to present meta-information to the
controlling instance. They are called at the start time of the plugin.
The information returned there must be consistent with the parameters
your stonith plugin needs. (E.g. the parameters returned by the call to
'getconfignames' must match the parameters returned as an XML snippet by
the call to 'getinfo-xml'.)
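
The consistency requirement can be checked mechanically. The following
Python sketch compares the names a plugin would print for
'getconfignames' with the names declared in its 'getinfo-xml' reply.
The XML layout shown (a <parameters> element containing <parameter
name="..."> children) is an assumption about what the plugin returns,
and CONFIG_NAMES, INFO_XML and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# What the plugin would print for 'getconfignames' (assumed: one parameter).
CONFIG_NAMES = ["hostlist"]

# What the plugin would print for 'getinfo-xml' (assumed layout).
INFO_XML = """<parameters>
  <parameter name="hostlist" unique="1" required="1">
    <content type="string"/>
    <shortdesc lang="en">Hostlist</shortdesc>
    <longdesc lang="en">Nodes this device can power-cycle.</longdesc>
  </parameter>
</parameters>"""


def names_in_xml(xml_text):
    """Collect the name attribute of every <parameter> element."""
    root = ET.fromstring(xml_text)
    return [param.get("name") for param in root.iter("parameter")]


def metadata_is_consistent(config_names, xml_text):
    """True when both replies describe exactly the same parameters."""
    return sorted(config_names) == sorted(names_in_xml(xml_text))
```

Running such a check while developing the plugin catches a parameter
that was added to one reply but forgotten in the other.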


> I'm totally lost on where the detailed info for this is so I can
> successfully make this work.
I think this information brings light into the dark. If it lets you
understand the way stonith plugins work, then you have (!!) to put an
article on the wiki explaining that. That will be the price you have to
pay.  ;-))


Best regards
Andreas Mock



-------------- next part --------------
A non-text attachment was scrubbed...
Name: powerio.php
Type: application/octet-stream
Size: 5798 bytes
Desc: powerio.php
URL: <http://lists.linux-ha.org/pipermail/linux-ha/attachments/20071019/5b301214/attachment.obj>

