[Linux-HA] CRM and STONITH questions

Andrew Beekhof beekhof at gmail.com
Fri Oct 19 04:55:04 MDT 2007

On 10/19/07, Spam Filter <Spam at citadelcomputer.com.au> wrote:
> OK, I got a better understanding now.
> Attached is my small script (missing one class file but not needed to
> show) which is my stonith script.
> It implements all the required methods, but at the beginning it checks
> that the "hostlist" environment variable is set; otherwise it exits with 1.
> I don't know whether this is always set whenever the script is called,
> or only when a stonith action (on|off|reset) is needed.

If you set a parameter called "hostlist" in the CIB, then it should be
available all the time.

The alternative is to have the stonith agent figure out which hosts it can shoot.
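
For illustration, here is a minimal sketch of how an external plugin could
handle both cases: it needs "hostlist" only for the calls that use it, so a
plain 'status' call does not fail when the variable is missing. The names
powerio_main and power_switch are made up for this example, power_switch is
a stub that only echoes, and the getinfo-xml output is modelled on the
layout the shipped external plugins use, so treat it as an assumption and
compare it with a plugin on your system.

```shell
#!/bin/sh
# Hypothetical skeleton for an external STONITH plugin (e.g. external/powerio).
# power_switch stands in for the real device call; here it only echoes.

power_switch() {
    # $1 = action (on|off|reset), $2 = node name
    echo "would $1 $2"
}

powerio_main() {
    action=$1
    node=${2:-}
    case "$action" in
    gethosts)
        # One node name per line; hostlist may be comma or space separated.
        [ -n "${hostlist:-}" ] || return 1
        echo "$hostlist" | tr ', ' '\n\n' | sed '/^$/d'
        ;;
    on|off|reset)
        # Only these calls need both hostlist and a target node.
        [ -n "${hostlist:-}" ] && [ -n "$node" ] || return 1
        power_switch "$action" "$node"
        ;;
    status)
        # Should return 0 only if the device is reachable; stubbed here.
        return 0
        ;;
    getconfignames)
        echo "hostlist"
        ;;
    getinfo-devid|getinfo-devname|getinfo-devdescr)
        echo "powerio example STONITH device"
        ;;
    getinfo-devurl)
        echo "http://example.invalid/"
        ;;
    getinfo-xml)
        cat <<'EOF'
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">Hostlist</shortdesc>
<longdesc lang="en">Hosts this device can fence</longdesc>
</parameter>
</parameters>
EOF
        ;;
    *)
        return 1
        ;;
    esac
}

# A real plugin file would end with:
#   powerio_main "$@"; exit $?
```

With hostlist="castor,pollux" set, 'powerio_main gethosts' prints castor and
pollux on separate lines, and 'powerio_main reset castor' prints
"would reset castor".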

> Now from the information you gave below, I think my script does what it
> should, except maybe for the environment stuff I mentioned above.
> I originally had it so you can call it like this:
> ./powerio castor reset
> ./powerio castor off
> Etc.
> But after reading about the hostlist, I thought it would be better if it
> could handle multiple hosts and be compatible with the way HA does the
> environment stuff..*shrugs*
> Now onto loading the stonith into the cib, do I specifically need to put
> all host names capable of stonith as the parameter


>  below as I would've
> thought all members on HA would/could be stonithed... If not, then I
> assume I need to add names to this list as more members join, correct?
> I assume the modified example below would suffice for my script. Is there
> an easy way to implement this on a live system and test that HA can
> stonith, but as a test run with the plugs unplugged so it doesn't
> actually kill systems or touch any other resources? Only enough to
> ensure stonithing is working and can be called by HA, rather than
> just testing from the command line.

There is a command called stonith that you can use to test it.
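
As a rough sketch of what such a test could look like: the wrapper below
(run and stonith_check are names invented for this example) only prints the
commands unless RUN=1 is set. It assumes your stonith version accepts
name=value parameters on the command line; check "stonith -h" if unsure.
It sticks to the non-destructive options (-n, -l, -S). "-T reset <node>"
would really fence the node, so leave that for when the plugs are pulled.

```shell
# Hypothetical helper: print, and only with RUN=1 execute, a command.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# $1 = plugin type (e.g. external/powerio), $2 = hostlist value
stonith_check() {
    run stonith -t "$1" -n                 # parameter names the plugin expects
    run stonith -t "$1" hostlist="$2" -l   # hosts the device claims to control
    run stonith -t "$1" hostlist="$2" -S   # device status
}
```

For example, "stonith_check external/powerio castor,pollux" prints the three
commands; "RUN=1 stonith_check external/powerio castor,pollux" runs them.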

> Sorry for sounding dumb if it's the obvious, getting old ;)
> <clone id="DoFencing">
>   <instance_attributes>
>     <attributes>
>       <nvpair name="clone_max" value="2"/>
>       <nvpair name="clone_node_max" value="1"/>
>     </attributes>
>   </instance_attributes>
>   <primitive id="child_DoFencing" class="stonith"
> type="external/powerio" provider="heartbeat">
>     <operations>
>       <op name="monitor" interval="5s" timeout="20s" prereq="nothing"/>
>       <op name="start" timeout="20s" prereq="nothing"/>
>     </operations>
>     <instance_attributes>
>       <attributes>
>         <nvpair name="hostlist" value="castor,pollux"/>
>       </attributes>
>     </instance_attributes>
>   </primitive>
> </clone>
> George
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of matilda
> matilda
> Sent: Friday, 19 October 2007 5:43 PM
> To: General Linux-HA mailing list
> Subject: Re: RE: [Linux-HA] CRM and STONITH questions
> >>> "Spam Filter" <Spam at citadelcomputer.com.au> 19.10.2007 04:36 >>>
> > Hi,
> Also hi, hi all,
> > Is the nvpair for clone_max and clone_node_max a HA parameter or meant
> >for my script? If HA, how do I know if I need the example settings or
> >changed for a 2 node fail over system?
> The stonith plugin for HAv2 has to be configured like a normal resource
> (single resource or clone resource). The configuration example in the
> wiki article uses a clone configuration. In the example you have a
> two node cluster, therefore the example states
> '<nvpair name="clone_max" value="2"/>'
> because a maximum of 2 clones is requested. Without any other
> configuration these 2 clones could both run on one node. But this
> doesn't make much sense if exactly this node goes crazy (has to be
> stonithed).
> Because of this there is the config snippet
> <nvpair name="clone_node_max" value="1"/>
> saying that at most 1 clone may run on any one node.
> These two config snippets together lead to a situation (in normal
> circumstances) where exactly one stonith clone runs on every node.
> One node can shoot the other node or itself. That is NOT specified
> by this configuration.
> Short answer to your question: clone_max and clone_node_max are
> config parameters for stonithd at the end.
> > What exactly does the "monitor" do, is it just a status check as my
> > device is a webpage and passing a 'status' returns a success if it can
> > reach the website to stonith the nodes?
> The monitor action does the same as with a normal resource: it checks
> whether the resource is operational. If you have configured the monitor
> action, stonithd calls the external stonith plugin with the argument
> 'status'. If the plugin exits with return code 0, everything is fine;
> if it returns anything else, stonithd assumes a failure of the plugin
> (stonith channel) and propagates this failure to the deciding instance
> of HA (lrm->crm->pengine).
> In an error case the failcount of this stonith resource is incremented.
> Failover behaviour is the same as for normal resources (gurus out there:
> please correct me if I'm saying something wrong).
> > What are the start and timeout meant for as well?
> The same as for normal resources.
> > For the parm1 and parm2 attributes, if my script uses the "hostlist"
> > environment variable do I need to pass this in here or is it
> > automatically set when the stonith is called.etc.etc.
> 'etc., etc.' is a little unspecific, don't you think?
> To the first part of your question: if a stonith plugin needs parameters,
> these parameters are passed as environment variables. The snippet in
> the example:
>     <instance_attributes>
>       <attributes>
>         <nvpair name="parm1-name" value="parm1-value"/>
>         <nvpair name="parm2-name" value="parm2-value"/>
>         <!-- ... -->
>       </attributes>
>     </instance_attributes>
> defines two parameters 'parm1-name' and 'parm2-name' and
> the associated values. If you configure the stonith plugin that way,
> the stonith plugin is called with these environment variables set.
> (Caution: This is not true for ALL of the calls to the stonith plugin.
> Only to those which need this information (on, off, reset))
> Now to the 'hostlist': a stonith plugin can be one that can stonith
> more than one node, like a stonith machine gun ;-)
> In the startup phase the plugin is called with the first argument
> 'gethosts' (see documentation). The plugin has to answer with exactly
> one nodename (aka hostname) per line, but it's o.k. to send more than
> one line to state that the plugin is able to shoot more nodes. After
> that, stonithd (or someone else in the machinery) knows whom to ask
> when a node has to be shot.
> When the external stonith plugin is called to shoot a node (1st parameter
> is 'reset'), then the second parameter is the name of the node
> to shoot. (By the way, I have to correct my last published stonith
> plugin, arghhh)
> The other interface calls (getconfignames, getinfo-devid,
> getinfo-devname, getinfo-devdescr, getinfo-devurl, getinfo-xml) are
> calls to the external stonith plugin to present meta-information to the
> controlling instance. They are called at the start time of the plugin.
> The information returned there must be consistent with the parameters
> your stonith plugin needs (e.g. the parameters returned by the call to
> 'getconfignames' must match the parameters returned as an XML snippet
> by the call to 'getinfo-xml').
> > I'm totally lost on where the detailed info for this is so I can
> > successfully make this work.
> I think this information brings light into the dark. If it lets you
> understand the way stonith plugins work, then you have (!!) to put
> an article on the wiki explaining that. That will be the price you have
> to pay.  ;-))
> Best regards
> Andreas Mock
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
