[Linux-HA] STONITH Module for Dell DRAC5

Andrew Beekhof beekhof at gmail.com
Mon May 14 10:50:04 MDT 2007


On 5/14/07, Alan Robertson <alanr at unix.sh> wrote:
> Andrew Beekhof wrote:
> >
> > On May 14, 2007, at 2:36 PM, Alan Robertson wrote:
> >
> >> Dejan Muhamedagic wrote:
> >>> On Fri, May 11, 2007 at 10:04:11AM +0200, Th.Paschy, hepasoft oHG wrote:
> >>>> Hi all,
> >>>>
> >>>> I am a new user of heartbeat.
> >>>>
> > >>>> I configured an active/passive cluster with two Dell PE1900s based on
> > >>>> SuSE Linux with heartbeat 2.0.8 (r1-style). After some problems with
> > >>>> DRBD resources after a cold reset of the master node (locks not
> > >>>> removed), which were fixed by Phillipp Reissner last weekend, all
> > >>>> works fine.
> >>>>
> > >>>> Next I was looking for a stonith module for the Dell Remote Access
> > >>>> Controller DRAC 5, but I found only one for the drac3. In the drac5
> > >>>> the layout of the embedded web interface has been changed, so the
> > >>>> drac3 module won't work.
> >>>>
> > >>>> So I've written my own module, strongly based on the apcmaster module.
> > >>>> The module uses the SM-CLP command line interface of the drac5 via
> > >>>> telnet. I'm really not a good C programmer, but it works perfectly.
> >>>>
> > >>>> But there is one problem (and the drac3 module would have it too)
> > >>>> if the server loses its power connection: the Remote Access Card is
> > >>>> then not accessible, the fencing process never stops, and so no
> > >>>> resource takeover takes place unless you manually take corrective
> > >>>> action. So a redundant power supply is strongly recommended.
> >>>>
> >>>> I've seen that other users are looking for a drac5 module too, so I've
> >>>> attached the source of the drac5 module.
> >>>
> > >>> Thanks for the contribution! Alan will probably want to do the usual
> > >>> legal chanting.
> >>
> >> I'll send you an email on this.
> >>
> > >>>> I would be glad if someone could tell me how I can handle the
> > >>>> described problem of never-ending fencing when access to the drac
> > >>>> is lost (because of a power loss or a network connection failure).
> >>>
> >>> Unfortunately, there's no workaround. If heartbeat cannot stonith
> >>> the node, it will go on trying forever. If stonith is configured,
> > >>> we must make sure that the node is rebooted or shut down. If the
> >>> stonith device is not accessible, well, too bad. The UPS based
> >>> stonith devices are definitely preferable to the lights-out
> >>> embedded kind.
> >>
> >> Sometime in the past, I asked Andrew for a feature which would allow the
> >> takeover to proceed after a certain number of failed STONITHs, if things
> >> were configured to allow that.  I don't remember whether he did that
> >> or not.
> >
> > It never got implemented.
> >
> >>
> > >> For these kinds of cases, it seems like a good thing.
> >
> > I disagree - remember the "you can't make it up" part of "You don't know
> > what you don't know".
> > In general, you don't know that the node is dead, only that the stonith
> > device is...
> >
> > In the case of this plugin, apparently, the stonith device being dead
> > implies the host is also.
>
> Or a network failure, or other things.  It's not a certain thing.
>
> Although one wishes not to make things up, and avoids it when one can,
> for some configurations it's better than sitting on one's hands.  And,
> for split-site configurations, it would mean leaving out STONITH
> completely, when one should _try_ and use it.  In the split-site case,
> all stonith plugins would be unusable if the inter-site link fails.
>
> > This makes me inclined to think that the plugin is therefore a "better"
> > place to implement such behavior.
> >
> > It also means that such a feature could be turned on for individual
> > stonith devices rather than unilaterally - which may not be a good idea
> > especially in mixed-stonith-device environments
>
> I can see the logic to this.  But, of course, it is more work to
> implement it 20 times in 20 plugins than in one place.  And probably
> then 20 slightly different criteria for detecting it and 4 or 5
> different ways to specify it in the configuration.
>
> And, of course, since most of the plugin authors don't work on the
> project, and many no longer have access to the necessary hardware, it's
> probably impossible to implement in practice in the plugins.
>
> I suppose it could be implemented in the stonith daemon instead.
> Unfortunately, it has no access to configuration information, so it
> would be painful to specify to the stonithd compared to the CRM or the
> plugins.
>
> If stonithd were a resource, then configuring it would be easy.
> [although having the stonith daemon be a resource would create its own
> problems].
>

Or you could implement it in stonithd and enable/disable it in the
stonith resource definition, at the same time as it loads the plugin.
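
A minimal sketch of that idea, written in Python for brevity rather than in
stonithd's C. Every name here (StonithDevice, max_failed_attempts, fence, the
"assumed-dead" result) is hypothetical and is not part of heartbeat or
stonithd. It only shows the shape of a per-device opt-in: the retry loop lives
in one place, and each stonith resource decides whether giving up is allowed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StonithDevice:
    """Hypothetical per-device settings, read from the stonith resource definition."""
    name: str
    # Hypothetical option: 0 keeps the behaviour described in this thread
    # (retry forever); N > 0 allows takeover after N failed attempts.
    max_failed_attempts: int = 0


def fence(device: StonithDevice, reset: Callable[[], bool]) -> str:
    """Call reset() until it succeeds or the device's failure limit is reached.

    reset stands in for the plugin's reset operation and returns True on success.
    Returns "fenced" on success, or "assumed-dead" when the device opted in
    to giving up and the limit was hit.
    """
    failures = 0
    while True:
        if reset():
            return "fenced"
        failures += 1
        # Only devices that explicitly set a limit are ever allowed to give up.
        if device.max_failed_attempts and failures >= device.max_failed_attempts:
            return "assumed-dead"
```

A real implementation would also wait between attempts and log each failure.
The point of the sketch is only that the limit is a property of the individual
device, so a mixed environment could enable it for a lights-out card and leave
it off for a UPS-based device.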

