[Linux-HA] STONITH Module for Dell DRAC5

Andrew Beekhof abeekhof at suse.de
Mon May 14 06:52:21 MDT 2007


On May 14, 2007, at 2:36 PM, Alan Robertson wrote:

> Dejan Muhamedagic wrote:
>> On Fri, May 11, 2007 at 10:04:11AM +0200, Th.Paschy, hepasoft oHG wrote:
>>> Hi all,
>>>
>>> I am a new user of heartbeat.
>>>
>>> I configured an active/passive cluster with two Dell PE1900 servers
>>> running SuSE Linux with heartbeat 2.0.8 (r1-style). After some problems
>>> with DRBD resources following a cold reset of the master node (locks
>>> were not removed), which Phillipp Reissner fixed last weekend,
>>> everything works fine.
>>>
>>> Next I looked for a stonith module for the Dell Remote Access
>>> Controller DRAC 5, but I found only one for the drac3. In the drac5
>>> the layout of the embedded web interface has changed, so the drac3
>>> module won't work.
>>>
>>> So I've written my own module, strongly based on the apcmaster module.
>>> The module uses the SM-CLP command line interface of the drac5 via
>>> telnet. I'm really not a good C programmer, but it works perfectly.
>>>
>>> But there is one problem (the drac3 module has it too): if the server
>>> loses its power connection, the Remote Access Card is not accessible,
>>> the fencing process never stops, and so no resource takeover takes
>>> place unless you manually take corrective action. So a redundant power
>>> supply is strongly recommended.
>>>
>>> I've seen that other users are looking for a drac5 module too, so I've
>>> attached the source of the drac5 module.
>>
>> Thanks for the contribution! Alan will probably want to do the usual
>> legal chanting.
>
> I'll send you an email on this.
>
>>> I would be glad if someone could tell me how to handle the described
>>> problem of never-ending fencing when access to the drac is lost
>>> (because of a power loss or a network connection failure).
>>
>> Unfortunately, there's no workaround. If heartbeat cannot stonith
>> the node, it will go on trying forever. If stonith is configured,
>> we must make sure that the node is rebooted or shut down. If the
>> stonith device is not accessible, well, too bad. The UPS based
>> stonith devices are definitely preferable to the lights-out
>> embedded kind.
>
> Sometime in the past, I asked Andrew for a feature which would allow the
> takeover to proceed after a certain number of failed STONITHs, if things
> were configured to allow that.  I don't remember whether he did that or
> not.

It never got implemented.

>
> For these kinds of cases, it seems like a good thing.

I disagree - remember the "you can't make it up" part of "You don't
know what you don't know".
In general, you don't know that the node is dead, only that the
stonith device is...

In the case of this plugin, apparently, the stonith device being dead
implies the host is also.
This makes me inclined to think that the plugin is therefore a
"better" place to implement such behavior.

It also means that such a feature could be turned on for individual
stonith devices rather than globally, since enabling it everywhere may
not be a good idea, especially in mixed-stonith-device environments.
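
A rough sketch of what such a per-device setting could look like follows.
It is Python rather than plugin C, and it is not heartbeat's or the stonith
library's API: StonithDevice, assume_dead_if_unreachable, max_attempts and
fence() are hypothetical names made up for illustration. It also assumes the
device signals "unreachable" by raising ConnectionError. Note that an
unreachable device only implies a dead host in the power-loss case; a network
failure to the drac looks the same from the outside.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class FenceResult(Enum):
    FENCED = "fenced"              # device confirmed the reset
    ASSUMED_DEAD = "assumed_dead"  # device unreachable and the device opted in
    FAILED = "failed"              # no confirmation; takeover stays blocked


@dataclass
class StonithDevice:
    # All fields are hypothetical, for illustration only.
    name: str
    # Returns True on a confirmed reset, False if the device refused;
    # raises ConnectionError if the device cannot be reached at all.
    reset: Callable[[], bool]
    # Per-device opt-in: only sensible when the device shares its power
    # feed with the host, as described for the drac.
    assume_dead_if_unreachable: bool = False
    max_attempts: int = 3


def fence(device: StonithDevice) -> FenceResult:
    """Try to fence; apply the opt-in policy only if every attempt was unreachable."""
    unreachable = 0
    for _ in range(device.max_attempts):
        try:
            if device.reset():
                return FenceResult.FENCED
        except ConnectionError:
            unreachable += 1
    if device.assume_dead_if_unreachable and unreachable == device.max_attempts:
        return FenceResult.ASSUMED_DEAD
    return FenceResult.FAILED
```

With this shape, a UPS-based device would leave the flag off and keep the
current behavior (no takeover without a confirmed reset), while a drac-style
device could opt in. A device that answers but refuses the reset never counts
as "assumed dead", because the host may still be running.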



