[Linux-HA] "Clones, Stonith and Suicide" The SysAdmin who had a nervous breakdown.

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Oct 3 11:18:41 MDT 2007


Hi Dave,

On Wed, Oct 03, 2007 at 08:49:23AM -0500, Dave Blaschke wrote:
> Dejan Muhamedagic wrote:
> >Hi,
> >
> >On Tue, Oct 02, 2007 at 10:55:03PM +0100, Peter Farrell wrote:
> >  
> >>On 02/10/2007, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> >>    
> >>>Hi,
> >>>
> >>>On Tue, Oct 02, 2007 at 05:17:38PM +0100, Peter Farrell wrote:
> >>>      
> >>>>Can someone verify my CIB please?
> >>>>
> >>>>It's not working as intended and the more I read the less I 
> >>>>understand...
> >>>>I've stared at the config for the past 2 days hoping to be struck by
> >>>>sudden understanding... hasn't happened yet.
> >>>>        
> >>>Don't worry, the learning curve is extremely steep. We all need
> >>>quite some patience.
> >>>
> >>>      
> >>>>I don't understand how you make a rule, and then call that rule as a
> >>>>result of an action. I used the bit from the pingd FAQ page:
> >>>>http://www.linux-ha.org/v2/faq/pingd
> >>>>"Quickstart - Only Run my_resource on Nodes with Access to at Least
> >>>>One Ping Node"
> >>>>
> >>>>So - for my pingd clone, the operation is 'monitor' and 'on_fail=fence'
> >>>><op id="pingd-child-monitor" name="monitor" interval="20s"
> >>>>timeout="40s" prereq="nothing" on_fail="fence"/>
> >>>>
> >>>>I assume that this literally means:
> >>>>"ask the LRM to see if pingd is running every 20s, if after 40s pingd
> >>>>is not running, call it 'failed', and as it's 'failed' - fence it off,
> >>>>which forces the resource to migrate to another node and marks this
> >>>>one as 'degraded' and will not allow the resource to run until it's been
> >>>>cleaned up"
> >>>>
> >>>>Is that right? If so, then this bit I'm OK with.
> >>>>        
> >>>No, not exactly. The monitor operation may fail (i.e. the
> >>>resource agent says that the resource isn't running) or time out
> >>>(that's what you described). Of course, both are considered to be
> >>>failures by the CRM. on_fail=fence means that if this operation
> >>>fails, the node will be fenced, i.e. rebooted if you have an
> >>>operational stonith device. Perhaps a tad harsh for a monitor
> >>>failure.
> >>>      
> >>1. The approach for me is (this is a test cluster - but I want to use
> >>it to replace a production one) - if either of the load balancers
> >>can't ping one or two routers in my DMZ, then this must mean they're
> >>dead. I figured if they can't see the router - how the hell can they
> >>see the apache servers they're meant to be managing?
> >>Is this 'correct political thought' or a sloppy foundation to begin with?
> >>    
> >
> >It's just that the resources _are_ going to move. No need to kill
> >the cooperating node.
> >
> >  
> >>2. I didn't know that fence meant 'rebooted'. I thought it was sort of
> >>'fenced off' and left in a degraded state should someone want to poke
> >>around a bit.
> >>RE: Perhaps a tad harsh for a monitor failure - I agree. But what's a
> >>girl to do?
> >>Am I on the right track here? Do I want it rebooting? Do I just want
> >>Heartbeat to restart? Does it matter? If it comes up and the link is
> >>still dead - will it cycle forever w/ reboots?
> >>    
> >
> >Not sure, but could be. Whenever a node comes up all resources
> >are probed, i.e. one monitor operation is fired.
> >
> >  
> >>3. the real bit I'm missing: Let's say I want it rebooted after
> >>fencing.
> >>    
> >
> >Fencing _is_ rebooting.
> >
> >  
> Sorry to jump in the middle of this thread, but can't you also power off 
> the node by setting stonith_action to poweroff instead of reboot?  Of 
> course you need a stonith device that supports ST_POWEROFF...  I haven't 
> read through the code but I'd assume that option works.

True. Thanks for mentioning this.
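
For the archive, a rough sketch of the two settings discussed in
this thread, assuming Heartbeat 2.x CIB syntax. The nvpair ids are
made up for illustration, and on_fail="restart" is only one possible
milder alternative to "fence" for a monitor failure; please check
names and values against the DTD shipped with your version. As Dave
says, poweroff only helps if the stonith device supports it.

    <!-- crm_config section: power the node off instead of rebooting.
         The ids below are hypothetical. -->
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <nvpair id="opt-stonith-enabled" name="stonith_enabled"
                value="true"/>
        <nvpair id="opt-stonith-action" name="stonith_action"
                value="poweroff"/>
      </attributes>
    </cluster_property_set>

    <!-- pingd monitor op from the original post, changed so that a
         failed monitor restarts the resource rather than fencing
         the node (an assumption about what is wanted here). -->
    <op id="pingd-child-monitor" name="monitor" interval="20s"
        timeout="40s" prereq="nothing" on_fail="restart"/>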

Dejan