[Linux-HA] STONITH reincarnation pause

Andrew Beekhof beekhof at gmail.com
Mon Dec 17 01:50:36 MST 2007


On Dec 13, 2007, at 3:28 PM, John Gorman wrote:

> Hi All
>
> I have successfully set up Heartbeat / Xen / LVM / DRBD / LVM
> to run 3 Xen processes on each side of a pair of machines
> that can fail over to the sister machine as a group.
> They are named a1xen and b1xen.
>
> Each group of 3 Xen consists of two Master-Master replicating
> MySQL servers and an application server running Point of
> Sale software that sales clerks log into.
>
> I have instrumented my resource scripts to show what happens
> when one node fails.  This is the log on b1xen:
>
> Thu Dec 13 01:18:44 AST 2007 stonith reset a1xen
> Thu Dec 13 01:18:48 AST 2007 drbddisk vga start
> Thu Dec 13 01:18:48 AST 2007 lvm start VolGroupA
> Thu Dec 13 01:18:51 AST 2007 xen start a1my1
> Thu Dec 13 01:18:51 AST 2007 xen start a1my2
> Thu Dec 13 01:18:51 AST 2007 xen start a1asp
> Thu Dec 13 01:20:02 AST 2007 xen stop a1my1
> Thu Dec 13 01:20:03 AST 2007 xen stop a1my2
> Thu Dec 13 01:20:03 AST 2007 xen stop a1asp
> Thu Dec 13 01:20:38 AST 2007 lvm stop VolGroupA
> Thu Dec 13 01:20:39 AST 2007 drbddisk vga stop
>
> Node a1xen really did fail:  I have flaky hardware to
> test with for this purpose.  Node b1xen did correctly
> fence a1xen and took over its services.  After a1xen
> rebooted, it correctly migrated the services back.
>
> Here is the problem: taking over the services right
> away like this doesn't achieve anything except to
> bounce the MySQL servers and irritate the users who
> log in only to be dumped again one minute later.
>
> What I am looking for is a way to tell the surviving
> node to reset the sick node and wait a while to see
> if it will come back before taking over its services.
>
> Any ideas?

Add a start delay (equal to how long you want the cluster to wait) to
the start operation of the first resource in the group(s).

e.g.
<op id="wait-for-a-while" name="start" interval="0" start_delay="100s"/>
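
For context, here is a rough sketch of where that op would sit in a Heartbeat 2.x CIB. The group and primitive ids are made up for illustration, and class="heartbeat" with type="drbddisk" is only an assumption based on the "drbddisk vga start" line in your log. Group members start in order, so delaying the first one holds back the rest.

```xml
<!-- Hypothetical ids; only the <op> line comes from the suggestion above. -->
<group id="group_a1xen">
  <!-- First resource in the group: its delayed start holds back the others. -->
  <primitive id="a1_drbd_vga" class="heartbeat" type="drbddisk">
    <operations>
      <op id="wait-for-a-while" name="start" interval="0" start_delay="100s"/>
    </operations>
    <!-- instance_attributes (e.g. the DRBD resource name "vga") omitted here -->
  </primitive>
  <!-- remaining primitives (LVM, Xen guests) follow unchanged -->
</group>
```

In your log the services were handed back roughly 80 seconds after the reset, so a delay in the region of 100s would likely have covered that reboot; tune it to how long your hardware really takes to come back.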

>
>
> Thanks, John
>
> John Gorman
> Master Merchant Systems
>
> P.S.  I wrote a nice external/ippower9258 stonith script
> to support the IPPower network power controller family.
> Is there some place that I should be submitting it to
> for other people to use?

This list is the right place.
You'll need a note from your employer though, saying that they
approve of you doing so.

>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
