[Linux-HA] Re: deadtime, warntime, and drbd

Alan Robertson alanr at unix.sh
Fri Mar 4 23:21:28 MST 2005

Jason Joines wrote:
> Alan Robertson wrote:
>> Jason Joines wrote:
>>>    I recently experienced the "Cluster node returning after 
>>> partition" problem described in FAQ #12.  I have two nodes and two 
>>> resource groups, one is the preferred node for each.  Nodea is the 
>>> preferred node for drbd0, its filesystem, an IP address, and Samba.  
>>> Nodeb is the preferred node for drbd1.  Both are connected to a public 
>>> 100 Mbps switch via eth0 and a private 1 Gbps switch via eth1.
>>>    At the time this occurred, nodea was serving smb requests to a 
>>> large number of clients via eth0.  I had mounted drbd1 on nodeb, 
>>> exported it via NFS, and was rapidly copying the entire filesystem of 
>>> another box to it via eth1.  Apparently the load got high enough on 
>>> nodeb that communication between the nodes failed and mass confusion 
>>> ensued (at least that's what I can make of the logs).  Eventually 
>>> nodeb rebooted itself, the drbds went into either StandAlone or 
>>> Disconnected mode and I had to manually tell nodea to take the smb 
>>> resource group back.
>>>    My timing settings in ha.cf at the time were:
>>> keepalive 1
>>> deadtime 16
>>>    Following the FAQ suggestion I have upped deadtime to 64 and set 
>>> warntime to 16 so I can watch the logs for a while.  However, I'm 
>>> unsure how my drbd timing settings are interacting with this.  They 
>>> were, and at the moment still are:
>>> connect-int 8
>>> ping-int 4
>>> timeout 20
>>>    Any suggestions for modifying these settings to be more in tune 
>>> with heartbeat?
>> What version of heartbeat are you trying this on?
>    I'm using heartbeat 1.2.3 and drbd 0.7.10 on SuSE 9.2 with kernel 
> 2.6.10.

This sounds mostly reasonable.  What kind of disks are you using, with what 
kind of disk controller?  Do you have DMA enabled on the disks?

The DRBD deadtime (how long DRBD waits before declaring its peer dead) 
needs to be at least slightly shorter than the heartbeat deadtime.
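A minimal sketch of how one might sanity-check that ordering against the settings quoted above. It assumes that DRBD 0.7's `timeout` is given in tenths of a second while `ping-int` and `connect-int` are in seconds, and it treats `ping-int + timeout` as a rough estimate of how long DRBD takes to notice a silent peer; that estimate, the `keepalive < warntime < deadtime` rule, and the `timeout < ping-int` rule are assumptions for illustration. The class and function names are hypothetical; neither heartbeat nor DRBD ships them.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatTiming:
    keepalive: float  # ha.cf "keepalive": seconds between heartbeats
    warntime: float   # ha.cf "warntime": seconds before a late-heartbeat warning
    deadtime: float   # ha.cf "deadtime": seconds before a node is declared dead


@dataclass
class DrbdNetTiming:
    timeout_tenths: int  # drbd.conf net "timeout", assumed to be in 0.1 s units
    ping_int: int        # drbd.conf net "ping-int", in seconds
    connect_int: int     # drbd.conf net "connect-int", in seconds (not checked here)

    def detection_estimate(self) -> float:
        """Rough (assumed) time for DRBD to notice a silent peer:
        one idle ping interval plus the time allowed for the ping answer."""
        return self.ping_int + self.timeout_tenths / 10.0


def check_timing(hb: HeartbeatTiming, drbd: DrbdNetTiming) -> list:
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    if not hb.keepalive < hb.warntime < hb.deadtime:
        problems.append("expected keepalive < warntime < deadtime")
    if drbd.timeout_tenths / 10.0 >= drbd.ping_int:
        problems.append("DRBD timeout should be shorter than ping-int")
    if drbd.detection_estimate() >= hb.deadtime:
        problems.append("DRBD should detect a dead peer before heartbeat does")
    return problems


if __name__ == "__main__":
    # Values from this thread: ha.cf after the change, drbd.conf unchanged.
    hb = HeartbeatTiming(keepalive=1, warntime=16, deadtime=64)
    drbd = DrbdNetTiming(timeout_tenths=20, ping_int=4, connect_int=8)
    print(f"DRBD detection estimate: {drbd.detection_estimate():.1f} s")
    print(check_timing(hb, drbd) or "no conflicts found")
```

Under these assumptions DRBD would notice a dead peer after roughly 6 seconds, well inside the new 64-second heartbeat deadtime (and also inside the old 16-second one).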

I find the rebooting a little worrisome.

     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce
