[Linux-HA] deadtime, warntime, and drbd

Jason Joines support at bus.okstate.edu
Fri Mar 4 08:43:21 MST 2005


    I recently experienced the "Cluster node returning after partition" 
problem described in FAQ #12.  I have two nodes and two resource groups, 
and each node is the preferred node for one of the groups.  Nodea is the 
preferred node for drbd0, its filesystem, an IP address, and Samba. 
Nodeb is the preferred node for drbd1.  Both nodes are connected to a 
public 100 Mbps switch via eth0 and a private 1 Gbps switch via eth1.
    At the time this occurred, nodea was serving SMB requests to a large 
number of clients via eth0.  I had mounted drbd1 on nodeb, exported it 
via NFS, and was rapidly copying the entire filesystem of another box to 
it via eth1.  Apparently the load got high enough on nodeb that 
communication between the nodes failed and mass confusion ensued (at 
least, that's what I can make of the logs).  Eventually nodeb rebooted 
itself, the DRBD devices went into either StandAlone or Disconnected 
mode, and I had to manually tell nodea to take the SMB resource group back.
    My timing settings in ha.cf at the time were:
keepalive 1
deadtime 16
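    For the record, as I understand the ha.cf documentation, those two 
directives mean roughly this (just the timing lines, the rest of ha.cf 
omitted):
keepalive 1      # send a heartbeat packet every 1 second
deadtime 16      # declare the other node dead after 16 seconds of silence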
    Following the FAQ suggestion I have upped deadtime to 64 and set 
warntime to 16 so I can watch the logs for a while.  However, I'm unsure 
how my drbd timing settings are interacting with this.  They were, and 
at the moment still are:
connect-int 8
ping-int 4
timeout 20
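    If I'm reading the drbd.conf man page correctly, the units there are 
not all seconds (timeout is given in tenths of a second), so in the net 
section those settings amount to roughly:
connect-int 8    # retry a lost connection every 8 seconds
ping-int 4       # send a DRBD keep-alive packet after 4 seconds of idle
timeout 20       # 20 tenths of a second, i.e. drop the peer after 2 seconds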
    Any suggestions for modifying these settings to be more in tune with 
heartbeat?

Jason Joines
=================================


