[Linux-HA] Re: deadtime, warntime, and drbd

Alan Robertson alanr at unix.sh
Mon Mar 7 13:04:50 MST 2005

Jason Joines wrote:
> Lars Marowsky-Bree wrote:
>> On 2005-03-04T09:43:21, Jason Joines <support at bus.okstate.edu> wrote:
>>>   At the time this occurred, nodea was serving smb requests to a 
>>> large number of clients via eth0.  I had mounted drbd1 on nodeb, 
>>> exported it via NFS, and was rapidly copying the entire filesystem of 
>>> another box to it via eth1.  Apparently the load got high enough on 
>>> nodeb that communication between the nodes failed and mass confusion 
>>> ensued (at least that's what I can make of the logs).  Eventually 
>>> nodeb rebooted itself, the drbds went into either StandAlone or 
>>> Disconnected mode and I had to manually tell nodea to take the smb 
>>> resource group back.
>> It literally rebooted itself? Are you using the watchdog timer?
>> Please provide the log messages of the node directly prior to the
>> reboot.
>> Sincerely,
>>    Lars Marowsky-Brée <lmb at suse.de>
>    Yep, literally.  I'm having trouble getting the logs through due to 
> the 40 KB message size limit on the list.  Looks like mine hit 57 KB.  
> I'm going to try to send them separately.
>    Honestly, I don't even know what the "watchdog timer" is.  Both boxes 
> are Dell PowerEdges.  Nodea is a 2450 and nodeb is a 2550.  Both are 
> using onboard Adaptec aic7899 Ultra160 SCSI adapters.  Both boxes are 
> using IBM Ultrastar, 146 GB, Ultra320 SCSI drives.  Both have drbd0 as 
> sdb and drbd1 as sdc.  The following messages contain everything from 
> the logs on both boxes that contains drbd OR ipfail OR heartbeat from 
> the time I started the NFS operation on nodeb (12:02:42) through the 
> reboot of nodeb (12:38:16) up until nodeb came back up (12:42:38).

Send your big message.

I'll find it and approve it.

     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce

