[Linux-HA] Re: deadtime, warntime, and drbd
alanr at unix.sh
Mon Mar 7 13:04:50 MST 2005
Jason Joines wrote:
> Lars Marowsky-Bree wrote:
>> On 2005-03-04T09:43:21, Jason Joines <support at bus.okstate.edu> wrote:
>>> At the time this occurred, nodea was serving smb requests to a
>>> large number of clients via eth0. I had mounted drbd1 on nodeb,
>>> exported it via NFS, and was rapidly copying the entire filesystem of
>>> another box to it via eth1. Apparently the load got high enough on
>>> nodeb that communication between the nodes failed and mass confusion
>>> ensued (at least that's what I can make of the logs). Eventually
>>> nodeb rebooted itself, the drbds went into either StandAlone or
>>> Disconnected mode and I had to manually tell nodea to take the smb
>>> resource group back.
>> It literally rebooted itself? Are you using the watchdog timer?
>> Please provide the log messages of the node directly prior to the
>> Lars Marowsky-Brée <lmb at suse.de>
> Yep, literally. I'm having trouble getting the logs through due to
> the 40 KB message size limit on the list. Looks like mine hit 57 KB.
> I'm going to try and send them separately.
> Honestly, I don't even know what the "watchdog timer" is. Both boxes
> are Dell PowerEdges. Nodea is a 2450 and nodeb is a 2550. Both are
> using onboard Adaptec aic7899 Ultra160 SCSI adapters. Both boxes are
> using IBM Ultrastar, 146 GB, Ultra320 SCSI drives. Both have drbd0 as
> sdb and drbd1 as sdc. The following messages contain everything from
> the logs on both boxes that contains drbd OR ipfail OR heartbeat from
> the time I started the NFS operation on nodeb (12:02:42) through the
> reboot of nodeb (12:38:16) up until nodeb came back up (12:42:38).
Send your big message.
I'll find it and approve it.
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce