[Linux-HA] Re: deadtime, warntime, and drbd
Lars Marowsky-Bree
lmb at suse.de
Tue Mar 8 09:59:51 MST 2005
On 2005-03-08T09:14:59, Jason Joines <support at bus.okstate.edu> wrote:
> Nope, the only messages whatsoever in the logs between 12:31:04 and
> 12:38:00 are all slapd messages.
OK, so at least the node itself is alive. That's fine, that's all I
wanted to know to make sure it's not nodea's clock jumping crazily
around.
> I don't have a STONITH configuration.
> The intent here is to have an active-active setup with one node using
> drbd0 and the other drbd1. I just don't have anything running on drbd1
> yet. I thought STONITH wasn't for use in an active-active setup
STONITH is needed regardless of active/passive or active/active
configuration.
> >What now happens is that heartbeat tries to recover from the split brain
> >scenario by restarting itself on both sides, which is expected
> >behaviour.
> >
> From the outside, nodeb still looked fine up until it rebooted. A
> third box, backup, was copying files to its drbd1 via NFS over eth1 and
> still making progress. Though I'm guessing it may have been saturating
> eth1 with NFS traffic and interfering with heartbeat communications.
> There was a lot of activity in nodeb's logs.
Even NFS shouldn't saturate the network so badly that 430 packets get
lost. That's 7 minutes without a single packet making it through! The
probability of that is, like, rather low ;-)
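A back-of-the-envelope sketch of why this is so unlikely. The 430 packets and 7 minutes come from the paragraph above; the loss rates tried below and the assumption that each packet is lost independently are illustrative only, and real congestion loss is bursty, so treat the result as an order-of-magnitude argument rather than a measurement.

```python
import math

LOST_PACKETS = 430        # figure quoted in the message above
OUTAGE_SECONDS = 7 * 60   # "7 minutes"


def implied_interval(lost, seconds):
    """Average spacing between heartbeat packets implied by the two figures."""
    return seconds / lost


def log10_prob_all_lost(loss_rate, packets):
    """log10 of the chance that `packets` consecutive packets are all lost,
    assuming independent loss with per-packet probability `loss_rate`
    (an assumption made for illustration)."""
    return packets * math.log10(loss_rate)


interval = implied_interval(LOST_PACKETS, OUTAGE_SECONDS)
print(f"implied interval: {interval:.2f} s per packet")

# Hypothetical per-packet loss rates, chosen only to show the trend.
for loss_rate in (0.5, 0.9, 0.99):
    exponent = log10_prob_all_lost(loss_rate, LOST_PACKETS)
    print(f"loss rate {loss_rate:.0%}: P(all 430 lost) ~ 10^{exponent:.1f}")
```

The implied spacing is roughly one packet per second, and even with a hypothetical 90% per-packet loss rate the chance of losing all 430 in a row comes out below 10^-19 under these assumptions.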
And I assume you have more than one heartbeat medium, and it's unlikely
that they would all be saturated at the same time.
Now, NFS is known to cause resource starvation because it basically
sucks, but again, not for a whole 7 minutes.
> Pretty sure it's correct. I haven't yet set up any resources to run
> on drbd1. Since nodeb had drbd1 as primary, I had temporarily mounted
> it and exported it via nfs to a third box which accessed it via nodeb's
> administrative address on eth0.
>
> ########### haresources start ###########
> nodea drbddisk::drbd0
> Filesystem::/dev/drbd0::/home::xfs::rw,suid,dev,exec,noauto,nouser,async,noatime,quota,grpquota
> 172.18.88.102 smb
> nodeb drbddisk::drbd1
> ########### haresources stop ###########
OK, that looks fine. (Assuming that all of nodea's resources are on one
line in practice, which probably is the case.)
> The reboot wasn't on nodea, it was on nodeb. It shows up in the logs
> on nodeb. I sent the nodeb logs in a separate message.
Ohh. Then I need to rescan those. Sorry.
Sincerely,
Lars Marowsky-Brée <lmb at suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business