DRBD performance question ?

Bruce Walker bruce at kahuna.cag.cpqcorp.net
Wed Mar 6 15:30:28 MST 2002


Ravi,
> Bruce,
> If protocol A is fully asynchronous, then it should not
> depend on the network bandwidth or on the remote server's
> (node 2) I/O bandwidth at all. Ideally, disk writes at
> the primary (node 1) should proceed at a rate somewhat
> close to node 1's own I/O bandwidth.

I guess it comes down to how one does "asynchronous" in
this context.  If, as I suspect, the data to be sent
to the replica is held in core and queued for sending, then
you will eventually run out of memory (or the kernel
will throttle you so it looks like you are out of memory).
At that point, when the next write comes in, it gets
blocked trying to queue the async write to the other node.
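
To make that concrete, here is a toy Python sketch (not DRBD's
code, just an illustration of the queueing behaviour I am
guessing at; the buffer size and sleep are arbitrary): a writer
pushes replica blocks into a bounded in-memory buffer while a
slower "network" thread drains it.  The first few writes return
immediately; once the buffer is full, every further write blocks
at the drain rate, even though the protocol is nominally
asynchronous.

import queue
import threading
import time

BUFFER_BLOCKS = 8          # pretend this is all the memory the kernel allows
buf = queue.Queue(maxsize=BUFFER_BLOCKS)

def network_sender():
    # Drains the buffer much more slowly than the writer can fill it.
    while True:
        block = buf.get()
        if block is None:
            return
        time.sleep(0.05)   # simulated slow network link

sender = threading.Thread(target=network_sender)
sender.start()

start = time.time()
for blk in range(32):
    buf.put(blk)           # blocks here once BUFFER_BLOCKS are queued
    print("queued block %2d at t=%.2fs" % (blk, time.time() - start))
buf.put(None)              # tell the sender to stop
sender.join()

Run it and you can watch the write rate collapse to the sender's
rate as soon as the buffer fills, which is the kind of throttling
I suspect is happening here.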

The only way around this would be to queue just the block
numbers rather than the data itself, and then read the data
back off disk when it is time to send it asynchronously to
the replica.  I doubt that Phillip is doing that.

Another possibility is that DRBD itself throttles so that
its async stream doesn't get too far behind.
Barriers for write ordering, as you suggested, are another
possibility.  Try the tests I suggested and we might be
able to infer whether it is just a volume thing.
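
To put rough numbers on the "volume thing": a back-of-the-envelope
model is that the first part of the test is absorbed into memory
at local disk speed and the rest drains at network speed.  The
disk and network rates below are taken from the benchmark output
further down; the 60MB buffer figure is purely a guess.

def effective_write_bw(test_mb, disk_bw=55.0, net_bw=3.1, buffer_mb=60.0):
    # Crude model: buffer_mb megabytes are absorbed at disk speed,
    # everything beyond that is throttled to the network drain rate.
    buffered = min(test_mb, buffer_mb)
    drained = test_mb - buffered
    seconds = buffered / disk_bw + drained / net_bw
    return test_mb / seconds

for size in (50, 100, 200):
    print("%3d MB test -> roughly %4.1f MB/s" % (size, effective_write_bw(size)))

With those guesses it predicts roughly 55 MB/s for a 50MB test,
7 MB/s for 100MB and 4 MB/s for 200MB.  If the measured numbers
for 50MB and 200MB move in that general direction, the bottleneck
is buffering plus network drain rather than anything in the
protocol itself.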

bruce


> However, this does not seem to happen with DRBD. As you
> can see in all the results shown on the DRBD web page,
> throughput is limited by the network or by the remote
> server's I/O bandwidth, and I got similar results in my
> own experiments. If the kernel is not throttling the
> I/Os, then something else must be, such as the barriers
> used for write ordering. What do you think?
> 
> I will run some other benchmarks and let you all know the
> results. Meanwhile, if you have any insight into this
> bottleneck, please email me.
> 
> Ravi 
> 
> --- Bruce Walker <bruce at kahuna.cag.cpqcorp.net> wrote:
> > Ravi,
> > Not sure whether you have pinned down the specific
> > reason the kernel is throttling, but at some point the
> > kernel must, and after that you won't get more than the
> > network bandwidth as throughput (my guess would be VM
> > throttling rather than network data structures, but it
> > is only a guess at this point).  To test this theory,
> > you could try sending more than the 100MB in the test
> > (say 200MB).  I would guess that the throughput would
> > decrease below the 9 MB/s you saw.  Similarly, if you
> > decreased it to say 50MB, you might see a dramatic hike
> > in throughput (because most or all of the data to be
> > sent to the replica could still be in memory).
> > 
> > Assuming DRBD started sending data near the beginning
> > of the test, you could expect about 30MB to have
> > arrived after the 10-second test, leaving the other
> > 70MB to be sent asynchronously (note that here you have
> > to be careful not to start another test while the async
> > traffic is still in flight).
> > 
> > Why is the network bandwidth only 3MB/s?  3MB/s isn't
> > terribly surprising, since that is about 30 megabits
> > per second on the wire, roughly a third of the nominal
> > capacity and not atypical ethernet utilization.  To get
> > 8MB/s on a single connection (as Alan indicated) would
> > be very impressive.
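
A quick conversion for the paragraph above (my own check,
counting roughly ten bits per byte on the wire as a rule of
thumb for framing and protocol overhead):

measured_mb_s = 3.11                                        # from the benchmark below
print("payload rate: %.1f Mbit/s" % (measured_mb_s * 8))    # ~24.9 Mbit/s
print("wire estimate: %.0f Mbit/s" % (measured_mb_s * 10))  # ~31 Mbit/s with overhead
print("raw link ceiling: %.1f MB/s" % (100 / 8.0))          # 12.5 MB/s on 100 Mbit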
> > 
> > Your last test (writing on node 2) is clearly not being
> > replicated (you got raw disk I/O bandwidth as the
> > result).
> > 
> > bruce
> > 
> > > Alan,
> > > 
> > > If protocol A is asynchronous, how does the network
> > > performance so drastically affect the performance
> > > of protocol A?
> > > 
> > > I am seeing similar results in my testing. I have
> > > attached the output from the benchmark. I have two
> > > identical 1.6GHz machines with 512MB of memory and
> > > 100Mb/s NICs. The test was conducted over a
> > > point-to-point link. I have a 22GB raid0 software
> > > RAID set up on two disks, /dev/hda and /dev/hdb,
> > > and I am running Linux 2.4.17.
> > > 
> > > It seems that the benchmark reads from /dev/zero
> > > and writes to /dev/null for the network bandwidth
> > > test. The network bandwidth is around 3MB/s. When
> > > the actual I/O is done, I seem to be getting about
> > > 10MB/s for protocol A on the primary node. If
> > > protocol A were asynchronous, I should be seeing
> > > something close to 50MB/s. The figure I get is not
> > > even close.
> > > 
> > > I have an idea. Because the number of pages allocated
> > > for disk I/O keeps growing, the kernel must be
> > > reclaiming pages from the slab caches, limiting the
> > > number of buffer_head, skbuff and request structures
> > > in the system. Since DRBD uses all of these
> > > structures, the kernel ends up throttling the data
> > > flow and limiting I/O performance. This is especially
> > > apparent in the network bandwidth test: there the
> > > structures and data are consumed even faster than
> > > with disk I/O, so the available memory is depleted
> > > sooner and we see an even lower network bandwidth.
> > > 
> > > Any thoughts?
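
One way to test that slab theory would be to watch the relevant
caches while the benchmark runs; something like the rough sketch
below, assuming /proc/slabinfo is readable on your kernel (cache
names vary between kernel versions, and you may need root):

import time

WATCH = ("buffer_head", "skbuff_head_cache", "blkdev_requests")

def sample():
    # Returns the active-object count for each watched slab cache.
    counts = {}
    with open("/proc/slabinfo") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] in WATCH:
                counts[fields[0]] = int(fields[1])
    return counts

for _ in range(30):                  # sample every 2s while the test runs
    print(time.strftime("%H:%M:%S"), sample())
    time.sleep(2)

If those counts fall off sharply while the write test is running,
that would support the reclaim idea; if they stay flat, the
throttling is probably happening somewhere else.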
> > > 
> > > Ravi        
> > > 
> > > ------------------------ ==oo== -----------------------
> > >  DRBD Benchmark
> > >  Version: 0.6.1-pre9 (api:58)
> > >  SETSIZE = 100M
> > > 
> > > Node1:
> > >  Linux 2.4.17 i686
> > >  bogomips : 3217.81
> > >  Disk write: 54.45 MB/sec (104857600 B / 00:01.836696)
> > >  Drbd unconnected: 56.90 MB/sec (104857600 B / 00:01.757340)
> > > 
> > > Node2:
> > >  Linux 2.4.17-xfs i686
> > >  bogomips : 3217.81
> > >  Disk write: 51.55 MB/sec (104857600 B / 00:01.939839)
> > >  Drbd unconnected: 56.10 MB/sec (104857600 B / 00:01.782465)
> > > 
> > > Network:
> > >  Bandwidth: 3.11 MB/sec (104857600 B / 00:32.152080)
> > >  Latency: round-trip min/avg/max/mdev = 0.072/0.074/0.140/0.012 ms
> > > 
> > > Drbd connected (writing on node1):
> > >  Protocol A: 9.31 MB/sec (104857600 B / 00:10.744729)
> > >  Protocol B: 9.38 MB/sec (104857600 B / 00:10.662692)
> > >  Protocol C: 7.36 MB/sec (104857600 B / 00:13.582487)
> > > 
> > > Drbd connected (writing on node2):
> > >  Protocol A: 57.68 MB/sec (104857600 B / 00:01.733805)
> > >  Protocol B: 58.28 MB/sec (104857600 B / 00:01.715931)
> > >  Protocol C: 55.53 MB/sec (104857600 B / 00:01.800703)
> > > 
> > > 
> > > --- Alan Robertson <alanr at unix.sh> wrote:
> > > > Ravi Wijayaratne wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > I was looking at DRBD performance numbers from
> > > > > http://www.complang.tuwien.ac.at/reisner/drbd/performance.html
> > > > > 
> > > > > From Phillip Reisner's paper I gather that
> > > > > protocol A is an asynchronous protocol.
> > > > > Therefore protocol A should not severely impact
> > > > > write performance at the primary server. Is that
> > > > > assertion correct?
> > > > > 
> > > > > If so, how is it that the performance of protocol
> > > > > A on the primary side seems to be limited by the
> > > > > network bandwidth? If protocol A is asynchronous,
> > > > > we should see a significant difference in
> > > > > throughput between protocols A and C; however,
> > > > > they seem to be quite close. Is this discrepancy
> > > > > caused by write ordering, or is there a hidden
> > > > > bottleneck in the protocol?
> > > > 
> > > > 
> > > > First of all, I'm sure that these performance
> > > > numbers (from a year ago) were based on 2.2
> > > > kernels.  In 2.2, DRBD had to suffer the disk I/O
> > > > scheduling twice: once at the DRBD level and once
> > > > at the real disk layer.  So with 2.4 (where this is
> > > > avoided), the numbers look a lot different.
> > > > 
> > > > As an aside, disks don't write that much compared
> > > > to their total bandwidth, and the smart protocol
> > > > only rarely has to synchronize.
> > > > 
> > > > A dedicated 100Mbit connection (which is what he
> > > > tested with) provides about 8 megabytes/second of
> > > > writes.  If you say each write is
> > 
> === message truncated ===
> 
> 
> =====
> ------------------------------
> Ravi Wijayaratne
> 



