Another look at bzr network traffic

Martin Pool mbp at canonical.com
Wed Apr 14 06:33:02 BST 2010


On 6 April 2010 13:28, Andrew Cowie <andrew at operationaldynamics.com> wrote:
>> What that means for the current smart protocol, is that all fetches will
>> copy at least 1 fulltext, regardless of the delta size. So a 1 byte
>> change to a 1MB file will transmit 1MB of data, though after
>> autopack/pack it will probably get stored as a small handful of bytes
>> again. (While it is in its own pack, it is still stored as 1MB on disk.)
>
> Once upon a time when people were optimizing normal operations over
> http:// it was mentioned that Bazaar is very good about using HTTP Range
> requests to only grab the actual bytes that it cares about. All good.
>
> But this makes it sound like over the bzr{,+ssh}:// protocol that
> requesting a small revision may end up shipping an entire pack file no
> matter what. If true, then the obvious concern is "I just got repacked
> into a single pack that's now 100 MB big, goodie! here it comes!"

It does make it sound a bit like that, but your conclusion is not
quite right.  A pack file contains multiple compression groups.  Each
compression group is independent, in the sense that you can extract
anything that's in it without needing to read any other groups.  So,
along the lines of John's cited mail, groups grow in relation to the
gzipped size of the largest single file included within them.

So repacking should have ~0 effect on how much data is sent over HTTP
when you read from the repository, and it should reduce the number of
round trips.

(An important exception is that actually repacking over a dumb
transport like sftp will of course need to read down and write up the
whole contents.)
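
As a rough illustration of the compression-group point above, here is a
toy sketch in Python.  It is not bzr's actual pack format: the layout,
the zlib compression and the names build_pack and read_group are all
assumptions made up for the example.  It only shows that when groups are
compressed independently and located through an index, reading one group
touches that group's byte range and nothing else, however large the
surrounding pack is.

```python
import zlib

def build_pack(groups):
    """Concatenate independently compressed groups (hypothetical layout).

    `groups` is a list of lists of byte strings; the texts must not
    contain NUL bytes because NUL is used as the separator here.
    Returns (pack_bytes, index) with one (offset, length) per group.
    """
    pack = bytearray()
    index = []
    for texts in groups:
        blob = zlib.compress(b"\0".join(texts))
        index.append((len(pack), len(blob)))
        pack += blob
    return bytes(pack), index

def read_group(pack, index, group_no):
    """Decompress one group using only its own byte range of the pack."""
    offset, length = index[group_no]
    byte_range = pack[offset:offset + length]  # what a ranged read would fetch
    return zlib.decompress(byte_range).split(b"\0"), length

pack, index = build_pack([[b"a" * 1000, b"b" * 1000], [b"c" * 500]])
texts, bytes_read = read_group(pack, index, 1)
print(len(pack), bytes_read)
```

Merging more groups into one pack makes the pack longer but leaves
bytes_read for any given group unchanged in this model.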

> On Wed, 2010-03-17 at 12:24 +1100, Andrew Bennetts wrote:
>> I'm not sure that this is worth worrying about, e.g. on my usual
>> internet connection 200kB takes less than a second to receive.  Is
>> this behaviour a problem for you?
>
> That sounds ... cavalier. I mean, sure, sometimes we have nice fat
> network connections all to ourselves, but just as often not (when I read
> this email last week I was on a client site with 200 developers on less
> bandwidth than you have at home). We all get that network bytes = time
> and given that we're trying to make Bazaar fast, unnecessary network
> traffic seems ... unnecessary.

I don't think it's cavalier so much as a recognition that different
connections have different delay/bandwidth characteristics.  Generally
speaking it's time elapsed that counts, not bytes sent, and so we want
to keep the pipe full.  Many software developers have a 0.5-20Mbps
connection with 20-300ms speed-of-light latency, and in this case
sending 100kB is as cheap as sending a single byte, so if there is
anything else that can usefully be sent we should send it.  I think
this is reasonable as the normal case.  On the other hand of course
there are people with cell data connections that are very slow and
charged per byte, people with 56k modems, and people with 10Gbps
local links; the normal case is not the only important case.
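
To put rough numbers on the delay/bandwidth point, here is a
back-of-envelope sketch in Python.  The model (one round trip plus the
time to push the payload through the link) and the function name
transfer_time are assumptions for illustration; it ignores TCP slow
start, protocol overhead and contention.  The 10Mbps and 200ms figures
are just example values picked from inside the ranges mentioned above.

```python
def transfer_time(payload_bytes, bandwidth_bps, round_trip_s):
    """Estimated seconds for one request/response under a simple model:
    one round trip of latency plus the payload's transmit time."""
    return round_trip_s + (payload_bytes * 8) / bandwidth_bps

# Hypothetical link: 10 Mbps, 200 ms round trip.
for size in (1, 100_000):
    print(size, "bytes:", round(transfer_time(size, 10e6, 0.2), 4), "s")
```

Under these assumptions a single byte costs about 0.2s and 100kB about
0.28s, so most of the elapsed time is the round trip in both cases.  On
a much slower link the transmit term grows and the balance shifts.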

> For what it's worth, I see "excessive" network traffic (as subjectively
> judged by "long" wall clock times for `bzr pull`s against remote
> repositories) quite frequently.

We're not satisfied yet with network speed.

-- 
Martin <http://launchpad.net/~mbp/>


