Excessive network traffic for pulling a small change

Andrew Bennetts andrew.bennetts at canonical.com
Wed Mar 17 01:24:34 GMT 2010


Stefan Monnier wrote:
> > Although by contrast, doing the corresponding pull is a lot smaller via
> > a smart protocol, and then doing the log is free.
> 
> > bzr -Dbytes -Dfetch pull -r 2251 \
> >    bzr+ssh://bazaar.launchpad.net/~vcs-imports/grub/grub2-bzr
> 
> > Transferred: 286KiB (53.8K/s r:282K w:4K)
> 
> 286KB is still hard to justify for such a small change.

Is this a problem for you, or is it just that you were surprised?

I'll try to explain a little about why this particular smart server
interaction took as much traffic as it did.

Basically, we make some tradeoffs that reduce round trips at the expense
of sending more data than is strictly necessary, in order to reduce wall
clock time.  The expense is usually insignificant, but when the actual
data to transfer is small the overhead can outweigh the payload; even
so, 286kB is still not that much.
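To make the tradeoff concrete, here's a back-of-envelope calculation (the latency and bandwidth figures are my own assumptions, not anything measured from bzr): at a given round-trip time and download speed, there's a break-even number of extra bytes you can send speculatively and still come out ahead on wall-clock time against one saved round trip.

```python
# Hypothetical numbers, purely illustrative: estimate how many extra
# bytes are "free" relative to one saved network round trip.
RTT_S = 0.1               # assumed round-trip latency: 100ms
BANDWIDTH_BPS = 400_000   # assumed downstream bandwidth: ~400kB/s

def extra_bytes_worth_one_round_trip(rtt_s=RTT_S, bandwidth_bps=BANDWIDTH_BPS):
    """Bytes that take as long to transfer as one round trip costs.

    Sending up to this much speculative data per avoided round trip
    is a wall-clock win under these assumptions.
    """
    return int(rtt_s * bandwidth_bps)

print(extra_bytes_worth_one_round_trip())  # 40000 bytes at these assumptions
```

At these (assumed) numbers, every avoided round trip pays for roughly 40kB of speculative data, which is the spirit of the padding described below.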

For example, the server deliberately pads out the response to
“Repository.get_parent_map” to about 64kB, even if the request only
asked about one item, because most of the time the client will need that
information soon anyway, and an extra 60kB is rarely noticeable.  So
that's a small part of the cost here (maybe 50kB).
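The general shape of that padding can be sketched like this (this is an illustration of the idea, not bzr's actual implementation; the function names and the encode callback are made up):

```python
# Illustrative sketch: a server answering a parent-map request keeps
# adding speculative ancestor entries until the response reaches a
# size cap (~64kB), since the client will very likely ask for those
# ancestors next anyway.
TARGET_SIZE = 64 * 1024

def build_parent_map_response(requested, parent_map, encode):
    """requested: revision ids the client asked about.
    parent_map: {revision_id: tuple_of_parent_ids} for the whole repo.
    encode: turns one (rev, parents) entry into bytes for the wire.
    """
    response = b""
    queue = list(requested)
    seen = set()
    while queue and len(response) < TARGET_SIZE:
        rev = queue.pop(0)
        if rev in seen or rev not in parent_map:
            continue
        seen.add(rev)
        parents = parent_map[rev]
        response += encode(rev, parents)
        # Walk further back through ancestry speculatively, filling
        # the response up to the target size.
        queue.extend(parents)
    return response
```

Asking about one revision thus tends to return a chunk of its ancestry too, which is why even a tiny request costs tens of kilobytes.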

Also, bzr doesn't just store deltas of file versions.  At regular
intervals there will tend to be a fulltext version in the repository so
that accessing arbitrary versions doesn't require reading and
decompressing all of the history of that file.   (I don't immediately
recall the precise details of this for the 2a format, so pardon any
vagueness in that description.)
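The reason periodic fulltexts matter is visible in how a version gets reconstructed: you walk back to the nearest fulltext and then apply deltas forward, so the chain length bounds the work. A toy sketch (not bzr's real record format):

```python
def reconstruct(records, index):
    """Reconstruct version `index` of a file.

    records[i] is ("fulltext", text) or ("delta", apply_fn) where
    apply_fn takes the previous text and returns the next one.
    Periodic fulltexts keep the backward walk short.
    """
    # Walk back to the nearest stored fulltext...
    start = index
    while records[start][0] != "fulltext":
        start -= 1
    text = records[start][1]
    # ...then apply each delta forward up to the wanted version.
    for i in range(start + 1, index + 1):
        text = records[i][1](text)
    return text
```

Without the interleaved fulltexts, reading the latest version of a frequently-edited file like a ChangeLog would mean replaying its entire delta history.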

Perhaps surprisingly, when transferring over the network we don't
transform those fulltext records into deltas.  This is simpler, and it
also saves the processing effort of decompressing on one side and
recompressing on the other: the records read off the wire can be
written directly to disk, on the assumption that they are already
reasonably well packed for general use.  I don't recall the details,
but I think we found that under (what we assume to be) typical
conditions this is faster than decompressing and recompressing.
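The two approaches look roughly like this (a hypothetical illustration of the pass-through idea, not bzr's fetch code; zlib stands in for whatever compression the repository format uses):

```python
import zlib

def fetch_passthrough(wire_records, write):
    """Write already-compressed records straight to the local pack.

    No CPU spent on the transfer beyond I/O; the bytes on disk are
    exactly the bytes that came off the wire.
    """
    for record in wire_records:
        write(record)

def fetch_recompress(wire_records, write):
    """The costlier alternative: round-trip every record through
    decompression and recompression on the receiving side."""
    for record in wire_records:
        text = zlib.decompress(record)
        write(zlib.compress(text))
```

The pass-through path trades some extra bytes on the wire (fulltexts travel as-is) for zero recompression work, which is the tradeoff described above.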

So in this case what seems to have happened is that the server has
transferred a (compressed) fulltext of the Changelog, and that accounts
for about 200kB of that transfer.

I'm not sure that this is worth worrying about, e.g. on my usual
internet connection 200kB takes less than a second to receive.  Is this
behaviour a problem for you?

-Andrew.




More information about the bazaar mailing list