Notes on smart server network performance: what to do next

John Arbash Meinel john at arbash-meinel.com
Fri Dec 7 20:31:39 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> One area we're trying to improve is the performance over the network when using
> the smart server.  Here's a braindump composed from notes I made during a
> conversation with Robert recently:

The only thing I might add to this is that sometimes the optimal disk storage
layout is not the optimal network transfer layout.

For example, we might use gzip/zlib for each hunk on disk, but we would
probably see a large compression improvement if we could decompress the hunks
and then send the whole thing as one large bzip2 stream, since that gives you
cross-hunk compression. (Though we have at least thought about doing
cross-hunk compression for the disk storage, too.)

Also, along those lines... even if you know that revision 100 is a fulltext
locally, you might send a delta across the wire, since you know the client
already has revision 99. This would especially help bandwidth when you are
sending a single revision that just happens to be stored as a fulltext.
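Something like the following decision, say (a toy sketch with a line-based
diff standing in for a real knit/pack delta; none of this is actual bzrlib
API):

    import difflib

    def compute_delta(old, new):
        """A toy line-based delta; bzr would use a real knit/pack delta."""
        diff = difflib.unified_diff(old.splitlines(True), new.splitlines(True))
        return ''.join(diff)

    def content_to_send(fulltexts, revision_id, parent_id, client_has_parent):
        """Prefer a wire delta when the client already has the parent text,
        even though this revision is stored as a fulltext locally."""
        fulltext = fulltexts[revision_id]
        if client_has_parent and parent_id in fulltexts:
            delta = compute_delta(fulltexts[parent_id], fulltext)
            if len(delta) < len(fulltext):
                return ('delta', parent_id, delta)
        return ('fulltext', revision_id, fulltext)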

I do know there is always the "make the server scale by pushing more work out
to the clients" concern, and the CPU-versus-bandwidth tradeoff.

Also, when we work out a better revision detection algorithm, we might also
consider the benefits of not getting too fine-grained. If I send you 2
revisions you don't need, that may be less of a penalty than spending more
round trips getting it exactly right. (This is especially true when a
repository might have some extra file texts, etc., from a previously
cancelled push/pull. Though that affects Knits more.)
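To put made-up numbers on it (every constant here is an assumption, not a
measurement):

    # Back-of-the-envelope cost model: one extra negotiation round trip
    # versus just sending a couple of revisions the client may already have.
    RTT_SECONDS = 0.3                  # assumed WAN round-trip time
    BANDWIDTH_BYTES_PER_S = 125_000    # assumed ~1 Mbit/s link
    AVG_REVISION_BYTES = 20_000        # assumed avg compressed revision size

    def cost_of_extra_round_trip():
        return RTT_SECONDS

    def cost_of_redundant_revisions(n):
        return n * AVG_REVISION_BYTES / BANDWIDTH_BYTES_PER_S

    if __name__ == '__main__':
        print(cost_of_extra_round_trip())       # 0.3 seconds
        print(cost_of_redundant_revisions(2))   # 0.32 seconds

Under those assumptions, two redundant revisions cost about the same as one
extra round trip, so negotiating harder only pays off once the redundancy
grows past a couple of revisions.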

Oh, that also brings something else up...

With the Knit format, if you do "bzr branch $BIG_PROJECT", wait 30 minutes,
and then hit ^C, you have quite a bit of content already downloaded in nearly
usable form. You have probably copied a lot of the file texts, at least,
though we require you to re-download all of the inventory texts (which is a
non-trivial amount of data in the current scheme).

With Packs, though, it is an "if I didn't download everything, then I get
*nothing*" situation. I'm happy to have the ability to consider interrupted
data incomplete, but it sure would be nice if long transfers had a bit of
checkpointing. So I think we want to be wary of being *too* pipelined.
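Checkpointing could be as simple as committing each completed chunk to disk
as it arrives (a minimal sketch; the file layout and the chunk-fetch callback
are invented for illustration, and have nothing to do with the real pack
format):

    import os

    def fetch_with_checkpoints(read_chunk, checkpoint_dir):
        """Fetch a long stream chunk by chunk, committing each completed
        chunk to disk so a ^C only loses the chunk in flight."""
        os.makedirs(checkpoint_dir, exist_ok=True)
        index = 0
        # Resume after any chunks a previous (interrupted) run completed.
        while os.path.exists(
                os.path.join(checkpoint_dir, '%06d.chunk' % index)):
            index += 1
        while True:
            chunk = read_chunk(index)  # caller-supplied; returns b'' at EOF
            if not chunk:
                break
            tmp = os.path.join(checkpoint_dir, '%06d.tmp' % index)
            final = os.path.join(checkpoint_dir, '%06d.chunk' % index)
            with open(tmp, 'wb') as f:
                f.write(chunk)
            os.rename(tmp, final)      # atomic commit of the checkpoint
            index += 1

The point is just that an interruption loses only the chunk in flight, and a
re-run resumes from the last committed checkpoint instead of starting over.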

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHWa2rJdeBCYSNAAMRApxdAJ0YZ5GIngoxUZDNgM01ET2eiBeipQCePGyk
J4TItMFUDWdZ82XHrV+vrqk=
=+Sgk
-----END PGP SIGNATURE-----


