[merge] http multirange support

John Arbash Meinel john at arbash-meinel.com
Fri Jul 14 18:27:25 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> Michael Ellerman wrote:

...

> Hopefully we can test it against some non-apache servers, to make sure
> we really get what we expect. And I think it is reasonable to add some
> of your concerns as 'TODO' items. I don't know that bzr supports
> redirect all that well right now. But we should probably handle 416 in
> the future.
> 
> John
> =:->

Thanks to Andrew Bennetts for giving me the commands for setting up a
local twisted server. Interestingly enough, the new http code kicks real
butt when branching from a local server. I believe this is because
Twisted doesn't support ranges, so it is returning the whole file for
every readv(). The old code didn't collapse readv() very well, so it was
making a full request for the inventory.knit for every partial it
wanted. (Some of this is just that the max collapse is 50, but we are
requesting 6000+ entries for an initial branch)
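
To make the "collapse" idea concrete, here is a minimal sketch of merging
readv()-style (offset, length) pairs into byte ranges and building a single
Range header from them. This is not the actual bzr code: the function names
and the max_gap parameter are made up for illustration, and the limit of 50
mentioned above is not modeled.

```python
def coalesce_offsets(offsets, max_gap=0):
    """Merge (start, length) pairs into sorted, inclusive (first, last) ranges.

    Hypothetical helper: ranges that overlap, touch, or sit within
    max_gap bytes of each other are merged into one.
    """
    ranges = []
    for start, length in sorted(offsets):
        last = start + length - 1  # HTTP byte ranges are inclusive
        if ranges and start <= ranges[-1][1] + 1 + max_gap:
            # Close enough to the previous range: extend it.
            ranges[-1][1] = max(ranges[-1][1], last)
        else:
            ranges.append([start, last])
    return [tuple(r) for r in ranges]


def range_header(ranges):
    """Format inclusive (first, last) ranges as a Range header value."""
    return "bytes=" + ",".join("%d-%d" % (first, last) for first, last in ranges)


if __name__ == "__main__":
    wanted = [(100, 20), (0, 10), (10, 5)]
    print(range_header(coalesce_offsets(wanted)))
```

The fewer ranges that come out of the merge, the fewer requests (or range
specs per request) are needed, which matters most when the server answers
every request with the whole file.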

Here are some test results. Keep in mind that a few seconds are spent
just building the tree (about 6s). Also, all of these test runs are done
without caching of inventory.knit.

When localhost is a twisted server:
 167s	bzr branch http://localhost:8888/bzr.dev
  35s	bzr-http branch http+pycurl://localhost:8888/bzr.dev
  38s	bzr-http branch http+urllib://localhost:8888/bzr.dev

When localhost is a python SimpleHTTPServer, which I'm sure doesn't
support partial range requests, and I believe is only HTTP/1.0 compliant
(and also doesn't support keepalive):

 109s	bzr branch http://localhost:8888/bzr.dev
  33s	bzr-http branch http+pycurl://localhost:8888/bzr.dev
  34s	bzr-http branch http+urllib://localhost:8888/bzr.dev

(interestingly, SimpleHTTPServer is actually faster than twisted for
this specific very simple case. I'm sure it doesn't scale worth a damn,
though)
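
A server that ignores the Range header replies with 200 and the whole file
rather than 206 and the requested part, so the client has to cut the pieces
it wanted out of the full body itself. A rough sketch of that step, assuming
a single contiguous body; slice_response and body_start are hypothetical
names, not the bzr transport API:

```python
def slice_response(status, body, offsets, body_start=0):
    """Return the requested chunks from one HTTP response body.

    status 200: body is the whole file, so offsets index it directly.
    status 206: body is one partial range whose first byte sits at
                body_start in the file (an assumption of this sketch).
    """
    if status == 200:
        base = 0
    elif status == 206:
        base = body_start
    else:
        raise ValueError("unexpected HTTP status: %d" % status)
    chunks = []
    for start, length in offsets:
        lo = start - base
        if lo < 0 or lo + length > len(body):
            raise ValueError("offset %d is outside the returned data" % start)
        chunks.append(body[lo:lo + length])
    return chunks


if __name__ == "__main__":
    whole_file = b"0123456789abcdef"
    print(slice_response(200, whole_file, [(2, 3), (10, 2)]))
```

With this shape, one full download can serve every readv() offset for that
file, instead of one full download per offset.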

With this in mind, I went ahead and tried it on my local network, to see
what the difference was. My local server is Apache, but seems to be
denying Keep-Alive:

  89s	bzr branch http://bzr.arbash-meinel.com/mirrors/bzr/bzr.dev
  40s	bzr-http branch http+pycurl://
  46s	bzr-http branch http+urllib://
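
To put the three http runs side by side, this small script computes the
speedup of the new code (http+pycurl) over the old code for each server. The
timings are copied from the results above; the dictionary keys and the
speedups() helper are just labels chosen for this sketch.

```python
# Wall-clock timings in seconds, copied from the results in this message.
old_http = {"twisted": 167, "SimpleHTTPServer": 109, "apache": 89}
new_http = {"twisted": 35, "SimpleHTTPServer": 33, "apache": 40}  # http+pycurl


def speedups(old, new):
    """Return old/new time ratios keyed by server name."""
    return {name: old[name] / float(new[name]) for name in old}


if __name__ == "__main__":
    for name, ratio in sorted(speedups(old_http, new_http).items()):
        print("%-18s %.1fx faster" % (name, ratio))
```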

And for reference, here is the time just to branch locally, and over
sftp. The bzr-http branch performs the same as bzr.dev here, since we
haven't made any changes to the sftp transport:

  30s	bzr branch ../bzr.dev
  83s	bzr branch sftp://other/.../bzr.dev
  74s	bzr branch sftp://localhost/.../bzr.dev
  29s	bzr-http branch ../bzr.dev
  83s	bzr-http branch sftp://other/.../bzr.dev
  74s	bzr-http branch sftp://localhost/.../bzr.dev

And the time to do a local branch without building the working tree (6s
faster)
	bzr init-repo bzr-local
  24s	bzr branch ../bzr.dev bzr-local/bzr.dev

So it looks like we might want to focus on getting sftp branching to be
faster, because right now, local http is almost as fast as a local
branch (40s vs 30s) while sftp on a local machine is really lagging
behind (80s).


John
=:->

PS> Now, to be nice to the current bzr.dev codebase, I did realize that
the current bzr.dev was being extra penalized because my local bzr.dev
repository is really fragmented (I've got lots of extra revisions in
there). So after creating a clean bzr.dev branch, doing 'bzr branch
http://localhost' using SimpleHTTPServer only takes 41s, down from 109s.
bzr-http stays stable at ~35s.

However, real branches are also going to be fragmented (the official
bzr.dev is). Any time you have a shared repository, there is going to be
fragmentation. So I think the performance improvements are valid. People
aren't going to sanitize their repositories all the time. And with the
new code, there isn't a big benefit to doing so.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEt9P8JdeBCYSNAAMRAjEjAJ4gTT7ymAt1sS4W4xMlZVU2irXiUwCfaPXC
PYVkl1U+JEcxfNDOVNFGYxk=
=V7rN
-----END PGP SIGNATURE-----



