bzr log http+urllib does not work, http+pycurl is too slow

John Arbash Meinel john at arbash-meinel.com
Wed Dec 12 15:14:50 GMT 2007



Vincent Ladeuil wrote:
...

> That, my friend, is some very bad news.... Well, the good news is
> that the patch fixes the bug at least...
> 
> And sorry for asking again, but can you do that one more time
> with -Dhttp so that I can diagnose more easily. There may be
> several GET requests for one readv (that should not be the case
> here, but seeing the ranges requested may help evaluate the
> wasted bandwidth :/ ).
> 
> We have a bad situation here, because even if the http transport
> can reuse the whole file transferred inside one readv, it will
> not be able to reuse that file *across* several readv if we don't
> add a local cache (which we want to avoid for several reasons).
> 
>     Vincent

Did you see my earlier comment about readv() returning extra data?

At least for the index reads, Packs are designed to allow the readv() request
to buffer more data than was asked for. The point is that HTTP requests buffer
64k pages while local requests buffer only 4-8k. (The whole latency versus
bandwidth trade-off.)

Which means that for indexes (adjust_for_latency=True) we would be fine. I'm
guessing that for the raw .pack file we would not be.
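
Just to illustrate the idea (a sketch, not the actual bzrlib code): a
latency-adjusted readv() can round every (start, length) request out to a
transport-specific page size and coalesce whatever ends up overlapping or
adjacent, so HTTP reads 64k pages while a local transport sticks to 4k:

    def expand_offsets(offsets, page_size):
        """Round (start, length) requests out to whole pages and merge
        any that overlap or touch.  Illustration of the idea only."""
        expanded = []
        for start, length in sorted(offsets):
            page_start = (start // page_size) * page_size
            page_end = ((start + length + page_size - 1) // page_size
                        ) * page_size
            if expanded and page_start <= expanded[-1][1]:
                # Overlaps or touches the previous request: merge them.
                expanded[-1][1] = max(expanded[-1][1], page_end)
            else:
                expanded.append([page_start, page_end])
        return [(s, e - s) for s, e in expanded]

    # With a 4k page, two nearby reads collapse into one page:
    #   expand_offsets([(10, 20), (100, 50)], 4096)   -> [(0, 4096)]
    # With a 64k page, two adjacent pages merge into one bigger read:
    #   expand_offsets([(10, 20), (70000, 5)], 65536) -> [(0, 131072)]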

If we want to handle HTTP servers that cannot do ranges, we could detect that
and write the whole file to a local cache that gets cleaned up (something in
$TMP).

Then we don't have to buffer in memory. I don't really like having to
shutil.rmtree($TMP/foo) as part of an HTTP teardown, though.
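
The fallback itself would be small, though. A rough sketch, where
download_whole_file is a hypothetical callback that fetches the complete
file from the range-less server:

    import os
    import shutil
    import tempfile

    class SpoolingCache(object):
        """Spool whole files fetched from a range-less server into $TMP
        so repeated readv() calls hit the disk copy instead of the
        network (or memory).  Illustration only."""

        def __init__(self):
            # tempfile honours $TMPDIR/$TMP, so this lands "in $TMP".
            self._dir = tempfile.mkdtemp(prefix='bzr-http-cache-')

        def local_path(self, relpath, download_whole_file):
            # Naive name flattening; good enough for a sketch.
            path = os.path.join(self._dir, relpath.replace('/', '_'))
            if not os.path.exists(path):
                # First access: fetch the whole file once, spool to disk.
                f = open(path, 'wb')
                try:
                    f.write(download_whole_file(relpath))
                finally:
                    f.close()
            return path

        def teardown(self):
            # The part I don't like: an rmtree as part of HTTP teardown.
            shutil.rmtree(self._dir, ignore_errors=True)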

But yes, it is known that the plain Python HTTP server cannot do ranges. It was
one of the test cases in the past, to make sure we at least keep working in
those situations.

I believe HTTP says that "if ranges were requested, it is valid to return the
whole file". We can certainly detect that (a status code of 200 versus 206).

We do need to access packs a few times. We have one request for the
inventories, and then another to request the texts referenced by those
inventories. (And I think one more for the revisions, but we could bundle that
with the request for the texts.)


What about this compromise: write a plugin that provides an HTTP transport
which does the caching. Don't bring it into the core, but mention it to people
who really need to access servers that don't support range requests.
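
Schematically the plugin would just wrap the real transport and answer
readv() from the spooled copy. None of this is real bzrlib API, and
SpoolingCache is the hypothetical cache sketched above:

    class CachingHTTPTransport(object):
        """Wrap a transport whose server cannot do ranges: fetch each
        file whole exactly once, spool it, and serve readv() from the
        spooled copy.  A real plugin would subclass bzrlib's Transport
        and register itself; this is just the shape of the idea."""

        def __init__(self, wrapped, cache):
            self._wrapped = wrapped  # knows how to GET a whole file
            self._cache = cache      # the SpoolingCache from above

        def readv(self, relpath, offsets):
            # get_bytes stands in for "fetch the whole file as a string".
            path = self._cache.local_path(relpath, self._wrapped.get_bytes)
            f = open(path, 'rb')
            try:
                for start, length in offsets:
                    f.seek(start)
                    yield start, f.read(length)
            finally:
                f.close()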

Oh, and get Vincent's great urllib2 work merged into upstream, so that they can
have a real HTTP server built-in :).

John
=:->


