[UGLY HACK] Proof of concept multipart/byteranges support and connection sharing

Sat May 20 14:25:28 BST 2006

Robert Collins wrote:
> On Fri, 2006-05-19 at 19:15 +1000, Michael Ellerman wrote:
>> Hi guys,
> 
> 
>> I didn't really believe it made this much difference, but I've run these
>> a few times, and I think I'm not going crazy. (this is just "time bzr
>> branch foo bar").
> 
> It's real. Latency bites badly.
> 
>> [1] Currently we only ever request contiguous ranges, i.e. if we're asked
>> for 10-20,20-30 we'll do one request for 10-30. But if we're asked for
>> 10-20,30-40 we do two requests. This sucks: in some cases we do > 500
>> requests on one file.
> 
> Yah. 
> 
>> [2] Unfortunately we're creating about 15 PyCurlTransport() objects, so
>> to see much improvement we have to share the Curl() object globally.
>> Yuck. Also it seems (??) you can't unset pycurl.RANGE/NOBODY, so we have
>> to have three Curl() objects, one for GET, one for HEAD and one for GET
>> + Range.
> 
> I'm not sure why we have to share the curl objects specially - unless
> you mean for the connection sharing. If that's so, I'd introduce a
> HttpClient object or something that is shared between the
> PyCurlTransports. It could then hold the 3 curl objects needed. I'm
> suggesting that get_transport(http://...) would always have the effect
> of making a new one of these.
> 
> Just having good range support would rock. It's been on my TODO for a
> bit. Things to watch out for in the multipart response: web servers may
> return the full object, or may return combined ranges - IIRC the bytes
> can't be reordered from the requested range though. You'll need to check
> RFC 2616 on that.
> 
> +++1 on getting this in place.
> 
> On a related note, we need to make better use of the readv API present
> in paramiko's latest releases.
> 
> Rob

Well, when I was looking into it, I was seeing a whole lot of:

readv(inventory.knit, 1,50)
readv(inventory.knit, 51,100)
readv(inventory.knit, 101,150)
readv(inventory.knit, 151,200)
readv(inventory.knit, 201,250)
readv(inventory.knit, 251,300)

Basically, we were reading all of the index, but we were reading it one
revision at a time, which caused a round-trip for each. If we had just
buffered those up ahead of time we could have just done the combined
read of 1-300.
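
To make the buffering idea concrete, here is a rough sketch (not bzr code) of
merging touching reads before they hit the wire. The names coalesce() and
range_header() are made up for illustration, and I'm assuming each read is an
inclusive (start, end) pair as in the list above. It also sorts the ranges,
which assumes the caller doesn't care about request order.

```python
def coalesce(ranges):
    """Merge inclusive (start, end) byte ranges that touch or overlap.

    Hypothetical helper: assumes inclusive offsets and that the caller
    does not depend on the original request order.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent to or overlapping the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


def range_header(ranges):
    """Format the merged ranges as the value of an HTTP Range header."""
    return "bytes=" + ",".join("%d-%d" % r for r in coalesce(ranges))


reads = [(1, 50), (51, 100), (101, 150), (151, 200), (201, 250), (251, 300)]
print(coalesce(reads))                        # [(1, 300)]
print(range_header([(10, 20), (30, 40)]))     # bytes=10-20,30-40
```

With something like this in front of the transport, the six reads above become
one request, and the 10-20,30-40 case becomes one request with two ranges
instead of two requests (the server may still answer with the whole file or
with combined ranges, as Rob says).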

I think if we track that down it would help as well.
John
=:->
