[UGLY HACK] Proof of concept multipart/byteranges support and connection sharing

Michael Ellerman michael at ellerman.id.au
Mon May 22 02:15:05 BST 2006


On Sat, 2006-05-20 at 20:57 +1000, Robert Collins wrote:
> On Fri, 2006-05-19 at 19:15 +1000, Michael Ellerman wrote:
> > Hi guys,
> 
> 
> > I didn't really believe it made this much difference, but I've run these
> > a few times, and I think I'm not going crazy. (this is just "time bzr
> > branch foo bar").
> 
> It's real. Latency bites badly.

Yeah, I never doubted it, but it's nice to have numbers to show just how
bad it is.

> > [2] Unfortunately we're creating about 15 PyCurlTransport() objects, so
> > to see much improvement we have to share the Curl() object globally.
> > Yuck. Also it seems (??) you can't unset pycurl.RANGE/NOBODY, so we have
> > to have three Curl() objects, one for GET, one for HEAD and one for GET
> > + Range.
> 
> I'm not sure why we have to share the curl objects specially - unless
> you mean for the connection sharing. If that's so, I'd introduce a
> HttpClient object or something that is shared between the
> PyCurlTransports. It could then hold the 3 curl objects needed. I'm
> suggesting that get_transport(http://...) would have the effect of
> making a new one of these always.

Yeah, the Curl() stuff is purely to do connection sharing. I don't know
how the transport code works, but from a quick glance, if we create a
new HttpClient() for each get_transport(..) isn't that equivalent to
having one Curl() per PyCurlTransport()? If so, that doesn't get us as
big a win, because we create lots of transports.
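
To make the question concrete, one way to keep the sharing would be to
key the client on the server rather than on the transport, so every
transport for the same host ends up with the same handles. A rough
sketch only: HttpClient, client_for and make_handle are made-up names,
not anything that exists in bzrlib, and make_handle stands in for
pycurl.Curl so the sketch runs without pycurl.

```python
from urllib.parse import urlsplit


class HttpClient:
    """Hypothetical holder for the three handles discussed in this thread."""

    def __init__(self, make_handle):
        # One handle per request flavour: GET, HEAD and GET + Range.
        self.get_handle = make_handle()
        self.head_handle = make_handle()
        self.range_handle = make_handle()


# Cache of clients, keyed by (scheme, host, port).
_clients = {}


def client_for(url, make_handle=object):
    """Return the shared HttpClient for the server that `url` points at.

    `make_handle` is an assumption standing in for pycurl.Curl; it
    defaults to `object` so the sketch has no external dependencies.
    """
    parts = urlsplit(url)
    key = (parts.scheme, parts.hostname, parts.port)
    if key not in _clients:
        _clients[key] = HttpClient(make_handle)
    return _clients[key]
```

With something like this, each PyCurlTransport would call
client_for(its_base_url) instead of creating its own Curl() objects, so
the 15 or so transports would still reuse one set of connections.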

As another data point, doing the byterange stuff without the connection
sharing gets me these rough numbers:

real    29m8.436s
real    31m13.397s
real    29m5.429s

So the connection sharing is definitely helping, although the bulk of the improvement is the
byterange stuff.

> Just having good range support would rock. It's been on my TODO for a
> bit. Things to watch out for in the multipart response: web servers may
> return the full object, or may return combined ranges - IIRC the bytes
> can't be reordered from the requested range though. You'll need to check
> RFC 2616 on that.

Yeah, there are lots of corner cases. I think I already handle the full
result case, as long as we get code 200 back, not 206. Reordering would
break that code in a jiffy. Before writing a proper version I'd like to
check what Twisted does, and/or any other implementations.
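
For reference, the three response shapes mentioned above (200 with the
full file, 206 with a single range, 206 with multipart/byteranges) could
be normalised along these lines. This is a sketch under assumptions, not
the code from the patch: the function names are invented, and it keys
each part by the offset in its Content-Range header so it doesn't care
whether the server combined or reordered the ranges.

```python
import re
from email.message import Message


def parse_content_range(value):
    """Turn 'bytes 0-99/1234' into (0, 99); the end offset is inclusive."""
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", value.strip())
    if match is None:
        raise ValueError("unparseable Content-Range: %r" % (value,))
    return int(match.group(1)), int(match.group(2))


def split_byteranges(body, boundary):
    """Yield (start, end, data) for each part of a multipart/byteranges body."""
    # Prefix CRLF so the first delimiter looks like all the others.
    chunks = (b"\r\n" + body).split(b"\r\n--" + boundary)
    for chunk in chunks[1:]:         # chunks[0] is the (ignored) preamble
        if chunk.startswith(b"--"):  # closing delimiter: "--boundary--"
            break
        head, _, data = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            name, sep, value = line.partition(b":")
            if sep:
                headers[name.strip().lower()] = value.strip()
        content_range = headers[b"content-range"].decode("ascii")
        start, end = parse_content_range(content_range)
        yield start, end, data


def ranges_from_response(status, content_type, content_range, body):
    """Return {start_offset: data} whichever way the server answered."""
    if status == 200:
        # The server ignored the Range header and sent the whole file.
        return {0: body}
    if status != 206:
        raise ValueError("unexpected status %d" % status)
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"
    if msg.get_content_type() == "multipart/byteranges":
        boundary = msg.get_boundary().encode("ascii")
        return {start: data
                for start, _end, data in split_byteranges(body, boundary)}
    # A single-range 206 carries Content-Range in the response headers.
    start, _end = parse_content_range(content_range)
    return {start: body}
```

The caller would then slice whatever it originally asked for out of the
returned chunks, rather than assuming part N of the response matches
range N of the request.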

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person