[rfc] [patch] pycurl transport

John Arbash Meinel john at arbash-meinel.com
Thu Jan 12 01:42:10 GMT 2006


Martin Pool wrote:
> On 10 Jan 2006, John A Meinel <john at arbash-meinel.com> wrote:
> 
> 
>>What about gzip encoding. Weaves compress quite well. And it could cause
>>as much as a 4x download speed increase.
> 
> 
> So do you mean doing gzip transfer-encoding, to take advantage of
> something like mod_gzip on the other end?  That should be good.  
> 
> Though I think in general we'd want to actually create the weaves as .gz
> files so that it works regardless of the server.  I'd like to put in
> such a feature if you could find your branch that did it.
> 

The branch hasn't moved, but hasn't been updated lately either.
http://bzr.arbash-meinel.com/branches/bzr/compressed-weaves/

From what I remember, you do pay a noticeable overhead for local trees.
Especially the infamous inventory.weave.
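
The idea there was just to store the weave compressed and decompress it
on every read, which is where the local overhead comes from. Very
roughly (the helper and file names below are only illustrative, not
code from that branch):

    import gzip

    def read_weave_bytes(path):
        # Illustrative sketch: open the weave whether or not it is
        # stored compressed.  The gzip decompression on every local
        # read is exactly the overhead mentioned above.
        if path.endswith('.gz'):
            f = gzip.open(path, 'rb')
        else:
            f = open(path, 'rb')
        try:
            return f.read()
        finally:
            f.close()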

> 
>>But I agree with Jamie. If you get as much of an increase as you say,
>>it is worth using if available.  Kind of like cElementTree, only not
>>as necessary.
>>
>>Oh, and it would be nice if you could actually use a HEAD request for
>>'has', and do a partial download for a get_partial request.
> 
> 
> Good point.
> 
> 
>>Though I'm not sure how it compares with the knit changes to
>>transport.  Since they were switching to returning file-like objects
>>that support seek(). (Which I think makes it difficult to do a partial
>>read, unless all file-like objects returned by Transport would also
>>support scatter read).
> 
> 
> Perhaps you could defer doing the request until they seek & try to read,
> then read from that point on.
> 

I personally think that seek() + read() is a really poor way of
expressing that you want to read specific chunks of a file. It works
(and fairly well) for local files. But for remote files you would want
to do all sorts of read-ahead/pipelining/etc., and seek/read never
tells the transport what your plan is, even though you have probably
formed that plan before you issue the first seek.
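
What I would rather see is an interface where the caller states the
whole plan up front, something shaped like this (purely illustrative,
not a patch; a remote transport would be free to coalesce or pipeline
the requests instead of the naive seek loop):

    def readv(self, relpath, offsets):
        # Hypothetical 'scatter read': offsets is a list of
        # (start, length) pairs, all known before any I/O happens, so
        # a smart transport can batch them.  This trivial fallback
        # just seeks and reads one chunk at a time.
        f = self.get(relpath)
        for start, length in offsets:
            f.seek(start)
            yield start, f.read(length)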

As an example, paramiko would really like to set the prefetch flag if
you are going to be reading a file multiple times in a row. But that
would be wasted bandwidth if you are going to read a little, seek, read
a little more, etc.
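
Roughly what I mean (a sketch only; host, credentials and path are
placeholders, and whether prefetch() pays off depends entirely on the
access pattern):

    import paramiko

    t = paramiko.Transport(('example.com', 22))
    t.connect(username='user', password='secret')
    sftp = paramiko.SFTPClient.from_transport(t)

    f = sftp.open('/srv/bzr/branch/.bzr/inventory.weave', 'rb')
    f.prefetch()       # only pays off if we really will read most of the file
    data = f.read()    # largely served from the prefetched buffers
    f.close()
    t.close()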

And http can't even seek. I'm not sure exactly where range requests
stop being worth the round-trip time; below some read size it is surely
cheaper to fetch the whole file than to pick pieces out of it. But a
single 'give me this range, and this range, and this range' request
should be reasonably efficient.
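
Something along these lines is what I have in mind (the URL is a
placeholder; a server that honours multi-range requests answers
206 Partial Content with a multipart/byteranges body, one that doesn't
just sends the whole file back):

    import httplib

    conn = httplib.HTTPConnection('example.com')
    conn.request('GET', '/bzr/branch/.bzr/inventory.weave',
                 headers={'Range': 'bytes=0-1023,65536-73727'})
    resp = conn.getresponse()
    # resp.status is 206 if the ranges were honoured, 200 otherwise;
    # the caller still has to split the multipart body itself.
    body = resp.read()
    conn.close()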

But that was also why I wanted the _multi() functions, since you can do
a little bit of planning, then make a batch call, and with generators,
you can even do a little bit of work while the information comes in.
Robert hasn't convinced me that this is evil yet, though I do believe
all the functionality will end up removed.
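
To make that concrete, the shape I keep coming back to is roughly this
(illustrative only, not the real Transport code):

    def get_multi(self, relpaths):
        # The caller hands over the whole batch at once, so a smart
        # transport can pipeline the requests; this trivial fallback
        # just fetches them one at a time, yielding each file-like
        # object so the caller can start work before the batch is done.
        for relpath in relpaths:
            yield self.get(relpath)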

John
=:->