[RFC] Multipart support for _urllib_

John Arbash Meinel john at arbash-meinel.com
Sun Jun 18 14:43:56 BST 2006


Michael Ellerman wrote:
> On Sat, 2006-06-17 at 07:47 -0500, John Arbash Meinel wrote:

...

>> You can use the python2.4 idioms:
>> import operator
>> offsets = sorted(offsets, key=operator.itemgetter(0))
>>
>> Though if you sort a tuple with (start, end), it seems like you might as
>> well just call:
>> offsets = sorted(offsets)
> 
> Yeah you're right, I should have just tried offsets.sort() rather than
> doing it the "right" way.

Actually, I ended up cleaning it up because I was very excited about the
performance implications.
I went with 'offsets = sorted(offsets)', because the test suite passes
in a tuple of tuples, and a tuple doesn't have a 'sort' method.
sorted() still works because it accepts any iterable and returns a new
sorted list.
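
To illustrate the difference, here is a minimal sketch. The offset values
are made up for the example and are not taken from the test suite:

```python
# A tuple of (start, size) tuples, similar in shape to what a caller might pass.
offsets = ((40, 10), (0, 20), (20, 5))

# Tuples are immutable, so they have no in-place sort() method.
try:
    offsets.sort()
    tuple_has_sort = True
except AttributeError:
    tuple_has_sort = False

# sorted() accepts any iterable and returns a new list.
# Tuples compare element by element, so this orders by start, then size.
sorted_offsets = sorted(offsets)
```

The original tuple is left untouched; only the returned list is ordered.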

> 
>>> +    def readv(self, relpath, offsets):
>>> +        """Get parts of the file at the given relative path.
>>> +
>>> +        :param offsets: A list of (offset, size) tuples.
>>> +        :param return: A list or generator of (offset, data) tuples
>>> +        """
>>> +        mutter('readv of %s [%s]', relpath, offsets)
>>> +        ranges = self._offsets_to_ranges(offsets)
>>> +        code, f = self._get(relpath, ranges)
>>> +        for start, size in offsets:
>>> +            f.seek(start, 0)
>>> +            data = f.read(size)
>>> +            assert len(data) == size
>>> +            yield start, data
>> This is where it would seem to make more sense to just make the readv
>> call into the RangeFile rather than a bunch of seek + read calls.
> 
> Yeah, I dunno. Given that I ended up having to wrap all the response
> anyway it probably is easier to just do readv() into the wrapper.
> 
> cheers
> 

Yeah, I ended up just going for the seek+read stuff for now. Most things
are wrapped in a RangeFile at this point anyway. I suppose we could move
readv up higher, and use it on LocalTransport files. I'm also thinking
we should do something like this for SFTP. We can actually seek on the
SFTP handle, but that requires another round trip. Far better to
collapse the ranges, pull the data locally, and then seek around them in
memory, rather than over the remote connection.
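
A rough sketch of that idea, assuming nothing about the real transport
code: collapse_ranges and readv_in_memory are hypothetical names, and any
seekable file-like object stands in for the remote handle.

```python
def collapse_ranges(offsets):
    """Merge (offset, size) pairs into sorted (start, end) ranges.

    'end' is exclusive. Overlapping or touching requests are merged.
    """
    ranges = []
    for start, size in sorted(offsets):
        end = start + size
        if ranges and start <= ranges[-1][1]:
            # Overlaps or touches the previous range: extend it.
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            ranges.append([start, end])
    return [tuple(r) for r in ranges]


def readv_in_memory(remote, offsets):
    """Yield (offset, data) for each request, in the order requested.

    Does one seek + read on 'remote' per collapsed range, then slices
    the individual requests out of the locally held bytes.
    """
    chunks = []
    for start, end in collapse_ranges(offsets):
        remote.seek(start, 0)
        chunks.append((start, remote.read(end - start)))
    for offset, size in offsets:
        for start, data in chunks:
            if start <= offset and offset + size <= start + len(data):
                rel = offset - start
                yield offset, data[rel:rel + size]
                break
        else:
            # No chunk fully covers the request, e.g. after a short read.
            raise ValueError('no data for offset %d, size %d' % (offset, size))
```

With three requests of which two overlap, this does two reads against the
remote object instead of three seek + read pairs.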

So I still think adding readv() to RangeFile might be worth it. Or at
least a 'read_hunk' that takes an offset + size and returns the data,
so callers make 1 function call instead of 2 (one for seek, one for
read).
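
Something along these lines is what I have in mind. This is only a
sketch: HunkReader is a hypothetical stand-in wrapper, not the actual
RangeFile class, and the short-read check mirrors the assert in the
patch above.

```python
class HunkReader(object):
    """Hypothetical wrapper around a seekable file-like object."""

    def __init__(self, fileobj):
        self._file = fileobj

    def read_hunk(self, offset, size):
        """Seek to 'offset' and return exactly 'size' bytes."""
        self._file.seek(offset, 0)
        data = self._file.read(size)
        if len(data) != size:
            raise ValueError('short read at offset %d: wanted %d, got %d'
                             % (offset, size, len(data)))
        return data

    def readv(self, offsets):
        """Yield (offset, data) for each (offset, size) request."""
        for offset, size in offsets:
            yield offset, self.read_hunk(offset, size)
```

The transport's readv() could then hand its offsets straight to the
wrapper instead of looping over seek + read itself.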

John
=:->
