[RFC] Multipart support for _urllib_

Sun Jun 18 10:09:08 BST 2006

On Sat, 2006-06-17 at 07:47 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Michael Ellerman wrote:
> > Hi guys,
> > 
> > I've cleaned up the work I did to do multipart HTTP. Actually I
> > started from scratch, I think it's reasonably readable now.

> I'm very interested in this. I've already started doing some of my own
> benchmarks, so we'll see where we can get. Just a few comments.
> 
> 1) You sent a bundle, but not against anything else. This is okay, but
> it means we can't merge you (I'm missing
> michael at ellerman.id.au-20060608162722-df63ead19775ce65)
> When you really want bundles, you should use "bzr bundle $path/bzr.dev"
> That way we have everything needed to apply your ancestry against bzr.dev.
> (You may already know this and just chose to send a simplified bundle)

Oops, sorry. That's my fix "Don't coallesce try blocks, it can lead to
confusing exceptions."

> Why did you decide to do this as a 'seek + read(size)' rather than
> readv()? Just so it would be more 'file-like'?

Yeah. I was thinking that seek(offset) + read(size) was equivalent to
readv() and meant we could operate on normal files. I originally thought
that I could use StringIO for all the response types, but then I
discovered that StringIO actually creates huge arrays of null bytes if
you do sparse writes.

I also thought originally that I'd be able to pass the response from
urllib straight out for the 200 case, but although it's a "file" it
doesn't support seek. So now it has to be wrapped in a StringIO anyway.

> > +        if self._pos < 0:
> > +            self._pos = 0
> 
> You need another space here.

Will fix.

> > +    def _offsets_to_ranges(self, offsets):
> > +        """Turn a list of offsets and sizes into a list of byte ranges.
> > +
> > +        :param offsets: A list of tuples of (start, size).
> > +        An empty list is not accepted.
> > +
> > +        :return: a list of byte ranges (start, end). Adjacent ranges will
> > +        be combined in the result.
> > +        """
> 
> You can use the python2.4 idioms:
> import operator
> offsets = sorted(offsets, key=operator.itemgetter(0))
> 
> Though if you sort a tuple with (start, end), it seems like you might as
> well just call:
> offsets = sorted(offsets)

Yeah you're right, I should have just tried offsets.sort() rather than
doing it the "right" way.

> > +    def readv(self, relpath, offsets):
> > +        """Get parts of the file at the given relative path.
> > +
> > +        :param offsets: A list of (offset, size) tuples.
> > +        :param return: A list or generator of (offset, data) tuples
> > +        """
> > +        mutter('readv of %s [%s]', relpath, offsets)
> > +        ranges = self._offsets_to_ranges(offsets)
> > +        code, f = self._get(relpath, ranges)
> > +        for start, size in offsets:
> > +            f.seek(start, 0)
> > +            data = f.read(size)
> > +            assert len(data) == size
> > +            yield start, data
> 
> This is where it would seem to make more sense to just make the readv
> call into the RangeFile rather than a bunch of seek + read calls.

Yeah, I dunno. Given that I ended up having to wrap all the response
anyway it probably is easier to just do readv() into the wrapper.

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060618/b1aad7c0/attachment.pgp