[RFC] Multipart support for _urllib_
Michael Ellerman
michael at ellerman.id.au
Sun Jun 18 10:09:08 BST 2006
On Sat, 2006-06-17 at 07:47 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Michael Ellerman wrote:
> > Hi guys,
> >
> > I've cleaned up the work I did to do multipart HTTP. Actually I
> > started from scratch, I think it's reasonably readable now.
> I'm very interested in this. I've already started doing some of my own
> benchmarks, so we'll see where we can get. Just a few comments.
>
> 1) You sent a bundle, but not against anything else. This is okay, but
> it means we can't merge you (I'm missing
> michael at ellerman.id.au-20060608162722-df63ead19775ce65)
> When you really want bundles, you should use "bzr bundle $path/bzr.dev"
> That way we have everything needed to apply your ancestry against bzr.dev.
> (You may already know this and just chose to send a simplified bundle)
Oops, sorry. That's my fix "Don't coallesce try blocks, it can lead to
confusing exceptions."
> Why did you decide to do this as a 'seek + read(size)' rather than
> readv()? Just so it would be more 'file-like'?
Yeah. I was thinking that seek(offset) + read(size) was equivalent to
readv() and meant we could operate on normal files. I originally thought
that I could use StringIO for all the response types, but then I
discovered that StringIO actually creates huge arrays of null bytes if
you do sparse writes.
I also thought originally that I'd be able to pass the response from
urllib straight out for the 200 case, but although it's a "file" it
doesn't support seek. So now it has to be wrapped in a StringIO anyway.
> > + if self._pos < 0:
> > + self._pos = 0
>
> You need another space here.
Will fix.
> > + def _offsets_to_ranges(self, offsets):
> > + """Turn a list of offsets and sizes into a list of byte ranges.
> > +
> > + :param offsets: A list of tuples of (start, size).
> > + An empty list is not accepted.
> > +
> > + :return: a list of byte ranges (start, end). Adjacent ranges will
> > + be combined in the result.
> > + """
>
> You can use the python2.4 idioms:
> import operator
> offsets = sorted(offsets, key=operator.itemgetter(0))
>
> Though if you sort a tuple with (start, end), it seems like you might as
> well just call:
> offsets = sorted(offsets)
Yeah you're right, I should have just tried offsets.sort() rather than
doing it the "right" way.
> > + def readv(self, relpath, offsets):
> > + """Get parts of the file at the given relative path.
> > +
> > + :param offsets: A list of (offset, size) tuples.
> > + :param return: A list or generator of (offset, data) tuples
> > + """
> > + mutter('readv of %s [%s]', relpath, offsets)
> > + ranges = self._offsets_to_ranges(offsets)
> > + code, f = self._get(relpath, ranges)
> > + for start, size in offsets:
> > + f.seek(start, 0)
> > + data = f.read(size)
> > + assert len(data) == size
> > + yield start, data
>
> This is where it would seem to make more sense to just make the readv
> call into the RangeFile rather than a bunch of seek + read calls.
Yeah, I dunno. Given that I ended up having to wrap all the response
anyway it probably is easier to just do readv() into the wrapper.
cheers
--
Michael Ellerman
IBM OzLabs
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060618/b1aad7c0/attachment.pgp
More information about the bazaar
mailing list