Why does the RemoteTransport readv reorder the ranges requested?

Robert Collins robertc at robertcollins.net
Tue Aug 28 23:00:08 BST 2007


On Tue, 2007-08-28 at 16:52 -0500, John Arbash Meinel wrote:
> Robert Collins wrote:
> > This seems counterproductive to me - by reordering we have to wait for
> > data to arrive and buffer arbitrarily large amounts of it. Surely it's
> > better not to do that?
> > 
> > -Rob
> 
> In the general case, we request things in approximate order, so you don't have
> to buffer much.
> 
> We already buffer the whole thing anyway, because the RPC calls are not
> designed to incrementally read more data from the socket. (This should, of
> course, be fixed, but at the moment we buffer everything in a StringIO anyway.)

Sure.

> When sending the request for different ranges, we probably want to collapse
> locally as much as possible, so that we aren't sending 10,000 ranges across the
> wire. I believe the collapse code reorders the request.
> Arguably, we could have different collapse code that collapses as much as it
> can without doing any reordering.

I think that is important for packs. Since some readv requests may ask
for ISOs as part of the data, you don't want to buffer that much data
just because it came first in offset order.


> So I think the simple answer is that it was expedient to do so. And the more
> general answer is that we want to do at least some collapsing on the in-memory
> end before sending the request out. For 'bzr branch REMOTE' that means the
> difference between sending 13,000 offset+length tuples over the wire and
> probably sending ~10-100, depending on how fragmented the remote repository is.
> (If it is completely unfragmented, then we may even send a single offset+length.)



> Again, if we simply wrote new code to collapse the ranges, which didn't sort
> them, we could get most of the benefit, and not have to buffer anything. (As
> long as the read_body_bytes also didn't buffer everything.)

Right, this is where we want to head.
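
To make the idea concrete, here is a minimal sketch of collapsing that
never sorts, so the ranges go out in the order they were asked for. The
names collapse_in_order and max_gap are hypothetical, purely for
illustration; this is not the existing collapse code.

```python
def collapse_in_order(offsets, max_gap=0):
    """Merge adjacent (offset, length) requests without sorting them.

    Hypothetical helper, not the existing collapse code. A range is merged
    into the previous one only when it starts where the previous one ends
    (or at most max_gap bytes after it), so request order is preserved.
    """
    collapsed = []
    for offset, length in offsets:
        if collapsed:
            last_offset, last_length = collapsed[-1]
            last_end = last_offset + last_length
            if last_end <= offset <= last_end + max_gap:
                # Extend the previous range to cover this one as well.
                collapsed[-1] = (last_offset, offset + length - last_offset)
                continue
        collapsed.append((offset, length))
    return collapsed
```

Requests that are already in ascending order collapse just as well as
with sorting; out-of-order requests simply stay separate, which costs a
few more tuples on the wire but needs no buffering to restore the order.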

So I think the immediate thing to do is:

 - for non-adjust_for_latency requests, I'll leave the current code, so
it's compatible on the wire.
 - for adjust_for_latency requests, I'll put in a new RPC that returns
offset,length-prefixed data, allowing the remote side to decide in what
order, and how much, data is returned.
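
As a rough sketch of what offset,length-prefixed data could look like,
assuming a made-up framing of one "offset,length" header line followed
by exactly that many bytes (the real RPC encoding is not decided here,
and encode_chunks / iter_chunks are hypothetical names):

```python
import io


def encode_chunks(chunks):
    """Serialise (offset, data) pairs as b'offset,length\\n' plus the data.

    Assumed framing for illustration only, not an actual wire format.
    """
    out = []
    for offset, data in chunks:
        out.append(b"%d,%d\n" % (offset, len(data)))
        out.append(data)
    return b"".join(out)


def iter_chunks(stream):
    """Yield (offset, data) pairs from a file-like object, one at a time."""
    while True:
        header = stream.readline()
        if not header:
            return  # clean end of the response
        offset, length = (int(field) for field in header.strip().split(b","))
        data = stream.read(length)
        if len(data) != length:
            raise ValueError("truncated chunk at offset %d" % offset)
        yield offset, data


# The sender picks the order; the reader learns each offset from the prefix.
wire = encode_chunks([(100, b"hello"), (0, b"abc")])
received = list(iter_chunks(io.BytesIO(wire)))
```

Because every chunk says where it belongs, the client can hand data on
as it arrives rather than holding it until an expected range shows up.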

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.