[MERGE] Implement and use Repository.iter_files_bytes

Robert Collins robertc at robertcollins.net
Sun Aug 19 22:03:39 BST 2007


On Thu, 2007-08-16 at 12:10 -0500, John Arbash Meinel wrote:
> Aaron Bentley wrote:
> > Hi all,
> > 
> > This bundle implements a new interface for getting file contents:
> > 
> > Tree and Repository.iter_files_bytes.
> > 
> > It will be highly efficient on pack repos, requiring a single readv to
> > extract all desired texts.  This should greatly improve build_tree
> > performance.  It currently is a negligible improvement for knits.
> > 
> > This patch also changes build_tree and revert to use Tree.iter_files_bytes.
> > 
> > I propose merging this now, to reduce code divergence between Robert's
> > branch and the mainline.
> > 
> > Aaron
> 
> I just wanted to check some of my assumptions before I do a full review.
> 
> I may be misunderstanding what you are doing, but it certainly sounds
> like this ends up reading all texts *for the whole tree* into memory.
> Especially for http which sends a single request, and buffers the whole
> thing before returning. (sftp and local are a bit better about it, and
> it is one of the things I would like to have fixed for http, because it
> affects all downloads because we buffer all of inventory.knit before we
> start doing any processing of it.)

So does bzr+http with the smart server on initial pull.

> Also, I thought you mentioned having it return texts in a potentially
> different order, does this mean that you have to watch out to make sure
> you create directories at the right time (before the files you get back)?
> 
> It might be better to have a few calls to this. For example, you could
> buffer it per directory, or some other smaller amount.

No: that gives us readvs * directories * pack files, and we'll pay for
that hugely in any situation where latency is not negligible.
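
To put rough numbers on that multiplication, here is a minimal sketch.
The function name and every figure in it (100 ms latency, 5 pack files,
200 directories) are made up for illustration, not measurements, and it
assumes each batch issues one readv per pack file with nothing
pipelined:

```python
def round_trip_cost_ms(latency_ms, pack_files, batches):
    """Milliseconds spent waiting on latency alone.

    Assumes (hypothetically) one readv per pack file per batch and no
    pipelining, so every readv costs one full round trip.
    """
    return latency_ms * pack_files * batches


# Hypothetical figures, chosen only to show the scaling.
whole_tree = round_trip_cost_ms(latency_ms=100, pack_files=5, batches=1)
per_directory = round_trip_cost_ms(latency_ms=100, pack_files=5, batches=200)

print("whole tree:    %d ms" % whole_tree)
print("per directory: %d ms" % per_directory)
```

The latency cost grows linearly with the number of batches, which is
the reason for asking for everything in one go.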

> Maybe you were assuming that readv was going to be optimally small in
> all cases, and didn't realize that the HTTP code is not. Then again, we
> aren't often building a working tree out of an HTTP repository. Though I
> guess if we make it work well, that may become more common.

I don't think Aaron or I assumed 'optimally small'. I do assume that
readv will coalesce nearby requests, but will keep them in the requested
order.
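
Roughly the behaviour I mean, as a standalone sketch. This is not the
bzrlib transport code; the names coalesce, readv and fudge are
illustrative only, and it buffers every coalesced chunk in memory
before yielding anything, which is exactly the cost John is asking
about:

```python
import io


def coalesce(offsets, fudge=0):
    """Merge (start, length) ranges whose gaps are at most fudge bytes.

    Returns a sorted list of [start, end) pairs.
    """
    merged = []
    for start, length in sorted(offsets):
        if merged and start <= merged[-1][1] + fudge:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], start + length)
        else:
            merged.append([start, start + length])
    return merged


def readv(fileobj, offsets, fudge=0):
    """Yield (start, data) for each request, in the order requested.

    The underlying reads are coalesced, so there may be fewer reads
    than requests. Requests that run past end of file are silently
    skipped in this sketch.
    """
    chunks = []
    for start, end in coalesce(offsets, fudge):
        fileobj.seek(start)
        chunks.append((start, fileobj.read(end - start)))
    for start, length in offsets:
        for chunk_start, data in chunks:
            if chunk_start <= start and start + length <= chunk_start + len(data):
                rel = start - chunk_start
                yield start, data[rel:rel + length]
                break


source = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz")
requests = [(10, 3), (0, 2), (2, 2)]
results = list(readv(source, requests))
```

Here the three requests collapse into two reads (bytes 0-4 and 10-13),
but the caller still sees them in the order it asked for.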
-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.