[MERGE] Implement and use Repository.iter_files_bytes

John Arbash Meinel john at arbash-meinel.com
Thu Aug 16 18:10:31 BST 2007

Aaron Bentley wrote:
> Hi all,
> This bundle implements a new interface for getting file contents:
> Tree and Repository.iter_files_bytes.
> It will be highly efficient on pack repos, requiring a single readv to
> extract all desired texts.  This should greatly improve build_tree
> performance.  It currently is a negligible improvement for knits.
> This patch also changes build_tree and revert to use Tree.iter_files_bytes.
> I propose merging this now, to reduce code divergence between Robert's
> branch and the mainline.
> Aaron

I just wanted to check some of my assumptions before I do a full review.

I may be misunderstanding what you are doing, but it certainly sounds
like this ends up reading all texts *for the whole tree* into memory.
Especially for http, which sends a single request and buffers the whole
response before returning. (sftp and local are a bit better about it.
This is one of the things I would like to see fixed for http, because it
affects all downloads: we buffer all of inventory.knit before we start
processing any of it.)

Also, I thought you mentioned having it return texts in a potentially
different order. Does this mean that you have to watch out to make sure
you create directories at the right time (before the files you get back)?
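
To illustrate the ordering concern, here is a minimal sketch of a consumer
that does not care what order the texts arrive in, because it creates each
file's parent directory on demand. The name write_texts and the
(relative path, bytes) pair shape are assumptions made for illustration;
they are not taken from the patch or from bzrlib.

```python
import os


def write_texts(root, texts):
    """Write (relpath, data) pairs under root, in whatever order they arrive.

    Hypothetical helper: `texts` is assumed to be an iterable of
    ('/'-separated relative path, bytes) pairs.
    """
    for relpath, data in texts:
        target = os.path.join(root, *relpath.split("/"))
        # Create any missing parent directories before writing the file,
        # so a file may arrive before its directory has been seen.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)
```

This trades an extra directory check per file for not having to sort the
results, which may or may not be acceptable for build_tree.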

It might be better to have a few calls to this. For example, you could
buffer it per directory, or some other smaller amount.
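
A rough sketch of the per-directory batching idea, under stated
assumptions: fetch_batch is a hypothetical stand-in for whatever call
ends up fetching a group of texts (for example something built on
iter_files_bytes), and it is assumed to take a list of paths and yield
(path, bytes) pairs in any order. Neither helper name comes from the patch.

```python
import posixpath
from collections import defaultdict


def group_by_directory(paths):
    """Group '/'-separated relative paths by their parent directory."""
    groups = defaultdict(list)
    for path in paths:
        groups[posixpath.dirname(path)].append(path)
    return dict(groups)


def iter_texts_per_directory(paths, fetch_batch):
    """Yield (path, data) pairs, calling fetch_batch once per directory.

    Only one directory's worth of texts needs to be buffered at a time,
    instead of the texts for the whole tree.
    """
    # Sorting the directory names visits a parent before its children.
    for directory, batch in sorted(group_by_directory(paths).items()):
        for path, data in fetch_batch(batch):
            yield path, data
```

The cost is one request per directory rather than one for the whole tree,
so a real implementation would probably want to merge small directories
into a single batch.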

Maybe you were assuming that readv was going to be optimally small in
all cases, and didn't realize that the HTTP code is not. Then again, we
aren't often building a working tree out of an HTTP repository. Though I
guess if we make it work well, that may become more common.
