[RFC] Repository.get_file_texts API and planning for it

Wed Aug 15 14:02:12 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Aaron and I chatted on IRC earlier today. We'd like to add an interface
> which will extract a number of file texts from a repository in whatever
> manner is best for that repository, calling back to a tree transform (or
> possibly a callback ?) for each text.

In order to maintain our current set of abstractions, it seems like this
will need a corresponding method on RevisionTree:

def create_files([(file_id, trans_id)])

Which would have a naive implementation on Tree.
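
A rough sketch of what that naive version might look like, for illustration
only. SketchTree and RecordingTransform are hypothetical stand-ins rather than
bzrlib classes, and passing the transform in as a `tt` argument is an
assumption, since the signature above doesn't say where the transform comes
from:

```python
class SketchTree:
    """Hypothetical stand-in for a Tree; not bzrlib's actual class."""

    def __init__(self, texts):
        # Assumption: texts maps file_id -> bytes, held in memory.
        self._texts = texts

    def get_file_lines(self, file_id):
        # Keep line endings so the joined lines reproduce the text exactly.
        return self._texts[file_id].splitlines(True)

    def create_files(self, tt, desired_files):
        """Naive version: fetch each text one at a time, in the order given."""
        for file_id, trans_id in desired_files:
            tt.create_file(self.get_file_lines(file_id), trans_id)


class RecordingTransform:
    """Hypothetical stand-in for a TreeTransform; records what it is given."""

    def __init__(self):
        self.created = {}

    def create_file(self, contents, trans_id, mode_id=None):
        # contents is an iterable of bytes chunks.
        self.created[trans_id] = b"".join(contents)
```

A repository-aware tree could override create_files to fetch the texts in
whatever order is cheapest for its storage.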

Also, it kinda sucks that DirStateRevisionTree is not a RevisionTree,
because we have to implement it there, too.

> This needs changes to Repository - adding the new method (we need a good
> name

Repository.create_files ?

> - it's not a regular getter, because it calls back with the text, or
> the text lines, or a file object - we should decide what is least
> friction here too).

Iterables of bytes is a very convenient one.  Text lines is nice only
when working with text.  File objects have high API demands, but even
strings are iterables of bytes.
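
To illustrate why iterables of bytes are low-friction, here is a small sketch
of a consumer that accepts any of the three forms. write_contents is a
hypothetical helper, not an existing API. It is written for Python 3, where
iterating a bytes object yields integers, so a whole text is wrapped in a
one-element list:

```python
import io


def write_contents(contents, out):
    """Write any iterable of bytes chunks to a binary file-like object.

    Returns the number of bytes written.
    """
    total = 0
    for chunk in contents:
        out.write(chunk)
        total += len(chunk)
    return total


whole_text = [b"hello\nworld\n"]              # a whole text as one chunk
text_lines = [b"hello\n", b"world\n"]         # text lines
file_object = io.BytesIO(b"hello\nworld\n")   # file objects iterate by line
```

The consumer only needs a for loop, so the producer is free to hand over
whichever form it already has.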

> And code like checkout, merge, revert will need to learn about it.

build_tree and revert can use the Tree.create_files method.  Merge needs
multiple revisions of the same file, and it knows about repos already,
so I think it can use Repository.create_files.

> 
> the proposed signature is:
> def FOO(self, requestor, file_details):
>     """Extract a number of historical file texts.
> 
>     :param requestor: An object which offers a create_file method which
>         will be called for each element of file_details, though not
>         necessarily in the order supplied. The create_file method has 
>         the signature (callback_data, bytes).

That doesn't match the TreeTransform method, which wants:

def create_file(self, contents, trans_id, mode_id=None)

where contents is an iterable of bytes.
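
To make the mismatch concrete, this is roughly the adapter a caller would need
in order to drive a TreeTransform through the proposed requestor interface.
TransformRequestor and FakeTransform are hypothetical names, and treating
callback_data as the trans_id is an assumption:

```python
class TransformRequestor:
    """Hypothetical adapter exposing create_file(callback_data, bytes)."""

    def __init__(self, tt):
        self._tt = tt

    def create_file(self, callback_data, text):
        # Assumption: callback_data carries the trans_id for this text.
        # The transform wants an iterable of bytes, so wrap the single string.
        self._tt.create_file([text], callback_data)


class FakeTransform:
    """Hypothetical stand-in with the TreeTransform.create_file signature."""

    def __init__(self):
        self.calls = []

    def create_file(self, contents, trans_id, mode_id=None):
        self.calls.append((trans_id, list(contents)))
```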

Also, it seems like a bad idea to be passing around bytes rather than
iterators of bytes.  We don't want our APIs to require reading entire
files into memory.
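
For example, a generator along these lines would let a text be handed over in
bounded pieces. This is only a sketch: iter_chunks is a hypothetical helper
and the 64 KiB default chunk size is an arbitrary assumption:

```python
import io


def iter_chunks(fileobj, chunk_size=65536):
    """Yield successive chunks from a binary file object until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Only one chunk is held in memory at a time.
pieces = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
```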

Can we just call YAGNI on generalizing this to "requestors" and focus on
TreeTransforms instead?

> I propose that I will write up an implementation for our generic
> repository asap, and Aaron do the tree-transform related changes,
> because I'll be writing the pack based version anyhow, I may as well do
> both the generic and specialised versions.

I was willing to do the TT updates and the generalized version.  I can't
really work on the TT updates without the generalized version.

Also, we can get the generalized version+TT updates merged before packs
come in, which will reduce divergence for you.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGwvlT0F+nu1YWqI0RAgHjAJwNIHbiA/iS85uQAqeCEUPbK6iTuACeJGBI
txq5G5Vu2FUIUHVDO0a0ErE=
=ajCw
-----END PGP SIGNATURE-----