[MERGE] Implement and use Repository.iter_files_bytes
Aaron Bentley
aaron.bentley at utoronto.ca
Mon Aug 20 06:20:30 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Martin Pool has voted resubmit.
> Status is now: Resubmit
> Comment:
> I know you were already going to redraft it but I have some more comments:
>
> + def iter_files_bytes(self, desired_files):
> + """Iterate through file versions.
> +
> + The order in which files will be yielded is unspecified.
> + Yields pairs of identifier, bytes_iterator. identifier is an
> opaque
> + value that uniquely identifies the file version. (Example: a
> + TreeTransform trans_id.)
> +
> + The default implementation just does VersionedFile.get_lines().
> + :param desired_files: a list of (file_id, revision_id, identifier)
> + pairs
> + """
>
> What is 'identifier' and where does it come from?
I'm not sure whether you missed the top of the docstring where I said
it's an opaque value, and gave a TreeTransform trans_id as an example.
For build_tree, it is a TreeTransform trans_id. For revert, it is a
(trans_id, mode_id) pair. They come from TreeTransform._assign_id. The
mode_id is used in revert to ensure that when a file is backed up, the
new file has the same unix mode as the backed-up (moved) file.
> Is it there just to
> match
> up the input and output parameters?
It is there to provide an identifier for the content that is unique in
the context of the operation.
> The test case does not seem to pass one at all.
Yes it does:
+ extracted = dict((i, ''.join(b)) for i, b in
+ tree.branch.repository.iter_files_bytes(
+ [('file1-id', 'rev1', 'file1-old'),
+ ('file1-id', 'rev2', 'file1-new'),
+ ('file2-id', 'rev1', 'file2'),
+ ]))
In this test case, the identifiers are file1-old, file1-new, and file2.
In the Tree.iter_files_bytes test case, they were 'id1', 'id2', and 'id3'.
> Maybe it would be better to just yield (file_id, revision_id, bytes_iter)
> and let the caller match them up from there?
That would mean lots of extra work.
First, RevisionTree.iter_files_bytes must match
WorkingTree.iter_files_bytes. And WorkingTree.iter_files_bytes doesn't
have any revision. So RevisionTree.iter_files_bytes must strip off the
revision id. So it has to *be* an iterator, instead of merely *using*
an iterator.
Second, build_tree and revert must now look up the trans_id (and
mode_id) associated with that file_id.
So iter_files_bytes may not be perfect for *general* use, but for
*actual* uses, it's very nice.
> Please also document what bytes_iterator is - from the test, it yields.
> It looks like in some cases you are returning just a byte string, which
> can be used as an iterator, but which will then return one byte at
> a time - so ''.join(bytes_iterator) is pretty slow.
No, it never returns a string. The bytes_iterator for
Tree.iter_files_bytes is tuple containing a single string. The
bytes_iterator for Repository.iter_files_bytes is a list of lines.
> I'm wondering whether for new iterators we shoud document whether
> the caller is allowed to quit without consuming the whole thing.
I think for iterators the assumption is that the caller *may* quit
without consuming the whole thing. I think violating that assumption
may not be a good idea. Certainly if we do, we should document that loudly.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGySSd0F+nu1YWqI0RApLaAJ9bR29nN38TBazSB8hpAqRpAcqt4wCcDT2O
tNiFw6OpTeUoRv9VE5qmSeA=
=tAyL
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list