[MERGE] Implement and use Repository.iter_files_bytes

Mon Aug 20 06:47:21 BST 2007

On 8/20/07, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> > +    def iter_files_bytes(self, desired_files):
> > +        """Iterate through file versions.
> > +
> > +        The order in which files will be yielded is unspecified.
> > +        Yields pairs of identifier, bytes_iterator.  identifier is an
> > opaque
> > +        value that uniquely identifies the file version.  (Example: a
> > +        TreeTransform trans_id.)
> > +
> > +        The default implementation just does VersionedFile.get_lines().
> > +        :param desired_files: a list of (file_id, revision_id, identifier)
> > +            pairs
> > +        """
> >
> > What is 'identifier' and where does it come from?
>
> I'm not sure whether you missed the top of the docstring where I said
> it's an opaque value, and gave a TreeTransform trans_id as an example.
>
> For build_tree, it is a TreeTransform trans_id.  For revert, it is a
> (trans_id, mode_id) pair.  They come from TreeTransform._assign_id.  The
> mode_id is used in revert to ensure that when a file is backed up, the
> new file has the same unix mode as the backed-up (moved) file.

Yes, I saw that it's opaque, but if I wanted to call this that doesn't tell me
what I should pass.  I guess you mean that it's opaque to iter_files_bytes
because it doesn't interpret it but just passes it through to the output?

I think you mean that the caller can pass anything they like, as long as
no value is repeated?

To be clear, I understand that the interface makes sense but I think it
needs better documentation.

> > Is it there just to
> > match
> > up the input and output parameters?
>
> It is there to provide an identifier for the content that is unique in
> the context of the operation.
>
> > The test case does not seem to pass one at all.
>
> Yes it does:

My mistake.

> Second, build_tree and revert must now look up the trans_id (and
> mode_id) associated with that file_id.
>
> So iter_files_bytes may not be perfect for *general* use, but for
> *actual* uses, it's very nice.
>
> > Please also document what bytes_iterator is - from the test, it yields.
> > It looks like in some cases you are returning just a byte string, which
> > can be used as an iterator, but which will then return one byte at
> > a time - so ''.join(bytes_iterator) is pretty slow.
>
> No, it never returns a string.  The bytes_iterator for
> Tree.iter_files_bytes is tuple containing a single string.  The
> bytes_iterator for Repository.iter_files_bytes is a list of lines.

Ok, you're right.

+            cur_file = (self.get_file_text(file_id),)

Can you please add a comment here explaining that this is the point of
constructing a tuple?

-- 
Martin