[MERGE] Implement and use Repository.iter_files_bytes
Martin Pool
mbp at sourcefrog.net
Mon Aug 20 06:47:21 BST 2007
On 8/20/07, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> > + def iter_files_bytes(self, desired_files):
> > + """Iterate through file versions.
> > +
> > + The order in which files will be yielded is unspecified.
> > + Yields pairs of identifier, bytes_iterator. identifier is an
> > opaque
> > + value that uniquely identifies the file version. (Example: a
> > + TreeTransform trans_id.)
> > +
> > + The default implementation just does VersionedFile.get_lines().
> > + :param desired_files: a list of (file_id, revision_id, identifier)
> > + pairs
> > + """
> >
> > What is 'identifier' and where does it come from?
>
> I'm not sure whether you missed the top of the docstring where I said
> it's an opaque value, and gave a TreeTransform trans_id as an example.
>
> For build_tree, it is a TreeTransform trans_id. For revert, it is a
> (trans_id, mode_id) pair. They come from TreeTransform._assign_id. The
> mode_id is used in revert to ensure that when a file is backed up, the
> new file has the same unix mode as the backed-up (moved) file.
Yes, I saw that it's opaque, but if I wanted to call this that doesn't tell me
what I should pass. I guess you mean that it's opaque to iter_files_bytes
because it doesn't interpret it but just passes it through to the output?
I think you mean that the caller can pass anything they like, as long as
no value is repeated?
To be clear, I understand that the interface makes sense but I think it
needs better documentation.
> > Is it there just to
> > match
> > up the input and output parameters?
>
> It is there to provide an identifier for the content that is unique in
> the context of the operation.
>
> > The test case does not seem to pass one at all.
>
> Yes it does:
My mistake.
> Second, build_tree and revert must now look up the trans_id (and
> mode_id) associated with that file_id.
>
> So iter_files_bytes may not be perfect for *general* use, but for
> *actual* uses, it's very nice.
>
> > Please also document what bytes_iterator is - from the test, it yields.
> > It looks like in some cases you are returning just a byte string, which
> > can be used as an iterator, but which will then return one byte at
> > a time - so ''.join(bytes_iterator) is pretty slow.
>
> No, it never returns a string. The bytes_iterator for
> Tree.iter_files_bytes is tuple containing a single string. The
> bytes_iterator for Repository.iter_files_bytes is a list of lines.
Ok, you're right.
+ cur_file = (self.get_file_text(file_id),)
Can you please add a comment here explaining that this is the point of
constructing a tuple?
--
Martin
More information about the bazaar
mailing list