[MERGE] Implement and use Repository.iter_files_bytes

Mon Aug 20 04:41:23 BST 2007

Martin Pool has voted resubmit.
Status is now: Resubmit
Comment:
I know you were already going to redraft it but I have some more 
comments:

+    def iter_files_bytes(self, desired_files):
+        """Iterate through file versions.
+
+        The order in which files will be yielded is unspecified.
+        Yields pairs of identifier, bytes_iterator.  identifier is an 
opaque
+        value that uniquely identifies the file version.  (Example: a
+        TreeTransform trans_id.)
+
+        The default implementation just does VersionedFile.get_lines().
+        :param desired_files: a list of (file_id, revision_id, 
identifier)
+            pairs
+        """

What is 'identifier' and where does it come from?  Is it there just to 
match
up the input and output parameters?
The test case does not seem to pass one at all.
Maybe it would be better to just yield (file_id, revision_id, 
bytes_iter)
and let the caller match them up from there?

Rather than 'the order is unspecified' it might be clearer to say
'files will not necessarily be returned in the order they occur in
desired_files'.

Please also document what bytes_iterator is - from the test, it yields.
It looks like in some cases you are returning just a byte string, which
can be used as an iterator, but which will then return one byte at
a time - so ''.join(bytes_iterator) is pretty slow.

I'm wondering whether for new iterators we shoud document whether
the caller is allowed to quit without consuming the whole thing.
I suppose that is not a problem for the present code
since it's a readonly operation, but it might be a problem if a smart
server was streaming down the content.

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C46C43BE5.8050209%40utoronto.ca%3E