[MERGE] Make log revision filtering pluggable.
John Arbash Meinel
john at arbash-meinel.com
Wed Aug 27 05:31:00 BST 2008
John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
So, we still have to decide what "_foo" means, so once that conversation
settles down we can decide what to name "log_adapters".
The same goes for all of the "make_*" adapters: if they are going to be
public, they should have tests, to ensure we don't accidentally change
how they work.
For some reason, this doesn't seem right to me:
+    for batch in log_rev_iterator:
+        batch = iter(batch)
+        while True:
+            step = [detail for _, detail in zip(range(num), batch)]
+            if len(step) == 0:
+                break
+            yield step
+            num = min(int(num * 1.5), 200)
But it seems that iter(iterator) just returns the same iterator; the
call only does something when batch is a list or some other iterable
that isn't already an iterator.
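For illustration (just a quick interpreter session, not from the patch):

    >>> lst = [1, 2, 3]
    >>> iter(lst) is lst        # a list gets a fresh iterator
    False
    >>> it = iter(lst)
    >>> iter(it) is it          # an iterator is returned unchanged
    True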
It is also technically slightly wrong, in that the steps it yields will
always break at the boundary of each incoming batch, but as this is the
primary function that *does* the splitting up, it doesn't really matter.
Since it doesn't return an iterator of LogRevision objects, I would
probably change the name to something like:
"make_revision_tuple_iterators()".
And correspondingly change the name of "log_revision_iterator".
Possibly call it "rev_tuple_iterator_iterator" though I wouldn't say I
*like* that.
...
+def _generate_deltas(repository, log_rev_iterator):
+    for revs in log_rev_iterator:
+        revisions = [rev[1] for rev in revs]
+        deltas = repository.get_deltas_for_revisions(revisions)
+        revs = [(rev[0], rev[1], delta)
+                for rev, delta in izip(revs, deltas)]
+        yield revs
^- This seems to assume that "revs" can be iterated over twice, which I
don't think is guaranteed by your comments, since they say it should be
an iterator, not a list.
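To make the failure mode concrete, here is a standalone sketch (purely
hypothetical names, with a stand-in for get_deltas_for_revisions,
assuming Python 2's itertools.izip as in the patch):

    from itertools import izip

    def _demo(revs):
        # First pass consumes the iterator...
        revisions = [rev[1] for rev in revs]
        fake_deltas = ['delta-%s' % r for r in revisions]
        # ...so this second pass over "revs" finds it already exhausted.
        return [(rev[0], rev[1], delta)
                for rev, delta in izip(revs, fake_deltas)]

    print _demo(iter([('a', 1), ('b', 2)]))  # [] -- silently drops everything
    print _demo([('a', 1), ('b', 2)])        # [('a', 1, 'delta-1'), ('b', 2, 'delta-2')]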
If you want to change it so that it is always a list, that is fine, but
"An iterator over iterators of ((rev_id, revno, merge_depth), rev,
delta)."
does not work for 2 passes. You could alternatively use:
revs = list(revs)
Or even "revs = tuple(revs)". (With the advantage that if revs is
already a tuple, tuple() returns the same object, while list() on a list
always returns a *new* list.)
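Again, just an interpreter illustration of that identity difference:

    >>> t = (1, 2, 3)
    >>> tuple(t) is t      # no copy when it is already a tuple
    True
    >>> l = [1, 2, 3]
    >>> list(l) is l       # always a fresh list
    False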
Stuff like this *might* be a reason to directly test all of these
functions.
In general, you should either change how the API is declared (make it an
iterator of lists), or make sure the functions only iterate over the
inner sequences once. You can choose whichever way works best for you.
Instead of special-casing lists, why not use a generator expression
here:
+    # Convert view_revisions into (view, None, None) groups to fit with
+    # the standard interface here.
+    if type(view_revisions) == list:
+        nones = [None] * len(view_revisions)
+        log_rev_iterator = iter([zip(view_revisions, nones, nones)])
+    else:
+        def _convert():
+            for view in view_revisions:
+                yield (view, None, None)
+        log_rev_iterator = iter([_convert()])
    log_rev_iterator = iter([((view, None, None)
                              for view in view_revisions)])
Unless you know specifically that there is a performance reason to
special-case lists, in which case you should add a comment about it.
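For what it's worth, the single-expression form handles both cases; a
rough sketch with a hypothetical helper name:

    def _make_log_rev_iterator(view_revisions):
        # One batch containing a lazy (view, None, None) generator; works
        # the same whether view_revisions is a list or a plain iterator.
        return iter([((view, None, None) for view in view_revisions)])

    for batch in _make_log_rev_iterator(['r1', 'r2']):
        print list(batch)   # [('r1', None, None), ('r2', None, None)]
    for batch in _make_log_rev_iterator(iter(['r1', 'r2'])):
        print list(batch)   # same output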
In _filter_message_re, you unpack the structure:
+def _filter_message_re(searchRE, log_rev_iterator):
+    for revs in log_rev_iterator:
+        new_revs = []
+        for (rev_id, revno, merge_depth), rev, delta in revs:
whereas in _generate_deltas you index into the tuples:
+def _generate_deltas(repository, log_rev_iterator):
+    for revs in log_rev_iterator:
+        revisions = [rev[1] for rev in revs]
I find the former easier to read, though I suppose unpacking and
repacking new tuples isn't strictly a good thing.
It might be nice to be a bit more consistent in how "log_rev_iterator"
is handled.
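For example, _generate_deltas could be written in the unpacking style
(just a sketch of what I mean, which also materializes revs so the two
passes are safe):

    from itertools import izip

    def _generate_deltas(repository, log_rev_iterator):
        for revs in log_rev_iterator:
            # Make the incoming batch safe to traverse twice.
            revs = list(revs)
            revisions = [rev for key, rev, delta in revs]
            deltas = repository.get_deltas_for_revisions(revisions)
            yield [(key, rev, delta)
                   for (key, rev, old_delta), delta in izip(revs, deltas)]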
So the only thing I'm strict on is that _generate_deltas is incorrect if
"revs" is a pure iterator. The rest are suggestions to make things
better, but are above and beyond the "better than what it is now".
For details, see:
http://bundlebuggy.aaronbentley.com/project/bzr/request/%3C1219353364.31549.149.camel%40lifeless-64%3E
Project: Bazaar