[MERGE] Make log revision filtering pluggable.

Robert Collins robertc at robertcollins.net
Thu Aug 28 00:23:46 BST 2008


On Wed, 2008-08-27 at 00:31 -0400, John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> So, we still have to decide what "_foo" means, so as that conversation 
> settles down we can decide what to name "log_adapters".

We dont have a convention yet, so I don't think we have to wait for it.
If we want complete consistency in the code base, we can retroactively
do that.

> This actually also goes for all of the "make_*" adapters as well. Mostly 
> because if they are going to be public, they should have tests, to 
> ensure we don't accidentally change how they work.

I'm happy to make them _prefixed names. 

> For some reason, this doesn't seem right to me:
> +    for batch in log_rev_iterator:
> +        batch = iter(batch)
> +        while True:
> +            step = [detail for _, detail in zip(range(num), batch)]
> +            if len(step) == 0:
> +                break
> +            yield step
> +            num = min(int(num * 1.5), 200)
> 
> But it seems that iter(iterator) just returns "iterator" (as opposed to 
> iter(list) or any other "iterable" rather than an "iterator").
>
> It is technically slightly wrong, in that it will always break at an 
> iterable boundary, but as this is the primary function that *does* the 
> split up, it doesn't really matter.

Yes, I was aware of this but decided it didn't really matter - the
complexity to flatten-and-re-break, smoothly, just isn't worth it.

> Since it doesn't return an iterator of LogRevision objects, I would 
> probably change the name to something like: 
> "make_revision_tuple_iterators()".
> 
> And correspondingly change the names of "log_revision_iterator". 
> Possibly call it "rev_tuple_iterator_iterator" though I wouldn't say I 
> *like* that.

indeed. Its a bit ugly. We don't *have* to be totally hungarian you know
- I used log_revision_iterator to indicate that its the thing used for
logging of revisions and its an iterator. A complete description would
be
log_rev_merge_sort_tuple_option_revision_optional_delta_iterator_iterator

But python tracks that for us.

The key thing that matters for new comers to the code is:
 - its specific to log
 - its an iterator not a static object - you have to pass over it.

I think the name indicates this today. I don't really want to change it
unless it is clearly more helpful to people. I don't think
iterator_iterator is more helpful.

> +def _generate_deltas(repository, log_rev_iterator):
> +    for revs in log_rev_iterator:
> +        revisions = [rev[1] for rev in revs]
> +        deltas = repository.get_deltas_for_revisions(revisions)
> +        revs = [(rev[0], rev[1], delta) for rev, delta in izip(revs, 
> deltas)]
> +        yield revs
> 
> 
> ^- This seems to assume that "revs" can be iterated 2 times. Which I 
> don't think is guaranteed by your comments. As it should be an iterator, 
> not a list.
>
> If you want to change it so that it is always a list, that is fine,
> but
> "An iterator over iterators of ((rev_id, revno, merge_depth), rev, 
> delta)."
> 

I will change the definition of log_rev_iterator to
 An iterator over lists of ((rev_id, revno, merge_depth), rev,
        delta).

Which matches implementation more.

> Instead of special casing lists, why not use a generator comprehension 
> here:
> +    # Convert view_revisions into (view, None, None) groups to fit with
> +    # the standard interface here.
> +    if type(view_revisions) == list:
> +        nones = [None] * len(view_revisions)
> +        log_rev_iterator = iter([zip(view_revisions, nones, nones)])
> +    else:
> +        def _convert():
> +            for view in view_revisions:
> +                yield (view, None, None)
> +        log_rev_iterator = iter([_convert()])

Because its faster this way, in my adhoc testing.

> 
> log_rev_iterator = iter([((view, None, None) for view in 
> view_revisions)])
> 
> Unless you know specifically that there is a performance reason to 
> special cases lists. In which case, you should make a comment about it.

I don't know it rigorously, but I'll add a comment.

> in _filter_message_re, you unpack the structure
> 
> +def _filter_message_re(searchRE, log_rev_iterator):
> +    for revs in log_rev_iterator:
> +        new_revs = []
> +        for (rev_id, revno, merge_depth), rev, delta in revs:
> 
> 
> versus generate_deltas you address parts of the tuples:
> +def _generate_deltas(repository, log_rev_iterator):
> +    for revs in log_rev_iterator:
> +        revisions = [rev[1] for rev in revs]
> 
> I find the former easier to read, though I suppose unpacking and 
> repacking new tuples isn't strictly a good thing.

I think this is due to how the different helpers emerged, rather than a
specific intent. 

> It might be nice to be a bit more consistent in how "log_rev_iterator" 
> is handled.
> 
> So the only thing I'm strict on is that _generate_deltas is incorrect if 
> "revs" is a pure iterator. The rest are suggestions to make things 
> better, but are above and beyond the "better than what it is now".

Thank you!

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080828/6da32cf1/attachment.pgp 


More information about the bazaar mailing list