[MERGE] Make log revision filtering pluggable.
Robert Collins
robertc at robertcollins.net
Thu Aug 28 00:23:46 BST 2008
On Wed, 2008-08-27 at 00:31 -0400, John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> So, we still have to decide what "_foo" means, so as that conversation
> settles down we can decide what to name "log_adapters".
We dont have a convention yet, so I don't think we have to wait for it.
If we want complete consistency in the code base, we can retroactively
do that.
> This actually also goes for all of the "make_*" adapters as well. Mostly
> because if they are going to be public, they should have tests, to
> ensure we don't accidentally change how they work.
I'm happy to make them _prefixed names.
> For some reason, this doesn't seem right to me:
> + for batch in log_rev_iterator:
> + batch = iter(batch)
> + while True:
> + step = [detail for _, detail in zip(range(num), batch)]
> + if len(step) == 0:
> + break
> + yield step
> + num = min(int(num * 1.5), 200)
>
> But it seems that iter(iterator) just returns "iterator" (as opposed to
> iter(list) or any other "iterable" rather than an "iterator").
>
> It is technically slightly wrong, in that it will always break at an
> iterable boundary, but as this is the primary function that *does* the
> split up, it doesn't really matter.
Yes, I was aware of this but decided it didn't really matter - the
complexity to flatten-and-re-break, smoothly, just isn't worth it.
> Since it doesn't return an iterator of LogRevision objects, I would
> probably change the name to something like:
> "make_revision_tuple_iterators()".
>
> And correspondingly change the names of "log_revision_iterator".
> Possibly call it "rev_tuple_iterator_iterator" though I wouldn't say I
> *like* that.
indeed. Its a bit ugly. We don't *have* to be totally hungarian you know
- I used log_revision_iterator to indicate that its the thing used for
logging of revisions and its an iterator. A complete description would
be
log_rev_merge_sort_tuple_option_revision_optional_delta_iterator_iterator
But python tracks that for us.
The key thing that matters for new comers to the code is:
- its specific to log
- its an iterator not a static object - you have to pass over it.
I think the name indicates this today. I don't really want to change it
unless it is clearly more helpful to people. I don't think
iterator_iterator is more helpful.
> +def _generate_deltas(repository, log_rev_iterator):
> + for revs in log_rev_iterator:
> + revisions = [rev[1] for rev in revs]
> + deltas = repository.get_deltas_for_revisions(revisions)
> + revs = [(rev[0], rev[1], delta) for rev, delta in izip(revs,
> deltas)]
> + yield revs
>
>
> ^- This seems to assume that "revs" can be iterated 2 times. Which I
> don't think is guaranteed by your comments. As it should be an iterator,
> not a list.
>
> If you want to change it so that it is always a list, that is fine,
> but
> "An iterator over iterators of ((rev_id, revno, merge_depth), rev,
> delta)."
>
I will change the definition of log_rev_iterator to
An iterator over lists of ((rev_id, revno, merge_depth), rev,
delta).
Which matches implementation more.
> Instead of special casing lists, why not use a generator comprehension
> here:
> + # Convert view_revisions into (view, None, None) groups to fit with
> + # the standard interface here.
> + if type(view_revisions) == list:
> + nones = [None] * len(view_revisions)
> + log_rev_iterator = iter([zip(view_revisions, nones, nones)])
> + else:
> + def _convert():
> + for view in view_revisions:
> + yield (view, None, None)
> + log_rev_iterator = iter([_convert()])
Because its faster this way, in my adhoc testing.
>
> log_rev_iterator = iter([((view, None, None) for view in
> view_revisions)])
>
> Unless you know specifically that there is a performance reason to
> special cases lists. In which case, you should make a comment about it.
I don't know it rigorously, but I'll add a comment.
> in _filter_message_re, you unpack the structure
>
> +def _filter_message_re(searchRE, log_rev_iterator):
> + for revs in log_rev_iterator:
> + new_revs = []
> + for (rev_id, revno, merge_depth), rev, delta in revs:
>
>
> versus generate_deltas you address parts of the tuples:
> +def _generate_deltas(repository, log_rev_iterator):
> + for revs in log_rev_iterator:
> + revisions = [rev[1] for rev in revs]
>
> I find the former easier to read, though I suppose unpacking and
> repacking new tuples isn't strictly a good thing.
I think this is due to how the different helpers emerged, rather than a
specific intent.
> It might be nice to be a bit more consistent in how "log_rev_iterator"
> is handled.
>
> So the only thing I'm strict on is that _generate_deltas is incorrect if
> "revs" is a pure iterator. The rest are suggestions to make things
> better, but are above and beyond the "better than what it is now".
Thank you!
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080828/6da32cf1/attachment.pgp
More information about the bazaar
mailing list