[MERGE] Make log revision filtering pluggable.

Robert Collins robertc at robertcollins.net
Tue Aug 26 23:36:31 BST 2008


On Tue, 2008-08-26 at 17:11 -0500, John Arbash Meinel wrote:
> 
> Robert Collins wrote:
> > On Tue, 2008-08-26 at 16:38 -0400, John Arbash Meinel wrote:
> >> John Arbash Meinel has voted comment.
> >> Status is now: Semi-approved
> >> Comment:
> >> I'm a bit concerned about your api. You seem to have an iterator of 
> >> iterators, which is a bit confusing in its nesting.
> > 
> > This is for performance, to allow the current batching of revisions to
> > be presented to the filters, rather than them all reinventing it, etc.
> 
> I do understand that. I also know that I'm fairly comfortable with the logging
> code, generators and iterators, and it still took me quite a while to figure
> out what was going on - especially stuff like the _convert one snippet.
> 
> Fundamentally, my concern is that this is rather hard to understand. A
> "pipeline" isn't specifically tricky, but the way everything is implemented
> is pretty hard to follow.
> 
> For starters, the fundamental object is a nested tuple of:
>   ((revision_id, revno, merge_depth), Revision_or_None, TreeDelta_or_None)
>
> Which is already a bit odd. What is x[0][1]? We have LogRevision explicitly
> because dealing in parameter tuples is difficult and makes backwards
> compatibility hard to maintain. We *might* need to use tuples like this for
> performance reasons, but I'm not strictly convinced we do. (Getting rid of
> O(ancestry) merge_sorted would do far more for us than switching to tuples.)

I didn't invent the nested object - it was there when I started. My
entire goal was to refactor the pipeline to be pluggable, not to
reinvent it; as long as I haven't made it worse I think I've passed our
gatekeeping requirements. I don't have the inclination to do a general
overhaul of log right now. I agree it's hard to follow, but I think it
was harder to follow before this work, because it wasn't clear what was
happening, or where, or how things like deltas interacted with messages
(and note the massive performance win from making that clear and then
reversing the order).
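
For anyone skimming the thread, here's a rough sketch of the tuple shape
under discussion - illustrative only, written from the description above
rather than pasted from bzrlib.log:

# Illustrative sketch, not the real code: each element the pipeline
# passes along has the shape
#   ((revision_id, revno, merge_depth), Revision_or_None, TreeDelta_or_None)
rev_entry = (('rev-id-123', '4.1.2', 1),  # (revision_id, revno, merge_depth)
             None,                        # Revision object, filled in later
             None)                        # TreeDelta, filled in later
revno = rev_entry[0][1]  # so x[0][1] is the (possibly dotted) revno

That's readable once you know the layout, which is exactly the
discoverability complaint above.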

I guess I'm saying:
 - tell me what you want done for you to be happy with merging this;
 - I didn't sit down to improve log per se, but rather to let bzr-search
   integrate in; I don't care that it's not optimal;
 - I think the original code was less clear, so I have not harmed
   things;
 - this really cannot be more brittle or complex than what it
   replaces, because what it replaces was exactly the same thing, just
   smushed up into one function.

> Second, you use terms like "log_rev_iterator", but it isn't iterating
> revisions. It is iterating iterators of tuples which may have a "rev" in them
> somewhere. The fact that the previous sentence is difficult signals to me that
> the api is difficult to understand.

log_rev_rev_iterator? Please give me a hand, otherwise we just create
round trips: I guess at a name that is better, and you can come back
with more objections. bb:tweak doesn't work that way, because then it's
a handoff from the reviewer back to the reviewee, but bb:comment and/or
bb:resubmit don't have this property.
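
To make the naming problem concrete, here is roughly what a
"log_rev_iterator" hands to the next stage - the names and values are
invented for illustration, not lifted from the branch:

# Rough sketch: a "log_rev_iterator" yields batches, and each batch is
# itself an iterable of ((rev_id, revno, merge_depth), rev, delta) tuples.
def example_log_rev_iterator():
    yield iter([(('rev-1', '1', 0), None, None),
                (('rev-2', '2', 0), None, None)])
    yield iter([(('rev-3', '2.1.1', 1), None, None)])

So the thing iterated at the top level is a batch, not a revision, which
is why a short accurate name is hard to come up with.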


> ...
> 
> >>
> >> So... I really like it being a chain of filters. But I wonder if we 
> >> would be a lot better served by having a chain of Classes rather than 
> >> plain functions. At least at the moment, I have difficulty following 
> >> what is really going on. I can sort of tease it out, but it doesn't 
> >> really pass my "simplicity" filter.
> > 
> > How would a chain of classes work any different? This is quite literally
> > exactly what we had for log before (though it was a bit inconsistent
> > before - some levels were single rev, some were multiple).
> 
> Classes have significantly more obvious state than generators. You could
> define them as:
...
> Which would allow you to put things like "generate_deltas" in as a "**kwargs"
> parameter, rather than as parameters for the function call itself. Arguably
> you could make the extra arguments "**kwargs" in the same fashion.

So, in short - classes make no difference there. The pipeline itself is
a pull iterator pipeline, and you can implement an item in the pipeline
as a class or as a generator - it makes no difference to the driving
logic, which is simply:

for batch in batches:
    for rev_details in batch:
        pass

If I were writing a complex pipeline stage (I have one planned, in
bzr-search) I would definitely use a class. But the canned log ones are
all trivial.
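
To show what I mean about the driving logic not caring, here is a toy
stage written both ways - hypothetical names and a placeholder
compute_delta(), not code from the patch:

def compute_delta(rev):
    # Placeholder for whatever per-revision work the stage does.
    return None

def add_deltas_generator(log_rev_iterator):
    # Generator form: consume batches, yield transformed batches.
    for batch in log_rev_iterator:
        yield [(rev_info, rev, compute_delta(rev))
               for rev_info, rev, _ in batch]

class AddDeltas(object):
    # Class form: the same pull-iterator shape, just with explicit state.

    def __init__(self, log_rev_iterator):
        self.log_rev_iterator = log_rev_iterator

    def __iter__(self):
        for batch in self.log_rev_iterator:
            yield [(rev_info, rev, compute_delta(rev))
                   for rev_info, rev, _ in batch]

Either one plugs into the loop above unchanged, which is why the choice
is per-stage rather than structural.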

> I'm not strictly against the model, but it feels brittle and complex.

Suggestions are welcome - I don't feel that classes change the model *at
all*, because the model totally supports classes today.

> >> Oh, and if "log_adapters" is going to be non-stable, I think it should 
> >> be "_log_adapters" for now.
> > 
> > _log_adapters would be private. This is public, but not stable.
> > 
> > -Rob
> > 
> 
> I know we are somewhat short of ways to decorate a symbol to indicate its
> public/private status versus its stable/unstable status. But generally we say
> that "public apis are stable and will be deprecated, etc." So I think making
> it "_log_adapters" is perfectly valid, as we aren't guaranteeing to plugins
> that we won't change the api without deprecating it.

I'd like to take this to a different thread: this keeps coming up, and
it's like a very large grain of sand.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.