[MERGE/RFC] log refactoring

Wed Jan 14 18:21:37 GMT 2009

>>>>> "Ian" == Ian Clatworthy <ian.clatworthy at internode.on.net> writes:

    Ian> This patch refactors the implementation of log so
    Ian> that --short logging iterates most of the time,
    Ian> rather than being O(history) as often as it currently is.

    Ian> I'm yet to measure the performance gains, if any.
    Ian> I'll do that soon. As long as performance is as
    Ian> least as good as it currently is, I think this is
    Ian> a win in term of clearer code anyhow.

Having spent some long hours lately in this code I'm not that
convinced :-/

That's certainly because we attack the problem from different
ends, but I don't think it's a good sign that I don't like more
what you propose.

On the other hand, I don't have a better proposal either except
the following idea: LogFormatter should become the first class
citizen/main entry point of log queries.

Its doc string already says:

    Abstract class to display log messages.

    At a minimum, a derived class must implement the log_revision method.

    If the LogFormatter needs to be informed of the beginning or end of
    a log it should implement the begin_log and/or end_log hook methods.

I'd say, give it all the parameters show_log is receiving today
and more from command line if needed and let's have a replacement for show_log:

    new_show_log(lf, branch, revision=None)

Haaa, I feel better :) This one should be enough for
log/push/pull/missing given the appropriate log formatters (your
'linear' or 'medium' being a needed one).

Ok, that's not how we should proceed because we want to get there
incrementally.

I've yet to find all the intermediate steps, but I have some
hints about the final result:

Log display can be decomposed in the following stages:

- establish the revision range (but don't expand it yet,
  i.e. mostly translate command-line revision into a range),

- filter the revision range,

- display the revisions.

The filtering may be a several steps processing like the
_filter_revisions_touching_file_id case which:

- find the revisions really touching the file,
- add the merging revisions towards mainline.

Also, the filtering stage is and *should remain* iterator based.

Filtering can also be a set action (remove these revisions from
the revision range, intersect the revision range with these
revisions, etc). In particular, I view the "add the merging
revisions towards mainline" as such an action.

Filtering can also be: don't show me the pqm commits or don't
show me the 'bzr up-thread --auto' commits, etc.

I'm not saying all of this should be implemented in bzr core,
quite the contrary, I'm just dumping my ideas to define the
limits and expected perimeter.

Some filtering may be easier to do for some log formatters during
the display stage... I see no reason to forbid that :)

How to handle backward compatibility ?

Well, the LogFormatter.__init__ prototype received an additional
optional parameter between 1.10 and 1.11, so any log formatter
that were expecting a fixed prototype is already broken. The
other must have used *args, **kwargs appropriately, so we are free
to add more optional parameters :-)

And as long as we add new behavior through public methods a la
begin_log()/end_log() we should be able to move most of the
existing module methods to the LogFormatter class as default
behaviors.

This is still incomplete as a description but enough to address
your current target, avoiding any O(history) operations for the
default log formatter can be rephrased as: let define a log
formatter that does no O(history) operations, because it shows
only mainline revisions for example. 

And that final remark kind of implies that the log formatter must
be responsible for handling the -r parameter coming from the
command line.

<snip/>

    Ian> +    # Find and print the interesting revisions
    Ian> +    try:
    Ian> +        log_count = 0
    Ian> +        revision_iterator = _create_log_revision_iterator(branch,
    Ian> +            start_revision, end_revision, direction, specific_fileid, search,
    Ian> +            generate_merge_revisions, allow_single_merge_revision,
    Ian> +            generate_delta)
    Ian> +        for revs in revision_iterator:
    Ian> +            for (rev_id, revno, merge_depth), rev, delta in revs:
    Ian> +                lr = LogRevision(rev, revno, merge_depth, delta,
    Ian> +                                 rev_tag_dict.get(rev_id))
    Ian> +                lf.log_revision(lr)
    Ian> +                if limit:
    Ian> +                    log_count += 1
    Ian> +                    if log_count >= limit:
    Ian> +                        return
    Ian> +    except _NonMainlineRevisionLimit:
    Ian> +        raise errors.BzrCommandError('Selected log formatter only supports'
    Ian> +            ' mainline revisions.')

Which explains why I'm not comfortable with your current approach
which forces the generic code to take into account some
specificities of a given log formatter (or a family of them but
surely not all).

       Vincent