[RFC] About log [-v] [--deep] [file|dir]*

Vincent Ladeuil v.ladeuil+lp at free.fr
Wed Dec 17 18:13:28 GMT 2008


Hi,

As summarized in the subject we have a couple of bugs/feature
requests pending regarding the ability to search the history for
multiple files or even all the files in a given directory.

I focused on these parts here, even if log is under investigation
for other reasons :-) (See '[RFC] How various commands display
revisions' for other points and concerns regarding log (but also,
push, pull, missing, status, etc)).

The patch fixing bug #175520 implements a --deep option that
searches all revision inventories with the file *path* (instead
of using the file-id and the .tix index as a quick way to
establish which versions are requested).

First, I'd like to mention again
https://bugs.edge.launchpad.net/bzr/+bug/181520 (bzr log FILE
don't show revisions where file was removed), so that we don't
forget it ;)

Then there are a couple of issues I'd like to discuss:

1) What format should we use when displaying the delta ?

So far we have two candidates: 

- delta.show()

- delta._ChangeReporter()

The former provides a short and a long format, handles --show-ids
and knows how to filter the delta when used for 'log -v file'.

The latter seems to be favored because it's used by status as its
short format, but it doesn't have a long alternate format, it
doesn't handle --show-ids and doesn't know how to filter (the
last point is not really relevant).

2) Should we accept custom delta formatters as we accept custom
log formatters ?

Or not ?

I'm tempted to answer yes because log formatters are the ultimate
bike-shedding subject and we can't hope to provide all the options
anybody can dream about.

'I want log --branch-nick', 'I want log --my-commits-only', 'I
want log --show-diffs', etc

What we should provide instead is a sane default plus enough hooks
to allow full redefinition (most of that already exists).

The answer(s) to this question should also mention whether or not
we should/must/could use these delta formatters for 'bzr
status'...

3) Should the delta format be the log formatter's responsibility ?

Currently 'log --short -v' uses delta.show(short_status=True) and
'log --long -v' uses delta.show(short_status=False).

Other formatters... don't get their hands on the --verbose option
so they can't control the format used by this means. This may be
a sign that --verbose shouldn't be used for that purpose. It could
be solved by providing an explicit parameter for the delta format
and using '--verbose' only to mean: 'show the delta'.


4) How should the delta format be specified ?

Since we have a 'log_format' config variable, that can be
overridden from the command line, I'd like a 'delta_format'
variable with the same properties and capabilities.
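
To make the precedence concrete, here is a minimal sketch of what
"same properties as log_format" could mean: the command line wins,
then the config variable, then a built-in default. Everything here is
hypothetical (resolve_delta_format, the plain dict standing in for the
configuration, the "long" default); none of it is existing bzrlib API.

```python
# Hypothetical sketch: these names are illustrative, not bzrlib API.
DEFAULT_DELTA_FORMAT = "long"  # assumed default, not decided anywhere


def resolve_delta_format(cli_value, config):
    """Pick the delta format: command line, then config, then default.

    `cli_value` is the value given on the command line (None if absent).
    `config` is a plain dict standing in for the user configuration.
    """
    if cli_value is not None:
        return cli_value
    return config.get("delta_format", DEFAULT_DELTA_FORMAT)


if __name__ == "__main__":
    config = {"log_format": "short", "delta_format": "short"}
    print(resolve_delta_format(None, config))    # config value is used
    print(resolve_delta_format("long", config))  # command line overrides
    print(resolve_delta_format(None, {}))        # falls back to default
```

With this split, '--verbose' would only decide whether a delta is
shown at all, and the resolved value would decide how.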

5) Performance issues ?

Well, we did some experiments with John using his xml entry
cache.

It helps.

But when it comes to '-v' in large working trees, deserializing
two trees and rebuilding the delta is never going to be the right
solution.

The chk-inv format should behave far better in that respect
and I plan to verify that instead of trying to optimize the
current log -v implementation for all formats.

Does that sound like a reasonable strategy ?

The other point is better streaming: the core function
(show_log()) is already using iterators so there is not much
really to be done here.

Except that log -v file is still perceived as taking a
long time to start because:

- it calculates revnos upfront instead of incrementally (it
  shouldn't, of course, but that's a different topic),

- it finds all revisions that modified a given file-id at
  once. _filter_revisions_touching_file_id may be rewritten as a
  generator, synchronized with the main log one so that text keys
  are still queried by chunks (as they are now), but delivered on
  demand instead of all at once. I.e. instead of filtering the
  revisions upfront, we do that more incrementally, still querying
  the text keys by chunks (since that is the (one ?) heart of
  *that* optimization).
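
The generator idea in the last point can be sketched roughly as
follows. This is not the real _filter_revisions_touching_file_id:
iter_revisions_touching and the touches_file callable are made-up
stand-ins, with touches_file playing the role of the batched text-key
query.

```python
from itertools import islice


def iter_revisions_touching(revision_ids, touches_file, chunk_size=100):
    """Lazily yield the revisions that touch a file, querying by chunks.

    `touches_file` is a hypothetical stand-in for the batched text-key
    query: given a list of revision ids, it returns the set of those
    that modified the file. Each chunk is only queried when the consumer
    has exhausted the previous one.
    """
    it = iter(revision_ids)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        matching = touches_file(chunk)
        for rev_id in chunk:
            if rev_id in matching:
                yield rev_id


if __name__ == "__main__":
    queries = []

    def fake_query(chunk):
        # Record each batched query so the laziness is visible.
        queries.append(list(chunk))
        return {r for r in chunk if r % 3 == 0}

    gen = iter_revisions_touching(range(1, 1001), fake_query, chunk_size=10)
    print(next(gen), "after", len(queries), "query")
```

The first revision can be displayed after a single chunk has been
queried, while the chunking itself (the point of the existing
optimization) is preserved.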

6) But what about multiple files ?

Yes sorry, I'm getting there but I'd like answers for the above
points before adding even more features in log.

So multiple file-ids and multiple file paths aren't really a
problem. So far we have only a specific_file_id parameter used by
various functions, which can be deprecated and replaced by (a
file-id list and/or a file path list) or (a file list and a kind
parameter) or (some callables).

I've been able to implement --deep with a single callable but I
suspect two may be needed for better performance when filtering
a delta. Both callables will be defined from a file list anyway.
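
As an illustration only, here is one way two callables could be built
from a file list. The split into "does this path match ?" and "filter
this delta" is my assumption about what the two callables would be,
and the delta is modelled as a plain list of (kind, path) tuples rather
than a real TreeDelta; make_path_matchers is a hypothetical name.

```python
def make_path_matchers(paths):
    """Build two callables from a list of paths (hypothetical helper).

    match_path(path)   -> True if `path` is one of the requested files
                          or lives below one of the requested dirs.
    filter_delta(delta) -> the (kind, path) entries whose path matches.
    """
    wanted = [p.rstrip("/") for p in paths]

    def match_path(path):
        return any(path == w or path.startswith(w + "/") for w in wanted)

    def filter_delta(delta):
        return [(kind, path) for kind, path in delta if match_path(path)]

    return match_path, filter_delta


if __name__ == "__main__":
    # A delta modelled as (kind, path) tuples; not a real TreeDelta.
    delta = [
        ("modified", "doc/index.txt"),
        ("added", "bzrlib/log.py"),
        ("removed", "docs/old.txt"),
    ]
    match_path, filter_delta = make_path_matchers(["doc", "bzrlib/log.py"])
    print(filter_delta(delta))
```

The first callable is cheap enough to decide whether a revision is
selected at all; the second is only needed once a delta is displayed.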

7) log DIR

Since we now have two ways to search history for a given file (id
or path), searching that history raises an interesting question:

a - do we want to search all file ids contained in DIR right now
    (i.e. in the tip, i.e. file-ids)

or

b - do we want to search for all files that have been contained in
    this DIR-id

or

c - do we want to search all paths that have had DIR as parent in
    all revisions

For consistency (and ease of coding :) I'd favor an approach
where --deep is still responsible for deciding whether we want to
search by id (a) or by path (c), and forget about (b), but I'm
biased.

(b) is still nice to keep in mind if only to ensure that it can
be handled by a specific log formatter.
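
To show how (a) and (c) can give different answers, here is a toy
model. It is deliberately simplified and uses no bzrlib code: history
is a list of (revno, changes) pairs, each change is a (file_id, path)
tuple carrying the path at that revision, a rename is recorded under
its new path only, and "DIR as parent" is taken to include nested
subdirectories. All names are hypothetical.

```python
def under(path, directory):
    """True if `path` lives below `directory` (at any depth)."""
    return path.startswith(directory.rstrip("/") + "/")


def log_dir_by_id(history, tip_inventory, directory):
    """(a): revisions touching file-ids that are in `directory` at the tip."""
    ids = {fid for fid, path in tip_inventory.items()
           if under(path, directory)}
    return [revno for revno, changes in history
            if any(fid in ids for fid, _ in changes)]


def log_dir_by_path(history, directory):
    """(c): revisions touching any path that was below `directory`."""
    return [revno for revno, changes in history
            if any(under(path, directory) for _, path in changes)]


if __name__ == "__main__":
    # a.py starts in src/ and is moved to lib/ in revision 3.
    history = [
        (1, [("a-id", "src/a.py")]),
        (2, [("b-id", "src/b.py")]),
        (3, [("a-id", "lib/a.py")]),
        (4, [("c-id", "README")]),
    ]
    tip = {"a-id": "lib/a.py", "b-id": "src/b.py", "c-id": "README"}
    print("by id   (a):", log_dir_by_id(history, tip, "src"))
    print("by path (c):", log_dir_by_path(history, "src"))
```

In this model, searching "src" by id misses revision 1 because a.py no
longer lives there at the tip, while searching by path finds it.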

Thanks in advance for the feedback,

       Vincent


