[MERGE] Fix #175520 by implementing a --deep option for log
Vincent Ladeuil
v.ladeuil+lp at free.fr
Tue Jan 6 14:19:39 GMT 2009
>>>>> "Ian" == Ian Clatworthy <ian.clatworthy at internode.on.net> writes:
Ian> Vincent Ladeuil wrote:
>> This patch provides a --deep option that allows to search history
>> with a path instead of a file-id.
Ian> I'll make my official vote 'comment' because doing so will
Ian> hopefully get some others to provide their thoughts if they
Ian> haven't already. If no-one else comments soon, treat my vote
Ian> as (borderline) resubmit.
>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
robert> I agree with most of Ian's comments.
Ok, thanks for the reviews, I'll reply to both reviews here.
Ian> bb: comment
Right, so overall it looks like some discussion is needed even
before a resubmit.
Ian> To start with, I'm not that keen on --deep as the option name,
Ian> though I can't say I have something necessarily better yet. :-(
Ian> To me, this option is all about name-matching, as opposed to
Ian> deep searching. In other words, it explicitly logs revisions
Ian> that add/delete/change a file matching that path and *not*
Ian> previous names of that logical file.
Ok.
Ian> So I'd prefer an option name that emphasized that
Ian> aspect. (In comparison, --deep *could* one day be used
Ian> to ask for deep searching of the path -> file-id
Ian> mapping, i.e. look in inventories until you find the
Ian> file-id and then log against it.)
With the caveat that you will find the first previous file id
when there can be several...
Ian> You're solving two problems at once here but perhaps they
Ian> really ought to have separate solutions?
Ian> * --name (say) for name-matching
Ian> * --old to search for the most recent file-id with that path.
robert> I would say that just --old is enough as a name - I
robert> mean, log -v FILE today does file->fileid mapping, so
robert> log -v --old FILE looking for file names is not much
robert> different from the users perspective.
You agree on the name (I can do that too) but disagree on its
semantic.
--deep (or let's just say --old from now on) is meant to address
one hole: we don't have (and maybe still don't want) a mean to
query log for a given file id.
Most of the time nobody cares, I think more than 90% of log uses
doesn't even try to query for files that aren't present anymore
in the current tree.
Yet, when you want to search for such a file, without --old,
you're out of luck.
Ian makes a distinction between 'show me that previous
file->fileid match' whereas I just implemented 'show me all the
file->fileids matches'.
I don't think we need such a distinction as long as the number of
ids for the same path remain small (and I think it generally
remains small).
Especially if we modify log to interpret file->fileid from the
first revision queried instead of against the working tree (as
done now). But this is not addressed yet.
So, in that patch, can we agree on handling only two cases as
done here ?:
- file -> file id against the working tree by default
- file -> any file id if --old is used
Otherwise we need yet another mean to access even older file ids.
>> This patch prepares some parts to be able to address other log
>> related bugs like the ability to specify multiple files or
>> requesting the log for a directory.
Ian> Cool.
>> Since the revisions displayed are different when --deep is used,
>> I'd like feedback on the choices made (in particular, unlike the
>> file-id based implementation, the revisions toward mainline are
>> *not* displayed if they don't reference the file).
Ian> I fully expect someone to raise that inconsistency as a bug and,
Ian> from a UI perspective, I don't think we can justify it. If it's
Ian> important to know the merge revisions when the file exists, it
Ian> remains important later even if the file has since been deleted.
Ian> I'm hesitant to block this patch on this because I hate it when
Ian> reviewers make perfect the enemy of good. Your patch is a step
Ian> forward but differences like this are red flags about the
Ian> fragility of the implementation.
robert> I think this should show the same revisions as log
robert> does today, its IMO definitely a bug.
Right, I mentioned that to get feedback, I got feedback :-)
I should have explained my motivations better though (but I'd
really like if Robert could elaborate on why he thinks it's a
bug?): I view the actual display of the merging revisions as a
bug or a misfeature or may be just one more missing option due
to:
- our actual dotted revno numbering scheme,
- the fact that we don't display revisions *removing* a file.
I'm almost sure that the main reason they are displayed here is
to identify the mainline merging revision *number* to reuse that
number for either another bzr log or a bzr annotate. If we had a
dotted revno numbering where merged revisions start with mainline
merging revno instead... they won't be needed anymore.
As in:
- do 'log -v FILE' and get just the revisions modifying the file
(as a rough sieve),
- from there do 'log -c mainline.revno' from the interesting
dotted revno[s] obtained above
The other reason is that displaying the mainline revisions also
separate the revisions merged. Omitting them may give the wrong
impression (to the user) that they occurred consecutively.
To be honest it will also make the implementation subjectively far
slower (since the merging revisions appear *before* the merged
revisions, it would not be possible to start the display early,
see below).
Also, there is a subtle edge case that can be detected by having
the actual behavior difference:
- a file is deleted in a merged revision
- the merge generates a conflict because the mainline want to keep the file,
- the conflict is resolved by *keeping* the file
I know the argument is pretty weak (even if I *used* it in real
life because I couldn't find another way to find such scenarios),
but on the other hand, I also feel that log is the main tool to
explore a project history and I don't think we offer all the
needed basic features and the ability to combine them.
'missing' for example select revisions in a way that can't be
reproduced from log. Marius Kruger's attempt with
[MERGE][RFC][1.11] implement `missing --revision 3..4
--other-revision 3..4` should, IMHO, be doable by log (it isn't
right now).
So I think *that* patch is just scratching the surface of a
deeper problem.
>> Also, there may be concerns about compatibility as '-v' now
>> respects the FILE argument in that it doesn't display the *whole*
>> inventory anymore. Since it was almost unusable with large
>> working trees, I hope the NEWS entry explaining the bug fix is
>> enough in that regard.
Ian> In this case, I think you've made an improvement. We ought to
Ian> list it under CHANGES inside NEWS but it's otherwise fine IMO.
robert> As for the change to -v to not show other paths
robert> changed at the same time, I don't like that change at
robert> all - I think it really needs discussion.
robert> I don't like it because, it makes log -v FILE
robert> meaningless, every leaf revision will show just the
robert> files name.
And the associated action yes (added, modified, deleted). That's
meaningful for me.
Note that the main motivation here is that log -v FILE on big to
huge working trees is actually *useless*, the generated output
also being impossible to grep unless we make it possible to use a
different format (see the thread named '[RFC] About log [-v]
[--deep] [file|dir]*').
Since it sounds that both points of view can't be reconciled, do
we need one more option or should we just use verbosity_level ?
In that case '-vv' will restore the previous behavior ?
Ian> Some more explicit feedback below ...
>> + * ``bzr log`` now accepts a --deep option to look at files not present
>> + anymore in the working tree. Be aware that the actual implementation is
Ian> anymore -> any more
>> + * ``bzr log -v FILE`` don't display the whole inventory anymore. Only the
>> + line about FILE is displayed.
>> + (Vincent Ladeuil)
Ian> don't -> doesn't
>> + Option('deep',
>> + help='Search for file in all revisions (slower).'),
Ian> I'd like to see the help for the log command expanded
Ian> quite a bit as part of this patch. I think it needs to
Ian> explain file-id-based vs file-path-based logging and
Ian> include examples of both.
Ok, I could use some help here though :)
Ian> I wonder if we ought to suggest --deep (or its better
Ian> name if we can come up with one) when a path isn't
Ian> found?
>> + for item in files:
>> + path, file_id, kind = item[:3]
>> + if (filter is not None and not filter(path, file_id)):
Ian> The () around the condition here isn't needed.
>> for (rev_id, revno, merge_depth), rev, delta in revs:
>> + if _path_filter is not None:
>> + try:
>> + if _filter_delta(delta, _path_filter) == 0:
>> + continue
>> + finally:
>> + if not generate_delta:
>> + # We don't need the delta anymore
>> + delta = None
>> lr = LogRevision(rev, revno, merge_depth, delta,
>> rev_tag_dict.get(rev_id))
Ian> What does try/finally buy here?
Nothing :-) Inherited from an intermediate try.
Ian> Can't you just put the two if blocks after each other
Ian> with exactly the same effect? In fact, it really feels
Ian> like the checking of path_filter needs to go higher in
Ian> the code, as a parameter to either
Ian> calculate_view_revisions() or make_log_rev_iterator().
Unfortunately, no.
To elaborate: I first tried to do it at a higher level (in
calculate_view_revisions like _filter_revisions_touching_file_id)
but the consequence is that the revisions are then selected
*before* any display occurs.
This is a key design choice for file_id (which quickly identified
the revisions) but --old means *all* revisions should be searched
and that's slow as hell (It can't take minutes or even hours with
our actual formats). So doing the filtering here, means some
display occurs early and at least the user can start reading
while we search the rest.
>> +def _filter_delta(delta, filter):
>> + """Returns the number of times filter returns True for the given delta."""
>> + matches = 0
>> + for l in (delta.added, delta.removed, delta.kind_changed, delta.modified):
>> + for item in l:
>> + if filter(item[0], item[1]):
>> + matches += 1
>> + for item in delta.renamed:
>> + if filter(item[0], item[2]) or filter(item[1], item[2]):
>> + matches += 1
>> + return matches
Ian> I'm guessing you're looking to reuse this function later?
Not especially, but I wanted to minimize the impact on _show_log
to make it clearer that the delta is generated for different
purposes.
In fact, it will surely be rewritten when multiple files should
be filtered and even the signature may change...
Ian> Even so, you may well be better off just returning match
Ian> yes/no as opposed to counting matches, because the
Ian> former is easier to optimise. For example, you could
Ian> return as soon as any match is found and search
Ian> modifications before the other lists, say.
Yes we can :) But that's not a significant optimization in that
case (building the delta cost orders of magnitude more than just
filtering it) so I just didn't even try.
<snip/> typos fixed.
Ian> Oops - two methods with the same name. Don't you just love how
Ian> that's a "feature" in Python? :-)
Grrr.
Ian> I think the second one needs to drop the "not_"
Ian> bit.
Indeed.
Ian> Also the last comment doesn't match the code does it?
Ian> Didn't you mean assertContainsRe there???
>> + def test_log_old_file_not_deep(self):
>> + # Without deep, we can't query an old file
>> + log = self.run_bzr(['log', '-v', 'f1'], retcode=3)[0]
Ian> As mentioned previously, perhaps we ought to suggest --deep
Ian> to the user in this case?
>> + def test_log_old_file_deep(self):
>> + log = self.run_bzr(['log', '-v', '--short', '--deep', 'f1'])[0]
>> + # revnos 1 and 2 appear and mention adding and deleting f1
>> + self.assertContainsRe(log, '(?sm)^ +2 joe.*D f1\n.*^ +1 joe.*A f1\n')
>> + self.assertNotContainsRe(log, '(?sm)^ +3 joe')
Ian> Hmmm, this is testing exactly the same parameters as
Ian> "test_log_file_deep_short".
Same parameters but a different file (f1 instead of f2) with a
different history.
Ian> In summary, thanks for working on this - getting a
Ian> rich-featured, bug-free, high-performing log command
Ian> is quite a challenge for us given our UI desires.
Ian> I like most of the refactoring you did but feel this
Ian> patch isn't ready for merging just yet, at least not
Ian> without a bit more discussion re the UI and implementation
Ian> alternatives.
That's why I also started *two* related threads:
- [RFC] About log [-v] [--deep] [file|dir]*
- [RFC] How various commands display revisions
Don't hesitate to give feedback there too,
Vincent
More information about the bazaar
mailing list