[MERGE][BUG] 51980: bzr log <file>returns inappropriate revisions

Mon Apr 2 18:58:58 BST 2007

Aaron Bentley wrote:
> Kent Gibson wrote:
>> In my mind at least, ``bzr log <file>`` should only show the revisions
>> that change the file, and exclude merge commits.  Is it the general
>> consensus that it should include merge commits?  I'm just talking the
>> long form here - I agree with you on the short forms.
> 
> Personally, I'd like to see the merge commits in the long form, but
> others may have different ideas.

So the difficulty with this is for things like:

A
| \
B C
|/
D

If 'C' modifies 'foo.py', but 'B' doesn't, then when we commit 'D', we
have ie.revision = 'C'. Because 'D' doesn't change 'foo.py' relative to
its parents.

So 'D' will not show up in either fileids_altered_by_revision_ids, nor
in foo.py.kndx

In fact, I would venture that foo.py.kndx is always a direct mapping of
fileids_altered_by_revision_ids. And it better be, considering we use
fileids... to figure out what Knit files need to be copied. So if they
don't match, we would fail to copy texts.

It is possible for .kndx to have, say, unreferenced revisions.

The reason the current code shows mainline revisions affecting merged
files, is because we compute the delta versus first-parent for every
revision. And there is a delta (there isn't versus the second-parent...).

One 'correctness' thing I wanted to add, is to have us do a delta for
*every* revision. Which means that:

A
| \
| |\
| B C
| |/
| D
|/
E

Would include D and E if C (and only C) modifies 'foo.'

At least that follows naturally from why the current code works for what
Aaron wants.

This is *really* slow, but it is correct. (Right now we show all merge
revisions, which is just horribly broken).

Another possibility, which is a bit hackish, would be to only do deltas
for true mainline revisions, and use the indexes for all non-mainline.

That makes them 'correct-ish'. But inconsistent when you have a lot of
nested merging.

We could write a helper "fileids_mentioned_by", which would report what
revisions include the fileid in their delta/fulltext against the primary
parent. If we didn't have full texts, this would probably be reasonably
correct. Since that delta shows changes against the primary parent.

It fails for fulltexts (since they refer to all revisions), but we could
try to special case them.

Another option is to just use 'fileids_mentioned_by' as a way to cull
out what revisions we might want to delta.

However, all of this has a different problem, which is its asymmetry.

Specifically, Aaron has requested that

A
|\
B C
|/
D

Show C, D when C modifies 'foo'. (C did the modification, and D merged it).

However, what if 'B' modified foo. Arguably then we should show B, D,
since D was a merge, and relative to C it modified foo.

Now, we do support some asymetrical operations in 'bzr'. Considering
that we have a notion of a mainline at all.

There are a few more issues with logging by file-id versus logging by
path. Specifically if you rename foo=>bar and create a new foo. Arguably
doing "bzr log foo" should show both of them.
Even further it seems it should show any new changes to 'bar'.

(It is a bit more obvious when you look in the other direction, and say
that bar=>foo should follow 'bar' backwards. But what if before the
rename there was a 'foo' there already?)

I'm pretty sure that people coming from git like that you can do "git
log path/*", and have it track that back. I don't know if that includes
renames or not. I would guess that it doesn't. (If you rename a
other/foo => path/foo, it would show that action, but probably not show
any other changes to other/foo).

I have a patch:
http://librarian.launchpad.net/6894938/log-verbose.diff

Which changes the 'bzr log filename' to be correct. At the expense of
being extra slow. (because it is doing all the extra deltas).

I would like to at least start there, (with tests) so that we can start
from a 'correctness' perspective.

I would also like to incorparate James Westby's refactoring of the log
formatters, so that they can be more intelligent about processing. I'm
willing to do the work that Robert and Aaron required, which is to
refactor it more, based on one class to figure out important revisions,
and another class to figure out how to display them.

I still wish that we had a saved delta, which meant we didn't have to
resort to hacks like 'fileids_altered_by'. Especially since the hack
fails for deletions.

I'm trying to get together the use cases, so that we can be sure our
implementation is correct. In the short term, we should probably do
something so that 'bzr log filename' isn't just plain wrong.

We had:
https://blueprints.launchpad.net/bzr/+spec/per-file-log-output

But it doesn't seem to have any real details yet.

Also, that spec doesn't really cover other cases like "bzr log -r
revid:non-mainline" is also borked.

So I changed it to 'log-output', and I'll try to build up some use-cases
of what we want.

John
=:->