[MERGE][BUG 51980] bzr log <file>returns inappropriate revisions
Kent Gibson
warthog618 at gmail.com
Tue Apr 10 17:35:57 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> Aaron Bentley wrote:
>
>> Kent Gibson wrote:
>>
>>> In my mind at least, ``bzr log <file>`` should only show the revisions
>>> that change the file, and exclude merge commits. Is it the general
>>> consensus that it should include merge commits? I'm just talking the
>>> long form here - I agree with you on the short forms.
>>>
>> Personally, I'd like to see the merge commits in the long form, but
>> others may have different ideas.
>>
>
> So the difficulty with this is for things like:
>
> A
> | \
> B C
> |/
> D
>
> If 'C' modifies 'foo.py', but 'B' doesn't, then when we commit 'D', we
> have ie.revision = 'C'. Because 'D' doesn't change 'foo.py' relative to
> its parents.
>
> So 'D' will not show up in either fileids_altered_by_revision_ids, nor
> in foo.py.kndx
>
> In fact, I would venture that foo.py.kndx is always a direct mapping of
> fileids_altered_by_revision_ids. And it better be, considering we use
> fileids... to figure out what Knit files need to be copied. So if they
> don't match, we would fail to copy texts.
>
> It is possible for .kndx to have, say, unreferenced revisions.
>
> The reason the current code shows mainline revisions affecting merged
> files, is because we compute the delta versus first-parent for every
> revision. And there is a delta (there isn't versus the second-parent...).
>
>
> One 'correctness' thing I wanted to add, is to have us do a delta for
> *every* revision. Which means that:
>
> A
> | \
> | |\
> | B C
> | |/
> | D
> |/
> E
>
> Would include D and E if C (and only C) modifies 'foo.'
>
> At least that follows naturally from why the current code works for what
> Aaron wants.
>
> This is *really* slow, but it is correct. (Right now we show all merge
> revisions, which is just horribly broken).
>
> Another possibility, which is a bit hackish, would be to only do deltas
> for true mainline revisions, and use the indexes for all non-mainline.
>
> That makes them 'correct-ish'. But inconsistent when you have a lot of
> nested merging.
>
> We could write a helper "fileids_mentioned_by", which would report what
> revisions include the fileid in their delta/fulltext against the primary
> parent. If we didn't have full texts, this would probably be reasonably
> correct. Since that delta shows changes against the primary parent.
>
> It fails for fulltexts (since they refer to all revisions), but we could
> try to special case them.
>
> Another option is to just use 'fileids_mentioned_by' as a way to cull
> out what revisions we might want to delta.
>
>
> However, all of this has a different problem, which is its asymmetry.
>
> Specifically, Aaron has requested that
>
> A
> |\
> B C
> |/
> D
>
> Show C, D when C modifies 'foo'. (C did the modification, and D merged it).
>
> However, what if 'B' modified foo. Arguably then we should show B, D,
> since D was a merge, and relative to C it modified foo.
>
> Now, we do support some asymetrical operations in 'bzr'. Considering
> that we have a notion of a mainline at all.
>
>
I'm despairing of any solution that requires delta generation since
that is so damn slow.
So I've been wondering if we can compare ancestry instead of full deltas.
Given that my initial patch provides the list of revisions with
substantive changes to a given file (I'll call this set Rc), what we
are trying to add are the revisions where these changes are merged
(I'll call that set Rm).
In the example above, at D we compare the ancestry of it's parents -
the left (B) and right (C).
If C's ancestry contains a revision from Rc that is not in B's
ancestry then we should add D to Rm. Repeat that for all merge
revisions in the graph to determine the full set Rm.
Note that in this context I am assuming the ancestry of a revision is
the complete list of all revisions contained in that revision,
including itself.
So if C and only C changes the file then C and D are in the log.
And if B and only B changes the file then only B is in the log.
Wrt scalability, using full ancestry scales very badly, but that can
be reduced to only the set of revisions changing <file>. That can be
generated from the revision graph and Rc.
In fact the updated diff attached does just that.
It is slightly slower than the previous patch, since it has to
generate the ancestry, but it still leaves the delta approach standing.
- From the little testing I've done the output is a superset of the
output of your full delta patch. There are a few additional entries
because get_weave is reporting a few spurious revisions, as mentioned
in previously in this thread.
I'd really like to investigate why get_weave is reporting those
spurious revisions to see if they can be stamped out.
And I am still wondering if this is the output we want to see. Most
of the merge revisions in the log are just bzr.dev being merged into
someone's local branch. This just seems like noise to me. While I
appreciate having the merges towards the mainline, I'd like to be able
to filter the merges away from the mainline, say with 'bzr log
- --no-merges-from-mainline <file>'
> There are a few more issues with logging by file-id versus logging by
> path. Specifically if you rename foo=>bar and create a new foo. Arguably
> doing "bzr log foo" should show both of them.
>
I agree logging should continue through renames.
But I'd argue that 'bzr log foo' should only show the log for the file
known as foo in the current working tree revision. Outputting the
logs for both bar and foo would be too confusing - and what if I
really only want to see the log for the new foo?
> Even further it seems it should show any new changes to 'bar'.
>
> (It is a bit more obvious when you look in the other direction, and say
> that bar=>foo should follow 'bar' backwards. But what if before the
> rename there was a 'foo' there already?)
>
>
>
> I'm pretty sure that people coming from git like that you can do "git
> log path/*", and have it track that back. I don't know if that includes
> renames or not. I would guess that it doesn't. (If you rename a
> other/foo => path/foo, it would show that action, but probably not show
> any other changes to other/foo).
>
>
I'd rather the behaviour I mentioned above - producing the logs for
the files currently in path/*.
Also note that there are two cases to consider here, one is where the
shell expands the path/*, the other is 'bzr log <directory>'.
> I have a patch:
> http://librarian.launchpad.net/6894938/log-verbose.diff
>
> Which changes the 'bzr log filename' to be correct. At the expense of
> being extra slow. (because it is doing all the extra deltas).
>
>
Your patch is slightly broken - non-mainline revisions are always
logged verbosely.
For non-verbose logging you should reset the delta prior to the
merge_depth test:
+ if not verbose:
+ # although we calculated it, throw it away without display
+ delta = None
+
if merge_depth == 0:
# a mainline revision.
- - if not verbose:
- - # although we calculated it, throw it away without
display
- - delta = None
> I would like to at least start there, (with tests) so that we can start
> from a 'correctness' perspective.
>
Agreed. The test coverage is pretty minimal at the moment.
> I would also like to incorparate James Westby's refactoring of the log
> formatters, so that they can be more intelligent about processing. I'm
> willing to do the work that Robert and Aaron required, which is to
> refactor it more, based on one class to figure out important revisions,
> and another class to figure out how to display them.
>
>
>
> I still wish that we had a saved delta, which meant we didn't have to
> resort to hacks like 'fileids_altered_by'. Especially since the hack
> fails for deletions.
>
>
Initially I would've agreed with you.
Now I'm not sure there is any point - for this particular problem anyway.
> I'm trying to get together the use cases, so that we can be sure our
> implementation is correct. In the short term, we should probably do
> something so that 'bzr log filename' isn't just plain wrong.
>
>
Agreed.
> We had:
> https://blueprints.launchpad.net/bzr/+spec/per-file-log-output
>
> But it doesn't seem to have any real details yet.
>
> Also, that spec doesn't really cover other cases like "bzr log -r
> revid:non-mainline" is also borked.
>
> So I changed it to 'log-output', and I'll try to build up some use-cases
> of what we want.
>
> John
> =:->
>
>
Cheers,
Kent.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGG7zsgoxTFTi1P8QRAhlFAKCoscC98SvARzpO3FPFcY03+lco0QCdGeaH
XNqASPl07/DGdVR8VvSE9do=
=OpAR
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lp51980_b.patch
Type: text/x-patch
Size: 7073 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070411/9ce1952c/attachment.bin
More information about the bazaar
mailing list