What is a modification?

Martin von Gagern Martin.vGagern at gmx.net
Wed Oct 5 17:39:41 UTC 2011


Hi!


A. MOTIVATION

This mail is supposed to start a discussion. It is motivated by
https://bugs.launchpad.net/bzr/+bug/842695 .

The key question here is: what is a modification? In other words, if you
run "bzr log -n0" with a file or directory name as an argument, what is
the list of revisions you would expect to see in the output?

I personally won't focus on how files move through the hierarchy. If
the discussion evolves in that direction, so be it, but my primary
concern applies even in the absence of any file renames.


B. CURRENT SITUATION

The current "bzr log -n0" implementation has the following concept of a
modification: it compares the committed tree with the one from its
LEFTMOST (i.e. main-line) parent. Every file present with identical
content in both trees is considered unmodified; everything else is
modified in one way or another.

The problem here is the LEFTMOST, particularly as it can lead to the
same modification being reported multiple times. In the context of the
above bug, consider the following DAG:

  1.1.1
 /     \
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8
 \               \     /
  1.2.1 - 1.2.2 - 1.2.3

So you have a trunk at the center, and two feature branches. The 1.1
branch introduced some modification, which was merged into trunk at
revision 3. The other branch, 1.2, merged revision 5 from trunk, and was
itself merged into trunk at revision 7. We'll only be interested in the
trunk log, so those revision numbers will stay the same. Notice that
such configurations are quite common in the history of bzr itself.

Now the problem is that the two parents of 1.2.3 are 1.2.2 and 5, in
that order. So any difference introduced into the 1.2 branch by the
merge appears as a new modification in the log output of 1.2.3, which in
my opinion is rather confusing, particularly if the branches 1.1 and 1.2
are dealing with completely unrelated parts of the tree. In the case
where 1.1.1 added a new file, 1.2.3 will again be reported as adding
that file. This is probably the best reason why the current scheme isn't
sufficient: it's strange to have a single file (single as in single file
id, not only the path) reported as new more than once.
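To make the problem concrete, here is a minimal Python sketch of the
leftmost-parent comparison applied to the DAG above. It is a toy model,
not bzrlib's code: trees are plain dicts mapping a file id to its
content, and the names PARENTS, TREES, leftmost_changes and
reported_as_added are made up for illustration. It assumes 1.1.1 adds a
file with the hypothetical id "new-id" that no other revision touches.

```python
from typing import Dict, List, Set

Tree = Dict[str, str]  # toy model: file id -> file content

# The DAG from above; parent lists are ordered, leftmost (main-line) first.
PARENTS: Dict[str, List[str]] = {
    "1": [], "1.1.1": ["1"], "2": ["1"], "3": ["2", "1.1.1"],
    "4": ["3"], "5": ["4"], "6": ["5"],
    "1.2.1": ["1"], "1.2.2": ["1.2.1"], "1.2.3": ["1.2.2", "5"],
    "7": ["6", "1.2.3"], "8": ["7"],
}
# Assumption: 1.1.1 adds file id "new-id"; descendants carry it unchanged.
HAS_NEW = {"1.1.1", "3", "4", "5", "6", "1.2.3", "7", "8"}
TREES: Dict[str, Tree] = {
    rev: ({"new-id": "hello\n"} if rev in HAS_NEW else {}) for rev in PARENTS
}


def left_tree(rev: str) -> Tree:
    """Tree of the leftmost parent; a root is compared against an empty tree."""
    return TREES[PARENTS[rev][0]] if PARENTS[rev] else {}


def leftmost_changes(rev: str) -> Set[str]:
    """File ids whose content differs from the leftmost parent's tree."""
    tree, left = TREES[rev], left_tree(rev)
    return {fid for fid in set(tree) | set(left) if tree.get(fid) != left.get(fid)}


def reported_as_added(file_id: str) -> List[str]:
    """Revisions where the leftmost comparison sees file_id appear."""
    return [rev for rev in PARENTS
            if file_id in leftmost_changes(rev) and file_id not in left_tree(rev)]


print(reported_as_added("new-id"))  # ['1.1.1', '3', '1.2.3']
```

The same file id is reported as added three times: at 1.1.1 where it
really was added, at 3 where trunk merges it, and again at 1.2.3 because
that merge's leftmost parent, 1.2.2, never had the file.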


C. TERMS AND DEFINITIONS

I'd like to define two terms for the scope of this discussion. I'd
say a commit has an ORIGINAL modification to a file if none of its
parents has the same file with the same content. In contrast to that, a
modification is INHERITED if the left parent doesn't have the same file
content, but some other parent does. Obviously, only merges can inherit
modifications. The same terms can be applied to directories by comparing
their whole recursive content.
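The two definitions can be written down as a small classification
function. This is a sketch under a simplified model, an assumption and
not anything taken from bzr: `content` is the file's content in the
commit, `parent_contents` lists the same file id's content in each
parent (leftmost first), and None stands for "absent". The UNMODIFIED
case (leftmost parent already has the same content) is added here only
to make the function total. Since only equality is used, a directory
could be classified by passing a snapshot of its recursive content,
e.g. a frozenset of (path, content) pairs.

```python
from enum import Enum
from typing import Sequence


class Kind(Enum):
    UNMODIFIED = "unmodified"  # same content as the leftmost parent
    ORIGINAL = "original"      # no parent has this content
    INHERITED = "inherited"    # leftmost parent differs, another parent matches


def classify(content: object, parent_contents: Sequence[object]) -> Kind:
    """Classify one file (or directory snapshot) in one commit.

    None means "absent". A root commit is compared against an empty tree,
    modelled here as a single parent that lacks the file.
    """
    parents = list(parent_contents) or [None]
    if parents[0] == content:
        return Kind.UNMODIFIED
    if content in parents[1:]:
        return Kind.INHERITED  # only possible for merges
    return Kind.ORIGINAL


# A plain edit, a merge taking the other side's version, a 3-way merge result:
print(classify("v2", ["v1"]).value)                  # original
print(classify("theirs", ["ours", "theirs"]).value)  # inherited
print(classify("merged", ["ours", "theirs"]).value)  # original
```

A 3-way merge result counts as ORIGINAL simply because no parent has
that exact content; the function does not try to recognize automatic
merges.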

Two important observations: if a file is edited in two branches, then
merging them will cause a 3-way merge, resulting in a file different
from the version in either branch. So the file content merge produces an
ORIGINAL modification (as automatic merges may depend on plug-ins and
therefore cannot be easily identified as such). On the other hand, if a
branch never modifies a given file itself, then repeated merges from a
single trunk will never cause an ORIGINAL modification for that file, as
its content will remain completely in sync with the trunk.


D. EXPECTED BEHAVIOR

So what behavior would I expect for the -n0 log restricted to a given
file or directory? First of all, I'd like to see every ORIGINAL
modification to that file. Secondly, I guess that for every ORIGINAL
modification, I'd like to see a SINGLE line of (ORIGINAL or INHERITED)
modifications leading from that modification to the main line I'm
logging. That line should end at the earliest point integrating the
ORIGINAL modification into the mainline, and should continue by
recursively applying this criterion to the appropriate branch from which
the modification originated.

So given the DAG from above:

  1.1.1
 /     \
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8
 \               \     /
  1.2.1 - 1.2.2 - 1.2.3

then if 1.1.1 has an original modification that I'm interested in, 3
will inherit it from 1.1.1, and 1.2.3 will inherit it from 5. Looking at
the trunk line of development, I'd see 3 as the earliest point
inheriting that modification. And I'd see 1.1.1 as the parent it was
inherited from. So I'd include 3 in the log output, and then take 1.1.1
as the tip for a new recursive search for the path towards the original
modification. In this case, that's directly the commit containing that
modification, but I guess the recursive approach should be clear.

So the log would print 3 and 1.1.1, but not 1.2.3.
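The expectation above can be prototyped on the same toy model. This is
a sketch of the described behaviour, not a proposed implementation (it
computes full ancestry sets and would be far too slow on real
histories). Assumptions: trees are dicts from file id to content, the
hypothetical file "new-id" is added in 1.1.1 and never touched again,
and "integrating the ORIGINAL modification into the mainline" is read
as "having the original revision in its ancestry". All names (PARENTS,
TREES, ancestors, path_to_original, expected_log) are made up for
illustration.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List

# The DAG from above; parent lists are ordered, leftmost (main-line) first.
PARENTS: Dict[str, List[str]] = {
    "1": [], "1.1.1": ["1"], "2": ["1"], "3": ["2", "1.1.1"],
    "4": ["3"], "5": ["4"], "6": ["5"],
    "1.2.1": ["1"], "1.2.2": ["1.2.1"], "1.2.3": ["1.2.2", "5"],
    "7": ["6", "1.2.3"], "8": ["7"],
}
# Assumption: 1.1.1 adds file id "new-id"; descendants carry it unchanged.
HAS_NEW = {"1.1.1", "3", "4", "5", "6", "1.2.3", "7", "8"}
TREES = {rev: ({"new-id": "hello\n"} if rev in HAS_NEW else {}) for rev in PARENTS}


@lru_cache(maxsize=None)
def ancestors(rev: str) -> FrozenSet[str]:
    """rev itself plus every revision reachable through any parent."""
    result = {rev}
    for parent in PARENTS[rev]:
        result |= ancestors(parent)
    return frozenset(result)


def is_original(rev: str, file_id: str) -> bool:
    """ORIGINAL: no parent (an empty tree for a root) has the same content."""
    content = TREES[rev].get(file_id)
    parent_contents = [TREES[p].get(file_id) for p in PARENTS[rev]] or [None]
    return content not in parent_contents


def leftmost_chain(rev: str) -> List[str]:
    """rev followed by its leftmost ancestors, newest first."""
    chain = [rev]
    while PARENTS[chain[-1]]:
        chain.append(PARENTS[chain[-1]][0])
    return chain


def path_to_original(tip: str, original: str) -> List[str]:
    """Single line from tip's main line down to the original modification."""
    integrating = [r for r in leftmost_chain(tip) if original in ancestors(r)]
    if not integrating:
        return []
    earliest = integrating[-1]  # earliest point integrating `original`
    if earliest == original:
        return [original]
    # earliest is a merge: recurse into the branch that brought `original` in.
    for parent in PARENTS[earliest][1:]:
        if original in ancestors(parent):
            return [earliest] + path_to_original(parent, original)
    raise AssertionError("unreachable for a well-formed DAG")


def expected_log(tip: str, file_id: str) -> List[str]:
    """Revisions to show for file_id, deduplicated in discovery order."""
    shown: Dict[str, None] = {}
    for rev in PARENTS:
        if rev in ancestors(tip) and is_original(rev, file_id):
            for r in path_to_original(tip, rev):
                shown.setdefault(r, None)
    return list(shown)


print(expected_log("8", "new-id"))  # ['3', '1.1.1']
```

Logging trunk at 8 yields 3 and 1.1.1 but not 1.2.3. Because the
leftmost chain is walked newest first and each ancestry set contains
the next one, the revisions carrying the original form a prefix of that
chain, so its last element is the earliest integration point.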

Note that the above algorithm is only intended as a way of describing
what I would expect. It is not meant as a suggested implementation. In
fact, identifying every original modification touching a large directory
might give a large result set, and finding the path for each of these
could be very time-consuming. Some better approach should be found. But
I prefer to think of what I'd expect and then find ways to obtain it. So
if you agree with my expectation, then it is time to think about
implementing it. I guess any reasonable fix to the current situation
would require a major rewrite of how log iteration works.


E. DISCUSSION

So the question to you: do you agree with my expectation?
Or would you really expect 1.2.3 to be included in the log, as it
currently is?
Or do you have a different expectation altogether, even if it
coincides with mine or with the current behavior on this simple graph?

Please note that I don't usually read this list, so although I'll try
to follow the discussion here for the next few days, I might miss
replies that arrive after a longer period of inactivity. If in doubt,
please send a copy to my personal address as well. I'll try to remember
to post a summary of the discussion on
https://bugs.launchpad.net/bzr/+bug/842695 for future
reference. If you are interested in the long-term development of this
issue, feel free to subscribe to that report.


Greetings,
 Martin von Gagern