git and bzr
Linus Torvalds
torvalds at osdl.org
Thu Nov 30 16:45:42 GMT 2006
On Thu, 30 Nov 2006, Nicholas Allen wrote:
>
> Does this mean if I have, for example, a large C++ file with a bunch of
> methods in it and I move one of the methods from the bottom of the file to the
> top and in another branch someone makes a change to that method that when I
> merge their changes git will merge their changes into the method at the top of
> the file where I have moved it?
Right now (and in the near future), nope. "git blame" will track the
changes (so the pure movement wasn't just an addition of new code, but
you'll see it track it all the way down to the original), but "git merge"
is still file-based.
In other words, "git merge" uses a data similarity analysis that
could be used for smaller chunks than a whole file, but at least for now
it does it on a file granularity only (and then passes it off to the
standard RCS three-way merge on a file-by-file basis).
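To make the "file granularity" point concrete, here is a minimal Python sketch. It is an illustration only, not git's code: the names `blob_id` and `merge_tree` are hypothetical, trees are modelled as plain dicts from path to content, and a SHA-1 of the content stands in for a blob ID. Each path is decided as a whole, and only paths that both sides changed differently are handed on to a per-file content merge.

```python
import hashlib

def blob_id(content):
    """Hypothetical stand-in for a blob ID; None means the path is absent."""
    if content is None:
        return None
    return hashlib.sha1(content.encode()).hexdigest()

def merge_tree(base, ours, theirs):
    """Decide each path as a whole file.

    Returns (merged, pending): merged maps path -> content for paths that
    could be settled trivially, pending lists paths that still need a
    per-file three-way content merge.
    """
    merged, pending = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = (tree.get(path) for tree in (base, ours, theirs))
        bi, oi, ti = blob_id(b), blob_id(o), blob_id(t)
        if oi == ti:        # both sides agree (or both deleted the path)
            pick = o
        elif oi == bi:      # only their side touched this path
            pick = t
        elif ti == bi:      # only our side touched this path
            pick = o
        else:               # both sides changed it differently
            pending.append(path)
            continue
        if pick is not None:
            merged[path] = pick
    return merged, pending

base = {"a.c": "1", "b.c": "2", "c.c": "3"}
ours = {"a.c": "1x", "b.c": "2", "c.c": "3x"}
theirs = {"a.c": "1", "b.c": "2y", "c.c": "3y", "d.c": "4"}
merged, pending = merge_tree(base, ours, theirs)
```

In this toy run only `c.c` ends up in `pending`; everything else is settled without looking inside any file.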
That said, if the movement happens _within_ a file, then just about any
SCM could do what you ask for, by just using something smarter than the
standard 3-way merge. So that part isn't even about tracking data across
files - it's just about a per-file merge strategy.
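A small Python sketch of a plain line-based three-way merge shows why the standard strategy does not follow a moved block. This is an assumption-laden illustration built on Python's `difflib`, not the RCS merge or git's own implementation, and the helper names `unchanged` and `merge3` are hypothetical. In the example, one side moves a function to the top and the other side edits it in place; the merge sees "deleted here" against "modified here" and reports a conflict. The exact outcome depends on how the diff happens to line things up.

```python
import difflib

def unchanged(base, side):
    """Map base line index -> side line index for lines difflib sees as kept."""
    mapping = {}
    matcher = difflib.SequenceMatcher(None, base, side, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[a + k] = b + k
    return mapping

def merge3(base, ours, theirs):
    """Line-based three-way merge. Returns (merged_lines, conflict_count)."""
    in_ours, in_theirs = unchanged(base, ours), unchanged(base, theirs)
    # Sync points: base lines kept by both sides, plus an end-of-file sentinel.
    sync = [(i, in_ours[i], in_theirs[i])
            for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append((len(base), len(ours), len(theirs)))
    merged, conflicts = [], 0
    b = o = t = 0
    for i, j, k in sync:
        base_part, our_part, their_part = base[b:i], ours[o:j], theirs[t:k]
        if our_part == their_part or their_part == base_part:
            merged += our_part          # same change, or only we changed it
        elif our_part == base_part:
            merged += their_part        # only they changed it
        else:
            conflicts += 1              # both changed it differently
            merged += ["<<<<<<< ours", *our_part, "=======",
                       *their_part, ">>>>>>> theirs"]
        if i < len(base):
            merged.append(base[i])
        b, o, t = i + 1, j + 1, k + 1
    return merged, conflicts

method = ["void m() {", "  work();", "}"]
rest = ["int a;", "int b;", "int c;", "int d;"]
base = rest + method
ours = method + rest                                        # moved to the top
theirs = rest + ["void m() {", "  work_harder();", "}"]     # edited in place
merged, conflicts = merge3(base, ours, theirs)
```

Here `conflicts` is 1: the unedited copy lands at the top, and the edit shows up as a conflict at the old location instead of being applied to the moved code.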
The "track data, not files" thing becomes more interesting when you factor
out a file into two or more files, and can continue to merge across such a
code re-filing event. Git can do it for "annotate", but doesn't do it for
anything else.
> If so that would be really quite impressive!
Indeed, and it's one of the potential future goals that was discussed very
early in the git design phase. The point of _not_ doing file ID tracking
is exactly that you can actually do better than that by just tracking the
data.
So some day, we may do it. And not just within one file, but even between
files. Because file renames really are just a very specific special case of
data movement, and I don't think it's even the most common case.
That said, there are several reasons why you might not actually _ever_
want it in practice, and why I say "potential future goal" and "we may do
it". I think this is going to be a matter of not just writing the
code (which we haven't done), but also deciding if it's really worth it.
Because merges are things where you may not want too much smarts:
- Quite often, a failed merge that needs manual fixup may even be
_preferable_ to a successful merge that did the merge "technically
correctly", but in an unexpected way.
- There's a _big_ difference between "merging code" and "examining code".
It makes much more sense to try to track where code came from and what
the "deep history" was when you examine code, because the reason you're
doing so is generally exactly because you're looking for what went
wrong, and who to blame.
When doing "merging", the history of the code is arguably a lot less
important. What is the most important part is that the two branches you
merge have been (hopefully) verified in their _current_ state. The
history may be full of bugs, and they may have been fixed differently,
and even trying to be really clever may not actually be a good idea at
all.
Code may have moved or may have been copied, but what is much more
important than the original code and where it came from is the state it
was in _after_ the move, because that's the tested working state, and
in many ways the history of how it came to be really shouldn't matter
as much at all.
In other words, "annotate" and "merge" have almost entirely opposite
interests. An annotation is supposed to find the history in order to maybe
help find bugs, while a merge is supposed to use the _current_ state, and
very arguably, if the two current states don't match _so_ obviously that
there is no question about what you should do, then the merge should make
that very very very clear to the user.
So my personal opinion has always been that a merge should be extremely
simpleminded. I think all the VCS people who concentrate on smart merging
absolutely have their heads up their arses, and do exactly the wrong
thing. A merge should not do anything "clever" at all. It should be just
_barely_ smart enough to do the obvious thing, and even then we all know
that it will still occasionally do the wrong thing.
So I actually think that a bog-standard and totally stupid three-way merge
is simply not far from the right thing to do. And the git "recursive"
thing basically repeats that stupid merge (a) in time (ie the criss-cross
merge thing causes a recursive three-way merge to take place) and (b) in
the metadata space (ie you can see the rename following basically as just
a "3-way merge in filenames").
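The "repeat the stupid merge in time" idea can be sketched in Python on a toy history. This is a simplified illustration under stated assumptions, not git's recursive strategy itself: commits are names in a hypothetical `parents` dict, trees are dicts from path to content, and `merge3` here is a trivial whole-file three-way merge. When a criss-cross leaves two merge bases, the bases are first merged with each other, and the result serves as a virtual base for the real merge.

```python
def ancestors(parents, commit):
    """All commits reachable from commit, including itself."""
    seen, stack = set(), [commit]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(parents, d)
                             for d in common))

def merge3(base, ours, theirs):
    """Trivial per-path three-way merge; conflicts become marker strings."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            pick = o
        elif o == b:
            pick = t
        else:
            pick = f"<conflict: {o!r} vs {t!r}>"
        if pick is not None:
            merged[path] = pick
    return merged

def recursive_merge(parents, trees, a, b):
    """Merge a and b; records virtual base commits in parents and trees."""
    bases = merge_bases(parents, a, b)
    if not bases:                       # unrelated histories: empty base
        return merge3({}, trees[a], trees[b])
    virtual = bases[0]
    for other in bases[1:]:
        # Several bases (criss-cross): merge the bases themselves first.
        name = f"virtual({virtual},{other})"
        trees[name] = recursive_merge(parents, trees, virtual, other)
        parents[name] = [virtual, other]
        virtual = name
    return merge3(trees[virtual], trees[a], trees[b])

# Criss-cross: C and D each merged A and B, then diverged again.
parents = {"R": [], "A": ["R"], "B": ["R"], "C": ["A", "B"], "D": ["B", "A"]}
trees = {
    "R": {"f": "1", "g": "1"},
    "A": {"f": "2", "g": "1"},
    "B": {"f": "1", "g": "2"},
    "C": {"f": "2", "g": "2", "h": "c"},
    "D": {"f": "2", "g": "2", "i": "d"},
}
result = recursive_merge(parents, trees, "C", "D")
```

Merging `C` and `D` finds the two bases `A` and `B`, merges those against `R`, and then runs one more ordinary three-way merge on top of that virtual base.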
And yes, this is probably some mental deficiency and hang-up, but I think
that's sufficient, and that where the real "clever" stuff should be is to
then help people resolve conflicts (and maybe also help you find
mis-merges even with the totally stupid and simple merge). Because
conflicts _will_ happen, regardless of your merge strategy, and you do
need people to look at them, but you can make it _easier_ for people to
say "ok, that's obviously the right merge".
So me personally, I'd rather have the "real merge" be what git already
does, and then have something like a graphical "resolution helper"
application that tries to resolve the remaining things with user help. And
that "resolution helper" is where I'd put all the magic code movement
logic, not in the merge itself.
So you could look at a failed hunk, and press a "show me a best guess"
button, and at that point the thing would say "that code might fit here,
does that look sane to you? <Ok>, <Next guess>, <Cancel>".
THAT is what a good VCS should do, in my opinion. Not do "smart merges".
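As a rough idea of what the "show me a best guess" button might compute, here is a hypothetical Python sketch. Nothing like this exists in git as far as this message says; the function name `best_guesses` and the choice of `difflib` similarity as the score are assumptions made for illustration. It slides the failed hunk over the target file and ranks the positions where the code looks most similar, so a front end could offer them one at a time.

```python
import difflib

def best_guesses(hunk, target, limit=3):
    """Rank positions in target where hunk most plausibly belongs.

    Returns up to limit (start_line_index, similarity) pairs, best first.
    """
    size = len(hunk)
    scored = []
    for start in range(max(1, len(target) - size + 1)):
        window = target[start:start + size]
        ratio = difflib.SequenceMatcher(None, hunk, window,
                                        autojunk=False).ratio()
        scored.append((ratio, start))
    # Highest similarity first; earlier position wins ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(start, ratio) for ratio, start in scored[:limit]]

target = [
    "int top(void)",
    "{",
    "    return compute(1);",
    "}",
    "",
    "int other(void)",
    "{",
    "    return 0;",
    "}",
]
# The hunk that failed to apply: the other side's version of top().
hunk = ["int top(void)", "{", "    return compute(2);", "}"]
guesses = best_guesses(hunk, target)
```

The first entry in `guesses` is the candidate to show behind "Ok"; "Next guess" would simply move down the list, and the user still makes the final call.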
Btw, git doesn't do the above kind of smart graphical thing, but git
_does_ do something very much in that direction. Unlike a lot of things,
git doesn't just leave the "conflict marker" turds in the working tree.
No, the index will contain the three-way merge base and both of the actual
files you were trying to merge, and a "git diff" will actually show you a
three-way diff of the working tree (and you can say "git diff --ours" to
see the diff just against our old head, and "--theirs" to see a regular
two-way diff against the _other_ side that you tried to merge).
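The index layout described above can be modelled in a few lines of Python. This is a toy model, not git's implementation: the dict layout and the `diff_against` helper are hypothetical, and only the stage numbering (1 for the merge base, 2 for our side, 3 for their side) mirrors git. It shows how keeping all three versions around lets you diff the edited working-tree file against either side, even after the conflict markers are gone.

```python
import difflib

# Stage numbers as git uses them for an unmerged path.
STAGES = {"base": 1, "ours": 2, "theirs": 3}

def diff_against(index_entry, worktree, side):
    """Two-way unified diff from one staged version to the working-tree file."""
    staged = index_entry[STAGES[side]]
    return list(difflib.unified_diff(staged, worktree,
                                     fromfile=side, tofile="worktree",
                                     lineterm=""))

# One conflicted path: the index keeps all three versions.
entry = {
    1: ["a", "b"],          # merge base
    2: ["a", "B"],          # our old head
    3: ["a", "b", "c"],     # the side being merged
}
# The file after the user has edited away the conflict markers.
worktree = ["a", "B", "c"]

vs_ours = diff_against(entry, worktree, "ours")
vs_theirs = diff_against(entry, worktree, "theirs")
```

In this example `vs_ours` shows only the added line that came from the other side, while `vs_theirs` shows only the line where our version won.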
So git already very much embodies this concept of "don't be overly smart
when merging, but try to help the user out when resolving the merge". It
may not be a pretty GUI etc, and it mostly helps with regular bog-standard
data conflicts, but boy is it pleasant to use for those once you get used
to it.
So we get NONE of those horrible "you just get conflict turds, you figure
it out" things. It gives you the turds (because people, including me, are
used to them, and you want _something_ in the working tree that shows both
versions at the same time, of course), but then you can edit them to your
heart's content, and even _after_ you've edited them, you can do the above
three-way (or two-way against either branch) diffs, and it will show what
you edited and its relationship to the two branches you merged.
THAT is what merging is all about. Not smart merges. Stupid merges with
good tools to help you do the right thing when the right thing isn't _so_
obvious that you can just leave it to the machine.
Linus