[ANNOUNCE] Example Cogito Addon - cogito-bundle
Linus Torvalds
torvalds at osdl.org
Sat Oct 21 19:42:58 BST 2006
On Sat, 21 Oct 2006, Jan Hudec wrote:
>
> [ On not moving files that weren't moved originally, but whose
> directories were moved ]
>
> I still consider it a bug, but different problems of the file-id
> solution have already been described in this thread that I consider bugs
> as well.
>
> Besides I start to think that it should be actually possible to solve
> this case with the git-style approach.
It's certainly _possible_ to figure out, but one reason git does what it
does is that it's just simpler (ie just ignore the whole "directory move"
situation entirely, and just consider it to be "many files moved").
Another reason is that this really is an ambiguous case. When the
directory was moved, the file in question really didn't exist. So when it
was created independently of the move, it really _is_ somewhat ambiguous
whether the intention was to move it with the other files or whether the
new creation point is the right one.
I think that for a human, the details would likely be obvious (and I
suspect that in most cases it would indeed move with the directory). But
it really isn't totally clear: what does moving a directory imply for the
future? Does it imply that the directory should never exist in the future,
or does it just imply that the _current_ contents move?
Git "tends to" have a policy of not caring about directories at all. For
example, git will not track an empty directory by default. You _can_ make
it track one in your commits (the data structures support it), but you're
really just better off thinking of git as tracking individual files,
and not really directories. So as far as git is concerned, "directories"
mostly don't really have any existence on their own, they only exist as
paths to reach files.
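A minimal sketch of the empty-directory point, assuming a reasonably recent git and a POSIX shell. The repository is a throwaway, and the directory names "empty" and "full" are made up for the illustration:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q

mkdir empty full           # hypothetical directory names
echo hello > full/file.txt

git add .                  # stages files; the empty directory has nothing to stage
tracked=$(git ls-files)    # paths git knows about in the index
echo "$tracked"            # lists full/file.txt only
```

Only the file path shows up in the index; the "empty" directory leaves no trace there.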
In that kind of mindset, renaming a directory really is about renaming the
files that are in that directory, and that explains the git behaviour. It
may not necessarily be what you expect, but it _is_ consistent, and it's
not really "wrong" either. It's just another way of looking at the thing.
Also, I'd like to point out that people worry way too much about merges.
There are much harder merge conflicts to fix up. If you notice that things
didn't go the way you expected in a merge, even if it was done
automatically, you can just do a
    git mv unexpected/directory/file expected/directory/file
    git commit --amend
which basically "fixes up" the automatic merge (that's what the "--amend"
means: it means "re-do the last commit with _this_ state instead").
(Of course, you could also just make a separate commit to move the file,
but I think the "manual fixup of the merge" is just cleaner - just add a
note in the commit message to say you fixed it up by hand. When you do
your "git commit --amend", it will automatically just give you an editor
to edit up the commit message too while you're at it).
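The fix-up can be tried out in a throwaway repository. This is only a sketch: it amends an ordinary commit rather than a real merge (the mechanics of "--amend" are the same), the file names are placeholders, and "-m" is passed so the script runs without opening an editor:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir -p expected/directory unexpected/directory
echo "keep" > expected/directory/other
git add .
git commit -q -m "Initial layout"

# A commit that leaves a file somewhere we did not expect
echo "new" > unexpected/directory/file
git add .
git commit -q -m "Commit that left the file in the wrong place"

# Move the file and re-do the last commit with the corrected state
git mv unexpected/directory/file expected/directory/file
git commit -q --amend -m "Same commit, fixed up by hand: file moved"

commit_count=$(git rev-list --count HEAD)
files=$(git ls-files)
echo "$commit_count"
echo "$files"
```

The history still has two commits after the amend, and the file is recorded only at its new location.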
So again: merges are certainly fairly "hard" from a SCM standpoint, but
from a user standpoint, they tend to be not at all as important. I would
again argue that more important than the merge itself (which you can
trivially just fix up to match your expectations) is to make it easy to
later _show_ what happened, ie if you examine the file later, you should
be able to see where it came from.
(And again, with git, things like "git pickaxe" - think of it as just a
"better annotate" - will indeed pick up the similarity, regardless of
whether the rename was done manually or automatically as part of the
merge - exactly because git only really cares about actual contents).
Btw, just to be honest: git _mostly_ thinks in terms of "constant
pathname patterns" as opposed to "individual paths that move around".
That's at least partly because of how I work. I actually fairly seldom
look at an individual file, and tend to much more often look at a group of
files, and then it's a _lot_ more convenient to do
    gitk drivers/usb include/linux/usb*
where those argument pathnames are _not_ a set of filenames that we track,
but really something more generic, namely a "repository pathname subset"
which is constant. The above will show the _subset_ of the kernel
repository history that is relevant for all the named pathnames, but the
pathnames are _fixed_. It won't follow files that move out of the
subdirectories: it will show the history as seen from the viewpoint of a
certain subset of pathnames.
This also extends to things like "git log". So when you do
    git log kernel/sched.c
if you have a "file ID" mentality, you expect the above to follow renames.
It doesn't - even though git -can- follow renames, what the above actually
_means_ is "show the log for the fixed pathname set that only includes one
single path".
So if "kernel/sched.c" had originally been called something else, the
above wouldn't show the rename at all. It would just show that "oh, this
pathname suddenly was created as a new file", because from the viewpoint
of that fixed pathname, that's _exactly_ what happens.
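As a rough illustration of the fixed-pathname view, here is a sketch in a throwaway repository. The names "old.c" and "new.c" are hypothetical stand-ins for a renamed file:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'int main(void) { return 0; }\n' > old.c
git add old.c
git commit -q -m "Add old.c"

git mv old.c new.c
git commit -q -m "Rename old.c to new.c"

# History as seen through the single fixed pathname "new.c"
commits=$(git rev-list --count HEAD -- new.c)
status=$(git log --name-status --format= -- new.c | grep -v '^$')
echo "$commits"
echo "$status"
```

Seen through that one pathname, only the second commit is relevant, and it is reported as an "A" (added) entry for new.c rather than as a rename.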
We've discussed adding a "--follow" flag to tell "git log" to consider the
argument to not be a "pathname filter", but an "individual file" kind of
thing, and I think there was even a patch for it, but I suspect it hasn't
been a big issue, probably partly because you get rather used to the
"pathname filter" approach fairly quickly. If you knew what the old
pathname was, for example, you could get git to _tell_ you about the
rename by doing
    git log -M -- <set-of-all-pathnames-we're-interested-in-old-included>
and git would happily see the renames that happen _within_ that pathname
filter (the "-M" is there because by default "git log" doesn't show any
patches at all, of course, so if you want to see the rename, you need to
tell git so).
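A sketch of the same idea in a throwaway repository, with hypothetical names "old.c" and "new.c". Note that releases of git made after this mail do ship a "--follow" option for "git log" (it takes a single path); the last command assumes such a version:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'int main(void) { return 0; }\n' > old.c
git add old.c
git commit -q -m "Add old.c"

git mv old.c new.c
git commit -q -m "Rename old.c to new.c"

# Pathname filter that includes both the old and the new name:
# the rename happens inside the filter, so -M can report it.
git log -M --name-status --format= -- old.c new.c
renames=$(git log -M --name-status --format= -- old.c new.c | grep -c '^R')

# Assumes a git new enough to have --follow
followed=$(git log --follow --format=%h -- new.c | wc -l | tr -d ' ')
echo "$renames $followed"
```

With both names in the filter, the rename commit is shown as an "R" entry, and "--follow" on the single new name walks back through both commits.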
As a particular example of this behaviour, if you do
    git log -M kernel/
you'll always see any renames that happen _within_ that subdirectory, but
any files that are moved into (or out of) the subdirectory will be
considered to be "create" or "delete" events - because you've literally
told git to ignore all history that is not relevant to the kernel/
subdirectory (so they really _are_ "create/delete" events as far as that
subdirectory is concerned).
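The subdirectory case can be sketched like this, in a throwaway repository whose file names (kernel/a.c, kernel/b.c, lib/x.c) are invented for the example:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir kernel lib
echo "scheduler" > kernel/a.c
echo "library" > lib/x.c
git add .
git commit -q -m "Initial files"

git mv kernel/a.c kernel/b.c               # rename within kernel/
git commit -q -m "Rename inside kernel/"

git mv lib/x.c kernel/x.c                  # move into kernel/ from outside
git commit -q -m "Move lib/x.c into kernel/"

# Only history relevant to kernel/ is considered
log=$(git log -M --name-status --format= -- kernel/)
echo "$log"
```

The rename inside kernel/ is listed as an "R" entry, while the file that arrived from lib/ is listed as a plain "A" for kernel/x.c, since lib/ is outside the filter.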
Is this different from other SCM's? Hell yes. git does a lot of things
differently. Is it useful? Again, hell yes. Especially for a maintainer,
the ability to talk about pathname _patterns_ is generally much more
important than talking about any particular file.
[ The pathname thing also means that it's trivial to ask questions like
"ok, so what happened to file xyz that I _know_ we used to have, but
clearly don't have any more?".
You just do "git log -- xyz", and you'll see exactly what you wanted to
see. The "--" here (and in a previous example) is there because, to avoid
ambiguity, git requires that if you name files that don't actually
exist, you make it clear that they are filenames, not just mistyped
revision IDs or something else. ]
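A small sketch of the deleted-file lookup, again in a throwaway repository. The file name "xyz" is taken from the text above; everything else is a placeholder:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "gone soon" > xyz
git add xyz
git commit -q -m "Add xyz"

git rm -q xyz
git commit -q -m "Remove xyz"

# The file no longer exists, but its pathname still selects its history
git log --oneline -- xyz
history=$(git rev-list --count HEAD -- xyz)

# Without "--", git cannot tell whether xyz is a revision or a path
if git log xyz >/dev/null 2>&1; then
    needs_dashes=no
else
    needs_dashes=yes
fi
echo "$history $needs_dashes"
```

Both the commit that added the file and the one that removed it are listed, and leaving out the "--" makes git refuse the argument as ambiguous.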
In general, git gives you the best of both worlds. It knows how to follow
individual files if you want to, but by default it uses this much more
generic concept of "pathname filters". The default is definitely
influenced both by my usage, and my (obviously very strong) opinions on
what is more important (and thus the git "mental model").
Linus