[ANNOUNCE] Example Cogito Addon - cogito-bundle

Sat Oct 21 00:59:40 BST 2006

On Fri, 20 Oct 2006, Jeff Licquia wrote:
> 
> After this conflict is resolved, merging from b causes conflicts, while
> merging from c appears to work fine.  This continues until b merges from
> a (and resolves a conflict in a similar manner to a), at which time
> merging/pulling works as you'd expect between the branches.  Whenever b
> is marked as conflicting before it merges from a, bzr preserves b's
> changes by moving b's modified file.

This sounds somewhat like what I think BK did. I'm not sure if BK actually 
marked it as a conflict or whether BK just warned about "changes to 
deleted file" or something similar, but it didn't entirely _silently_ 
throw them away.

But I hope this shows some of the basic problems.

The much more _serious_ problem of "file identity" tracking is actually 
that you can't track partial file movement or file copies sanely. The 
thing is, tracking things at file boundaries simply is fundamnetally a 
broken notion, simply because _code_ doesn't get done at file boundaries.

Both of these things that git can actually do. Admittedly it does not do 
that in any _released_ version, so you'd have to work with the development 
branch, and it's a fairly early thing, but currently it can actually 
notice that our "revision.c" file largely came from the "rev-list.c" file 
that still exists!

And btw, that's not just some random feature that happened to get 
implemented last week. Yes, it actually _did_ get implemented last week, 
but this was something I outlined when I started git in April of last 
year, and tried to explain to people WHY TRACKING FILE ID'S ARE WRONG!

You can find me explaining these things to people in April-2005, which 
should tell you something: the initial revision of "git" was on Thursday, 
April 7. So the lack of file identity tracking has been controversial from 
the very beginning, but I was right then, and I'm right now.

Because the _fact_ is, that as long as you track stuff on a file basis, 
you're _never_ going to be able to do the things that git alreadt does, 
and that are very natural.

Here's the real-world example of something that git CAN DO TODAY:

 - we used to have a file called "rev-list.c", which did a lot of the 
   commit history revision traversal, and is the source of the git command 
   "git rev-list".

 - I (and others) extended it a lot, and turned it into a more generic 
   library interface, so that other commands could traverse the commit 
   graph on their own, rather than forking and executing "git-rev-list" 
   and piping the output between them.

 - as a result, the old "rev-list.c" still exists (except it was renamed 
   to "builtin-rev-list.c" since it's now a builtin command to the main 
   "git" binary). 

 - HOWEVER, a lot of the actual code got split into the library file, 
   called "revision.c", which contains the real smarts of the program.

See? There was a file rename involved (rev-list.c => builtin-rev-list.c), 
but that actually happened after a lot of the really _interesting_ code 
had been excised from that file, and put into the new internal library 
file (revision.c).

Now, as a result, in many ways the rename is _much_ less interesting than 
the question about the history of the code in "revision.c" (because that's 
really some very core code). And that was never a rename at all. That was 
just a file create, where a lot of the contents happened to come from a 
file that continued to exist.

Wouldn't you want "annotate" to be able to follow this kind of data 
movement? Notice how there is no "file" that moved at all. Only code that 
moved between files.

I tell you: as long as you work with "file ID's", you'll always be 
inferior. You'll never be able to see that some code was copied 
_partially_ from one file into another. You'll never be able to see an 
important function moving between file boundaries.

Unless you work with "git", that is. Because git isn't so _stupid_ as to 
think that file boundaries matter. Git knows better. The only thing that 
matters is the actual _data_, and file boundaries are just one way of 
delimiting that data.

Just try it out. Get the "next" branch of the git repository (that's the 
"stable development" branch in git.git - ie it's going to be in the next 
release and is expected to work, unless some of the more "experimental 
development" that is in the "pu" branch - pu = proposed updates), compile 
it, and run

	git pickaxe -C revision.c | less -S

and marvel. Marvel at my shining intelligence (and the small matter of 
programming, which was all done by Junio, but I'm taking all the credit 
_anyway_, because *dammit* I talked about this last year when people 
didn't understand! And besides, I always take all the credit regardless, 
so what are you whining about? Get off my back!).

More seriously, Junio really did a kick-ass job. I really had nothing at 
all to do with it, and deserve no real credit. But I _did_ forsee it, and 
yes, it really is about the fact that git tracks _contents_.

As somebody smarter that I have said (*): "I'm always right, but this time 
I'm even more right than usual".

			Linus

(*) Just kidding. It was me. Of course.