VCS comparison table
Linus Torvalds
torvalds at osdl.org
Thu Oct 19 21:47:53 BST 2006
On Thu, 19 Oct 2006, Matthieu Moy wrote:
>
> By curiosity, how would you compare git and Bitkeeper, on a purely
> technical basis? (not asking for a detailed comparison, but an "X is
> globaly/much/terribly/not better than Y" kind of statement ;-) )
I think git is better for kernel work these days, but a large portion of
that is that a lot of the features have literally been tweaked for us (for
very obvious reasons).
For example, the whole "rebase" thing (or explicitly making cherry-picking
easy) is something that a number of kernel people do, and even if I have
to admit to not liking the practice very much (it kind of hides the "true"
development history), it does have huge advantages, and it makes history a
lot easier to read.
Similarly, I often used the single-file graphical history viewing in BK
("revtool"), but being able to follow the history of multiple files as one
"entity" really is something that once you get used to, it's really really
hard going back, and "gitk" does generate a much more readable graph.
And I think the git way of doing branches is just simply superior. Git
always did branches in the sense that the way merges happened you _always_
had several heads, but actually making them available and switching
between them was something that wasn't my idea, and that I even was a bit
apprehensive about. I was wrong. Git branches are branches done right. I
just don't see how you _could_ do them better.
That said, a lot of the features I like and _I_ consider really important
are possibly not that important to others. For example, maybe nobody else
really cares about viewing the history of a particular subsystem, the way
I do. For a lot of people, single-file is probably ok.
For example, while git now does "annotate" (or "blame"), it's not
lightning fast, and I simply don't care. Doing a
git blame kernel/sched.c
takes about three seconds for me, and that's on a pretty good machine (and
on the kernel tree, which for me is always in the cache ;). Quite frankly,
if I cared deeply about that kind of annotation, I'd probably be upset
about it. There are basically _no_ other git operations that take that
long. I can get the _full_ log of the last 18 months of the kernel much
faster than that.
And the slowness of annotate comes directly from the design of git, and
from the fact that it's not how I tend to look at changes. Rather than
doing "git blame kernel/sched.c", I'm _much_ more likely to just do
git log -p kernel/sched.c
and see the changes as individual patches instead (and perhaps search for
some pattern that I'm looking for by just literally using a regex in the
pager).
Also, the fact that you need to repack the archive every once in a while
doesn't disturb me. I probably end up repacking the kernel almost daily,
which is _waay_ excessive, but it's just become habit of mine. I've seen
people who really don't like it, and I've also seen people who apparently
never even realized that they should do an occasional "git repack -a -d",
and then they have hundreds of thousands of loose objects and wonder why
the performance is so bad ;)
BK never had these issues. BK always kept things "packed", which made a
lot of operations much slower ("bk undo" was painfully slow). BK could
annotate quickly, since it was really a file-based history, in a way that
git fundamentally isn't, and can never be (and I don't _want_ it to be,
but it means that "annotate" is slow).
And BK had some great tools. The merge tool was superior ("bk resolve"? I
forget). The patch-application tool was great.
But both of those tools are things that git doesn't have, for _another_
reason: the way git works, you don't really need them. For example, the
patch application tool was great, but the biggest reason it was needed in
the first place was tracking renames explicitly.
In that kind of environment, you have serious problems with patches, and
you actually _need_ a tool to let the user explain when something is a
rename and when it isn't. With git not tracking renames, the patch
application tool simply isn't needed.
The same goes to some degree to "bk resolve". Because git has the index,
and you can _leave_ things unresolved in the index, you don't need a
graphical tool to resolve things - git knows very fundamentally about
incomplete merges _and_ about multiple branches (which you need in order
to keep track of both the branch you merge from and the branch you merge
into), and it's fine to resolve any conflicts in the normal working tree.
So for at least _my_ usage, git does everything very well, but that's
because if it didn't fit me, I fixed it until it did.
And "git bisect" really does rock. I still cannot believe that apparently
nobody did it before us. It's such a useful thing, and it works so well in
unambiguous cases (and not all cases are that unambiguous, but an
appreciably large subset is).
So that said, git does work very well for us, but I do want to end on a
note on thigns that BitKeeper did and nobody else has:
- Larry was first. The undeniable fact is, that before BK (and for
several years _after_ BK), the open-source alternatives were just CRAP.
You can say anything you like about his personality, but dammit,
compared to Larry, most people I know are idiots. People don't give BK
the credit it deserves. When Tridge "reverse-engineered" it, people
were making jokes about how trivial some of the protocols were. That
misses the point ENTIRELY. The point is, compared to BK, everything
else absolutely _sucked_, and BK really was a watershed program.
Never EVER underestimate how important BK was. Quite frankly, I think
most open-source SCM's _still_ suck. I'm constantly amazed that anybody
would touch SVN with a ten-foot pole. Talk about crap. And SVN is at
least usable, unlike a lot of other projects.
- When I did git, one of the things that actually _helped_ me was that I
was consciously trying to not do a BK clone. I wanted to do the same
things that BK did, but I very much did _not_ want to do them the _way_
BK did them. I respect Larry too much, and I didn't want there to be
any question about git being just a "clone".
So a lot of the git design ended up very much trying to avoid old
designs on purpose, and I think that really helped. The fact that I
didn't have a background in SCM's, and that I thought all the weaves
etc were confusing, meant that I instead went for a radically different
way of doing things.
And I'm 100% convinced that "radically different" was the right thing
to do. That was what allowed git to really soar. A lot of the good
things in git come exactly from the fact that git does _not_ do things
like most traditional SCM's do. But BK should still get a lot of
credit, because it was what taught me (and a lot of other people) what
being "distributed" really meant.
- On a more personal note: people say that BK showed the "failure" of
using a commercial closed-source program. I would disagree. Not only
did the kernel get a whole lot of useful work out of BK, we learnt how
distributed systems _should_ work, and quite frankly, I'd do ít all
over again in a heartbeat.
If there was a "failure" in the BK saga, it was in how horrendously
_bad_ all open-source SCM's were, even with BK showing how it should
have been done for several years. THAT is the failure. The fact that
there were hundreds of people who whined about BK, and nobody really
did anything productive.
Now, I'm obviously biased, but I really do believe that git is the best
open-source SCM there is, by a _mile_. I don't know how many people
realize this, but we literally haven't changed our data formats in over a
year. I was looking at my old git import of the BKCVS tree today, because
I wanted to look up the "BKrev" format for the email earlier in this tree,
and I realized that the pack-file was from July of last year. That's
within a few _weeks_ of the pack-file being introduced at all, and guess
what? It all still worked. No "on-the-fly format conversion", no
_nothing_. It just worked.
That should tell people something. It's pretty much the fastest SCM out
there (and yeah, that's on almost any operation you can name), it still
has the smallest disk footprint I've ever heard of, and it hasn't had the
"format of the week" disease that every other project seems to go through.
And it's used in production settings on some of the biggest projects out
there. SVN has more users, but let's face it, SVN really isn't even in the
running. Technology-wise, the thing is just not worth bothering with, but
it's a good crutch for people who are used to CVS and never want to use
anything lse.
Am I happy with git? I'm happy as a clam. It turned out even better than I
ever thought it would. And BK was what taught me what to aim for.
Linus
More information about the bazaar
mailing list