VCS comparison table

Vincent Ladeuil v.ladeuil+lp at
Thu Oct 26 17:04:50 BST 2006

>>>>> "Linus" == Linus Torvalds <torvalds at> writes:

    Linus> On Thu, 26 Oct 2006, Vincent Ladeuil wrote:
    >> Ok, so git makes a distinction between the commit (code created by
    >> someone) and the tree (code only).
    >> Commits are defined by their parents.

    Linus> Commits are defined by a _combination_ of:

    Linus>  - the tree they commit (which is recursive, so the
    Linus>  commit name indirectly includes information EVERY
    Linus>  SINGLE BIT in the whole tree, in every single file)

And here you keep that separate from any SCM related info,
right ?

    Linus>  - the parent(s) if any (which is also recursive, so
    Linus>  the commit name indirectly includes information about
    Linus>  EVERY SINGLE BIT in not just the current tree, but
    Linus>  every tree in the history, and every commit that is
    Linus>  reachable from it)

    Linus>  - the author, committer, and dates of each (and
    Linus>  committer is actually very often different from
    Linus>  author)

    Linus>  - the actual commit message

    Linus> So a commit really names - uniquely and authoritatively
    Linus> - not just the commit itself, but everything ever
    Linus> associated with it.

Thanks for the clarification. But no need to shout about EVERY
SINGLE BIT, the pointer to BDDs was already talking a bit about
bits :) 

But I agree, this is the important point that may be missed.
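
To make that concrete, here is a rough Python sketch of how such
a name could be computed. It follows git's object layout as I
understand it (a "commit <size>" header, a NUL byte, then the
tree, parent, author and committer lines, a blank line and the
message). The identity string and the helper names object_name
and commit_name are made up for illustration.

```python
import hashlib

def object_name(kind, body):
    """SHA-1 over a NUL-terminated '<kind> <size>' header plus the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_name(tree, parents, author, committer, message):
    """Name a commit from its tree, parent names, people/dates and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return object_name("commit", "\n".join(lines).encode())

EMPTY_TREE = object_name("tree", b"")  # a tree with no entries
who = "A U Thor <author@example.com> 1161878690 +0100"  # made-up identity

root = commit_name(EMPTY_TREE, [], who, who, "first\n")
child = commit_name(EMPTY_TREE, [root], who, who, "second\n")
print(root, child)
```

Changing any input, including the message of a distant ancestor,
gives a different name to every descendant, since each commit
hashes the names of its parents.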

    >> Trees are defined by their content only ?

    Linus> Where "contents" does include names and
    Linus> permissions/types (eg execute bit and symlink etc).

Which can also be expressed as: "Everything the user can
manipulate outside the SCM context", right ?

    >> If that's the case, how do you proceed ? 

    Linus> If you compare the commit name, and they are equal,
    Linus> you automatically know

    Linus>  - the trees are 100% identical
    Linus>  - the histories are 100% identical

And that's the only info you can get, no ordering here. (Just
pointing out the obvious: as soon as you try to put more info
into the signature, the equality will vanish.)

But for various optimizations this equality property is the only
one needed.

Do we agree ?

    Linus> If you only care about the actual tree, you compare
    Linus> the tree name for equality, ie you can do

    Linus> 	git-rev-parse commit1^{tree} commit2^{tree}

    Linus> and compare the two: if and only if they are equal are
    Linus> the actual contents 100% equal.

Actually, that's backwards:

"their actual contents are equal" implies "their signatures are equal"

But, two totally different trees can have the same signature.

My god ! What a horror ! Not. I even wonder whether I will live
long enough to see it occur... So we *can* pretend that:

"their signatures are equal" is equivalent to "their contents
are equal"

And that's all we care about :)
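
For the curious, a back-of-the-envelope estimate of how unlikely
that is. This is only the usual birthday bound, assuming SHA-1
behaves like a random 160-bit function and nobody is crafting
collisions on purpose. The object count is an arbitrary figure,
not a measurement, and collision_probability is a made-up name.

```python
from fractions import Fraction

def collision_probability(objects, bits=160):
    """Upper bound on any two names colliding: pairs / 2**bits."""
    pairs = objects * (objects - 1) // 2
    return Fraction(pairs, 2 ** bits)

# A trillion objects is an arbitrary, generous guess.
p = collision_probability(10 ** 12)
print(float(p))
```

Even with that many objects the bound stays vanishingly small.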

But I digress. The question was about a detail of your tree
definition: once the signature is defined to be unique (as in
canonical), the property of comparing the signatures as if they
were the objects themselves follows. Thanks for the confirmation.

    >> Calculate a sha1 representing the content (or the content
    >> of the diff from parent) of all the files and dirs in the
    >> tree ?  Or from the sha1s of the files and dirs themselves
    >> recursively based on sha1s of the files and dirs they
    >> contain ?

    Linus> The latter. 

Thanks for providing the clarification. So of course, finding the
differences between the trees is quick: you can prune wherever
the signatures are equal.

    >> I ask because the latter seems to provide some nice effects
    >> similar to what makes BDDs so efficient: you can compare
    >> graphs of any complexity or size in O(1) by just comparing
    >> their signatures.

    Linus> This is exactly what git does. You can compare entire
    Linus> trees (and subdirectories are just other trees) by
    Linus> just comparing 20 bytes of information.

I understood that years ago, even. I have a bit of practice with
BDDs and I am accustomed to that lovely property. But without
that practice, I think most people will just wonder...
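
For those who wonder, here is a minimal sketch of the idea in
Python. It is a simplified model, not git's actual format: a tree
is a nested dict, a file is a bytes value, permissions are left
out, and sign and diff are names I made up. Signatures are built
bottom-up from the signatures of the children, so the diff can
skip any subtree whose signatures match without looking inside.

```python
import hashlib

def sign(node):
    """Return (signature, children); children is None for a file."""
    if isinstance(node, bytes):  # a file: hash its content
        return hashlib.sha1(b"blob\0" + node).hexdigest(), None
    # A directory: hash the sorted list of (name, child signature).
    children = {name: sign(child) for name, child in node.items()}
    listing = "".join(
        f"{name}\0{children[name][0]}\n" for name in sorted(children)
    )
    return hashlib.sha1(b"tree\0" + listing.encode()).hexdigest(), children

def diff(a, b, path=""):
    """Yield paths that differ, pruning where signatures are equal."""
    sig_a, kids_a = a
    sig_b, kids_b = b
    if sig_a == sig_b:
        return  # identical subtrees: nothing below needs a visit
    if kids_a is None or kids_b is None:
        yield path or "/"  # a file changed, or file <-> directory
        return
    for name in sorted(kids_a.keys() | kids_b.keys()):
        child = f"{path}/{name}"
        if name not in kids_a or name not in kids_b:
            yield child  # added or removed entry
        else:
            yield from diff(kids_a[name], kids_b[name], child)

old = {"README": b"hi", "doc": {"x": b"1"},
       "src": {"a.c": b"int a;", "b.c": b"int b;"}}
new = {"README": b"hi", "doc": {"x": b"1"},
       "src": {"a.c": b"int a;", "b.c": b"int b = 1;"}}
changed = list(diff(sign(old), sign(new)))
print(changed)
```

Here only the root and src are opened up; README and doc are
dismissed by a single comparison each.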


    Linus> And the reason it's fast is that we can compare 20,000
    Linus> files (names, contents, permissions) by just comparing
    Linus> a _single_ 20-byte SHA1.

Yeah, let's go further ! We can compare gazillions of files and
their history since epoch by comparing _two_ signatures ! :-)

    Linus> In git, revision names (and _everything_ has a
    Linus> revision name: commits, trees, blobs, tags) really
    Linus> have meaning. They're not just random noise.

I know that effect, but I understand people complaining that they
*look* like noise. 

I'm still searching for a parallel in nature, but the best I
could find is DNA. Ever looked at DNA ?

Looks like noise, no ? No ordering either between parents and
children... But there is a way to identify a parent from the DNA
of a child...

