VCS comparison table
Vincent Ladeuil
v.ladeuil+lp at free.fr
Thu Oct 26 17:04:50 BST 2006
>>>>> "Linus" == Linus Torvalds <torvalds at osdl.org> writes:
Linus> On Thu, 26 Oct 2006, Vincent Ladeuil wrote:
>>
>> Ok, so git makes a distinction between the commit (code created by
>> someone) and the tree (code only).
>>
>> Commits are defined by their parents.
Linus> Commits are defined by a _combination_ of:
Linus> - the tree they commit (which is recursive, so the
Linus> commit name indirectly includes information EVERY
Linus> SINGLE BIT in the whole tree, in every single file)
And here you keep that separate from any SCM-related info,
right?
Linus> - the parent(s) if any (which is also recursive, so
Linus> the commit name indirectly includes information about
Linus> EVERY SINGLE BIT in not just the current tree, but
Linus> every tree in the history, and every commit that is
Linus> reachable from it)
Linus> - the author, committer, and dates of each (and
Linus> committer is actually very often different from
Linus> author)
Linus> - the actual commit message
Linus> So a commit really names - uniquely and authoritatively
Linus> - not just the commit itself, but everything ever
Linus> associated with it.
Thanks for the clarification. But no need to shout about EVERY
SINGLE BIT, the pointer to BDDs was already talking a bit about
bits :)
But I agree, this is the important point that may be missed.
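To make the recursion concrete, here is a minimal sketch of how a commit name can be derived from the four ingredients listed above. It assumes the usual git object layout (a `<kind> <size>\0` header followed by the body, hashed with SHA-1); the helper names `git_object_id` and `commit_id`, the identity string and the timestamp are made up for illustration.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_id(tree, parents, author, committer, message):
    """Name a commit from its tree, parents, identities and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", "\n".join(lines).encode())

# Illustrative values only: an empty tree and a made-up identity.
EMPTY_TREE = git_object_id("tree", b"")
who = "A U Thor <author@example.com> 1161878690 +0100"

root = commit_id(EMPTY_TREE, [], who, who, "initial\n")
# Same tree, same author, same message, but a different parent list:
child = commit_id(EMPTY_TREE, [root], who, who, "initial\n")
```

Here `root` and `child` point at the same tree yet get different names, because the parent name is part of what is hashed. Since `root` itself was computed from its own tree and parents, `child` indirectly covers the whole history.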
>> Trees are defined by their content only ?
Linus> Where "contents" does include names and
Linus> permissions/types (eg execute bit and symlink etc).
Which can also be expressed as: "Everything the user can
manipulate outside the SCM context", right ?
>> If that's the case, how do you proceed ?
Linus> If you compare the commit name, and they are equal,
Linus> you automatically know
Linus> - the trees are 100% identical
Linus> - the histories are 100% identical
And that's the only info you can get; there is no ordering here. (Just
pointing out the obvious: as soon as you try to put more info into
the signature, the equality will vanish.)
But for various optimizations this equality property is the only
one needed.
Do we agree?
Linus> If you only care about the actual tree, you compare
Linus> the tree name for equality, ie you can do
Linus> git-rev-parse commit1^{tree} commit2^{tree}
Linus> and compare the two: if and only if they are equal are
Linus> the actual contents 100% equal.
Actually, that's backwards:
"their actual contents are equal" implies "their signatures are
equal".
But two totally different trees can have the same signature.
My god! What a horror! Not. I even wonder whether I will live
long enough to see it occur... So we *can* pretend that:
"their signatures are equal" is equivalent to "their contents
are equal"
And that's all we care about :)
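To put a rough number on "will I live long enough": a back-of-the-envelope sketch, assuming the 160-bit SHA-1 output behaves like a uniformly random value and that collisions are accidental rather than engineered. The function name and the object count are illustrative assumptions, not anything taken from git.

```python
from fractions import Fraction

def collision_probability_bound(n_objects: int, bits: int = 160) -> Fraction:
    """Union bound: (number of object pairs) / 2**bits.

    This is an upper bound on the chance that any two of n_objects
    distinct objects accidentally share a signature.
    """
    pairs = n_objects * (n_objects - 1) // 2
    return Fraction(pairs, 2 ** bits)

# A deliberately huge, made-up repository: 10**12 distinct objects.
bound = collision_probability_bound(10 ** 12)
print(float(bound))  # a very small number
```

Even with that many objects the bound stays far below anything worth worrying about in practice, which is why treating "equal signatures" as "equal contents" is a safe pretence for accidental collisions.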
But I digress. The question was about a detail of your tree
definition: once the signature is defined to be unique (as in
canonical), the property of comparing the signatures as if they
were the objects themselves follows. Thanks for the confirmation.
>> Calculate a sha1 representing the content (or the content
>> of the diff from parent) of all the files and dirs in the
>> tree ? Or from the sha1s of the files and dirs themselves
>> recursively based on sha1s of the files and dirs they
>> contain ?
Linus> The latter.
Thanks for the clarification. So of course, finding the
differences between the trees is quick: you can prune wherever
the signatures are equal.
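The pruning can be sketched as follows. This is a simplified model, not git's actual tree encoding: a tree is a nested dict mapping names to either file contents (bytes) or sub-dicts, and the names `build` and `diff` are hypothetical helpers.

```python
import hashlib

def build(node):
    """Return (signature, children) for a nested dict/bytes tree.

    children is None for a file, else {name: (signature, children)}.
    A directory's signature is computed from its children's signatures.
    """
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest(), None
    children = {name: build(child) for name, child in node.items()}
    h = hashlib.sha1(b"tree\0")
    for name in sorted(children):
        sig, kids = children[name]
        kind = b"F" if kids is None else b"D"
        h.update(kind + name.encode() + b"\0" + bytes.fromhex(sig))
    return h.hexdigest(), children

def diff(a, b, path=""):
    """List the file paths that differ between two build() results."""
    if a is not None and b is not None and a[0] == b[0]:
        return []  # equal signatures: prune the whole subtree
    a_kids = a[1] if a is not None else None
    b_kids = b[1] if b is not None else None
    if not isinstance(a_kids, dict) and not isinstance(b_kids, dict):
        return [path]  # a file was added, removed or modified
    a_kids, b_kids = a_kids or {}, b_kids or {}
    changed = []
    for name in sorted(a_kids.keys() | b_kids.keys()):
        sub = f"{path}/{name}" if path else name
        changed += diff(a_kids.get(name), b_kids.get(name), sub)
    return changed

old = {"README": b"hi", "doc": {"x": b"d"}, "src": {"a.c": b"1", "b.c": b"2"}}
new = {"README": b"hi", "doc": {"x": b"d"}, "src": {"a.c": b"1", "b.c": b"3"}}
print(diff(build(old), build(new)))  # only src/b.c differs
```

The `doc` subtree is never descended into: its signature is the same on both sides, so one comparison settles it regardless of how many files it holds.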
>> I ask because the latter seems to provide some nice effects
>> similar to what makes BDD
>> (http://en.wikipedia.org/wiki/Binary_decision_diagram) so
>> efficient: you can compare graphs of any complexity or size in
>> O(1) by just comparing their signatures.
Linus> This is exactly what git does. You can compare entire
Linus> trees (and subdirectories are just other trees) by
Linus> just comparing 20 bytes of information.
I understand that, and have for years. I have a bit of practice with
BDDs and I am accustomed to that lovely property. But without
that practice, I think most people will just wonder...
<snip/>
Linus> And the reason it's fast is that we can compare 20,000
Linus> files (names, contents, permissions) by just comparing
Linus> a _single_ 20-byte SHA1.
Yeah, let's go further ! We can compare gazillions of files and
their history since epoch by comparing _two_ signatures ! :-)
Linus> In git, revision names (and _everything_ has a
Linus> revision name: commits, trees, blobs, tags) really
Linus> have meaning. They're not just random noise.
I know that effect, but I understand people complaining that they
*look* like noise.
I'm still searching for a parallel in nature, but the best I could
find is DNA. Ever looked at DNA?
Looks like noise, no? No ordering either between parents and
children... But there is a way to identify a parent from the DNA
of a child...
Vincent