Storage internals: UUID

Wed Jun 13 01:11:20 UTC 2012

On Tuesday, June 12, 2012 at 10:34 PM, Mark Grandi <markgrandi at gmail.com> wrote:
>
> Yeah, that makes sense. However, on the git and hg side, it seems that
> the hashes are chained, so revision 2 has the revision id of revision
> 1 as part of the sha1 hash, so that's easy to verify all the way back
> to the root of the tree, unless you rebase it (which someone talked
> about earlier on in this thread, about how rebasing can give a commit
> a new id), so I don't see how someone can change a commit in a
> git repo without affecting the revision ids of everything that comes after it.

You get the same benefit either way, and with the same caveats:

1. With Hg / Git, if you change a file, this is reflected in the revision ID for that revision and all the children.

2. With Bzr, if you change a file, this is reflected in the testament of that revision and of all its children, for as long as that change remains in the tree.

In either case, you can detect a corrupt tree, but only if you have a copy of the correct ID or testament. If you don't have a copy of the SHA1 hash from a source that you trust, you can just as easily get a malicious tree from Hg / Git as you can from Bzr...

That said, there is one thing that Hg / Git can detect that a testament can't: a change to the history that in the end leaves the tree intact. Imagine that you have a repository with the following revisions:

A -> B -> C -> D -> E -> F

Imagine that after "A" I insert a malicious revision "Q". The tree will look like this:

A -> Q -> B* -> C* -> D* -> E* -> F*

The '*' denotes the fact that the testament is different, so the insertion is detectable. Now imagine that after "C*" I insert another revision "R" that reverts all the changes made by "Q", completely removing the inserted code. Now the repository will look like this:

A -> Q -> B* -> C* -> R -> D* -> E -> F

In revision D* you can still recognize that something went wrong, because the revision id of its parent (R) is not the same as the revision id of the original D's parent (C). But after that, the testaments of revisions E and F are indistinguishable from those of the original history. With Hg or Git, the ids of E and F would change as well, because each id depends on the id of its parent. In this way, Hg and Git protect the integrity of not only the tree, but also the history leading up to it (assuming that you don't rebase, which I think Git people do often).

> Gpg signatures seem like those would work fine, but then how does 
> anyone verify which gpg key is the right one?

This can be as simple or as complex as you want to make it... For example, you could post it everywhere so that at some point "everyone" knows your GPG key and there's no feasible way for someone to hack the server with your website, and your Facebook account, and all the signatures in the mailing list archive that contain a copy of your key...

For the more paranoid, you could wait to meet me in person and ask for two pieces of ID including a passport before agreeing to accept the piece of paper I'm handing to you as an accurate copy of my GPG key...

Somewhere in between, you can rely on the web of trust system. People hold "key signing parties" where they verify each other's GPG keys and sign them, and they build a network. The web of trust works somewhat like this:

1. You've been to a key-signing party where you met Bob in person, checked his ID, and signed his key. You also trust Bob.

2. Bob went to another party where he met Alice, whom he trusts, and he signed her key.

3. Alice went to another party where she met me, checked my ID and signed my key.

Now my key has Alice's signature, Bob trusts Alice and you trust Bob. So, through the You -> Bob -> Alice -> Me chain, you can have a copy of my GPG key with a level of confidence that matches your security requirements (after all, it is up to you to decide whom you trust, and how far down the chain you will trust). This whole concept of web of trust is already part of the GPG software.
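The chain above can be sketched as a small graph search. This is a simplified model under my own assumptions, not how GPG itself computes key validity: `signed` maps each person to the keys they have signed, `trusts` maps each person to the people whose signatures they rely on, and `max_hops` stands in for "how far down the chain you will trust". The function name `trust_chain` is hypothetical.

```python
from collections import deque

def trust_chain(signed, trusts, me, target, max_hops):
    """Breadth-first search for a signature chain from `me` to `target`.

    A hop from X to Y requires that X signed Y's key. The chain may only
    continue *through* Y if X also trusts Y; the final key merely has to
    be signed by the last trusted person in the chain.
    """
    queue = deque([[me]])
    seen = {me}
    while queue:
        path = queue.popleft()
        signer = path[-1]
        if len(path) - 1 >= max_hops:
            continue  # this chain is already as long as we are willing to accept
        for key in sorted(signed.get(signer, ())):
            if key == target:
                return path + [key]
            if key not in seen and key in trusts.get(signer, ()):
                seen.add(key)
                queue.append(path + [key])
    return None

signed = {"You": {"Bob"}, "Bob": {"Alice"}, "Alice": {"Daniel"}}
trusts = {"You": {"Bob"}, "Bob": {"Alice"}}

chain = trust_chain(signed, trusts, "You", "Daniel", max_hops=3)
too_strict = trust_chain(signed, trusts, "You", "Daniel", max_hops=2)
print(chain, too_strict)
```

With three hops allowed, the search finds You -> Bob -> Alice -> Daniel; with only two, it finds nothing, which is the "how far down the chain" decision from above.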

I'm not making this up. My wife has been to key signing parties (though not for GPG, it was something else).

> Not to mention that it's not a requirement to sign commits anyway, so any
> unsigned ones are not protected at all.

All of these problems have solutions that involve different levels of security vs convenience. It's a trade-off, and GPG allows you to choose where you draw the line:

* GPG allows you to be strict and only accept keys of people whom you've met personally and for whom you've personally checked three pieces of ID and whom you've questioned personally.

* GPG allows you to be relaxed and trust a GPG key you got from my email signature.

* GPG allows many options between these extremes.

Cheers,
Daniel.