VCS comparison table
Linus Torvalds
torvalds at osdl.org
Mon Oct 23 20:18:06 BST 2006
On Mon, 23 Oct 2006, Jelmer Vernooij wrote:
>
> Bzr stores a checksum of the commit separately from the revision id in
> the metadata of a revision. The revision id is not used by itself to check
> the integrity of a revision.
That wasn't what I was trying to aim at - the problem is that the bzr
revision ID isn't "safe" in itself. Anybody can create a revision with the
same names - and they may both have checksums that match their own
revision, but you have no idea which one is "correct".
So you just have to trust the person that generates the name, to use a
proper name generation algorithm. You have to _trust_ that your 64-bit
random number really is random, for example. And that nobody is trying to
mess with your repo.
This isn't a problem in normal behaviour, but it's a problem in an attack
scenario: imagine somebody hacking the central server, and replacing the
repository with something that had all the same commit names, but one of
the revisions was changed to introduce a nasty backdoor problem. Change
all the checksums to match too..
It would _look_ fine to somebody who fetches an update, and the maintainer
might not ever even notice (because he wouldn't send the _old_ revision
again, and _his_ tree would be fine, so he'd happily continue to send
out new revisions on top of the bad one on the public site, never even
realizing that people are fetching something that doesn't match what he is
pushing).
In contrast, in git, if you replace something in a git repository, the
name changes, and if I were to try to push an update on top of a broken
repo like that, it simply wouldn't work - I couldn't fast-forward my own
branch, because it's no longer a proper subset of what I'm trying to send.
So in git, you can _trust_ the names. They actually self-verify. You can't
have maliciously made-up names that point to something else than what they
are.
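
As an illustration of why such names self-verify, here is a minimal Python
sketch of how a git blob name can be derived from the content alone. It
assumes the classic SHA-1 object format (the bytes "blob", a space, the
decimal length, a NUL byte, then the data); the function name
git_blob_name is made up for this example and is not part of git.

```python
import hashlib


def git_blob_name(data: bytes) -> str:
    """Return the SHA-1 object name git would give a blob with this content.

    Assumes the classic SHA-1 object format: b"blob <size>\\0" + data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    original = b"hello world\n"
    tampered = b"hello w0rld\n"
    # Any change to the content produces a different name, so an existing
    # name cannot be kept while the data behind it is swapped out.
    print(git_blob_name(original))
    print(git_blob_name(tampered))
```

The first value should match what `git hash-object` reports for a file
containing the same bytes in a SHA-1 repository. Commits work the same way
one level up: a commit names its tree and its parents, so changing one old
revision changes the name of everything built on top of it.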
[ Also, as a result, and related to this same issue: the git protocol
actually never sends object names when sending the object itself. It
just sends the object data, and the _recipient_ generates the name from
that.
So you can't do the _other_ kind of spoofing, and make a repository that
_claims_ to have one name and the data would differ - because if you do
that, anybody who pulls from the spoofed repository will re-create
different names than you claimed, and won't even be able to pull such a
malicious repository. ]
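
The recipient-side naming described above can be sketched as follows. This
is a toy model, not git's actual wire protocol or storage code: ObjectStore,
receive and fetch are hypothetical names, and the object format is assumed
to be the classic SHA-1 one ("<type> <size>", a NUL byte, then the body).
The point it shows is that the sender only ever supplies data, and the
receiver computes every name itself.

```python
import hashlib


def object_name(kind: bytes, body: bytes) -> str:
    """Compute an object name from its type and content (assumed SHA-1 format)."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


class ObjectStore:
    """Toy receiver: objects are filed only under names computed locally."""

    def __init__(self):
        self.objects = {}

    def receive(self, kind: bytes, body: bytes) -> str:
        # The sender never gets to say what the object is called.
        name = object_name(kind, body)
        self.objects[name] = (kind, body)
        return name


def fetch(store: ObjectStore, advertised_name: str, payload) -> None:
    """Store every (kind, body) pair, then check the advertised name exists."""
    for kind, body in payload:
        store.receive(kind, body)
    if advertised_name not in store.objects:
        # The remote claimed a name that its own data does not hash to.
        raise ValueError("advertised object %s was not received" % advertised_name)
```

With this shape, a remote that advertises one name but ships different data
simply fails the fetch, because the data lands under a different, locally
computed name.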
Linus