please check out weave-format branch

John A Meinel john at arbash-meinel.com
Fri Sep 23 17:59:58 BST 2005


Martin Pool wrote:
> On 23/09/05, John Arbash Meinel <john at arbash-meinel.com> wrote:
>
>
>>3) You removed "parent_sha1" from the revision XML. So now we reference
>>a parent only by its revision id, rather than both id and sha1 hash. I
>>realize that it makes upgrading easier (since you no longer have a hash
>>which is invalidated), but it raises potential security concerns. These
>>may not be major, but you can fake the ancestry if you remove the sha1.
>
>
> Yes, I've thought the same things.  Having it there does feel like it
> gives some protection, but it's hard to say against exactly what.
>
> The difficulty comes not just when you upgrade your own branch, but
> also if I've upgraded and then you pull from me -- all of those
> revisions you pull in will have the wrong hashes.

Well, I would say we need to come up with a canonical form for
revisions, because we are going to be doing GPG signing at some point,
and after that we can't really be changing the sha hashes.

I know Aaron has brought up the question of how to handle a switch to a
new hash (say sha256). I think we can handle it by stating: if you are
checking the sha256, then all the other hashes involved are also
sha256. So a sha256 check never sees the sha1 hashes.
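
To make that rule concrete, here is a rough sketch. The function names
and the text layout are made up for illustration, they are not what bzr
does today. The only point is that the embedded inventory hash uses the
same algorithm as the outer revision hash.

```python
import hashlib


def digest(data, hash_name):
    """Hex digest of `data` (bytes) using the named hashlib algorithm."""
    return hashlib.new(hash_name, data).hexdigest()


def revision_hash(revision_id, inventory_bytes, hash_name="sha1"):
    # Hypothetical canonical layout, for illustration only.
    # The embedded inventory hash is computed with the same algorithm
    # as the outer hash, so a sha256 check never depends on a sha1 value.
    inventory_hash = digest(inventory_bytes, hash_name)
    canonical = "revision-id: %s\ninventory-%s: %s\n" % (
        revision_id, hash_name, inventory_hash)
    return digest(canonical.encode("utf-8"), hash_name)
```

Calling revision_hash("rev-1", data, "sha256") hashes the inventory with
sha256 before hashing the revision, and the sha1 path is untouched.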
>
> If we want to support references to revisions whose value is not known
> (which some people call "ghost" revisions) then this gets even more
> difficult.  Of course we don't know the sha of the ghost revision, and
> if we did find it out we'd have to redo everything from that point
> forward.

I don't really agree with this, because at some point the "ghost"
revision was not a ghost, so at that point you did have the sha. When
you tell someone else about it, you just need to tell them both bits of
information (the revision id and its sha). So whether the ghost comes in
by a changeset or is just a missing revision, you should still be able
to keep the sha hash as a last known value.
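
As a sketch of what I mean (ParentRef and check_parent are hypothetical
names, and a plain dict stands in for the revision store), a parent
reference could carry the id plus the last known sha, and only be
verified once the revision text actually shows up:

```python
import hashlib
from collections import namedtuple

# Hypothetical record: a parent id plus the sha1 last seen for it.
ParentRef = namedtuple("ParentRef", "revision_id last_known_sha1")


def check_parent(ref, store):
    """Classify a parent reference against `store` (id -> revision bytes).

    Returns "ghost" if the revision text is not present, "ok" if its
    sha1 matches the last known value, and "mismatch" otherwise.
    """
    text = store.get(ref.revision_id)
    if text is None:
        return "ghost"
    if hashlib.sha1(text).hexdigest() == ref.last_known_sha1:
        return "ok"
    return "mismatch"
```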

But, there is another alternative. Store the sha hash, but don't include
it as part of a new hash. So the canonical form for a Revision includes
the committer, revision id, inventory hash, parent ids, but not the
parent hashes. You lose the nice property of encapsulating all of the
history in a single hash, but it means that upgrading doesn't break as
much stuff.
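
Roughly like this. The field layout is an assumption for the sake of
the example, not a proposed format. Parents are passed as (id, sha1)
pairs, but only the ids go into the text that gets hashed:

```python
import hashlib


def canonical_form(committer, revision_id, inventory_sha1, parent_ids):
    # Hypothetical layout: one "key: value" line per field.
    lines = [
        "committer: " + committer,
        "revision-id: " + revision_id,
        "inventory-sha1: " + inventory_sha1,
    ]
    lines.extend("parent: " + parent_id for parent_id in parent_ids)
    return ("\n".join(lines) + "\n").encode("utf-8")


def revision_sha1(committer, revision_id, inventory_sha1, parents):
    """`parents` is a list of (parent_id, parent_sha1) pairs.

    The parent sha1 values are stored alongside the revision but are
    deliberately left out of the hashed text.
    """
    parent_ids = [parent_id for parent_id, _parent_sha1 in parents]
    text = canonical_form(committer, revision_id, inventory_sha1, parent_ids)
    return hashlib.sha1(text).hexdigest()
```

With this, rewriting a parent (so its sha changes) leaves the child's
hash alone, while pointing at a different parent id still changes it.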

>
>
>>If a hacker can get his version of your parent injected into the system,
>> then he can change the ancestry. At first, this just seems that it
>>would mess up the merge command, since it can't find an appropriate
>>merge base. But also if you ever just try to do "merge just the changes
>>for this revision against its parent" (cherry picking, which bzr may or
>>may not ever support), then the hacker has quite a bit of freedom about
>>what sort of diff would be created. There isn't a lot of freedom, and I
>>don't know what kind of dangerous stuff could be done, but it seems like
>>a potential leak.
>
>
> I think the goal would be to prevent it getting in in the first place
> by checking a signature on that revision.

Sure, but signatures are hashes too :)

>
>
>>I kind of liked the fact that for a given revision, all of the ancestry
>>up to that point was contained in its hash. Because it had a hash of its
>>parents, who have a hash of their parents, all the way back to the Null
>>revision.
>
>
> It is an elegant property.

I think if the design were settled, then it would be a good idea to use
it. But since we are still in flux, it is probably too strong of a bond.
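
For reference, the chained property looks something like the following
sketch. chained_sha1 and the dict-based history are hypothetical, and
the real revision text would be more than a bare content string.

```python
import hashlib


def chained_sha1(revision_id, revisions):
    """Hash a revision together with the chained hashes of its parents.

    `revisions` maps revision id -> (content bytes, list of parent ids).
    A revision with no parents hashes only its own content.
    """
    content, parent_ids = revisions[revision_id]
    hasher = hashlib.sha1(content)
    for parent_id in parent_ids:
        # Each parent's hash already covers its own ancestry.
        hasher.update(chained_sha1(parent_id, revisions).encode("ascii"))
    return hasher.hexdigest()
```

Change anything in an ancestor and the hash of every descendant changes
too, which is exactly why an upgrade invalidates everything after it.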

So there are a few changes I would like to see. First,
"get_revision_sha1" should not be a member of Branch; it should be a
member of Revision. That way a Revision knows what its canonical form
is and can return the correct sha. So you could have:

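class Revision(object):

	def _canonical_form(self, hash='sha1'):
		# Do some processing here to figure out what the pure
		# form is. It is probably not the same format as what is
		# written to the xml file. We probably need to know which
		# hash method is in use, so that we can get the right
		# inventory hash, etc. Should return bytes.
		raise NotImplementedError('canonical form is not settled yet')

	def get_sha1(self):
		import hashlib
		digest = hashlib.sha1(self._canonical_form(hash='sha1'))
		return digest.digest()

	def get_sha256(self):
		# something similar, with the canonical form built for sha256
		import hashlib
		digest = hashlib.sha256(self._canonical_form(hash='sha256'))
		return digest.digest()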

The same thing should be done for inventory objects, since they will
need a canonical form as well.

John
=:->

>
> --
> Martin
>
>
