Storage internals: UUID

Max Bowsher _ at maxb.eu
Tue Jun 5 16:33:57 UTC 2012


On 04/06/12 23:18, Daniel Carrera wrote:
> On Monday, June 04, 2012 at 10:01 PM, Max Bowsher <_ at maxb.eu> wrote:
>> Sorry, but it is incorrect. Bazaar uses IDs, and those IDs are
>> constructed such that there's good justification to believe 
>> them to be universally unique, but they're not UUIDs in the sense of the
>> Internet Draft.
> 
> Ok. Thanks. Can you give me details?  Links to docs are welcome.

I don't think the actual method by which the IDs are generated is
documented anywhere. If that sounds bad to you, please take note that as
far as the format is concerned, they're simply unique opaque strings.

But, if you do want to dig further, the ID generation all happens in
bzrlib/generate_ids.py in the Bazaar source tree.
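
To give a feel for what "unique opaque strings" can look like in practice,
here is a minimal sketch of one way such an ID could be assembled. It is
not a copy of what generate_ids.py does: the function name
make_revision_id, the email-timestamp-suffix layout, and the 16-character
random suffix are all assumptions made for illustration.

```python
import secrets
import string
import time

# Characters used for the random suffix (an assumption for this sketch).
_ALPHABET = string.ascii_lowercase + string.digits


def make_revision_id(committer_email, timestamp=None):
    """Build an opaque, very probably unique ID (illustrative only)."""
    if timestamp is None:
        timestamp = time.time()
    # UTC timestamp to the second, e.g. 20120605163357.
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(timestamp))
    # The random suffix makes collisions between IDs created by the
    # same committer in the same second extremely unlikely.
    suffix = "".join(secrets.choice(_ALPHABET) for _ in range(16))
    return "%s-%s-%s" % (committer_email, stamp, suffix)


if __name__ == "__main__":
    print(make_revision_id("jane@example.com"))
```

Note that nothing in the ID depends on the revision's content, which is
why the format can treat it as an opaque string.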


>> I can't speak to the initial design decision, which was many years
>> before my time, but a couple of advantages that come to mind:
>>
>> * the ID is not inextricably tied to the binary format of the revision
> 
> Why is that an advantage?

This part was addressed in John Meinel's response in this thread; I
don't have anything to add beyond what he said.


>> There's no hashing in Bazaar's IDs, so this isn't particularly 
>> relevant.
> 
> How can bzr guarantee that a commit is not tampered if the ID doesn't
> contain a hash? This discussion would be easier if I had documentation
> that I could read, but I assume you have some sort of index file that
> maps IDs to hashes and locations where the revision data can be found.
> It seems trivial to change the revision data and change the hash in the
> index without altering the ID.

There is no index file mapping IDs to hashes.

For circumstances where tersely identifying the content of a revision is
desired, Bazaar defines the concept of a 'testament': a rendition of the
revision content as a block of text (including SHA-1s of each file in
the tree). This text block can be cryptographically signed to sign the
revision, or further reduced by taking the SHA-1 of the text.

Try running 'bzr testament' and 'bzr testament --long' in some Bazaar
branch, and look at bzrlib/testament.py for more detail.
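
The idea can be sketched in a few lines. The text layout below is
invented for illustration and is not Bazaar's actual testament format
(see testament.py for that); the names sketch_testament and short_form
are hypothetical.

```python
import hashlib


def sketch_testament(revision_id, committer, message, files):
    """Render a revision as a block of text.

    `files` maps a path to that file's content as bytes. The layout is
    made up for this sketch; it is not the real testament format.
    """
    lines = [
        "revision-id: " + revision_id,
        "committer: " + committer,
        "message: " + message,
        "files:",
    ]
    # Sort paths so the same tree always yields the same text.
    for path in sorted(files):
        digest = hashlib.sha1(files[path]).hexdigest()
        lines.append("  %s %s" % (digest, path))
    return "\n".join(lines) + "\n"


def short_form(testament_text):
    """Reduce the text block further by taking its SHA-1."""
    return hashlib.sha1(testament_text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    text = sketch_testament(
        "example-id-1", "jane@example.com", "first commit",
        {"README": b"hello\n", "setup.py": b"# empty\n"},
    )
    print(text)
    print(short_form(text))
```

Either the text block or its digest can then be signed, while the
revision ID itself stays whatever it was.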

Bazaar simply rejects the concept that the primary identifier for a
revision should be a hash, a choice which has made it considerably easier
to introduce changes to the data structures of a revision. If you look
in testament.py you will find there are actually three versions of the
testament, the latter two reflecting revisions of the format to include
more information.


> I think Monotone had a good idea in using revision numbers that use a
> cryptographic hash that depends on the revision contents, metadata and
> history. I think Git and Mercurial did well in copying that idea. I am
> under the impression that bzr is relatively willing to make changes to
> its storage format if there seems to be a benefit. Do you think that
> there is any chance that a future version of bzr will expand the ID to
> additionally contain a hash? For example:
> 
> <author>-<date>-<branch>-<secure-hash>
> 
> So the new ID could have whatever information it has today, and it'd
> just be made longer by adding a secure hash at the end. The ID would
> still provide useful information by inspection, but it'd still allow the
> same security guarantees of Monotone, Mercurial and Git.
> 
> Or am I barking up the wrong tree?

What you are doing is arguing a case that takes as axiomatic the claim
that it is necessary and valuable to incorporate a cryptographic
hash of a revision's contents in a revision's primary identifier.

Bazaar does not accept this as an axiom. You'll need to justify your
thoughts on why this should be the case, instead of treating identity
and cryptographic integrity as two separate pieces of data.

As for your suggestion above, it would be nigh on impossible. Bazaar
makes use of the fact that you can know a revision ID before the data
that makes up that revision has been finalized, whereas a hash of the
contents cannot be computed until they are final.
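
A small sketch of the difference, using made-up names (preallocated_id,
content_hash_id) and a toy revision layout that is not Bazaar's: an ID
chosen up front stays the same while the content is still changing, and a
digest for integrity can be computed separately once the content is done.

```python
import hashlib
import secrets


def preallocated_id():
    """An ID chosen before any content exists (hypothetical scheme)."""
    return "rev-" + secrets.token_hex(8)


def content_hash_id(revision_bytes):
    """An ID derived from the content; unknown until the content is final."""
    return hashlib.sha1(revision_bytes).hexdigest()


# The ID is known first, so the data under construction can refer to it.
rev_id = preallocated_id()
draft = ("revision-id: %s\nmessage: first draft\n" % rev_id).encode("utf-8")
final = ("revision-id: %s\nmessage: final text\n" % rev_id).encode("utf-8")

# Identity is stable across edits; the content hash is not.
identity_unchanged = rev_id.encode("utf-8") in draft and rev_id.encode("utf-8") in final
hash_changed = content_hash_id(draft) != content_hash_id(final)

# Integrity is kept as a separate piece of data alongside the identity.
record = {"id": rev_id, "sha1": content_hash_id(final)}

if __name__ == "__main__":
    print(identity_unchanged, hash_changed)
    print(record)
```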

Max.

