Storage internals: UUID

Daniel Carrera dcarrera at hush.com
Mon Jun 4 19:15:29 UTC 2012


Hello,

I'm interested in learning how Bazaar stores data internally (links would be welcome; the "developer" pages seem to cover the API more than how things work). I read on a forum somewhere that bzr uses UUIDs instead of SHA1 hashes, as Mercurial and Git do. If that is correct, I'd like to ask a few questions:

1. Why was the decision made for UUIDs instead of SHA1? What were the pros and cons discussed?

2. Which version of UUID does bzr use? RFC 4122 defines five versions. Versions 3 and 5 are name-based and use hashes (MD5 and SHA1, respectively). The hash is computed over a namespace UUID followed by a "name", and as I understand it, the RFC places no rule on what that name must be. In other words, my impression is that it would be legal to use the revision contents and metadata as the name, so that the UUID is derived from their SHA1. In fact, I wonder if this might be what bzr already does.
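A minimal Python sketch of the idea in question 2, assuming the serialized revision text is used as the version 5 "name". The namespace REVISION_NAMESPACE and the helper revision_uuid5 are hypothetical illustrations, not anything bzr is known to define. The code spells out the RFC 4122 construction by hand and checks it against the standard library's uuid.uuid5:

```python
import hashlib
import uuid

# Hypothetical namespace for revision UUIDs; bzr defines no such namespace.
REVISION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/bzr-revisions")


def revision_uuid5(revision_text: bytes) -> uuid.UUID:
    """Derive a version 5 UUID whose "name" is the serialized revision.

    SHA1 over the namespace UUID's bytes followed by the name, truncated
    to 128 bits, with the version and variant bits then overwritten.
    """
    digest = hashlib.sha1(REVISION_NAMESPACE.bytes + revision_text).digest()
    # UUID(version=5) sets the version nibble and the RFC 4122 variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


if __name__ == "__main__":
    # Hypothetical serialization of revision contents plus metadata.
    text = "committer: someone@example.com\nmessage: fix typo\n"
    u = revision_uuid5(text.encode("utf-8"))
    # The stdlib's uuid5 gives the same result when the name is a str.
    assert u == uuid.uuid5(REVISION_NAMESPACE, text)
    print(u, u.version)
```

Note that only 122 of the 160 SHA1 bits survive (128 after truncation, minus 6 fixed version and variant bits), so such an identifier is content-derived but has less collision resistance than the full hash.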

3. Is anyone following the SHA3 selection? NIST is supposed to pick the SHA3 algorithm this year, which means the next revision of the bzr format could use the freshly minted SHA3 algorithm for its UUIDs. You don't have to wait for RFC 4122 to be updated: in their wisdom, the creators of UUID included version 4, which is "random" (built from random or pseudo-random bits). Since a cryptographic hash like SHA3 can serve as the basis of a pseudo-random number generator, you could use SHA3 output to make the UUID.
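A rough Python sketch of question 3, assuming SHA3-256 as provided by hashlib.sha3_256 (the final FIPS 202 standard, which postdates this message). The helper sha3_uuid4 is hypothetical. Unlike uuid.uuid4(), which draws its bits from os.urandom, this output is deterministic for a given input; whether that fits the intent of version 4 is exactly the open question:

```python
import hashlib
import uuid


def sha3_uuid4(data: bytes) -> uuid.UUID:
    """Build a UUID labelled version 4 from SHA3-256 output.

    Hypothetical scheme: the first 128 bits of the digest stand in for the
    random bits; the version and variant fields are then overwritten as
    RFC 4122 requires, leaving 122 bits taken from the hash.
    """
    digest = hashlib.sha3_256(data).digest()
    return uuid.UUID(bytes=digest[:16], version=4)


if __name__ == "__main__":
    u = sha3_uuid4(b"revision contents and metadata")
    print(u, u.version)
```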

Thoughts?

Cheers,
Daniel.
-- 
Linux: Because rebooting is for installing hardware.



