Storage internals: UUID

Mon Jun 4 22:19:32 UTC 2012

Hi John,

On Monday, June 04, 2012 at 10:17 PM, John Meinel <john at arbash-meinel.com> wrote:

> We don't use the sha hash for a variety of reasons. We do track 
> the sha1 hash of revisions for integrity/security checking.  Some of the 
> reasons to use a separate identifier:
> 
> 1) you can pick an identifier before you finish with the revision. This
> lets you write things like indexes while you are writing out the data.

This is probably a stupid question, but why is this important? Does it help with speed or something? Clearly it must be important because apparently hg and git have thought about it too...
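
To check my own understanding, here is a minimal Python sketch of what I think the difference is. The names (preassigned_id, write_revision) and the identifier format are made up for illustration and are not bzr's actual code: the revision id is chosen up front, so index entries can refer to it while the data is still being written, whereas the sha1 only exists once the last byte has been seen.

```python
import hashlib
import uuid


def preassigned_id(committer: str) -> str:
    # Hypothetical scheme: an identifier chosen up front, independent of content.
    return f"{committer}-{uuid.uuid4().hex}"


def write_revision(chunks, committer="daniel@example.com"):
    rev_id = preassigned_id(committer)  # known before any data is written
    index = []                          # index entries can reference rev_id at once
    hasher = hashlib.sha1()
    offset = 0
    for chunk in chunks:
        # Record where this chunk lives, keyed by the already-known id.
        index.append((rev_id, offset, len(chunk)))
        hasher.update(chunk)
        offset += len(chunk)
    # The sha1 is only available at the end; it is kept for integrity checking.
    return rev_id, index, hasher.hexdigest()
```

With a purely content-derived identifier, I assume the index could only be keyed after the loop finishes, which would mean buffering or a second pass.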

> Hg 'cheats' this by using a reference of "the revision at position $int".

Sorry to ask an hg question in a bzr forum, but how does this work? I don't get what you are saying here.

> Git handles it by not having the concept of an individual file history. You have to infer the
> history by walking through the inventory info.

Interesting. I always thought that not tracking files was just a weird idiosyncrasy.

> 2) Reflection of data in 3rd party storage. via bzr-svn/git/hg we are able
> to treat other vcs as another bzr compatible branch. (Eg you can use bzr
> log "svn://...."). It is similar to using a map file, but the mapping is
> stored as the identifier, rather than having to transmit, store, share
> another file.

Ok, I think I see how this works: You can have an index file that maps (say) SVN IDs to bzr IDs. For hg to do something like that, it would have to pull the entire SVN history and do SHA1 sums the whole way through. Am I right?
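
Re-reading the quoted paragraph, "the mapping is stored as the identifier" might mean something like the Python sketch below, where the bzr-side id is computed from the SVN coordinates and no separate index file is needed. The "svn:uuid:path:revnum" format and both function names are my own invention for illustration; I do not know the actual format bzr-svn uses.

```python
def foreign_revision_id(repo_uuid: str, branch_path: str, revnum: int) -> str:
    # Hypothetical format, for illustration only; not the real bzr-svn scheme.
    return f"svn:{repo_uuid}:{branch_path}:{revnum}"


def parse_foreign_revision_id(rev_id: str):
    # Recover the SVN coordinates from the identifier itself.
    prefix, repo_uuid, rest = rev_id.split(":", 2)
    if prefix != "svn":
        raise ValueError(f"not a foreign svn id: {rev_id!r}")
    branch_path, revnum = rest.rsplit(":", 1)
    return repo_uuid, branch_path, int(revnum)
```

If that is right, two people converting the same SVN revision independently would arrive at the same id without ever exchanging a map file.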

> 3) Along those lines, it lets you talk about revisions that you've never
> seen. So if it gets converted in the future, it gets auto-grafted into the
> right location in history.

This is a little over my head. I'm not sure what "auto-grafted" means. Can you explain again what you mean?

> 4) it decouples your identifiers from their current representation. If, for
> example, git decided it really wanted their tree entry to be in XML, they
> would have to regenerate the sha hashes for the whole history. And 
> without a map file, you couldn't incrementally pull in more data from 
> another person who branched from somewhere in your history. (Format 
> upgrades can be done independently by different users and over time, not
> in lock-step).
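
If I follow, the point can be sketched in Python like this. Both serializations and the assigned id are made-up examples, not real git or bzr formats: a sha1 over the stored bytes changes when the same logical tree is written in a different format, while an assigned id stays the same.

```python
import hashlib
import json


def sha1_of(serialized: bytes) -> str:
    return hashlib.sha1(serialized).hexdigest()


tree = {"README": "hello", "setup.py": "print('hi')"}

# Two hypothetical on-disk formats for the same logical tree.
as_json = json.dumps(tree, sort_keys=True).encode("utf-8")
as_lines = "\n".join(
    name + "\0" + text for name, text in sorted(tree.items())
).encode("utf-8")

# A content-derived id follows the serialization...
hash_ids_differ = sha1_of(as_json) != sha1_of(as_lines)

# ...while an assigned id (hypothetical value) is identical in either format.
assigned_id = "daniel@example.com-20120604221932-abcdef"
store_v1 = {assigned_id: as_json}
store_v2 = {assigned_id: as_lines}
```

So someone still on the old format and someone on the new one would, as I understand it, keep referring to the same revision by the same name.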

Thanks.

> There are downsides to it, but it isn't just that we do it differently
> without cause. It is some explicit choices along the way.
> John

I appreciate you taking the time to explain. This is all very interesting. I would like to understand what bzr does to ensure the integrity of the repository. I am coming from Mercurial. I am interested in security, reliability, etc. Though I like Mercurial, I am starting a new project and I'm eager to use this chance to learn a new VCS.

Cheers,
Daniel.