Storage internals: UUID

Tue Jun 5 16:49:46 UTC 2012

On 04/06/12 23:19, Daniel Carrera wrote:
> Hi John,
> 
> On Monday, June 04, 2012 at 10:17 PM, John Meinel <john at arbash-meinel.com> wrote:
> 
>> We don't use the sha hash for a variety of reasons. We do track 
>> the sha1 hash of revisions for integrity/security checking.  Some of the 
>> reasons to use a separate identifier:
>>
>> 1) you can pick an identifier before you finish with the revision. This
>> lets you write things like indexes while you are writing out the data.
> 
> This is probably a stupid question, but why is this important? Does
> it help with speed or something?

You can debate how much benefit it gives, but there's clearly some
amount of benefit in not needing to defer finalization of various data
structures because you won't know the ID of something until you have
finished writing it.
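To make the first point concrete, here is a minimal Python sketch (not Bazaar's actual code) contrasting the two orders of operations. With an identifier chosen up front, each chunk of revision data can be indexed as it is written. With a content hash as the identifier, the key only exists once every byte has been seen, so the index entries have to be buffered until the end. The function names and the "rev-" id format are hypothetical.

```python
import hashlib
import uuid


def write_with_upfront_id(chunks, data_out, index_out):
    """Assign the revision id first, then index each chunk as it is written."""
    rev_id = "rev-" + uuid.uuid4().hex  # hypothetical id format, known immediately
    offset = 0
    for chunk in chunks:
        data_out.append(chunk)
        # The index entry can be emitted right away because rev_id already exists.
        index_out.append((rev_id, offset, len(chunk)))
        offset += len(chunk)
    return rev_id


def write_with_content_hash(chunks, data_out, index_out):
    """The id is the SHA-1 of the content, so index entries must wait."""
    sha = hashlib.sha1()
    pending = []  # (offset, length) pairs buffered until the id is known
    offset = 0
    for chunk in chunks:
        data_out.append(chunk)
        sha.update(chunk)
        pending.append((offset, len(chunk)))
        offset += len(chunk)
    rev_id = sha.hexdigest()  # only available after the last chunk
    index_out.extend((rev_id, off, length) for off, length in pending)
    return rev_id
```

Both produce the same index layout; the difference is only whether the index can be written incrementally or must be held back until the revision is finished.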

> Clearly it must be important because
> apparently hg and git have thought about it too...

Is it clear? hg and git happen to have made the same choice. It's not
self-evident how much research went into making that choice.

>> Hg 'cheats' this by using a reference of "the revision at position $int".
> 
> Sorry to ask an hg question in a bzr forum, but how does this work?
> I don't get what you are saying here.

I'm no expert on Hg, but I'd interpret John's words to mean that Hg has
to define a secondary means of identifying a revision based on the byte
offset at which it begins on disk, to work around its primary id form
being unavailable until later.
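A rough model of the positional scheme, in Python: in an append-only log, a revision's local number is simply its position, so it is known as soon as the entry is appended (or even before), while the hash-based global id needs the full content. This is a simplified illustration, not Mercurial's revlog format; the class `AppendOnlyLog` is hypothetical. Such local numbers only mean something within one repository, since two clones can append the same revisions in different orders.

```python
import hashlib


class AppendOnlyLog:
    """Toy append-only revision log with positional and content-hash ids."""

    def __init__(self):
        self.entries = []  # list of (node_hash, data)

    def reserve(self):
        # The next local revision number is known before any data is written.
        return len(self.entries)

    def append(self, data):
        local_rev = len(self.entries)
        node = hashlib.sha1(data).hexdigest()  # global id, needs the full data
        self.entries.append((node, data))
        return local_rev, node

    def lookup(self, local_rev):
        return self.entries[local_rev]
```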

>> 2) Reflection of data in 3rd party storage: via bzr-svn/git/hg we are able
>> to treat other VCSes as bzr-compatible branches (e.g. you can run bzr
>> log "svn://...."). It is similar to using a map file, but the mapping is
>> stored as the identifier, rather than having to transmit, store and share
>> another file.
> 
> Ok, I think I see how this works: You can have an index file that maps (say) SVN IDs to bzr IDs.

No, the point is that there doesn't need to be such an index file at
all; instead we can define a mapping that says the Subversion revision
consisting of changes in a repository with UUID A on branch B in revnum
C has a Bazaar revision ID of "svn-v4:A:B:C". And then we can use those
IDs even if we have never downloaded the full content of the revision in
question.
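The "no index file" point can be sketched as a pair of pure functions in Python: one builds a revision ID from the Subversion repository UUID, branch path and revnum, and the other recovers those three values from the ID alone. The "svn-v4:A:B:C" layout comes from the text above; the percent-encoding of the branch path and the function names are assumptions for illustration, not bzr-svn's actual code.

```python
from urllib.parse import quote, unquote


def svn_to_revid(repo_uuid, branch_path, revnum):
    """Derive a revision id from Subversion coordinates; no lookup table needed."""
    # Escape the path so a ':' inside it cannot be mistaken for a separator
    # (an assumption made for this sketch).
    return "svn-v4:%s:%s:%d" % (repo_uuid, quote(branch_path, safe="/"), revnum)


def revid_to_svn(revid):
    """Invert svn_to_revid, using nothing but the id string itself."""
    prefix, repo_uuid, rest = revid.split(":", 2)
    if prefix != "svn-v4":
        raise ValueError("not an svn-v4 revision id: %r" % revid)
    branch_path, revnum = rest.rsplit(":", 1)
    return repo_uuid, unquote(branch_path), int(revnum)
```

Because the mapping is a pure function, any client can compute the ID of, say, revnum 42 on trunk without fetching that revision's contents or consulting a shared map file.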

> For hg to do something like that, it would have to pull the entire
> SVN history and do SHA1 sums the whole way through. Am I right?

Yes.
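Why a hash-based scheme forces that full pass: in a content-addressed history, each revision's ID covers its parent's ID, so the ID of revision N cannot be computed without first hashing every ancestor. A minimal Python sketch of such a chain; hashing "parent ID plus content" is a simplification, not Mercurial's or Git's exact format.

```python
import hashlib


def chained_ids(contents):
    """Compute hash ids for a linear history, oldest revision first."""
    ids = []
    parent = b"\0" * 20  # null parent for the first revision
    for text in contents:
        node = hashlib.sha1(parent + text).digest()
        ids.append(node.hex())
        parent = node  # the next id depends on this one
    return ids
```

Changing the first revision changes every later ID, so a converter cannot know the ID of a late Subversion revision without processing all the earlier ones first.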

>> 3) Along those lines, it lets you talk about revisions that you've never
>> seen. So if it gets converted in the future, it gets auto-grafted into the
>> right location in history.
> 
> This is a little over my head. I'm not sure what "auto-grafted" means. Can you explain again what you mean?

Suppose I convert just the trunk of a project from Subversion, but
whatever tool I use for the conversion knows enough to record that a
certain trunk revision incorporates a merge from a branch revision.

Later, I change my mind, go back and convert the branch. The additional
history naturally associates itself with the previous trunk conversion.
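A toy Python model of that auto-grafting. The trunk conversion records the merge parent using the deterministic ID the branch revision would have, even though that revision is not stored yet (a "ghost"). When the branch is converted later it produces exactly that ID, so the graph connects without any rewriting. The `Repo` class and the IDs (with "UUID" standing in for a real repository UUID) are hypothetical.

```python
class Repo:
    """Toy revision graph that tolerates parents it has not seen yet."""

    def __init__(self):
        self.parents = {}  # revision id -> tuple of parent ids

    def add(self, rev_id, parent_ids):
        self.parents[rev_id] = tuple(parent_ids)

    def ghosts(self):
        # Referenced as a parent but not (yet) stored in the repository.
        referenced = {p for ps in self.parents.values() for p in ps}
        return referenced - set(self.parents)


repo = Repo()
# Trunk-only conversion: trunk r11 merged branch r10, which is not converted.
repo.add("svn-v4:UUID:trunk:9", [])
repo.add("svn-v4:UUID:trunk:11",
         ["svn-v4:UUID:trunk:9", "svn-v4:UUID:branches/foo:10"])
print(repo.ghosts())  # {'svn-v4:UUID:branches/foo:10'}

# Later branch conversion generates the same deterministic id, filling the ghost.
repo.add("svn-v4:UUID:branches/foo:10", ["svn-v4:UUID:trunk:9"])
print(repo.ghosts())  # set()
```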

Max.
