[RFC] New style of revision id [Was Re: VCS comparison table]

John Arbash Meinel john at arbash-meinel.com
Tue Oct 24 17:15:48 BST 2006


Martin Pool wrote:
> On Tue, 2006-10-24 at 00:17 +0200, Goffredo Baroncelli wrote:
>> But I agree with Linus that the testament (the bzr checksum) isn't as
>> integrated into the usual bazaar workflow as the git "revision-ID" is.
>>
>> From a bazaar developer's point of view, I wonder if it is possible to
>> switch from a pseudo-random revision id to a checksum-based revision id:
>> the checksum can be computed from the sha1 of the files and the
>> timestamp/committer/parent-revision(s)/properties.
>>
>> This new style of revid can be in the form
>> <user>@<host>-<date>-<cksum>
>>
>> Yes, the first three fields are redundant, but this way the change isn't
>> too dramatic! :-)
> 
> I think storing or naming those objects by their hash is a pretty
> interesting idea, and I've warmed to it after this discussion.  As you
> say, it should pretty much drop into the existing framework.
> 
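A minimal sketch of the checksum-based revid proposed above, assuming Python's `hashlib` and a made-up encoding. The function name `checksum_revid`, its parameters, and the line-oriented layout fed to SHA-1 are all hypothetical, not an existing bzr API. The property that matters is that identical inputs, fed in a fixed order, always yield the same id.

```python
import hashlib


def checksum_revid(user, host, date, file_sha1s, timestamp, committer,
                   parents, properties):
    """Hypothetical revid of the form <user>@<host>-<date>-<cksum>."""
    h = hashlib.sha1()
    # Sort paths and property keys so the digest does not depend on dict order.
    for path in sorted(file_sha1s):
        h.update(f"file {path} {file_sha1s[path]}\n".encode("utf-8"))
    h.update(f"timestamp {timestamp}\n".encode("utf-8"))
    h.update(f"committer {committer}\n".encode("utf-8"))
    # Parent order is kept as given: it distinguishes the left-hand parent.
    for parent in parents:
        h.update(f"parent {parent}\n".encode("utf-8"))
    for key in sorted(properties):
        h.update(f"property {key} {properties[key]}\n".encode("utf-8"))
    # The first three fields are redundant with the hash, as noted above.
    return f"{user}@{host}-{date}-{h.hexdigest()}"


shas = {"README": hashlib.sha1(b"hello\n").hexdigest()}
print(checksum_revid("goffredo", "example.org", "20061024", shas,
                     1161699348.0, "Goffredo <g@example.org>",
                     ["some-parent-revid"], {"branch-nick": "trunk"}))
```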

The biggest problem I see is that you can't know your final revision id
until you have done all the work. This is exactly where hg has lots of
issues with its indexes.

You either pre-compute all of the work, figure out your final hash, and
then go back and start writing; or you write as you go, but then have to
go back at the end and rewrite the indexes to include the correct
revision id.

And since the filesystem *might* be changing as you go along, you have
to store a pristine copy of any files you would be adding to the repository.
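The pre-compute option can be sketched as a hypothetical three-phase commit, assuming an in-memory repository (a dict of texts plus a flat index list) rather than bzr's or hg's real storage. Working-tree files are modeled as mutable `bytearray`s so that taking a pristine copy is meaningful; the revision id only exists once every text has been hashed, and only then can the index entries be written. The name `commit_precomputed` and the repository layout are illustrative assumptions.

```python
import hashlib


def commit_precomputed(working_tree, parents, repo):
    """Hypothetical commit that snapshots, hashes, and only then writes.

    working_tree: dict path -> bytearray (stands in for the live filesystem)
    repo: dict with "texts" (sha1 -> bytes) and "index" (list of tuples)
    """
    # Phase 1: pristine copies, so later edits to the tree cannot leak in.
    snapshot = {path: bytes(data) for path, data in working_tree.items()}
    # Phase 2: the revision id is only known after every text is hashed.
    text_shas = {p: hashlib.sha1(d).hexdigest() for p, d in snapshot.items()}
    h = hashlib.sha1()
    for p in sorted(text_shas):
        h.update(f"{p} {text_shas[p]}\n".encode("utf-8"))
    for parent in parents:
        h.update(f"parent {parent}\n".encode("utf-8"))
    revid = h.hexdigest()
    # Phase 3: index entries can finally carry the real revision id.
    for p, sha in text_shas.items():
        repo["texts"][sha] = snapshot[p]
        repo["index"].append((p, sha, revid))
    return revid
```

The write-as-you-go alternative would append index entries with a placeholder id during phase 2 and rewrite them once the hash is known; this sketch avoids the rewrite by paying for the snapshot instead.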

I'm not sure whether git stores any back pointers (pointers from a file
text up to the manifest it is associated with).

hg stores them simply as an index, and because of how its data is laid
out, it always just stores the *next* index. That works most of the
time, but causes problems if you ever have a repeated hash, because then
you have one thing that should be pointing at two different revisions.
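A toy model of the repeated-hash problem, assuming a file log keyed by content hash in which each entry holds exactly one back pointer. This is a deliberate simplification for illustration, not hg's actual revlog format; the name `add_file_text` is hypothetical.

```python
def add_file_text(filelog, text_hash, changeset_index):
    """Record a file text with a single back pointer (hypothetical model).

    filelog maps text_hash -> index of the changeset that introduced it.
    Because entries are keyed by hash, a text that shows up again is stored
    only once, so only the first back pointer survives.
    """
    if text_hash in filelog:
        # The second changeset has nowhere to record its own back pointer.
        return filelog[text_hash]
    filelog[text_hash] = changeset_index
    return changeset_index


log = {}
add_file_text(log, "abc", 3)
add_file_text(log, "abc", 7)  # same text again: still points at changeset 3
```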

My best guess is that git always takes a top-down approach. So to get
the log of changes for a file, you have to unpack each manifest and see
whether that file is listed, rather than reading the index of the file's
changes and going back to the commit information from there.
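Assuming git does work top-down as guessed above, a per-file log looks roughly like the sketch below: walk the commits and compare each manifest's entry for the path against the parent's. The flat dict manifests are a simplification (git actually uses nested tree objects), the walk follows only the first parent, and `file_log` is a hypothetical name.

```python
def file_log(commits, head, path):
    """Hypothetical top-down log for one file.

    commits maps commit_id -> {"parents": [...], "manifest": {path: hash}}.
    Returns ids of commits (newest first, first-parent history only) whose
    entry for `path` differs from their first parent's entry.
    """
    result = []
    current = head
    while current is not None:
        commit = commits[current]
        parents = commit["parents"]
        # Every commit's manifest must be unpacked, even if the file is absent.
        mine = commit["manifest"].get(path)
        theirs = commits[parents[0]]["manifest"].get(path) if parents else None
        if mine != theirs:
            result.append(current)
        current = parents[0] if parents else None
    return result
```

The cost is proportional to the length of history, not to the number of times the file changed, which is the trade-off against reading a per-file index.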

Having arbitrary revision ids means you have a handle before you start
committing anything, and it applies cleanly all the way through.

So unfortunately, it isn't a simple drop-in replacement. We could
possibly have a look-aside naming scheme, so that after a revision has
been committed we compute its hash, and the revision can then be
accessed by either name.
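The look-aside idea might look like the sketch below, assuming an in-memory store: the arbitrary revision id is chosen up front and used for all storage, and the content hash is computed only after the commit finishes and recorded as an alias. The class `RevisionNames`, its methods, and the example revid are all hypothetical.

```python
import hashlib


class RevisionNames:
    """Hypothetical look-aside table: arbitrary revid <-> content hash."""

    def __init__(self):
        self._revisions = {}      # revid -> serialized revision bytes
        self._hash_to_revid = {}  # content hash -> revid

    def commit(self, revid, serialized):
        # The arbitrary revid is known before any writing starts.
        self._revisions[revid] = serialized
        # The hash is only computed once the revision is complete.
        content_hash = hashlib.sha1(serialized).hexdigest()
        self._hash_to_revid[content_hash] = revid
        return content_hash

    def lookup(self, name):
        # Accept either the original revid or the content hash.
        if name in self._revisions:
            return self._revisions[name]
        return self._revisions[self._hash_to_revid[name]]


names = RevisionNames()
alias = names.commit("john@example.com-20061024-abc123", b"revision text")
```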

Also, git hashes include the hashes for their parents, which means that
you need an unbroken chain back to the NULL revision. In other words,
you can't have ghosts, or at least no ghosts whose hash you don't know
(though you wouldn't know what handle to give them if you didn't know
their hash).
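A sketch of why chained hashes rule out ghosts, assuming a git-style id computed from a revision's own content plus its parents' ids. The store layout, `chained_hash`, and `GhostError` are hypothetical. Changing any ancestor changes every descendant's id, and a parent that is missing from the store cannot be hashed at all.

```python
import hashlib


class GhostError(Exception):
    """Raised when a parent revision is referenced but not present."""


def chained_hash(revid, store, cache=None):
    """Hypothetical id: SHA-1 of the content followed by the parents' ids.

    store maps revid -> (content_bytes, [parent revids]).
    """
    if cache is None:
        cache = {}
    if revid in cache:
        return cache[revid]
    if revid not in store:
        # A ghost: we know its name, but not the content needed for a hash.
        raise GhostError(f"cannot hash ghost revision {revid!r}")
    content, parents = store[revid]
    h = hashlib.sha1(content)
    for parent in parents:
        h.update(chained_hash(parent, store, cache).encode("ascii"))
    cache[revid] = h.hexdigest()
    return cache[revid]
```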

I realize ghost support isn't super critical, and it may be worth
giving up in exchange for hash security. But
this came up a long time ago when we were being strict about the sha1
value in the Revision texts, and we were discussing whether they should
include the parent references or not.

We got rid of them because you cannot change the serialization without
changing the final values, which is why we went with Testaments.

So in git, if you ever try a different algorithm for laying out your
meta information (manifest, inventory, what have you), suddenly all of
your "revision ids" change, and my new-format git branch can't talk
(well) to your git branch. There are some compatibility possibilities,
but with git it has to be 100% correct from the start, because an
upgrade could potentially break lots of stuff.

Now maybe git did get it all right; it is possible. But I wonder
whether there are people wishing for feature X that just isn't possible
without breaking stuff. (And further, since it isn't something Linus
needs, it won't go into *his* workflow...)

Anyway, content-addressable namespaces do have some neat stuff, but I'm
not convinced that they are the perfect solution.

John
=:->

