Proposal: use predictable file-ids

Fri Aug 12 00:07:59 BST 2005

Aaron Bentley wrote:
> Hi all,
>
> I'd like to suggest that we make file-ids predictable.  There are two
> advantages I can see:
> 1. less human-unreadable stuff in changesets
> 2. it is easier to make imports from other SCMs produce identical
> branches every time.

I suppose. I think for (1) an easier and more beneficial change would be
to make text-ids predictable. Right now, the only time you need to
display file-ids is for an add, since otherwise you have the base
revision, and all of the ids are there.

>
> My suggestion:
> REVISION-ID/pathname/at/commit/time
>
> This is a UUID because revision ids are unique and pathnames are unique
> for a given revision.  If we can assign predictable IDs to revisions
> from other SCMs (e.g. based on foreign rev-ids and/or hashes of tree
> state), then this makes it fairly easy to ensure that repeated
> operations produce the same result.
>
> There are two problems with this:
>
> 1. path contains forbidden characters
> 2. revision-ID is not known until commit time
>
> The first can be worked around with suitable escaping.
>
> The second is not easy to work around.  If we use the parent ID instead
> of the commit ID, we no longer have a UUID, because more than one branch
> can have that parent and create that file.  Yet we are required to have
> an ID for all versioned files.

The problem with this statement is that earlier you said:
"it is easier to make imports from other SCMs produce identical branches
every time"

An import is in a lot of ways indistinguishable from creating a new file
and checking it in.
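
To make the uniqueness problem concrete, here is a rough Python sketch of
the proposed scheme. The helper name, the percent-encoding, and the
revision ids are all made up for illustration; the proposal only says
"suitable escaping".

```python
from urllib.parse import quote


def make_file_id(revision_id, path):
    """Hypothetical helper: build a file-id as REVISION-ID/escaped-path."""
    # Percent-encode everything, including "/", so the path becomes a
    # single opaque token. This escaping is an assumption.
    return revision_id + "/" + quote(path, safe="")


# Two branches share a parent and each adds the same path.
parent = "rev-parent"
id_branch_a = make_file_id(parent, "docs/README")
id_branch_b = make_file_id(parent, "docs/README")
collide = id_branch_a == id_branch_b  # parent-based ids are identical

# Using each branch's own commit id keeps the ids distinct, but that id
# is not known until commit time.
id_commit_a = make_file_id("rev-a", "docs/README")
id_commit_b = make_file_id("rev-b", "docs/README")
```

With the parent id, both branches end up with the same file-id for what
may be unrelated files. With the commit id they differ, which is exactly
what you don't want for a repeated import.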

>
> Similarly, we can't select the next revision-id in advance, because we
> could branch after selecting that revision-id, and that would be a very
> bad invariant violation.
>
> Perhaps we could use special temporary file-IDs for new files, and
> change them at commit time.

If you really wanted, you could always go the darcs/svn route, and just
get rid of ids entirely. As long as you have enough history to compare
between two trees you can match up files without an id.

I personally like ids. I even go so far as to like the 'arch-tag' style,
with the id embedded in the file. I understand the trade-offs, and I
still like it. But that is my opinion; many people feel differently.

Also, a different perspective stemming from bzr being snapshot based,
not changeset based: ids may not really matter.

1) If I add a file in the same place that you add a file, they are going
to conflict anyway.
2) There aren't really going to be changesets lying around that are
referencing a particular file id. Our changeset format references a base
revision, and uses the ids from that. It can just as easily use the
*paths* from that.

Just think about it. How are file-ids actually used in the source? I
know they are stored in the inventory, and designed so that when you
rename a file, it keeps its id, and then notes a rename when you commit.

Why not just note the rename at the time of "bzr mv"? It made sense in
arch, because it was part of the changeset format. And in arch a
changeset is much more self-sufficient. I can grab a changeset and apply
it, without any further context. That isn't really a design criterion
for bzr.

Anyway, I can at least come up with a case for completely getting rid of
file ids. Is it something that we should look closer at?

Note: Revision-ids are still required. It's just that files are only
defined by their revision-id and the path therein. It means that the
Inventory XML would need to be changed to include the act of renaming.
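
As a rough sketch of how paths could be matched up without ids, assume
each revision stores a mapping of old path to new path, recorded at
"bzr mv" time. The function names and the mapping format below are
hypothetical, not the current Inventory format, and directory renames
are ignored for brevity.

```python
def path_in_child(path, renames):
    """Map a path in the parent revision to its path in the child.

    `renames` is a hypothetical {old_path: new_path} mapping recorded
    at "bzr mv" time; paths not listed are assumed to stay put.
    """
    return renames.get(path, path)


def trace_path(path, history):
    """Follow a path forward through a list of per-revision rename maps."""
    for renames in history:
        path = path_in_child(path, renames)
    return path


# Three successive revisions: one rename, no renames, another rename.
history = [
    {"foo.py": "lib/foo.py"},
    {},
    {"lib/foo.py": "lib/bar.py"},
]
final = trace_path("foo.py", history)
```

Matching a file between two trees then means walking the rename records
between the two revisions instead of looking up a shared id.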

For text-ids, we can just use the escaped path + the revision of last
change, which means that the only unique id we need is the revision-id.
Everything else stems from that.
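
Something along these lines, as a sketch only. The "@" separator, the
percent-encoding, and the helper names are my assumptions. Since "@" is
escaped inside the path part, the first "@" always marks the boundary,
even if the revision-id itself contains one.

```python
from urllib.parse import quote, unquote


def make_text_id(path, last_changed_rev):
    """Hypothetical: text-id = escaped path + "@" + revision of last change."""
    return quote(path, safe="") + "@" + last_changed_rev


def split_text_id(text_id):
    """Recover (path, revision-id) from a text-id built by make_text_id."""
    escaped, _, rev = text_id.partition("@")
    return unquote(escaped), rev


# A path with awkward characters and a made-up revision id.
text_id = make_text_id("src/a b@c.py", "rev-42")
path, rev = split_text_id(text_id)
```

The same inputs always give the same text-id, so nothing random ends up
in the changeset.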

John
=:->

>
> Aaron