Proposal: use predictable file-ids

Thu Aug 11 16:09:44 BST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I'd like to suggest that we make file-ids predictable.  There are two
advantages I can see:
1. fewer opaque, human-unreadable strings in changesets
2. imports from other SCMs can more easily produce identical branches
every time they are run.

My suggestion:
REVISION-ID/pathname/at/commit/time

This is a UUID (in the sense of a universally unique ID) because
revision ids are unique and pathnames are unique within a given
revision.  If we can assign predictable IDs to revisions from other SCMs
(e.g. based on foreign rev-ids and/or hashes of tree state), then this
makes it fairly easy to ensure that repeated imports produce the same
result.
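
A minimal sketch of the idea, not a proposed implementation.  The
function names and the exact layout of the derived revision-id are
hypothetical; the only point is that both IDs are pure functions of
their inputs, so running the same import twice gives the same IDs.

```python
import hashlib


def foreign_revision_id(scm, foreign_rev, tree_state):
    """Derive a repeatable revision-id for an imported revision.

    Hypothetical scheme: SCM name, foreign rev-id, and a hash of the
    tree state (passed in as bytes).
    """
    digest = hashlib.sha1(tree_state).hexdigest()
    return f"{scm}-{foreign_rev}-{digest}"


def file_id(revision_id, path):
    """Build a file-id from the revision that introduced the file
    and the file's pathname at commit time."""
    return f"{revision_id}/{path}"


rev = foreign_revision_id("svn", "42", b"contents of the tree")
readme_id = file_id(rev, "doc/README")
```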

There are two problems with this:

1. the path may contain characters that are forbidden in file-ids
2. the revision-ID is not known until commit time

The first can be worked around with suitable escaping.
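
One possible escaping, as a sketch only.  It assumes that
percent-encoding is acceptable and that everything outside ASCII
letters, digits and "_.-~" counts as forbidden; the real forbidden set
is not spelled out here.  The helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def escaped_file_id(revision_id, path):
    """Join revision-id and path, percent-encoding the path.

    safe="" also encodes "/", so the only unescaped "/" after the
    revision-id is the separator itself.
    """
    return f"{revision_id}/{quote(path, safe='')}"


def split_file_id(fid):
    """Recover (revision_id, path) from an escaped file-id."""
    revision_id, _, escaped_path = fid.rpartition("/")
    return revision_id, unquote(escaped_path)


fid = escaped_file_id("rev-1", "src/my file.c")
```

Because the escaping is reversible, the original path can still be read
back out of the file-id when needed.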

The second is not easy to work around.  If we use the parent ID instead
of the commit ID, we no longer have a UUID, because more than one branch
can have that parent and create a file at that path.  Yet every
versioned file is required to have an ID, including files that have
been added but not yet committed.
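
To make the collision concrete, here is a small illustration (the
function name and revision-id are made up): two branches that share a
parent and each add a README end up with the same ID for two unrelated
files.

```python
def file_id_from_parent(parent_revision_id, path):
    """Hypothetical variant that uses the parent revision-id."""
    return f"{parent_revision_id}/{path}"


parent = "rev-41"

# Branch A and branch B both start from rev-41 and each add a README.
id_in_branch_a = file_id_from_parent(parent, "README")
id_in_branch_b = file_id_from_parent(parent, "README")

# The two files are unrelated, but their IDs are identical.
collision = id_in_branch_a == id_in_branch_b
```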

Similarly, we can't select the next revision-id in advance, because we
could branch after selecting that revision-id.  Both branches could then
commit different revisions under the same revision-id, and that would be
a very bad invariant violation.

Perhaps we could use special temporary file-IDs for new files, and
change them at commit time.
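
A rough sketch of how that might look, assuming an inventory that is
just a dict mapping path to file-id.  The "TEMP-" prefix and the
function names are invented for illustration.

```python
import uuid

TEMP_PREFIX = "TEMP-"  # hypothetical marker for not-yet-committed files


def add_file(inventory, path):
    """Version a new file under a random temporary file-id."""
    inventory[path] = TEMP_PREFIX + uuid.uuid4().hex


def commit(inventory, revision_id):
    """Return a new inventory in which every temporary file-id has
    been replaced by REVISION-ID/path; existing IDs are kept."""
    committed = {}
    for path, fid in inventory.items():
        if fid.startswith(TEMP_PREFIX):
            fid = f"{revision_id}/{path}"
        committed[path] = fid
    return committed


inventory = {"old.txt": "rev-1/old.txt"}
add_file(inventory, "new.txt")
committed = commit(inventory, "rev-2")
```

The temporary IDs never appear in a committed revision, so they only
need to be unique within one working tree.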

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC+2o40F+nu1YWqI0RAga0AJwIVRp0fvoNTCAFd4CBOlR+hS4wDwCdHF+J
LdryaSxdm8BN1iFComhEKBo=
=S9lU
-----END PGP SIGNATURE-----