Thoughts on file ids

Fri May 6 15:06:44 UTC 2011

On 11-05-05 11:07 AM, Jelmer Vernooij wrote:
>> That's about right, but 3) is about any operation involving 2 or more
>> trees, not just merges.
> Would it be correct to say all transform and delta operations? Are there
> more operations that involve file ids across multiple trees?

Sure.  Every operation I can think of involving multiple trees requires
delta-ing them.

> I wonder if it would make sense to have a process before transform
> operations to find renames/copies - was that what you had in mind? Such
> a process in its simplest form could just return the existing file ids.

No, that wasn't something I had in mind.  Finding renames is one thing,
but merge-across-copies and its inverse, merge-across-joins, are evil
and would require lots of work.

I have thought about implementing merge-by-path, though.

>> Absent (3), we'd probably just use the path for (1).  Using the path for
>> (2) would mean that renaming files without changing their contents would
>> take more space than it does with file-ids.
> For (2), it doesn't necessarily have to be the path if we're not using a
> file id - it could be a checksum, or perhaps even the file id of another
> file in a parallel import. Whatever it is, it should be a repository
> implementation detail not exposed at the higher level API / UI level.

The tuples we use for versionedfiles are already repository
implementation details, aren't they?
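
To make the space argument for (2) concrete, here is a toy sketch.  It is
not bzrlib's VersionedFiles API: TextStore, key_of and the sample trees are
all made up for illustration.  The only thing borrowed from the discussion
is the idea that texts are stored under a (key, revision) tuple, where the
key is either a file id or a path.

```python
# Toy model only: none of these names come from bzrlib.
class TextStore:
    """Stores one text per (key, revision_id) tuple."""

    def __init__(self):
        self.texts = {}

    def commit(self, revision_id, tree, key_of, parent=None):
        """Record a tree, storing a text only when the parent tree has
        no identical text under the same key.

        tree and parent map file_id -> (path, content).
        key_of(file_id, path) picks the first element of the text key.
        """
        parent_texts = {}
        if parent is not None:
            for file_id, (path, content) in parent.items():
                parent_texts[key_of(file_id, path)] = content
        for file_id, (path, content) in tree.items():
            key = key_of(file_id, path)
            if parent_texts.get(key) != content:
                self.texts[(key, revision_id)] = content


def by_file_id(file_id, path):
    return file_id


def by_path(file_id, path):
    return path


# Two revisions of one file: a pure rename, contents unchanged.
rev1 = {"readme-id": ("README", "hello\n")}
rev2 = {"readme-id": ("README.txt", "hello\n")}


def stored_texts(key_of):
    """Commit both revisions and count how many texts were stored."""
    store = TextStore()
    store.commit("rev-1", rev1, key_of)
    store.commit("rev-2", rev2, key_of, parent=rev1)
    return len(store.texts)
```

In this model, stored_texts(by_file_id) stores one text and
stored_texts(by_path) stores two, because the renamed file looks like a
brand-new key.  A real repository could presumably claw some of that back
with delta compression or by keying on a checksum, as suggested above.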

Mind you, there's also the per-file graph, which I don't think you've
really discussed here.

Aaron