Thoughts on file ids

Mon May 9 15:34:07 UTC 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-05-09 11:09 AM, Jelmer Vernooij wrote:
> That was what I had in mind. My guess was that that was the reason the
> trans_ids were different from file ids in the first place, but I have no
> idea why they are actually different.

There are several reasons:

TreeTransforms need to refer to unversioned files, which have no file-id.

TreeTransform operations can happen in any order, which means that in an
intermediate stage, there may be duplicate file-ids.  However, if one of
the duplicates is not deleted before attempting to apply the transform,
this is considered a conflict.
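
To make those two points concrete, here is a rough toy model in Python. It
is not bzrlib's TreeTransform API: ToyTransform, new_entry, delete,
find_conflicts and apply are names made up for this sketch. It only shows
why a separate handle (the trans_id) is useful: unversioned entries still
get one, and two entries may briefly claim the same file-id as long as one
of them is deleted before the transform is applied.

```python
import itertools


class ToyTransform:
    """Toy model only; NOT bzrlib's TreeTransform (all names are hypothetical)."""

    def __init__(self):
        self._counter = itertools.count()
        self._file_ids = {}    # trans_id -> file-id, or None if unversioned
        self._deleted = set()  # trans_ids scheduled for deletion

    def new_entry(self, file_id=None):
        """Register an entry; file_id=None models an unversioned file."""
        trans_id = "new-%d" % next(self._counter)
        self._file_ids[trans_id] = file_id
        return trans_id

    def delete(self, trans_id):
        """Schedule an entry for deletion; operations may come in any order."""
        self._deleted.add(trans_id)

    def find_conflicts(self):
        """Return (file_id, trans_ids) for each file-id claimed more than once."""
        claims = {}
        for trans_id, file_id in self._file_ids.items():
            if file_id is None or trans_id in self._deleted:
                continue
            claims.setdefault(file_id, []).append(trans_id)
        return [(fid, ids) for fid, ids in sorted(claims.items())
                if len(ids) > 1]

    def apply(self):
        """Refuse to apply while any duplicate file-id remains."""
        conflicts = self.find_conflicts()
        if conflicts:
            raise ValueError("duplicate file-ids: %r" % (conflicts,))
        return {t: f for t, f in self._file_ids.items()
                if t not in self._deleted}
```

In this model, creating two entries with the same file-id is fine until
apply() is called; deleting either one first clears the conflict.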

>> Sure, but you could also achieve this kind of thing by rewriting the
>> file-ids in one of the trees, e.g. using a PreviewTree.
> That would need some fancy hooks in "bzr merge" too though

I don't think so.  It's just a matter of preprocessing a tree before
handing it into the main merge code.

>, and would
> require a similar process to map the file ids.

Yes, of course.

> It's certainly an
> option but it seems like just mapping the ids would be simpler and
> cleaner.

Depends what you mean by "simple".  This approach is something you can
implement without changing any of the core merge code, and without
introducing any new concepts.
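
As a rough illustration of the preprocessing idea, here is a toy Python
sketch. It does not use bzrlib or PreviewTree; a tree is modelled as a
plain dict of file-id -> path, and build_id_map and rewrite_file_ids are
hypothetical helpers. Matching entries by identical path is purely an
assumption made for the example, since how the ids get mapped is exactly
the open question.

```python
def build_id_map(other_tree, this_tree):
    """Map OTHER's file-ids onto THIS's file-ids.

    Assumption for this sketch: two entries are "the same file" when they
    sit at the same path. A real mapping would need something smarter.
    """
    this_by_path = {path: file_id for file_id, path in this_tree.items()}
    return {
        file_id: this_by_path[path]
        for file_id, path in other_tree.items()
        if path in this_by_path
    }


def rewrite_file_ids(tree, id_map):
    """Return a copy of tree with file-ids rewritten through id_map.

    Entries without a mapping keep their original file-id.
    """
    return {id_map.get(file_id, file_id): path
            for file_id, path in tree.items()}


this_tree = {"a-1": "README", "b-1": "src/main.py"}
other_tree = {"x-9": "README", "y-9": "src/main.py", "z-9": "NEWS"}

id_map = build_id_map(other_tree, this_tree)
preprocessed = rewrite_file_ids(other_tree, id_map)
# preprocessed now shares file-ids with this_tree wherever paths matched,
# so it could be handed to unmodified merge code in place of other_tree.
```

The merge code itself never sees the mapping; it only sees a tree whose
file-ids already line up.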

> It also doesn't eliminate any reliance on file ids during merge
> operations.

Yes, but I consider it a virtue that it works with the existing codebase.

>> I'm not sure the per-file graph would survive the elimination of
>> file-ids.  File-ids represent the idea that we know at commit time which
>> files in a tree are comparable to which other files in another tree.  I
>> think that if we can't encode that comparability at commit time, we
>> can't have per-file *anything* encoded in a repository.  And
>> establishing that comparability later could be very expensive.
> The per-file graph is useful for finding out the relation between a file
> and older incarnations of it. I don't see why that requires us to store
> those relations up front rather than discovering them later in some
> way. 
> 
> It might still be a good thing to store those relations explicitly as we
> are doing now, but I don't see why being able to browse those relations
> requires them to be stored up front.

Without storing them up front, I think generating the graph would be too
expensive.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3ICW8ACgkQ0F+nu1YWqI1g6wCfd5o77JrcjTYn1Yhcl4wdVbeE
j5wAn3nl+FzHzFBLcfHThkzQ5NjCdoBk
=C4v5
-----END PGP SIGNATURE-----