Thoughts on file ids

Tue May 17 00:53:21 UTC 2011

On Tue, May 17, 2011 at 5:33 AM, Jelmer Vernooij <jelmer at samba.org> wrote:
>  * iter_changes() needs to yield not just one file id, but both the file
> id in the old tree and the file id in the new tree; and everything that
> currently relies on it would need to be updated

Two concerns. First, this seems to violate the basic model of bzr (file
ids are unique in a tree). I expect huge propagating fallout (in the
form of bugs and inefficiencies) right up the stack, from dirstate
through to plugins.

Secondly, this would open the door to bugs like 'rename showing up as
a delete and add'.
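
To make the 'rename showing up as a delete and add' failure mode
concrete, here is a minimal sketch. It is plain Python, not bzr's API:
the tree model (a dict of file id -> path) and both function names are
hypothetical, chosen only for illustration.

```python
def changes_by_file_id(old, new):
    """Compare two trees keyed by file id (hypothetical model: id -> path)."""
    changes = []
    for file_id in sorted(set(old) | set(new)):
        old_path = old.get(file_id)
        new_path = new.get(file_id)
        if old_path is None:
            changes.append(("added", None, new_path))
        elif new_path is None:
            changes.append(("removed", old_path, None))
        elif old_path != new_path:
            # Same id on both sides, different path: this is a rename.
            changes.append(("renamed", old_path, new_path))
    return changes


def changes_by_path(old, new):
    """Compare the same trees while ignoring the ids, by path only."""
    old_paths = set(old.values())
    new_paths = set(new.values())
    changes = [("removed", p, None) for p in sorted(old_paths - new_paths)]
    changes += [("added", None, p) for p in sorted(new_paths - old_paths)]
    return changes


old_tree = {"id-readme": "README", "id-main": "main.py"}
new_tree = {"id-readme": "README.txt", "id-main": "main.py"}

print(changes_by_file_id(old_tree, new_tree))
print(changes_by_path(old_tree, new_tree))
```

With a stable id the change is reported as one rename; once the id
correspondence is lost, the same change degrades to a removal plus an
addition.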

It's true that you say 'everything relying on it would need to be
updated', but I worry that that is too much of a handwave.

Have you considered other ways of modelling file id<->path
correspondences across trees? [e.g. path tokens, for which I described
an algebra a couple of years back]

Assuming you're convinced this is the right way forward, I suggest not
touching the existing API but adding a new one and migrating -very-
carefully. The axioms:
 - a fileid is unique in a [well formed] tree
 - a path has only one fileid at any point in time

are (IMNSHO) directly responsible for some of the very nice merge and
tracking behaviour we have, and by making any migration stepwise and
layered you will be able to see if/when this comes into tension with
your work.
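
The two axioms can be written down as a check, which is one way a
stepwise migration could notice when they come under tension. This is
a sketch only, assuming a tree can be flattened to (path, file id)
pairs; the function name and the tree representation are hypothetical
and are not part of bzr.

```python
from collections import defaultdict


def axiom_violations(entries):
    """Return violations of the two axioms for one tree snapshot.

    entries: iterable of (path, file_id) pairs (hypothetical model).
    """
    paths_by_id = defaultdict(set)
    ids_by_path = defaultdict(set)
    for path, file_id in entries:
        paths_by_id[file_id].add(path)
        ids_by_path[path].add(file_id)

    problems = []
    # Axiom 1: a file id is unique in a well formed tree.
    for file_id, paths in sorted(paths_by_id.items()):
        if len(paths) > 1:
            problems.append(("duplicate file id", file_id, sorted(paths)))
    # Axiom 2: a path has only one file id at any point in time.
    for path, ids in sorted(ids_by_path.items()):
        if len(ids) > 1:
            problems.append(("ambiguous path", path, sorted(ids)))
    return problems


good = [("README", "id-readme"), ("main.py", "id-main")]
bad = [("README", "id-readme"), ("docs/README", "id-readme"),
       ("main.py", "id-main"), ("main.py", "id-other")]

print(axiom_violations(good))
print(axiom_violations(bad))
```

Running a check like this at each layer of a new API would show
exactly where a change starts to admit trees that the existing model
rules out.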

In particular I think you will want tests for the various pathologies
on merge that both git and hg suffer from (and we don't) - it would be
a darn shame to generalise bzr to work more efficiently with those
systems by downgrading its behaviour to match them.

-Rob