Thoughts on file ids

Tue May 17 09:50:02 UTC 2011

Hi Rob,

Thanks for your thoughts. In hindsight it was probably a bad idea to
send that previous email at the end of a long sprint day and with a
sleepy head. Sorry if it wasn't as clearly worded as it could have been.

On Tue, 2011-05-17 at 12:53 +1200, Robert Collins wrote:
> On Tue, May 17, 2011 at 5:33 AM, Jelmer Vernooij <jelmer at samba.org> wrote:
> >  * iter_changes() needs to yield not just one file id, but both the file
> > id in the old tree and the file id in the new tree; and everything that
> > currently relies on it would need to be updated

> Two concerns: this seems to violate the basic model of bzr (file ids
> are unique in a tree). I expect huge propagating fallout (in the form
> of bugs and inefficiencies) right up the stack from dirstate through
> to plugins.

> Secondly, this would open the door to bugs like 'rename showing up as
> a delete and add'.
I'm *not* proposing that multiple files in the same tree should be able
to have the same file id, or that we drop file ids altogether. 

What I'm exploring is the consequences of dropping the requirement - in
the API - that the same conceptual file has to have the same file id in
different trees. 

In other words, I would like file ids in the tree API to simply be things
that uniquely identify files within that tree, and I would like to
delegate finding renames between different trees to the repository.

The existing Bazaar repository formats would still use the stored file
ids to determine renames across trees (and just be as quick and correct
at that as they are now). Other formats (both new bzr formats and
foreign formats) could provide custom ways of finding renames - by
having a custom InterTree.iter_changes() and some custom glue for merge.
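To make the split concrete, here is a minimal Python sketch of two interchangeable rename-finding strategies. Everything in it is hypothetical and not bzrlib API: a tree is modelled as a dict from path to (tree-local id, content), `FileIdRenameFinder` stands in for what existing Bazaar formats do with stored file ids, and `ContentRenameFinder` is an assumed content-matching strategy of the kind a foreign format might plug in.

```python
from typing import Dict, Tuple

# Hypothetical minimal tree model: path -> (tree-local id, content).
Tree = Dict[str, Tuple[str, bytes]]


class FileIdRenameFinder:
    """Hypothetical: match files across trees by their stored file id."""

    def find_renames(self, old: Tree, new: Tree) -> Dict[str, str]:
        new_by_id = {fid: path for path, (fid, _) in new.items()}
        return {
            old_path: new_by_id[fid]
            for old_path, (fid, _) in old.items()
            if fid in new_by_id and new_by_id[fid] != old_path
        }


class ContentRenameFinder:
    """Hypothetical: ignore ids entirely and match files by identical content."""

    def find_renames(self, old: Tree, new: Tree) -> Dict[str, str]:
        # If several new files share content, the last one wins: ambiguous.
        new_by_content = {content: path for path, (_, content) in new.items()}
        renames = {}
        for old_path, (_, content) in old.items():
            new_path = new_by_content.get(content)
            # Only report a rename if the old path no longer exists.
            if new_path is not None and new_path != old_path and old_path not in new:
                renames[old_path] = new_path
        return renames
```

The content-based finder is only a stand-in: identical content in several files makes it ambiguous, which is exactly the kind of case the stored file ids avoid.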

> It's true that you say 'everything relying on it would need to be
> updated', but I worry that that is too much of a handwave.
By that I mainly mean that callers of iter_changes() would have to
deal with file_id becoming a tuple. Given that there are likely a number
of (external) users, I agree it's something we have to be careful about;
I didn't mean to be handwavy about it.
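For callers of iter_changes(), one way to migrate stepwise would be an adapter that turns the proposed (old id, new id) pair back into the single file id existing code expects, and fails loudly when the two differ. This is a sketch under assumptions: `Change` and `legacy_changes` are hypothetical names, and the record is far simpler than what iter_changes() actually yields.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


class Change(NamedTuple):
    """Hypothetical, simplified change record (not bzrlib's real tuple)."""
    file_id: Pair  # (id in old tree, id in new tree); None if absent
    path: Pair     # (old path, new path); None if absent


def legacy_changes(changes: Iterable[Change]) -> Iterator[Tuple[Optional[str], Pair]]:
    """Hypothetical adapter yielding (single file id, paths) like today's API.

    Only valid while both trees agree on ids; raising on a mismatch makes
    callers that still assume one id per file fail instead of misbehaving.
    """
    for change in changes:
        old_id, new_id = change.file_id
        if old_id is not None and new_id is not None and old_id != new_id:
            raise ValueError("file ids differ between trees: %r" % (change.file_id,))
        yield (old_id if old_id is not None else new_id), change.path
```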

> have you considered other ways of modelling file id<->path
> correspondences across trees? [e.g. path tokens as I described an
> algebra for a couple of years back]
Path tokens are a more radical change than what I'm proposing,
potentially changing behaviour and requiring format changes, AFAIU.

http://wiki.bazaar.canonical.com/DraftSpecs/PathTokens is the only page
on path tokens I'm aware of, and it only states the goals of path tokens.
Are there more details somewhere?

> Assuming you're convinced this is the right way forward, I suggest not
> touching the existing API but adding a new one and migrating -very-
> carefully. The axioms:
>  - a fileid is unique in a [well formed] tree
>  - a path has only one fileid at any point in time

> are (IMNSHO) directly responsible for some of the very nice merge and
> tracking behaviour we have, and by making any migration stepwise and
> layered you will be able to see if/when this comes into tension with
> your work.

> In particular I think you will want tests for the various pathologies
> on merge that both git and hg suffer from (and we don't) - it would be
> a darn shame to generalise bzr to work more efficiently with those
> systems by downgrading its behaviour to match them.
To be absolutely clear, I have no intention of degrading the existing
behaviour or getting rid of the two axioms you mention. I appreciate the
existing merge behaviour too much for that. :)
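Both axioms are per-tree properties, so under this proposal they could still be checked on each tree in isolation; only the cross-tree requirement is relaxed. A hypothetical checker over the (path, file_id) entries of a single tree might look like this (the function name and entry format are illustrative assumptions, not bzrlib API):

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def axiom_violations(entries: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical check of both axioms for one tree's (path, file_id) entries."""
    paths_by_id = defaultdict(set)
    ids_by_path = defaultdict(set)
    for path, file_id in entries:
        paths_by_id[file_id].add(path)
        ids_by_path[path].add(file_id)
    problems = []
    # Axiom 1: a file id is unique in a well-formed tree.
    for file_id, paths in sorted(paths_by_id.items()):
        if len(paths) > 1:
            problems.append("file id %s used by %s" % (file_id, sorted(paths)))
    # Axiom 2: a path has only one file id at any point in time.
    for path, ids in sorted(ids_by_path.items()):
        if len(ids) > 1:
            problems.append("path %s has ids %s" % (path, sorted(ids)))
    return problems
```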

Cheers,

Jelmer