path tokens

Thu Mar 15 09:02:48 GMT 2007

On 3/15/07, Robert Collins <robertc at robertcollins.net> wrote:
> I'd like us to consider overhauling the concept of fileids.

+1

> So here are a few ideas that I have about the shape of a new tool, which
> I'll call path tokens, to avoid confusion with file ids [a better name
> is welcome]. I don't intend on talking about implementation yet - partly
> because I don't have one in mind, but mainly because I want us to agree
> on the *goals* first: there's no point talking about an implementation
> until we agree on what we want to achieve.

I don't really object to that name, though I do wonder if people will
assume it means a token per path, like an id per file...  I understand
that you're being careful in this paragraph to exclude that, and that
this is about more than just file copies.

> I propose a multi step plan
> to tackling this problem:
>  - identify the problems/use cases to solve.
>  - design acceptable semantics for the new functionality that we've
> decided we want to solve.
>  - design an implementation that can supplant/extend file_ids to deliver
> the agreed semantics.
>  - go forth and implement.
>
> path tokens should:
>  * For currently supported cases, have no more corner cases than
> file-ids.
>  * allow us to support parallel imports better than file-ids.
>  * allow us to support copies as first-class operations.
>  * allow us to support 'two versioned paths become one versioned path'.
>  * allow us to compare two trees with no reference historical data.
>
> path tokens should not:
>  * increase storage size proportional to history or tree size. Note that
> this isn't the same as saying 'they should have fixed size'.

These are all things I'd like to see improved.  Or at least, they are
all things we'd like to improve in how we deal with historical trees,
and it would be awfully nice if one change fixed them all...

*parallel imports*

The simplest test here is importing the same tree twice and diffing or
merging between them.  It might be ok for bzr to tell you a parallel
import has happened, but it shouldn't make a fuss about it, or
conflict.  Beyond that, if you make some changes on either side
they should also merge across without trouble, and so on.  I'm not
sure exactly how well we do on this at the moment, but it's less than
perfect.
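
To make the simplest test concrete, here is a toy sketch in Python.  It
is not bzrlib code: the dict-of-ids tree model and the names
import_tree, diff_by_id and diff_by_path are made up for illustration.
It assumes each import hands out fresh random ids, and shows why two
imports of the same tree then look unrelated when matched by id, even
though the paths are identical.

```python
import uuid


def import_tree(paths):
    """Toy import: give every path a fresh random id (hypothetical model)."""
    return {uuid.uuid4().hex: path for path in paths}


def diff_by_id(old, new):
    """Match entries by id; return (removed, added, renamed) paths."""
    removed = sorted(old[i] for i in old.keys() - new.keys())
    added = sorted(new[i] for i in new.keys() - old.keys())
    renamed = sorted(
        (old[i], new[i]) for i in old.keys() & new.keys() if old[i] != new[i]
    )
    return removed, added, renamed


def diff_by_path(old, new):
    """Match entries by path only; return (removed, added) paths."""
    old_paths, new_paths = set(old.values()), set(new.values())
    return sorted(old_paths - new_paths), sorted(new_paths - old_paths)


paths = ["README", "src/main.py"]
first = import_tree(paths)
second = import_tree(paths)

# Matched by id, every file looks removed and re-added.
removed, added, renamed = diff_by_id(first, second)

# Matched by path, the two imports are identical.
path_removed, path_added = diff_by_path(first, second)
```

The gap between those two answers is roughly the fuss a user should
never have to see.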

These cases are interesting both because they are things users might
reasonably want to do, and also because achieving them means that file
ids (or whatever replaces them) are just internal mechanism and don't
intrude on the user's work.

*file copies*

I note your explanation but I think we can say more about what
"support file copies" really means, otherwise we can claim we do it
already :-)

It would be elegant if copy+delete had the same effect as renaming.  I
don't mean we should special-case them to be the same, or that this is
necessary.  At least, if they do behave differently, it would be good
to have a clear reason why.
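
As a rough illustration of what "same effect" could mean, here is a toy
Python model.  It is only a sketch under one assumption that nothing in
this thread has agreed on: that a copy gives the new path the same token
as its source, so two paths may share a token.  The tree dict and the
functions rename, copy and delete are hypothetical names, not anything
in bzr.

```python
def rename(tree, src, dst):
    """Move the token at src to dst, leaving the input tree untouched."""
    new = dict(tree)
    new[dst] = new.pop(src)
    return new


def copy(tree, src, dst):
    """Assumed semantics: dst shares the token of src."""
    new = dict(tree)
    new[dst] = new[src]
    return new


def delete(tree, path):
    """Drop a path from the tree."""
    new = dict(tree)
    del new[path]
    return new


before = {"A": "token-1", "README": "token-2"}

via_rename = rename(before, "A", "B")
via_copy_delete = delete(copy(before, "A", "B"), "A")
```

Under that assumption the two routes end in the same tree without any
special-casing.  A model where a copy always mints a new token would not
have this property, and would need a reason for the difference.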

(In the cases below, A was copied to B.)

I would say basic support for copies encompasses:

 * when looking at the log of B, you can see what happened to A too
(svn does this nicely)
 * ... and the fact that the copy occurred is clear in the log (svn
does not iirc)
 * and when annotating B, you also see correct annotations for lines
that came from A
 * (any others here?)
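
The first two points can be sketched with a toy history in Python.
This is an illustration only: the Change record, its copied_from field
and log_following_copies are invented here, and assume each revision
records at most one change and that a copy records its source path.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    revno: int
    path: str
    message: str
    copied_from: Optional[str] = None  # hypothetical: source path of a copy


def log_following_copies(history, path):
    """Log lines for path, newest first, following copies back to the source."""
    lines = []
    current = path
    for change in sorted(history, key=lambda c: c.revno, reverse=True):
        if change.path != current:
            continue
        if change.copied_from is not None:
            # Make the copy itself visible, then continue with the source.
            lines.append(
                f"{change.revno}: {change.message} "
                f"(copied from {change.copied_from})"
            )
            current = change.copied_from
        else:
            lines.append(f"{change.revno}: {change.message}")
    return lines


history = [
    Change(1, "A", "add A"),
    Change(2, "A", "edit A"),
    Change(3, "B", "copy A to B", copied_from="A"),
    Change(4, "A", "edit A again"),
    Change(5, "B", "edit B"),
]

log_of_b = log_following_copies(history, "B")
```

Here the log of B shows revisions 5, 3, 2 and 1: the copy is marked, the
earlier history of A is included, and the later edit to A (revision 4)
is left out because it happened after the copy.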

Advanced support for copies seems to mostly mean merging, and seems to
require knowing more about what the copy means.  Are they copying the
file to split it, or to make a new copy of the same thing (like the GPL
example)?

> No reference to historical data: Accessing lots of historical data is
> expensive - it means performance degrades as history accumulates.
> Additionally, in order to support history horizons, which is a proposal
> that we allow people to set a strict limit on what historical data is
> available to bzr, we need to be able to identify 'these are the same'
> across trees without necessarily having access to a common ancestor.

I agree we should set goals that this works with history horizons, and
performs well with long histories.  I'm not in favour of setting the
goal that you should know everything about copies given just two trees
and no history, as that would seem to mean trees grow in proportion to
all the copies.

-- 
Martin