bzr copy (Re: Weekly 0.10 release status)

Tue Aug 15 16:32:59 BST 2006

Andrew Bennetts wrote:

...

> (and also take note of the lines in the previous revision of ORIGINAL no longer
> present anywhere, I suppose).
> 
> So rather than recording that "NEW is a copy of ORIGINAL", it would be recording
> "these lines were split out of ORIGINAL and into NEW".
> 
> Then if there's a merge like my example above, then bzr could, in principle, put
> the new and modified lines in the right places automatically.
> 
> Implementing this efficiently and correctly is left as an exercise for the
> reader...
> 
> <end handwaving>
> 
> -Andrew.

Aaron had some ideas about this, going along with edge-based merging and
line-identity.

http://revctrl.org/EdgeVersioning

The basic idea is that the first time a line appears in a given file,
you give it a line identity. (Something like revision_id+file_id+line_num).

Then as that line moves around, it keeps the original line identity.
Possibly even if moving to another file.
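
To make the line-identity idea concrete, here is a rough sketch. The
names (LineId, introduce, move_line) are made up for illustration and
are not anything in bzr; it only shows an identity being assigned once
and then travelling with the line.

```python
from typing import List, NamedTuple, Tuple


class LineId(NamedTuple):
    """Hypothetical line identity: where the line first appeared."""
    revision_id: str  # revision in which the line was introduced
    file_id: str      # file in which it was introduced
    line_num: int     # its position in that file at that time


TaggedLine = Tuple[LineId, str]


def introduce(revision_id: str, file_id: str, lines: List[str]) -> List[TaggedLine]:
    """Give every line of a brand-new file its identity."""
    return [(LineId(revision_id, file_id, n), text)
            for n, text in enumerate(lines)]


def move_line(tagged: List[TaggedLine], src: int, dst: int) -> List[TaggedLine]:
    """Move a line to a new position; the identity is not recomputed."""
    result = list(tagged)
    result.insert(dst, result.pop(src))
    return result


tagged = introduce("rev-1", "file-a", ["x", "y", "z"])
moved = move_line(tagged, 0, 2)
# The line "x" now sits last but still carries the identity it was born with.
```

Moving a line into another file would work the same way: the
(LineId, text) pair is carried over unchanged.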

With EdgeVersioning, identity is roughly "which line am I, and which
lines am I next to".

There are quite a few things you can do with file-id aliasing/copy
notation. The biggest trick is how you handle the case where both ids
are in the same revisions, versus the case where the revisions are
disjoint.

The biggest problem with making this 'perform well' is that it is the
sort of thing that forces every merge operation to look around to see
whether there is another file it should be handling. And depending on
how the copy information is stored, this can be an expensive search
through history, especially since most implementations only store the
'copy' information in the copied file, not in the one it was copied from.

So if I do:
bzr copy a b
bzr commit -m b b

and then modify 'a', I have to look at the entire inventory to see if
there are any copies of 'a' lying around.
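
As a sketch of why that is a full scan: assume (hypothetically) that an
inventory is a dict of file_id -> entry, and only the copy carries a
'copied_from' pointer. This is not bzr's real inventory format, just an
illustration.

```python
from typing import Dict, List, Optional

# Hypothetical inventory: file_id -> entry. Only the copy knows its source.
Inventory = Dict[str, Dict[str, Optional[str]]]


def find_copies(inventory: Inventory, source_id: str) -> List[str]:
    """Return file ids copied from source_id.

    Every entry has to be visited, because nothing on the source
    entry says that copies of it exist.
    """
    return sorted(file_id for file_id, entry in inventory.items()
                  if entry.get("copied_from") == source_id)


inventory: Inventory = {
    "a-id": {"path": "a", "copied_from": None},
    "b-id": {"path": "b", "copied_from": "a-id"},
    "c-id": {"path": "c", "copied_from": None},
}
copies_of_a = find_copies(inventory, "a-id")
```

And that is just one inventory; if the copy lives in a different
revision, the same scan has to be repeated through history.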

You could store both pointers, so doing 'bzr copy a b' would modify both
'a' and 'b' to indicate there is an association there. That would be
non-intuitive to people used to the copy semantics of other SCMs, though.

You could also make the list of copied files an independent file/index,
so that it is a single lookup to find any copied entries. But that index
could get really big over the lifetime of the project.
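
A minimal sketch of such a standalone index, keyed by the source file
id. CopyIndex and its methods are invented names; nothing like this is
claimed to exist in bzr.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class CopyIndex:
    """Hypothetical project-wide index: source file id -> ids of its copies."""

    def __init__(self) -> None:
        self._copies: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_copy(self, source_id: str, copy_id: str) -> None:
        self._copies[source_id].add(copy_id)

    def copies_of(self, source_id: str) -> List[str]:
        # One dictionary lookup instead of scanning every inventory entry.
        return sorted(self._copies.get(source_id, ()))

    def __len__(self) -> int:
        # Total number of recorded copies; this only grows over time.
        return sum(len(ids) for ids in self._copies.values())


index = CopyIndex()
index.record_copy("a-id", "b-id")
index.record_copy("a-id", "d-id")
index.record_copy("b-id", "c-id")
```

The lookup is cheap, but every copy ever made adds an entry, which is
where the size concern comes from.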

It would be cleaner to just mark in the A index and B index that they
are associated. However, this scales oddly when you have copies of
copies, because the association is transitive. After bzr copy a b,
bzr copy b c, bzr copy a d, technically c and d are associated.
And I don't think you want the N(N-1)/2 copy associations to be recorded
in every index.
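
One way to avoid writing out every pair, sketched here under my own
assumptions (this is a plain union-find, not something bzr does): keep
one group per family of copies and derive the pairwise associations on
demand.

```python
from typing import Dict


class CopyGroups:
    """Hypothetical grouping of files related through chains of copies."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, name: str) -> str:
        """Return the representative of the group containing name."""
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # Path halving keeps later lookups short.
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def record_copy(self, source: str, copy: str) -> None:
        self._parent[self._find(copy)] = self._find(source)

    def associated(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def group_size(self, name: str) -> int:
        root = self._find(name)
        return sum(1 for f in list(self._parent) if self._find(f) == root)


def pairwise_associations(n: int) -> int:
    """Number of explicit pairs needed to relate n files to each other."""
    return n * (n - 1) // 2


groups = CopyGroups()
groups.record_copy("a", "b")  # bzr copy a b
groups.record_copy("b", "c")  # bzr copy b c
groups.record_copy("a", "d")  # bzr copy a d
```

Here c and d end up in the same group even though neither was copied
from the other, and four files would already need 4*3/2 = 6 explicit
pairs if each association were stored separately.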

Things get even weirder if you start doing shallow branches and history
horizons. Since now you can't look through all of history....

John
=:->
