file copies conception

Thu Jan 17 21:13:27 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> Hi,
> 
> I have some trouble understanding the concept of file copies.
> Or maybe how this concept should work when merging different branches.
>
> So far bzr has no such concept, but from time to time some users
> ask for support for copies.
> Because I have never used copies, I can't be sure that I understand how
> they should be used.
> 
> How merge before and after copy should work? How log and annotation
> should work for copies?
> Some simple examples from another VCS or how people think it should work
> -- will help me.
> 
> IIUC, there are 2 potential use cases: when the user wants to split a
> file's content into 2 files,
> and when the user needs to create another file with similar but
> partially different content.
> But in the first case there is no copy: content from one file is moved
> (i.e. cut and pasted) to another
> file. That makes sense for annotation, to track content movement, but
> does it make much sense to
> keep the pre-split history?
> In the second case I'm again not sure about history. Why would the user
> want the history of the new file
> attributed to a similar but different file?
> 
> Maybe I just lack imagination?

No, these questions are why we don't have an implementation. Merging
after a copy is poorly defined.

If someone is doing a "copy" because they have a template file, then
maybe changes to the template should propagate to the derived files. Or
maybe they shouldn't, because the similarity should stop at the copy.
Take setting the date in the copyright header, for example: that is
probably something that should only take effect for new files, not
retroactively update all files with the new modified date.

If someone is doing a "split" of a file, then what they probably want is
to have any changes associated with the first part applied to the first
file, and any changes associated with the second applied to the second
file. This is close to where git's "track file contents" starts to take
effect.

However, what happens when you take 1 file and split it into 2, but both
have some duplicate sections and some unique sections (and some
sections that did not exist before at all)? Or when content gets moved
around (say they decided to sort alphabetically, etc.)?

With git, my understanding is that you are supposed to commit a rename
before you commit any changes to that file. Otherwise its rename
tracking could get confused because it cannot link up the two parts of
the rename. This probably generates a temporarily broken tree, since you
can't update any references during the rename. (Though sometimes when
renaming, the file itself doesn't need to update, but all the other
files need to reference the new location, which should be safe.)
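
To illustrate why a rename plus heavy edits in one commit can get lost,
here is a rough sketch of similarity-based rename detection. It is not
git's actual algorithm: looks_like_rename is a made-up helper, difflib
stands in for git's own comparison, and the 0.5 threshold is an
assumption that mirrors git's documented default of 50% similarity.

```python
import difflib


def looks_like_rename(old_text, new_text, threshold=0.5):
    """Guess whether new_text is a renamed version of old_text.

    Hypothetical helper: compares the two files line by line and
    treats them as the same file if enough lines survive.
    """
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio() >= threshold


original = "a\nb\nc\nd\n"
heavily_edited = "a\nx\ny\nz\n"

# Renamed without edits: every line matches, so the link is found.
print(looks_like_rename(original, original))
# Renamed and rewritten in one step: too few lines match.
print(looks_like_rename(original, heavily_edited))
```

Committing the rename first keeps the two sides identical at the moment
of the rename, which is the easy case for this kind of heuristic.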

I think the clearest case I saw broke down something like:

a) Split a file into 2 copies, A => A' B'
b) Track them as 2 new independent files, which share some common history
c) If you see a change to A (before the split) you should apply it to
both A' and B'
d) Changes which would only apply to one side become a simple conflict
in the other.

It would be possible to change (d) such that if it applies cleanly to
one side, ignore a conflict in the other. (But note that there will be
some changes which apply cleanly to both sides.)
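
Rules (a) to (d), plus the relaxed version of (d), could be sketched
roughly like this. Everything here is hypothetical: Change is a toy
"replace one exact line" patch, and apply_change / merge_into_split are
illustrative names, not anything in bzr.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Toy change made to A: replace one exact line with another."""
    old_line: str
    new_line: str


def apply_change(lines, change):
    """Return the patched lines, or None if the change does not apply."""
    if change.old_line not in lines:
        return None
    return [change.new_line if line == change.old_line else line
            for line in lines]


def merge_into_split(change, descendants, lenient=False):
    """Apply a change made to A to every file that was split from A.

    descendants maps a file name to its list of lines.
    Returns (results, conflicts), where conflicts lists the files
    the change could not be applied to.
    """
    results, conflicts = {}, []
    for name, lines in descendants.items():
        patched = apply_change(lines, change)
        if patched is None:
            results[name] = lines      # rule (d): conflict on this side
            conflicts.append(name)
        else:
            results[name] = patched    # rule (c): applied to this side
    # Relaxed (d): a clean apply on any side silences the other conflicts.
    if lenient and len(conflicts) < len(descendants):
        conflicts = []
    return results, conflicts


split = {
    "A'": ["def foo():", "    return 1"],
    "B'": ["def bar():", "    return 2"],
}
change = Change("    return 1", "    return 10")
print(merge_into_split(change, split))
print(merge_into_split(change, split, lenient=True))
```

With the strict rule the change lands in A' and B' is reported as a
conflict; with lenient=True the conflict in B' is dropped because A'
took the change cleanly.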

Now, you might run into some difficulties if someone merges a cherrypick
of a change to A' or B', because that *should* be merged into A.

This design doesn't seem terrible to implement, and it seems like it
would be fairly obvious what is going on. So either it works like the
person expects, or it fails in obvious ways.

Also, if person 1 splits into A' and B', and person 2 splits into C' and
D', and makes a change to D', how would that get propagated into A', B'?
I think it is solvable, but we need to think about it.

A lot of people who want file copies simply want "bzr log" and "bzr
annotate" to track back the full set of changes. And that seems pretty
easy to implement.
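
As a rough idea of what "track back the full set of changes" might
mean for log, here is a small sketch. The data model is an assumption:
full_log, the revisions list and the copied_from mapping are all
invented for illustration and are not bzr APIs. It also glosses over
one question, namely whether changes made to the source after the copy
should be shown.

```python
def full_log(file_id, revisions, copied_from):
    """Return ids of revisions touching file_id or anything it came from.

    revisions: list of (revision_id, set_of_file_ids_touched), oldest first.
    copied_from: dict mapping a file-id to the list of file-ids it was
    copied, split or joined from.
    """
    # Collect file_id plus every file-id reachable through copy links.
    related, todo = set(), [file_id]
    while todo:
        fid = todo.pop()
        if fid not in related:
            related.add(fid)
            todo.extend(copied_from.get(fid, []))
    # Keep the revisions that touched any of the related file-ids.
    return [rev for rev, touched in revisions if touched & related]


revisions = [
    ("r1", {"a"}),
    ("r2", {"other"}),
    ("r3", {"a-prime", "b-prime"}),   # the split of a
    ("r4", {"b-prime"}),
]
copied_from = {"a-prime": ["a"], "b-prime": ["a"]}
print(full_log("b-prime", revisions, copied_from))
```

Because copied_from maps to a list, the same walk would also follow a
join, where one file has more than one source.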

Of course, once you start tracking file splits/copies you probably also
want to track file joins (A + B => A'), so that in the future changes to
B will show up in A', and annotation will attribute lines as coming
from B, etc.

Probably my biggest concern is figuring out how to efficiently track the
links between A, A' and B', so that when you ask "what changes need to
be merged" you can answer it efficiently. We've done a lot of that with
file-ids, since then we can track through renames without having to look
at history at all. Now you also need an accumulator to look up when
file-ids are actually associated with different file-ids.
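
One possible shape for such an accumulator, purely as a sketch: an
in-memory index of links between file-ids, walkable in both directions.
CopyLinks and its methods are hypothetical names, and a real version
would have to be stored with the branch rather than rebuilt each time.

```python
from collections import defaultdict


class CopyLinks:
    """Hypothetical accumulator of copy/split/join links between file-ids."""

    def __init__(self):
        self._sources = defaultdict(set)   # derived file-id -> source file-ids
        self._derived = defaultdict(set)   # source file-id -> derived file-ids

    def record(self, source, derived):
        """Remember that derived was created from source."""
        self._sources[derived].add(source)
        self._derived[source].add(derived)

    def _walk(self, start, edges):
        # Follow links transitively without re-reading any history.
        seen, todo = set(), [start]
        while todo:
            for nxt in edges.get(todo.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    def descendants(self, file_id):
        """File-ids that should receive changes made to file_id."""
        return self._walk(file_id, self._derived)

    def ancestors(self, file_id):
        """File-ids whose history feeds into file_id."""
        return self._walk(file_id, self._sources)


links = CopyLinks()
links.record("a", "a-prime")
links.record("a", "b-prime")
links.record("b-prime", "c-prime")
print(sorted(links.descendants("a")))
print(sorted(links.ancestors("c-prime")))
```

The point of keeping both directions is that merge wants descendants
("where does a change to A go?") while log and annotate want ancestors.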

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHj8T3JdeBCYSNAAMRAr6FAJ9nzUtKHudn1yee6Es0KJJA8WQhdwCg0yjn
Nx13LI3Aa3pAOtVzpbjb/L8=
=SiUb
-----END PGP SIGNATURE-----