Defining semantics for copying and combining files/directories/symlinks.

Mon Mar 19 02:21:35 GMT 2007

Robert Collins wrote:
> For copying files I have a single use case in mind:
> - create two files from a single file. (e.g. a user has a class Foo and
> is splitting it into two, so they copy foo.c to foo-extracted.c).

So this is problematic, because it doesn't jibe with your previous comments.

- This is just file splitting, which is not controversial.  Copies are.
  If all you meant was file splitting, you could have saved me a lot of
  concern.
- On the other hand, file splitting does not allow "us to support copies
  as first-class operations" as you previously described.

Yet lower down, you say "merging a branch that has altered the original
file into a branch that has copied it will apply the changes made to the
original file to both sides of the copy;".  This is "copy" semantics,
not "split" semantics.  Split semantics would apply some of the changes
to one file, and some of the changes to the other file, and there would
be no changes applied to both.  This would avoid conflicts.
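
To make the distinction concrete, here is a rough sketch in Python.  It
is only an illustration: the function names and the line-based model of
a "change" are hypothetical, and this is not how Bazaar's merge code
works.  Files are lists of lines, and a change replaces one exact line.

```python
# Hypothetical illustration only; not Bazaar's merge implementation.
# A change is modelled as (old_line, new_line) made to the original file.

def apply_change(lines, old, new):
    """Return (new_lines, applied); applied is False if `old` is absent."""
    if old not in lines:
        return list(lines), False
    return [new if line == old else line for line in lines], True


def merge_copy_semantics(targets, changes):
    """Apply every change to every target; a missing region is a conflict."""
    result = {name: list(lines) for name, lines in targets.items()}
    conflicts = []
    for old, new in changes:
        for name in sorted(result):
            result[name], applied = apply_change(result[name], old, new)
            if not applied:
                conflicts.append((name, old))
    return result, conflicts


def merge_split_semantics(targets, changes):
    """Apply each change only to the target(s) holding the changed region."""
    result = {name: list(lines) for name, lines in targets.items()}
    conflicts = []
    for old, new in changes:
        holders = [name for name in sorted(result) if old in result[name]]
        if not holders:
            # The region exists nowhere any more: a genuine conflict.
            conflicts.append((None, old))
        for name in holders:
            result[name], _ = apply_change(result[name], old, new)
    return result, conflicts


# foo.c has already been split into foo.c and foo-extracted.c.
after_split = {
    "foo.c": ["class Foo"],
    "foo-extracted.c": ["helper()"],
}
# Changes made to the original, unsplit foo.c on another branch.
incoming = [("class Foo", "class Foo2"), ("helper()", "helper(x)")]

copy_result, copy_conflicts = merge_copy_semantics(after_split, incoming)
split_result, split_conflicts = merge_split_semantics(after_split, incoming)
```

Under these assumptions, copy semantics reports one "region missing"
conflict per file for every change, while split semantics applies each
change exactly once and reports none.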

Because your solution isn't the best solution for the sole use case you
describe, I think you are also trying to address other use cases (e.g.
high-fidelity SVN imports) that you haven't articulated here.

In particular, if we are going to support file copies, it seems foolish
to support them in ways that do not encompass SVN.

I have no data about how file copies are used in SVN.  So I don't know
how common it is to start a new file using an existing file as a
template.  If it is not uncommon, then we must account for it, and that
means recognizing that some copies aren't really copies.

Subversion users don't merge (aside from update) on nearly the same
scale that Bazaar users do, so they are less likely to be bitten by
copies that aren't really copies.

If you are convinced that template copies are not likely to be common, I
would like to understand why.  (But since you gave copying COPYING as an
example, I am not hopeful.)  Otherwise, I can go into much greater
detail about the potential problems I foresee with template copying.

Also, it is rather disappointing to see discussion of file-splitting
without accompanying discussion of code movement.  I think code movement
is as common as file splitting, and it is equally frustrating to deal
with moved code as split files.  Some representations of file-splitting
would also encompass code movement.  For example, a file split could be
represented as "new file"+"code movement".
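
As a rough sketch of what I mean (the NewFile and MoveCode names are
made up for illustration, and nothing like them exists in Bazaar today),
a split could be recorded as two primitive operations, and plain code
movement would reuse the second one:

```python
# Hypothetical representation; these operation types are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class NewFile:
    path: str


@dataclass(frozen=True)
class MoveCode:
    src: str
    dst: str
    lines: tuple  # the exact lines being moved


def apply_ops(tree, ops):
    """Apply operations to a tree modelled as {path: [lines]}."""
    tree = {path: list(lines) for path, lines in tree.items()}
    for op in ops:
        if isinstance(op, NewFile):
            tree.setdefault(op.path, [])
        elif isinstance(op, MoveCode):
            moved = set(op.lines)
            # Drop the moved lines from the source, append to the destination.
            tree[op.src] = [l for l in tree[op.src] if l not in moved]
            tree[op.dst].extend(op.lines)
    return tree


before = {"foo.c": ["class Foo", "helper()"]}

# A file split is "new file" + "code movement".
split_ops = [
    NewFile("foo-extracted.c"),
    MoveCode("foo.c", "foo-extracted.c", ("helper()",)),
]
after = apply_ops(before, split_ops)
```

Moving code between two files that already exist would be the same
MoveCode operation without the NewFile.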

>                 There are two basic cases for merge with respect to
>                 copies: Either both branches have already done the copy,
>                 or only one has.

What about the case where the branches have each done different copies?

>> It's not clear to me that we should use the same primitive to represent
>> both those operations.  The output of a split is two files with no
>> common contents that are both related to the base file.  The output of a
>> copy is two files that have identical contents to the base file.  In the
>> first case, applying a merge from a pre-split tree should apply each
>> change only once.  But in the second case, in a merge from a pre-copy
>> tree, the changes would be applied twice: once to each file.
> 
> It's easier for a user to delete a 'deleted-region' conflict than to
> manually repeat a merge that we didn't do for them.

I wonder, though, how many times they would have to do that.
Potentially quite a lot, if they performed the split, and they are
running a long-lived branch.  If we support file splits, we can handle
this gracefully.  If we support file copies, we cannot.  So if I take
your use case at face value, we should support file splits and not file
copies.

> Later on we could look at detecting when a
> conflict in a split file applied correctly in another branch of the
> split; if it did and the conflict was a 'region deleted' conflict, we
> could elide that conflict completely, with no data-loss implications. I
> think that there is not enough of a win by having 'split vs copy'
> defined to justify the complexity in explaining it, let alone
> implementing it.

I think you are saying, "supporting copies at the expense of file-splits
is a win", which contradicts your single use case.

Though some nuance is probably in order: by implementing file splits
rather than file copies, we could apply only the relevant changes to
each side of the split.  So rather than "eliding conflicts" as you say,
we could simply not produce the conflicts in the first place.

I am also not convinced that eliding a "deletion conflict" would ever be
a correct choice when dealing with file copies.  Deletion conflicts do
happen with unsplit files, after all.

>> Finally, it's not at all clear that anyone really wants COPYING to be
>> treated as the same everywhere.

> I've handled this in the above user instructions by giving the user
> predictable behaviour: If the user wants to change all COPYING files
> ever, they branch from before the first one was created, change just
> COPYING, commit, then merge that wherever.

Not good enough.  The branch with the copy may be a long-lived fork, and
so your "branch from before the first one was created" scenario can
effectively happen by accident.

Files that start out as clones can diverge to such a degree that they
deserve a new identity.  If you have a/COPYING (content: GPLv2), and you
produce b/COPYING, and then, many commits later, you change b/COPYING
into GPLv3, merges against a/COPYING should not apply to b/COPYING.

>>> Advanced support for copies seems to mostly mean merging, and seems to
>>> require knowing more about what the copy means.  Are they copying the
>>> file to split it, or make a new copy of the same thing (like the gpl
>>> example).
> 
> I don't think we need to know what the copy means. Users are very capable
> of getting what they want given reasonable primitives.

To cite just one problem with this, SVN users whose data we import will
never have read our instructions.

Aaron