Defining semantics for copying and combining files/directories/symlinks.
Aaron Bentley
aaron.bentley at utoronto.ca
Mon Mar 19 13:30:46 GMT 2007
Robert Collins wrote:
>> Robert Collins wrote:
>
> let's assume that splitting and copying are sufficiently different that
> they need separate definitions, and let's make the line a hard one:
> * splitting A into A and B gives an A' and a B that share a common
> heritage, but no operations from that point on will consider them
> linked. Merges from an unsplit branch will ???. Merging to the split
> branch will ???. I don't have good answers to these two '???'s because
> I'm assuming that we want something different to the copy case. I'll try
> though: Merges from an unsplit branch will apply to both A' and B, but
> each hunk of difference may only apply to one of A' and B, if it applies
> successfully to both A' and B, it is marked as conflicted in both.
What I have in mind for file splits is that they would be represented as
"Lines X from file A become file A', Lines Y from file A become file B".
This would mean that when we were performing the merge, changes
affecting lines X in the old file would be applied to file A', and
changes affecting lines Y in the old file would be applied to file B.
There would be no changes that we would attempt to apply to either
file-- they would be applied to one or the other.
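A minimal Python sketch of that representation. The `Split` class, its `regions` mapping (target file to a half-open `[start, end)` line range of the original file) and `merge_into_split` are hypothetical names for illustration, not bzrlib API. The sketch also assumes incoming changes only modify existing lines (no insertions or deletions), so line numbers in the old file stay valid.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Split:
    """Hypothetical record: 'lines [start, end) of the source become <target>'."""
    source: str
    regions: Dict[str, Tuple[int, int]]

    def apply(self, lines: List[str]) -> Dict[str, List[str]]:
        # Partition the original file's lines into the new files (slices are copies).
        return {name: lines[start:end] for name, (start, end) in self.regions.items()}

    def route(self, lineno: int) -> Tuple[str, int]:
        # Find the single target file that owns a line of the original file,
        # plus that line's offset inside the target.
        for name, (start, end) in self.regions.items():
            if start <= lineno < end:
                return name, lineno - start
        raise ValueError(f"line {lineno} of {self.source} is not covered by the split")


def merge_into_split(split: Split, old_lines: List[str],
                     changes: Dict[int, str]) -> Dict[str, List[str]]:
    """Apply line edits made to the unsplit file; each edit lands in exactly one target."""
    files = split.apply(old_lines)
    for lineno, new_text in changes.items():
        target, offset = split.route(lineno)
        files[target][offset] = new_text
    return files


split = Split("A", {"A'": (0, 3), "B": (3, 5)})
old_a = ["a0", "a1", "a2", "b0", "b1"]
print(merge_into_split(split, old_a, {1: "A1", 4: "B1"}))
# {"A'": ['a0', 'A1', 'a2'], 'B': ['b0', 'B1']}
```

Because `route` returns exactly one target, no change is ever offered to both A' and B.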
> Merges
> from a split branch to an unsplit branch will split the file at the
> point it was split in the source branch and apply the changes from the
> target branch's A to both A' and B as per the reverse operation.
I would have described it as splitting A into A' and B per the original
split, then applying the changes from the previously-split branch to the
newly-split one.
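Under the same simplifying assumptions (edits replace lines one-for-one, and the split is recorded as a hypothetical `regions` mapping), that procedure might look roughly like this: re-split the target's A using the recorded ranges, then do a naive per-line three-way merge of each piece against the split branch's version. All names here are illustrative, not bzrlib API.

```python
from typing import Dict, List, Tuple

Regions = Dict[str, Tuple[int, int]]  # target file -> [start, end) in A


def resplit(regions: Regions, lines: List[str]) -> Dict[str, List[str]]:
    """Apply the recorded split to some version of A."""
    return {name: lines[s:e] for name, (s, e) in regions.items()}


def merge_split_into_unsplit(
    regions: Regions,
    base_a: List[str],
    target_a: List[str],
    source_files: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[Tuple[str, int]]]:
    """Split the target's A as the source did, then merge file by file."""
    base = resplit(regions, base_a)
    mine = resplit(regions, target_a)
    merged: Dict[str, List[str]] = {}
    conflicts: List[Tuple[str, int]] = []
    for name in regions:
        out = []
        for i, (b, t, o) in enumerate(zip(base[name], mine[name], source_files[name])):
            if t == o or o == b:
                out.append(t)      # only the unsplit branch changed, or both agree
            elif t == b:
                out.append(o)      # only the split branch changed
            else:
                out.append(t)      # both changed: keep ours and record a conflict
                conflicts.append((name, i))
        merged[name] = out
    return merged, conflicts
```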
> Now to me, these are clearly different, but I still don't think they are
> different *enough* to justify having two separate concepts in the
> system. I may be wrong :).
I'm not proposing that we have both copy and split. I think that the
distinction would not be clear enough. In terms of your use case, I
think split has better behavior. The edge representation that would
work so well for split would also work nicely for moving code between
files, and for moving code within a file.
So, based on our criteria so far, I think we should have split and
not copy.
> Well, that assumes you accept copy and split as concretely separate
> things. At the moment I don't, but one way out of this subdebate is
> for me to rewrite the copying side with a use case that is clearly not a
> candidate for splitting. Would that help?
Yes. That would change our criteria, and split would no longer be a
clear winner.
>> I think a logical representation would be to represent the entire
>> contents of all files as a set of edges. The beginnings and ends of
>> files would just be special edges.
>>
>> Merge would then be an edge merge, applied to an optimized variant of
>> the entire contents of the tree.
>>
>> This representation works well for code movement and file splitting, but
>> does an abysmal job of representing copies.
>
> Well, it has no copy semantics defined at all, but surely we can define
> them in much the same way at the granularity of lines as at the
> granularity of files.
True. My point is that this representation can reflect a lot of
desirable operations: split, move-between-files and (though I forgot to
mention it) move-inside-a-file. But it cannot represent copy by itself.
This is one of the reasons I consider the difference between copy and
split to be a real difference.
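To make the point concrete, here is a rough Python sketch of the edge representation. It assumes every line has a unique ID and that each node may have at most one predecessor and one successor; `edges_for` and `read_tree` are hypothetical helpers. A split only rewires edges, but a copy would need a line with two predecessors, which this model rejects.

```python
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]   # ("line", line_id), or ("start" / "end", filename)
Edge = Tuple[Node, Node]


def edges_for(files: Dict[str, List[str]]) -> Set[Edge]:
    """Encode each file as a chain: start -> line -> ... -> end."""
    edges: Set[Edge] = set()
    for name, line_ids in files.items():
        chain = [("start", name)] + [("line", i) for i in line_ids] + [("end", name)]
        edges.update(zip(chain, chain[1:]))
    return edges


def read_tree(edges: Iterable[Edge]) -> Dict[str, List[str]]:
    """Rebuild file contents; each node may have one successor and one predecessor."""
    succ: Dict[Node, Node] = {}
    pred: Dict[Node, Node] = {}
    for a, b in edges:
        if a in succ or b in pred:
            raise ValueError(f"edge {a} -> {b} would place a line in two positions")
        succ[a], pred[b] = b, a
    files: Dict[str, List[str]] = {}
    for node, nxt in succ.items():
        if node[0] == "start":
            lines = []
            while nxt[0] == "line":
                lines.append(nxt[1])
                nxt = succ[nxt]
            files[node[1]] = lines
    return files


tree = edges_for({"A": ["l1", "l2", "l3", "l4"]})

# Split A after l2: l3 and l4 move to a new file B, purely by rewiring edges.
split = (tree - {(("line", "l2"), ("line", "l3")), (("line", "l4"), ("end", "A"))}) | {
    (("line", "l2"), ("end", "A")),
    (("start", "B"), ("line", "l3")),
    (("line", "l4"), ("end", "B")),
}
print(sorted(read_tree(split).items()))  # [('A', ['l1', 'l2']), ('B', ['l3', 'l4'])]

# Copying A to C would need ("start", "C") -> ("line", "l1"): l1 gets two predecessors.
try:
    read_tree(tree | {(("start", "C"), ("line", "l1"))})
except ValueError:
    print("copy rejected")
```

Moving a line within a file is likewise just a different set of edges, so move-inside-a-file fits the model too.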
> We'd want to consider what to do for directory
> copies too though: in the proposal I put above, copying a directory, and
> merging from an uncopied one that adds a file would copy the file into
> the new directory, and I think we'd want to keep that.
Yeesh. Directory copies aren't handled by the file splitting concept at
all.
>> So according to the use case you've supplied, I think you've chosen the
>> wrong solution. We should support file splits, but not file copying.
>
> To summarise the list of better behaviours: to make sure I've been
> paying attention, they are:
> * repeated merges from before-a-split into after-a-split should not
> show conflicts on the portion of the file partitioned into the other part
> of the split.
>
> As far as I can tell, that's the only difference?
The only difference is better merge behavior-- changes are only applied
once, and only to the correct portion of the file.
>> So I hold that some copies are copies, and some copies are not.
>> Sometimes when people copy a template, they are making a new template,
>> maybe with a few changes. In that case, a merge should target both
>> copies. But frequently when copying a template, the copies will diverge
>> almost instantly.
>
> Do you mean here that there should be three operations? copy, split,
> copy-for-diverge? Or are you saying that when you copy a template and
> diverge immediately, that that is a form of split?
No, I think that copies may become unrelated either immediately (in
which case, we can berate the user for using "bzr cp" instead of "cp;bzr
add") or later on.
> My position on this at the moment is that it doesn't matter: If you copy
> a template to make a new template, changes made from before the copy
> should affect both, because bzr cannot know whether they are relevant to
> both copies or not; a template where you diverge a lot will conflict
> when they change on both sides a lot. That said, changing of templates
> should be rare, and it should work nicely I think.
Say the user has done a copy of a template, and they now decide that
they want the file to be distinct from the template. "bzr remove foo;
bzr add foo" isn't a good option, because it damages merging from recent
branches. So I think if we're supporting true copies, we would want a
way to break the association between "foo" and the old template.
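A toy illustration of why "bzr remove; bzr add" hurts, assuming (as bzr does) that files are tracked by a file ID assigned when they are added. The `Inventory` class, its `copied_from` map and the `detach` operation are hypothetical, not bzrlib API. Detaching drops the copy link but keeps the file ID, whereas remove+add mints a new ID that recent branches will not recognise.

```python
import uuid
from typing import Dict


class Inventory:
    """Toy model: paths map to file IDs; copies remember their source's ID."""

    def __init__(self) -> None:
        self.ids: Dict[str, str] = {}          # path -> file ID
        self.copied_from: Dict[str, str] = {}  # file ID -> source file ID

    def add(self, path: str) -> None:
        self.ids[path] = uuid.uuid4().hex      # a fresh, unrelated identity

    def copy(self, src: str, dst: str) -> None:
        self.add(dst)
        self.copied_from[self.ids[dst]] = self.ids[src]

    def remove(self, path: str) -> None:
        file_id = self.ids.pop(path)
        self.copied_from.pop(file_id, None)

    def detach(self, path: str) -> None:
        """Hypothetical: forget the copy link but keep the file's identity."""
        self.copied_from.pop(self.ids[path], None)


inv = Inventory()
inv.add("template")
inv.copy("template", "foo")
foo_id = inv.ids["foo"]

inv.detach("foo")
assert inv.ids["foo"] == foo_id        # merges keyed on foo_id still match
assert foo_id not in inv.copied_from   # no longer tied to the template

inv.remove("foo")
inv.add("foo")
assert inv.ids["foo"] != foo_id        # remove+add breaks that identity
```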
>>> I'd really like to get a good
>>> answer to the 'where is bzr cp' question
>> See, this is what makes me think you have additional criteria that
>> you're not admitting. If we *never* had support for copies, but supported
>> file splitting really, really well, would you be happy?
>
> No, because 'splitting' is not the inverse of combine
We differ here. If splits are represented as "Lines X from A become A',
lines Y from A become B", then splits are the inverse: "A' becomes lines
X in A", "B becomes lines Y in A". It's perfectly possible for combine
to be symmetrical with split.
>, and the combine
> operation which seems to be otherwise non-contentious should be
> something users can undo easily post-hoc, just like they can move files
> and directories back after a rename.
I think the symmetry means that users would be able to undo a
split+combine easily.
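A small sketch of that symmetry, reusing the hypothetical "regions" record from the split description (new file name to `[start, end)` range of the original). Reading the record backwards gives a combine, and combining the split output reproduces the original. The function names are illustrative only, and the sketch assumes the regions tile the original file exactly.

```python
from typing import Dict, List, Tuple

Regions = Dict[str, Tuple[int, int]]      # new file -> [start, end) in the original
Placement = List[Tuple[Tuple[int, int], str]]


def split(lines: List[str], regions: Regions) -> Dict[str, List[str]]:
    """'Lines X from A become A', lines Y from A become B'."""
    return {name: lines[s:e] for name, (s, e) in regions.items()}


def invert(regions: Regions) -> Placement:
    """The same record read backwards: 'A' becomes lines X in A', and so on."""
    return sorted((span, name) for name, span in regions.items())


def combine(files: Dict[str, List[str]], placement: Placement) -> List[str]:
    """Concatenate files back into one, checking each lands where recorded."""
    out: List[str] = []
    for (s, e), name in placement:
        if len(out) != s or len(files[name]) != e - s:
            raise ValueError(f"{name} does not fit at lines [{s}, {e})")
        out.extend(files[name])
    return out


original = ["import os", "def helper(): pass", "def main(): helper()"]
regions = {"A'": (0, 2), "B": (2, 3)}
assert combine(split(original, regions), invert(regions)) == original
```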
>>> Well I do note immediately later what we could do as a more advanced
>>> implementation, to remove the repetition there.
>> True, but you put it off for later, and I don't think that the
>> heuristics you're proposing are adequate to replicate the behavior of
>> file splits.
>
> Why not? Is there a case where something that knows a file has been
> split can do better than the heuristic I proposed?
Yes. If splits record what regions went into each file, then they can
apply only the changes that affect that region to that file.
>> The problem is that if the file was actually copied, rather than split,
>> you will fail to emit a necessary conflict.
>
> Are you saying that for a, let's call it 'real copy', that *every change*
> made before the copy must apply to *all copies*, and the heuristic I'm
> proposing, of not complaining about a conflict in which the lines are
> present (in some form) in one file and completely missing in the other,
> does not honour this? I can't think of a case where this is desirable
> except in a purely hypothetical sense.
For original file A, containing these lines:
"""import StringIO
f = StringIO.StringIO()
"""
Suppose there are two copies, B and C.
B is unchanged.
C has """import cStringIO as StringIO
f = StringIO.StringIO()
"""
Now suppose a new version of A has
"""from StringIO import StringIO
f = StringIO()
"""
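If we apply that change to B and C using your heuristic, C will silently
end up with code that fails at runtime, since StringIO now names the
cStringIO module, which is not callable: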
"""import cStringIO as StringIO
f = StringIO()
"""
It would be better to have
"""
<<<<<<< MERGE-SOURCE
from StringIO import StringIO
=======
import cStringIO as StringIO
>>>>>>> TREE
f = StringIO()
"""
The same kind of thing could prevent a bugfix from being applied
everywhere it was relevant. When you have two copies, it's just not
kosher to silently drop conflicts.
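For comparison, a naive per-line three-way merge keeps the conflict on the import line while applying the uncontested change to the second line, which is the "better" result for C shown above. This is a simplified stand-in, not bzr's actual merge algorithm: it assumes lines correspond one-for-one, with no insertions or deletions. The conflict marker order mirrors the example above.

```python
from typing import List


def merge3_lines(base: List[str], this: List[str], other: List[str]) -> List[str]:
    """Naive three-way merge for files whose lines correspond one-for-one."""
    if not len(base) == len(this) == len(other):
        raise ValueError("sketch only handles line-for-line edits")
    out: List[str] = []
    for b, t, o in zip(base, this, other):
        if t == o or o == b:
            out.append(t)          # MERGE-SOURCE unchanged, or both sides agree
        elif t == b:
            out.append(o)          # only MERGE-SOURCE changed: take its line
        else:                      # both changed differently: emit a conflict
            out += ["<<<<<<< MERGE-SOURCE", o, "=======", t, ">>>>>>> TREE"]
    return out


base = ["import StringIO", "f = StringIO.StringIO()"]           # old A
new_a = ["from StringIO import StringIO", "f = StringIO()"]     # new A
copy_c = ["import cStringIO as StringIO", "f = StringIO.StringIO()"]

print("\n".join(merge3_lines(base, copy_c, new_a)))
```

The unchanged copy B simply receives new A's lines, since every line of B still matches the base.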
>> Consider the contents of bzrlib/util. Would you consider us a hostile
>> fork of configobj? Say we merge changes from the mainline. If Fuzzyman
>> updates his __init__.py (which is currently blank), it's conceivable
>> that this would affect other blank copies of __init__.py.
>
> Well, if we've been copying __init__.py all around, then sure, I'd
> expect it to do that, and *so would we*. It seems hard to have a copy
> which is a copy but not a copy. That is, if people use 'bzr cp', they
> should *expect* it to propagate, rather than be surprised when it does.
And I hold that people will intuit that "bzr cp" should be used in
preference to "cp && bzr add". And that when they realize their
mistake, they should be able to fix it without "bzr remove && bzr add".
> More importantly, I don't think our behaviour post-conversion matters too
> much if we can represent svn properly. But I haven't explicitly tried to
> accommodate svn at this point; it's obviously in the back of my head, but
> I don't use it enough, nor do I think the choices of svn should influence
> us too much, for it to be a significant factor in this design decision
> (at this point; maybe once we've a proposal we're happy with we can go
> back and assess svn conversions in detail, to see what we might want to
> tweak or special case for that).
Understanding your position on svn is helpful.
I think we are swimming in possibilities ATM, and it would be really
nice to get some statistics on how people use (non-branching) SVN
copies. They are the most prominent users of file copying, and I think
if we're going to implement copying, it would help for us to understand
how it's commonly used.
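One possible starting point for such numbers is the XML form of `svn log --verbose --xml`, where copied paths carry `copyfrom-path` and `copyfrom-rev` attributes. The sketch below is only a rough classifier. It treats a copy as branching when its destination is trunk or a whole branch or tag directory, which assumes the conventional trunk/branches/tags layout. The sample log is made up for the usage example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed conventional layout: copies creating trunk or a whole branch/tag are "branching".
BRANCH_ROOT = re.compile(r"^/(trunk|branches/[^/]+|tags/[^/]+)/?$")


def copy_stats(log_xml: str) -> Counter:
    """Classify the copies recorded in `svn log --verbose --xml` output."""
    stats: Counter = Counter()
    for path in ET.fromstring(log_xml).iter("path"):
        if path.get("copyfrom-path") is None:
            continue                         # plain add/modify/delete, not a copy
        dest = (path.text or "").strip()
        if BRANCH_ROOT.match(dest):
            stats["branch or tag"] += 1
        else:
            stats["file/dir copy"] += 1      # the non-branching copies of interest
    return stats


SAMPLE = """<log>
<logentry revision="12"><paths>
<path action="A" copyfrom-path="/trunk" copyfrom-rev="11">/branches/feature</path>
</paths></logentry>
<logentry revision="13"><paths>
<path action="A" copyfrom-path="/trunk/templates/page.html"
      copyfrom-rev="12">/trunk/templates/blog.html</path>
<path action="M">/trunk/index.html</path>
</paths></logentry>
</log>"""

print(copy_stats(SAMPLE))
```

Against a real repository, the input would be the text produced by running `svn log --verbose --xml` on its URL.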
Aaron