Please check my thinking on bug 646979

Mon Oct 4 23:01:01 BST 2010

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/29/2010 11:23 PM, James Westby wrote:

...

> Here's why: (apologies to anyone using screen readers or variable width
> fonts)
> 
>   ---B---F
>  /  /   /
> /  / .-D
> \ A-=
>  \    `-E
>   \      \
>    C------G
> 
> (Time passes as you go right)
> 
> Here A is an "upstream" revision that is an import of a tarball. B is a
> packaging revision based on that, and C is another packaging revision in
> turn based on that. (Think B==Debian and C==Ubuntu)

> Then D and E are independent packagings of two different upstream
> releases (say Debian jumped to 3.0, and Ubuntu took the point release
> 2.1). F and G are the new packaging revisions based on merging the old
> ones with these new upstreams.

Drawn vertically

  A
 /|\
E D B
 \ \|\
  \ F C
   \  |
    \ |
     \|
      G

> 
> Now if we simply merge all F->G, we try and merge (D, F) in to (C, E,
> G), which means we are merging changes from A->D with changes from A->E,
> or in other words we are merging two upstream versions together.
> 

If you just merge F => G, then the common ancestor is B, so it will
merge (F - B) into (G - C) which should pretty much be D.

But yes, it will be mixing the upstream versions together.
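
A minimal sketch of why that mixing shows up as a conflict, assuming a
toy three-way merge over a single value. This is not bzr's merge code;
the three_way helper and the version table are hypothetical, with the
version strings taken from the 2.0 / 2.1 / 3.0 example in this thread.

```python
def three_way(base, this, other):
    """Toy three-way merge of one value; returns (result, conflicted)."""
    if this == other:
        return this, False      # both sides agree
    if this == base:
        return other, False     # only OTHER changed
    if other == base:
        return this, False      # only THIS changed
    return None, True           # both sides changed, differently


# Hypothetical contents of a version file in each revision:
# A, B and C carry upstream 2.0, D/F carry 3.0, E/G carry 2.1.
version = {"A": "2.0", "B": "2.0", "C": "2.0",
           "D": "3.0", "E": "2.1", "F": "3.0", "G": "2.1"}

# Merging F => G with B as the common ancestor.
result, conflicted = three_way(version["B"], version["G"], version["F"])
print(result, conflicted)
```

With B as the base, both sides have changed the value in different
ways, so the toy merge reports a conflict even though 2.1 and 3.0 were
really sequential upstream changes.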

> Consider the case of a file with the version number in it. The BASE
> would be "2.0", one side you change it to "2.1" and the other to "3.0",
> but in reality they were sequential changes, we just can't represent
> that here.
> 
> As a packager you don't care about this (generally), and you assume that
> whatever is in the latest is fine (you don't try and track at this point
> patches in 2.1 that didn't make it in to 3.0, and even if you wanted to
> this operation wouldn't just show you that)
> 
> What merge package does is first merge the two upstream revisions
> together, taking the tree from whichever has the highest version number.
> 
>   ---B---F
>  /  /   /
> /  / .-D--.
> \ A-=      H
>  \    `-E-`
>   \      \
>    C------G
> 
> Currently it will then just merge H in to G (the target). This can
> generate conflicts, which are very, very confusing to users, as it's
> incredibly hard to explain why they are getting them.
> 

Does merging D & E generate conflicts itself? It would seem that if
merging to G generates conflicts, then you should have gotten a conflict
in the intermediate stage as well. (offhand the best you can usually
hope for is more understandable conflicts, unless you have a real
'criss-cross' merge and we are selecting a very poor base.)

Again, redrawing your graph vertically so that I can see it as I'm used to:

  A
 /|\
E D B
|\|\|\
| H F C
 \    |
  '-. |
     \|
      G

Having drawn that, I'm 75% sure that there is no way to merge H => G
that doesn't involve crossing an existing line. The common ancestor is
only E, though. So we probably wouldn't detect it as a criss-cross.
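
The common-ancestor claim can be checked with a small sketch, assuming
the graph above is written as a plain dict of parents. The ancestors
and find_lcas helpers are hypothetical illustrations in plain Python,
not the bzrlib graph API.

```python
# Parent lists for the graph above (H = merge of D and E).
parents = {
    "A": [], "B": ["A"], "D": ["A"], "E": ["A"],
    "C": ["B"], "F": ["B", "D"], "H": ["D", "E"], "G": ["C", "E"],
}


def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen, todo = set(), [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen


def find_lcas(parents, left, right):
    """Common ancestors that are not ancestors of another common one."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d)
                       for d in common)}


print(sorted(find_lcas(parents, "F", "G")))  # ['B']
print(sorted(find_lcas(parents, "H", "G")))  # ['E']
```

A is also a common ancestor in both cases, but it is superseded by B
and by E respectively, so each merge has a single base.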

>   ---B---F
>  /  /   /
> /  / .-D--.
> \ A-=      H
>  \    `-E-` \
>   \      \   \
>    C------G---I
> 
> Once we have that merged revision we can merge F->I, which will merge
> what we want.
> 
> I was thinking about this the other day and realised that unconditionally
> merging H to the target might not be the right thing to do. I think that
> it should merge it in to the side that had the highest version. That
> should never generate conflicts, as the revision that is being merged
> has no changes against the LCA. I think it should generate an
> equivalent merge though.
> 
> So, back to the graphs, if we this time consider D to be newer, but
> still want to merge F->G, would it be ok if we created I on the other
> side:
> 
>   ---B---F----I
>  /  /   /    /
> /  / .-D--. /
> \ A-=      H
>  \    `-E-`
>   \      \
>    C------G
> 
> and then merged I->G for our final revision?

  A
 /|\
E D B
|\|\|\
| H F C
 \ \| |
  \ I |
   \  |
    \ |
     \|
      G

Now you have a genuine criss-cross, as the LCAs are E and B (ancestors
of both I and G that are not superseded by a more recent ancestor).

Just using 3-way merge (vs. say --weave) I would expect this to conflict
more than merging H => G, because of our specific base selection: when
we find a criss-cross, 3-way goes to the next base, which will be A,
and it will then try to merge (I-A) into (G-A).
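
A rough sketch of that base selection, assuming the graph with I added
is written as a dict of parents and that every revision shares the
root A. The helpers are hypothetical plain Python, not bzrlib's
implementation; unique_base simply repeats the LCA search on the LCAs
themselves until one revision is left.

```python
# Parent lists for the graph above (I = merge of F and H).
parents = {
    "A": [], "B": ["A"], "D": ["A"], "E": ["A"],
    "C": ["B"], "F": ["B", "D"], "H": ["D", "E"],
    "I": ["F", "H"], "G": ["C", "E"],
}


def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen, todo = set(), [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen


def find_lcas(parents, left, right):
    """Common ancestors that are not ancestors of another common one."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d)
                       for d in common)}


def unique_base(parents, left, right):
    """Fall back to older ancestors until a single base remains.

    Assumes the two revisions share at least one ancestor.
    """
    lcas = find_lcas(parents, left, right)
    while len(lcas) > 1:
        a, b, *rest = sorted(lcas)
        lcas = find_lcas(parents, a, b) | set(rest)
    return lcas.pop()


print(sorted(find_lcas(parents, "I", "G")))  # ['B', 'E']
print(unique_base(parents, "I", "G"))        # A
print(unique_base(parents, "H", "G"))        # E
```

In this sketch the I => G merge ends up based on A, so both full
histories are compared, whereas H => G keeps E as its base.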

> 
> I think that it should be "safe" and remove those "odd" conflicts you
> get in the intermediate state, instead moving them to the final merge
> when they should hopefully make sense.
> 
> Can anyone generate a scenario where this would give a worse outcome?
> Can anyone in fact generate a scenario where either strategy is the
> wrong thing to do?
> 
> The more I think about it the more I am confident in the change, but it
> certainly doesn't seem like an obviously correct change to me.
> 
> Thanks,
> 
> James
> 

My quick analysis says the opposite. The default merge code will give
you worse results if you introduce the artificial "I child of F and H".
Practice matters more than theory, though.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkyqTp0ACgkQJdeBCYSNAAMXYACgiCjKQlo7iX8EPqPOTdpAKUpZ
aPYAoIa8hThmw8jGR2I5hch4XGu5Ykku
=/v/+
-----END PGP SIGNATURE-----