[RFC/MERGE] Avoid syncing on incorrect lines during diff

John Arbash Meinel john at arbash-meinel.com
Thu Jan 11 00:41:19 GMT 2007


John Arbash Meinel wrote:
> Martin Pool wrote:

I also found another very good example just today. I was changing the
indentation of a section to bring it out of a try/finally, and move the
try/finally somewhere else. I'm attaching the two forms of diff, and you
probably can see why I like the new one better.

Patience originally matched on the try/except errors.NoSuchRevision,
after ignoring whitespace it marks the whole chunk as moved. Which is
much easier to read as a human.

I have a patch now which allows 2 parameters to PatienceSequenceMatcher.
munge_func, which can be applied to every line (I recommend str.strip).
And match_munged. Which changes the algorithm from my version to what
Aaron proposed. Which is that we just run diff on the munged form, and
then remove entries that don't actually match.

In most cases Aaron's version and my version will perform similarly. But
Aaron's idea does a little better with blocks that only change
indentation, because rather than creating "don't match on these common
lines" it creates "match on these indentation changed lines".

To be more specific, my version helps prevent matching "else:" lines
because it considers them to be duplicates. If you take:

  if foo:
    foo()
  else:
    if baz:
      baz()
    else:
      bar()

versus:

  if foo:
    foo()
  if baz:
    baz()
  else:
    bar()

In my version, all I'm doing is preventing the first 'else:' from
accidentally matching the moved else. In Aaron's version, the baz() and
bar() lines will actually be marked as matching until the final pass.

As I think of it, I think Aaron's is conceptually better. But I figured
we could put them in, and then see what we think of it.

I looked into what it would take to add something to the command line,
but unfortunately it is pretty ugly. The problem is that cmd_diff calls
diff_cmd_helper calls show_diff_trees calls _show_diff_trees, which uses
a diff_function and passes it to InventoryEntry.diff() which then calls
back into internal_diff, which then passes a class to unified_diff().

So the best I can come up with is that we would need a different
'internal_diff' function for the different permutations, and different
classes or functions that return PatienceSequenceMatcher objects to pass
to unified diff.

And then the parameters from cmd_diff need to be passed down through
diff_cmd_helper and show_diff_trees and _show_diff_trees.

Which tells me that we *might* want to be using a slightly different
api, which could make all of this easier.

But for now if you want to test out the different methods, you can just
change this lines 290 in bzrlib/patiencediff.py
Setting munge_func=str.strip will turn on munged matching, and setting
that and making match_munged=True will use Aaron's version.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: status-branch-bzr.dev.diff
Type: text/x-patch
Size: 2232 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070110/4bcbb005/attachment-0003.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: status-branch-no-white.diff
Type: text/x-patch
Size: 2320 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070110/4bcbb005/attachment-0004.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patience_no_whitespace.patch
Type: text/x-patch
Size: 64086 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070110/4bcbb005/attachment-0005.bin 


More information about the bazaar mailing list