[MERGE] Patience diff

John Arbash Meinel john at arbash-meinel.com
Tue May 23 00:30:17 BST 2006


Aaron Bentley wrote:
> I've now coded up a torture test for patience diff.  It opens a branch,
> and walks through its repository, diffing each version against the last,
> patching the result, and ensuring that the patched result is the same
> as the target file.  The test passes with no issues.
> 
> I believe this addresses Martin's desire to test Patience thoroughly,
> and if so, a merge request is in order.
> 
> For curiosity, I've also measured the size of patches.  I had expected
> Patience to be much shorter, but that's not the case.  Patience is only
> 0.67% shorter on average (aggregate size of patience patches: 61643859,
> aggregate size of difflib patches: 62059993).
> 
> My patience branch is at
> http://code.aaronbentley.com/bzr/bzrrepo/others/bzr.patience/
> 
> Aaron

Well, that is about the performance that I would expect. Patience
doesn't really create much smaller patches. It just creates *better*
ones (ones that look better to a human reader).

Thanks for posting this for merge. I realize most of the following
errors were actually introduced by me, but it is a good time to review
them, and make sure things get cleaned up.


First off, in your 'patience-test.py' you do:

  new_file = file(file_path, 'rb')
  for line_a, line_b in zip(new_file, new_lines):
      assert line_a == line_b

Which is great, except 'zip()' stops as soon as one of the iterables
runs out. So if one has extra stuff at the end, you won't detect it.
Since 'assert line_a == line_b' doesn't actually say anything about what
is different, I would tend to change it to:

  new_file = file(file_path, 'rb')
  assert list(new_file) == new_lines
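
To illustrate the difference, here is a minimal standalone sketch. It
uses io.BytesIO as a stand-in for the opened file, and the helper names
are mine, not anything in patience-test.py:

```python
import io


def matches_with_zip(stream, expected_lines):
    """Compare line by line with zip(); trailing extras go unnoticed."""
    for line_a, line_b in zip(stream, expected_lines):
        if line_a != line_b:
            return False
    return True


def matches_with_list(stream, expected_lines):
    """Compare the full line lists, so a length mismatch is detected."""
    return list(stream) == expected_lines


expected = [b"one\n", b"two\n"]
patched = b"one\ntwo\nextra\n"  # one line too many at the end

# zip() stops after two lines, so the extra line is never compared.
zip_result = matches_with_zip(io.BytesIO(patched), expected)
# Comparing whole lists also compares lengths.
list_result = matches_with_list(io.BytesIO(patched), expected)
```

With the list comparison, a failing assert also shows both lists in
full if you add them to the assert message.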

We should go through and make sure that our copyright statements have
2006, and use # all the way through. (bzrlib/cdv/__init__.py doesn't
have the right lines in the beginning)

We should also check that we have the latest 'nofrillsprecisemerge.py'
files. I tried hard to make sure I was using an unmodified version so
that any updates could be easily obtained.

We probably should change the names from 'cdv' to 'patience'. I'm not
stuck on either, but I'm guessing Codeville isn't the right name to use.

In merge3.py we have a commented out import. I think we prefer to just
remove them now.

We also used to use a junk expression so that lines that only contained
space, tab, or # would be ignored. It probably isn't needed anymore, but
we should be aware that the junk_re should be deleted.
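
For reference, a junk filter of that kind might look roughly like the
sketch below with the standard difflib matcher. The pattern and the
is_junk name are my assumptions for illustration, not the actual
junk_re from our code:

```python
import re
from difflib import SequenceMatcher

# Assumed pattern: a line made up only of spaces, tabs and '#'.
# '$' also matches just before a trailing newline.
junk_re = re.compile(r'^[ \t#]*$')


def is_junk(line):
    """Return True for lines the matcher should not anchor on."""
    return junk_re.match(line) is not None


old_lines = ["def f():\n", "    #\n", "    return 1\n"]
new_lines = ["def f():\n", "    #\n", "    return 2\n"]

# difflib accepts the junk predicate as its first argument.
matcher = SequenceMatcher(is_junk, old_lines, new_lines)
opcodes = matcher.get_opcodes()
```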

test_diff.py has a bunch of PEP 8 whitespace issues.

We are using the new cdv diff for all of the Weave code, but it doesn't
look like anything has been done to use it for the 'knit' code. (It
should still be used for 'bzr diff')

At one point, I was thinking we could provide a flag to switch between
the diff engines. Is there any interest in that?
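
Something along these lines is what I have in mind. This is only a
sketch: DIFF_ENGINES, get_matcher and matching_blocks are made-up
names, and only the stock difflib matcher is registered here. A
patience matcher with the same constructor and methods could be added
under its own key:

```python
from difflib import SequenceMatcher

# Hypothetical registry mapping an engine name to a matcher class.
DIFF_ENGINES = {
    'difflib': SequenceMatcher,
}


def get_matcher(name='difflib'):
    """Look up a matcher class by name, failing loudly on typos."""
    try:
        return DIFF_ENGINES[name]
    except KeyError:
        raise ValueError('unknown diff engine: %r' % (name,))


def matching_blocks(a, b, engine='difflib'):
    """Return the matching blocks of a and b using the chosen engine."""
    matcher = get_matcher(engine)(None, a, b)
    return matcher.get_matching_blocks()


blocks = matching_blocks(['a\n', 'b\n'], ['a\n', 'b\n'])
```

A command line flag would then just pass the chosen name through to
whatever does the diffing.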

So, +0.5 from me. It just needs a little bit of cleanup still.

John
=:->
