Tweaking PatienceDiff
LukášLalinský
lalinsky at gmail.com
Sun Apr 11 11:03:56 BST 2010
John Arbash Meinel <john <at> arbash-meinel.com> writes:
> http://bramcohen.livejournal.com/73318.html
>
> I believe our current code does the recurse first, rather than matching
> the start and end first if they are identical.
>
> The major difference is that for mostly-the-same-files you'll skip a lot
> of the internal "build a dict" stuff, for all of those lines that line
> up perfectly.
>
> Something to think about.
I've experimented with this a little this morning. After modifying the the
Python version to first match lines at the start and end and only run LCS on the
rest, 'bzr log -p' on one of my projects went down from ~1:36 to ~1:28. I'm not
sure what impact would it have on the C version.
The output of the algorithm is not the same though. There is actually one test
case that breaks after changing the code. If you have two strings
'abcdefghijklmnop' and 'abcXghiYZQRSTUVWXYZijklmnop', the original version which
does a LCS first results in these matches:
abcdefghi jklmnop
abcX ghiYZQRSTUVWXYZijklmnop
The modified version matches these characters:
abcdefgh ijklmnop
abcX ghiYZQRSTUVWXYZijklmnop
I think I actually like the result of the modified one better, but I was not
able to come up with a real-world case where it would make a difference.
Lukas
More information about the bazaar
mailing list