[merge] More tweaks to PatienceDiff

Mon Jun 5 14:39:14 BST 2006

On 28 May 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Just a few cases where it was recursing only to back out right away, or
> extracting the last entry in a list, when the parent already knew the
> answer.
> 
> Also, since I now have a better feeling for PatienceDiff, I realize that
> we don't need to do Difflib matching in between. As you may recall,
> patience diff doesn't match common lines surrounded by unmatching text.
> However, it turns out that this isn't as big a deal as we thought,
> because lines are only missed if they are repeated inside the unmatched
> portion. Patience diff does a unique matching over the whole text, and
> then a unique matching inside each unmatched section, which means that
> while this won't get matched "aBccDe" vs "abccde", this will get
> matched: "aBcdEfGcdHi" versus "abcdefgcdhi". The 'cd' sections are not
> common lines when viewed from the whole text, but they are unique lines
> when viewed as the chunks between unique lines. (the 'f' acts as a
> breaking point, and 'aBcdEf' => 'abcdef' match without any problem).

That's good to hear; I'd realized that previously and so didn't properly
understand why we needed to fall back to the other one.  If you really
do have a region where there are no lines that are common and unique on
both sides then I don't think just choosing one is great.
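
For anyone following along, the recursion John describes can be sketched
roughly as below. This is only an illustration, not the code from the
branch: the names unique_lcs, recurse_matches and patience_matches are
made up for the example, and each character of a string stands in for a
line of text. It anchors on elements that occur exactly once on both
sides, then repeats the same step inside every gap between anchors.

```python
from bisect import bisect_left
from collections import Counter


def unique_lcs(a, b):
    """Pair up elements that occur exactly once in both a and b, keeping
    the longest run of pairs that is in order on both sides."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {x: j for j, x in enumerate(b) if count_b[x] == 1}
    pairs = [(i, index_b[x]) for i, x in enumerate(a)
             if count_a[x] == 1 and x in index_b]
    # Longest increasing subsequence on the b-indices (patience sorting).
    tails, tail_js = [], []      # best chain ends, one per chain length
    prev = [None] * len(pairs)   # back-pointers to rebuild the chain
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        prev[n] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(n)
            tail_js.append(j)
        else:
            tails[k] = n
            tail_js[k] = j
    result = []
    n = tails[-1] if tails else None
    while n is not None:
        result.append(pairs[n])
        n = prev[n]
    result.reverse()
    return result


def recurse_matches(a, b, alo, blo, ahi, bhi, out):
    """Match a[alo:ahi] against b[blo:bhi], appending (i, j) to out."""
    last_a, last_b = alo, blo
    for i, j in unique_lcs(a[alo:ahi], b[blo:bhi]):
        i += alo
        j += blo
        # Lines that were not unique in the whole range may be unique
        # inside the gap before this anchor.
        recurse_matches(a, b, last_a, last_b, i, j, out)
        out.append((i, j))
        last_a, last_b = i + 1, j + 1
    if last_a > alo or last_b > blo:
        # At least one anchor was found, so the tail is a smaller problem.
        recurse_matches(a, b, last_a, last_b, ahi, bhi, out)
    # With no anchors at all the region is simply left unmatched.


def patience_matches(a, b):
    out = []
    recurse_matches(a, b, 0, 0, len(a), len(b), out)
    return out


def matched_text(a, b):
    return "".join(a[i] for i, _ in patience_matches(a, b))


print(matched_text("aBccDe", "abccde"))            # ae
print(matched_text("aBcdEfGcdHi", "abcdefgcdhi"))  # acdfcdi
```

In the first pair the repeated 'c' is never unique in any sub-range, so
only 'a' and 'e' match. In the second, 'f' splits the text so that each
'cd' becomes unique within its own chunk. A region with no common unique
lines is left unmatched here rather than handed to another matcher.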

> Attached is a diff relative to what has already been approved. It has
> not been submitted, though, since the pqm won't accept changing the
> name of the SequenceMatcher.
> 
> All changes are present on my jam-integration branch.

+1 from me.

-- 
Martin