mbp at sourcefrog.net
Thu Apr 20 01:19:25 BST 2006
On 19 Apr 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> John Yates wrote:
> > Nonetheless the reference to exact line
> > identity makes me wonder whether the ideas described in this write-up
> > -- reconstructed from ancient memories -- of the matching algorithm in
> > DSEE and ClearCase have any relevance:
> This sounds very similar to the Patience sequence matcher which we will
> merge in 0.9.
> > http://www.abridgegame.org/pipermail/darcs-users/2005-April/006561.html
It's somewhat similar, and an interesting variation. The description
there doesn't say how you pick a line from the work list, or whether one
is just chosen at random. The patience-diff approach of finding the
longest common subsequence of the unique lines is probably good.
It has the same issue: it will eventually end up with regions
containing only lines that aren't unique within that region, and those
lines can't be matched this way. John's cdvdifflib adaptation called
difflib in this situation to finish them off.
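To make the approach concrete, here is a minimal Python sketch, assuming a simplified patience matcher: match lines that occur exactly once on each side, keep the longest run of those matches that is in order in both files, recurse into the gaps between them (where more lines may be locally unique), and hand any region with no unique lines to difflib, as cdvdifflib is described doing. The names `unique_lcs` and `patience_blocks` are hypothetical; this is not bzrlib's or cdvdifflib's code, and it omits refinements such as matching common leading and trailing lines first.

```python
import difflib
from bisect import bisect_left
from collections import Counter


def unique_lcs(a, b):
    """Pair up lines occurring exactly once in both a and b, then keep the
    longest subsequence of pairs increasing in both files (patience sort)."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate (i, j) pairs, already in increasing order of i.
    pairs = [(i, index_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in index_b]
    tails, tail_js = [], []      # smallest tail (and its j) per run length
    back = [None] * len(pairs)   # predecessor links to rebuild the run
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        back[p] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(p)
            tail_js.append(j)
        else:
            tails[k], tail_js[k] = p, j
    result = []
    p = tails[-1] if tails else None
    while p is not None:
        result.append(pairs[p])
        p = back[p]
    return result[::-1]


def patience_blocks(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Return matched (i, j) line pairs between a[alo:ahi] and b[blo:bhi]."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    if alo >= ahi or blo >= bhi:
        return []
    anchors = unique_lcs(a[alo:ahi], b[blo:bhi])
    if not anchors:
        # No line is unique on both sides of this region: fall back to difflib.
        sm = difflib.SequenceMatcher(None, a[alo:ahi], b[blo:bhi], autojunk=False)
        return [(alo + m.a + k, blo + m.b + k)
                for m in sm.get_matching_blocks() for k in range(m.size)]
    matches = []
    last_i, last_j = alo, blo
    for i, j in anchors:
        i, j = i + alo, j + blo
        # Lines not unique in the whole region may be unique inside this gap.
        matches.extend(patience_blocks(a, b, last_i, i, last_j, j))
        matches.append((i, j))
        last_i, last_j = i + 1, j + 1
    matches.extend(patience_blocks(a, b, last_i, ahi, last_j, bhi))
    return matches
```

For example, with `a = ["{", "foo", "}", "{", "bar", "}"]` and the same list with `"bar"` changed to `"baz"` as `b`, only `"foo"` is unique at the top level, but recursing into the gaps matches each brace, giving `[(0, 0), (1, 1), (2, 2), (3, 3), (5, 5)]`.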
> > My sense is that Aaron's concept of line identity is more one of object
> > identity.
> Right. When performing a sequence match, I'd prefer to give first
> priority to lines that are definitely the same line in both revisions,
> second priority to lines that have the same text in both revisions, and
> are unique in each, and third priority to non-unique lines.
But it's never quite "definitely the same line"; it's just
"looks like it's the same line when you take the evolution of the file
into account". Each of the steps we're looking at was itself worked out
by a diff-like heuristic deciding which lines were still the same. It
could still be well worthwhile.