KnitSequenceMatcher a net performance loss

Robert Collins robertc at robertcollins.net
Mon May 29 07:53:06 BST 2006


On Sun, 2006-05-28 at 21:32 -0500, John Arbash Meinel wrote:
> I'm not sure why it is, but in my work on performance testing
> PatienceDiff, I include a test of the KnitSequenceMatcher. And what I
> found was that KnitSequenceMatcher is actually slower than difflib's
> plain sequence matcher.
> 
> I turned the patience-test.py script into a bzr plugin, available here:
> http://bzr.arbash-meinel.com/plugins/patience_test
> 
> And this is what I found:
> 
> for 20 knits:
> pdiff time: 2.82s       1553511 bytes
> cpdiff time: 2.39s      1553511 bytes
> kdiff time: 3.45s       1550155 bytes
> diff time: 2.75s        1550155 bytes
> 
> for 224 knits:
> pdiff time: 47.61s	106.2% (relative to difflib time)
> cpdiff time: 39.92s	 89.0%
> kdiff time: 54.12s	122.7%
> diff time: 44.85s	100.0%
> 
> I'm running a complete test, but it hasn't finished yet. Still, this shows
> that my modified Python PatienceSequenceMatcher runs within a
> few percent of difflib's SequenceMatcher (48 vs 45s). The compiled
> matcher runs much faster, but the knit sequence matcher runs much slower.
> 
> So I would recommend that we go ahead and switch to
> PatienceSequenceMatcher for knits. It isn't as fast as difflib, but at
> least we get some better line annotations out of it.
> 
> Now, I don't know how the sequence matcher was performance tested, so I
> might be doing something weird. But I'm doing it at a pretty high level,
> so I think it is valid.

Which difflib are you using? Perhaps the implementation I copied was a
much slower one.
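
To make that comparison concrete, here is a rough sketch of how one might
report which difflib is loaded and time its SequenceMatcher on line data.
This is not the patience_test plugin; the helper names (make_lines,
time_matcher, describe_difflib) and the synthetic input are assumptions for
illustration only, and real knit texts will behave differently.

```python
import difflib
import sys
import time


def make_lines(n, step):
    """Build n synthetic lines; every `step`-th line differs in the new text."""
    old = ["line %d\n" % i for i in range(n)]
    new = ["changed %d\n" % i if i % step == 0 else line
           for i, line in enumerate(old)]
    return old, new


def time_matcher(matcher_class, old, new, repeats=3):
    """Return (best wall-clock seconds, matching blocks) for a matcher class.

    matcher_class is assumed to follow the difflib.SequenceMatcher
    constructor signature (isjunk, a, b).
    """
    best = None
    blocks = None
    for _ in range(repeats):
        start = time.perf_counter()
        blocks = matcher_class(None, old, new).get_matching_blocks()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best, blocks


def describe_difflib():
    """Report where difflib was imported from and the Python version."""
    return "difflib from %s (Python %s)" % (
        difflib.__file__, sys.version.split()[0])


if __name__ == "__main__":
    old, new = make_lines(2000, 10)
    seconds, blocks = time_matcher(difflib.SequenceMatcher, old, new)
    print(describe_difflib())
    print("diff time: %.3fs, %d matching blocks" % (seconds, len(blocks)))
```

Any other matcher class with the same constructor and a
get_matching_blocks() method could be passed to time_matcher in place of
difflib.SequenceMatcher, so the timings come from identical input.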

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.