KnitSequenceMatcher a net performance loss
John Arbash Meinel
john at arbash-meinel.com
Mon May 29 03:32:11 BST 2006
I'm not sure why this is, but as part of my performance testing of
PatienceDiff, I included a test of KnitSequenceMatcher. What I found
is that KnitSequenceMatcher is actually slower than difflib's plain
SequenceMatcher.
I turned the patience-test.py script into a bzr plugin, available here:
http://bzr.arbash-meinel.com/plugins/patience_test
And this is what I found:
for 20 knits:
  pdiff  time:  2.82s  1553511 bytes
  cpdiff time:  2.39s  1553511 bytes
  kdiff  time:  3.45s  1550155 bytes
  diff   time:  2.75s  1550155 bytes

for 224 knits:
  pdiff  time: 47.61s  106.2% (relative to difflib time)
  cpdiff time: 39.92s   89.0%
  kdiff  time: 54.12s  122.7%
  diff   time: 44.85s  100.0%
I'm running a complete test, but it hasn't finished yet. Still, this shows
that my modified Python PatienceSequenceMatcher (pdiff) runs within a
few percent of difflib's SequenceMatcher (diff): 48s vs 45s. The compiled
matcher (cpdiff) runs about 11% faster than difflib, but the knit sequence
matcher (kdiff) runs more than 20% slower.
So I would recommend that we go ahead and switch to
PatienceSequenceMatcher for knits. The Python version isn't quite as fast
as difflib, but at least we get some better line annotations out of it.
Now, I don't know how the knit sequence matcher was performance tested,
so I might be doing something weird. But I'm measuring at a pretty high
level, so I think the comparison is valid.
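For anyone who wants to try the same kind of high-level comparison, here is
a minimal sketch of the idea. It is not the patience_test plugin itself:
time_matcher and relative_percent are hypothetical helper names, and only
difflib.SequenceMatcher is exercised, on the assumption that the other
matchers accept the same (isjunk, a, b) constructor arguments.

```python
import difflib
import time


def time_matcher(matcher_class, text_pairs):
    """Diff every (old_lines, new_lines) pair with matcher_class.

    Returns (elapsed_seconds, total_matching_blocks). Assumes the class
    takes (isjunk, a, b) like difflib.SequenceMatcher does.
    """
    start = time.perf_counter()
    blocks = 0
    for old_lines, new_lines in text_pairs:
        matcher = matcher_class(None, old_lines, new_lines)
        blocks += len(matcher.get_matching_blocks())
    return time.perf_counter() - start, blocks


def relative_percent(seconds, baseline_seconds):
    """Express a timing as a percentage of the baseline (difflib) timing."""
    return 100.0 * seconds / baseline_seconds


if __name__ == "__main__":
    # Toy input; a real run would feed in the texts stored in the knits.
    pairs = [(["a\n", "b\n", "c\n"], ["a\n", "x\n", "c\n"])] * 1000
    base_time, _ = time_matcher(difflib.SequenceMatcher, pairs)
    print("diff time: %.2fs %.1f%%"
          % (base_time, relative_percent(base_time, base_time)))
```

Other matcher classes could be passed to time_matcher in the same way and
reported against the difflib baseline.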
John
=:->