Compiled PatienceDiff 50% faster than python implementation
John Arbash Meinel
john at arbash-meinel.com
Mon May 29 02:43:21 BST 2006
I decided to play around with writing a compiled version of the
PatienceDiff algorithm. I ended up writing it as a plugin, which is
available from here:
http://bzr.arbash-meinel.com/plugins/bzr_extensions/
I use Boost.Python as the interfacing library, which lets you write nice
stuff that uses 'boost::python::object' or 'boost::python::list'
classes. And you can easily "extract<int>(mylist[0][1])", etc.
I slowly built up a lot of stuff, first just re-implementing unique_lcs
and then recurse_matches, until finally I wrote a single function call
which returns a list of matched regions.
I used the 'patience-test.py" script as my timing suite (which basically
grabs the texts from the repository, and makes sure that you can do a
patience diff from each entry to the next, and that the diff can be
applied with patch, and results in a the correct transformation).
Using just the first 20 knits in my repository, it spends 3.97 seconds
in 'internal_diff' with the basic python implementation. When using the
single C++ function call, that time drops to 2.6 seconds. Which is 65.5%
the time (or inversely the python implementation takes 152% or 52% longer).
Using all of the knits in my repository, the times are 174s for the
python implementation, and 120s for the compiled version. Which is 69%
faster or python is 145% slower.
While I've been at it, I also did a few performance tweaks for the
python implementation of PatienceDiff. I'll submit the merge request as
a separate email.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060528/be21ccea/attachment.pgp
More information about the bazaar
mailing list