Compiled PatienceDiff 50% faster than python implementation

John Arbash Meinel john at arbash-meinel.com
Mon May 29 02:43:21 BST 2006


I decided to play around with writing a compiled version of the
PatienceDiff algorithm. I ended up writing it as a plugin, which is
available from here:
http://bzr.arbash-meinel.com/plugins/bzr_extensions/

I use Boost.Python as the interfacing library, which lets you write nice
stuff that uses 'boost::python::object' or 'boost::python::list'
classes. And you can easily "extract<int>(mylist[0][1])", etc.

I slowly built up a lot of stuff, first just re-implementing unique_lcs
and then recurse_matches, until finally I wrote a single function call
which returns a list of matched regions.

I used the 'patience-test.py" script as my timing suite (which basically
grabs the texts from the repository, and makes sure that you can do a
patience diff from each entry to the next, and that the diff can be
applied with patch, and results in a the correct transformation).

Using just the first 20 knits in my repository, it spends 3.97 seconds
in 'internal_diff' with the basic python implementation. When using the
single C++ function call, that time drops to 2.6 seconds. Which is 65.5%
the time (or inversely the python implementation takes 152% or 52% longer).

Using all of the knits in my repository, the times are 174s for the
python implementation, and 120s for the compiled version. Which is 69%
faster or python is 145% slower.

While I've been at it, I also did a few performance tweaks for the
python implementation of PatienceDiff. I'll submit the merge request as
a separate email.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060528/be21ccea/attachment.pgp 


More information about the bazaar mailing list