[MERGE][RFC] C implementation of PatienceSequenceMatcher
Lukáš Lalinský
lalinsky at gmail.com
Mon Jul 23 12:48:34 BST 2007
On Ne, 2007-07-22 at 09:05 -0700, Martin Pool wrote:
> You labeled this just 'rfc' - is this ready to be reviewed?
I had some time today and I think I've got the code into a state I'm
happy with. Attaching the code and a modified patience_test plugin I
used for "mass benchmarking".
The base of the algorithm is identical to the original Python
implementation. The only difference is in finding unique lines. The
Python implementation uses dicts for this and builds them on every
unique_lcs call. The C version pre-builds a hash table of lines from `b`
and their equivalents from `a`. Each entry in this table contains a
linked list of lines from `b` and their matching lines from `a`. Lines
from `a` which are not also in `b` are not added to the table at all.
Function unique_lcs then only goes through the hash table, checks how
many lines within the specified range is in the linked lists. If both
numbers are 1, it continues with the standard patience sorting
algorithm.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bzr-patiencediff-c.diff
Type: text/x-patch
Size: 85311 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070723/95b134fe/attachment-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: __init__.py
Type: text/x-python
Size: 4640 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070723/95b134fe/attachment-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Toto je =?ISO-8859-1?Q?digit=E1lne?=
=?ISO-8859-1?Q?_podp=EDsan=E1?= =?UTF-8?Q?_=C4=8Das=C5=A5?=
=?ISO-8859-1?Q?_spr=E1vy?=
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070723/95b134fe/attachment-0001.pgp
More information about the bazaar
mailing list