[MERGE] Comparison cache to speed up diff

Aaron Bentley aaron.bentley at utoronto.ca
Fri Jul 20 20:39:41 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

This implements the first part of my diff-speedup plans: a cache of
SequenceMatcher.get_matching_blocks output.  This is useful because file
comparisons scale poorly as files grow, so recomputing them on every
diff is wasteful.
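
For readers unfamiliar with the call, here is a minimal sketch of what
get_matching_blocks returns.  It uses the standard library's
difflib.SequenceMatcher as a stand-in; the matcher bzr actually uses may
differ, and the sample lines are made up for illustration.

```python
from difflib import SequenceMatcher

# Made-up file contents, one list entry per line.
old_lines = ["a\n", "b\n", "c\n", "d\n"]
new_lines = ["a\n", "x\n", "c\n", "d\n"]

matcher = SequenceMatcher(None, old_lines, new_lines)

# Each block is (start_in_old, start_in_new, length).  The final
# zero-length block is a sentinel marking the end of both sequences.
blocks = [tuple(b) for b in matcher.get_matching_blocks()]
print(blocks)
```

This list of triples is the data being cached: once it is known, the
diff hunks can be produced without comparing the two files again.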

Each cached list of matching blocks is keyed by the working tree file's
SHA1 sum and the file's revision in the basis tree.

Caches are created when you run diff, reused the next time you run diff,
and cleared when you commit.
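
The keying and lifecycle described above can be sketched roughly as
follows.  This is not the code from the patch: ComparisonCache, its
method names, and the in-memory dict are all hypothetical, and difflib
again stands in for bzr's matcher.

```python
import hashlib
from difflib import SequenceMatcher


class ComparisonCache:
    """Hypothetical in-memory cache; not the class from the patch."""

    def __init__(self):
        self._blocks = {}
        self.misses = 0  # how many times a comparison was actually run

    def get_matching_blocks(self, basis_revision, basis_lines, working_lines):
        # Key: (SHA1 of the working tree text, file revision in the basis tree).
        text = "".join(working_lines).encode("utf-8")
        key = (hashlib.sha1(text).hexdigest(), basis_revision)
        if key not in self._blocks:
            self.misses += 1
            matcher = SequenceMatcher(None, basis_lines, working_lines)
            self._blocks[key] = [tuple(b) for b in matcher.get_matching_blocks()]
        return self._blocks[key]

    def clear(self):
        # Assumed to be called on commit, when the basis tree changes.
        self._blocks.clear()


cache = ComparisonCache()
basis = ["a\n", "b\n"]
working = ["a\n", "c\n"]
first = cache.get_matching_blocks("rev-1", basis, working)   # computed
second = cache.get_matching_blocks("rev-1", basis, working)  # reused
```

Editing the working file changes its SHA1 and therefore the key, so a
stale entry is never returned; it is simply left unused until the next
clear().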

I tested against a bzr.dev tree after doing "revert -r -50".

For vanilla bzr, the best diff time was:
real    0m4.538s
user    0m4.020s
sys     0m0.296s

When writing the cache for the first time, the best result was 1.3x slower:
real    0m6.114s
user    0m5.276s
sys     0m0.492s

When reusing the cached data, the best result was 1.7x faster than the
cache-writing run (about 1.3x faster than vanilla bzr):
real    0m3.556s
user    0m2.856s
sys     0m0.308s
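
The ratios can be checked from the "real" times quoted above.  The
variable names are only labels for this calculation.

```python
# "real" wall-clock times in seconds, copied from the three runs above.
vanilla = 4.538      # unmodified bzr
cold_cache = 6.114   # first run, writing the cache
warm_cache = 3.556   # later run, reusing the cache

cold_vs_vanilla = round(cold_cache / vanilla, 2)   # slowdown when writing
vanilla_vs_warm = round(vanilla / warm_cache, 2)   # speedup over vanilla
cold_vs_warm = round(cold_cache / warm_cache, 2)   # speedup over cold run
print(cold_vs_vanilla, vanilla_vs_warm, cold_vs_warm)
```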

As usual, file access time is a significant factor: according to
lsprof/kcachegrind, get_file takes 49.61 of 103.05 ticks (about 48%).

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGoQ990F+nu1YWqI0RAog1AJ9tlgmdYbgsOc+6Ii7o6/IP709PCACeIoul
9hxl4T7y8Mqv6EuOEj9M7Ac=
=vdzR
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comparison-cache.patch
Type: text/x-patch
Size: 31169 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070720/587d9e64/attachment-0001.bin 

