[MERGE] Faster diff on historical data

Aaron Bentley aaron.bentley at utoronto.ca
Thu Aug 9 18:42:56 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lukáš Lalinský wrote:
> I've tried profiling this with and without the C sequence matcher, and
> the times are quite interesting (this is for the second blender diff):
> 
> Python - get_grouped_opcodes - 2150 ticks
> C      - get_grouped_opcodes - 105 ticks

lsprof sometimes gives misleading results for native code, so I would
not rely on those figures.

> I'm really not sure if combining the line deltas from knits using Python
> would be any faster. But I already have some code for it, so I'll try.

Neat.  My approach would have been to extract the matching blocks from
the knit deltas, determine which areas were definitely common from that,
and then run sequence matches on the remaining blocks (which may or may
not be common).
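
A minimal sketch of that idea, under stated assumptions: a delta here is
a sorted, non-overlapping list of (start, end, new_lines) hunks in
old-text coordinates, which is a simplified stand-in rather than the
actual knit format, and difflib.SequenceMatcher stands in for the
patience matcher. The function names are made up for illustration.

```python
from difflib import SequenceMatcher


def blocks_from_delta(delta, old_len):
    """Split the old/new texts into untouched runs and rewritten gaps.

    delta: sorted, non-overlapping (start, end, new_lines) hunks, each
    meaning "replace old[start:end] with new_lines" (assumed format).
    Returns (common, gaps):
      common: (old_index, new_index, length) runs the delta left alone
      gaps:   (old_start, old_end, new_start, new_end) rewritten regions
    """
    common, gaps = [], []
    old_pos = new_pos = 0
    for start, end, new_lines in delta:
        if start > old_pos:
            # Lines between hunks are definitely common.
            common.append((old_pos, new_pos, start - old_pos))
        new_pos += start - old_pos
        gaps.append((start, end, new_pos, new_pos + len(new_lines)))
        new_pos += len(new_lines)
        old_pos = end
    if old_len > old_pos:
        common.append((old_pos, new_pos, old_len - old_pos))
    return common, gaps


def matching_blocks(old, new, delta):
    """Combine delta-derived blocks with matches found inside the gaps."""
    common, gaps = blocks_from_delta(delta, len(old))
    blocks = list(common)
    for o1, o2, n1, n2 in gaps:
        # Only the rewritten regions need a real sequence match.
        sm = SequenceMatcher(None, old[o1:o2], new[n1:n2], autojunk=False)
        for i, j, n in sm.get_matching_blocks():
            if n:  # skip the zero-length terminator block
                blocks.append((o1 + i, n1 + j, n))
    return sorted(blocks)
```

Adjacent blocks are not merged in this sketch; a caller that needs
maximal runs would have to coalesce them afterwards.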

> Blender (10677..10690):
> 
> bzr.dev:
> real = 0m4.743s, user = 0m4.568s, sys = 0m0.172s
> 
> fastdiff:
> real = 0m3.919s, user = 0m3.744s, sys = 0m0.172s
> 
> bzr.dev+cpatience:
> real = 0m3.560s, user = 0m3.384s, sys = 0m0.168s

So this cuts wall-clock time by about 25% (4.743s down to 3.560s).  Not
bad.  Okay, I'm convinced that the performance increase is real and
valuable.
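
For reference, a quick check of the "real" times quoted above.  The
helper names are only for illustration; "speedup" is old/new and
"time_saved" is the fraction of wall-clock time removed.

```python
def speedup(baseline, candidate):
    """Ratio of baseline time to candidate time (higher is faster)."""
    return baseline / candidate


def time_saved(baseline, candidate):
    """Fraction of the baseline time that the candidate removes."""
    return (baseline - candidate) / baseline


# Wall-clock seconds from the quoted Blender (10677..10690) runs.
real = {
    "bzr.dev": 4.743,
    "fastdiff": 3.919,
    "bzr.dev+cpatience": 3.560,
}

for name in ("fastdiff", "bzr.dev+cpatience"):
    print(name,
          round(speedup(real["bzr.dev"], real[name]), 2),
          round(time_saved(real["bzr.dev"], real[name]), 2))
```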

> fastdiff+cpatience:
> real = 0m2.713s, user = 0m2.540s, sys = 0m0.168s
> 
> (I've found an interesting bug in this one -- it was trying to diff a
> PDF file. It probably doesn't contain \0 so it treats it as a text. I
> think this would lead to quite bad results on merging.)

Yep.  I've seen that, too.  Silly PDF.
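
To illustrate why that happens, here is a sketch of a NUL-byte check of
the kind described above.  This is not bzr's actual code; the function
name and the probe size are assumptions.

```python
def looks_binary(data: bytes, probe_size: int = 8192) -> bool:
    """Guess that data is binary if a NUL byte appears near the start.

    probe_size is an assumed cutoff, not a value taken from bzr.
    """
    return b"\0" in data[:probe_size]


# A PDF whose leading bytes are all ASCII passes as "text".
pdf_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
# A PNG header has NUL bytes within its first few bytes.
png_head = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

print(looks_binary(pdf_head), looks_binary(png_head))
```

Any file that happens to have no NUL in the probed range gets diffed and
merged line by line, which is what goes wrong with the PDF.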

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGu1If0F+nu1YWqI0RAgNRAJwKvZqz7zHCjZagEsLDbawiPfUmLACeOb6H
3YyAGo7e6gELPFcxURGmJ54=
=9w+o
-----END PGP SIGNATURE-----


