[MERGE] Faster diff on historical data

Lukáš Lalinský lalinsky at gmail.com
Thu Aug 9 18:29:24 BST 2007


On Wed, 2007-08-08 at 13:47 -0400, Aaron Bentley wrote:
> > People usually use such diffs only on nearby
> > revisions, so the knit extraction involves a lot of duplicated work,
> > e.g. instead of taking the file at revision 10 and applying one delta,
> > it takes revision 1 and unpacks/applies 10 deltas.
> 
> It's worse than that, because those intermediate deltas could be used to
> speed up the computation of the desired delta.  There's discussion of
> this in my diff analysis: doc/developers/diff.txt
>
> That document slightly predates your C sequence matcher, so I'm not
> clear whether the performance enhancement from reusing those comparisons
> will be an advantage when C sequence matching is merged.

I've tried profiling this with and without the C sequence matcher, and
the times are quite interesting (this is for the second blender diff):

Python - get_grouped_opcodes - 2150 ticks
C      - get_grouped_opcodes - 105 ticks

Given how fast the C matcher is, I'm really not sure that combining the
line deltas from knits in Python would be any faster. But I already have
some code for it, so I'll try.
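
For readers unfamiliar with the call being profiled, here is a minimal
sketch of what get_grouped_opcodes returns. It uses the standard library's
difflib.SequenceMatcher as a stand-in (an assumption for illustration; it
is not the patience matcher measured above), and the sample lines are made
up.

```python
import difflib

# Two tiny made-up "file versions" as lists of lines.
old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "B\n", "c\n", "d\n"]

# difflib's matcher is used here only as a stand-in for illustration.
matcher = difflib.SequenceMatcher(None, old, new)

# Each group is a list of (tag, i1, i2, j1, j2) opcodes with up to
# n lines of context around the changed region.
groups = list(matcher.get_grouped_opcodes(n=1))

for group in groups:
    for tag, i1, i2, j1, j2 in group:
        print(tag, old[i1:i2], new[j1:j2])
```

Each group corresponds to one hunk of a unified diff, so the cost of this
call is essentially the cost of the sequence matching itself.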

> > The attached patch lets it extract both files in one go. It isn't a
> > big win, but it makes diff -r before:x..x on bzr.dev in my repository
> > about 0.1 seconds faster (~10% of the total diff time).
> 
> Doing this for a 0.1 second improvement doesn't seem worthwhile.  Can
> you get numbers for a larger project, where the impact is likely to be
> bigger?

I wanted to convert the linux-2.6 hg repository to bzr, but it took so
long to clone that I cancelled it. Here are some numbers for a bigger
bzr merge-diff and two blender diffs:


Bazaar (2650..2651):

bzr.dev:
real = 0m1.478s, user = 0m1.380s, sys = 0m0.100s

bzr.dev+cpatience:
real = 0m1.223s, user = 0m1.124s, sys = 0m0.100s

fastdiff:
real = 0m1.166s, user = 0m1.056s, sys = 0m0.108s

fastdiff+cpatience:
real = 0m0.987s, user = 0m0.904s, sys = 0m0.084s


Blender (10677..10678):

bzr.dev:
real = 0m2.309s, user = 0m2.184s, sys = 0m0.124s

bzr.dev+cpatience:
real = 0m2.083s, user = 0m1.956s, sys = 0m0.124s

fastdiff:
real = 0m1.954s, user = 0m1.848s, sys = 0m0.108s

fastdiff+cpatience:
real = 0m1.721s, user = 0m1.592s, sys = 0m0.128s


Blender (10677..10690):

bzr.dev:
real = 0m4.743s, user = 0m4.568s, sys = 0m0.172s

fastdiff:
real = 0m3.919s, user = 0m3.744s, sys = 0m0.172s

bzr.dev+cpatience:
real = 0m3.560s, user = 0m3.384s, sys = 0m0.168s

fastdiff+cpatience:
real = 0m2.713s, user = 0m2.540s, sys = 0m0.168s
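
To put these timings in relative terms, here is a small sketch that
computes the wall-clock reduction of each variant against plain bzr.dev.
The numbers are the "real" values quoted above; the dictionary layout and
the function name are just for illustration.

```python
# 'real' times in seconds, copied from the measurements above.
TIMINGS = {
    "Bazaar 2650..2651": {
        "bzr.dev": 1.478, "bzr.dev+cpatience": 1.223,
        "fastdiff": 1.166, "fastdiff+cpatience": 0.987,
    },
    "Blender 10677..10678": {
        "bzr.dev": 2.309, "bzr.dev+cpatience": 2.083,
        "fastdiff": 1.954, "fastdiff+cpatience": 1.721,
    },
    "Blender 10677..10690": {
        "bzr.dev": 4.743, "bzr.dev+cpatience": 3.560,
        "fastdiff": 3.919, "fastdiff+cpatience": 2.713,
    },
}


def reduction_percent(baseline, value):
    """Return how much shorter `value` is than `baseline`, in percent."""
    return (1.0 - value / baseline) * 100.0


for name, runs in TIMINGS.items():
    base = runs["bzr.dev"]
    for variant, secs in runs.items():
        if variant != "bzr.dev":
            print("%-22s %-20s %5.1f%%"
                  % (name, variant, reduction_percent(base, secs)))
```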

(I found an interesting bug in the 10677..10690 diff: it was trying to
diff a PDF file. The file probably doesn't contain \0, so it is treated
as text. I think this would lead to quite bad results when merging.)
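
A minimal sketch of why a check like that would miss such a file. It
assumes the text/binary decision is only a NUL-byte test, which is my
guess above and not a confirmed description of the actual code; the
function name and the sample bytes are hypothetical.

```python
def looks_binary(data):
    """Hypothetical heuristic: treat content as binary only if it has a NUL byte."""
    return b"\0" in data


# A made-up PDF-like fragment: not human-readable text in any useful
# sense, but it happens to contain no NUL bytes.
pdf_like = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

# Content with an embedded NUL byte, which the heuristic does catch.
with_nul = b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"

print(looks_binary(pdf_like))   # the PDF-like data is classified as text
print(looks_binary(with_nul))   # the NUL byte marks this one as binary
```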
