[MERGE] Optimize single-file diff/revert/etc

John Arbash Meinel john at arbash-meinel.com
Fri Jan 12 01:33:33 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>
>>> What I would love to see is the ability to push this sort of thing all the
>>> way down to the XML parsing layer, so it wouldn't have to create
>>> InventoryEntry objects for anything but the necessary paths.... But now
>>> is not the time for that. :)
> 
> Well, this is just a stopgap until we have dirstate trees, anyhow.  With
> a custom implementation of _iter_changes, these operations should scream.
> 
>>> Have you done any performance testing to see if this actually makes a
>>> difference? It seems like it should, but I wonder if it is a large
>>> difference.
> 
> I've done a little testing.  On my biggest tree, it dropped single-file
> diff from 15s to 10s.
> 
>>> I'm going to do a little performance testing on the Mozilla source tree
>>> (not a full conversion, just a single snapshot), I'll let you know what
>>> I find.
> 
> Sounds good.
> 
> Aaron

So a snapshot of the Mozilla source tree has ~50k files (bigger than a
kernel tree). With a stock bzr.dev I get:

20.008 bzr status
21.033 bzr status
20.686 bzr status

15.352 bzr status aclocal.m4
15.141 bzr status aclocal.m4
16.385 bzr status aclocal.m4

16.307 bzr status nsprpub/lib/prstreams/plvrsion.c
16.311 bzr status nsprpub/lib/prstreams/plvrsion.c
16.203 bzr status nsprpub/lib/prstreams/plvrsion.c

12.180 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
22.397 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
16.156 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c


After Aaron's changes:
17.463 bzr status
22.765 bzr status
21.062 bzr status

13.536 bzr status aclocal.m4
14.700 bzr status aclocal.m4
12.364 bzr status aclocal.m4

13.961 bzr status nsprpub/lib/prstreams/plvrsion.c
16.935 bzr status nsprpub/lib/prstreams/plvrsion.c
12.679 bzr status nsprpub/lib/prstreams/plvrsion.c

14.033 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
13.888 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
11.064 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c


There is enough noise in whole-tree 'bzr status' that I can't say
whether Aaron's changes do anything there (they might add a slight
amount of overhead, but it is not obvious). For a single file, it is
pretty clear that they help, even if that file is in a sub-directory.
It doesn't help a lot (because the largest overhead is still probably
reading the whole inventory).
But since this does show us cutting 2-3 seconds off the time to do
status for a single file, I think it is definitely worth merging. And
for PR purposes I would even like it to be merged into 0.14, but I
realize it doesn't usually fall under the standard definition of
"trivial/bugfix".
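
As a rough cross-check of the "2-3 seconds" figure, here is a minimal
Python sketch that compares the median of each set of three 'bzr status'
runs before and after the change. The timings are copied from the runs
above; the dictionary keys, the helper name median_savings, and the
choice of the median (rather than the mean) to dampen the noisy runs are
assumptions made for illustration, not part of the original measurements.

```python
from statistics import median

# Wall-clock seconds copied from the 'bzr status' runs above.
# The short case labels are illustrative names, not bzr output.
status_stock = {
    "whole tree": [20.008, 21.033, 20.686],
    "aclocal.m4": [15.352, 15.141, 16.385],
    "plvrsion.c": [16.307, 16.311, 16.203],
    "both files": [12.180, 22.397, 16.156],
}
status_patched = {
    "whole tree": [17.463, 22.765, 21.062],
    "aclocal.m4": [13.536, 14.700, 12.364],
    "plvrsion.c": [13.961, 16.935, 12.679],
    "both files": [14.033, 13.888, 11.064],
}


def median_savings(before, after):
    """Return {case: median(before) - median(after)} in seconds."""
    return {
        case: round(median(before[case]) - median(after[case]), 3)
        for case in before
    }


savings = median_savings(status_stock, status_patched)
for case, seconds in savings.items():
    # Positive means the patched run was faster.
    print(f"{case:12s} {seconds:+.3f}s")
```

With only three samples per case this is indicative at best; the
whole-tree difference is well inside the run-to-run noise.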

I chose bzr status instead of bzr diff because I thought they should
have the same effect, but I just tested and that isn't true: 'bzr
status' runs a second pass through the filesystem to check for
unknowns, which makes it actually much slower than 'bzr diff'.

Specifically (stock bzr.dev):
13.820 bzr diff
13.928 bzr diff
14.651 bzr diff

 7.809 bzr diff aclocal.m4
 7.965 bzr diff aclocal.m4
 7.694 bzr diff aclocal.m4

 8.096 bzr diff nsprpub/lib/prstreams/plvrsion.c
 7.877 bzr diff nsprpub/lib/prstreams/plvrsion.c
 8.063 bzr diff nsprpub/lib/prstreams/plvrsion.c

 7.702 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
 7.697 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
 7.710 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c

After Aaron:
13.756 bzr diff
13.860 bzr diff
13.516 bzr diff

 6.485 bzr diff aclocal.m4
 6.692 bzr diff aclocal.m4
 6.685 bzr diff aclocal.m4

 6.653 bzr diff nsprpub/lib/prstreams/plvrsion.c
 6.660 bzr diff nsprpub/lib/prstreams/plvrsion.c
 6.461 bzr diff nsprpub/lib/prstreams/plvrsion.c

 6.883 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
 7.094 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
 6.573 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c

I'm not sure exactly what is going on here, since it is starting to look
like 'bzr status' is doing whatever 'bzr diff' is doing 2 times (it is
seeing approx 2x the performance improvement, as well as taking 2x as
long overall).
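
The "2x as long" observation can be checked against the stock numbers
with a short Python sketch. The timings are copied from the stock
bzr.dev runs above; the case labels and the use of the median of three
runs are assumptions made for illustration.

```python
from statistics import median

# Stock bzr.dev timings (seconds) for the partial operations,
# copied from the 'bzr status' and 'bzr diff' runs above.
stock_status = {
    "aclocal.m4": [15.352, 15.141, 16.385],
    "plvrsion.c": [16.307, 16.311, 16.203],
    "both files": [12.180, 22.397, 16.156],
}
stock_diff = {
    "aclocal.m4": [7.809, 7.965, 7.694],
    "plvrsion.c": [8.096, 7.877, 8.063],
    "both files": [7.702, 7.697, 7.710],
}

# Ratio of median 'bzr status' time to median 'bzr diff' time per case.
ratios = {
    case: round(median(stock_status[case]) / median(stock_diff[case]), 2)
    for case in stock_diff
}

for case, ratio in ratios.items():
    print(f"{case:12s} status/diff = {ratio:.2f}")
```

For all three partial cases the ratio comes out close to 2, which is
consistent with the observation above but does not by itself explain
where the extra time goes.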

Regardless, though, this does show a performance improvement for all of
the partial operations (single file and multiple file), even though
there is some noise in these measurements.

So +1 from me.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFpuVtJdeBCYSNAAMRAkkrAKCBoOfExS7pabduNZJphJ5GndDWmACfSaYg
LMaj95iGT22mLgcIcMgw7Gk=
=9tb7
-----END PGP SIGNATURE-----


