[MERGE] Optimize single-file diff/revert/etc
John Arbash Meinel
john at arbash-meinel.com
Fri Jan 12 01:33:33 GMT 2007
Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>
>>> What I would love to see is the ability to push this sort of thing all the
>>> way down to the XML parsing layer, so it wouldn't have to create
>>> InventoryEntry objects for anything but the necessary paths.... But now
>>> is not the time for that. :)
>
> Well, this is just a stopgap until we have dirstate trees, anyhow. With
> a custom implementation of _iter_changes, these operations should scream.
>
>>> Have you done any performance testing to see if this actually makes a
>>> difference? It seems like it should, but I wonder if it is a large
>>> difference.
>
> I've done a little testing. On my biggest tree, it dropped single-file
> diff from 15s to 10s.
>
>>> I'm going to do a little performance testing on the Mozilla source tree
>>> (not a full conversion, just a single snapshot), I'll let you know what
>>> I find.
>
> Sounds good.
>
> Aaron
So a snapshot of the Mozilla source tree has ~50k files (bigger than a
kernel tree). With a stock bzr.dev I get:
20.008 bzr status
21.033 bzr status
20.686 bzr status
15.352 bzr status aclocal.m4
15.141 bzr status aclocal.m4
16.385 bzr status aclocal.m4
16.307 bzr status nsprpub/lib/prstreams/plvrsion.c
16.311 bzr status nsprpub/lib/prstreams/plvrsion.c
16.203 bzr status nsprpub/lib/prstreams/plvrsion.c
12.180 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
22.397 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
16.156 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
After Aaron's changes:
17.463 bzr status
22.765 bzr status
21.062 bzr status
13.536 bzr status aclocal.m4
14.700 bzr status aclocal.m4
12.364 bzr status aclocal.m4
13.961 bzr status nsprpub/lib/prstreams/plvrsion.c
16.935 bzr status nsprpub/lib/prstreams/plvrsion.c
12.679 bzr status nsprpub/lib/prstreams/plvrsion.c
14.033 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
13.888 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
11.064 bzr status aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
There is enough noise in whole-tree 'bzr status' that I can't say
whether Aaron's changes do anything there (they might add a slight
amount of overhead, but it is not obvious). For a single file, it is
pretty clear that they help, even if that file is in a sub-directory.
They don't help a lot, probably because the largest overhead is still
reading the whole inventory.
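To make the comparison less dependent on eyeballing the raw numbers, the 'bzr status' timings above can be averaged per case. This is only a rough sketch: the dictionary keys and the `summarize` helper are names made up for illustration, and a mean over three noisy runs is a weak estimate at best.

```python
from statistics import mean

# Wall-clock seconds copied from the 'bzr status' runs listed above.
# The keys are just labels chosen for this sketch.
STOCK = {
    "(whole tree)": [20.008, 21.033, 20.686],
    "aclocal.m4": [15.352, 15.141, 16.385],
    "plvrsion.c": [16.307, 16.311, 16.203],
    "both files": [12.180, 22.397, 16.156],
}
PATCHED = {
    "(whole tree)": [17.463, 22.765, 21.062],
    "aclocal.m4": [13.536, 14.700, 12.364],
    "plvrsion.c": [13.961, 16.935, 12.679],
    "both files": [14.033, 13.888, 11.064],
}


def summarize(stock, patched):
    """Return {case: (stock_mean, patched_mean, seconds_saved)}."""
    result = {}
    for case in stock:
        before = mean(stock[case])
        after = mean(patched[case])
        result[case] = (before, after, before - after)
    return result


SUMMARY = summarize(STOCK, PATCHED)

for case, (before, after, saved) in SUMMARY.items():
    print(f"{case:14s} {before:7.3f} -> {after:7.3f}  (saved {saved:+.3f}s)")
```

The "both files" case has one stock run of 22.397s next to one of 12.180s, so its mean should be read with particular caution.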
But since this does show us cutting 2-3 seconds off the time to do
status for a single file, I think it is definitely worth merging. For
PR purposes I would even like it to be merged into 0.14, but I realize
it doesn't usually fall under the standard definition of
"trivial/bugfix".
I chose bzr status instead of bzr diff because I thought they should
have the same effect, but I just tested and that isn't true: 'bzr status'
runs a second pass through the filesystem to check for unknowns, which
makes it much slower than 'bzr diff'.
Specifically (stock bzr.dev):
13.820 bzr diff
13.928 bzr diff
14.651 bzr diff
7.809 bzr diff aclocal.m4
7.965 bzr diff aclocal.m4
7.694 bzr diff aclocal.m4
8.096 bzr diff nsprpub/lib/prstreams/plvrsion.c
7.877 bzr diff nsprpub/lib/prstreams/plvrsion.c
8.063 bzr diff nsprpub/lib/prstreams/plvrsion.c
7.702 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
7.697 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
7.710 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
After Aaron's changes:
13.756 bzr diff
13.860 bzr diff
13.516 bzr diff
6.485 bzr diff aclocal.m4
6.692 bzr diff aclocal.m4
6.685 bzr diff aclocal.m4
6.653 bzr diff nsprpub/lib/prstreams/plvrsion.c
6.660 bzr diff nsprpub/lib/prstreams/plvrsion.c
6.461 bzr diff nsprpub/lib/prstreams/plvrsion.c
6.883 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
7.094 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
6.573 bzr diff aclocal.m4 nsprpub/lib/prstreams/plvrsion.c
I'm not sure exactly what is going on here, since it is starting to look
like 'bzr status' is doing whatever 'bzr diff' is doing 2 times (it is
seeing approx 2x the performance improvement, as well as taking 2x as
long overall).
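The "roughly 2x" observation can be checked against the single-file numbers for aclocal.m4. The sketch below only does arithmetic on the timings already listed; the variable names are invented for illustration, and it says nothing about why status behaves this way.

```python
from statistics import mean

# Seconds for 'bzr status aclocal.m4' and 'bzr diff aclocal.m4',
# copied from the runs listed above.
STATUS = {
    "stock": [15.352, 15.141, 16.385],
    "patched": [13.536, 14.700, 12.364],
}
DIFF = {
    "stock": [7.809, 7.965, 7.694],
    "patched": [6.485, 6.692, 6.685],
}

# How much longer status takes than diff on stock bzr.dev.
time_ratio = mean(STATUS["stock"]) / mean(DIFF["stock"])

# Seconds saved by the patch for each command.
status_saved = mean(STATUS["stock"]) - mean(STATUS["patched"])
diff_saved = mean(DIFF["stock"]) - mean(DIFF["patched"])

# How much larger the saving is for status than for diff.
saving_ratio = status_saved / diff_saved

print(f"status/diff time ratio:   {time_ratio:.2f}")
print(f"status/diff saving ratio: {saving_ratio:.2f}")
```

With only three runs per case, the saving ratio in particular is sensitive to noise in the status timings.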
Regardless, though, this does show a performance improvement for all of
the partial operations (single file and multiple file), even though
there is some noise in these measurements.
So +1 from me.
John
=:->