[PACKS] Performance opportunities.

Robert Collins robertc at robertcollins.net
Thu Aug 30 09:48:37 BST 2007


I haven't made these changes yet, but I've been profiling where the time
goes during commit. All timings are from my laptop; the tree is an
export of the HEAD of the mozilla sample tree we converted to bzr some
time back: 550MB of data, 55K files. The baseline is 4m3s of user time
and 4m23s of real (wall-clock) time.

So far, three things stand out as costs we don't /need/ to pay.

First, we store gzip files in the pack rather than raw zlib streams. The
difference is large: compressing a 550MB tar takes 86 seconds with
GzipFile, but only 38 seconds with a trivial implementation that uses
zlib directly (the gzip command-line tool takes 36 seconds). We could
possibly fix up GzipFile, but I have looked closely at it before, and
I'm not convinced it's worth doing: packs are not zcat-able anyway,
unlike .knit files, which had no delimiters between gzip objects. So we
need our own debug tools regardless. A rough-and-ready change to use
zlib directly shaved 30s off commit.
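To make the gzip-vs-zlib difference concrete, here is a minimal Python sketch (not bzr's pack code; the function names and the sample workload are made up for illustration). Both paths use the same deflate level, so the only difference is the wrapper: GzipFile writes a complete gzip member per record (10-byte header, 8-byte CRC32/length trailer) through a file-like object, while zlib.compress emits a 2-byte header and a 4-byte Adler-32 trailer in one call. The timing loop only prints numbers for your own machine; absolute results will vary by machine and Python version.

```python
import gzip
import io
import time
import zlib

LEVEL = 6  # same deflate level for both, so only the wrapper differs


def compress_gzipfile(data: bytes) -> bytes:
    """Write one complete gzip member for the record using gzip.GzipFile."""
    buf = io.BytesIO()
    # mtime=0 makes the header deterministic; BytesIO has no .name, so no
    # filename field is written. Closing the GzipFile leaves buf open.
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=LEVEL, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


def decompress_gzipfile(blob: bytes) -> bytes:
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        return gz.read()


def compress_zlib(data: bytes) -> bytes:
    """Raw zlib stream: 2-byte header, deflate data, 4-byte Adler-32."""
    return zlib.compress(data, LEVEL)


def decompress_zlib(blob: bytes) -> bytes:
    return zlib.decompress(blob)


def bench(compress, records, repeat=3):
    """Best-of-N wall-clock time to compress every record once."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for rec in records:
            compress(rec)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical workload: many small file texts, as in an initial commit.
    records = [(b"file %d: some source text\n" % n) * 50 for n in range(5000)]
    for name, fn in (("GzipFile", compress_gzipfile), ("zlib", compress_zlib)):
        print(f"{name:8s} {bench(fn, records):.3f}s")
```

Because each record is independent, a pack reader only needs the record boundaries plus zlib.decompress, which is also all a custom debug dump tool would need.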

Second, we SHA the working tree twice on an initial commit (bzr init;
bzr add; bzr commit) because every file is a cache miss. That's only
~3 seconds, but 3 seconds out of 263 (the 4m23s real time) is still
more than 1%.
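One general way to avoid hashing a file twice is to remember each SHA-1 keyed by a stat fingerprint, so a later pass can reuse it when the file hasn't changed. The ShaCache class below is a hypothetical sketch, not bzr's working-tree/dirstate code; the fingerprint fields (size, mtime in nanoseconds, inode) are an assumption.

```python
import hashlib
import os
import tempfile


class ShaCache:
    """Hypothetical stat-keyed SHA-1 cache (not bzr's dirstate code)."""

    def __init__(self):
        self._entries = {}  # path -> (fingerprint, hex sha1)
        self.hashed = 0     # how many times file contents were actually read

    @staticmethod
    def _fingerprint(st):
        # Assumed fingerprint: if none of these changed, treat the file as
        # unchanged. A real cache must also distrust files modified within the
        # last second or so, because coarse mtime resolution can hide a rewrite.
        return (st.st_size, st.st_mtime_ns, st.st_ino)

    def get_sha1(self, path):
        fp = self._fingerprint(os.stat(path))
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fp:
            return cached[1]  # hit: no need to read the file again
        digest = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        self.hashed += 1
        sha = digest.hexdigest()
        self._entries[path] = (fp, sha)
        return sha


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        cache = ShaCache()
        cache.get_sha1(path)  # first pass: reads and hashes the file
        cache.get_sha1(path)  # second pass: served from the cache
        print("files hashed:", cache.hashed)
```

The first pass still has to read every file on an initial commit; the saving is only that the second pass becomes a stat plus a dictionary lookup.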

Third, the way we store annotations currently has considerable
overhead. Switching our knit storage to the PlainFactory rather than the
Annotated one saves 30 seconds.

So I have a prototype branch (it doesn't convert data, so it's not yet
interoperable with anything) where commit takes:

no-anno, zlib direct:
real    3m24.990s
user    3m1.215s
sys     0m11.377s

no anno:
real    3m50.336s
user    3m34.897s
sys     0m10.941s

baseline (my normal packs branch off of bzr.dev):
real    4m23.884s
user    4m3.963s
sys     0m11.649s

That's a 25% saving in user time over the baseline.
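The savings can be checked directly from the `time` output above. This small Python sketch parses the `XmYs` strings and computes relative reductions; the RUNS dictionary simply transcribes the three runs.

```python
import re

# Transcribed from the three `time` outputs above.
RUNS = {
    "baseline": {"real": "4m23.884s", "user": "4m3.963s", "sys": "0m11.649s"},
    "no anno": {"real": "3m50.336s", "user": "3m34.897s", "sys": "0m10.941s"},
    "no-anno, zlib direct": {"real": "3m24.990s", "user": "3m1.215s", "sys": "0m11.377s"},
}


def parse_time(value):
    """Convert bash `time` output such as '4m3.963s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def saving(kind, before="baseline", after="no-anno, zlib direct"):
    """Fractional reduction of `kind` ('real', 'user' or 'sys') from before to after."""
    old = parse_time(RUNS[before][kind])
    new = parse_time(RUNS[after][kind])
    return (old - new) / old


if __name__ == "__main__":
    for kind in ("user", "real"):
        print(f"{kind}: {saving(kind):.1%} saved")
```

This gives about 25.7% for user time (243.963s down to 181.215s) and 22.3% for real time. The intermediate "no anno" run splits the user-time saving into roughly 29.1s for dropping annotations and 33.7s for switching to zlib, in line with the ~30s estimates above.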

Concretely, I plan to switch to using zlib directly in packs. I'll also
look at making the annotation cache separate and able to be disabled.
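A hypothetical sketch of what a separate, disable-able annotation cache could look like: texts are stored plain, annotations are derived on demand by matching each version's lines against its predecessor, and the cache only memoizes results when enabled. The names are made up, the matching uses difflib.SequenceMatcher, and the history is assumed to be linear (real bzr history is a graph with merges, which this ignores).

```python
import difflib


def annotate_linear(history):
    """history: [(revid, lines), ...] oldest first.
    Returns [(origin_revid, line), ...] for the newest version."""
    annotated = []
    for revid, lines in history:
        prev_lines = [line for _, line in annotated]
        matcher = difflib.SequenceMatcher(None, prev_lines, lines, autojunk=False)
        # Start by attributing every line to this revision...
        new = [(revid, line) for line in lines]
        # ...then carry over the origin of each line matched in the parent.
        for i, j, size in matcher.get_matching_blocks():
            new[j:j + size] = annotated[i:i + size]
        annotated = new
    return annotated


class AnnotationCache:
    """Optional memo of annotations, kept apart from text storage."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._cache = {}
        self.computed = 0  # how many times annotations were recomputed

    def annotate(self, key, history):
        if self.enabled and key in self._cache:
            return self._cache[key]
        self.computed += 1
        result = annotate_linear(history)
        if self.enabled:
            self._cache[key] = result
        return result


if __name__ == "__main__":
    history = [("r1", ["a\n", "b\n"]), ("r2", ["a\n", "c\n", "b\n"])]
    for origin, line in AnnotationCache().annotate("r2", history):
        print(origin, line, end="")
```

With this split, commit never computes annotations; only annotate pays, and with the cache on, only once per file version.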

I'm looking for critiques and 'good idea' / 'bad idea' comments on this,
as well as suggestions for other things we can do in the short time
remaining before I need to start solidifying packs for 0.92, which is
when I'd like to release the first user-exposed format.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

More information about the bazaar mailing list