[PACKS] Performance opportunities.
Ian Clatworthy
ian.clatworthy at internode.on.net
Fri Aug 31 00:52:45 BST 2007
Robert Collins wrote:
> One is having gzip files in the pack, rather than raw zlib. There is a
> massive difference here - GzipFile takes 86 seconds, using zlib directly
> in a trivial implementation takes 38 seconds, to compress a 550MB tar. (the
> gzip command line takes 36 seconds). Possibly we can fix up GzipFile,
> but I have looked closely at it before, so I'm not convinced that it's
> worth doing this - packs are not zcattable, unlike .knit files which had
> no delimiters between gzip objects. So we need our own debug tools
> anyway. A rough and ready change to this shaved 30s off commit.
If it's easy to do, I like John's idea of making this parameterised in
your experimental branch. It would be interesting to see the differences
across trees of various sizes.
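
For anyone who wants to try the comparison outside bzr, a rough sketch of a
parameterised benchmark might look like the following. It assumes a modern
Python 3 interpreter. The function names, the COMPRESSORS table and the
synthetic input are made up for illustration and are not taken from Robert's
branch, so absolute timings will not match the figures quoted above.

```python
import gzip
import io
import time
import zlib


def compress_gzipfile(data, level=6):
    """Compress through gzip.GzipFile, producing a gzip-framed stream."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as f:
        f.write(data)
    return buf.getvalue()


def compress_zlib(data, level=6):
    """Compress with zlib directly, producing a plain zlib stream."""
    return zlib.compress(data, level)


# Hypothetical switch: pick the compressor by name so runs are comparable.
COMPRESSORS = {"gzip": compress_gzipfile, "zlib": compress_zlib}


def time_compressor(name, data, level=6):
    """Return (elapsed seconds, compressed size) for one compressor."""
    start = time.perf_counter()
    out = COMPRESSORS[name](data, level)
    return time.perf_counter() - start, len(out)


def sample_data(size):
    """Synthetic, fairly compressible input; a stand-in for real tree data."""
    line = b"def example(self, revision_id): return self._index[revision_id]\n"
    return (line * (size // len(line) + 1))[:size]


if __name__ == "__main__":
    for size in (64 * 1024, 1024 * 1024):
        data = sample_data(size)
        for name in sorted(COMPRESSORS):
            elapsed, out_len = time_compressor(name, data)
            print(f"{name:5s} {size:8d} bytes -> {out_len:7d} bytes in {elapsed:.4f}s")
```

Feeding it real file texts from trees of different sizes, rather than the
synthetic input, would give a better picture of what commit actually sees.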
> Secondly we sha the working tree twice on an initial commit (bzr init;
> bzr add; bzr commit) because everything is a miss - that's only
> ~3 seconds, but 3 seconds on a 263 is still > 1%.
I've responded to this one in another email.
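
To illustrate the general idea of hashing each file only once, here is a
minimal sketch of a cache keyed on file size and mtime. This is not bzr's
actual hash cache code: the Sha1Cache class and the demo() helper are
hypothetical names used only for this example.

```python
import hashlib
import os
import tempfile


class Sha1Cache:
    """Hypothetical cache: remembers a file's SHA-1 keyed on (size, mtime)."""

    def __init__(self):
        self._entries = {}      # path -> ((size, mtime_ns), hex digest)
        self.files_hashed = 0   # how many times file content was really read

    def get_sha1(self, path):
        st = os.stat(path)
        fingerprint = (st.st_size, st.st_mtime_ns)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == fingerprint:
            return entry[1]     # hit: reuse the stored digest
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        self.files_hashed += 1
        self._entries[path] = (fingerprint, digest)
        return digest


def demo():
    """Hash the same file twice; the second call should be a cache hit."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "hello.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        cache = Sha1Cache()
        first = cache.get_sha1(path)
        second = cache.get_sha1(path)
        return first, second, cache.files_hashed
```

A real cache would also need to be careful with files modified within the
filesystem's timestamp granularity, which this sketch ignores.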
> Thirdly the way we store annotations has quite some overhead at the
> moment. Turning our knit storage to use the PlainFactory rather than the
> Annotated one saves 30 seconds.
That is a big difference! Wow.
> So I have a prototype branch (it doesn't convert data, so it's quite
> un-interoperable as yet) where I have commit at:
[snip]
> Concretely, I plan to switch to using zlib directly in packs. I'll also
> look at making the annotation cache be separate and disable-able.
Ping me on IRC when you have a branch I can play with. I'll merge my
other commit changes and see what the total effect is.
> I'm looking for critiques and 'good idea', 'bad idea' comments on this.
> As well as suggestions for other things we can do in the short time
> remaining before I'll need to start solidifying packs for 0.92 - when I'd
> like to release the first user-exposed format.
Thanks for focussing on this. Commit performance is something we
frequently get benchmarked on and one of the two performance areas we
must improve IMHO to make bzr more usable on large trees. The other is
pull/branch. In both cases, landing the new pack repository support will
make a noticeable difference.
Ian C.