random commit speed notes:

Ian Clatworthy ian.clatworthy at internode.on.net
Thu Sep 6 01:08:05 BST 2007


Robert Collins wrote:

> Here's an important point: The largest win we can make outside of
> record_entry_contents is 30 seconds.

I'll try the changes you did here and see if I get the same result. We
are changing the memory footprint and disk cache by skipping
record_entry_contents like this.

> Ian - I know you see some 20-40 second win with your iter_changes-based
> commit; I'm really curious where it comes from, because as far as I can
> tell the time outside of record_entry_contents is not even 40 seconds
> long :). It's always smelt strange to me to see so big a win; I'd really
> like to understand more clearly what you are seeing. Perhaps profiling
> with the sort of crippled record_entry_contents I'm describing may help
> us there.

My profiling was showing that the kind lookup was the time killer. The
main difference between populate_from_inventory and populate_from_tree
(which uses iter_changes) is that the latter has the kind returned as
part of the iterator.
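
As a rough illustration of that difference (not bzrlib code; the function
names below are hypothetical and it uses only the modern Python standard
library), compare looking up the kind separately for each name with an
iterator that hands back the kind alongside each entry:

```python
import os
import stat


def kind_from_mode(mode):
    """Map a stat mode to a kind string."""
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "file"
    return "unknown"


def kinds_by_lookup(root, names):
    """Separate kind lookup: one lstat call per name."""
    result = []
    for name in names:
        mode = os.lstat(os.path.join(root, name)).st_mode
        result.append((name, kind_from_mode(mode)))
    return result


def iter_with_kind(root):
    """Yield (name, kind) pairs; the kind arrives with the iteration.

    os.scandir can usually answer is_dir()/is_file() from the directory
    listing itself, without an extra stat call per entry.
    """
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "directory"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "unknown"
            yield entry.name, kind
```

Both functions produce the same (name, kind) pairs for a directory; the
second simply avoids going back to ask for the kind once per entry.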

Normal commits are the big winner from the iter_changes patch. Initial
commits come along for the ride.

> I plan to spend most of my time working on making
> record_entry_contents fast - shaving off time, wastage, and getting
> rid of the triple handling.

Great. I'll review as much as I can. I'll track your changes as well and
assist wherever I can help either via coding or profiling.

The triple handling change will be a big win IMO.

Another area I've been looking at is updating the working tree after a
commit. In the build-inventory-from-scratch-each-time algorithm, it
looks like SHAs get recalculated because they're not saved the first
time around?
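
If that is what is happening, the general remedy is to keep the hash
computed on the first pass and reuse it while the file is unchanged. A
minimal sketch of the idea, assuming a hypothetical ShaCache class keyed
on file size and mtime (this is not how bzrlib stores its hashes):

```python
import hashlib
import os


class ShaCache:
    """Remember SHA-1 digests so an unchanged file is hashed only once."""

    def __init__(self):
        self._cache = {}          # path -> ((size, mtime_ns), hex digest)
        self.hashes_computed = 0  # counts real hash computations

    def sha1(self, path):
        st = os.stat(path)
        fingerprint = (st.st_size, st.st_mtime_ns)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == fingerprint:
            # Size and mtime match the saved entry: reuse the digest.
            return hit[1]
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        self.hashes_computed += 1
        self._cache[path] = (fingerprint, digest)
        return digest
```

Calling sha1() twice on an untouched file reads and hashes it only the
first time. The size-and-mtime fingerprint is a simplification; a file
rewritten with the same size within the timestamp resolution would be
missed.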

> Did you know that we spend 5% of commit just checking that the
> output of file.readlines() is a bytestring and correctly split on
> newlines? I.e. second-guessing Python.

On a normal commit or just the initial one?

Just in case it isn't obvious, it's normal commits I really want to
speed up. The good thing about getting initial commit going well is that
it puts an "upper bound" on normal commit. Of course, the bad things
also stand out more in the initial case.

Ian C.


