RFC: startup time - again

Tue Sep 9 11:50:19 BST 2008

Robert Collins wrote:
> So, Martin and I just had an interesting chat. We were talking about
> startup time and lazy_import etc.
> 
> startup time, such as 'bzr rocks' isn't a very useful metric for 'how
> slow is bzr to get going'. Much more useful is running a command with
> little work, and the same command with lots of work.
> 
> As an example, (figures from memory from that phone call)
> time bzr st bzr.dev -> 360ms
> time bzr st newly-inited-tree -> 320ms
> 
> so 320ms to get going, 40ms to do the status itself.

I'd be interested in how much
http://people.ubuntu.com/~andrew/bzr/faster-startup helps these numbers.  Mostly
it is just some hacking away at various small low-hanging fruit I've done in my
spare time.  It takes bzr st time on a clean bzr.dev tree down from ~265ms to
~235ms on my laptop.

(I measure with “for i in `seq 5` ; do time ./bzr --no-plugins st ; done”, as a
cheap way to reduce noise from CPU throttling on a laptop.)

> On that machine, 'time python -c "import sys"' -> 20ms
> 
> So 300ms/360ms is in loading bzrlib code.
> 
> There are lots of contributing factors to this, but the basic problem
> is:
>  - we're loading too much code.

Agreed.

I'd love to quantify this more though.

e.g, some questions about time spent during imports that I'd love answers to:

  - How much time is spent simply loading the bytecode in the .pyc files?
  - How much time is spent executing code that merely defines classes/functions?
  - How much time is spent executing code that is run on importing a module that
    is doing "work" rather than simply defining a class or function.

I wonder how much overhead we're paying for code that we don't really think
about as having overhead — e.g. applying decorators to defined functions.

Some quick numbers:

andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" "property(f)"
1000000 loops, best of 3: 0.885 usec per loop
andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" -s "from bzrlib import decorators" -s "n = decorators.needs_write_lock" "n(f)"
1000 loops, best of 3: 208 usec per loop
andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" -s "from bzrlib import decorators" -s "decorators.use_fast_decorators()" -s "n = decorators.needs_write_lock" "n(f)"
1000000 loops, best of 3: 1.44 usec per loop

So if we define ~700 functions that have @needs_write_lock, that's a millisecond
of total time.  I guess that's probably not our problem then...

[...]
> We've focused a lot on the time to startup, and we've been looking at
> the time to load many modules, but essentially at the heart of it, its
> the time to load the aggregate code. I did an experiment with
> 'cxfreeze', and saved 40ms user time - so the number of files being
> loaded *is* an issue, but at only 1/8th of the time, one that solving
> will not remove the problem.

I'm not surprised by that, but it's good to know rather than just be guessing.

> A couple of interesting thought experiments. I'll let you guess who put
> which forward...
> 
> have the front end look for a statically-prepared set of python code,
> extracted from the main code base, to do e.g. status, and run just-that
> if the command being run is present there.

I'm guessing this will be hard.

> have demand-loading of individual methods on complex classes, so that
> e.g. calling WorkingTree4.rename would import
> bzrlib.WorkingTree4.rename, and assign a method from that into the class
> as an attribute, just-in-time.

That's an interesting idea.  My guess it wouldn't help a great deal, but that's
just a guess...

> The fan-out of needed-modules is likely a critical factor here. Things
> that cause more modules to be loaded, like subclassing, can increase the
> overall code loaded; so can single very large modules, like errors.py.
> 
> 
> No concrete proposals at this point, but please think about this, about
> these sorts of things, and the impact on the code base - as well as the
> benefits we might get by doing something like that.

I'll try to tidy my faster-startup experiments so far and aim to get them into
1.8... there's less controversial things in there than I thought there might be
:)

> For instance, we could make errors less of a problem, without preventing
> use in flow control by:
[...details snipped...]
> 
> Exceptions used in flow control would be demand loaded individually, and
> the others would not be loaded at all - a win except for selftest.

I think an error registry is a good idea.  Currently, it appears to take 14.6ms
to import bzrlib.errors on my laptop, which is pretty huge!

-Andrew.