RFC: startup time - again
Robert Collins
robertc at robertcollins.net
Tue Sep 9 10:22:15 BST 2008
So, Martin and I just had an interesting chat. We were talking about
startup time and lazy_import etc.
Startup time by itself, such as for 'bzr rocks', isn't a very useful
metric for 'how slow is bzr to get going'. Much more useful is running a
command with little work to do, and the same command with lots of work
to do.
As an example (figures from memory from that phone call):
time bzr st bzr.dev -> 360ms
time bzr st newly-inited-tree -> 320ms
so 320ms to get going, 40ms to do the status itself.
On that machine, 'time python -c "import sys"' -> 20ms
So roughly 300ms of the 360ms is spent loading bzrlib code.
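If you want to reproduce that kind of measurement, here's a rough
sketch; the tree paths are just placeholders and best-of-N wall-clock
timing is a simplification:

    import subprocess
    import time

    def best_wall_time(cmd, runs=5):
        # Best-of-N wall-clock time for an external command, in seconds.
        best = None
        for _ in range(runs):
            start = time.time()
            subprocess.call(cmd)
            elapsed = time.time() - start
            if best is None or elapsed < best:
                best = elapsed
        return best

    big_tree = best_wall_time(['bzr', 'st', 'bzr.dev'])
    empty_tree = best_wall_time(['bzr', 'st', 'newly-inited-tree'])
    interpreter = best_wall_time(['python', '-c', 'import sys'])

    # 'status work' is what the command adds on top of getting going;
    # 'bzrlib load' is the cost of getting going over a bare interpreter.
    print('status work: %dms' % ((big_tree - empty_tree) * 1000))
    print('bzrlib load: %dms' % ((empty_tree - interpreter) * 1000))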
There are lots of contributing factors to this, but the basic problem
is:
- we're loading too much code.
The ideal minimum amount of code [while retaining the object model and
behaviour] needed to run status is roughly:
- the transport factory & local transport module
- the repository factory & pack repository check-the-format code
- the branch factory & branch check-the-format code
- _walkdirs_utf8
- dirstate parser
- the WT4 intertree module for dirstate<->basis tree
- the cmd_status object
- the encoding detection logic
- logging to ~/bzr.log support
- the bzr front end
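One rough way to sanity-check that list against what actually gets
loaded today is to diff sys.modules around importing the command layer;
that bzrlib.builtins is where cmd_status lives is my assumption here:

    import sys

    before = set(sys.modules)
    import bzrlib.builtins  # assumed home of cmd_status and friends
    loaded = sorted(set(sys.modules) - before)

    print('%d modules loaded:' % len(loaded))
    for name in loaded:
        print('  %s' % name)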
We've focused a lot on the time to start up, and we've been looking at
the time to load many modules, but at the heart of it, it's the time to
load the aggregate code. I did an experiment with 'cxfreeze', and saved
40ms of user time - so the number of files being loaded *is* an issue,
but at only 1/8th of the time, solving it alone will not remove the
problem.
A couple of interesting thought experiments. I'll let you guess who put
which forward...
- have the front end look for a statically-prepared set of python code,
  extracted from the main code base, to do e.g. status, and run just
  that if the command being run is present there.
- have demand-loading of individual methods on complex classes, so that
  e.g. calling WorkingTree4.rename would import
  bzrlib.WorkingTree4.rename and assign a method from that into the
  class as an attribute, just-in-time (a rough sketch of this follows
  below).
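To make the second idea concrete, a minimal sketch of the mechanism;
the one-module-per-method layout and the module name are invented for
illustration, not a proposal for the actual split:

    def lazy_method(module_name, func_name):
        # Placeholder callable: the first call imports the module holding
        # the real implementation, installs it on the class, and later
        # calls go straight to the real method.
        def loader(self, *args, **kwargs):
            module = __import__(module_name, fromlist=[func_name])
            real = getattr(module, func_name)
            setattr(type(self), func_name, real)
            return real(self, *args, **kwargs)
        return loader

    class WorkingTree4(object):
        # Hypothetical: the body of rename() lives in its own module and
        # is only imported when rename is actually called.
        rename = lazy_method('bzrlib.workingtree_4_rename', 'rename')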
The fan-out of needed modules is likely a critical factor here. Things
that cause more modules to be loaded, like subclassing, increase the
amount of code loaded overall; so do single very large modules, like
errors.py.
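To spell out the subclassing point: the class statement needs its bases
to exist when it runs, so a subclass module drags in its base's module
(and everything that imports) up front, while code that merely uses the
class can defer the import. Module names here are placeholders:

    # workingtree_4.py (placeholder names)
    # Importing this module forces workingtree_3, and whatever it
    # imports, to load immediately: WorkingTree3 must exist before the
    # class statement below can execute.
    from workingtree_3 import WorkingTree3

    class WorkingTree4(WorkingTree3):
        pass

    # By contrast, a function that only uses WorkingTree3 can defer that
    # cost until it is actually called:
    def open_tree_3(path):
        from workingtree_3 import WorkingTree3
        return WorkingTree3(path)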
No concrete proposals at this point, but please think about these sorts
of things and their impact on the code base, as well as the benefits we
might get by doing something like that.
For instance, we could make errors less of a problem, without preventing
their use in flow control, by:
- renaming errors.py to something like error_base.py
- creating an object on the bzrlib module called errors, which will
demand-load-and-cache exceptions (a lazy registry, but one that does
discovery too)
- putting most/all of our errors into split-out files which can be
discovered as needed
Exceptions used in flow control would be demand loaded individually, and
the others would not be loaded at all - a win except for selftest.
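A minimal sketch of what that demand-loading errors object could look
like; the one-module-per-exception naming scheme (bzrlib.error_impls.no_such_file
holding NoSuchFile, say) is entirely made up here:

    import re

    def _class_to_module(name):
        # NoSuchFile -> no_such_file; the naming convention is an assumption.
        return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()

    class LazyErrors(object):
        """Demand-load-and-cache exception classes from split-out modules."""

        def __init__(self, package):
            self._package = package

        def __getattr__(self, name):
            # Only reached when 'name' is not cached yet: import the module
            # assumed to hold the class, cache the class on this object,
            # and hand it back. Later lookups never re-enter __getattr__.
            module_name = '%s.%s' % (self._package, _class_to_module(name))
            module = __import__(module_name, fromlist=[name])
            cls = getattr(module, name)
            setattr(self, name, cls)
            return cls

    # bzrlib/__init__.py would then expose something like:
    #   errors = LazyErrors('bzrlib.error_impls')
    # so 'raise errors.NoSuchFile(path)' loads just that one module, and
    # errors nobody raises or catches are never loaded at all.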
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.