RFC: startup time - again
John Arbash Meinel
john at arbash-meinel.com
Tue Sep 9 16:08:37 BST 2008
Andrew Bennetts wrote:
> I'd love to quantify this more though.
To plug some code I wrote a long time ago: 'bzr --profile-imports CMD' is
still available. It hooks into __import__ and re.compile, since those were
the big costs I saw at the time. (Note that the re.compile support is
broken in python2.5, because they changed the underlying code.)
I'm attaching a run of 'bzr --profile-imports st --no-plugins' on a
bzr.dev tree on win32. (It takes ~500ms on this machine.) (I also had to
update profile_imports to use time.clock on win32, because time.time()
only has a 16ms resolution.)
1) Importing bzrlib itself is ~100ms:
     52ms spent in bzrlib.osutils.get_user_encoding()
     18ms spent loading win32utils, 8ms of that in ctypes
     13ms loading trace, 12ms of that in logging
      5ms loading sha
     43ms spent loading ConfigObj, 23ms of that in 'compiler'
2) 69ms loading "bzrlib.directory_service":
     69ms because it brings in Branch, which brings in ...
     17ms loading urllib, because of 12.6ms in socket
3) 45.6ms loading "bzrlib.builtins":
     29ms bringing in bzrdir, 16ms bringing in lockdir
Lazy importing can make this tricky to track down, because things don't
get loaded quite as 'linearly'.
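One way that non-linearity arises: a lazy-import proxy defers the real
import until first attribute access, so the cost shows up wherever the
attribute is first touched. A simplified stand-in for the kind of proxy
bzrlib's lazy_import machinery uses (not the actual implementation):

```python
import importlib

class LazyModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._mod = None

    def __getattr__(self, attr):
        # Only reached for attributes not in __dict__, i.e. real
        # module attributes; _name/_mod are found normally.
        if self._mod is None:
            self._mod = importlib.import_module(self._name)
        return getattr(self._mod, attr)

json = LazyModule("json")      # cheap: nothing imported yet
print(json.dumps({"x": 1}))    # first access triggers the real import
```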
As for 'get_user_encoding()', I wonder if we could cache the value.
And from what I remember, ConfigObj spends a lot of time compiling a
regex for parsing lists from config entries, which is something we don't
use in bzr core. I don't know why the new module is importing
'compiler', but that seems to be a significant overhead.
> e.g, some questions about time spent during imports that I'd love answers to:
> - How much time is spent simply loading the bytecode in the .pyc files?
> - How much time is spent executing code that merely defines classes/functions?
> - How much time is spent executing code that is run on importing a module that
> is doing "work" rather than simply defining a class or function.
> I wonder how much overhead we're paying for code that we don't really think
> about as having overhead, e.g. applying decorators to defined functions.
> Some quick numbers:
> andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" "property(f)"
> 1000000 loops, best of 3: 0.885 usec per loop
> andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" -s "from bzrlib import decorators" -s "n = decorators.needs_write_lock" "n(f)"
> 1000 loops, best of 3: 208 usec per loop
> andrew at steerpike:~/code/bzr$ PYTHONPATH=. python -m timeit -s "def f(): pass" -s "from bzrlib import decorators" -s "decorators.use_fast_decorators()" -s "n = decorators.needs_write_lock" "n(f)"
> 1000000 loops, best of 3: 1.44 usec per loop
> So if we define ~700 functions that have @needs_write_lock, that's a millisecond
> of total time. I guess that's probably not our problem then...
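The gap in those numbers is consistent with the "pretty" decorator
building its wrapper via exec() at decoration time (to preserve the
wrapped function's name), while the fast variant is a plain closure.
Both variants below are rough sketches, not bzrlib's actual decorators:

```python
import timeit

def fast_needs_write_lock(unbound):
    # Plain closure: nearly free to apply.
    def wrapped(self, *args, **kwargs):
        self.lock_write()          # assumed lock protocol
        try:
            return unbound(self, *args, **kwargs)
        finally:
            self.unlock()
    wrapped.__name__ = unbound.__name__
    return wrapped

def pretty_needs_write_lock(unbound):
    # exec() a purpose-built wrapper; compiling source per decorated
    # function is the expensive part.
    template = (
        "def %(name)s(self, *args, **kwargs):\n"
        "    self.lock_write()\n"
        "    try:\n"
        "        return unbound(self, *args, **kwargs)\n"
        "    finally:\n"
        "        self.unlock()\n"
    )
    namespace = {"unbound": unbound}
    exec(template % {"name": unbound.__name__}, namespace)
    return namespace[unbound.__name__]

def f(self):
    pass

t_fast = timeit.timeit(lambda: fast_needs_write_lock(f), number=1000)
t_pretty = timeit.timeit(lambda: pretty_needs_write_lock(f), number=1000)
print(t_fast, t_pretty)  # applying the exec-based one is far slower
```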
>> We've focused a lot on the time to startup, and we've been looking at
>> the time to load many modules, but essentially at the heart of it, its
>> the time to load the aggregate code. I did an experiment with
>> 'cxfreeze', and saved 40ms user time - so the number of files being
>> loaded *is* an issue, but at only 1/8th of the time, one that solving
>> will not remove the problem.
> I'm not surprised by that, but it's good to know rather than just be guessing.
>> A couple of interesting thought experiments. I'll let you guess who put
>> which forward...
>> have the front end look for a statically-prepared set of python code,
>> extracted from the main code base, to do e.g. status, and run just-that
>> if the command being run is present there.
> I'm guessing this will be hard.
I think the biggest first-pass win would be to not load code that we
won't use. We've spent *some* time digging through that, but it is
something that needs to be done on an ongoing basis.
Also, we should encourage plugins to be lazier. I know that if I don't
supply '--no-plugins', the import overhead shifts to certain plugins that
preemptively load all of branch, repo, and workingtree.
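The lazy pattern for a plugin is to register names cheaply at load time
and only import the heavy module when the command actually runs. A
hypothetical registry (this is not bzrlib's real command-registration
API; the hashlib stand-in just keeps the example self-contained):

```python
import importlib

_lazy_commands = {}

def register_lazy_command(cmd_name, module_name, factory_name):
    # Plugin __init__ only stores strings: no heavy import happens here.
    _lazy_commands[cmd_name] = (module_name, factory_name)

def get_command(cmd_name):
    # The import cost is paid only when the command is looked up.
    module_name, factory_name = _lazy_commands[cmd_name]
    module = importlib.import_module(module_name)
    return getattr(module, factory_name)

# A plugin registering a command cheaply:
register_lazy_command("md5", "hashlib", "md5")
```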
>> have demand-loading of individual methods on complex classes, so that
>> e.g. calling WorkingTree4.rename would import
>> bzrlib.WorkingTree4.rename, and assign a method from that into the class
>> as an attribute, just-in-time.
> That's an interesting idea. My guess it wouldn't help a great deal, but that's
> just a guess...
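A sketch of that demand-loaded-method idea using __getattr__: the first
call imports the real implementation and caches it on the class, so later
lookups take the normal fast path. The class name, the 'checksum' method,
and the hashlib stand-in are all illustrative, not bzrlib code:

```python
import importlib

class WorkingTree:
    # method name -> (module, function) holding the real code
    _lazy_methods = {"checksum": ("hashlib", "sha1")}

    def __getattr__(self, name):
        try:
            mod_name, func_name = self._lazy_methods[name]
        except KeyError:
            raise AttributeError(name)
        func = getattr(importlib.import_module(mod_name), func_name)
        def method(*args, **kwargs):
            return func(*args, **kwargs)
        # Cache on the class so the next access skips __getattr__.
        setattr(type(self), name, staticmethod(method))
        return method

tree = WorkingTree()
digest = tree.checksum(b"hello").hexdigest()  # import happens here
```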
>> The fan-out of needed-modules is likely a critical factor here. Things
>> that cause more modules to be loaded, like subclassing, can increase the
>> overall code loaded; so can single very large modules, like errors.py.
>> No concrete proposals at this point, but please think about this, about
>> these sorts of things, and the impact on the code base - as well as the
>> benefits we might get by doing something like that.
> I'll try to tidy up my faster-startup experiments so far and aim to get them
> into 1.8... there are fewer controversial things in there than I thought
> there might be.
>> For instance, we could make errors less of a problem, without preventing
>> use in flow control by:
> [...details snipped...]
>> Exceptions used in flow control would be demand loaded individually, and
>> the others would not be loaded at all - a win except for selftest.
> I think an error registry is a good idea. Currently, it appears to take 14.6ms
> to import bzrlib.errors on my laptop, which is pretty huge!
errors.py is a bit hidden in my --profile-imports testing, because it is
always imported along with other things, so it is unclear which module is
really spending all of that time.
I *do* remember it being significant. But 15ms is small compared with
60ms in get_user_encoding() and 50ms in configobj.
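The error-registry proposal could look roughly like this: flow-control
exceptions are registered by name and imported on demand, so all of
errors.py need not load up front. Stdlib exceptions stand in for bzrlib's
errors here, and the registry names are hypothetical:

```python
import importlib

_error_registry = {}

def register_error(name, module_name, class_name):
    # Registration stores only strings: cheap at startup.
    _error_registry[name] = (module_name, class_name)

def get_error_class(name):
    # The defining module is imported only when the error is needed.
    module_name, class_name = _error_registry[name]
    return getattr(importlib.import_module(module_name), class_name)

# Stand-in registration using a stdlib exception:
register_error("DecodeError", "json", "JSONDecodeError")
```

Code that raises or catches a registered error pays the import cost only
on that path; errors used solely by selftest would never load at all.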