RFC: startup time - again

Robert Collins robertc at robertcollins.net
Tue Sep 9 23:08:01 BST 2008


On Tue, 2008-09-09 at 09:11 -0400, Aaron Bentley wrote:
> Martin Pool wrote:
> > On Tue, Sep 9, 2008 at 8:50 PM, Andrew Bennetts <andrew at canonical.com> wrote:
> >> Robert Collins wrote:
> > Having the entry point in C rather than in Python, and eventually
> > transitioning to running some commands without a Python interpreter at
> > all.  It's nontrivial to get there, if you want to keep all our
> > extension points for plugins, transports, formats, etc.  And just
> > starting Python is not the largest issue.
> 
> I would much rather work on a Python project with C extensions than a C
> project with Python extensions.

So would I. I'd also like to really put to bed some of the performance
problems that keep tripping us up. Some of the things we find making bzr
slow are totally ridiculous: our tree access code is 3 times slower than
a simple 'find' command.

Naive Pyrex code is only slightly faster than plain Python. Pyrex
extensions actually need to be written against the Python C API to be
faster, and we're writing more and more of these to get speed. They are
also harder to profile (the Python VM has to be loaded, etc.) and harder
to test and debug (you can't step through them as easily, they sit behind
a layer the Python debugger can't see, and there's no easy hookup for gdb
yet).
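To illustrate the "behind a layer" point, here is a minimal sketch, assuming a modern CPython 3 with the standard 'cProfile' and 'pstats' modules (not the Python 2 of 2008). Work done inside a C-implemented builtin shows up as a single opaque profiler entry with nothing inside it broken out, while a pure-Python helper gets its own line. The names py_total and run are hypothetical, for illustration only.

```python
import cProfile
import pstats


def py_total(values):
    # Pure-Python loop: the profiler can attribute time to this frame.
    total = 0
    for v in values:
        total += v
    return total


def run():
    data = list(range(200_000))
    a = py_total(data)  # visible as its own Python function entry
    b = sum(data)       # C builtin: one opaque entry, no internal detail
    return a, b


profiler = cProfile.Profile()
result = profiler.runcall(run)
stats = pstats.Stats(profiler)

# Each key in stats.stats is (filename, line number, function name).
names = [func_name for (_file, _line, func_name) in stats.stats]
print(result)
for name in sorted(names):
    print(name)
```

The listing should include 'py_total' next to a single built-in entry for sum; any C code behind an extension module is equally invisible below that one line.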

If we're going to keep growing the amount of C code (or equally C++,
though I'd probably stick to C), I want to come up with some really good
answers to those two points: profiling, and testing/debugging. The
possible answers I have today are below, and I'd like more options :)
a)
 - have a pure C library containing the dirstate parser, read_dir,
knit_parse, btree_parse, gcompressor etc. This would have an independent
C test harness (allowing easy gdb and valgrind testing), and probably
benchmark suites for gprof/oprofile/macro-run profiling and tuning. [we
can expose a C test suite cleanly to 'bzr selftest' via subunit].
 - bind that pure C library to Python via Pyrex or the Python C API.
b)
 - invert the C/Python relationship, and instead have the driver be C,
with the Python VM loaded and called into. This is not uncommon in the
Python world; there are plenty of examples of it.
 - pros: C all the way is possible, for really scaling-intensive
operations.
 - cons: C is nowhere near as hookable by default as Python, so we would
have to be extremely cautious about the things we move out of the VM to
ensure they remain hookable. (Well, we have to be cautious about that
today too, but today hookability comes naturally rather than being
something extra to think about.)
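Option a) keeps the C code testable on its own and puts a thin binding on top. As a rough stand-in for that binding layer, the sketch below swaps in the standard-library 'ctypes' module (not the Pyrex/C API route proposed above) and binds an existing C function, libc's strlen, declaring its C signature explicitly. It assumes a POSIX-like system where ctypes.util.find_library("c") can locate libc; c_strlen is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Assumes a POSIX-like system. Locate the C library; otherwise fall back to
# the symbols already loaded into this process (libc is linked in on Linux).
_libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_path) if _libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical thin Python wrapper; the C side stays independently testable."""
    return libc.strlen(data)


print(c_strlen(b"dirstate"))
```

The point of the split is that the C function has no Python dependency at all, so a C test harness, gdb and valgrind can exercise it directly, and the Python side only has to check the binding.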

I think that a) is likely to be unobjectionable to bzr's contributors,
but long term b) probably has more legs: it has the potential to deliver
bigger overall gains.

I want to be clear that I am *not* interested in making bzr less
flexible or hookable; I just want to be able to use a function call and
not be thinking, damn, that's an 8% performance drop.

For instance:
$ cat > s.c <<'EOF'
void foo(int bar) {
}

int main(int argc, char **argv){
  int pos;
  for (pos=0; pos < 100000; pos++) {
    foo(pos);
  }
  return 0;
}
EOF
$ CFLAGS= gcc -O2 -fno-inline -o s s.c
$ python -m timeit 'import subprocess' 'subprocess.call("./s")'
100 loops, best of 3: 1.94 msec per loop

$ cat > e.c <<'EOF'
int main(int argc, char **argv){
  return 0;
}
EOF
$ CFLAGS= gcc -O2 -fno-inline -o e e.c
$ python -m timeit 'import subprocess' 'subprocess.call("./e")'
100 loops, best of 3: 1.86 msec per loop
1.94 - 1.86 = 0.08 msec

$ python -m timeit 'def foo(bar):pass' 'for pos in xrange(100000):foo(pos)'
10 loops, best of 3: 31.7 msec per loop

31.7/0.08
=> ~396 times slower
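The Python half of the comparison can also be measured in-process with the 'timeit' module's API. This is a sketch assuming Python 3 (range instead of xrange); it moves the def into setup so that only the loop is timed, and reports a rough per-call figure. Absolute numbers on current hardware will differ a lot from the 2008 figures above.

```python
import timeit

N = 100_000  # same call count as the C loop above

# Define the no-op function once in setup so only the loop is timed.
setup = "def foo(bar):\n    pass"
stmt = f"for pos in range({N}):\n    foo(pos)"

# repeat() returns one total (in seconds) per run of `number` executions;
# the minimum is the least noisy estimate, like the CLI's "best of" figure.
best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=10)) / 10

# Per-call cost; this also includes the range() iteration overhead.
per_call_ns = best / N * 1e9
print(f"{best * 1e3:.2f} msec per loop, ~{per_call_ns:.0f} ns per call")
```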

This is directly hurting our code quality: the horrible, insane mess of
dirstate's iter_changes is entirely due to avoiding function calls,
because they showed up when profiling (not under lsprof, but in
real-world try-with, try-without comparisons).
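A toy illustration of that trade-off, with hypothetical function names (this is not bzr's iter_changes): the same comparison written with a small helper, and with the helper manually inlined. Both return the same answer; on CPython the inlined version typically runs faster purely because it skips one Python-level call per entry, at the cost of readability. Assumes Python 3.

```python
import functools
import timeit


def _changed(old, new):
    # Small, readable helper: costs one Python function call per entry.
    return old != new


def count_changes_helper(pairs):
    return sum(1 for old, new in pairs if _changed(old, new))


def count_changes_inlined(pairs):
    # Same logic with the helper inlined by hand to avoid the call overhead.
    return sum(1 for old, new in pairs if old != new)


# Every third entry (i divisible by 3) differs between old and new.
pairs = [(i, i if i % 3 else i + 1) for i in range(50_000)]

for fn in (count_changes_helper, count_changes_inlined):
    seconds = min(timeit.repeat(functools.partial(fn, pairs), repeat=3, number=10))
    print(f"{fn.__name__}: {seconds * 1e3:.1f} msec for 10 runs")
```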

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.


More information about the bazaar mailing list