ideas for C optimization wanted
Robert Collins
robertc at robertcollins.net
Fri Jul 4 02:20:12 BST 2008
On Thu, 2008-07-03 at 14:07 +1000, Martin Pool wrote:
> Tim said to me he was interested in doing some C or Pyrex code to
> speed up Bazaar, and asked for suggestions on where to begin.
I think primarily we suffer from IO or big-O problems more than
lack-of-C. (When we solve a performance problem by using C or Pyrex, we
end up with 2 problems).
We do have some critical points where sheer volume of operations
interacts with various Python quirks (such as function call lookup!) to
make operations slow. These are places that benefit from C when we are
doing hundreds of thousands of iterations.
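To make the function-call point concrete, here is a rough sketch (not
Bazaar code; add_one, loop_with_calls and loop_inlined are made-up names
for illustration) that times the same loop with and without a helper
call per iteration. The actual numbers depend on machine and interpreter
version, so treat the output as indicative only.

```python
import timeit


def add_one(x):
    # Trivial helper: the body is so cheap that the call itself dominates.
    return x + 1


def loop_with_calls(n):
    # One Python-level function call per iteration.
    total = 0
    for i in range(n):
        total += add_one(i)
    return total


def loop_inlined(n):
    # Same arithmetic with the helper body written inline.
    total = 0
    for i in range(n):
        total += i + 1
    return total


if __name__ == "__main__":
    n = 200000  # "hundreds of thousands of iterations"
    for func in (loop_with_calls, loop_inlined):
        # Best of 3 runs, one call of func(n) per run.
        best = min(timeit.repeat(lambda: func(n), repeat=3, number=1))
        print("%-16s %.4f s" % (func.__name__, best))
```

Both loops return the same result; only the per-iteration call differs,
which is the sort of cost that pushes people to write very long
functions or to reach for C.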
Places that I know of that do that:
- all-history operations
- size-of-tree operations w/big trees [disk and Inventory]
- diff()
> One that comes to mind for me would be changing some of the larger
> performance-critical functions in workingtree_4.py (like
> InterDirStateTree.iter_changes) to have a simple Python implementation
> and a fast C implementation. Some of them are pretty long, partly
> because of wanting to avoid python function-call overhead. But it
> would be a good idea to do some profiling with lsprof first and see if
> they are actually near the top in current code. You should make sure
> lsprof is installed and working before going offline.
Indeed, this function could use love. But we also have some bugs here to
fix: we need to avoid sha-ing in some cases, and to update shas during
commit. So investigating performance is well worth doing.
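As a sketch of the "profile first" step, the following uses the
standard-library cProfile as a stand-in for lsprof. It is not Bazaar
code: scan_tree and hash_entry are hypothetical placeholders for a
size-of-tree operation and its per-file work. The idea is simply to
check whether a function is actually near the top before rewriting it.

```python
import cProfile
import io
import pstats


def hash_entry(name):
    # Hypothetical stand-in for per-file work (e.g. hashing an entry).
    return sum(ord(c) for c in name) % 65536


def scan_tree(n_files):
    # Hypothetical stand-in for an operation that touches every file.
    return [hash_entry("file%d" % i) for i in range(n_files)]


def profile_scan(n_files):
    # Profile one scan and return the report as text.
    profiler = cProfile.Profile()
    profiler.enable()
    scan_tree(n_files)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_scan(20000))
```

If the function under suspicion does not appear near the top of the
cumulative listing, moving it to C is unlikely to pay off.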
A note on working with performance: with small data sets it can be very
hard to see where problems are. I worked with 20K files to make initial
commit much faster, and at the start that was a couple of minutes long
(IIRC). I think I eventually got it to 15 seconds or so. But it takes a
lot of CPU effort to show up a 0.1% performance difference reliably,
and once the big fruit are done that's the sort of size we end up
gaining: Python has a huge death-of-a-thousand-cuts feel to it.
Measurement error is hard to avoid, and being on laptop battery
exacerbates it :(.
Also, having a big enough tree, or deep enough history, to really
reproduce the problem can be a logistical problem on its own.
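One common way to reduce measurement noise is to repeat a timing several
times and keep the minimum, since slower runs mostly reflect
interference from the rest of the system. A minimal sketch, assuming a
made-up workload (build_index is hypothetical, not a Bazaar function):

```python
import timeit


def build_index(n):
    # Hypothetical workload: build a path -> id mapping for n entries.
    return dict(("path/%d" % i, i) for i in range(n))


def best_time(func, arg, repeats=5, number=3):
    # Run func(arg) `number` times per sample and take `repeats` samples.
    timer = timeit.Timer(lambda: func(arg))
    samples = timer.repeat(repeat=repeats, number=number)
    # The minimum sample is the least disturbed by other system activity.
    return min(samples) / number, samples


if __name__ == "__main__":
    best, samples = best_time(build_index, 20000)
    print("best per call: %.6f s" % best)
    print("spread across samples: %.6f s" % (max(samples) - min(samples)))
```

The spread between samples gives a rough idea of how large a difference
has to be before it can be trusted; small gains well inside that spread
need many more repeats to show up reliably.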
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.