ideas for C optimization wanted

Robert Collins robertc at robertcollins.net
Fri Jul 4 02:20:12 BST 2008


On Thu, 2008-07-03 at 14:07 +1000, Martin Pool wrote:
> Tim said to me he was interested in doing some C or Pyrex code to
> speed up Bazaar, and asked for suggestions on where to begin.

I think we primarily suffer from IO or big-O problems rather than from a
lack of C. (When we solve a performance problem by using C or Pyrex, we
end up with two problems.)

We do have some critical points where the sheer volume of operations
interacts with various Python quirks (such as function call lookup!) to
make operations slow. These are the places that benefit from C, when we
are doing hundreds of thousands of iterations.

Places I know of that do this:
 - all-history operations
 - size-of-tree operations on big trees (both on disk and in the Inventory)
 - diff()
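
To make the function-call overhead above concrete, here is a minimal
Python 3 sketch comparing a loop that calls a helper per item with one
that inlines the work and hoists the `result.append` lookup into a
local variable. The `transform` helper and the workload are
hypothetical; absolute timings will vary by machine and interpreter.

```python
import timeit

# A large input, to mimic hundreds of thousands of iterations.
ITEMS = list(range(200_000))


def transform(x):
    # Hypothetical per-item work, kept trivial so call overhead dominates.
    return x + 1


def with_calls(items):
    result = []
    for x in items:
        # Each iteration looks up the global `transform`, looks up the
        # bound method `result.append`, and makes two calls.
        result.append(transform(x))
    return result


def inlined(items):
    result = []
    append = result.append  # hoist the method lookup out of the loop
    for x in items:
        append(x + 1)  # work inlined: no call to transform()
    return result


if __name__ == "__main__":
    assert with_calls(ITEMS) == inlined(ITEMS)
    for fn in (with_calls, inlined):
        # Best of several trials; see timeit's notes on using min().
        best = min(timeit.repeat(lambda: fn(ITEMS), number=3, repeat=3))
        print(f"{fn.__name__}: {best:.3f}s")
```

Both functions produce identical output; the inlined version trades
readability for fewer lookups and calls per iteration, which is the
same trade-off that tends to make hot Python functions long.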


> One that comes to mind for me would be changing some of the larger
> performance-critical functions in workingtree_4.py (like
> InterDirStateTree.iter_changes) to have a simple Python implementation
> and a fast C implementation.  Some of them are pretty long, partly
> because of wanting to avoid Python function-call overhead.  But it
> would be a good idea to do some profiling with lsprof first and see if
> they are actually near the top in current code.  You should make sure
> lsprof is installed and working before going offline.

Indeed, this function could use some love. But we also have some bugs to
fix here: we need to skip computing SHA-1s in some cases, and to update
SHA-1s during commit. So investigating performance here is well worth
doing.
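
One general technique for avoiding unnecessary SHA-1 work (not
necessarily the fix meant here) is to cache each file's digest together
with a cheap stat fingerprint and re-hash only when the fingerprint
changes. This is a hypothetical sketch, not Bazaar's dirstate code; the
cache and function names are made up. A real implementation also has to
distrust fingerprints of files modified within the filesystem's
timestamp resolution, since a same-size edit in the same tick would
leave the fingerprint unchanged.

```python
import hashlib
import os

# Hypothetical in-memory cache: path -> (stat fingerprint, sha1 hex digest).
_sha_cache = {}


def _fingerprint(st):
    """Cheap change detector built from stat fields (no file contents read)."""
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino)


def sha1_of_file(path):
    """Hash the file contents in 64 KiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_sha1(path):
    """Return (digest, was_hashed); re-hash only if the fingerprint changed.

    Hypothetical helper: it does not guard against edits made within the
    filesystem's timestamp resolution (see the caveat above).
    """
    fp = _fingerprint(os.stat(path))
    cached = _sha_cache.get(path)
    if cached is not None and cached[0] == fp:
        return cached[1], False  # unchanged: skip reading the file
    digest = sha1_of_file(path)
    _sha_cache[path] = (fp, digest)
    return digest, True
```

On the first call for a path the file is read and hashed; later calls
with an unchanged stat fingerprint return the cached digest without
touching the file contents.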

A note on performance work: with small data sets it can be very hard to
see where the problems are. I worked with a 20K-file tree to make initial
commit much faster; at the start that took a couple of minutes (IIRC),
and I think I eventually got it down to 15 seconds or so. But it takes a
lot of CPU time to reliably detect a 0.1% performance difference, and
once the big wins are done, that's the size of gain we end up with.
Python has a huge death-of-a-thousand-cuts feel to it. Measurement error
is hard to avoid, and being on laptop battery exacerbates it :(. Also,
having a big enough tree, or deep enough history, to really reproduce the
problem can be a logistical problem on its own.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.