[MERGE] DirState pyrex helpers

John Arbash Meinel john at arbash-meinel.com
Thu Jul 12 23:48:31 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The attached patch contains the pyrex versions for a few DirState functions.
Most notably _read_dirblocks and bisect_dirblock.

The _read_dirblock change cuts the time to read a DirState to about 1/2 the
current value. (For 20k entries on my machine, it takes 170ms versus 384ms).
This would show up in commands like 'bzr diff' and 'bzr status', and probably
even 'bzr add', anything that needs to read the WT state. (On my Moz tree it
changes 'bzr status' from 5.6s to 5.1s, _read_dirblocks_if_needed is dropping
from 591ms to 240ms).

The bisect_dirblock code effects any time we are looking something up in the
current dirstate. (_get_entry uses it, make_absent, update_minimal, etc). This
actually effects 'bzr add' time a bit, since we do at least one lookup per new
file. (Just calling DirState.add() 20k times changes from 2.6s to 2.2s the
actual function is about 10x faster, but it just isn't a huge portion of the
time spent. The Unicode normalization check is actually a huge chunk of adding
20k files).


I would be very happy if Andrew would want to look over the code from a
security POV. I tried to do some auditing myself, but it is always good to have
another person look at it. (I'm using mem* instead of str*, except for strncmp,
etc)

I agree with Andrew that having a way to watch out for memory leaks would be
useful, but I haven't figured out exactly how to do it. We could use
gc.set_debug(gc.DEBUG_LEAK), but I don't think it is quite what we want.

I tried doing something like:
  gc.get_referrers(str)

But that didn't really get me far (it just gives me some tuples, etc). I
thought an instance of a class would have to refer to it. I can say that:

class Foo(object):
  pass

f = Foo()
gc.get_referents(f)
[<class '__main__.Foo'>]

But
gc.get_referrers(Foo)

doesn't seem to include f.

Anyway, I don't really know how to do this sort of testing in a platform
independent manner. (On some forms of Linux you can read /proc/PID/status to
find your memory usage, etc)

John
=:->

PS> This also fixes an old bug in the DirState._bisect/_bisect_recursive code,
when dealing with paths like a/ a-b a=b. To date, we don't use the code in any
live paths, but it is potentially a really big win if you are doing a partial
operation like 'bzr status foo'.
When I say potentially big win, my example is:

Function				time	entry count
_read_dirblocks_if_needed()		240ms	54708
_bisect_recursive([''])                2210ms   54708
_bisect_recursive(['js'])		320ms	 8498
_bisect_recursive(['widget'])		 26.7ms	  686
_bisect_recursive(['README.txt'])	  2.4ms	    1

_bisect_recursive is currently inefficient in that it forgets about all
information in each recursion. So the first level finds the requested directory
entries, and then calls _bisect again for any children, but it doesn't remember
where in the file it has already searched. It would need a bit of refactoring
to get that to work well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlq+/JdeBCYSNAAMRAhV6AJ92nkXJ3seDqUMOowtArEHBsA2S+ACgkhR2
GLxO4n/vm0RrjXT+xLPqfTY=
=sBNT
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dirstate_pyrex.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070712/b3ece2ff/attachment-0001.diff 


More information about the bazaar mailing list