walkdirs performance

John Arbash Meinel john at arbash-meinel.com
Wed Aug 20 14:00:23 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> so, walkdirs is kinda slow.
> 
> In C, readdir + lstat every path on a mozilla tree:
>  0.4 seconds
> 
> len(_walkdirs_utf8(mozilla))
>  1.1 seconds
> 
> Still digging into _why_.
> 

do you mean list(_walkdirs_utf8(mozilla)) ?

Are you including the sort() time in the C code? I would be surprised if it
was .7 seconds overhead, though. If it was, we could certainly consider a
different data structure (sorted RB trees, Judy trees, etc.)

Also, what about the python object overhead. You are creating 55k Stat objects.

% TIMEIT 'object()'
10000000 loops, best of 3: 0.151 usec per loop
% TIMEIT -s "class MyClass(object):
dquote>   def __init__(self):
dquote>     self.a = 1
dquote>     self.b = 2
dquote>     self.c = 'a bc'
dquote> " "MyClass()"
1000000 loops, best of 3: 0.946 usec per loop

So if it takes 1us to create an object with real members, and you do 55K of
them, that should be only 55ms.
Now, there will be some tuples (they seem to cost 0.03us here) and lists (0.2us).

I still don't see a huge reason. The best I can come up with might be object
thrashing of some sort. Or possibly because it is a generator rather than
building up a list and returning it? Though it is only re-entering the
function ~5k times, which doesn't seem like a huge overhead.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIrBVnJdeBCYSNAAMRAuWfAJ9SX12OYClgzpz5hGDrLcBPo4wQmwCgzZlc
7FJfcduK0TAbusp8jsoseDg=
=nmeI
-----END PGP SIGNATURE-----



More information about the bazaar mailing list