[PREVIEW] pymemdump memory profiler

John Arbash Meinel john.meinel at canonical.com
Sun Apr 5 19:53:23 BST 2009


This last week at pycon, Michael Hudson and I were discussing some of
the difficulties we were having trying to debug memory consumption
issues with Loggerhead and Bazaar.

We tried to get Heapy to work, but it doesn't compile on Windows, it
tends to crash, and the code is a bit of a nightmare to understand. We
looked at PySizer, but it didn't give particularly accurate results.
Dozer was also mentioned, which gives some nice pretty graphs, but only
of the *count* of objects, not of their actual size. Further, it only
reports objects tracked by the garbage collector, which leaves out all
*strings* (since they can't hold references to other objects, they
aren't in the cyclic gc list).

So we came up with the design for "py memory dump". Currently available
from:

  bzr branch lp:~jameinel/+junk/pymemdump

Basically, the code is split into two halves. One part walks your
Python objects and dumps the size information to disk. The *really*
nice thing is that we figured out ways to get accurate information
while allocating only ~20kB of memory to do so. This was important, as
the last time I wanted to do this, memory consumption was up around
1GB, not leaving much room for the analyzer to run. Also, we use a
pyrex/C extension, so that we can get very accurate sizes on stuff
like "dict". (A dict with 2M entries takes up 50MB of memory for its
hash array... 24 bytes per dict entry is a significant fraction of a
40-byte object.) And writing a memory analyzer in pure Python ends up
allocating a lot of objects that you then have to ignore.
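
For contrast, a naive pure-stdlib scan looks something like the sketch
below. It needs Python 2.6+ for sys.getsizeof, it only sees objects the
cyclic GC tracks (so no plain strings), and it allocates a sizeable
list and dict of its own while running; those are exactly the overheads
the pyrex scanner is meant to avoid:

  import gc
  import sys
  from collections import defaultdict

  def naive_summary():
      # Per-type object counts and byte totals using only the stdlib.
      totals = defaultdict(lambda: [0, 0])  # type name -> [count, bytes]
      for obj in gc.get_objects():           # gc-tracked objects only
          entry = totals[type(obj).__name__]
          entry[0] += 1
          entry[1] += sys.getsizeof(obj)     # doesn't follow references
      return dict(totals)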

We decided to dump out a JSON description, just because it was easy, and
we needed *something*.

The second half is the code to interpret the resultant dump. It is
pretty interesting how far you can get with stuff like "sort | uniq -c |
cut" when you just have that data available.

The goal is to write a converter to something like the "Calltree"
format, so you could load the memory graph into KCacheGrind. In the
meantime, I've at least implemented the same summary info that you can
get with heapy. It should also be fairly easy to do the same "compare
the current heap with the previous heap", etc.

Anyway, the scanning code works quite well at this point, and the
loading code is reasonably fast and can give summary statistics, so I
figure it is at the stage where it can actually be useful to people.

Some things I'd like to work on next, and would be happy to get help for:

1) Coming up with a catchy or at least somewhat interesting name.

   I suck at names. Currently "memory_dump" is the library, pymemdump is
   the project. I don't mind a functional name, but I don't want people
   going "ugh" when they think of using the tool. :)

2) Tracking the memory consumed by the GC overhead.

   Objects tracked by the garbage collector (just about everything,
   strings being the notable exception) actually have a PyGC_Head
   structure allocated in front of them. So while a 1-entry tuple
   *looks* like it is only 16 bytes, it actually has an extra (probably
   16-byte) PyGC_Head structure allocated for each one.

   I haven't quite figured out how to tell if a given object is in the
   gc. It may just be a bit-field in the type object (a possible check
   is sketched below).
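
   For what it's worth, CPython does expose this as a bit in the type
   flags (Py_TPFLAGS_HAVE_GC, bit 14 of tp_flags, at least in current
   CPython), and tp_flags is visible from Python as
   type(obj).__flags__, so a rough check could be:

      Py_TPFLAGS_HAVE_GC = 1 << 14   # CPython's tp_flags bit for GC

      def has_gc_head(obj):
          # True when obj's type participates in the cyclic collector
          # and so carries a PyGC_Head in front of the object proper.
          return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)

      # has_gc_head((1,)) -> True; has_gc_head('abc') -> False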

3) Generating a Calltree output.

   I haven't yet understood the calltree syntax, nor how exactly I want
   to map things onto it. Certainly you don't have FILE/LINE information
   to put into the output.

4) Other analysis tools, like walking the ref graph.

   I'm thinking of something similar to pdb, which would let you walk
   up and down the reference graph, to let you figure out why that one
   string is being cached by going through the 10 layers of references
   (a rough sketch of the reverse-reference map is below). At the
   moment, you can do this using '*' in Vim, which is at least a start,
   and one reason to use a text-compatible dump format.
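
   As a very rough sketch of that kind of navigation: once the dump is
   loaded as a dict mapping address -> record (with a 'refs' field
   listing referenced addresses; again, the field name is an
   assumption), a reverse map is enough to walk upwards:

      from collections import defaultdict

      def build_parent_map(objs):
          # objs: address -> record; record['refs'] lists the addresses
          # that object refers to.  The result maps each address to the
          # addresses referencing it, so you can walk *up* the graph.
          parents = defaultdict(list)
          for addr, record in objs.items():
              for ref in record.get('refs', ()):
                  parents[ref].append(addr)
          return parents

      # e.g. to see what keeps one particular string alive:
      #   for parent in parents[string_addr]:
      #       print(objs[parent]['type'], hex(parent))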

5) Easier ways to hook this into existing processes...

   I'm not really sure what to do here, beyond adding a function to
   make it easier to write out and load in the memory info when you
   aren't as memory constrained.

   The dump file currently takes roughly the same amount of space as
   the actual objects in RAM, both on disk and again when loaded back
   into memory.

6) Dump differencing utilities.

   This would probably make it a bit easier to see where memory is
   increasing, rather than just where it sits right now (a trivial
   sketch is below).
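
   A first cut could simply diff two of the per-type summaries from the
   earlier sketch:

      def diff_summaries(old, new):
          # old/new: type name -> [count, bytes], e.g. as returned by
          # the summarize() sketch earlier in this mail.
          for name in sorted(set(old) | set(new)):
              old_count, old_bytes = old.get(name, (0, 0))
              new_count, new_bytes = new.get(name, (0, 0))
              if (old_count, old_bytes) != (new_count, new_bytes):
                  print('%+10d objects %+14d bytes  %s'
                        % (new_count - old_count,
                           new_bytes - old_bytes, name))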

7) Cheaper "dict" of MemObjects.

   At the moment, loading a 2M object dump costs 50MB just for the dict
   holding the MemObjects. However, each entry uses the object address
   as the key, and that address is already stored on the MemObject
   itself, so instead of 3 words per entry you could use 1. Further,
   the raw address isn't all that great as a hash key: 90% of your
   objects are aligned on a 16-byte boundary, another 9% or so on an
   8-byte boundary, and the odd integer is allocated on a 4-byte
   boundary. So just using "address & 0xFF" is going to have ~16x more
   collisions than doing something a bit more sensible, like rotating
   the bits a bit (see the sketch after this item).

   Also, I'm thinking of letting you load a dump file and strip off
   things that may not be as interesting, such as the values, or
   capping reference lists at 100 entries or so. I figure that past 100
   references, you aren't all that interested in any individual one.
   And it would be nice to be able to analyze big dump files without
   consuming all of your memory.
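
   On the hashing point above, purely as an illustration: mixing the
   address before masking spreads the 16-byte-aligned values across all
   the buckets, along the lines of:

      def hash_address(addr, mask=0xFF):
          # Mix the address before taking the low bits, so 16-byte
          # alignment doesn't funnel everything into a sixteenth of the
          # buckets.  Illustrative only; the real table would live in
          # C/pyrex.
          addr ^= addr >> 4                        # fold alignment bits
          addr = (addr * 2654435761) & 0xFFFFFFFF  # multiplicative mix
          return addr & mask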

8) Full cross-platform and version compatibility.

   I'd like to support Python 2.4+, 32- and 64-bit, on Windows, Linux,
   and Mac. I've tested a couple of variants, but I don't have all of
   them available to make sure it works everywhere.

Anyway, feel free to grab a copy and let me know what you think.
Feedback is certainly welcome.

John
=:->


