Rev 139: Bring in trunk to prepare 0.2.1rc1 in http://bazaar.launchpad.net/~jameinel/meliae/0.2

John Arbash Meinel john at arbash-meinel.com
Wed Jun 30 22:45:45 BST 2010


At http://bazaar.launchpad.net/~jameinel/meliae/0.2

------------------------------------------------------------
revno: 139 [merge]
revision-id: john at arbash-meinel.com-20100630214526-kygn6zb1ma7opu1c
parent: john at arbash-meinel.com-20100108230835-l4gwbl3711kzl9bs
parent: john at arbash-meinel.com-20100630214455-2mpjbggxot4vly80
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 0.2
timestamp: Wed 2010-06-30 16:45:26 -0500
message:
  Bring in trunk to prepare 0.2.1rc1
modified:
  CHANGES.txt                    changes.txt-20100104131503-ipkk7tyh2bnv0lu4-1
  TODO.txt                       todo.txt-20090406140810-jcdemk8ci5r1t3gu-1
  meliae/__init__.py             __init__.py-20090401185017-cy1sj2rnyn6z7haz-4
  meliae/_intset.pyx             _intset.pyx-20090402034220-0rjrbrlgg8iuicjv-1
  meliae/_scanner.pyx            _scanner.pyx-20090401185718-094vrprmymne09r1-2
  meliae/_scanner_core.c         _scanner_core.c-20090402012435-66bb6fp08v4begco-1
  meliae/loader.py               loader.py-20090402195228-cw8lxf847wp00s90-1
  meliae/tests/test__intset.py   test__intset.py-20090402034220-0rjrbrlgg8iuicjv-2
  meliae/tests/test__loader.py   test__loader.py-20090403201651-opywr80iv8lsqp76-1
=== modified file 'CHANGES.txt'
--- a/CHANGES.txt	2010-01-08 23:08:35 +0000
+++ b/CHANGES.txt	2010-06-30 21:44:55 +0000
@@ -5,6 +5,28 @@
 .. contents:: List of Releases
    :depth: 1
 
+Meliae 0.2.1rc1
+###############
+
+:0.2.1rc1: (not released yet)
+
+* Avoid calling ``PyType_Type.tp_traverse`` when the argument is not a
+  heap class. ``tp_traverse`` contains an assert that trips when running
+  a debug build (and apparently with how Fedora builds its python).
+  (John Arbash Meinel, #586122)
+
+* Flush the file handle before exiting. We were doing it at the Python
+  layer, but that might not translate into the ``FILE*`` object.
+  (John Arbash Meinel, #428165)
+
+* Handle some issues when using Pyrex 0.9.8.4. It was treating
+  ``<unsigned long>`` as a cast of the object pointer, not as a
+  Python-level "extract an integer". However, assignment to a
+  ``cdef unsigned long`` does the right thing. (John Arbash Meinel)
+
+* Fix some memory performance issues (Gary Poster, #581918)
+
+
 Meliae 0.2.0
 ############
 

=== modified file 'TODO.txt'
--- a/TODO.txt	2009-04-07 21:01:42 +0000
+++ b/TODO.txt	2010-06-30 21:32:18 +0000
@@ -4,27 +4,7 @@
 
 A fairly random collection of things to work on next...
 
-1) Coming up with a catchy or at least somewhat interesting name.
-
-   I suck at names. Currently "memory_dump" is the library, pymemdump is
-   the project. I don't mind a functional name, but I don't want people
-   going "ugh" when they think of using the tool.  :) 
-
-   When this happens, create an official project on Launchpad, and host it
-   there.
-
-2) (DONE @ revno 58) Tracking the memory consumed by the GC overhead.
-
-   Objects allocated in the garbage collector (just about everything,
-   strings being the notable exception) actually have a PyGC_Head
-   structure allocated first. So while a 1 entry tuple *looks* like it
-   is only 16 bytes, it actually has another probably 16-byte PyGC_Head
-   structure allocated for each one.
-
-   I haven't quite figured out how to tell if a given object is in the
-   gc. It may just be a bit-field in the type object.
-
-3) Generating a Calltree output.
+1) Generating a Calltree output.
 
    I haven't yet understood the calltree syntax, nor how I want to
    exactly match things. Certainly you don't have FILE/LINE to put into
@@ -34,7 +14,7 @@
 
 .. _runsnakerun: http://www.vrplumber.com/programming/runsnakerun/
 
-4) Other analysis tools, like walking the ref graph.
+2) Other analysis tools, like walking the ref graph.
 
    I'm thinking something similar to PDB, which could let you walk
    up-and-down the reference graph, to let you figure out why that one
@@ -42,40 +22,12 @@
    At the moment, you can do this using '*' in Vim, which is at least a
    start, and one reason to use a text-compatible dump format.
 
-5) Easier ways to hook this into existing processes...
-
-   I'm not really sure what to do here, but adding a function to make it
-   easier to write-out and load-in the memory info, when you aren't as
-   memory constrained.
-
-   The dump file current takes ~ the same amount of memory as the actual
-   objects in ram, both on disk, and then when loaded back into memory.
-
-6) Dump differencing utilities.
+3) Dump differencing utilities.
 
    This probably will make it a bit easier to see where memory is
    increasing, rather than just where it is at right now.
 
-7) Cheaper "dict" of MemObjects.
-
-   At the moment, loading a 2M object dump costs 50MB for just the dict
-   holding them. However each entry uses a simple object address as the
-   key, which it maintains on the object itself. So instead of 3-words
-   per entry, you could use 1. Further, the address isn't all that great
-   as a hash key. Namely 90% of your objects are aligned on a 16-byte
-   boundary, another 9% or so on a 8-byte boundary, and the random
-   Integer is allocated on a 4-byte boundary. Regardless, just using
-   "address & 0xFF" is going to have ~16x more collisions than doing
-   something a bit more sensible. (Rotate the bits a bit.)
-
-   Also, I'm thinking to allow you to load a dump file, and strip off
-   things that may not be as interesting. Like whether you want values
-   or not, or if you wanted to limit the maximum reference list to 100
-   or so. I figure at more that 100, you aren't all that interested in
-   an individual reference. At it might be nice to be able to analyze
-   big dump files without consuming all of your memory.
-
-8) Full cross-platform and version compatibility.
+4) Full cross-platform and version compatibility testing.
 
    I'd like to support python2.4+, 32/64-bit, Win/Linux/Mac. I've tested
    a couple variants, but I don't have all of them to make sure it works

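The removed "Cheaper dict of MemObjects" item above suggests rotating address bits so that 16-byte-aligned allocations stop colliding in the low hash bits. A minimal sketch of that idea (the function name and rotation amount are illustrative, not from the project):

```python
def rotated_hash(addr, rotate=4, bits=64):
    """Rotate the low (alignment) bits of an address toward the top, so
    16-byte-aligned addresses land in distinct low buckets."""
    mask = (1 << bits) - 1
    return ((addr >> rotate) | (addr << (bits - rotate))) & mask

# 16-byte-aligned addresses all collide when the raw low bits are used...
aligned = [0x1000, 0x2000, 0x3000, 0x4000]
naive = {a & 0xF for a in aligned}                    # one bucket for all
rotated = {rotated_hash(a) & 0xFFF for a in aligned}  # distinct buckets
```

With the naive mask every aligned address hashes to bucket 0; after rotation the distinguishing bits reach the bucket index.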
=== modified file 'meliae/__init__.py'
--- a/meliae/__init__.py	2010-01-08 23:03:50 +0000
+++ b/meliae/__init__.py	2010-01-08 23:07:14 +0000
@@ -14,6 +14,6 @@
 
 """A simple way to dump memory consumption of a running python program."""
 
-version_info = (0, 2, 0, 'final', 0)
+version_info = (0, 2, 1, 'dev', 0)
 __version__ = '.'.join(map(str, version_info))
 

=== modified file 'meliae/_intset.pyx'
--- a/meliae/_intset.pyx	2010-01-08 22:10:36 +0000
+++ b/meliae/_intset.pyx	2010-06-30 17:29:31 +0000
@@ -112,8 +112,12 @@
             perturb = perturb >> 5 # PERTURB_SHIFT
 
     def __contains__(self, val):
-        cdef int_type c_val, *entry
-        c_val = val
+        cdef int_type i_val
+        i_val = val
+        return self._contains(i_val)
+
+    cdef object _contains(self, int_type c_val):
+        cdef int_type *entry
         if c_val == _singleton1:
             if self._has_singleton & 0x01:
                 return True
@@ -200,6 +204,7 @@
             % (c_val, entry[0]))
 
     def add(self, val):
+        """Add a new entry to the set."""
         self._add(val)
 
 
@@ -210,8 +215,25 @@
     addresses tend to be aligned on 16-byte boundaries (occasionally 8-byte,
     and even more rarely on 4-byte), as such the standard hash lookup has more
     collisions than desired.
+
+    Also, addresses are considered to be unsigned longs by python, but
+    Py_ssize_t is a signed long. Treating an address as signed causes a
+    value overflow on 32-bit platforms if the highest bit is set.
     """
 
+    def add(self, val):
+        cdef unsigned long ul_val
+        ul_val = val
+        self._add(<int_type>(ul_val))
+
+    def __contains__(self, val):
+        cdef unsigned long ul_val
+        ul_val = val
+        return self._contains(<int_type>(ul_val))
+
+    # TODO: Consider that the code would probably be simpler if we just
+    # bit-shifted before passing the value to self._add and self._contains,
+    # rather than re-implementing _lookup here.
     cdef int_type *_lookup(self, int_type c_val) except NULL:
         """Taken from the set() algorithm."""
         cdef size_t offset, perturb

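The ``IDSet`` overrides above exist because an address with the high bit set overflows a signed slot. The effect can be reproduced at the Python level with ``ctypes`` fixed-width types (a 32-bit illustration, not project code):

```python
import ctypes

addr = 0x80000005                     # an "address" with the high bit set
as_signed = ctypes.c_int32(addr).value        # overflows to a negative value
as_unsigned = ctypes.c_uint32(as_signed).value  # unsigned cast recovers it

assert as_signed < 0
assert as_unsigned == addr
```

Going through an unsigned slot round-trips the value, which is why ``IDSet`` assigns to a ``cdef unsigned long`` before casting to ``int_type``.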
=== modified file 'meliae/_scanner.pyx'
--- a/meliae/_scanner.pyx	2009-12-30 16:25:15 +0000
+++ b/meliae/_scanner.pyx	2010-05-20 14:08:53 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2009 Canonical Ltd
+# Copyright (C) 2009, 2010 Canonical Ltd
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 3 as
@@ -21,6 +21,7 @@
     FILE *stderr
     size_t fwrite(void *, size_t, size_t, FILE *)
     size_t fprintf(FILE *, char *, ...)
+    void fflush(FILE *)
 
 cdef extern from "Python.h":
     FILE *PyFile_AsFile(object)
@@ -60,7 +61,7 @@
 
 cdef void _file_io_callback(void *callee_data, char *bytes, size_t len):
     cdef FILE *file_cb
-    
+
     file_cb = <FILE *>callee_data
     fwrite(bytes, 1, len, file_cb)
 
@@ -93,9 +94,9 @@
 
     fp_out = PyFile_AsFile(out)
     if fp_out != NULL:
-        # This must be a callable
         _dump_object_info(<write_callback>_file_io_callback, fp_out, obj,
                           nodump, recurse_depth)
+        fflush(fp_out)
     else:
         _dump_object_info(<write_callback>_callable_callback, <void *>out, obj,
                           nodump, recurse_depth)

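The ``fflush(fp_out)`` call added above follows the usual rule that buffered data is not visible until the owning layer flushes it; flushing at the Python level does not necessarily flush the underlying ``FILE*``. A Python-level sketch of the same buffering behavior (temporary file, illustrative only):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
f.write('dumped data')               # sits in the user-space buffer
size_before = os.path.getsize(path)  # typically 0: nothing flushed yet
f.flush()                            # push the buffer down to the OS
size_after = os.path.getsize(path)
f.close()
os.remove(path)

assert size_after == len('dumped data')
```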
=== modified file 'meliae/_scanner_core.c'
--- a/meliae/_scanner_core.c	2009-12-30 16:25:15 +0000
+++ b/meliae/_scanner_core.c	2010-06-30 18:05:55 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009 Canonical Ltd
+/* Copyright (C) 2009, 2010 Canonical Ltd
  *
  * This program is free software: you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 3 as
@@ -397,6 +397,7 @@
 {
     Py_ssize_t size;
     int retval;
+    int do_traverse;
 
     if (info->nodump != NULL && 
         info->nodump != Py_None
@@ -473,12 +474,26 @@
         _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyDict_Size(c_obj));
     }
     _write_static_to_info(info, ", \"refs\": [");
-    if (Py_TYPE(c_obj)->tp_traverse != NULL) {
+    do_traverse = 1;
+    if (Py_TYPE(c_obj)->tp_traverse == NULL
+        || (Py_TYPE(c_obj)->tp_traverse == PyType_Type.tp_traverse
+            && !PyType_HasFeature((PyTypeObject*)c_obj, Py_TPFLAGS_HEAPTYPE)))
+    {
+        /* Obviously we don't traverse if there is no traverse function. But
+         * also, if this is a 'Type' (class definition), then
+         * PyTypeObject.tp_traverse has an assertion about whether this type is
+         * a HEAPTYPE. In debug builds, this can trip and cause failures, even
+         * though it doesn't seem to hurt anything.
+         *  See: https://bugs.launchpad.net/bugs/586122
+         */
+        do_traverse = 0;
+    }
+    if (do_traverse) {
         info->first = 1;
         Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_reference, info);
     }
     _write_static_to_info(info, "]}\n");
-    if (Py_TYPE(c_obj)->tp_traverse != NULL && recurse != 0) {
+    if (do_traverse && recurse != 0) {
         if (recurse == 2) { /* Always dump one layer deeper */
             Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_child, info);
         } else if (recurse == 1) {
@@ -514,7 +529,10 @@
     if (lst == NULL) {
         return NULL;
     }
-    if (Py_TYPE(c_obj)->tp_traverse != NULL) {
+    if (Py_TYPE(c_obj)->tp_traverse != NULL
+        && (Py_TYPE(c_obj)->tp_traverse != PyType_Type.tp_traverse
+            || PyType_HasFeature((PyTypeObject *)c_obj, Py_TPFLAGS_HEAPTYPE)))
+    {
         Py_TYPE(c_obj)->tp_traverse(c_obj, _append_object, lst);
     }
     return lst;

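The guard above skips ``PyType_Type.tp_traverse`` for static (non-heap) types, since its assertion only holds for heap types. The distinction is visible from Python through ``__flags__``: classes defined at runtime carry ``Py_TPFLAGS_HEAPTYPE`` (bit 9 in CPython's ``object.h``), while built-in types do not. A small illustration, assuming CPython:

```python
Py_TPFLAGS_HEAPTYPE = 1 << 9   # flag value from CPython's object.h

class UserClass(object):       # defined at runtime: a heap type
    pass

assert UserClass.__flags__ & Py_TPFLAGS_HEAPTYPE
assert not (int.__flags__ & Py_TPFLAGS_HEAPTYPE)    # static type
assert not (type.__flags__ & Py_TPFLAGS_HEAPTYPE)   # PyType_Type itself
```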
=== modified file 'meliae/loader.py'
--- a/meliae/loader.py	2010-01-08 23:00:40 +0000
+++ b/meliae/loader.py	2010-05-20 15:58:12 +0000
@@ -395,7 +395,8 @@
         collapsed = 0
         total = len(self.objs)
         tlast = timer()-20
-        for item_idx, (address, obj) in enumerate(self.objs.items()):
+        to_be_removed = set()
+        for item_idx, (address, obj) in enumerate(self.objs.iteritems()):
             if obj.type_str in ('str', 'dict', 'tuple', 'list', 'type',
                                 'function', 'wrapper_descriptor',
                                 'code', 'classobj', 'int',
@@ -441,9 +442,14 @@
             obj.total_size = 0
             if obj.type_str == 'instance':
                 obj.type_str = type_obj.value
-            # Now that all the data has been moved into the instance, remove
-            # the dict from the collection
-            del self.objs[dict_obj.address]
+            # Now that all the data has been moved into the instance, we
+            # will want to remove the dict from the collection.  We'll do the
+            # actual deletion later, since we are using iteritems for this
+            # loop.
+            to_be_removed.add(dict_obj.address)
+        # Now we can do the actual deletion.
+        for address in to_be_removed:
+            del self.objs[address]
         if self.show_progress:
             sys.stderr.write('checked %8d / %8d collapsed %8d    \n'
                              % (item_idx, total, collapsed))

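The loader change above defers deletions because removing entries from a dict while iterating over it is unsafe (with ``iteritems`` in Python 2; in Python 3, mutating during iteration raises ``RuntimeError``). The pattern in isolation, with made-up data:

```python
objs = {0x10: 'instance', 0x20: 'dict', 0x30: 'str', 0x40: 'dict'}

to_be_removed = set()
for address, type_str in objs.items():   # never mutate while iterating
    if type_str == 'dict':
        to_be_removed.add(address)

for address in to_be_removed:            # safe: iteration has finished
    del objs[address]

assert sorted(objs) == [0x10, 0x30]
```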
=== modified file 'meliae/tests/test__intset.py'
--- a/meliae/tests/test__intset.py	2009-09-18 17:00:34 +0000
+++ b/meliae/tests/test__intset.py	2010-01-29 22:19:38 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2009 Canonical Ltd
+# Copyright (C) 2009, 2010 Canonical Ltd
 # 
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 3 as
@@ -14,6 +14,8 @@
 
 """Test the Set of Integers object."""
 
+import sys
+
 from meliae import (
     _intset,
     tests,
@@ -84,19 +86,19 @@
 
     def test_add_and_grow(self):
         iset = self._set_type()
-        for i in xrange(-5, 10000):
+        for i in xrange(0, 10000):
             iset.add(i)
-        self.assertEqual(10005, len(iset))
+        self.assertEqual(10000, len(iset))
 
     def test_from_list(self):
-        iset = self._set_type([-1, 0, 1, 2, 3, 4])
-        self.assertTrue(-1 in iset)
+        iset = self._set_type([0, 1, 2, 3, 4, 5])
         self.assertTrue(0 in iset)
         self.assertTrue(1 in iset)
         self.assertTrue(2 in iset)
         self.assertTrue(3 in iset)
         self.assertTrue(4 in iset)
-        self.assertFalse(5 in iset)
+        self.assertTrue(5 in iset)
+        self.assertFalse(6 in iset)
 
     def test_discard(self):
         # Not supported yet... KnownFailure
@@ -110,3 +112,20 @@
 class TestIDSet(TestIntSet):
 
     _set_type = _intset.IDSet
+
+    def test_high_bit(self):
+        # Python id() values are considered to be unsigned, but python
+        # integers are considered to be signed longs. As such, we need to play
+        # some tricks to get them to fit properly. Otherwise we get
+        # 'Overflow' exceptions.
+        bigint = sys.maxint + 1
+        self.assertTrue(isinstance(bigint, long))
+        iset = self._set_type()
+        self.assertFalse(bigint in iset)
+        iset.add(bigint)
+        self.assertTrue(bigint in iset)
+
+    def test_add_singletons(self):
+        # Negative values cannot be checked in IDSet, because we cast them to
+        # unsigned long first.
+        pass

=== modified file 'meliae/tests/test__loader.py'
--- a/meliae/tests/test__loader.py	2010-01-08 22:51:33 +0000
+++ b/meliae/tests/test__loader.py	2010-06-30 17:48:19 +0000
@@ -14,6 +14,8 @@
 
 """Pyrex extension for tracking loaded objects"""
 
+import sys
+
 from meliae import (
     _loader,
     warn,
@@ -22,7 +24,7 @@
 
 
 class TestMemObjectCollection(tests.TestCase):
-    
+
     def test__init__(self):
         moc = _loader.MemObjectCollection()
         self.assertEqual(0, moc._active)
@@ -38,7 +40,6 @@
         self.assertEqual(933, moc._test_lookup(933))
         self.assertEqual(933, moc._test_lookup(933+1024))
         self.assertEqual(933, moc._test_lookup(933L+1024L))
-        self.assertEqual(933, moc._test_lookup(933L+2**32-1))
 
     def test__len__(self):
         moc = _loader.MemObjectCollection()


