Rev 139: Bring in trunk to prepare 0.2.1rc1 in http://bazaar.launchpad.net/~jameinel/meliae/0.2
John Arbash Meinel
john at arbash-meinel.com
Wed Jun 30 22:45:45 BST 2010
At http://bazaar.launchpad.net/~jameinel/meliae/0.2
------------------------------------------------------------
revno: 139 [merge]
revision-id: john at arbash-meinel.com-20100630214526-kygn6zb1ma7opu1c
parent: john at arbash-meinel.com-20100108230835-l4gwbl3711kzl9bs
parent: john at arbash-meinel.com-20100630214455-2mpjbggxot4vly80
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 0.2
timestamp: Wed 2010-06-30 16:45:26 -0500
message:
Bring in trunk to prepare 0.2.1rc1
modified:
CHANGES.txt changes.txt-20100104131503-ipkk7tyh2bnv0lu4-1
TODO.txt todo.txt-20090406140810-jcdemk8ci5r1t3gu-1
meliae/__init__.py __init__.py-20090401185017-cy1sj2rnyn6z7haz-4
meliae/_intset.pyx _intset.pyx-20090402034220-0rjrbrlgg8iuicjv-1
meliae/_scanner.pyx _scanner.pyx-20090401185718-094vrprmymne09r1-2
meliae/_scanner_core.c _scanner_core.c-20090402012435-66bb6fp08v4begco-1
meliae/loader.py loader.py-20090402195228-cw8lxf847wp00s90-1
meliae/tests/test__intset.py test__intset.py-20090402034220-0rjrbrlgg8iuicjv-2
meliae/tests/test__loader.py test__loader.py-20090403201651-opywr80iv8lsqp76-1
-------------- next part --------------
=== modified file 'CHANGES.txt'
--- a/CHANGES.txt 2010-01-08 23:08:35 +0000
+++ b/CHANGES.txt 2010-06-30 21:44:55 +0000
@@ -5,6 +5,28 @@
.. contents:: List of Releases
:depth: 1
+Meliae 0.2.1rc1
+###############
+
+:0.2.1rc1: (not released yet)
+
+* Avoid calling ``PyType_Type.tp_traverse`` when the argument is not a
+ heap-class. There is an assert that gets tripped if you are running a
+ debug build (or something to do with how Fedora builds its python).
+ (John Arbash Meinel, #586122)
+
+* Flush the file handle before exiting. We were doing it at the Python
+ layer, but that might not translate into the ``FILE*`` object.
+ (John Arbash Meinel, #428165)
+
+* Handle some issues when using Pyrex 0.9.8.4. It was treating
+ ``<unsigned long>`` as casting the object pointer, not as a Python
+ level "extract an integer". However, assignment to an ``cdef unsigned
+ long`` does the right thing. (John Arbash Meinel)
+
+* Tweak some memory performance issues (Gary Poster, #581918)
+
+
Meliae 0.2.0
############
=== modified file 'TODO.txt'
--- a/TODO.txt 2009-04-07 21:01:42 +0000
+++ b/TODO.txt 2010-06-30 21:32:18 +0000
@@ -4,27 +4,7 @@
A fairly random collection of things to work on next...
-1) Coming up with a catchy or at least somewhat interesting name.
-
- I suck at names. Currently "memory_dump" is the library, pymemdump is
- the project. I don't mind a functional name, but I don't want people
- going "ugh" when they think of using the tool. :)
-
- When this happens, create an official project on Launchpad, and host it
- there.
-
-2) (DONE @ revno 58) Tracking the memory consumed by the GC overhead.
-
- Objects allocated in the garbage collector (just about everything,
- strings being the notable exception) actually have a PyGC_Head
- structure allocated first. So while a 1 entry tuple *looks* like it
- is only 16 bytes, it actually has another probably 16-byte PyGC_Head
- structure allocated for each one.
-
- I haven't quite figured out how to tell if a given object is in the
- gc. It may just be a bit-field in the type object.
-
-3) Generating a Calltree output.
+1) Generating a Calltree output.
I haven't yet understood the calltree syntax, nor how I want to
exactly match things. Certainly you don't have FILE/LINE to put into
@@ -34,7 +14,7 @@
.. _runsnakerun: http://www.vrplumber.com/programming/runsnakerun/
-4) Other analysis tools, like walking the ref graph.
+2) Other analysis tools, like walking the ref graph.
I'm thinking something similar to PDB, which could let you walk
up-and-down the reference graph, to let you figure out why that one
@@ -42,40 +22,12 @@
At the moment, you can do this using '*' in Vim, which is at least a
start, and one reason to use a text-compatible dump format.
-5) Easier ways to hook this into existing processes...
-
- I'm not really sure what to do here, but adding a function to make it
- easier to write-out and load-in the memory info, when you aren't as
- memory constrained.
-
- The dump file current takes ~ the same amount of memory as the actual
- objects in ram, both on disk, and then when loaded back into memory.
-
-6) Dump differencing utilities.
+3) Dump differencing utilities.
This probably will make it a bit easier to see where memory is
increasing, rather than just where it is at right now.
-7) Cheaper "dict" of MemObjects.
-
- At the moment, loading a 2M object dump costs 50MB for just the dict
- holding them. However each entry uses a simple object address as the
- key, which it maintains on the object itself. So instead of 3-words
- per entry, you could use 1. Further, the address isn't all that great
- as a hash key. Namely 90% of your objects are aligned on a 16-byte
- boundary, another 9% or so on a 8-byte boundary, and the random
- Integer is allocated on a 4-byte boundary. Regardless, just using
- "address & 0xFF" is going to have ~16x more collisions than doing
- something a bit more sensible. (Rotate the bits a bit.)
-
- Also, I'm thinking to allow you to load a dump file, and strip off
- things that may not be as interesting. Like whether you want values
- or not, or if you wanted to limit the maximum reference list to 100
- or so. I figure at more that 100, you aren't all that interested in
- an individual reference. At it might be nice to be able to analyze
- big dump files without consuming all of your memory.
-
-8) Full cross-platform and version compatibility.
+4) Full cross-platform and version compatibility testing.
I'd like to support python2.4+, 32/64-bit, Win/Linux/Mac. I've tested
a couple variants, but I don't have all of them to make sure it works
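The removed item 7 above explains why raw object addresses make poor hash keys: with 16-byte allocation alignment the low 4 bits are almost always zero, so masking with ``0xFF`` hits only a fraction of the buckets. A quick standalone sketch of the effect (the mask width and mixing function are illustrative, not meliae's actual hash):

```python
# Simulate 16-byte-aligned allocation addresses hashing into a
# 256-slot table, with and without mixing the higher bits in.
addresses = [0x1000 + 16 * i for i in range(1024)]

naive = {a & 0xFF for a in addresses}                  # low 8 bits only
rotated = {((a >> 4) ^ a) & 0xFF for a in addresses}   # fold high bits in

# 16-byte alignment means only 16 of the 256 buckets are ever hit
# by the naive mask -- the ~16x collision factor mentioned above.
assert len(naive) == 16
assert len(rotated) > len(naive)
```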
=== modified file 'meliae/__init__.py'
--- a/meliae/__init__.py 2010-01-08 23:03:50 +0000
+++ b/meliae/__init__.py 2010-01-08 23:07:14 +0000
@@ -14,6 +14,6 @@
"""A simple way to dump memory consumption of a running python program."""
-version_info = (0, 2, 0, 'final', 0)
+version_info = (0, 2, 1, 'dev', 0)
__version__ = '.'.join(map(str, version_info))
=== modified file 'meliae/_intset.pyx'
--- a/meliae/_intset.pyx 2010-01-08 22:10:36 +0000
+++ b/meliae/_intset.pyx 2010-06-30 17:29:31 +0000
@@ -112,8 +112,12 @@
perturb = perturb >> 5 # PERTURB_SHIFT
def __contains__(self, val):
- cdef int_type c_val, *entry
- c_val = val
+ cdef int_type i_val
+ i_val = val
+ return self._contains(i_val)
+
+ cdef object _contains(self, int_type c_val):
+ cdef int_type *entry
if c_val == _singleton1:
if self._has_singleton & 0x01:
return True
@@ -200,6 +204,7 @@
% (c_val, entry[0]))
def add(self, val):
+ """Add a new entry to the set."""
self._add(val)
@@ -210,8 +215,25 @@
addresses tend to be aligned on 16-byte boundaries (occasionally 8-byte,
and even more rarely on 4-byte), as such the standard hash lookup has more
collisions than desired.
+
+ Also, addresses are considered to be unsigned longs by python, but
+ Py_ssize_t is a signed long. Just treating it normally causes us to get a
+ value overflow on 32-bits if the highest bit is set.
"""
+ def add(self, val):
+ cdef unsigned long ul_val
+ ul_val = val
+ self._add(<int_type>(ul_val))
+
+ def __contains__(self, val):
+ cdef unsigned long ul_val
+ ul_val = val
+ return self._contains(<int_type>(ul_val))
+
+ # TODO: Consider that the code would probably be simpler if we just
+ # bit-shifted before passing the value to self._add and self._contains,
+ # rather than re-implementing _lookup here.
cdef int_type *_lookup(self, int_type c_val) except NULL:
"""Taken from the set() algorithm."""
cdef size_t offset, perturb
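The Pyrex workaround above hinges on assigning the Python integer to a ``cdef unsigned long`` before casting to ``int_type``, so that an address with the high bit set does not overflow a signed long on 32-bit builds. A pure-Python sketch of the same folding (the helper name is hypothetical, and ``sys.maxsize`` is assumed to match the platform's signed-long maximum, which holds on typical CPython builds):

```python
import sys

# Width of the platform word, derived from the largest signed value.
ULONG_BITS = sys.maxsize.bit_length() + 1
ULONG_MASK = (1 << ULONG_BITS) - 1

def as_unsigned(val):
    """Fold a value into the unsigned range [0, 2**ULONG_BITS).

    Mirrors what assigning to a ``cdef unsigned long`` does: an id()
    larger than the signed maximum is kept as-is instead of raising
    an overflow error.
    """
    return val & ULONG_MASK
```

For example, ``as_unsigned(sys.maxsize + 1)`` returns the value unchanged rather than overflowing, which is exactly the case exercised by ``test_high_bit`` further down.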
=== modified file 'meliae/_scanner.pyx'
--- a/meliae/_scanner.pyx 2009-12-30 16:25:15 +0000
+++ b/meliae/_scanner.pyx 2010-05-20 14:08:53 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2009 Canonical Ltd
+# Copyright (C) 2009, 2010 Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 3 as
@@ -21,6 +21,7 @@
FILE *stderr
size_t fwrite(void *, size_t, size_t, FILE *)
size_t fprintf(FILE *, char *, ...)
+ void fflush(FILE *)
cdef extern from "Python.h":
FILE *PyFile_AsFile(object)
@@ -60,7 +61,7 @@
cdef void _file_io_callback(void *callee_data, char *bytes, size_t len):
cdef FILE *file_cb
-
+
file_cb = <FILE *>callee_data
fwrite(bytes, 1, len, file_cb)
@@ -93,9 +94,9 @@
fp_out = PyFile_AsFile(out)
if fp_out != NULL:
- # This must be a callable
_dump_object_info(<write_callback>_file_io_callback, fp_out, obj,
nodump, recurse_depth)
+ fflush(fp_out)
else:
_dump_object_info(<write_callback>_callable_callback, <void *>out, obj,
nodump, recurse_depth)
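The added ``fflush(fp_out)`` matters because Python-level flushing and C-level ``FILE*`` buffering are separate layers: flushing the top layer says nothing about data still held below it. A small pure-Python analogue of the same layering, using the ``io`` stack rather than C stdio (the payload bytes are arbitrary):

```python
import io

raw = io.BytesIO()
buffered = io.BufferedWriter(raw, buffer_size=4096)

buffered.write(b'{"address": 1}\n')
# The bytes are still sitting in BufferedWriter's buffer; the layer
# underneath has not seen them yet -- the same situation as data held
# in a C FILE* buffer that a Python-level flush never reaches.
assert raw.getvalue() == b''

buffered.flush()  # push the buffered layer down, like fflush(fp_out)
assert raw.getvalue() == b'{"address": 1}\n'
```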
=== modified file 'meliae/_scanner_core.c'
--- a/meliae/_scanner_core.c 2009-12-30 16:25:15 +0000
+++ b/meliae/_scanner_core.c 2010-06-30 18:05:55 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009 Canonical Ltd
+/* Copyright (C) 2009, 2010 Canonical Ltd
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 3 as
@@ -397,6 +397,7 @@
{
Py_ssize_t size;
int retval;
+ int do_traverse;
if (info->nodump != NULL &&
info->nodump != Py_None
@@ -473,12 +474,26 @@
_write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyDict_Size(c_obj));
}
_write_static_to_info(info, ", \"refs\": [");
- if (Py_TYPE(c_obj)->tp_traverse != NULL) {
+ do_traverse = 1;
+ if (Py_TYPE(c_obj)->tp_traverse == NULL
+ || (Py_TYPE(c_obj)->tp_traverse == PyType_Type.tp_traverse
+ && !PyType_HasFeature((PyTypeObject*)c_obj, Py_TPFLAGS_HEAPTYPE)))
+ {
+ /* Obviously we don't traverse if there is no traverse function. But
+ * also, if this is a 'Type' (class definition), then
+ * PyTypeObject.tp_traverse has an assertion about whether this type is
+ * a HEAPTYPE. In debug builds, this can trip and cause failures, even
+ * though it doesn't seem to hurt anything.
+ * See: https://bugs.launchpad.net/bugs/586122
+ */
+ do_traverse = 0;
+ }
+ if (do_traverse) {
info->first = 1;
Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_reference, info);
}
_write_static_to_info(info, "]}\n");
- if (Py_TYPE(c_obj)->tp_traverse != NULL && recurse != 0) {
+ if (do_traverse && recurse != 0) {
if (recurse == 2) { /* Always dump one layer deeper */
Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_child, info);
} else if (recurse == 1) {
@@ -514,7 +529,10 @@
if (lst == NULL) {
return NULL;
}
- if (Py_TYPE(c_obj)->tp_traverse != NULL) {
+ if (Py_TYPE(c_obj)->tp_traverse != NULL
+ && (Py_TYPE(c_obj)->tp_traverse != PyType_Type.tp_traverse
+ || PyType_HasFeature((PyTypeObject *)c_obj, Py_TPFLAGS_HEAPTYPE)))
+ {
Py_TYPE(c_obj)->tp_traverse(c_obj, _append_object, lst);
}
return lst;
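The C guard above skips ``tp_traverse`` when the object is a statically allocated type, because ``PyType_Type.tp_traverse`` asserts its argument is a heap type. The same condition can be mirrored from Python by checking the ``Py_TPFLAGS_HEAPTYPE`` flag (the function name here is illustrative, not meliae API):

```python
# CPython type-flag bit for heap-allocated (class-statement) types.
Py_TPFLAGS_HEAPTYPE = 1 << 9

def safe_to_traverse(obj):
    """Return False for statically allocated type objects.

    Calling PyType_Type.tp_traverse on a builtin like ``int`` trips
    an assert on debug builds (bug #586122), so such objects are
    skipped; everything else is traversed as before.
    """
    if isinstance(obj, type) and not (obj.__flags__ & Py_TPFLAGS_HEAPTYPE):
        return False
    return True

class Example(object):
    pass
```

Here ``safe_to_traverse(int)`` is False (``int`` is a static type) while ``safe_to_traverse(Example)`` is True, matching the heap-type distinction the C code makes.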
=== modified file 'meliae/loader.py'
--- a/meliae/loader.py 2010-01-08 23:00:40 +0000
+++ b/meliae/loader.py 2010-05-20 15:58:12 +0000
@@ -395,7 +395,8 @@
collapsed = 0
total = len(self.objs)
tlast = timer()-20
- for item_idx, (address, obj) in enumerate(self.objs.items()):
+ to_be_removed = set()
+ for item_idx, (address, obj) in enumerate(self.objs.iteritems()):
if obj.type_str in ('str', 'dict', 'tuple', 'list', 'type',
'function', 'wrapper_descriptor',
'code', 'classobj', 'int',
@@ -441,9 +442,14 @@
obj.total_size = 0
if obj.type_str == 'instance':
obj.type_str = type_obj.value
- # Now that all the data has been moved into the instance, remove
- # the dict from the collection
- del self.objs[dict_obj.address]
+ # Now that all the data has been moved into the instance, we
+ # will want to remove the dict from the collection. We'll do the
+ # actual deletion later, since we are using iteritems for this
+ # loop.
+ to_be_removed.add(dict_obj.address)
+ # Now we can do the actual deletion.
+ for address in to_be_removed:
+ del self.objs[address]
if self.show_progress:
sys.stderr.write('checked %8d / %8d collapsed %8d \n'
% (item_idx, total, collapsed))
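The loader change follows the standard pattern of recording keys during iteration and deleting only afterwards, since a dict must not shrink while it is being iterated. A minimal standalone sketch of that pattern (function and predicate names are illustrative):

```python
def collapse_instance_dicts(objs, is_collapsed_dict):
    """Drop entries flagged by ``is_collapsed_dict`` without mutating
    ``objs`` mid-iteration, which would invalidate the iterator."""
    to_be_removed = set()
    for address, obj in objs.items():
        if is_collapsed_dict(obj):
            to_be_removed.add(address)
    # Safe to mutate now that iteration has finished.
    for address in to_be_removed:
        del objs[address]
    return objs
```

For example, collapsing ``{1: 'dict', 2: 'instance', 3: 'dict'}`` with a predicate matching ``'dict'`` leaves only the instance entry.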
=== modified file 'meliae/tests/test__intset.py'
--- a/meliae/tests/test__intset.py 2009-09-18 17:00:34 +0000
+++ b/meliae/tests/test__intset.py 2010-01-29 22:19:38 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2009 Canonical Ltd
+# Copyright (C) 2009, 2010 Canonical Ltd
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 3 as
@@ -14,6 +14,8 @@
"""Test the Set of Integers object."""
+import sys
+
from meliae import (
_intset,
tests,
@@ -84,19 +86,19 @@
def test_add_and_grow(self):
iset = self._set_type()
- for i in xrange(-5, 10000):
+ for i in xrange(0, 10000):
iset.add(i)
- self.assertEqual(10005, len(iset))
+ self.assertEqual(10000, len(iset))
def test_from_list(self):
- iset = self._set_type([-1, 0, 1, 2, 3, 4])
- self.assertTrue(-1 in iset)
+ iset = self._set_type([0, 1, 2, 3, 4, 5])
self.assertTrue(0 in iset)
self.assertTrue(1 in iset)
self.assertTrue(2 in iset)
self.assertTrue(3 in iset)
self.assertTrue(4 in iset)
- self.assertFalse(5 in iset)
+ self.assertTrue(5 in iset)
+ self.assertFalse(6 in iset)
def test_discard(self):
# Not supported yet... KnownFailure
@@ -110,3 +112,20 @@
class TestIDSet(TestIntSet):
_set_type = _intset.IDSet
+
+ def test_high_bit(self):
+ # Python ids() are considered to be unsigned values, but python
+ # integers are considered to be signed longs. As such, we need to play
+ # some tricks to get them to fit properly. Otherwise we get
+ # 'Overflow' exceptions
+ bigint = sys.maxint + 1
+ self.assertTrue(isinstance(bigint, long))
+ iset = self._set_type()
+ self.assertFalse(bigint in iset)
+ iset.add(bigint)
+ self.assertTrue(bigint in iset)
+
+ def test_add_singletons(self):
+ pass
+ # Negative values cannot be checked in IDSet, because we cast them to
+ # unsigned long first.
=== modified file 'meliae/tests/test__loader.py'
--- a/meliae/tests/test__loader.py 2010-01-08 22:51:33 +0000
+++ b/meliae/tests/test__loader.py 2010-06-30 17:48:19 +0000
@@ -14,6 +14,8 @@
"""Pyrex extension for tracking loaded objects"""
+import sys
+
from meliae import (
_loader,
warn,
@@ -22,7 +24,7 @@
class TestMemObjectCollection(tests.TestCase):
-
+
def test__init__(self):
moc = _loader.MemObjectCollection()
self.assertEqual(0, moc._active)
@@ -38,7 +40,6 @@
self.assertEqual(933, moc._test_lookup(933))
self.assertEqual(933, moc._test_lookup(933+1024))
self.assertEqual(933, moc._test_lookup(933L+1024L))
- self.assertEqual(933, moc._test_lookup(933L+2**32-1))
def test__len__(self):
moc = _loader.MemObjectCollection()
More information about the bazaar-commits mailing list