Rev 4275: (jam) During BTreeIndex parsing, intern() the appropriate key bits. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Wed Apr 8 23:50:26 BST 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4275
revision-id: pqm at pqm.ubuntu.com-20090408225022-exxekai8bxhxrayk
parent: pqm at pqm.ubuntu.com-20090408191523-xbmkv119txxrwxr7
parent: john at arbash-meinel.com-20090408214053-m9192eukaj8n1kzw
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2009-04-08 23:50:22 +0100
message:
  (jam) During BTreeIndex parsing, intern() the appropriate key bits.
modified:
  bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
  bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4274.1.3
    revision-id: john at arbash-meinel.com-20090408214053-m9192eukaj8n1kzw
    parent: john at arbash-meinel.com-20090408202307-5kmzxsbhro8hn5yb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.15-btree-intern
    timestamp: Wed 2009-04-08 16:40:53 -0500
    message:
      Use a proper tuple
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4274.1.2
    revision-id: john at arbash-meinel.com-20090408202307-5kmzxsbhro8hn5yb
    parent: john at arbash-meinel.com-20090408202106-kbzs503kwtbyty1g
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.15-btree-intern
    timestamp: Wed 2009-04-08 15:23:07 -0500
    message:
      Add slots to _LeafNode and _InternalNode.
      We don't have many InternalNodes, but it is easy to make the change.
      We can have up to 1000 LeafNodes per btree index, and up to
      ~40 btree indices in practice (though you won't have 1k leaves per btree
      at that point.) Anyway, 100 bytes * 40k leaf nodes is 4MB saved.
      At the point you have 40k leaf nodes, this is pretty small, but
      still worth doing.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4274.1.1
    revision-id: john at arbash-meinel.com-20090408202106-kbzs503kwtbyty1g
    parent: pqm at pqm.ubuntu.com-20090408191523-xbmkv119txxrwxr7
    parent: john at arbash-meinel.com-20090304040109-5y4s6vycvzcwuwfm
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.15-btree-intern
    timestamp: Wed 2009-04-08 15:21:06 -0500
    message:
      Merge in the BTreeIndex intern() during parse changes.
    modified:
      bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4075.3.4
    revision-id: john at arbash-meinel.com-20090304040109-5y4s6vycvzcwuwfm
    parent: john at arbash-meinel.com-20090304025552-rhoamen9pwjkvcua
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: intern_keys
    timestamp: Tue 2009-03-03 22:01:09 -0600
    message:
      Remove a member we don't actually access.
    modified:
      bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
    ------------------------------------------------------------
    revno: 4075.3.3
    revision-id: john at arbash-meinel.com-20090304025552-rhoamen9pwjkvcua
    parent: john at arbash-meinel.com-20090304025410-pzr7phpvarv25jea
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: intern_keys
    timestamp: Tue 2009-03-03 20:55:52 -0600
    message:
      NEWS
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 4075.3.2
    revision-id: john at arbash-meinel.com-20090304025410-pzr7phpvarv25jea
    parent: john at arbash-meinel.com-20090304025054-fze01hr79xjv21x5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: intern_keys
    timestamp: Tue 2009-03-03 20:54:10 -0600
    message:
      Use intern() instead of _get_cached_ascii for getting unique revision_ids and file_ids.
      
      intern() puts the strings in a dict (similar to what we do), but it does so
      without increasing the refcount, so intern()ed strings do not have
      unlimited lifetime.
      This allows us to avoid duplicate copies in memory, without having an unlimited cache.
      Further, we avoid keeping a Unicode representation of the string around,
      as we no longer use unicode revision_ids or file_ids in the codebase.
    modified:
      bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
    ------------------------------------------------------------
    revno: 4075.3.1
    revision-id: john at arbash-meinel.com-20090304025054-fze01hr79xjv21x5
    parent: pqm at pqm.ubuntu.com-20090303085413-35seprvnu885xorz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: intern_keys
    timestamp: Tue 2009-03-03 20:50:54 -0600
    message:
      Use PyString_InternInPlace to intern() the various parts of keys that are processed.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
=== modified file 'bzrlib/_btree_serializer_c.pyx'
--- a/bzrlib/_btree_serializer_c.pyx	2009-03-23 14:59:43 +0000
+++ b/bzrlib/_btree_serializer_c.pyx	2009-04-08 20:21:06 +0000
@@ -32,15 +32,18 @@
 
     char *PyString_AsString(object p) except NULL
     object PyString_FromStringAndSize(char *, Py_ssize_t)
+    PyObject *PyString_FromStringAndSize_ptr "PyString_FromStringAndSize" (char *, Py_ssize_t)
     int PyString_CheckExact(object s)
     int PyString_CheckExact_ptr "PyString_CheckExact" (PyObject *)
     Py_ssize_t PyString_Size(object p)
     Py_ssize_t PyString_GET_SIZE_ptr "PyString_GET_SIZE" (PyObject *)
     char * PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *)
     int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len)
+    void PyString_InternInPlace(PyObject **)
     int PyTuple_CheckExact(object t)
     Py_ssize_t PyTuple_GET_SIZE(object t)
     PyObject *PyTuple_GET_ITEM_ptr_object "PyTuple_GET_ITEM" (object tpl, int index)
+    void Py_DECREF_ptr "Py_DECREF" (PyObject *)
 
 cdef extern from "string.h":
     void *memcpy(void *dest, void *src, size_t n)
@@ -74,6 +77,21 @@
     return PyString_FromStringAndSize(s, size)
 
 
+cdef object safe_interned_string_from_size(char *s, Py_ssize_t size):
+    cdef PyObject *py_str
+    if size < 0:
+        raise AssertionError(
+            'tried to create a string with an invalid size: %d @0x%x'
+            % (size, <int>s))
+    py_str = PyString_FromStringAndSize_ptr(s, size)
+    PyString_InternInPlace(&py_str)
+    result = <object>py_str
+    # Casting a PyObject* to an <object> triggers an INCREF from Pyrex, so we
+    # DECREF it to avoid getting immortal strings
+    Py_DECREF_ptr(py_str)
+    return result
+
+
 cdef class BTreeLeafParser:
     """Parse the leaf nodes of a BTree index.
 
@@ -142,8 +160,8 @@
             # TODO: Consider using PyIntern_FromString, the only caveat is that
             # it assumes a NULL-terminated string, so we have to check if
             # temp_ptr[0] == c'\0' or some other char.
-            key_element = safe_string_from_size(self._start,
-                                                temp_ptr - self._start)
+            key_element = safe_interned_string_from_size(self._start,
+                                                         temp_ptr - self._start)
             # advance our pointer
             self._start = temp_ptr + 1
             PyList_Append(key_segments, key_element)

=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py	2009-04-04 02:50:01 +0000
+++ b/bzrlib/btree_index.py	2009-04-08 21:40:53 +0000
@@ -590,6 +590,8 @@
 class _LeafNode(object):
     """A leaf node for a serialised B+Tree index."""
 
+    __slots__ = ('keys',)
+
     def __init__(self, bytes, key_length, ref_list_length):
         """Parse bytes to create a leaf node object."""
         # splitlines mangles the \r delimiters.. don't use it.
@@ -600,6 +602,8 @@
 class _InternalNode(object):
     """An internal node for a serialised B+Tree index."""
 
+    __slots__ = ('keys', 'offset')
+
     def __init__(self, bytes):
         """Parse bytes to create an internal node object."""
         # splitlines mangles the \r delimiters.. don't use it.
@@ -611,7 +615,7 @@
         for line in lines[2:]:
             if line == '':
                 break
-            nodes.append(tuple(line.split('\0')))
+            nodes.append(tuple(map(intern, line.split('\0'))))
         return nodes
 
 
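
For illustration, the two memory savings in the btree_index.py hunks above can be sketched roughly as follows. This is a minimal sketch, not the bzrlib code: it assumes Python 3, where the builtin intern() used in the patch is spelled sys.intern(), and the names parse_keys and LeafNode are hypothetical stand-ins for the real parser and _LeafNode.

```python
import sys


def parse_keys(lines):
    """Split NUL-separated lines into key tuples, interning each element.

    Hypothetical helper mirroring tuple(map(intern, line.split('\0')))
    from the patch; sys.intern is the Python 3 spelling of intern().
    """
    nodes = []
    for line in lines:
        if line == '':
            break
        nodes.append(tuple(map(sys.intern, line.split('\0'))))
    return nodes


class LeafNode(object):
    """Hypothetical node class: __slots__ removes the per-instance __dict__."""

    __slots__ = ('keys',)

    def __init__(self, lines):
        self.keys = parse_keys(lines)


# Two nodes parsed from separately built strings that share a key element.
first = LeafNode(['-'.join(['file', 'id', '1']) + '\0' + 'rev-a', ''])
second = LeafNode(['-'.join(['file', 'id', '1']) + '\0' + 'rev-b', ''])

# After interning, the equal key elements are one shared string object.
shared = first.keys[0][0] is second.keys[0][0]
```

Because every key element passes through the interning table, equal strings parsed from different index pages end up as a single object while any of them is still referenced, and the slotted class avoids allocating an attribute dictionary for each node.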



