Rev 4275: (jam) During BTreeIndex parsing, intern() the appropriate key bits. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Apr 8 23:50:26 BST 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4275
revision-id: pqm at pqm.ubuntu.com-20090408225022-exxekai8bxhxrayk
parent: pqm at pqm.ubuntu.com-20090408191523-xbmkv119txxrwxr7
parent: john at arbash-meinel.com-20090408214053-m9192eukaj8n1kzw
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2009-04-08 23:50:22 +0100
message:
(jam) During BTreeIndex parsing, intern() the appropriate key bits.
modified:
bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 4274.1.3
revision-id: john at arbash-meinel.com-20090408214053-m9192eukaj8n1kzw
parent: john at arbash-meinel.com-20090408202307-5kmzxsbhro8hn5yb
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.15-btree-intern
timestamp: Wed 2009-04-08 16:40:53 -0500
message:
Use a proper tuple
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 4274.1.2
revision-id: john at arbash-meinel.com-20090408202307-5kmzxsbhro8hn5yb
parent: john at arbash-meinel.com-20090408202106-kbzs503kwtbyty1g
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.15-btree-intern
timestamp: Wed 2009-04-08 15:23:07 -0500
message:
Add slots to _LeafNode and _InternalNode.
We don't have many InternalNodes, but it is easy to make the change.
We can have up to 1000 LeafNodes per btree index, and up to
~40 btree indices in practice (though you won't have 1k leaves per btree
at that point). Anyway, 100 bytes * 40k leaf nodes is 4MB saved.
By the time you have 40k leaf nodes that saving is pretty small, but
still worth doing.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
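
As a rough illustration of the saving described in revno 4274.1.2, the sketch below (Python 3, CPython assumed) compares an instance that carries a per-instance __dict__ with one that declares __slots__. The class names PlainLeaf and SlottedLeaf and the helper approx_size are hypothetical stand-ins, not bzrlib code, and exact byte counts depend on the interpreter version, so none are claimed here.

```python
import sys


class PlainLeaf(object):
    """Hypothetical node without __slots__; each instance owns a __dict__."""

    def __init__(self, keys):
        self.keys = keys


class SlottedLeaf(object):
    """Hypothetical node with __slots__; attributes live in fixed slots."""

    __slots__ = ('keys',)

    def __init__(self, keys):
        self.keys = keys


def approx_size(obj):
    """Shallow size of obj plus its instance __dict__, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size


plain = PlainLeaf({})
slotted = SlottedLeaf({})
# Positive when the slotted instance is the smaller of the two.
saved_per_node = approx_size(plain) - approx_size(slotted)

# The commit message's estimate, restated: ~100 bytes * 40k leaf nodes.
estimated_total = 100 * 40 * 1000  # 4,000,000 bytes, i.e. about 4MB
```

The trade-off is that a slotted instance rejects any attribute not named in __slots__, which is acceptable for small fixed-shape node objects like these.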
------------------------------------------------------------
revno: 4274.1.1
revision-id: john at arbash-meinel.com-20090408202106-kbzs503kwtbyty1g
parent: pqm at pqm.ubuntu.com-20090408191523-xbmkv119txxrwxr7
parent: john at arbash-meinel.com-20090304040109-5y4s6vycvzcwuwfm
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.15-btree-intern
timestamp: Wed 2009-04-08 15:21:06 -0500
message:
Merge in the BTreeIndex intern() during parse changes.
modified:
bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 4075.3.4
revision-id: john at arbash-meinel.com-20090304040109-5y4s6vycvzcwuwfm
parent: john at arbash-meinel.com-20090304025552-rhoamen9pwjkvcua
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: intern_keys
timestamp: Tue 2009-03-03 22:01:09 -0600
message:
Remove a member we don't actually access.
modified:
bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
------------------------------------------------------------
revno: 4075.3.3
revision-id: john at arbash-meinel.com-20090304025552-rhoamen9pwjkvcua
parent: john at arbash-meinel.com-20090304025410-pzr7phpvarv25jea
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: intern_keys
timestamp: Tue 2009-03-03 20:55:52 -0600
message:
NEWS
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
------------------------------------------------------------
revno: 4075.3.2
revision-id: john at arbash-meinel.com-20090304025410-pzr7phpvarv25jea
parent: john at arbash-meinel.com-20090304025054-fze01hr79xjv21x5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: intern_keys
timestamp: Tue 2009-03-03 20:54:10 -0600
message:
Use intern() instead of _get_cached_ascii for getting unique revision_ids and file_ids.
intern() puts the strings in a dict (similar to what we do), but it does so
without increasing the refcount, so intern()ed strings do not have
unlimited lifetime.
This allows us to avoid duplicate copies in memory, without having an unlimited cache.
Further, we avoid keeping a Unicode representation of the string around,
as we no longer use unicode revision_ids or file_ids in the codebase.
modified:
bzrlib/xml8.py xml5.py-20050907032657-aac8f960815b66b1
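
The de-duplication that revno 4075.3.2 relies on can be sketched as follows. This is a minimal Python 3 example, where the Python 2 builtin intern() is spelled sys.intern(); the helper make_id and the sample id are made up for illustration. It only shows that equal strings collapse to one shared object; the lifetime behaviour described in the commit message is not demonstrated here.

```python
import sys


def make_id(*parts):
    """Build a string at runtime so that it is a fresh object each time."""
    return '-'.join(parts)


# Two equal revision ids, as if read from two different inventories.
first = make_id('joe@example.com', '20090304', 'abc')
second = make_id('joe@example.com', '20090304', 'abc')
equal_but_distinct = (first == second) and (first is not second)

# After interning, both names refer to the one shared copy.
first = sys.intern(first)
second = sys.intern(second)
shared = first is second
```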
------------------------------------------------------------
revno: 4075.3.1
revision-id: john at arbash-meinel.com-20090304025054-fze01hr79xjv21x5
parent: pqm at pqm.ubuntu.com-20090303085413-35seprvnu885xorz
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: intern_keys
timestamp: Tue 2009-03-03 20:50:54 -0600
message:
Use PyString_InternInPlace to intern() the various parts of keys that are processed.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
=== modified file 'bzrlib/_btree_serializer_c.pyx'
--- a/bzrlib/_btree_serializer_c.pyx 2009-03-23 14:59:43 +0000
+++ b/bzrlib/_btree_serializer_c.pyx 2009-04-08 20:21:06 +0000
@@ -32,15 +32,18 @@
char *PyString_AsString(object p) except NULL
object PyString_FromStringAndSize(char *, Py_ssize_t)
+ PyObject *PyString_FromStringAndSize_ptr "PyString_FromStringAndSize" (char *, Py_ssize_t)
int PyString_CheckExact(object s)
int PyString_CheckExact_ptr "PyString_CheckExact" (PyObject *)
Py_ssize_t PyString_Size(object p)
Py_ssize_t PyString_GET_SIZE_ptr "PyString_GET_SIZE" (PyObject *)
char * PyString_AS_STRING_ptr "PyString_AS_STRING" (PyObject *)
int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len)
+ void PyString_InternInPlace(PyObject **)
int PyTuple_CheckExact(object t)
Py_ssize_t PyTuple_GET_SIZE(object t)
PyObject *PyTuple_GET_ITEM_ptr_object "PyTuple_GET_ITEM" (object tpl, int index)
+ void Py_DECREF_ptr "Py_DECREF" (PyObject *)
cdef extern from "string.h":
void *memcpy(void *dest, void *src, size_t n)
@@ -74,6 +77,21 @@
return PyString_FromStringAndSize(s, size)
+cdef object safe_interned_string_from_size(char *s, Py_ssize_t size):
+    cdef PyObject *py_str
+    if size < 0:
+        raise AssertionError(
+            'tried to create a string with an invalid size: %d @0x%x'
+            % (size, <int>s))
+    py_str = PyString_FromStringAndSize_ptr(s, size)
+    PyString_InternInPlace(&py_str)
+    result = <object>py_str
+    # Casting a PyObject* to an <object> triggers an INCREF from Pyrex, so we
+    # DECREF it to avoid getting immortal strings
+    Py_DECREF_ptr(py_str)
+    return result
+
+
cdef class BTreeLeafParser:
"""Parse the leaf nodes of a BTree index.
@@ -142,8 +160,8 @@
# TODO: Consider using PyIntern_FromString, the only caveat is that
# it assumes a NULL-terminated string, so we have to check if
# temp_ptr[0] == c'\0' or some other char.
- key_element = safe_string_from_size(self._start,
- temp_ptr - self._start)
+ key_element = safe_interned_string_from_size(self._start,
+ temp_ptr - self._start)
# advance our pointer
self._start = temp_ptr + 1
PyList_Append(key_segments, key_element)
=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py 2009-04-04 02:50:01 +0000
+++ b/bzrlib/btree_index.py 2009-04-08 21:40:53 +0000
@@ -590,6 +590,8 @@
 class _LeafNode(object):
     """A leaf node for a serialised B+Tree index."""
+    __slots__ = ('keys',)
+
     def __init__(self, bytes, key_length, ref_list_length):
         """Parse bytes to create a leaf node object."""
         # splitlines mangles the \r delimiters.. don't use it.
@@ -600,6 +602,8 @@
 class _InternalNode(object):
     """An internal node for a serialised B+Tree index."""
+    __slots__ = ('keys', 'offset')
+
     def __init__(self, bytes):
         """Parse bytes to create an internal node object."""
         # splitlines mangles the \r delimiters.. don't use it.
@@ -611,7 +615,7 @@
for line in lines[2:]:
if line == '':
break
- nodes.append(tuple(line.split('\0')))
+ nodes.append(tuple(map(intern, line.split('\0'))))
return nodes
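
The last hunk interns every NUL-separated element of a key as it is parsed. A minimal standalone sketch of that loop, written for Python 3 (sys.intern in place of the Python 2 builtin intern used in the diff), might look like this. The function name parse_key_lines and the sample data are hypothetical, and unlike the original loop it iterates over all the lines it is given and does not skip any header lines.

```python
import sys


def parse_key_lines(lines):
    """Turn NUL-separated key lines into tuples of interned strings.

    Stops at the first empty line, as the loop in the diff does.
    """
    nodes = []
    for line in lines:
        if line == '':
            break
        nodes.append(tuple(map(sys.intern, line.split('\0'))))
    return nodes


# Made-up sample: two keys sharing the same first element.
sample = ['file-a\0rev-1', 'file-a\0rev-2', '', 'ignored']
keys = parse_key_lines(sample)
# The shared element is stored once and referenced by both tuples.
same_object = keys[0][0] is keys[1][0]
```

Because many keys in an index repeat the same file id or revision id, sharing one string object per distinct value is where the memory saving comes from.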