Rev 32: Merge in Robert's code. in http://bzr.arbash-meinel.com/plugins/index2
John Arbash Meinel
john at arbash-meinel.com
Thu Jul 3 18:12:18 BST 2008
At http://bzr.arbash-meinel.com/plugins/index2
------------------------------------------------------------
revno: 32
revision-id: john at arbash-meinel.com-20080703171207-a315ypnhybt1r6vk
parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
parent: robertc at robertcollins.net-20080703115714-czomk4m21tfu2ebe
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Thu 2008-07-03 12:12:07 -0500
message:
Merge in Robert's code.
added:
.bzrignore bzrignore-20080703034434-q63sohljnxg5loze-1
_parse_btree_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
_parse_btree_py.py _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
modified:
__init__.py __init__.py-20080624222253-p0x5f92uyh5hw734-5
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
repofmt.py repofmt.py-20080701113732-m1iu3n94ikbxdelb-1
setup.py setup.py-20080624222253-p0x5f92uyh5hw734-8
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.15
revision-id: robertc at robertcollins.net-20080703115714-czomk4m21tfu2ebe
parent: robertc at robertcollins.net-20080703082855-5smrfzrb8gamkfbh
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 21:57:14 +1000
message:
Add rich root and subtree flavours of format.
modified:
__init__.py __init__.py-20080624222253-p0x5f92uyh5hw734-5
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
repofmt.py repofmt.py-20080701113732-m1iu3n94ikbxdelb-1
------------------------------------------------------------
revno: 8.1.14
revision-id: robertc at robertcollins.net-20080703082855-5smrfzrb8gamkfbh
parent: robertc at robertcollins.net-20080703073137-c2201o4l8zhne7b2
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 18:28:55 +1000
message:
Add random_reload test that reopens the indices. Shows cold-start performance well.
modified:
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
------------------------------------------------------------
revno: 8.1.13
revision-id: robertc at robertcollins.net-20080703073137-c2201o4l8zhne7b2
parent: robertc at robertcollins.net-20080703061119-9pgnmpu1dj4qna2u
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 17:31:37 +1000
message:
Use a 1000 page cache for benchmark results.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.12
revision-id: robertc at robertcollins.net-20080703061119-9pgnmpu1dj4qna2u
parent: robertc at robertcollins.net-20080703034549-teew69qy3z91m1pt
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 16:11:19 +1000
message:
C parsing for the win.
modified:
_parse_btree_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
_parse_btree_py.py _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
setup.py setup.py-20080624222253-p0x5f92uyh5hw734-8
------------------------------------------------------------
revno: 8.1.11
revision-id: robertc at robertcollins.net-20080703034549-teew69qy3z91m1pt
parent: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 13:45:49 +1000
message:
Framework for C parser extension, parameterised callgraph output, shuffle lsprofiling.
added:
.bzrignore bzrignore-20080703034434-q63sohljnxg5loze-1
_parse_btree_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
_parse_btree_py.py _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
setup.py setup.py-20080624222253-p0x5f92uyh5hw734-8
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.10
revision-id: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
parent: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 08:18:44 +1000
message:
Merge Johns improvements.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.9
revision-id: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
parent: robertc at robertcollins.net-20080702125159-o72w19dm8g5yrtt0
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 08:02:40 +1000
message:
Various tweaks in bloom usage.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
-------------- next part --------------
=== added file '.bzrignore'
--- a/.bzrignore 1970-01-01 00:00:00 +0000
+++ b/.bzrignore 2008-07-03 17:12:07 +0000
@@ -0,0 +1,3 @@
+_parse_btree_c.c
+_parse_btree_c.pyd
+build
=== modified file '__init__.py'
--- a/__init__.py 2008-07-02 08:43:38 +0000
+++ b/__init__.py 2008-07-03 11:57:14 +0000
@@ -40,12 +40,50 @@
experimental=True,
)
+format_registry.register_metadir('btree-rich-root',
+ 'bzrlib.plugins.index2.repofmt.RepositoryFormatPackBTreeRichRoot',
+ help='Trivial rename of rich-root-pack with B+Tree based index. '
+ 'Please read '
+ 'http://doc.bazaar-vcs.org/latest/developers/development-repo.html '
+ 'before use.',
+ branch_format='bzrlib.branch.BzrBranchFormat6',
+ tree_format='bzrlib.workingtree.WorkingTreeFormat4',
+ hidden=False,
+ experimental=True,
+ )
+
+format_registry.register_metadir('btree-subtrees',
+ 'bzrlib.plugins.index2.repofmt.RepositoryFormatPackBTreeSubtrees',
+ help='Trivial rename of pack-0.92-subtrees with B+Tree based index. '
+ 'Please read '
+ 'http://doc.bazaar-vcs.org/latest/developers/development-repo.html '
+ 'before use.',
+ branch_format='bzrlib.branch.BzrBranchFormat6',
+ tree_format='bzrlib.workingtree.WorkingTreeFormat4',
+ hidden=False,
+ experimental=True,
+ )
+
from bzrlib.repository import format_registry as repo_registry
repo_registry.register_lazy(
'Bazaar development format - btree (needs bzr.dev from 1.6)\n',
'bzrlib.plugins.index2.repofmt',
'RepositoryFormatPackBTreePlain',
)
+from bzrlib.repository import format_registry as repo_registry
+repo_registry.register_lazy(
+ 'Bazaar development format - btree-rich-root (needs bzr.dev from 1.6)\n',
+ 'bzrlib.plugins.index2.repofmt',
+ 'RepositoryFormatPackBTreeRichRoot',
+ )
+
+from bzrlib.repository import format_registry as repo_registry
+repo_registry.register_lazy(
+ 'Bazaar development format - btree-subtrees (needs bzr.dev from 1.6)\n',
+ 'bzrlib.plugins.index2.repofmt',
+ 'RepositoryFormatPackBTreeSubtrees',
+ )
+
import indexbench
from bzrlib import commands
=== added file '_parse_btree_c.pyx'
--- a/_parse_btree_c.pyx 1970-01-01 00:00:00 +0000
+++ b/_parse_btree_c.pyx 2008-07-03 06:11:19 +0000
@@ -0,0 +1,209 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+"""Pyrex extensions to btree node parsing."""
+
+import sys
+
+cdef extern from "stdlib.h":
+ ctypedef unsigned size_t
+ long int strtol(char *nptr, char **endptr, int base)
+
+
+cdef extern from "Python.h":
+ int PyDict_CheckExact(object)
+ void *PyDict_GetItem_void "PyDict_GetItem" (object p, object key)
+ int PyDict_SetItem(object p, object key, object val) except -1
+
+ int PyList_Append(object lst, object item) except -1
+ object PyList_GET_ITEM(object lst, int index)
+ int PyList_CheckExact(object)
+
+ void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index)
+
+ char *PyString_AsString(object p)
+ object PyString_FromStringAndSize(char *, int)
+ object PyString_FromString(char *)
+ int PyString_Size(object p)
+
+ void Py_INCREF(object)
+
+
+cdef extern from "string.h":
+ void *memchr(void *s, int c, size_t n)
+ void *memrchr(void *s, int c, size_t n)
+ int strncmp(char *s1, char *s2, size_t n)
+
+
+cdef class BTreeLeafParser:
+
+ cdef object bytes
+ cdef int key_length
+ cdef int ref_list_length
+ cdef object keys
+
+ cdef char * cur_str
+ cdef char * end_str
+ # The current start point for parsing
+ cdef char * start
+
+ cdef int header_found
+
+ def __init__(self, bytes, key_length, ref_list_length):
+ self.bytes = bytes
+ self.key_length = key_length
+ self.ref_list_length = ref_list_length
+ self.keys = []
+ self.cur_str = NULL
+ self.end_str = NULL
+ self.header_found = 0
+
+ cdef extract_key(self, char * last):
+ """Extract a key.
+
+ :param last: points at the byte after the last byte permitted for the key.
+ """
+ cdef char *temp_ptr
+ cdef int loop_counter
+ # keys are tuples
+ loop_counter = 0
+ key_segments = []
+ while loop_counter < self.key_length:
+ loop_counter = loop_counter + 1
+ # grab a key segment
+ temp_ptr = <char*>memchr(self.start, c'\0', last - self.start)
+ if temp_ptr == NULL:
+ if loop_counter == self.key_length:
+ # capture to last
+ temp_ptr = last
+ else:
+ # Invalid line
+ failure_string = ("invalid key, wanted segment from " +
+ repr(PyString_FromStringAndSize(self.start, last-self.start)))
+ raise AssertionError(failure_string)
+ # capture the key string
+ key_element = PyString_FromStringAndSize(self.start, temp_ptr - self.start)
+ # advance our pointer
+ self.start = temp_ptr + 1
+ PyList_Append(key_segments, key_element)
+ return tuple(key_segments)
+
+ cdef int process_line(self) except -1:
+ """Process a line in the bytes."""
+ cdef char *last
+ cdef char *temp_ptr
+ cdef char *ref_ptr
+ cdef char *next_start
+ cdef int loop_counter
+
+ self.start = self.cur_str
+ # Find the next newline
+ last = <char*>memchr(self.start, c'\n', self.end_str - self.start)
+ if last == NULL:
+ # Process until the end of the file
+ last = self.end_str
+ self.cur_str = self.end_str
+ else:
+ # And the next string is right after it
+ self.cur_str = last + 1
+ # The last character is right before the '\n'
+ last = last
+
+ if last == self.start:
+ # parsed it all.
+ return 0
+ if last < self.start:
+ # Unexpected error condition - fail
+ return -1
+ if 0 == self.header_found:
+ if strncmp("type=leaf", self.start, last-self.start) == 0:
+ self.header_found = 1
+ return 0
+ else:
+ print "failed strncmp", repr(PyString_FromStringAndSize(self.start, last-self.start))
+ return -1
+
+ key = self.extract_key(last)
+ # find the value area
+ temp_ptr = <char*>memrchr(self.start, c'\0', last - self.start)
+ if temp_ptr == NULL:
+ # Invalid line
+ return -1
+ else:
+ # capture the value string
+ value = PyString_FromStringAndSize(temp_ptr + 1, last - temp_ptr - 1)
+ # shrink the references end point
+ last = temp_ptr
+ if self.ref_list_length:
+ ref_lists = []
+ loop_counter = 0
+ while loop_counter < self.ref_list_length:
+ ref_list = []
+ # extract a reference list
+ loop_counter = loop_counter + 1
+ if last < self.start:
+ return -1
+ # find the next reference list end point:
+ temp_ptr = <char*>memchr(self.start, c'\t', last - self.start)
+ if temp_ptr == NULL:
+ # Only valid for the last list
+ if loop_counter != self.ref_list_length:
+ # Invalid line
+ return -1
+ raise AssertionError("invalid key")
+ else:
+ # scan to the end of the ref list area
+ ref_ptr = last
+ next_start = last
+ else:
+ # scan to the end of this ref list
+ ref_ptr = temp_ptr
+ next_start = temp_ptr + 1
+ # Now, there may be multiple keys in the ref list.
+ while self.start < ref_ptr:
+ # loop finding keys and extracting them
+ temp_ptr = <char*>memchr(self.start, c'\r', ref_ptr - self.start)
+ if temp_ptr == NULL:
+ # key runs to the end
+ temp_ptr = ref_ptr
+ PyList_Append(ref_list, self.extract_key(temp_ptr))
+ PyList_Append(ref_lists, tuple(ref_list))
+ # prepare for the next reference list
+ self.start = next_start
+ ref_lists = tuple(ref_lists)
+ node_value = (value, ref_lists)
+ else:
+ if last != self.start:
+ # unexpected reference data present
+ return -1
+ node_value = (value, ())
+ PyList_Append(self.keys, (key, node_value))
+ return 0
+
+ def parse(self):
+ cdef int byte_count
+ byte_count = PyString_Size(self.bytes)
+ self.cur_str = PyString_AsString(self.bytes)
+ # This points to the last character in the string
+ self.end_str = self.cur_str + byte_count
+ while self.cur_str < self.end_str:
+ self.process_line()
+ return self.keys
+
+
+def _parse_leaf_lines(bytes, key_length, ref_list_length):
+ parser = BTreeLeafParser(bytes, key_length, ref_list_length)
+ return parser.parse()
=== added file '_parse_btree_py.py'
--- a/_parse_btree_py.py 1970-01-01 00:00:00 +0000
+++ b/_parse_btree_py.py 2008-07-03 06:11:19 +0000
@@ -0,0 +1,42 @@
+# index2, a bzr plugin providing experimental index types.
+# Copyright (C) 2008 Canonical Limited.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as published
+# by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+
+"""B+Tree index parsing."""
+
+def _parse_leaf_lines(bytes, key_length, ref_list_length):
+ lines = bytes.split('\n')
+ nodes = []
+ for line in lines[1:]:
+ if line == '':
+ return nodes
+ elements = line.split('\0', key_length)
+ # keys are tuples
+ key = tuple(elements[:key_length])
+ line = elements[-1]
+ references, value = line.rsplit('\0', 1)
+ if ref_list_length:
+ ref_lists = []
+ for ref_string in references.split('\t'):
+ ref_lists.append(tuple([
+ tuple(ref.split('\0')) for ref in ref_string.split('\r') if ref
+ ]))
+ ref_lists = tuple(ref_lists)
+ node_value = (value, ref_lists)
+ else:
+ node_value = (value, ())
+ nodes.append((key, node_value))
+ return nodes
=== modified file 'btree_index.py'
--- a/btree_index.py 2008-07-02 21:04:19 +0000
+++ b/btree_index.py 2008-07-03 07:31:37 +0000
@@ -41,6 +41,9 @@
_PAGE_SIZE = 4096
+# 4K per page: 4MB - 1000 entries
+_NODE_CACHE_SIZE = 1000
+
bloom_hits = 0
leaf_value_hits = [0, 0]
internal_node_hits = [0, 0]
@@ -192,6 +195,11 @@
# flesh out any internal nodes that are needed to
# preserve the high of the tree
if internal_row.writer is None:
+ if 'index' in debug.debug_flags:
+ trace.mutter('Creating new writer and bloom at'
+ ' %s, %s, old bloom: %r',
+ pos, internal_row.nodes,
+ internal_row.bloom)
length = _PAGE_SIZE
if internal_row.nodes == 0:
length -= 100 # padded
@@ -201,13 +209,7 @@
internal_row.writer.write(_INTERNAL_FLAG)
internal_row.writer.write(_INTERNAL_OFFSET +
str(self.rows[pos + 1].nodes) + "\n")
- # We reset the bloom at this point, because we are
- # starting a new set of child nodes
- if 'index' in debug.debug_flags:
- trace.mutter('Creating new writer and bloom at'
- ' %s, %s, old bloom: %r',
- pos, internal_row.nodes,
- internal_row.bloom)
+ # new bloom for the new node
internal_row.bloom = BloomSHA1(256 * 8)
# add a new leaf
length = _PAGE_SIZE
@@ -303,31 +305,8 @@
def __init__(self, bytes, key_length, ref_list_length):
"""Parse bytes to create a leaf node object."""
# splitlines mangles the \r delimiters.. don't use it.
- self.keys = dict(self._parse_lines(bytes.split('\n'), key_length,
- ref_list_length))
-
- def _parse_lines(self, lines, key_length, ref_list_length):
- nodes = []
- for line in lines[1:]:
- if line == '':
- return nodes
- elements = line.split('\0', key_length)
- # keys are tuples
- key = tuple(elements[:key_length])
- line = elements[-1]
- references, value = line.rsplit('\0', 1)
- if ref_list_length:
- ref_lists = []
- for ref_string in references.split('\t'):
- ref_lists.append(tuple([
- tuple(ref.split('\0')) for ref in ref_string.split('\r') if ref
- ]))
- ref_lists = tuple(ref_lists)
- node_value = (value, ref_lists)
- else:
- node_value = (value, ())
- nodes.append((key, node_value))
- return nodes
+ self.keys = dict(_parse_btree._parse_leaf_lines(bytes,
+ key_length, ref_list_length))
class _InternalNode(object):
@@ -389,7 +368,7 @@
self._root_node = None
# Default max size is 100,000 leave values
self._leaf_value_cache = None # lru_cache.LRUCache(100*1000)
- self._leaf_node_cache = lru_cache.LRUCache()
+ self._leaf_node_cache = lru_cache.LRUCache(_NODE_CACHE_SIZE)
self._internal_node_cache = lru_cache.LRUCache()
self._key_count = None
self._use_blooms = BTreeGraphIndex._default_use_blooms
@@ -609,6 +588,8 @@
return
# Find all the keys that are already in our value cache
global leaf_value_hits, miss_attempts
+ # in a two-row index, row 0 has the filter.
+ bloom_row = len(self._row_lengths) - 2
needed_keys = []
if self._leaf_value_cache is None:
needed_keys = keys
@@ -636,7 +617,7 @@
if len(self._row_lengths) > 1:
# no internal nodes, and thus no blooms in a length-1 b+tree.
bloom_raw_offsets = NodeBloom.get_raw_offsets(needed_keys)
- for row_length in self._row_lengths[:-1]:
+ for row_pos, row_length in enumerate(self._row_lengths[:-1]):
node_indexes = [idx for idx, s_keys in nodes_and_keys]
nodes = self._get_internal_nodes(node_indexes)
@@ -645,7 +626,8 @@
next_nodes_and_keys = []
for node_index, sub_keys in nodes_and_keys:
node = nodes[node_index]
- if self._use_blooms and node.bloom is not None:
+ if (self._use_blooms and row_pos == bloom_row and
+ node.bloom is not None):
# If using blooms, filter out any nodes that don't hit.
remaining_sub_keys = [sub_key for sub_key in sub_keys
if node.bloom.contains_raw_offset(bloom_raw_offsets[sub_key])]
@@ -810,3 +792,9 @@
node_count = sum(self._row_lengths) - 1
for node in self._read_nodes(range(1, node_count + 1)):
pass
+
+
+try:
+ from bzrlib.plugins.index2 import _parse_btree_c as _parse_btree
+except ImportError:
+ from bzrlib.plugins.index2 import _parse_btree_py as _parse_btree
=== modified file 'indexbench.py'
--- a/indexbench.py 2008-07-02 20:17:01 +0000
+++ b/indexbench.py 2008-07-03 11:57:14 +0000
@@ -16,6 +16,7 @@
from bzrlib.revision import NULL_REVISION
from bzrlib.lsprof import profile
+use_calltree = True
key_count = 0
@@ -44,7 +45,11 @@
def profile_fixture(class_name, fixture_name, fixture, *args, **kwargs):
"""Profile a fixture."""
value, stats = profile(fixture, *args, **kwargs)
- fname = class_name + '.' + fixture_name + '.txt'
+ if use_calltree:
+ suffix = '.callgrind'
+ else:
+ suffix = '.txt'
+ fname = class_name + '.' + fixture_name + suffix
stats.save(fname)
return value
@@ -112,17 +117,17 @@
finish = time.time()
print "%s: iter_all_entries in %0.3f" % (label, finish - now)
# random shuffle, all keys (name_keys comes preshuffled)
+# This tests random access across the index - which on GraphIndex will exhaust
+# VM, on BTreeIndex can cache-thrash.
if 'shuffle' in fixtures:
- drop_cache()
- reset_hit_counts()
- now = time.time()
- for name, index in iter_indices(names, target, factory):
- order = name_keys[name]
- shuffle(order)
- for key in order:
- index.iter_entries([key]).next()
- finish = time.time()
- print "%s: iter_random_one in %0.3f,%s" % (label, finish - now, hits())
+ run(label, 'iter_random_one', iter_random_one, label, 'iter_random_one',
+ drop_cache, names, target, factory, name_keys, False)
+# random shuffle, all keys (name_keys comes preshuffled), reopening the index
+# each time. This tests the minimum time to get to an average key from 'start'.
+# On GraphIndex this should be much slower than on BTree index.
+ if 'random_reload' in fixtures:
+ run(label, 'iter_random_reload', iter_random_one, 'iter_random_reload',
+ label, drop_cache, names, target, factory, name_keys, True)
# text extraction simulation (follow a compression graph) for text_keys
if 'text' in fixtures:
text_names = [name for name in names if name.endswith('.tix')]
@@ -169,6 +174,23 @@
print "%s: -------Done---------" % (label,)
+def iter_random_one(label, fixture_label, drop_cache, names, target, factory,
+ name_keys, reload_index):
+ drop_cache()
+ reset_hit_counts()
+ now = time.time()
+ for name, index in iter_indices(names, target, factory):
+ order = name_keys[name]
+ shuffle(order)
+ for key in order:
+ index.iter_entries([key]).next()
+ if reload_index:
+ index = factory(index._transport, index._name, index._size)
+ drop_cache()
+ finish = time.time()
+ print "%s: %s in %0.3f,%s" % (label, fixture_label, finish - now, hits())
+
+
def revision_search(label, drop_cache, names, target, factory, tip_revision_id):
rev_names = [name for name in names if name.endswith('.rix')]
# reopen indices
@@ -212,18 +234,24 @@
Option('btree', help='bench the BTreeGraphIndex class'),
Option('drop-cache', help='drop caches between fixtures'),
ListOption('fixture', type=str,
- help='fixtures to test: one of all, shuffle, text, revision, miss'),
+ help='fixtures to test: one of all, shuffle, text, revision, miss, '
+ 'random_reload.'),
# lspro because --lsprof is a global option, and they interfere with each other.
Option('lspro', help='generate class.fixture.callgrind lsprof files'),
+ Option('calltree', help='generate KCachegrind calltrees when profiling.'),
+ Option('test-area',type=str,
+ help='use a specific transport prefix for testing.'),
]
takes_args = ['sample_branch']
def run(self, sample_branch, graph=True, btree=True, drop_cache=False,
- fixture=None, lspro=False):
+ fixture=None, lspro=False, calltree=True, test_area=None):
if not fixture:
- fixture = ['all', 'shuffle', 'text', 'revision', 'miss']
+ fixture = ['all', 'shuffle', 'text', 'revision', 'miss', 'random_reload']
+ global use_calltree
+ use_calltree = calltree
from bzrlib.branch import Branch
source_branch = Branch.open(sample_branch)
source = source_branch.repository._transport
@@ -252,19 +280,32 @@
global key_count
key_count = sum(map(len, name_keys.itervalues()))
+ if test_area is None:
+ test_transport = None
+ else:
+ test_transport = get_transport(test_area)
+
if btree:
self.test_class(names, source, BTreeGraphIndex, BTreeBuilder,
'BTreeIndex', name_keys, text_keys, drop_cache, fixture, lspro,
- tip_revision_id, ancestry)
+ tip_revision_id, ancestry, test_transport)
if graph:
self.test_class(names, source, GraphIndex, GraphIndexBuilder,
'GraphIndex', name_keys, text_keys, drop_cache, fixture, lspro,
- tip_revision_id, ancestry)
+ tip_revision_id, ancestry, test_transport)
def test_class(self, names, source, factory, builder, label, name_keys,
- text_keys, drop_cache, fixtures, lsprof, tip_revision_id, ancestry):
- workingdir = tempfile.mkdtemp()
- t = get_transport(workingdir)
+ text_keys, drop_cache, fixtures, lsprof, tip_revision_id, ancestry,
+ test_transport):
+ if test_transport is None:
+ workingdir = tempfile.mkdtemp()
+ t = get_transport(workingdir)
+ else:
+ #
+ test_transport.mkdir('indexbench-work')
+ t = test_transport.clone('indexbench-work')
+ if t.list_dir('.') != []:
+ raise AssertionError('dirty workspace %s' % t)
try:
time_index(names, source, factory, builder, t, label, name_keys,
text_keys, drop_cache, fixtures, lsprof, tip_revision_id,
=== modified file 'repofmt.py'
--- a/repofmt.py 2008-07-01 11:37:43 +0000
+++ b/repofmt.py 2008-07-03 11:57:14 +0000
@@ -37,6 +37,8 @@
KnitPackRepository,
RepositoryPackCollection,
RepositoryFormatPackDevelopment0,
+ RepositoryFormatPackDevelopment0Subtree,
+ RepositoryFormatKnitPack4,
Packer,
ReconcilePacker,
OptimisingPacker,
@@ -392,3 +394,35 @@
"pack-0.92\n")
+class RepositoryFormatPackBTreeRichRoot(RepositoryFormatKnitPack4):
+ """A B+Tree index using pack repository."""
+
+ repository_class = BTreePackRepository
+
+ def get_format_string(self):
+ """See RepositoryFormat.get_format_string()."""
+ return ("Bazaar development format - btree-rich-root "
+ "(needs bzr.dev from 1.6)\n")
+
+ def get_format_description(self):
+ """See RepositoryFormat.get_format_description()."""
+ return ("Development repository format - btree, interoperates with "
+ "rich-root-pack\n")
+
+
+class RepositoryFormatPackBTreeSubtrees(RepositoryFormatPackDevelopment0Subtree):
+ """A B+Tree index using pack repository."""
+
+ repository_class = BTreePackRepository
+
+ def get_format_string(self):
+ """See RepositoryFormat.get_format_string()."""
+ return ("Bazaar development format - btree-subtrees "
+ "(needs bzr.dev from 1.6)\n")
+
+ def get_format_description(self):
+ """See RepositoryFormat.get_format_description()."""
+ return ("Development repository format - btree, interoperates with "
+ "pack-0.92-subtrees\n")
+
+
=== modified file 'setup.py'
--- a/setup.py 2008-06-24 22:37:56 +0000
+++ b/setup.py 2008-07-03 06:11:19 +0000
@@ -1,11 +1,83 @@
-#!/usr/bin/env python2.4
+#!/usr/bin/env python
from distutils.core import setup
bzr_plugin_name = 'index2'
bzr_plugin_version = (1, 6, 0, 'dev', 0)
-if __name__ == 'main':
+
+from distutils import log
+from distutils.errors import CCompilerError, DistutilsPlatformError
+from distutils.extension import Extension
+ext_modules = []
+try:
+ from Pyrex.Distutils import build_ext
+except ImportError:
+ have_pyrex = False
+ # try to build the extension from the prior generated source.
+ print
+ print ("The python package 'Pyrex' is not available."
+ " If the .c files are available,")
+ print ("they will be built,"
+ " but modifying the .pyx files will not rebuild them.")
+ print
+ from distutils.command.build_ext import build_ext
+else:
+ have_pyrex = True
+
+
+class build_ext_if_possible(build_ext):
+
+ def run(self):
+ try:
+ build_ext.run(self)
+ except DistutilsPlatformError, e:
+ log.warn(str(e))
+ log.warn('Extensions cannot be built, '
+ 'will use the Python versions instead')
+
+ def build_extension(self, ext):
+ try:
+ build_ext.build_extension(self, ext)
+ except CCompilerError:
+ log.warn('Building of "%s" extension failed, '
+ 'will use the Python version instead' % (ext.name,))
+
+
+# Override the build_ext if we have Pyrex available
+unavailable_files = []
+
+
+def add_pyrex_extension(module_name, **kwargs):
+ """Add a pyrex module to build.
+
+ This will use Pyrex to auto-generate the .c file if it is available.
+ Otherwise it will fall back on the .c file. If the .c file is not
+ available, it will warn, and not add anything.
+
+ You can pass any extra options to Extension through kwargs. One example is
+ 'libraries = []'.
+
+ :param module_name: The python path to the module. This will be used to
+ determine the .pyx and .c files to use.
+ """
+ path = module_name.replace('.', '/')
+ pyrex_name = path + '.pyx'
+ c_name = path + '.c'
+ # Manually honour package_dir :(
+ module_name = 'bzrlib.plugins.index2.' + module_name
+ if have_pyrex:
+ ext_modules.append(Extension(module_name, [pyrex_name]))
+ else:
+ if not os.path.isfile(c_name):
+ unavailable_files.append(c_name)
+ else:
+ ext_modules.append(Extension(module_name, [c_name]))
+
+add_pyrex_extension('_parse_btree_c')
+
+
+if __name__ == '__main__':
setup(name="bzr index2",
version="1.6.0dev0",
description="bzr btree indices.",
@@ -16,4 +88,7 @@
packages=['bzrlib.plugins.index2',
'bzrlib.plugins.index2.tests',
],
- package_dir={'bzrlib.plugins.index2': '.'})
+ package_dir={'bzrlib.plugins.index2': '.'},
+ cmdclass={'build_ext': build_ext_if_possible},
+ ext_modules=ext_modules,
+ )
=== modified file 'tests/test_btree_index.py'
--- a/tests/test_btree_index.py 2008-07-02 21:00:46 +0000
+++ b/tests/test_btree_index.py 2008-07-03 03:45:49 +0000
@@ -25,10 +25,32 @@
from bzrlib.plugins import index2
from bzrlib.plugins.index2 import errors, btree_index
from bzrlib.plugins.pybloom.pybloom import BloomSHA1
-from bzrlib.tests import TestCaseWithTransport
+from bzrlib.tests import (
+ TestCaseWithTransport,
+ TestScenarioApplier,
+ adapt_tests,
+ condition_isinstance,
+ split_suite_by_condition,
+ )
from bzrlib.transport import get_transport
+def load_tests(standard_tests, module, loader):
+ # parameterise the TestBTreeNodes tests
+ node_tests, others = split_suite_by_condition(standard_tests,
+ condition_isinstance(TestBTreeNodes))
+ applier = TestScenarioApplier()
+ import bzrlib.plugins.index2._parse_btree_py as py_module
+ applier.scenarios = [('python', {'parse_btree': py_module})]
+ try:
+ import bzrlib.plugins.index2._parse_btree_c as c_module
+ applier.scenarios.append(('C', {'parse_btree': c_module}))
+ except ImportError:
+ pass
+ adapt_tests(node_tests, applier, others)
+ return others
+
+
class BTreeTestCase(TestCaseWithTransport):
# test names here are suffixed by the key length and reference list count
# that they test.
@@ -529,6 +551,15 @@
class TestBTreeNodes(BTreeTestCase):
+ def restore_parser(self):
+ btree_index._parse_btree = self.saved_parser
+
+ def setUp(self):
+ BTreeTestCase.setUp(self)
+ self.saved_parser = btree_index._parse_btree
+ self.addCleanup(self.restore_parser)
+ btree_index._parse_btree = self.parse_btree
+
def test_LeafNode_1_0(self):
node_bytes = ("type=leaf\n"
"0000000000000000000000000000000000000000\x00\x00value:0\n"
More information about the bazaar-commits
mailing list