Rev 32: Merge in Robert's code. in http://bzr.arbash-meinel.com/plugins/index2

Thu Jul 3 18:12:18 BST 2008

At http://bzr.arbash-meinel.com/plugins/index2

------------------------------------------------------------
revno: 32
revision-id: john at arbash-meinel.com-20080703171207-a315ypnhybt1r6vk
parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
parent: robertc at robertcollins.net-20080703115714-czomk4m21tfu2ebe
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Thu 2008-07-03 12:12:07 -0500
message:
  Merge in Robert's code.
added:
  .bzrignore                     bzrignore-20080703034434-q63sohljnxg5loze-1
  _parse_btree_c.pyx             _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
  _parse_btree_py.py             _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
modified:
  __init__.py                    __init__.py-20080624222253-p0x5f92uyh5hw734-5
  btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
  indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
  repofmt.py                     repofmt.py-20080701113732-m1iu3n94ikbxdelb-1
  setup.py                       setup.py-20080624222253-p0x5f92uyh5hw734-8
  tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.15
    revision-id: robertc at robertcollins.net-20080703115714-czomk4m21tfu2ebe
    parent: robertc at robertcollins.net-20080703082855-5smrfzrb8gamkfbh
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 21:57:14 +1000
    message:
      Add rich root and subtree flavours of format.
    modified:
      __init__.py                    __init__.py-20080624222253-p0x5f92uyh5hw734-5
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
      repofmt.py                     repofmt.py-20080701113732-m1iu3n94ikbxdelb-1
    ------------------------------------------------------------
    revno: 8.1.14
    revision-id: robertc at robertcollins.net-20080703082855-5smrfzrb8gamkfbh
    parent: robertc at robertcollins.net-20080703073137-c2201o4l8zhne7b2
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 18:28:55 +1000
    message:
      Add random_reload test that reopens the indices. Shows cold-start performance well.
    modified:
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    ------------------------------------------------------------
    revno: 8.1.13
    revision-id: robertc at robertcollins.net-20080703073137-c2201o4l8zhne7b2
    parent: robertc at robertcollins.net-20080703061119-9pgnmpu1dj4qna2u
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 17:31:37 +1000
    message:
      Use a 1000 page cache for benchmark results.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.12
    revision-id: robertc at robertcollins.net-20080703061119-9pgnmpu1dj4qna2u
    parent: robertc at robertcollins.net-20080703034549-teew69qy3z91m1pt
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 16:11:19 +1000
    message:
      C parsing for the win.
    modified:
      _parse_btree_c.pyx             _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
      _parse_btree_py.py             _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      setup.py                       setup.py-20080624222253-p0x5f92uyh5hw734-8
    ------------------------------------------------------------
    revno: 8.1.11
    revision-id: robertc at robertcollins.net-20080703034549-teew69qy3z91m1pt
    parent: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 13:45:49 +1000
    message:
      Framework for C parser extension, parameterised callgraph output, shuffle lsprofiling.
    added:
      .bzrignore                     bzrignore-20080703034434-q63sohljnxg5loze-1
      _parse_btree_c.pyx             _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
      _parse_btree_py.py             _parse_btree_py.py-20080703034413-3q25bklkenti3p8p-3
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
      setup.py                       setup.py-20080624222253-p0x5f92uyh5hw734-8
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.10
    revision-id: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
    parent: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
    parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 08:18:44 +1000
    message:
      Merge Johns improvements.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      chunk_writer.py                chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.9
    revision-id: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
    parent: robertc at robertcollins.net-20080702125159-o72w19dm8g5yrtt0
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: trunk
    timestamp: Thu 2008-07-03 08:02:40 +1000
    message:
      Various tweaks in bloom usage.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
-------------- next part --------------
=== added file '.bzrignore'

--- a/.bzrignore	1970-01-01 00:00:00 +0000
+++ b/.bzrignore	2008-07-03 17:12:07 +0000
@@ -0,0 +1,3 @@
+_parse_btree_c.c
+_parse_btree_c.pyd
+build

=== modified file '__init__.py'
--- a/__init__.py	2008-07-02 08:43:38 +0000
+++ b/__init__.py	2008-07-03 11:57:14 +0000
@@ -40,12 +40,50 @@
     experimental=True,
     )
 
+format_registry.register_metadir('btree-rich-root',
+    'bzrlib.plugins.index2.repofmt.RepositoryFormatPackBTreeRichRoot',
+    help='Trivial rename of rich-root-pack with B+Tree based index. '
+        'Please read '
+        'http://doc.bazaar-vcs.org/latest/developers/development-repo.html '
+        'before use.',
+    branch_format='bzrlib.branch.BzrBranchFormat6',
+    tree_format='bzrlib.workingtree.WorkingTreeFormat4',
+    hidden=False,
+    experimental=True,
+    )
+
+format_registry.register_metadir('btree-subtrees',
+    'bzrlib.plugins.index2.repofmt.RepositoryFormatPackBTreeSubtrees',
+    help='Trivial rename of pack-0.92-subtrees with B+Tree based index. '
+        'Please read '
+        'http://doc.bazaar-vcs.org/latest/developers/development-repo.html '
+        'before use.',
+    branch_format='bzrlib.branch.BzrBranchFormat6',
+    tree_format='bzrlib.workingtree.WorkingTreeFormat4',
+    hidden=False,
+    experimental=True,
+    )
+
 from bzrlib.repository import format_registry as repo_registry
 repo_registry.register_lazy(
     'Bazaar development format - btree (needs bzr.dev from 1.6)\n',
     'bzrlib.plugins.index2.repofmt',
     'RepositoryFormatPackBTreePlain',
     )
+from bzrlib.repository import format_registry as repo_registry
+repo_registry.register_lazy(
+    'Bazaar development format - btree-rich-root (needs bzr.dev from 1.6)\n',
+    'bzrlib.plugins.index2.repofmt',
+    'RepositoryFormatPackBTreeRichRoot',
+    )
+
+from bzrlib.repository import format_registry as repo_registry
+repo_registry.register_lazy(
+    'Bazaar development format - btree-subtrees (needs bzr.dev from 1.6)\n',
+    'bzrlib.plugins.index2.repofmt',
+    'RepositoryFormatPackBTreeSubtrees',
+    )
+
 
 import indexbench
 from bzrlib import commands

=== added file '_parse_btree_c.pyx'
--- a/_parse_btree_c.pyx	1970-01-01 00:00:00 +0000
+++ b/_parse_btree_c.pyx	2008-07-03 06:11:19 +0000
@@ -0,0 +1,209 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""Pyrex extensions to btree node parsing."""
+
+import sys
+
+cdef extern from "stdlib.h":
+    ctypedef unsigned size_t
+    long int strtol(char *nptr, char **endptr, int base)
+
+
+cdef extern from "Python.h":
+    int PyDict_CheckExact(object)
+    void *PyDict_GetItem_void "PyDict_GetItem" (object p, object key)
+    int PyDict_SetItem(object p, object key, object val) except -1
+
+    int PyList_Append(object lst, object item) except -1
+    object PyList_GET_ITEM(object lst, int index)
+    int PyList_CheckExact(object)
+
+    void *PyTuple_GetItem_void_void "PyTuple_GET_ITEM" (void* tpl, int index)
+
+    char *PyString_AsString(object p)
+    object PyString_FromStringAndSize(char *, int)
+    object PyString_FromString(char *)
+    int PyString_Size(object p)
+
+    void Py_INCREF(object)
+
+
+cdef extern from "string.h":
+    void *memchr(void *s, int c, size_t n)
+    void *memrchr(void *s, int c, size_t n)
+    int strncmp(char *s1, char *s2, size_t n)
+
+
+cdef class BTreeLeafParser:
+
+    cdef object bytes
+    cdef int key_length
+    cdef int ref_list_length
+    cdef object keys
+
+    cdef char * cur_str
+    cdef char * end_str
+    # The current start point for parsing
+    cdef char * start
+
+    cdef int header_found
+
+    def __init__(self, bytes, key_length, ref_list_length):
+        self.bytes = bytes
+        self.key_length = key_length
+        self.ref_list_length = ref_list_length
+        self.keys = []
+        self.cur_str = NULL
+        self.end_str = NULL
+        self.header_found = 0
+
+    cdef extract_key(self, char * last):
+        """Extract a key.
+
+        :param last: points at the byte after the last byte permitted for the key.
+        """
+        cdef char *temp_ptr
+        cdef int loop_counter
+        # keys are tuples
+        loop_counter = 0
+        key_segments = []
+        while loop_counter < self.key_length:
+            loop_counter = loop_counter + 1
+            # grab a key segment
+            temp_ptr = <char*>memchr(self.start, c'\0', last - self.start)
+            if temp_ptr == NULL:
+                if loop_counter == self.key_length:
+                    # capture to last
+                    temp_ptr = last
+                else:
+                    # Invalid line
+                    failure_string = ("invalid key, wanted segment from " +
+                        repr(PyString_FromStringAndSize(self.start, last-self.start)))
+                    raise AssertionError(failure_string)
+            # capture the key string
+            key_element = PyString_FromStringAndSize(self.start, temp_ptr - self.start)
+            # advance our pointer
+            self.start = temp_ptr + 1
+            PyList_Append(key_segments, key_element)
+        return tuple(key_segments)
+
+    cdef int process_line(self) except -1:
+        """Process a line in the bytes."""
+        cdef char *last
+        cdef char *temp_ptr
+        cdef char *ref_ptr
+        cdef char *next_start
+        cdef int loop_counter
+
+        self.start = self.cur_str
+        # Find the next newline
+        last = <char*>memchr(self.start, c'\n', self.end_str - self.start)
+        if last == NULL:
+            # Process until the end of the file
+            last = self.end_str
+            self.cur_str = self.end_str
+        else:
+            # And the next string is right after it
+            self.cur_str = last + 1
+            # The last character is right before the '\n'
+            last = last
+
+        if last == self.start:
+            # parsed it all.
+            return 0
+        if last < self.start:
+            # Unexpected error condition - fail
+            return -1
+        if 0 == self.header_found:
+            if strncmp("type=leaf", self.start, last-self.start) == 0:
+                self.header_found = 1
+                return 0
+            else:
+                print "failed strncmp", repr(PyString_FromStringAndSize(self.start, last-self.start))
+                return -1
+
+        key = self.extract_key(last)
+        # find the value area
+        temp_ptr = <char*>memrchr(self.start, c'\0', last - self.start)
+        if temp_ptr == NULL:
+            # Invalid line
+            return -1
+        else:
+            # capture the value string
+            value = PyString_FromStringAndSize(temp_ptr + 1, last - temp_ptr - 1)
+            # shrink the references end point
+            last = temp_ptr
+        if self.ref_list_length:
+            ref_lists = []
+            loop_counter = 0
+            while loop_counter < self.ref_list_length:
+                ref_list = []
+                # extract a reference list
+                loop_counter = loop_counter + 1
+                if last < self.start:
+                    return -1
+                # find the next reference list end point:
+                temp_ptr = <char*>memchr(self.start, c'\t', last - self.start)
+                if temp_ptr == NULL:
+                    # Only valid for the last list
+                    if loop_counter != self.ref_list_length:
+                        # Invalid line
+                        return -1
+                        raise AssertionError("invalid key")
+                    else:
+                        # scan to the end of the ref list area
+                        ref_ptr = last
+                        next_start = last
+                else:
+                    # scan to the end of this ref list
+                    ref_ptr = temp_ptr
+                    next_start = temp_ptr + 1
+                # Now, there may be multiple keys in the ref list.
+                while self.start < ref_ptr:
+                    # loop finding keys and extracting them
+                    temp_ptr = <char*>memchr(self.start, c'\r', ref_ptr - self.start)
+                    if temp_ptr == NULL:
+                        # key runs to the end
+                        temp_ptr = ref_ptr
+                    PyList_Append(ref_list, self.extract_key(temp_ptr))
+                PyList_Append(ref_lists, tuple(ref_list))
+                # prepare for the next reference list
+                self.start = next_start
+            ref_lists = tuple(ref_lists)
+            node_value = (value, ref_lists)
+        else:
+            if last != self.start:
+                # unexpected reference data present
+                return -1
+            node_value = (value, ())
+        PyList_Append(self.keys, (key, node_value))
+        return 0
+
+    def parse(self):
+        cdef int byte_count
+        byte_count = PyString_Size(self.bytes)
+        self.cur_str = PyString_AsString(self.bytes)
+        # This points to the last character in the string
+        self.end_str = self.cur_str + byte_count
+        while self.cur_str < self.end_str:
+            self.process_line()
+        return self.keys
+
+
+def _parse_leaf_lines(bytes, key_length, ref_list_length):
+    parser = BTreeLeafParser(bytes, key_length, ref_list_length)
+    return parser.parse()

=== added file '_parse_btree_py.py'
--- a/_parse_btree_py.py	1970-01-01 00:00:00 +0000
+++ b/_parse_btree_py.py	2008-07-03 06:11:19 +0000
@@ -0,0 +1,42 @@
+# index2, a bzr plugin providing experimental index types.
+# Copyright (C) 2008 Canonical Limited.
+# 
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as published
+# by the Free Software Foundation.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+# 
+
+"""B+Tree index parsing."""
+
+def _parse_leaf_lines(bytes, key_length, ref_list_length):
+    lines = bytes.split('\n')
+    nodes = []
+    for line in lines[1:]:
+        if line == '':
+            return nodes
+        elements = line.split('\0', key_length)
+        # keys are tuples
+        key = tuple(elements[:key_length])
+        line = elements[-1]
+        references, value = line.rsplit('\0', 1)
+        if ref_list_length:
+            ref_lists = []
+            for ref_string in references.split('\t'):
+                ref_lists.append(tuple([
+                    tuple(ref.split('\0')) for ref in ref_string.split('\r') if ref
+                    ]))
+            ref_lists = tuple(ref_lists)
+            node_value = (value, ref_lists)
+        else:
+            node_value = (value, ())
+        nodes.append((key, node_value))
+    return nodes

=== modified file 'btree_index.py'
--- a/btree_index.py	2008-07-02 21:04:19 +0000
+++ b/btree_index.py	2008-07-03 07:31:37 +0000
@@ -41,6 +41,9 @@
 
 _PAGE_SIZE = 4096
 
+# 4K per page: 4MB - 1000 entries
+_NODE_CACHE_SIZE = 1000
+
 bloom_hits = 0
 leaf_value_hits = [0, 0]
 internal_node_hits = [0, 0]
@@ -192,6 +195,11 @@
                         # flesh out any internal nodes that are needed to
                         # preserve the high of the tree
                         if internal_row.writer is None:
+                            if 'index' in debug.debug_flags:
+                                trace.mutter('Creating new writer and bloom at'
+                                             ' %s, %s, old bloom: %r',
+                                             pos, internal_row.nodes,
+                                             internal_row.bloom)
                             length = _PAGE_SIZE
                             if internal_row.nodes == 0:
                                 length -= 100 # padded
@@ -201,13 +209,7 @@
                             internal_row.writer.write(_INTERNAL_FLAG)
                             internal_row.writer.write(_INTERNAL_OFFSET +
                                 str(self.rows[pos + 1].nodes) + "\n")
-                            # We reset the bloom at this point, because we are
-                            # starting a new set of child nodes
-                            if 'index' in debug.debug_flags:
-                                trace.mutter('Creating new writer and bloom at'
-                                             ' %s, %s, old bloom: %r',
-                                             pos, internal_row.nodes,
-                                             internal_row.bloom)
+                            # new bloom for the new node
                             internal_row.bloom = BloomSHA1(256 * 8)
                     # add a new leaf
                     length = _PAGE_SIZE
@@ -303,31 +305,8 @@
     def __init__(self, bytes, key_length, ref_list_length):
         """Parse bytes to create a leaf node object."""
         # splitlines mangles the \r delimiters.. don't use it.
-        self.keys = dict(self._parse_lines(bytes.split('\n'), key_length,
-            ref_list_length))
-
-    def _parse_lines(self, lines, key_length, ref_list_length):
-        nodes = []
-        for line in lines[1:]:
-            if line == '':
-                return nodes
-            elements = line.split('\0', key_length)
-            # keys are tuples
-            key = tuple(elements[:key_length])
-            line = elements[-1]
-            references, value = line.rsplit('\0', 1)
-            if ref_list_length:
-                ref_lists = []
-                for ref_string in references.split('\t'):
-                    ref_lists.append(tuple([
-                        tuple(ref.split('\0')) for ref in ref_string.split('\r') if ref
-                        ]))
-                ref_lists = tuple(ref_lists)
-                node_value = (value, ref_lists)
-            else:
-                node_value = (value, ())
-            nodes.append((key, node_value))
-        return nodes
+        self.keys = dict(_parse_btree._parse_leaf_lines(bytes,
+            key_length, ref_list_length))
 
 
 class _InternalNode(object):
@@ -389,7 +368,7 @@
         self._root_node = None
         # Default max size is 100,000 leave values
         self._leaf_value_cache = None # lru_cache.LRUCache(100*1000)
-        self._leaf_node_cache = lru_cache.LRUCache()
+        self._leaf_node_cache = lru_cache.LRUCache(_NODE_CACHE_SIZE)
         self._internal_node_cache = lru_cache.LRUCache()
         self._key_count = None
         self._use_blooms = BTreeGraphIndex._default_use_blooms
@@ -609,6 +588,8 @@
             return
         # Find all the keys that are already in our value cache
         global leaf_value_hits, miss_attempts
+        # in a two-row index, row 0 has the filter.
+        bloom_row = len(self._row_lengths) - 2
         needed_keys = []
         if self._leaf_value_cache is None:
             needed_keys = keys
@@ -636,7 +617,7 @@
             if len(self._row_lengths) > 1:
                 # no internal nodes, and thus no blooms in a length-1 b+tree.
                 bloom_raw_offsets = NodeBloom.get_raw_offsets(needed_keys)
-        for row_length in self._row_lengths[:-1]:
+        for row_pos, row_length in enumerate(self._row_lengths[:-1]):
             node_indexes = [idx for idx, s_keys in nodes_and_keys]
             nodes = self._get_internal_nodes(node_indexes)
 
@@ -645,7 +626,8 @@
             next_nodes_and_keys = []
             for node_index, sub_keys in nodes_and_keys:
                 node = nodes[node_index]
-                if self._use_blooms and node.bloom is not None:
+                if (self._use_blooms and row_pos == bloom_row and
+                    node.bloom is not None):
                     # If using blooms, filter out any nodes that don't hit.
                     remaining_sub_keys = [sub_key for sub_key in sub_keys
                         if node.bloom.contains_raw_offset(bloom_raw_offsets[sub_key])]
@@ -810,3 +792,9 @@
         node_count = sum(self._row_lengths) - 1
         for node in self._read_nodes(range(1, node_count + 1)):
             pass
+
+
+try:
+    from bzrlib.plugins.index2 import _parse_btree_c as _parse_btree
+except ImportError:
+    from bzrlib.plugins.index2 import _parse_btree_py as _parse_btree

=== modified file 'indexbench.py'
--- a/indexbench.py	2008-07-02 20:17:01 +0000
+++ b/indexbench.py	2008-07-03 11:57:14 +0000
@@ -16,6 +16,7 @@
 from bzrlib.revision import NULL_REVISION
 from bzrlib.lsprof import profile
 
+use_calltree = True
 
 key_count = 0
 
@@ -44,7 +45,11 @@
 def profile_fixture(class_name, fixture_name, fixture, *args, **kwargs):
     """Profile a fixture."""
     value, stats = profile(fixture, *args, **kwargs)
-    fname = class_name + '.' + fixture_name + '.txt'
+    if use_calltree:
+        suffix = '.callgrind'
+    else:
+        suffix = '.txt'
+    fname = class_name + '.' + fixture_name + suffix
     stats.save(fname)
     return value
 
@@ -112,17 +117,17 @@
         finish = time.time()
         print "%s: iter_all_entries in %0.3f" % (label, finish - now)
 # random shuffle, all keys (name_keys comes preshuffled)
+# This tests random access across the index - which on GraphIndex will exhaust
+# VM, on BTreeIndex can cache-thrash.
     if 'shuffle' in fixtures:
-        drop_cache()
-        reset_hit_counts()
-        now = time.time()
-        for name, index in iter_indices(names, target, factory):
-            order = name_keys[name]
-            shuffle(order)
-            for key in order:
-                index.iter_entries([key]).next()
-        finish = time.time()
-        print "%s: iter_random_one in %0.3f,%s" % (label, finish - now, hits())
+        run(label, 'iter_random_one', iter_random_one, label, 'iter_random_one',
+            drop_cache, names, target, factory, name_keys, False)
+# random shuffle, all keys (name_keys comes preshuffled), reopening the index
+# each time. This tests the minimum time to get to an average key from 'start'.
+# On GraphIndex this should be much slower than on BTree index.
+    if 'random_reload' in fixtures:
+        run(label, 'iter_random_reload', iter_random_one, 'iter_random_reload',
+            label, drop_cache, names, target, factory, name_keys, True)
 # text extraction simulation (follow a compression graph) for text_keys
     if 'text' in fixtures:
         text_names = [name for name in names if name.endswith('.tix')]
@@ -169,6 +174,23 @@
     print "%s: -------Done---------" % (label,)
 
 
+def iter_random_one(label, fixture_label, drop_cache, names, target, factory,
+    name_keys, reload_index):
+    drop_cache()
+    reset_hit_counts()
+    now = time.time()
+    for name, index in iter_indices(names, target, factory):
+        order = name_keys[name]
+        shuffle(order)
+        for key in order:
+            index.iter_entries([key]).next()
+            if reload_index:
+                index = factory(index._transport, index._name, index._size)
+                drop_cache()
+    finish = time.time()
+    print "%s: %s in %0.3f,%s" % (label, fixture_label, finish - now, hits())
+
+
 def revision_search(label, drop_cache, names, target, factory, tip_revision_id):
     rev_names = [name for name in names if name.endswith('.rix')]
     # reopen indices
@@ -212,18 +234,24 @@
         Option('btree', help='bench the BTreeGraphIndex class'),
         Option('drop-cache', help='drop caches between fixtures'),
         ListOption('fixture', type=str,
-            help='fixtures to test: one of all, shuffle, text, revision, miss'),
+            help='fixtures to test: one of all, shuffle, text, revision, miss, '
+            'random_reload.'),
         # lspro because --lsprof is a global option, and they interfere with each other.
         Option('lspro', help='generate class.fixture.callgrind lsprof files'),
+        Option('calltree', help='generate KCachegrind calltrees when profiling.'),
+        Option('test-area',type=str,
+            help='use a specific transport prefix for testing.'),
         ]
 
 
     takes_args = ['sample_branch']
 
     def run(self, sample_branch, graph=True, btree=True, drop_cache=False,
-            fixture=None, lspro=False):
+            fixture=None, lspro=False, calltree=True, test_area=None):
         if not fixture:
-            fixture = ['all', 'shuffle', 'text', 'revision', 'miss']
+            fixture = ['all', 'shuffle', 'text', 'revision', 'miss', 'random_reload']
+        global use_calltree
+        use_calltree = calltree
         from bzrlib.branch import Branch
         source_branch = Branch.open(sample_branch)
         source = source_branch.repository._transport
@@ -252,19 +280,32 @@
         global key_count
         key_count = sum(map(len, name_keys.itervalues()))
 
+        if test_area is None:
+            test_transport = None
+        else:
+            test_transport = get_transport(test_area)
+
         if btree:
             self.test_class(names, source, BTreeGraphIndex, BTreeBuilder,
                 'BTreeIndex', name_keys, text_keys, drop_cache, fixture, lspro,
-                tip_revision_id, ancestry)
+                tip_revision_id, ancestry, test_transport)
         if graph:
             self.test_class(names, source, GraphIndex, GraphIndexBuilder,
                 'GraphIndex', name_keys, text_keys, drop_cache, fixture, lspro,
-                tip_revision_id, ancestry)
+                tip_revision_id, ancestry, test_transport)
 
     def test_class(self, names, source, factory, builder, label, name_keys,
-        text_keys, drop_cache, fixtures, lsprof, tip_revision_id, ancestry):
-        workingdir = tempfile.mkdtemp()
-        t = get_transport(workingdir)
+        text_keys, drop_cache, fixtures, lsprof, tip_revision_id, ancestry,
+        test_transport):
+        if test_transport is None:
+            workingdir = tempfile.mkdtemp()
+            t = get_transport(workingdir)
+        else:
+            # 
+            test_transport.mkdir('indexbench-work')
+            t = test_transport.clone('indexbench-work')
+            if t.list_dir('.') != []:
+                raise AssertionError('dirty workspace %s' % t)
         try:
             time_index(names, source, factory, builder, t, label, name_keys,
                 text_keys, drop_cache, fixtures, lsprof, tip_revision_id,

=== modified file 'repofmt.py'
--- a/repofmt.py	2008-07-01 11:37:43 +0000
+++ b/repofmt.py	2008-07-03 11:57:14 +0000
@@ -37,6 +37,8 @@
     KnitPackRepository,
     RepositoryPackCollection,
     RepositoryFormatPackDevelopment0,
+    RepositoryFormatPackDevelopment0Subtree,
+    RepositoryFormatKnitPack4,
     Packer,
     ReconcilePacker,
     OptimisingPacker,
@@ -392,3 +394,35 @@
             "pack-0.92\n")
 
 
+class RepositoryFormatPackBTreeRichRoot(RepositoryFormatKnitPack4):
+    """A B+Tree index using pack repository."""
+
+    repository_class = BTreePackRepository
+
+    def get_format_string(self):
+        """See RepositoryFormat.get_format_string()."""
+        return ("Bazaar development format - btree-rich-root "
+            "(needs bzr.dev from 1.6)\n")
+
+    def get_format_description(self):
+        """See RepositoryFormat.get_format_description()."""
+        return ("Development repository format - btree, interoperates with "
+            "rich-root-pack\n")
+
+
+class RepositoryFormatPackBTreeSubtrees(RepositoryFormatPackDevelopment0Subtree):
+    """A B+Tree index using pack repository."""
+
+    repository_class = BTreePackRepository
+
+    def get_format_string(self):
+        """See RepositoryFormat.get_format_string()."""
+        return ("Bazaar development format - btree-subtrees "
+            "(needs bzr.dev from 1.6)\n")
+
+    def get_format_description(self):
+        """See RepositoryFormat.get_format_description()."""
+        return ("Development repository format - btree, interoperates with "
+            "pack-0.92-subtrees\n")
+
+

=== modified file 'setup.py'
--- a/setup.py	2008-06-24 22:37:56 +0000
+++ b/setup.py	2008-07-03 06:11:19 +0000
@@ -1,11 +1,83 @@
-#!/usr/bin/env python2.4
+#!/usr/bin/env python
 from distutils.core import setup
 
 bzr_plugin_name = 'index2'
 
 bzr_plugin_version = (1, 6, 0, 'dev', 0)
 
-if __name__ == 'main':
+
+from distutils import log
+from distutils.errors import CCompilerError, DistutilsPlatformError
+from distutils.extension import Extension
+ext_modules = []
+try:
+    from Pyrex.Distutils import build_ext
+except ImportError:
+    have_pyrex = False
+    # try to build the extension from the prior generated source.
+    print
+    print ("The python package 'Pyrex' is not available."
+           " If the .c files are available,")
+    print ("they will be built,"
+           " but modifying the .pyx files will not rebuild them.")
+    print
+    from distutils.command.build_ext import build_ext
+else:
+    have_pyrex = True
+
+
+class build_ext_if_possible(build_ext):
+
+    def run(self):
+        try:
+            build_ext.run(self)
+        except DistutilsPlatformError, e:
+            log.warn(str(e))
+            log.warn('Extensions cannot be built, '
+                     'will use the Python versions instead')
+
+    def build_extension(self, ext):
+        try:
+            build_ext.build_extension(self, ext)
+        except CCompilerError:
+            log.warn('Building of "%s" extension failed, '
+                     'will use the Python version instead' % (ext.name,))
+
+
+# Override the build_ext if we have Pyrex available
+unavailable_files = []
+
+
+def add_pyrex_extension(module_name, **kwargs):
+    """Add a pyrex module to build.
+
+    This will use Pyrex to auto-generate the .c file if it is available.
+    Otherwise it will fall back on the .c file. If the .c file is not
+    available, it will warn, and not add anything.
+
+    You can pass any extra options to Extension through kwargs. One example is
+    'libraries = []'.
+
+    :param module_name: The python path to the module. This will be used to
+        determine the .pyx and .c files to use.
+    """
+    path = module_name.replace('.', '/')
+    pyrex_name = path + '.pyx'
+    c_name = path + '.c'
+    # Manually honour package_dir :(
+    module_name = 'bzrlib.plugins.index2.' + module_name
+    if have_pyrex:
+        ext_modules.append(Extension(module_name, [pyrex_name]))
+    else:
+        if not os.path.isfile(c_name):
+            unavailable_files.append(c_name)
+        else:
+            ext_modules.append(Extension(module_name, [c_name]))
+
+add_pyrex_extension('_parse_btree_c')
+
+
+if __name__ == '__main__':
     setup(name="bzr index2",
           version="1.6.0dev0",
           description="bzr btree indices.",
@@ -16,4 +88,7 @@
           packages=['bzrlib.plugins.index2',
                     'bzrlib.plugins.index2.tests',
                     ],
-          package_dir={'bzrlib.plugins.index2': '.'})
+          package_dir={'bzrlib.plugins.index2': '.'},
+          cmdclass={'build_ext': build_ext_if_possible},
+          ext_modules=ext_modules,
+          )

=== modified file 'tests/test_btree_index.py'
--- a/tests/test_btree_index.py	2008-07-02 21:00:46 +0000
+++ b/tests/test_btree_index.py	2008-07-03 03:45:49 +0000
@@ -25,10 +25,32 @@
 from bzrlib.plugins import index2
 from bzrlib.plugins.index2 import errors, btree_index
 from bzrlib.plugins.pybloom.pybloom import BloomSHA1
-from bzrlib.tests import TestCaseWithTransport
+from bzrlib.tests import (
+    TestCaseWithTransport,
+    TestScenarioApplier,
+    adapt_tests,
+    condition_isinstance,
+    split_suite_by_condition,
+    )
 from bzrlib.transport import get_transport
 
 
+def load_tests(standard_tests, module, loader):
+    # parameterise the TestBTreeNodes tests
+    node_tests, others = split_suite_by_condition(standard_tests,
+        condition_isinstance(TestBTreeNodes))
+    applier = TestScenarioApplier()
+    import bzrlib.plugins.index2._parse_btree_py as py_module
+    applier.scenarios = [('python', {'parse_btree': py_module})]
+    try:
+        import bzrlib.plugins.index2._parse_btree_c as c_module
+        applier.scenarios.append(('C', {'parse_btree': c_module}))
+    except ImportError:
+        pass
+    adapt_tests(node_tests, applier, others)
+    return others
+
+
 class BTreeTestCase(TestCaseWithTransport):
     # test names here are suffixed by the key length and reference list count
     # that they test.
@@ -529,6 +551,15 @@
 
 class TestBTreeNodes(BTreeTestCase):
 
+    def restore_parser(self):
+        btree_index._parse_btree = self.saved_parser
+
+    def setUp(self):
+        BTreeTestCase.setUp(self)
+        self.saved_parser = btree_index._parse_btree
+        self.addCleanup(self.restore_parser)
+        btree_index._parse_btree = self.parse_btree
+
     def test_LeafNode_1_0(self):
         node_bytes = ("type=leaf\n"
             "0000000000000000000000000000000000000000\x00\x00value:0\n"