Rev 3805: (jam) BTreeIndex will now prefetch nearby pages. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Oct 28 20:21:01 GMT 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3805
revision-id: pqm at pqm.ubuntu.com-20081028202057-u3csau9zvf0hapya
parent: pqm at pqm.ubuntu.com-20081028181614-p3qlghekhffb6cbu
parent: john at arbash-meinel.com-20081028194342-7rrdfadt13mg82jy
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2008-10-28 20:20:57 +0000
message:
  (jam) BTreeIndex will now prefetch nearby pages.
added:
  doc/developers/btree_index_prefetch.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
  bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
  bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
  bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
    ------------------------------------------------------------
    revno: 3763.8.17
    revision-id: john at arbash-meinel.com-20081028194342-7rrdfadt13mg82jy
    parent: john at arbash-meinel.com-20081028193957-zg2eygq5cgz2bnpu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-28 14:43:42 -0500
    message:
      NEWS entry about btree prefetch.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 3763.8.16
    revision-id: john at arbash-meinel.com-20081028193957-zg2eygq5cgz2bnpu
    parent: john at arbash-meinel.com-20081028193747-0qmbzudogd1pjstk
    parent: pqm at pqm.ubuntu.com-20081028181614-p3qlghekhffb6cbu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-28 14:39:57 -0500
    message:
      Merge bzr.dev 3804
    added:
      bzrlib/tests/blackbox/test_dump_btree.py test_dump_btree.py-20081008203335-zkpcq230b6vubszz-1
      bzrlib/tests/fake_command.py   fake_command.py-20081021195002-r9v65tgxx63c25v9-1
      doc/developers/cycle.txt       cycle.txt-20081017031739-rw24r0cywm2ok3xu-1
      tools/packaging/lp-upload-release lpuploadrelease-20081020075647-56zdf9z6yav1bx81-1
    modified:
      Makefile                       Makefile-20050805140406-d96e3498bb61c5bb
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzr                            bzr.py-20050313053754-5485f144c7006fa6
      bzrlib/__init__.py             __init__.py-20050309040759-33e65acf91bbcd5d
      bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
      bzrlib/_readdir_pyx.pyx        readdir.pyx-20060609152855-rm6v321vuaqyh9tu-1
      bzrlib/_walkdirs_win32.pyx     _walkdirs_win32.pyx-20080716220454-kweh3tgxez5dvw2l-2
      bzrlib/api.py                  api.py-20070626082640-35lspz7j0ys7a8ld-1
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      bzrlib/commands.py             bzr.py-20050309040720-d10f4714595cf8c3
      bzrlib/config.py               config.py-20051011043216-070c74f4e9e338e8
      bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
      bzrlib/help_topics/en/hooks.txt hooks.txt-20070830033044-xxu2rced13f72dka-1
      bzrlib/index.py                index.py-20070712131115-lolkarso50vjr64s-1
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/merge.py                merge.py-20050513021216-953b65a438527106
      bzrlib/patches.py              patches.py-20050727183609-378c1cc5972ce908
      bzrlib/plugin.py               plugin.py-20050622060424-829b654519533d69
      bzrlib/plugins/launchpad/account.py account.py-20071011033320-50y6vfftywf4yllw-1
      bzrlib/plugins/launchpad/lp_directory.py lp_indirect.py-20070126012204-de5rugwlt22c7u7e-1
      bzrlib/plugins/launchpad/test_account.py test_account.py-20071011033320-50y6vfftywf4yllw-2
      bzrlib/plugins/launchpad/test_lp_directory.py test_lp_indirect.py-20070126002743-oyle362tzv9cd8mi-1
      bzrlib/python-compat.h         pythoncompat.h-20080924041409-9kvi0fgtuuqp743j-1
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/revisiontree.py         revisiontree.py-20060724012533-bg8xyryhxd0o0i0h-1
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/blackbox/__init__.py __init__.py-20051128053524-eba30d8255e08dc3
      bzrlib/tests/blackbox/test_command_encoding.py test_command_encoding.py-20060106032110-45431fd2ce9ff21f
      bzrlib/tests/blackbox/test_missing.py test_missing.py-20051211212735-a2cf4c1840bb84c4
      bzrlib/tests/branch_implementations/test_stacking.py test_stacking.py-20080214020755-msjlkb7urobwly0f-1
      bzrlib/tests/test_api.py       testapi.py-20051027033546-6f9be2d308d18a52
      bzrlib/tests/test_branch.py    test_branch.py-20060116013032-97819aa07b8ab3b5
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
      bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
      bzrlib/tests/test_commands.py  test_command.py-20051019190109-3b17be0f52eaa7a8
      bzrlib/tests/test_config.py    testconfig.py-20051011041908-742d0c15d8d8c8eb
      bzrlib/tests/test_index.py     test_index.py-20070712131115-lolkarso50vjr64s-2
      bzrlib/tests/test_knit.py      test_knit.py-20051212171302-95d4c00dd5f11f2b
      bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
      bzrlib/tests/test_patches.py   test_patches.py-20051231203844-f4974d20f6aea09c
      bzrlib/tests/test_plugins.py   plugins.py-20050622075746-32002b55e5e943e9
      bzrlib/tests/test_remote.py    test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
      bzrlib/tests/test_sftp_transport.py testsftp.py-20051027032739-247570325fec7e7e
      bzrlib/tests/test_store.py     teststore.py-20050826022702-f6caadb647395769
      bzrlib/tests/test_transform.py test_transaction.py-20060105172520-b3ffb3946550e6c4
      bzrlib/tests/tree_implementations/test_tree.py test_tree.py-20061215160206-usu7lwcj8aq2n3br-1
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/transport/__init__.py   transport.py-20050711165921-4978aa7ce1285ad5
      bzrlib/transport/ftp/__init__.py ftp.py-20051116161804-58dc9506548c2a53
      bzrlib/transport/ftp/_gssapi.py _gssapi.py-20080611190840-7ejrtp884bk5eu72-2
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
      bzrlib/transport/sftp.py       sftp.py-20051019050329-ab48ce71b7e32dfe
      bzrlib/transport/ssh.py        ssh.py-20060824042150-0s9787kng6zv1nwq-1
      bzrlib/tree.py                 tree.py-20050309040759-9d5f2496be663e77
      bzrlib/win32utils.py           win32console.py-20051021033308-123c6c929d04973d
      bzrlib/workingtree.py          workingtree.py-20050511021032-29b6ec0a681e02e3
      bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
      doc/developers/HACKING.txt     HACKING-20050805200004-2a5dc975d870f78c
      doc/developers/index.txt       index.txt-20070508041241-qznziunkg0nffhiw-1
      doc/developers/ppa.txt         ppa.txt-20080722055539-606u7t2z32t3ae4w-1
      doc/developers/releasing.txt   releasing.txt-20080502015919-fnrcav8fwy8ccibu-1
      doc/en/user-guide/branching_a_project.txt branching_a_project.-20071122141511-0knao2lklsdsvb1q-2
      doc/en/user-guide/core_concepts.txt core_concepts.txt-20071114035000-q36a9h57ps06uvnl-2
      doc/en/user-guide/using_checkouts.txt using_checkouts.txt-20071123055134-k5x4ekduci2lbn36-4
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
    ------------------------------------------------------------
    revno: 3763.8.15
    revision-id: john at arbash-meinel.com-20081028193747-0qmbzudogd1pjstk
    parent: john at arbash-meinel.com-20081014213527-4j9uc93aq1qmn43b
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-28 14:37:47 -0500
    message:
      Review comments from Martin. Code clarity/variable name/docstring updates.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3763.8.14
    revision-id: john at arbash-meinel.com-20081014213527-4j9uc93aq1qmn43b
    parent: john at arbash-meinel.com-20081014202843-775qw8g14vdozp7t
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 16:35:27 -0500
    message:
      Add in a shortcut when we haven't cached much yet.
      
      Document the current algorithm more completely, including the proper
      justification for the various steps.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
      doc/developers/btree_index_prefetch.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.13
    revision-id: john at arbash-meinel.com-20081014202843-775qw8g14vdozp7t
    parent: john at arbash-meinel.com-20081014202722-svyfvmyhcieabhyd
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 15:28:43 -0500
    message:
      This is now the algorithm, rather than just a suggestion.
    modified:
      doc/developers/btree_index_prefetch.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.12
    revision-id: john at arbash-meinel.com-20081014202722-svyfvmyhcieabhyd
    parent: john at arbash-meinel.com-20081014201953-gmbxn75fjj8vcie2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 15:27:22 -0500
    message:
      Code cleanup.
      
      Move some functions around to better areas. Fix some function names, so that
      they make more sense.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3763.8.11
    revision-id: john at arbash-meinel.com-20081014201953-gmbxn75fjj8vcie2
    parent: john at arbash-meinel.com-20081014201906-5rtgq3yvzent0gwe
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 15:19:53 -0500
    message:
      Use the new LRUCache.keys() member rather than peeking behind the scenes.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 3763.8.10
    revision-id: john at arbash-meinel.com-20081014201906-5rtgq3yvzent0gwe
    parent: john at arbash-meinel.com-20081014193134-yi1otoetaq96obxf
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 15:19:06 -0500
    message:
      Add a .keys() member to LRUCache and LRUSizeCache.
    modified:
      bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
      bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
    ------------------------------------------------------------
    revno: 3763.8.9
    revision-id: john at arbash-meinel.com-20081014193134-yi1otoetaq96obxf
    parent: john at arbash-meinel.com-20081014190433-81nbtayr41p7vr33
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 14:31:34 -0500
    message:
      Finish up the algorithm to stay within a given layer.
      
      Now when expanding requests, we will only go between layers when the
      whole thing is small enough that we can read everything.
      In all other cases, we will only prefetch up until the layer boundary.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3763.8.8
    revision-id: john at arbash-meinel.com-20081014190433-81nbtayr41p7vr33
    parent: john at arbash-meinel.com-20081014190226-xns3yk2nti9p96zd
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 14:04:33 -0500
    message:
      simple test of overlapped behavior
    modified:
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3763.8.7
    revision-id: john at arbash-meinel.com-20081014190226-xns3yk2nti9p96zd
    parent: john at arbash-meinel.com-20081009205450-z4k6rnvj5j1tqcmu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-14 14:02:26 -0500
    message:
      A bit of doc updates, start putting in tests for current behavior.
    renamed:
      doc/developers/btree_index_request_expansion.txt => doc/developers/btree_index_prefetch.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
      doc/developers/btree_index_prefetch.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.6
    revision-id: john at arbash-meinel.com-20081009205450-z4k6rnvj5j1tqcmu
    parent: john at arbash-meinel.com-20081007214112-yl35u4yehfee3cam
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Thu 2008-10-09 15:54:50 -0500
    message:
      Fix the logic a bit, and add a bit more tweaking opportunities
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 3763.8.5
    revision-id: john at arbash-meinel.com-20081007214112-yl35u4yehfee3cam
    parent: john at arbash-meinel.com-20081006190823-8fqjwof4rf8a1xzl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Tue 2008-10-07 16:41:12 -0500
    message:
      Rename the doc to .txt and add the vim syntax line.
    renamed:
      doc/developers/btree_index_request_expansion.rst => doc/developers/btree_index_request_expansion.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    modified:
      doc/developers/btree_index_request_expansion.txt btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.4
    revision-id: john at arbash-meinel.com-20081006190823-8fqjwof4rf8a1xzl
    parent: john at arbash-meinel.com-20081004161137-66rj5cqt4r2dxw32
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Mon 2008-10-06 14:08:23 -0500
    message:
      A little bit more
    modified:
      doc/developers/btree_index_request_expansion.rst btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.3
    revision-id: john at arbash-meinel.com-20081004161137-66rj5cqt4r2dxw32
    parent: john at arbash-meinel.com-20081004155403-m1t9fbf35mbre2vy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Sat 2008-10-04 11:11:37 -0500
    message:
      A bit more info
    modified:
      doc/developers/btree_index_request_expansion.rst btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.2
    revision-id: john at arbash-meinel.com-20081004155403-m1t9fbf35mbre2vy
    parent: john at arbash-meinel.com-20081004141013-yskxjlwtuy2k18ue
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Sat 2008-10-04 10:54:03 -0500
    message:
      Start writing a document describing the background for the algorithm,
      along with the actual algorithm.
    added:
      doc/developers/btree_index_request_expansion.rst btree_index_request_-20081004155340-2u6apsy53f43f0xn-1
    ------------------------------------------------------------
    revno: 3763.8.1
    revision-id: john at arbash-meinel.com-20081004141013-yskxjlwtuy2k18ue
    parent: pqm at pqm.ubuntu.com-20081002172844-d6df1l8dzpsqzyup
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree_buffer
    timestamp: Sat 2008-10-04 09:10:13 -0500
    message:
      Playing around with expanding requests for btree index nodes into neighboring nodes.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
=== modified file 'NEWS'
--- a/NEWS	2008-10-28 17:41:35 +0000
+++ b/NEWS	2008-10-28 19:43:42 +0000
@@ -16,6 +16,13 @@
 
   IMPROVEMENTS:
 
+    * ``BTreeIndex`` code can now prefetch extra pages to help tune the
+      tradeoff between bandwidth and latency. The tuning aims to avoid
+      penalizing commands that need minimal information, while providing a
+      significant boost to ones that need more context. This only directly
+      affects the ``--development2`` format, which uses btrees for its
+      indexes. (John Arbash Meinel)
+
     * ``bzr dump-btree`` is a hidden command introduced to allow dumping
       the contents of a compressed btree file.  (John Arbash Meinel)
 

=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py	2008-10-15 21:34:10 +0000
+++ b/bzrlib/btree_index.py	2008-10-28 19:39:57 +0000
@@ -351,12 +351,6 @@
                 # First key triggers the first row
                 rows.append(_LeafBuilderRow())
             key_count += 1
-            # TODO: Flattening the node into a string key and a line should
-            #       probably be put into a pyrex function. We can do a quick
-            #       iter over all the entries to determine the final length,
-            #       and then do a single malloc() rather than lots of
-            #       intermediate mallocs as we build everything up.
-            #       ATM 3 / 13s are spent flattening nodes (10s is compressing)
             string_key, line = _btree_serializer._flatten_node(node,
                                     self.reference_lists)
             self._add_key(string_key, line, rows)
@@ -611,7 +605,7 @@
         self._name = name
         self._size = size
         self._file = None
-        self._page_size = transport.recommended_page_size()
+        self._recommended_pages = self._compute_recommended_pages()
         self._root_node = None
         # Default max size is 100,000 leave values
         self._leaf_value_cache = None # lru_cache.LRUCache(100*1000)
@@ -632,15 +626,7 @@
     def __ne__(self, other):
         return not self.__eq__(other)
 
-    def _get_root_node(self):
-        if self._root_node is None:
-            # We may not have a root node yet
-            nodes = list(self._read_nodes([0]))
-            if len(nodes):
-                self._root_node = nodes[0][1]
-        return self._root_node
-
-    def _cache_nodes(self, nodes, cache):
+    def _get_and_cache_nodes(self, nodes):
         """Read nodes and cache them in the lru.
 
         The nodes list supplied is sorted and then read from disk, each node
@@ -653,18 +639,191 @@
 
         :return: A dict of {node_pos: node}
         """
-        if len(nodes) > cache._max_cache:
-            trace.mutter('Requesting %s > %s nodes, not all will be cached',
-                         len(nodes), cache._max_cache)
         found = {}
+        start_of_leaves = None
         for node_pos, node in self._read_nodes(sorted(nodes)):
             if node_pos == 0: # Special case
                 self._root_node = node
             else:
-                cache.add(node_pos, node)
+                if start_of_leaves is None:
+                    start_of_leaves = self._row_offsets[-2]
+                if node_pos < start_of_leaves:
+                    self._internal_node_cache.add(node_pos, node)
+                else:
+                    self._leaf_node_cache.add(node_pos, node)
             found[node_pos] = node
         return found
 
+    def _compute_recommended_pages(self):
+        """Convert transport's recommended_page_size into btree pages.
+
+        recommended_page_size is in bytes; we want to know how many _PAGE_SIZE
+        pages fit in that length.
+        """
+        recommended_read = self._transport.recommended_page_size()
+        recommended_pages = int(math.ceil(recommended_read /
+                                          float(_PAGE_SIZE)))
+        return recommended_pages
+
+    def _compute_total_pages_in_index(self):
+        """How many pages are in the index.
+
+        If we have read the header we will use the value stored there.
+        Otherwise it will be computed based on the length of the index.
+        """
+        if self._size is None:
+            raise AssertionError('_compute_total_pages_in_index should not be'
+                                 ' called when self._size is None')
+        if self._root_node is not None:
+            # This is the number of pages as defined by the header
+            return self._row_offsets[-1]
+        # This is the number of pages as defined by the size of the index. They
+        # should be identical.
+        total_pages = int(math.ceil(self._size / float(_PAGE_SIZE)))
+        return total_pages
+
+    def _expand_offsets(self, offsets):
+        """Find extra pages to download.
+
+        The idea is that we always want to make big-enough requests (like 64kB
+        for http), so that we don't waste round trips. So given the entries
+        that we already have cached and the new pages being downloaded,
+        figure out what other pages we might want to read.
+
+        See also doc/developers/btree_index_prefetch.txt for more details.
+
+        :param offsets: The offsets to be read
+        :return: A list of offsets to download
+        """
+        if 'index' in debug.debug_flags:
+            trace.mutter('expanding: %s\toffsets: %s', self._name, offsets)
+
+        if len(offsets) >= self._recommended_pages:
+            # Don't add more, we are already requesting more than enough
+            if 'index' in debug.debug_flags:
+                trace.mutter('  not expanding large request (%s >= %s)',
+                             len(offsets), self._recommended_pages)
+            return offsets
+        if self._size is None:
+            # Don't try anything, because we don't know where the file ends
+            if 'index' in debug.debug_flags:
+                trace.mutter('  not expanding without knowing index size')
+            return offsets
+        total_pages = self._compute_total_pages_in_index()
+        cached_offsets = self._get_offsets_to_cached_pages()
+        # If reading recommended_pages would read the rest of the index, just
+        # do so.
+        if total_pages - len(cached_offsets) <= self._recommended_pages:
+            # Read whatever is left
+            if cached_offsets:
+                expanded = [x for x in xrange(total_pages)
+                               if x not in cached_offsets]
+            else:
+                expanded = range(total_pages)
+            if 'index' in debug.debug_flags:
+                trace.mutter('  reading all unread pages: %s', expanded)
+            return expanded
+
+        if self._root_node is None:
+            # ATM on the first read of the root node of a large index, we don't
+            # bother pre-reading any other pages. This is because the
+            # likelihood of actually reading interesting pages is very low.
+            # See doc/developers/btree_index_prefetch.txt for a discussion, and
+            # a possible implementation when we are guessing that the second
+            # layer index is small
+            final_offsets = offsets
+        else:
+            tree_depth = len(self._row_lengths)
+            if len(cached_offsets) < tree_depth and len(offsets) == 1:
+                # We haven't read enough to justify expansion
+                # If we are only going to read the root node, and 1 leaf node,
+                # then it isn't worth expanding our request. Once we've read at
+                # least 2 nodes, then we are probably doing a search, and we
+                # start expanding our requests.
+                if 'index' in debug.debug_flags:
+                    trace.mutter('  not expanding on first reads')
+                return offsets
+            final_offsets = self._expand_to_neighbors(offsets, cached_offsets,
+                                                      total_pages)
+
+        final_offsets = sorted(final_offsets)
+        if 'index' in debug.debug_flags:
+            trace.mutter('expanded:  %s', final_offsets)
+        return final_offsets
+
+    def _expand_to_neighbors(self, offsets, cached_offsets, total_pages):
+        """Expand requests to neighbors until we have enough pages.
+
+        This is called from _expand_offsets after policy has determined that we
+        want to expand.
+        We only want to expand requests within a given layer. We cheat a little
+        bit and assume all requests will be in the same layer. This is true
+        given the current design, but if it changes this algorithm may perform
+        oddly.
+
+        :param offsets: requested offsets
+        :param cached_offsets: offsets for pages we currently have cached
+        :return: A set() of offsets after expansion
+        """
+        final_offsets = set(offsets)
+        first = end = None
+        new_tips = set(final_offsets)
+        while len(final_offsets) < self._recommended_pages and new_tips:
+            next_tips = set()
+            for pos in new_tips:
+                if first is None:
+                    first, end = self._find_layer_first_and_end(pos)
+                previous = pos - 1
+                if (previous > 0
+                    and previous not in cached_offsets
+                    and previous not in final_offsets
+                    and previous >= first):
+                    next_tips.add(previous)
+                after = pos + 1
+                if (after < total_pages
+                    and after not in cached_offsets
+                    and after not in final_offsets
+                    and after < end):
+                    next_tips.add(after)
+                # This would keep us from going bigger than
+                # recommended_pages by only expanding the first offsets.
+                # However, if we are making a 'wide' request, it is
+                # reasonable to expand all points equally.
+                # if len(final_offsets) > recommended_pages:
+                #     break
+            final_offsets.update(next_tips)
+            new_tips = next_tips
+        return final_offsets
+
+    def _find_layer_first_and_end(self, offset):
+        """Find the start/stop nodes for the layer corresponding to offset.
+
+        :return: (first, end)
+            first is the first node in this layer
+            end is the first node of the next layer
+        """
+        first = end = 0
+        for roffset in self._row_offsets:
+            first = end
+            end = roffset
+            if offset < roffset:
+                break
+        return first, end
+
+    def _get_offsets_to_cached_pages(self):
+        """Determine what nodes we already have cached."""
+        cached_offsets = set(self._internal_node_cache.keys())
+        cached_offsets.update(self._leaf_node_cache.keys())
+        if self._root_node is not None:
+            cached_offsets.add(0)
+        return cached_offsets
+
+    def _get_root_node(self):
+        if self._root_node is None:
+            # We may not have a root node yet
+            self._get_internal_nodes([0])
+        return self._root_node
+
     def _get_nodes(self, cache, node_indexes):
         found = {}
         needed = []
@@ -676,7 +835,10 @@
                 found[idx] = cache[idx]
             except KeyError:
                 needed.append(idx)
-        found.update(self._cache_nodes(needed, cache))
+        if not needed:
+            return found
+        needed = self._expand_offsets(needed)
+        found.update(self._get_and_cache_nodes(needed))
         return found
 
     def _get_internal_nodes(self, node_indexes):
@@ -1012,6 +1174,16 @@
             self._get_root_node()
         return self._key_count
 
+    def _compute_row_offsets(self):
+        """Fill out the _row_offsets attribute based on _row_lengths."""
+        offsets = []
+        row_offset = 0
+        for row in self._row_lengths:
+            offsets.append(row_offset)
+            row_offset += row
+        offsets.append(row_offset)
+        self._row_offsets = offsets
+
     def _parse_header_from_bytes(self, bytes):
         """Parse the header from a region of bytes.
 
@@ -1053,13 +1225,7 @@
                 if len(length)])
         except ValueError:
             raise errors.BadIndexOptions(self)
-        offsets = []
-        row_offset = 0
-        for row in self._row_lengths:
-            offsets.append(row_offset)
-            row_offset += row
-        offsets.append(row_offset)
-        self._row_offsets = offsets
+        self._compute_row_offsets()
 
         # calculate the bytes we have processed
         header_end = (len(signature) + sum(map(len, lines[0:4])) + 4)
@@ -1091,6 +1257,10 @@
                     self._size = len(start)
                     size = min(_PAGE_SIZE, self._size)
             else:
+                if offset > self._size:
+                    raise AssertionError('tried to read past the end'
+                                         ' of the file %s > %s'
+                                         % (offset, self._size))
                 size = min(size, self._size - offset)
             ranges.append((offset, size))
         if not ranges:

=== modified file 'bzrlib/lru_cache.py'
--- a/bzrlib/lru_cache.py	2008-04-24 07:22:53 +0000
+++ b/bzrlib/lru_cache.py	2008-10-14 20:19:06 +0000
@@ -74,6 +74,17 @@
             return self[key]
         return default
 
+    def keys(self):
+        """Get the list of keys currently cached.
+
+        Note that the keys returned here may no longer be cached by the time
+        you request them later. This is simply meant as a peek into the
+        current state.
+
+        :return: An unordered list of keys that are currently cached.
+        """
+        return self._cache.keys()
+
     def cleanup(self):
         """Clear the cache until it shrinks to the requested size.
 

=== modified file 'bzrlib/tests/test_btree_index.py'
--- a/bzrlib/tests/test_btree_index.py	2008-10-15 21:40:03 +0000
+++ b/bzrlib/tests/test_btree_index.py	2008-10-28 19:39:57 +0000
@@ -1003,3 +1003,185 @@
                                      (4, ['g', 'h'])],
                                     ['a', 'b', 'd', 'e', 'g', 'h'],
                                     ['c', 'd', 'f', 'g'])
+
+
+class TestExpandOffsets(tests.TestCase):
+
+    def make_index(self, size, recommended_pages=None):
+        """Make an index with a generic size.
+
+        This doesn't actually create anything on disk; it just primes a
+        BTreeGraphIndex with the recommended information.
+        """
+        index = btree_index.BTreeGraphIndex(get_transport('memory:///'),
+                                            'test-index', size=size)
+        if recommended_pages is not None:
+            index._recommended_pages = recommended_pages
+        return index
+
+    def set_cached_offsets(self, index, cached_offsets):
+        """Monkeypatch to give a canned answer for _get_offsets_for...()."""
+        def _get_offsets_to_cached_pages():
+            cached = set(cached_offsets)
+            return cached
+        index._get_offsets_to_cached_pages = _get_offsets_to_cached_pages
+
+    def prepare_index(self, index, node_ref_lists, key_length, key_count,
+                      row_lengths, cached_offsets):
+        """Setup the BTreeGraphIndex with some pre-canned information."""
+        index.node_ref_lists = node_ref_lists
+        index._key_length = key_length
+        index._key_count = key_count
+        index._row_lengths = row_lengths
+        index._compute_row_offsets()
+        index._root_node = btree_index._InternalNode('internal\noffset=0\n')
+        self.set_cached_offsets(index, cached_offsets)
+
+    def make_100_node_index(self):
+        index = self.make_index(4096*100, 6)
+        # Pretend we've already made a single request in the middle
+        self.prepare_index(index, node_ref_lists=0, key_length=1,
+                           key_count=1000, row_lengths=[1, 99],
+                           cached_offsets=[0, 50])
+        return index
+
+    def make_1000_node_index(self):
+        index = self.make_index(4096*1000, 6)
+        # Pretend we've already made a single request in the middle
+        self.prepare_index(index, node_ref_lists=0, key_length=1,
+                           key_count=90000, row_lengths=[1, 9, 990],
+                           cached_offsets=[0, 5, 500])
+        return index
+
+    def assertNumPages(self, expected_pages, index, size):
+        index._size = size
+        self.assertEqual(expected_pages, index._compute_total_pages_in_index())
+
+    def assertExpandOffsets(self, expected, index, offsets):
+        self.assertEqual(expected, index._expand_offsets(offsets),
+                         'We did not get the expected value after expanding'
+                         ' %s' % (offsets,))
+
+    def test_default_recommended_pages(self):
+        index = self.make_index(None)
+        # local transport recommends 4096 byte reads, which is 1 page
+        self.assertEqual(1, index._recommended_pages)
+
+    def test__compute_total_pages_in_index(self):
+        index = self.make_index(None)
+        self.assertNumPages(1, index, 1024)
+        self.assertNumPages(1, index, 4095)
+        self.assertNumPages(1, index, 4096)
+        self.assertNumPages(2, index, 4097)
+        self.assertNumPages(2, index, 8192)
+        self.assertNumPages(76, index, 4096*75 + 10)
+
+    def test__find_layer_first_and_end(self):
+        index = self.make_1000_node_index()
+        self.assertEqual((0, 1), index._find_layer_first_and_end(0))
+        self.assertEqual((1, 10), index._find_layer_first_and_end(1))
+        self.assertEqual((1, 10), index._find_layer_first_and_end(9))
+        self.assertEqual((10, 1000), index._find_layer_first_and_end(10))
+        self.assertEqual((10, 1000), index._find_layer_first_and_end(99))
+        self.assertEqual((10, 1000), index._find_layer_first_and_end(999))
+
+    def test_unknown_size(self):
+        # We should not expand if we don't know the file size
+        index = self.make_index(None, 10)
+        self.assertExpandOffsets([0], index, [0])
+        self.assertExpandOffsets([1, 4, 9], index, [1, 4, 9])
+
+    def test_more_than_recommended(self):
+        index = self.make_index(4096*100, 2)
+        self.assertExpandOffsets([1, 10], index, [1, 10])
+        self.assertExpandOffsets([1, 10, 20], index, [1, 10, 20])
+
+    def test_read_all_from_root(self):
+        index = self.make_index(4096*10, 20)
+        self.assertExpandOffsets(range(10), index, [0])
+
+    def test_read_all_when_cached(self):
+        # We've read enough that we can grab all the rest in a single request
+        index = self.make_index(4096*10, 5)
+        self.prepare_index(index, node_ref_lists=0, key_length=1,
+                           key_count=1000, row_lengths=[1, 9],
+                           cached_offsets=[0, 1, 2, 5, 6])
+        # It should fill the remaining nodes, regardless of the one requested
+        self.assertExpandOffsets([3, 4, 7, 8, 9], index, [3])
+        self.assertExpandOffsets([3, 4, 7, 8, 9], index, [8])
+        self.assertExpandOffsets([3, 4, 7, 8, 9], index, [9])
+
+    def test_no_root_node(self):
+        index = self.make_index(4096*10, 5)
+        self.assertExpandOffsets([0], index, [0])
+
+    def test_include_neighbors(self):
+        index = self.make_100_node_index()
+        # We expand in both directions, until we have at least 'recommended'
+        # pages
+        self.assertExpandOffsets([9, 10, 11, 12, 13, 14, 15], index, [12])
+        self.assertExpandOffsets([88, 89, 90, 91, 92, 93, 94], index, [91])
+        # If we hit an 'edge' we continue in the other direction
+        self.assertExpandOffsets([1, 2, 3, 4, 5, 6], index, [2])
+        self.assertExpandOffsets([94, 95, 96, 97, 98, 99], index, [98])
+
+        # Requesting many nodes will expand all locations equally
+        self.assertExpandOffsets([1, 2, 3, 80, 81, 82], index, [2, 81])
+        self.assertExpandOffsets([1, 2, 3, 9, 10, 11, 80, 81, 82], index,
+                                 [2, 10, 81])
+
+    def test_stop_at_cached(self):
+        index = self.make_100_node_index()
+        self.set_cached_offsets(index, [0, 10, 19])
+        self.assertExpandOffsets([11, 12, 13, 14, 15, 16], index, [11])
+        self.assertExpandOffsets([11, 12, 13, 14, 15, 16], index, [12])
+        self.assertExpandOffsets([12, 13, 14, 15, 16, 17, 18], index, [15])
+        self.assertExpandOffsets([13, 14, 15, 16, 17, 18], index, [16])
+        self.assertExpandOffsets([13, 14, 15, 16, 17, 18], index, [17])
+        self.assertExpandOffsets([13, 14, 15, 16, 17, 18], index, [18])
+
+    def test_cannot_fully_expand(self):
+        index = self.make_100_node_index()
+        self.set_cached_offsets(index, [0, 10, 12])
+        # We don't go into an endless loop if we are bound by cached nodes
+        self.assertExpandOffsets([11], index, [11])
+
+    def test_overlap(self):
+        index = self.make_100_node_index()
+        self.assertExpandOffsets([10, 11, 12, 13, 14, 15], index, [12, 13])
+        self.assertExpandOffsets([10, 11, 12, 13, 14, 15], index, [11, 14])
+
+    def test_stay_within_layer(self):
+        index = self.make_1000_node_index()
+        # When expanding a request, we won't read nodes from the next layer
+        self.assertExpandOffsets([1, 2, 3, 4], index, [2])
+        self.assertExpandOffsets([6, 7, 8, 9], index, [6])
+        self.assertExpandOffsets([6, 7, 8, 9], index, [9])
+        self.assertExpandOffsets([10, 11, 12, 13, 14, 15], index, [10])
+        self.assertExpandOffsets([10, 11, 12, 13, 14, 15, 16], index, [13])
+
+        self.set_cached_offsets(index, [0, 4, 12])
+        self.assertExpandOffsets([5, 6, 7, 8, 9], index, [7])
+        self.assertExpandOffsets([10, 11], index, [11])
+
+    def test_small_requests_unexpanded(self):
+        index = self.make_100_node_index()
+        self.set_cached_offsets(index, [0])
+        self.assertExpandOffsets([1], index, [1])
+        self.assertExpandOffsets([50], index, [50])
+        # If we request more than one node, then we'll expand
+        self.assertExpandOffsets([49, 50, 51, 59, 60, 61], index, [50, 60])
+
+        # The first pass does not expand
+        index = self.make_1000_node_index()
+        self.set_cached_offsets(index, [0])
+        self.assertExpandOffsets([1], index, [1])
+        self.set_cached_offsets(index, [0, 1])
+        self.assertExpandOffsets([100], index, [100])
+        self.set_cached_offsets(index, [0, 1, 100])
+        # But after the first depth, we will expand
+        self.assertExpandOffsets([2, 3, 4, 5, 6, 7], index, [2])
+        self.assertExpandOffsets([2, 3, 4, 5, 6, 7], index, [4])
+        self.set_cached_offsets(index, [0, 1, 2, 3, 4, 5, 6, 7, 100])
+        self.assertExpandOffsets([102, 103, 104, 105, 106, 107, 108], index,
+                                 [105])

=== modified file 'bzrlib/tests/test_lru_cache.py'
--- a/bzrlib/tests/test_lru_cache.py	2007-11-16 23:53:17 +0000
+++ b/bzrlib/tests/test_lru_cache.py	2008-10-14 20:19:06 +0000
@@ -213,6 +213,18 @@
         obj = object()
         self.assertIs(obj, cache.get(3, obj))
 
+    def test_keys(self):
+        cache = lru_cache.LRUCache(max_cache=5)
+
+        cache[1] = 2
+        cache[2] = 3
+        cache[3] = 4
+        self.assertEqual([1, 2, 3], sorted(cache.keys()))
+        cache[4] = 5
+        cache[5] = 6
+        cache[6] = 7
+        self.assertEqual([2, 3, 4, 5, 6], sorted(cache.keys()))
+
 
 class TestLRUSizeCache(tests.TestCase):
 
@@ -312,3 +324,11 @@
         cache.cleanup()
         # Only the most recent fits after cleaning up
         self.assertEqual(7, cache._value_size)
+
+    def test_keys(self):
+        cache = lru_cache.LRUSizeCache(max_size=10)
+
+        cache[1] = 'a'
+        cache[2] = 'b'
+        cache[3] = 'cdef'
+        self.assertEqual([1, 2, 3], sorted(cache.keys()))

=== added file 'doc/developers/btree_index_prefetch.txt'
--- a/doc/developers/btree_index_prefetch.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/btree_index_prefetch.txt	2008-10-14 21:35:27 +0000
@@ -0,0 +1,294 @@
+====================
+BTree Index Prefetch
+====================
+
+This document outlines how we decide to pre-read extra nodes in the btree
+index.
+
+
+Rationale
+=========
+
+Because of the latency involved in making a request, it is often better to
+make a few large requests rather than many small ones, even if some of the
+extra data will be wasted.
+
+Example
+-------
+
+Using my connection as an example: I have a maximum bandwidth of 160kB/s and
+a latency of 100-400ms to London; I'll use 200ms for this example. With this
+connection, in 200ms you can download 32kB. So if you make 10 requests for
+4kB of data, you spend 10*.2s = 2s sending the requests, and 4*10/160 = .25s
+actually downloading the data.  If, instead, you made 3 requests for 32kB of
+data each, you would take 3*.2s = .6s for requests, and 32*3/160 = .6s for
+downloading the data. So you save 2.25 - 1.2 = 1.05s even though you downloaded
+32*3-4*10 = 56kB of data that you probably don't need.  On the other hand, if
+you made 1 request for 480kB, you would take .2s for the request, and
+480/160=3s for the data. So you end up taking 3.2s, because of the wasted
+440kB.
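+
+To sanity-check that arithmetic, here is a minimal helper (the bandwidth and
+latency figures are just the example values above, not measured constants)::
+
+    def total_time(num_requests, kb_per_request,
+                   latency=0.2, bandwidth_kbps=160.0):
+        """Seconds spent on round trips plus actual transfer time."""
+        return (num_requests * latency
+                + num_requests * kb_per_request / bandwidth_kbps)
+
+    total_time(10, 4)   # => 2.25s for ten 4kB requests
+    total_time(3, 32)   # => 1.2s for three 32kB requests
+    total_time(1, 480)  # => 3.2s for one 480kB request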
+
+
+BTree Structure
+===============
+
+This is meant to give a basic feeling for how the btree index is laid out on
+disk, not a rigorous discussion. For that, look elsewhere[ref?].
+
+The basic structure is that we have pages of 4kB. Each page is either a leaf,
+which holds the final information we are interested in, or an internal node,
+which contains a list of references to the next layer of nodes. The layers are
+structured such that all nodes for the top layer come first, then the nodes for
+the next layer, linearly in the file.
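+
+Because the layout is linear, the starting page offset of each layer can be
+derived directly from the per-row page counts; a minimal sketch (mirroring
+``_compute_row_offsets`` in ``bzrlib/btree_index.py``)::
+
+    def row_offsets(row_lengths):
+        """Return the starting page offset of each row, plus the total."""
+        offsets = []
+        total = 0
+        for length in row_lengths:
+            offsets.append(total)
+            total += length
+        offsets.append(total)
+        return offsets
+
+    row_offsets([1, 9, 990])  # => [0, 1, 10, 1000]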
+
+
+Example 1 layer
+---------------
+
+In the simplest example, all the data fits into a single page, the root node.
+This means the root node is a leaf node.
+
+
+Example 2 layer
+---------------
+
+As soon as the data cannot fit in a single node, we create a new internal node,
+make that the root, and start to create multiple leaf nodes. The root node then
+contains the keys which divide the leaf pages. (So if leaf node 1 ends with
+'foo' and leaf node 2 starts with 'foz', the root node would hold the key 'foz'
+at position 0).
+
+
+Example 3 layer
+---------------
+
+It is possible for enough leaf nodes to be created that we cannot fit all
+their references in a single node. In this case, we again split, creating
+another layer, and setting that as the root. This layer then references the
+intermediate layer, which references the final leaf nodes.
+
+In all cases, the root node is a single page wide. The next layer can have 2-N
+nodes.
+
+
+Current Info
+------------
+
+Empirically, we've found that the number of references that can be stored on a
+page varies from about 60 to about 180, depending on how much we compress, and
+how similar the keys are. Internal nodes also achieve approximately the same
+compression, though they seem to be closer to 80-100 and not as variable. For
+most of this discussion, we will assume each page holds 100 entries, as that
+makes the math nice and clean.
+
+So the idea is that if you have <100 keys, they will probably all fit on the
+root page. With 100 - 10,000 keys you will have a 2-layer structure, with
+10,000 - 1,000,000 keys a 3-layer structure, and with 10^6 - 10^8 keys a
+4-layer structure, etc.
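+
+A minimal sketch of that layer estimate, assuming the idealized 100 entries
+per page (real fan-out varies with compression)::
+
+    def estimated_layers(key_count, fan_out=100):
+        """Roughly how many btree layers a given key count needs."""
+        layers = 1
+        capacity = fan_out  # keys a root-only (1-layer) index can hold
+        while key_count > capacity:
+            layers += 1
+            capacity *= fan_out
+        return layers
+
+    estimated_layers(99)       # => 1 (fits in the root page)
+    estimated_layers(10000)    # => 2
+    estimated_layers(1000000)  # => 3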
+
+
+Data and Request
+================
+
+It is important to be aware of what sort of data requests will be made on these
+indexes, so that we know how to optimize them. This is still a work in
+progress, but generally we are searching through ancestry. The final
+information (in the leaf nodes) is stored in sorted order. Revision ids are
+generally of the form "prefix:committer@email-timestamp-randomtail".
+This means that revisions made by the same person around the same time will be
+clustered, but revisions made by different people at the same time will not be
+clustered.
+For files, the keys are ``(file-id, revision-id)`` tuples, and file-ids are
+generally ``basename-timestamp-random-count`` (depending on the converter).
+This means that all revisions for a given file-id will be grouped together, and
+that files with similar names will be grouped together. However, files
+committed in the same revisions will not be grouped together in the index.[1]_
+
+.. [1] One interesting possibility would be to change file-ids from being
+   'basename-...', to being 'containing-dirname-filename-...', which would
+   group files in the similarly named directories together.
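+
+As a small illustration of the clustering (using revision ids from this very
+merge), a lexicographic sort groups one committer's revisions together::
+
+    revision_ids = [
+        'pqm@pqm.ubuntu.com-20081028202057-u3csau9zvf0hapya',
+        'john@arbash-meinel.com-20081004141013-yskxjlwtuy2k18ue',
+        'john@arbash-meinel.com-20081028194342-7rrdfadt13mg82jy',
+    ]
+    # sorted() places John's two revisions next to each other; revisions
+    # made by other committers in the same period sort elsewhere
+    sorted(revision_ids)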
+
+
+In general, we always start with a request for the root node of the index, as
+it tells us the final structure of the rest of the index: how many total
+pages there are, which pages are internal nodes (and at what layer), and
+which ones are leaves. Before this point, we do know the *size* of the index,
+because that is stored in the ``pack-names`` file.
+
+
+Thoughts on expansion
+=====================
+
+This is just a bullet list of things to consider when expanding a request.
+
+* We generally assume locality of reference. So if we are currently reading
+  page 10, we are more likely to read page 9 or 11 than we are page 20.
+
+* However, locality of reference only really holds within a layer. If we are
+  reading the last node in a layer, we are unlikely to read the first node of
+  the next layer. In fact, we are most likely to read the *last* node of the
+  next layer.
+
+  More directly, we are probably equally likely to read any of the nodes in
+  the next layer which could be referred to by this layer. So if we have a
+  structure of 1 root node, 100 intermediate nodes, and 10,000 leaf nodes,
+  they will have offsets 0, 1-100, and 101-10,100 respectively.
+
+  If we read the root node, we are likely to want any of nodes 1-100, because
+  we don't know where the key points. If we are reading intermediate node 90,
+  we are likely to want a leaf somewhere around 9,000-9,100. (A sketch of the
+  layer-boundary lookup appears after this list.)
+
+* When expanding a request, we probably want to read on the order of 10 extra
+  pages (64kB / 4kB = 16 pages). It is unlikely that we want to expand a
+  request by 100 pages.
+
+* At the moment, we assume that we have no idea where in the next layer the
+  keys might fall. We *could* use a predictive algorithm assuming a
+  homogeneous distribution. When reading the root node, we could assume an even
+  distribution from 'a-z', so that a key starting with 'a' would tend to fall
+  in the first few pages of the next layer, while a key starting with 'z' would
+  fall at the end of the next layer.
+  However, this is quite likely to fail in many ways. Specific examples:
+
+    * Converters tend to use an identical prefix. So all revisions will start
+      with 'xxx:', leading us to think that the keys fall in the last half,
+      when in reality they fall evenly distributed.
+
+    * When looking in text indexes: in the short term, changes tend to be
+      clustered around a small set of files. Short term changes are unlikely to
+      cross many pages, but it is unclear what happens in the mid-term.
+      Obviously in the long term, changes have happened to all files.
+
+  A possibility would be to use this after reading the root node, and then
+  use an algorithm that compares the keys before and after a given record to
+  estimate the distribution and thus the next pages.
+
+  This is a lot of work for a potentially small benefit, though.
+
+* When checking for N keys, we do sequential lookups in each layer. So we look
+  at layer 1 for all N keys, then in layer 2 for all N keys, etc. So our
+  requests will be clustered by layer.
+
+* For projects with large history, we are probably more likely to end up with
+  a bi-modal distribution of pack files: 1 pack file with a large index, then
+  several pack files with small indexes and several with tiny indexes, but no
+  pack files with medium-sized indexes.
+  This is because a command like ``bzr pack`` will combine everything into a
+  single large file. Commands like ``bzr commit`` will create an index with a
+  single new record, though these will be packed together by autopack.
+  Commands like ``bzr push`` and ``bzr pull`` will create indexes with more
+  records, but these are unlikely to be a significant portion of the history.
+  Consider that bzr has 20,000 revisions; a single push/pull is likely to
+  contain only 100-200 revisions, or about 1% of the history.
+
+  Note that there will always be cases where things are evenly distributed, but
+  we probably shouldn't *optimize* for that case.
+
+* 64kB is 16 pages. 16 pages is approximately 1,600 keys.
+
+* We are considering an index with 1 million keys to be very large. 10M is
+  probably possible, and maybe 100M, but something like 1 billion keys is
+  unlikely. So a 3-layer index is fairly common (it exists already in bzr), but
+  a 4-layer is going to be quite rare, and we will probably never see a
+  5-layer.
+
+* There are times when the second layer is going to be incompletely filled out.
+  Consider an index with 101 keys. We found that we couldn't fit everything
+  into a single page, so we expanded the btree into a root page and a leaf
+  page, and started a new leaf page. However, the root node only has a single
+  entry. There are 3 pages, but only one of them is "full".
+  This happens again when we get near the 10,000 key barrier. We found we
+  couldn't reference all the leaf pages from a single root page, so we split
+  again, creating a higher layer and one more sub-layer. So we have 1 root
+  node, 2 layer-2 nodes, and N leaf nodes (layer 3). If we read the first 3
+  nodes, we will have read all internal nodes.
+
+  It is certainly possible to detect this for the first-split case (when things
+  no longer fit into just the root node), as there will only be a few nodes
+  total. Is it possible to detect this from only the 'size' information for the
+  second-split case (when the index no longer fits in a single page, but still
+  fits in only a small handful of pages)?
+
+  This only really works for the root + layer 2. Layers 3+ will always be too
+  big to read all at once. However, until we've read the root, we don't
+  know the layout, so all we have to go on is the size of the index, though
+  that also gives us the explicit total number of pages.
+  So it doesn't help to read the root page and then decide. However, on the
+  flip side, if we read *before* the split, then we don't gain much, as we are
+  reading pages we aren't likely to be interested in.
+
+  For example:
+
+    We have 10,000 keys, which fit onto 100 leaf pages referenced by a single
+    root node. At 10,100 keys, we would need 101 leaf pages, which would then
+    cause us to need 2 internal pages, triggering an extra layer. However,
+    this is very sensitive to the number of keys we fit per-page, which
+    depends on the compression. We could instead consider 20,000 keys, which
+    would be 200 leaf nodes, 2 intermediate nodes, and a single root node. It
+    is unlikely that we would ever be able to fit 200 references into a
+    single root node.
+
+  So pretend that we split at 1 page, 100 pages, and 10,000 pages. We might
+  then say: at 1-5 pages, read all pages; at 5-100 pages, read only the root.
+  At 100-500 pages, read 1-5 pages; at 500-10,000, read only the root. At
+  10,000-50,000, read 1-5 pages again, but above 50,000 read only the root.
+  We could bias this a bit smaller, say at powers of 80 instead of powers of
+  100, etc. The basic idea is that if we are *close* to a layer split, go
+  ahead and read a small number of extra pages.
+
+* The previous discussion applies whenever we have an upper layer that is not
+  completely full. So the pages referenced by the last node from the upper
+  layer will often not have a full 100-way fan out. Handling this specially
+  is probably not worthwhile very often, though.
+
+* Sometimes we will be making a very small request for a very small number of
+  keys, and we don't really want to bloat such tiny requests. Hopefully we can
+  find a decent heuristic to determine when we will want extra nodes later,
+  versus when we expect to find all we want right now.
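+
+The layer-boundary lookup mentioned above is small enough to sketch here; this
+mirrors ``_find_layer_first_and_end`` in ``bzrlib/btree_index.py``, recast as
+a standalone function for illustration::
+
+    def find_layer_first_and_end(row_offsets, offset):
+        """Return (first, end) for the layer containing offset.
+
+        first is the first page of the layer, end is the first page of the
+        next layer.
+        """
+        first = end = 0
+        for boundary in row_offsets:
+            first = end
+            end = boundary
+            if offset < boundary:
+                break
+        return first, end
+
+    # With row offsets [0, 1, 10, 1000] (1 root, 9 internal, 990 leaves):
+    find_layer_first_and_end([0, 1, 10, 1000], 0)   # => (0, 1), root layer
+    find_layer_first_and_end([0, 1, 10, 1000], 5)   # => (1, 10)
+    find_layer_first_and_end([0, 1, 10, 1000], 99)  # => (10, 1000)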
+
+
+Algorithm
+=========
+
+This is the basic outline of the algorithm; a condensed code sketch follows
+the numbered steps.
+
+1. If we don't know the size of the index, don't expand as we don't know what
+   is available. (This only really applies to the pack-names file, which is
+   unlikely to ever become larger than 1 page anyway.)
+
+2. If a request is already wide enough to be greater than the number of
+   recommended pages, don't bother trying to expand. This only really happens
+   with LocalTransport, which recommends a single page.
+
+3. Determine what pages have already been read (if any). If the pages left to
+   read can fit in a single request, just request them. This tends to happen on
+   medium sized indexes (ones with low hundreds of revisions), and near the end
+   when we've read most of the whole index already.
+
+4. If we haven't read the root node yet, and we can't fit the whole index into
+   a single request, only read the root node. We don't know where the layer
+   boundaries are anyway.
+
+5. If we haven't read "tree depth" pages yet, and are only requesting a single
+   new page, don't expand. This is meant to handle the 'lookup 1 item in the
+   index' case. In a large pack file, you'll read only a single page at each
+   layer and then be done. When spidering out in a search, this will cause us
+   to take a little bit longer to start expanding, but once we've started we'll
+   be expanding at full velocity. This could be improved by having indexes
+   inform each other that they have already entered the 'search' phase, or by
+   having a hint from above to indicate the same.
+
+   However, remember the 'bi-modal' distribution: most indexes will either be
+   very small or very large. So either we'll read the whole thing quickly, or
+   we'll end up spending a lot of time in the index anyway, which makes a
+   small number of extra round trips to large indexes a minor overhead. For
+   2-layer indexes, this only 'wastes' one round trip.
+
+6. Now we are ready to expand the requests. Expand by looking for more pages
+   next to the ones requested that fit within the current layer. If you run
+   into a cached page, or a layer boundary, search further only in the opposite
+   direction. This gives us proper locality of reference, and also helps
+   because when a search goes in a single direction, we will continue to
+   prefetch pages in that direction.
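+
+A condensed, illustrative sketch of those steps (the real implementation is
+``BTreeGraphIndex._expand_offsets`` in ``bzrlib/btree_index.py``; the helper
+names below match it)::
+
+    def expand_offsets(index, offsets):
+        # 1. unknown size: nothing safe to prefetch
+        if index._size is None:
+            return offsets
+        # 2. the request is already wide enough
+        if len(offsets) >= index._recommended_pages:
+            return offsets
+        total = index._compute_total_pages_in_index()
+        cached = index._get_offsets_to_cached_pages()
+        # 3. everything left fits in a single request: read it all
+        if total - len(cached) <= index._recommended_pages:
+            return [x for x in xrange(total) if x not in cached]
+        # 4. haven't read the root yet: read only what was asked for
+        if index._root_node is None:
+            return offsets
+        # 5. first probe down the tree: don't expand a single-page request
+        if len(cached) < len(index._row_lengths) and len(offsets) == 1:
+            return offsets
+        # 6. expand to neighbors within the current layer
+        return sorted(index._expand_to_neighbors(offsets, cached, total))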
+
+..
+   vim: ft=rst tw=79 ai



