Rev 3614: Cherry-pick Robert's index buffering. in http://bzr.arbash-meinel.com/branches/bzr/jam-integration

Thu Aug 14 15:58:48 BST 2008

At http://bzr.arbash-meinel.com/branches/bzr/jam-integration

------------------------------------------------------------
revno: 3614
revision-id: john at arbash-meinel.com-20080814145830-hsj8dytdcqze3mvg
parent: pqm at pqm.ubuntu.com-20080814055033-myv5ibn1lb50yf0t
author: Robert Collins <robertc at robertcollins.net>
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: jam-integration
timestamp: Thu 2008-08-14 09:58:30 -0500
message:
  Cherry-pick Robert's index buffering.
-------------- next part --------------
=== modified file 'NEWS'

--- a/NEWS	2008-08-13 17:35:34 +0000
+++ b/NEWS	2008-08-14 14:58:30 +0000
@@ -4,6 +4,20 @@
 
 .. contents::
 
+IN DEVELOPMENT
+--------------
+
+  BUG FIXES: 
+
+    * ``GraphIndex`` objects will internally read an entire index if more
+      than 1/20th of their keyspace is requested in a single operation.
+      This largely mitigates a performance regression in ``bzr log FILE``
+      and completely corrects the performance regression in ``bzr log``.
+      The regression was caused by removing an accomodation which had been
+      supporting the index format in use. A newer index format is in
+      development which is substantially faster. (Robert Collins)
+
+
 bzr 1.6rc2 2008-08-13
 ---------------------
 

=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py	2008-07-17 07:33:12 +0000
+++ b/bzrlib/index.py	2008-08-14 14:58:30 +0000
@@ -468,14 +468,20 @@
             keys supplied. No additional keys will be returned, and every
             key supplied that is in the index will be returned.
         """
-        # PERFORMANCE TODO: parse and bisect all remaining data at some
-        # threshold of total-index processing/get calling layers that expect to
-        # read the entire index to use the iter_all_entries  method instead.
         keys = set(keys)
         if not keys:
             return []
         if self._size is None and self._nodes is None:
             self._buffer_all()
+        # We fit about 20 keys per minimum-read (4K), so if we are looking for
+        # more than 1/20th of the index its likely (assuming homogenous key
+        # spread) that we'll read the entire index. If we're going to do that,
+        # buffer the whole thing. A better analysis might take key spread into
+        # account - but B+Tree indices are better anyway.
+        # We could look at all data read, and use a threshold there, which will
+        # trigger on ancestry walks, but that is not yet fully mapped out.
+        if self._nodes is None and len(keys) * 20 > self.key_count():
+            self._buffer_all()
         if self._nodes is not None:
             return self._iter_entries_from_total_buffer(keys)
         else: