Rev 3621: Cause a full index read when a single index query is larger than 1/20th the index capacity. in http://people.ubuntu.com/~robertc/baz2.0/index.trigger

Robert Collins robertc at robertcollins.net
Thu Aug 14 05:21:58 BST 2008


At http://people.ubuntu.com/~robertc/baz2.0/index.trigger

------------------------------------------------------------
revno: 3621
revision-id: robertc at robertcollins.net-20080814042150-n13mos3tt90xw2rp
parent: pqm at pqm.ubuntu.com-20080812201855-9qxbdo0t2h9byzhj
committer: Robert Collins <robertc at robertcollins.net>
branch nick: index.trigger
timestamp: Thu 2008-08-14 14:21:50 +1000
message:
  Cause a full index read when a single index query is larger than 1/20th the index capacity.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/index.py                index.py-20070712131115-lolkarso50vjr64s-1
=== modified file 'NEWS'
--- a/NEWS	2008-08-11 03:39:25 +0000
+++ b/NEWS	2008-08-14 04:21:50 +0000
@@ -38,6 +38,14 @@
 
   INTERNALS:
 
+    * ``GraphIndex`` objects will internally read an entire index if more
+      than 1/20th of their keyspace is requested in a single operation.
+      This largely mitigates a performance regression in ``bzr log FILE``
+      and completely corrects the performance regression in ``bzr log``.
+      The regression was caused by removing an accomodation which had been
+      supporting the index format in use. A newer index format is in
+      development which is substantially faster. (Robert Collins)
+
 
 bzr 1.6rc1 2008-08-06
 ---------------------

=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py	2008-07-17 07:33:12 +0000
+++ b/bzrlib/index.py	2008-08-14 04:21:50 +0000
@@ -468,14 +468,20 @@
             keys supplied. No additional keys will be returned, and every
             key supplied that is in the index will be returned.
         """
-        # PERFORMANCE TODO: parse and bisect all remaining data at some
-        # threshold of total-index processing/get calling layers that expect to
-        # read the entire index to use the iter_all_entries  method instead.
         keys = set(keys)
         if not keys:
             return []
         if self._size is None and self._nodes is None:
             self._buffer_all()
+        # We fit about 20 keys per minimum-read (4K), so if we are looking for
+        # more than 1/20th of the index its likely (assuming homogenous key
+        # spread) that we'll read the entire index. If we're going to do that,
+        # buffer the whole thing. A better analysis might take key spread into
+        # account - but B+Tree indices are better anyway.
+        # We could look at all data read, and use a threshold there, which will
+        # trigger on ancestry walks, but that is not yet fully mapped out.
+        if self._nodes is None and len(keys) * 20 > self.key_count():
+            self._buffer_all()
         if self._nodes is not None:
             return self._iter_entries_from_total_buffer(keys)
         else:




More information about the bazaar-commits mailing list