Rev 3621: Cause a full index read when a single index query is larger than 1/20th the index capacity. in http://people.ubuntu.com/~robertc/baz2.0/index.trigger
Robert Collins
robertc at robertcollins.net
Thu Aug 14 05:21:58 BST 2008
At http://people.ubuntu.com/~robertc/baz2.0/index.trigger
------------------------------------------------------------
revno: 3621
revision-id: robertc at robertcollins.net-20080814042150-n13mos3tt90xw2rp
parent: pqm at pqm.ubuntu.com-20080812201855-9qxbdo0t2h9byzhj
committer: Robert Collins <robertc at robertcollins.net>
branch nick: index.trigger
timestamp: Thu 2008-08-14 14:21:50 +1000
message:
Cause a full index read when a single index query is larger than 1/20th the index capacity.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
=== modified file 'NEWS'
--- a/NEWS 2008-08-11 03:39:25 +0000
+++ b/NEWS 2008-08-14 04:21:50 +0000
@@ -38,6 +38,14 @@
INTERNALS:
+ * ``GraphIndex`` objects will internally read an entire index if more
+ than 1/20th of their keyspace is requested in a single operation.
+ This largely mitigates a performance regression in ``bzr log FILE``
+ and completely corrects the performance regression in ``bzr log``.
+ The regression was caused by removing an accomodation which had been
+ supporting the index format in use. A newer index format is in
+ development which is substantially faster. (Robert Collins)
+
bzr 1.6rc1 2008-08-06
---------------------
=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py 2008-07-17 07:33:12 +0000
+++ b/bzrlib/index.py 2008-08-14 04:21:50 +0000
@@ -468,14 +468,20 @@
keys supplied. No additional keys will be returned, and every
key supplied that is in the index will be returned.
"""
- # PERFORMANCE TODO: parse and bisect all remaining data at some
- # threshold of total-index processing/get calling layers that expect to
- # read the entire index to use the iter_all_entries method instead.
keys = set(keys)
if not keys:
return []
if self._size is None and self._nodes is None:
self._buffer_all()
+ # We fit about 20 keys per minimum-read (4K), so if we are looking for
+ # more than 1/20th of the index its likely (assuming homogenous key
+ # spread) that we'll read the entire index. If we're going to do that,
+ # buffer the whole thing. A better analysis might take key spread into
+ # account - but B+Tree indices are better anyway.
+ # We could look at all data read, and use a threshold there, which will
+ # trigger on ancestry walks, but that is not yet fully mapped out.
+ if self._nodes is None and len(keys) * 20 > self.key_count():
+ self._buffer_all()
if self._nodes is not None:
return self._iter_entries_from_total_buffer(keys)
else:
More information about the bazaar-commits
mailing list