Rev 3614: Cherry-pick Robert's index buffering. in http://bzr.arbash-meinel.com/branches/bzr/jam-integration
John Arbash Meinel
john at arbash-meinel.com
Thu Aug 14 15:58:48 BST 2008
At http://bzr.arbash-meinel.com/branches/bzr/jam-integration
------------------------------------------------------------
revno: 3614
revision-id: john at arbash-meinel.com-20080814145830-hsj8dytdcqze3mvg
parent: pqm at pqm.ubuntu.com-20080814055033-myv5ibn1lb50yf0t
author: Robert Collins <robertc at robertcollins.net>
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: jam-integration
timestamp: Thu 2008-08-14 09:58:30 -0500
message:
Cherry-pick Robert's index buffering.
-------------- next part --------------
=== modified file 'NEWS'
--- a/NEWS 2008-08-13 17:35:34 +0000
+++ b/NEWS 2008-08-14 14:58:30 +0000
@@ -4,6 +4,20 @@
.. contents::
+IN DEVELOPMENT
+--------------
+
+ BUG FIXES:
+
+ * ``GraphIndex`` objects will internally read an entire index if more
+ than 1/20th of their keyspace is requested in a single operation.
+ This largely mitigates a performance regression in ``bzr log FILE``
+ and completely corrects the performance regression in ``bzr log``.
+ The regression was caused by removing an accomodation which had been
+ supporting the index format in use. A newer index format is in
+ development which is substantially faster. (Robert Collins)
+
+
bzr 1.6rc2 2008-08-13
---------------------
=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py 2008-07-17 07:33:12 +0000
+++ b/bzrlib/index.py 2008-08-14 14:58:30 +0000
@@ -468,14 +468,20 @@
keys supplied. No additional keys will be returned, and every
key supplied that is in the index will be returned.
"""
- # PERFORMANCE TODO: parse and bisect all remaining data at some
- # threshold of total-index processing/get calling layers that expect to
- # read the entire index to use the iter_all_entries method instead.
keys = set(keys)
if not keys:
return []
if self._size is None and self._nodes is None:
self._buffer_all()
+ # We fit about 20 keys per minimum-read (4K), so if we are looking for
+ # more than 1/20th of the index its likely (assuming homogenous key
+ # spread) that we'll read the entire index. If we're going to do that,
+ # buffer the whole thing. A better analysis might take key spread into
+ # account - but B+Tree indices are better anyway.
+ # We could look at all data read, and use a threshold there, which will
+ # trigger on ancestry walks, but that is not yet fully mapped out.
+ if self._nodes is None and len(keys) * 20 > self.key_count():
+ self._buffer_all()
if self._nodes is not None:
return self._iter_entries_from_total_buffer(keys)
else:
More information about the bazaar-commits
mailing list