Rev 3632: (robertc) Trigger index._buffer_all when a significant portion of the in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Aug 14 19:25:17 BST 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3632
revision-id: pqm at pqm.ubuntu.com-20080814182509-9f2dz3kcb1lv5lku
parent: pqm at pqm.ubuntu.com-20080814175522-mho538328p19v77a
parent: john at arbash-meinel.com-20080814170306-ucffga2qhv6kwgju
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2008-08-14 19:25:09 +0100
message:
  (robertc) Trigger index._buffer_all when a significant portion of the
  	index is being queried.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/index.py                index.py-20070712131115-lolkarso50vjr64s-1
    ------------------------------------------------------------
    revno: 3620.3.2
    revision-id: john at arbash-meinel.com-20080814170306-ucffga2qhv6kwgju
    parent: robertc at robertcollins.net-20080814042150-n13mos3tt90xw2rp
    parent: pqm at pqm.ubuntu.com-20080814074117-x0zvzzv7y6mok8pz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: jam-integration
    timestamp: Thu 2008-08-14 12:03:06 -0500
    message:
      Merge in bzr.dev, resolve NEWS
    added:
      doc/en/developer-guide/testing.txt testing.txt-20080812140359-i70zzh6v2z7grqex-1
    modified:
      Makefile                       Makefile-20050805140406-d96e3498bb61c5bb
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/diff.py                 diff.py-20050309040759-26944fbbf2ebbf36
      bzrlib/help_topics/en/rules.txt rules.txt-20080516063844-ghr5l6pvvrhiycun-1
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/smart/medium.py         medium.py-20061103051856-rgu2huy59fkz902q-1
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/http_utils.py     HTTPTestUtil.py-20050914180604-247d3aafb7a43343
      bzrlib/tests/intertree_implementations/test_compare.py test_compare.py-20060724101752-09ysswo1a92uqyoz-2
      bzrlib/tests/test_http.py      testhttp.py-20051018020158-b2eef6e867c514d9
      bzrlib/tests/test_transform.py test_transaction.py-20060105172520-b3ffb3946550e6c4
      bzrlib/tests/tree_implementations/test_iter_search_rules.py test_iter_search_rul-20080528065532-1ml1ttb12az20cxf-1
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/tree.py                 tree.py-20050309040759-9d5f2496be663e77
      bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
      doc/developers/HACKING.txt     HACKING-20050805200004-2a5dc975d870f78c
      doc/en/user-guide/configuring_bazaar.txt configuring_bazaar.t-20071128000722-ncxiua259xwbdbg7-1
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
      tools/win32/bzr.iss.cog        bzr.iss.cog-20060622100836-b3yup582rt3y0nvm-5
    ------------------------------------------------------------
    revno: 3620.3.1
    revision-id: robertc at robertcollins.net-20080814042150-n13mos3tt90xw2rp
    parent: pqm at pqm.ubuntu.com-20080812201855-9qxbdo0t2h9byzhj
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: index.trigger
    timestamp: Thu 2008-08-14 14:21:50 +1000
    message:
      Cause a full index read when a single index query is larger than 1/20th the index capacity.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/index.py                index.py-20070712131115-lolkarso50vjr64s-1
=== modified file 'NEWS'
--- a/NEWS	2008-08-14 06:46:29 +0000
+++ b/NEWS	2008-08-14 17:03:06 +0000
@@ -38,6 +38,14 @@
       is unknown in both source and target.
       (Robert Collins, Aaron Bentley)
 
+    * ``GraphIndex`` objects will internally read an entire index if more
+      than 1/20th of their keyspace is requested in a single operation.
+      This largely mitigates a performance regression in ``bzr log FILE``
+      and completely corrects the performance regression in ``bzr log``.
+      The regression was caused by removing an accomodation which had been
+      supporting the index format in use. A newer index format is in
+      development which is substantially faster. (Robert Collins)
+
 
 bzr 1.6rc2 2008-08-13
 ---------------------

=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py	2008-07-17 07:33:12 +0000
+++ b/bzrlib/index.py	2008-08-14 04:21:50 +0000
@@ -468,14 +468,20 @@
             keys supplied. No additional keys will be returned, and every
             key supplied that is in the index will be returned.
         """
-        # PERFORMANCE TODO: parse and bisect all remaining data at some
-        # threshold of total-index processing/get calling layers that expect to
-        # read the entire index to use the iter_all_entries  method instead.
         keys = set(keys)
         if not keys:
             return []
         if self._size is None and self._nodes is None:
             self._buffer_all()
+        # We fit about 20 keys per minimum-read (4K), so if we are looking for
+        # more than 1/20th of the index its likely (assuming homogenous key
+        # spread) that we'll read the entire index. If we're going to do that,
+        # buffer the whole thing. A better analysis might take key spread into
+        # account - but B+Tree indices are better anyway.
+        # We could look at all data read, and use a threshold there, which will
+        # trigger on ancestry walks, but that is not yet fully mapped out.
+        if self._nodes is None and len(keys) * 20 > self.key_count():
+            self._buffer_all()
         if self._nodes is not None:
             return self._iter_entries_from_total_buffer(keys)
         else:




More information about the bazaar-commits mailing list