Rev 2926: (robertc) Add a last-lookup cache to DirState reducing the cost of serial access to paths. (Robert Collins) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Mon Oct 22 21:45:30 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2926
revision-id: pqm at pqm.ubuntu.com-20071022204528-m4i3ievs46d19324
parent: pqm at pqm.ubuntu.com-20071022183547-ylcflvri4vahgtmg
parent: robertc at robertcollins.net-20071022200512-ev362dzjjoj9v7r1
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2007-10-22 21:45:28 +0100
message:
  (robertc) Add a last-lookup cache to DirState reducing the cost of serial access to paths. (Robert Collins)
modified:
  bzrlib/dirstate.py             dirstate.py-20060728012006-d6mvoihjb3je9peu-1
    ------------------------------------------------------------
    revno: 2921.1.4
    merged: robertc at robertcollins.net-20071022200512-ev362dzjjoj9v7r1
    parent: robertc at robertcollins.net-20071022195941-06a7tdtkd8muueg9
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: dirstate.cache
    timestamp: Tue 2007-10-23 06:05:12 +1000
    message:
      Documentation to aid our failing memory in the future.
    ------------------------------------------------------------
    revno: 2921.1.3
    merged: robertc at robertcollins.net-20071022195941-06a7tdtkd8muueg9
    parent: robertc at robertcollins.net-20071022193257-4sa1111h0d0b01zg
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: dirstate.cache
    timestamp: Tue 2007-10-23 05:59:41 +1000
    message:
      Review feedback.
    ------------------------------------------------------------
    revno: 2921.1.2
    merged: robertc at robertcollins.net-20071022193257-4sa1111h0d0b01zg
    parent: robertc at robertcollins.net-20071022043300-c8sdvyizlgoy3i97
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: dirstate.cache
    timestamp: Tue 2007-10-23 05:32:57 +1000
    message:
      Reset the dirstate cache pointer for the entry on lookup of a new block, suggested by John.
    ------------------------------------------------------------
    revno: 2921.1.1
    merged: robertc at robertcollins.net-20071022043300-c8sdvyizlgoy3i97
    parent: pqm at pqm.ubuntu.com-20071019201226-6z006xotgfe7zmu8
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: dirstate.cache
    timestamp: Mon 2007-10-22 14:33:00 +1000
    message:
      Add a cache for dirstate entry lookups.
=== modified file 'bzrlib/dirstate.py'
--- a/bzrlib/dirstate.py	2007-10-10 00:04:46 +0000
+++ b/bzrlib/dirstate.py	2007-10-22 20:05:12 +0000
@@ -345,6 +345,12 @@
             self._sha1_file = self._sha1_file_and_mutter
         else:
             self._sha1_file = osutils.sha_file_by_name
+        # These two attributes provide a simple cache for lookups into the
+        # dirstate in-memory vectors. By probing respectively for the last
+        # block, and for the next entry, we save nearly 2 bisections per path
+        # during commit.
+        self._last_block_index = None
+        self._last_entry_index = None
 
     def __repr__(self):
         return "%s(%r)" % \
@@ -1042,6 +1048,12 @@
         """
         if key[0:2] == ('', ''):
             return 0, True
+        try:
+            if (self._last_block_index is not None and
+                self._dirblocks[self._last_block_index][0] == key[0]):
+                return self._last_block_index, True
+        except IndexError:
+            pass
         block_index = bisect_dirblock(self._dirblocks, key[0], 1,
                                       cache=self._split_path_cache)
         # _right returns one-past-where-key is so we have to subtract
@@ -1052,6 +1064,9 @@
         # simple and correct:
         present = (block_index < len(self._dirblocks) and
             self._dirblocks[block_index][0] == key[0])
+        self._last_block_index = block_index
+        # Reset the entry index cache to the beginning of the block.
+        self._last_entry_index = -1
         return block_index, present
 
     def _find_entry_index(self, key, block):
@@ -1059,9 +1074,24 @@
 
         :return: The entry index, True if the entry for the key is present.
         """
+        len_block = len(block)
+        try:
+            if self._last_entry_index is not None:
+                # mini-bisect here.
+                entry_index = self._last_entry_index + 1
+                # A hit is when the key is after the last slot, and before or
+                # equal to the next slot.
+                if ((entry_index > 0 and block[entry_index - 1][0] < key) and
+                    key <= block[entry_index][0]):
+                    self._last_entry_index = entry_index
+                    present = (block[entry_index][0] == key)
+                    return entry_index, present
+        except IndexError:
+            pass
         entry_index = bisect.bisect_left(block, (key, []))
-        present = (entry_index < len(block) and
+        present = (entry_index < len_block and
             block[entry_index][0] == key)
+        self._last_entry_index = entry_index
         return entry_index, present
 
     @staticmethod
@@ -1353,7 +1383,7 @@
             return block_index, 0, False, False
         block = self._dirblocks[block_index][1] # access the entries only
         entry_index, present = self._find_entry_index(key, block)
-        # linear search through present entries at this path to find the one
+        # linear search through entries at this path to find the one
         # requested.
         while entry_index < len(block) and block[entry_index][0][1] == basename:
             if block[entry_index][1][tree_index][0] not in \
@@ -1380,7 +1410,8 @@
         """
         self._read_dirblocks_if_needed()
         if path_utf8 is not None:
-            assert path_utf8.__class__ == str, 'path_utf8 is not a str: %s %s' % (type(path_utf8), path_utf8)
+            assert path_utf8.__class__ == str, ('path_utf8 is not a str: %s %s'
+                % (type(path_utf8), path_utf8))
             # path lookups are faster
             dirname, basename = osutils.split(path_utf8)
             block_index, entry_index, dir_present, file_present = \




More information about the bazaar-commits mailing list