Rev 5538: peek() seems to shave about 5% (5s reduction, over ~100s total time). in http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem

John Arbash Meinel john at arbash-meinel.com
Wed Dec 1 22:50:11 GMT 2010


At http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem

------------------------------------------------------------
revno: 5538
revision-id: john at arbash-meinel.com-20101201225008-h8065d63rq03shdr
parent: john at arbash-meinel.com-20101201222700-mpty01tewes4s1lu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-gcb-peak-mem
timestamp: Wed 2010-12-01 16:50:08 -0600
message:
  peek() seems to shave about 5% (5s reduction, over ~100s total time).
  
  I don't know that we need it as a separate 'inline' function, but it is definitely
  useful to peek() before doing the full test.
  
  With peek, we get about 217M peek skips, 166M no-actual-matches, out of 392M
  matches that aren't good enough, and 420M total matches.
-------------- next part --------------
=== modified file 'bzrlib/_delta_index_pyx.pyx'
--- a/bzrlib/_delta_index_pyx.pyx	2010-12-01 22:27:00 +0000
+++ b/bzrlib/_delta_index_pyx.pyx	2010-12-01 22:50:08 +0000
@@ -157,20 +157,6 @@
         self._table_mask = rabin_hash_map.table_mask
 
     @cython.profile(False)
-    cdef int peek(self, unsigned int rabin_val): # noexcept
-        """Check and see if there might be anything at a given rabin_val.
-
-        The idea is that most lookups fail to find anything, and we want it to
-        be as efficient as possible to get to that state. If this returns 0,
-        then there is definitely nothing for that rabin_val. If it returns 1,
-        then there might be something, might not. Concretely, we return 0 if
-        there is null in the first slot, 1 otherwise.
-        """
-        cdef rabin_entry *slot
-        slot = self._table + (rabin_val & self._table_mask)
-        return (slot.ptr != NULL)
-
-    @cython.profile(False)
     cdef start(self, unsigned int rabin_val):
         self.rabin_val = rabin_val
         self.entry = NULL
@@ -235,6 +221,18 @@
                            % (self.rabin_val, self._step,
                               self._next_hash))
 
+ at cython.profile(False)
+cdef inline int _peek(RabinEntrySearch self, unsigned int rabin_val): # noexcept
+    """Check and see if there might be anything at a given rabin_val.
+
+    The idea is that most lookups fail to find anything, and we want it to
+    be as efficient as possible to get to that state. If this returns 0,
+    then there is definitely nothing for that rabin_val. If it returns 1,
+    then there might be something, might not. Concretely, we return 0 if
+    there is null in the first slot, 1 otherwise.
+    """
+    return self._table[rabin_val & self._table_mask].ptr != NULL
+
 
 cdef public unsigned int MIN_TABLE_SIZE
 MIN_TABLE_SIZE = 1024
@@ -1168,7 +1166,7 @@
                 # one
                 # Pull in the next data character
                 RABIN_UPDATE(cur_rabin_val, data[-RABIN_WINDOW], data[0])
-                if search.peek(cur_rabin_val):
+                if _peek(search, cur_rabin_val):
                     count = self._find_better_match(data, data_end, cur_rabin_val,
                                                     search)
                     global matches_checked



More information about the bazaar-commits mailing list