Rev 5538: peek() seems to shave about 5% (5s reduction, over ~100s total time). in http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem
John Arbash Meinel
john at arbash-meinel.com
Wed Dec 1 22:50:11 GMT 2010
At http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem
------------------------------------------------------------
revno: 5538
revision-id: john at arbash-meinel.com-20101201225008-h8065d63rq03shdr
parent: john at arbash-meinel.com-20101201222700-mpty01tewes4s1lu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-gcb-peak-mem
timestamp: Wed 2010-12-01 16:50:08 -0600
message:
peek() seems to shave about 5% (5s reduction, over ~100s total time).
I don't know that we need it as a separate 'inline' function, but it is definitely
useful to peek() before doing the full test.
With peek, we get about 217M peek skips, 166M no-actual-matches, out of 392M
matches that aren't good enough, and 420M total matches.
-------------- next part --------------
=== modified file 'bzrlib/_delta_index_pyx.pyx'
--- a/bzrlib/_delta_index_pyx.pyx 2010-12-01 22:27:00 +0000
+++ b/bzrlib/_delta_index_pyx.pyx 2010-12-01 22:50:08 +0000
@@ -157,20 +157,6 @@
self._table_mask = rabin_hash_map.table_mask
@cython.profile(False)
- cdef int peek(self, unsigned int rabin_val): # noexcept
- """Check and see if there might be anything at a given rabin_val.
-
- The idea is that most lookups fail to find anything, and we want it to
- be as efficient as possible to get to that state. If this returns 0,
- then there is definitely nothing for that rabin_val. If it returns 1,
- then there might be something, might not. Concretely, we return 0 if
- there is null in the first slot, 1 otherwise.
- """
- cdef rabin_entry *slot
- slot = self._table + (rabin_val & self._table_mask)
- return (slot.ptr != NULL)
-
- @cython.profile(False)
cdef start(self, unsigned int rabin_val):
self.rabin_val = rabin_val
self.entry = NULL
@@ -235,6 +221,18 @@
% (self.rabin_val, self._step,
self._next_hash))
+ at cython.profile(False)
+cdef inline int _peek(RabinEntrySearch self, unsigned int rabin_val): # noexcept
+ """Check and see if there might be anything at a given rabin_val.
+
+ The idea is that most lookups fail to find anything, and we want it to
+ be as efficient as possible to get to that state. If this returns 0,
+ then there is definitely nothing for that rabin_val. If it returns 1,
+ then there might be something, might not. Concretely, we return 0 if
+ there is null in the first slot, 1 otherwise.
+ """
+ return self._table[rabin_val & self._table_mask].ptr != NULL
+
cdef public unsigned int MIN_TABLE_SIZE
MIN_TABLE_SIZE = 1024
@@ -1168,7 +1166,7 @@
# one
# Pull in the next data character
RABIN_UPDATE(cur_rabin_val, data[-RABIN_WINDOW], data[0])
- if search.peek(cur_rabin_val):
+ if _peek(search, cur_rabin_val):
count = self._find_better_match(data, data_end, cur_rabin_val,
search)
global matches_checked
More information about the bazaar-commits
mailing list