Rev 4802: Some small tweaks. in http://bazaar.launchpad.net/~jameinel/bzr/chk-index

Wed Oct 28 20:41:52 GMT 2009

At http://bazaar.launchpad.net/~jameinel/bzr/chk-index

------------------------------------------------------------
revno: 4802
revision-id: john at arbash-meinel.com-20091028204142-pcqz133s5ed16c9e
parent: john at arbash-meinel.com-20091028195503-lzy0faa8op3z85yy
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk-index
timestamp: Wed 2009-10-28 15:41:42 -0500
message:
  Some small tweaks.
  
  I did some perf testing by instrumenting the key requests made while
  streaming the content out of a bzr branch and a launchpad branch.
  I then built a single CHKIndex with the same 'group' layout,
  replayed all of the same requests against it, and compared that
  time to the time needed to issue the same requests against the
  largest btree .cix (note that this index did not hold *all* entries).
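  A minimal sketch of that replay approach, written in modern Python 3
  rather than taken from bzrlib: time_replay() and DictIndex are
  hypothetical names, and the only assumption about an index is that it
  has a GraphIndex-style iter_entries(keys) generator. Each generator is
  drained so the lazy lookups are actually timed, and the hit count lets
  you confirm that both indexes answered the same workload.

```python
import time


def time_replay(index, recorded_requests):
    """Replay recorded key requests against `index`.

    Assumes `index.iter_entries(keys)` yields one entry per key found,
    in the style of bzrlib's GraphIndex. Returns (elapsed_seconds, hits).
    """
    hits = 0
    start = time.perf_counter()
    for keys in recorded_requests:
        # Drain the generator; otherwise no lookup work would happen.
        for _entry in index.iter_entries(keys):
            hits += 1
    return time.perf_counter() - start, hits


class DictIndex(object):
    """Hypothetical in-memory stand-in for a real index."""

    def __init__(self, entries):
        self._entries = entries

    def iter_entries(self, keys):
        # Yield (index, key, value) for each requested key that exists.
        for key in keys:
            if key in self._entries:
                yield (self, key, self._entries[key])
```

  Running one recorded request list through each index and comparing the
  elapsed values gives the shape of the comparison above; matching hit
  counts guard against accidentally timing two different workloads.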
  
  For bzr, CHKIndex took 0.4s versus 0.979s for the btree .cix.
  For launchpad, CHKIndex took 1.7s versus 9.9s. \o/
  
  And all of this without any custom C parsers, etc.
  Note that this only timed 'iter_entries()', not any of the later
  parsing of the values. The CHK code already holds all the values as
  integers rather than strings, so the .cix numbers above do not yet
  include the cost of converting its string values.
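  For illustration only: assuming a .cix entry's value is a
  space-separated 'start length' string (a hypothetical format, not
  checked against the real btree index code), the difference looks like
  this. The btree caller still has to split and int() each value after
  iter_entries() returns, while the CHK-style value is already a tuple of
  integers.

```python
def parse_btree_value(value):
    """Parse a hypothetical 'start length' string value into two ints."""
    start_text, length_text = value.split(' ')
    return int(start_text), int(length_text)


# Hypothetical entries for the same key in each index style.
btree_entries = {('sha1:ab12',): '1024 512'}  # string value, needs parsing
chk_entries = {('sha1:ab12',): (1024, 512)}   # already integers
```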
  
  As for size on disk:
  For bzr, it is 3.7MB => 3.0MB
  For launchpad, it is 16MB => 12MB
  
  So this code is both smaller and significantly faster.
  Admittedly, though, the time spent parsing the .cix is probably not
  a primary factor in overall performance (the time to stream the
  content is probably much higher).
  
  As for size in memory:
  For bzr, it is 21MB => 11.2MB.
  For launchpad, it is 82MB => 42MB.
  Some of this difference is probably due to whether the key tuples
  are interned, which we should watch out for.
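  To make the interning point concrete, here is a hedged Python 3 sketch
  of the key conversion that this commit's diff touches. It uses a plain
  tuple plus a hypothetical module-level _interned dict in place of
  StaticTuple.intern(); the function names mirror the bzrlib methods but
  this is not that code. Converting the same 20-byte digest twice returns
  the identical tuple object, so many references share one key in memory.

```python
import binascii

# Hypothetical intern table standing in for StaticTuple.intern().
_interned = {}


def intern_key(key):
    """Return the canonical shared instance of the tuple `key`."""
    # setdefault stores `key` on first sight, else returns the stored one.
    return _interned.setdefault(key, key)


def bit_key_to_key(bit_key):
    """Convert a 20-byte binary sha1 into an interned ('sha1:<hex>',) key."""
    hex_bits = binascii.hexlify(bit_key).decode('ascii')
    return intern_key(('sha1:' + hex_bits,))


def key_to_bit_key(key):
    """Inverse of bit_key_to_key: strip 'sha1:' and decode the hex."""
    (text,) = key
    if not text.startswith('sha1:'):
        raise ValueError('not a sha1 key: %r' % (key,))
    return binascii.unhexlify(text[len('sha1:'):])
```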
-------------- next part --------------
=== modified file 'bzrlib/chk_index.py'

--- a/bzrlib/chk_index.py	2009-10-28 19:55:03 +0000
+++ b/bzrlib/chk_index.py	2009-10-28 20:41:42 +0000
@@ -692,11 +692,11 @@
         return self._header.num_entries
 
     def _bit_key_to_key(self, bit_key):
-        # TODO: intern()?
         # TODO: Handle when/if we support less-than-full-width keys
         #       In which case we should probably return a 'psha1:' for 'partial
         #       sha1' or something along those lines
-        return static_tuple.StaticTuple('sha1:' + binascii.b2a_hex(bit_key))
+        hex_bits = binascii.b2a_hex(bit_key)
+        return static_tuple.StaticTuple('sha1:' + hex_bits).intern()
 
     def _key_to_bit_key(self, key):
         # TODO: Handle when/if we support less-than-full-width keys and 'sha1:'
@@ -713,11 +713,12 @@
         return [to_key(bit_key) for bit_key in self._entries]
 
     def iter_all_entries(self):
+        """Return all key, value pairs in this index."""
         self._ensure_header()
         self._read_all()
         b_to_k = self._bit_key_to_key
         for bit_key, value in self._entries.iteritems():
-            yield self, b_to_k(bit_key), value
+            yield (self, b_to_k(bit_key), value)
 
     def iter_entries(self, keys):
         """See GraphIndex.iter_entries()"""