Rev 4802: Some small tweaks. in http://bazaar.launchpad.net/~jameinel/bzr/chk-index
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 28 20:41:52 GMT 2009
At http://bazaar.launchpad.net/~jameinel/bzr/chk-index
------------------------------------------------------------
revno: 4802
revision-id: john at arbash-meinel.com-20091028204142-pcqz133s5ed16c9e
parent: john at arbash-meinel.com-20091028195503-lzy0faa8op3z85yy
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk-index
timestamp: Wed 2009-10-28 15:41:42 -0500
message:
Some small tweaks.
I just did some perf testing by instrumenting the key requests made while
streaming the content out of a bzr branch and a launchpad branch.
I then created a single CHKIndex with the same 'group' layout,
performed all the same requests against it, and compared
that time to the time to issue the same requests against the
largest btree .cix (note that this wasn't *all* entries).
For bzr, CHKIndex took 0.4s versus 0.979s for the btree index.
For launchpad, CHKIndex took 1.7s versus 9.9s for the btree index. \o/
And all of this without any custom C parsers, etc.
Note that this was also just timing 'iter_entries()', and
not any of the other parsing, etc. But the CHK code already
has all the values in integers rather than strings.
As for size on disk (btree => CHKIndex):
For bzr, it is 3.7MB => 3.0MB
For launchpad, it is 16MB => 12MB
So this code is both smaller and significantly faster.
Though admittedly, the time for .cix parsing is probably not
a primary factor in performance. (time to stream the content is
probably much higher.)
As for size in memory (btree => CHKIndex):
For bzr, it is 21MB => 11.2MB.
For launchpad, it is 82MB => 42MB.
There is probably a bit of 'are the tuples interned' effect here
that we should be watching out for.
-------------- next part --------------
=== modified file 'bzrlib/chk_index.py'
--- a/bzrlib/chk_index.py 2009-10-28 19:55:03 +0000
+++ b/bzrlib/chk_index.py 2009-10-28 20:41:42 +0000
@@ -692,11 +692,11 @@
         return self._header.num_entries
 
     def _bit_key_to_key(self, bit_key):
-        # TODO: intern()?
         # TODO: Handle when/if we support less-than-full-width keys
         #       In which case we should probably return a 'psha1:' for 'partial
         #       sha1' or something along those lines
-        return static_tuple.StaticTuple('sha1:' + binascii.b2a_hex(bit_key))
+        hex_bits = binascii.b2a_hex(bit_key)
+        return static_tuple.StaticTuple('sha1:' + hex_bits).intern()
 
     def _key_to_bit_key(self, key):
         # TODO: Handle when/if we support less-than-full-width keys and 'sha1:'
@@ -713,11 +713,12 @@
         return [to_key(bit_key) for bit_key in self._entries]
 
     def iter_all_entries(self):
+        """Return all key, value pairs in this index."""
         self._ensure_header()
         self._read_all()
         b_to_k = self._bit_key_to_key
         for bit_key, value in self._entries.iteritems():
-            yield self, b_to_k(bit_key), value
+            yield (self, b_to_k(bit_key), value)
 
     def iter_entries(self, keys):
         """See GraphIndex.iter_entries()"""