Rev 5375: Do some timing with different objects. in http://bazaar.launchpad.net/~jameinel/bzr/2.3-btree-chk-leaf

John Arbash Meinel john at arbash-meinel.com
Wed Aug 4 03:00:54 BST 2010


At http://bazaar.launchpad.net/~jameinel/bzr/2.3-btree-chk-leaf

------------------------------------------------------------
revno: 5375
revision-id: john at arbash-meinel.com-20100804020038-teb38durhgj6psxj
parent: john at arbash-meinel.com-20100804015214-5dbdub9p0qyc07dp
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-btree-chk-leaf
timestamp: Tue 2010-08-03 21:00:38 -0500
message:
  Do some timing with different objects.
  
  It turns out that pure dict is 13.7us lookup time, dict-as-attr is 20.7us,
  PyObject __getitem__ to dict-as-attr is 54us.
-------------- next part --------------
=== modified file 'bzrlib/_btree_serializer_pyx.pyx'
--- a/bzrlib/_btree_serializer_pyx.pyx	2010-08-04 01:52:14 +0000
+++ b/bzrlib/_btree_serializer_pyx.pyx	2010-08-04 02:00:38 +0000
@@ -567,6 +567,15 @@
         hi = self.num_entries
 
         # Bisect until we find the right spot
+        # Note the time to create Python objects dominates over comparison
+        # times. With a simple iterate and compare (no bisect) we spend 45us
+        # for 120 __contains__, but 201us for 120 __getitem__.
+        # The _LeafNode.__getitem__ takes 54.6us, but (k in o._keys) is 20.7us
+        # removing the getattr() w/ (k in _keys) 13.7ms. So we do still have
+        # some room for improvement
+        # TODO: Consider improving this, but for now it seems good enough. (We
+        #       could create a small index so we would have less to bisect,
+        #       etc.)
         while lo < hi:
             mid = (lo + hi) / 2
             the_cmp = memcmp(sha1, self.entries[mid].sha1, 20)



More information about the bazaar-commits mailing list