Rev 4735: CHKInventory using __slots__ has a huge impact in my testing script. in http://bazaar.launchpad.net/~jameinel/bzr/2.1-chk-memory

John Arbash Meinel john at arbash-meinel.com
Thu Oct 8 22:26:50 BST 2009


At http://bazaar.launchpad.net/~jameinel/bzr/2.1-chk-memory

------------------------------------------------------------
revno: 4735
revision-id: john at arbash-meinel.com-20091008212628-q5oh7rdg7ikvy7jo
parent: john at arbash-meinel.com-20091008194850-nigahumk4tj2uhy8
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.1-chk-memory
timestamp: Thu 2009-10-08 16:26:28 -0500
message:
  CHKInventory using __slots__ has a huge impact in my testing script.
  
  Specifically, we have >= 6 attributes on CHKInventory, which causes the
  self.__dict__ to expand to 524 bytes resident.
  So for 25k items at 524 bytes each, that is over 10MB. Switching it to use
  __slots__ changes the overhead to 8*4=32 bytes, saving 492 bytes per object.
  
  This may not translate into real-world savings, as we may not hold many
  CHKInventories in memory at once. I think the log code has a cap of 200
  Revision trees at once, which would save only about 100k.
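
The per-object arithmetic in the message can be reproduced with a rough sketch like the one below. PlainInventory, SlottedInventory, shallow_size and slot_saving are hypothetical names used only for illustration; they are not part of bzrlib. The 524-byte and 4-byte figures are the 32-bit numbers quoted in the message, and sys.getsizeof will report different values on other Python builds.

```python
import sys

# Attribute names copied from the CHKInventory __slots__ in the patch.
ATTRS = ('_fileid_to_entry_cache', '_path_to_fileid_cache',
         '_search_key_name', 'root_id',
         'id_to_entry', 'parent_id_basename_to_file_id')


class PlainInventory(object):
    """Hypothetical stand-in: attributes live in a per-instance __dict__."""

    def __init__(self):
        for name in ATTRS:
            setattr(self, name, None)


class SlottedInventory(object):
    """Hypothetical stand-in: the same attributes stored in fixed slots."""

    __slots__ = ATTRS

    def __init__(self):
        for name in ATTRS:
            setattr(self, name, None)


def shallow_size(obj):
    """Size of the instance plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size


def slot_saving(dict_bytes, n_slots, pointer_bytes, n_objects):
    """Bytes saved when n_objects each swap a dict for n_slots pointers."""
    return (dict_bytes - n_slots * pointer_bytes) * n_objects


if __name__ == '__main__':
    # Measured sizes depend on the interpreter version and word size.
    print('plain:   %d bytes' % shallow_size(PlainInventory()))
    print('slotted: %d bytes' % shallow_size(SlottedInventory()))
    # Figures from the commit message: 524-byte dict, 8 slots of 4 bytes.
    print('saving per object: %d' % slot_saving(524, 8, 4, 1))
    print('saving for 200 trees: %d' % slot_saving(524, 8, 4, 200))
```

This only measures the shallow cost of the instance and its attribute dict, not the objects the attributes point to, which is the part __slots__ actually changes.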
-------------- next part --------------
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py	2009-10-08 19:48:50 +0000
+++ b/bzrlib/inventory.py	2009-10-08 21:26:28 +0000
@@ -732,6 +732,8 @@
     inserted, other than through the Inventory API.
     """
 
+    __slots__ = ('root', 'revision_id')
+
     def __contains__(self, file_id):
         """True if this entry contains a file with given id.
 
@@ -1492,6 +1494,18 @@
     want to reuse.
     """
 
+    # An attribute dict that holds between 6 and 22 entries (inclusive) costs
+    # 524 bytes of memory (32-bit). Using slots for 6 entries costs 24 bytes of
+    # memory, and 88 bytes of memory for 22 entries.
+    # For <6 entries, it costs 140 bytes, but 5 slots == 20 bytes.
+    # Switching CHKInventory to using __slots__ saves 10MB when loading all
+    # bzr.dev's chk inventories, and 30MB when loading all of launchpad.
+    # I don't know the specific effect in real-world operations, because we may
+    # never grab all CHKInventory objects at once.
+    __slots__ = ('_fileid_to_entry_cache', '_path_to_fileid_cache',
+                 '_search_key_name', 'root_id',
+                 'id_to_entry', 'parent_id_basename_to_file_id')
+
     def __init__(self, search_key_name):
         CommonInventory.__init__(self)
         # Note: if just loading all CHKInventory objects, these two empty
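
A note on why the first hunk is needed: __slots__ on a subclass only removes the per-instance __dict__ if every class in its ancestry also declares __slots__. The first hunk appears to touch the base class (presumably CommonInventory, since CHKInventory.__init__ calls CommonInventory.__init__; that is an inference from the diff, not something it states). A minimal sketch of the behaviour, using hypothetical class names that are not in bzrlib:

```python
class BaseWithoutSlots(object):
    """Hypothetical base that does not declare __slots__."""


class BaseWithSlots(object):
    """Hypothetical base declaring the two slots added in the first hunk."""

    __slots__ = ('root', 'revision_id')


class ChildOfPlainBase(BaseWithoutSlots):
    # The base class still provides a __dict__, so nothing is saved here.
    __slots__ = ('root_id',)


class ChildOfSlottedBase(BaseWithSlots):
    # Every class in the hierarchy uses slots, so instances have no __dict__.
    __slots__ = ('root_id',)


def accepts_new_attribute(obj):
    """Return True if an undeclared attribute can be set on obj."""
    try:
        obj.not_declared_anywhere = 1
    except AttributeError:
        return False
    return True


if __name__ == '__main__':
    print(hasattr(ChildOfPlainBase(), '__dict__'))
    print(hasattr(ChildOfSlottedBase(), '__dict__'))
    print(accepts_new_attribute(ChildOfSlottedBase()))
```

A side effect worth keeping in mind is that a fully slotted class rejects attributes that are not listed, so any code that sets ad hoc attributes on these objects would need the name added to __slots__.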


