Rev 104: Do some ugly hacks to keep memory low during 'compute_referrers'. in http://bazaar.launchpad.net/~meliae-dev/meliae/trunk

John Arbash Meinel john at arbash-meinel.com
Thu Oct 22 23:02:50 BST 2009


At http://bazaar.launchpad.net/~meliae-dev/meliae/trunk

------------------------------------------------------------
revno: 104
revision-id: john at arbash-meinel.com-20091022220241-x902omy964q6pk22
parent: john at arbash-meinel.com-20091022214742-8w54cvqz9r1vvte5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: trunk
timestamp: Thu 2009-10-22 17:02:41 -0500
message:
  Do some ugly hacks to keep memory low during 'compute_referrers'.
  
  1) If there is only one referrer, keep the 'list' as a simple integer.
  2) If there are fewer than 10 referrers, use a tuple, creating a new
     one each time a referrer is added.
  3) If there are 10 or more referrers, use a list as normal.
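A minimal Python sketch of the three-tier scheme above. It is written for Python 3 (the diff itself is Python 2, e.g. `itervalues()`), and the names `add_referrer` and `TUPLE_LIMIT` are hypothetical. The branches mirror the loop body added to `compute_referrers` in the diff below. As that code is written, a tuple is only converted once it already holds 10 entries, so tuples carry 2 to 10 referrers and the list tier starts at 11.

```python
# Sketch (Python 3) of the int -> tuple -> list referrer storage.
# `add_referrer` and `TUPLE_LIMIT` are hypothetical names; the branches
# mirror the loop body added to compute_referrers in the diff.

TUPLE_LIMIT = 10  # same cut-off as `len(refs) >= 10` in the diff


def add_referrer(referrers, ref, address):
    """Record that `address` refers to `ref`, picking the cheapest container."""
    refs = referrers.get(ref, None)
    t = type(refs)
    if refs is None:
        # First referrer: store the address itself, no container at all.
        refs = address
    elif t is int:
        # Second referrer: promote the bare int to a 2-tuple.
        refs = (refs, address)
    elif t is tuple:
        if len(refs) >= TUPLE_LIMIT:
            # Tuple already holds TUPLE_LIMIT entries: switch to a list.
            refs = list(refs)
            refs.append(address)
        else:
            # Tuples are immutable, so build a new, one-longer tuple.
            refs = refs + (address,)
    elif t is list:
        # Large case: append in place like an ordinary list.
        refs.append(address)
    else:
        raise TypeError('unknown refs type: %s' % (t,))
    referrers[ref] = refs


referrers = {}
add_referrer(referrers, 1, 500)        # one referrer
for addr in (600, 601, 602):           # three referrers
    add_referrer(referrers, 2, addr)
print(referrers)  # {1: 500, 2: (600, 601, 602)}
```

One plausible reading of the cut-off: rebuilding a tuple copies all of its entries, so each append gets more expensive as the tuple grows. Capping the tuple tier at a small size keeps that copying bounded, while still avoiding the spare slots that CPython lists over-allocate when they grow by appending.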
  
  This should decrease memory somewhat when dealing with really big
  datasets. Quite a few objects will have only one referrer, and for those
  this drops memory consumption down to a dict entry / pointer (because the
  address is already known to be a unique PyInt that we already have).
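The "unique PyInt" remark relies on the interning trick already in the diff: `id_cache.setdefault` is bound to the local name `unique_address`, and calling it with `(address, address)` returns the first int object stored for that value. A small sketch of the effect, assuming CPython, where `int()` on a string builds a fresh object for large values. The address value is made up for illustration.

```python
# Sketch of address interning via dict.setdefault (mirrors `unique_address`
# in the diff). The address value below is made up for illustration.
id_cache = {}
unique_address = id_cache.setdefault  # bound method, looked up once

# Two equal ints built at runtime; in CPython these are separate objects.
a = int('140737353912320')
b = int('140737353912320')

ua = unique_address(a, a)  # first sighting: stores a, returns a
ub = unique_address(b, b)  # equal key already present: returns a, not b

print(ua is ub)        # True: both names now share one int object
print(len(id_cache))   # 1
```

Because every stored referrer is then a pointer to an int that already exists, keeping a lone referrer as that bare int costs only the dict value slot, whereas `[address]` would allocate a new list object for it.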
-------------- next part --------------
=== modified file 'meliae/loader.py'
--- a/meliae/loader.py	2009-10-22 21:29:53 +0000
+++ b/meliae/loader.py	2009-10-22 22:02:41 +0000
@@ -202,18 +202,45 @@
         """For each object, figure out who is referencing it."""
         referrers = {} # From address => [referred from]
         id_cache = {}
+        unique_address = id_cache.setdefault 
         total = len(self.objs)
         for idx, obj in enumerate(self.objs.itervalues()):
             if self.show_progress and idx & 0x1ff == 0:
                 sys.stderr.write('compute referrers %8d / %8d        \r'
                                  % (idx, total))
             address = obj.address
-            address = id_cache.setdefault(address, address)
+            address = unique_address(address, address)
             for ref in obj.ref_list:
-                ref = id_cache.setdefault(ref, ref)
-                referrers.setdefault(ref, []).append(address)
+                ref = unique_address(ref, ref)
+                refs = referrers.get(ref, None)
+                t = type(refs)
+                if refs is None:
+                    refs = address
+                elif t is int:
+                    refs = (refs, address)
+                elif t is tuple:
+                    if len(refs) >= 10:
+                        refs = list(refs)
+                        refs.append(address)
+                    else:
+                        refs = refs + (address,)
+                elif t is list:
+                    refs.append(address)
+                else:
+                    raise TypeError('unknown refs type: %s\n'
+                                    % (t,))
+                referrers[ref] = refs
+        del id_cache
         for obj in self.objs.itervalues():
-            obj.referrers = referrers.get(obj.address, ())
+            try:
+                refs = referrers.pop(obj.address)
+            except KeyError:
+                obj.referrers = ()
+            else:
+                if type(refs) is int:
+                    obj.referrers = (refs,)
+                else:
+                    obj.referrers = refs
         if self.show_progress:
             sys.stderr.write('compute referrers %8d / %8d        \n'
                              % (idx, total))
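
The second loop in the diff turns the compact values back into sequences, after `del id_cache` has released the interning dict. Below is a sketch of that step as a standalone function. `finalize_referrers` and `FakeObj` are hypothetical stand-ins for the loader's method and its object records. `pop()` hands each value to its object and removes the dict's own entry as it goes, and a bare int is wrapped in a 1-tuple so that every object ends up with a sequence.

```python
# Sketch of the finalize loop from the diff. `FakeObj` and
# `finalize_referrers` are hypothetical stand-ins for meliae's objects
# and the second half of compute_referrers.


class FakeObj(object):
    """Minimal record with the two attributes the loop touches."""

    def __init__(self, address):
        self.address = address
        self.referrers = None


def finalize_referrers(objs, referrers):
    """Give each object a referrer sequence, draining `referrers` as we go."""
    for obj in objs:
        try:
            # pop() both fetches the value and removes the dict entry.
            refs = referrers.pop(obj.address)
        except KeyError:
            obj.referrers = ()           # nothing refers to this object
        else:
            if type(refs) is int:
                obj.referrers = (refs,)  # lone referrer was kept as a bare int
            else:
                obj.referrers = refs     # tuple or list is used as-is


objs = [FakeObj(1), FakeObj(2), FakeObj(3)]
pending = {1: 500, 2: (600, 601)}   # object 3 has no referrers
finalize_referrers(objs, pending)
print([o.referrers for o in objs])  # [(500,), (600, 601), ()]
print(pending)                      # {}
```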


