Rev 5529: Remove the _limit_hash_buckets code. in http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem

John Arbash Meinel john at arbash-meinel.com
Mon Nov 22 20:34:48 GMT 2010


At http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem

------------------------------------------------------------
revno: 5529
revision-id: john at arbash-meinel.com-20101122203416-6q45s5j57ik4i1hn
parent: john at arbash-meinel.com-20101122201859-0vxlror7x9gf1r8b
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-gcb-peak-mem
timestamp: Mon 2010-11-22 14:34:16 -0600
message:
  Remove the _limit_hash_buckets code.
  
  Interestingly, the new code is slower, but packs ever-so-slightly better
  than the old code: 3m3s vs 4m34s wall time (old vs new, about 50% slower),
  with peak memory of 196MB vs 190MB, and 31206860 vs 31205560 bytes on
  disk. Slightly lower memory, slightly better packing, but a lot slower.
  
  This still needs some tweaking, but it is interesting to see where it
  stands nonetheless.
-------------- next part --------------
=== modified file 'bzrlib/_delta_index_pyx.pyx'
--- a/bzrlib/_delta_index_pyx.pyx	2010-11-22 20:18:59 +0000
+++ b/bzrlib/_delta_index_pyx.pyx	2010-11-22 20:34:16 +0000
@@ -679,39 +679,6 @@
         if ptr != end:
             raise ValueError('did not reach the end of stream')
 
-
-    def _limit_hash_buckets(self):
-        """Avoid pathological behavior by limiting the entries in buckets."""
-        # TODO: when adding items, we can get a feel for how many items are in
-        #       each 'bucket', and determine which ones need to be updated.
-        #       Especially since we don't store the data into explicit buckets
-        #       anymore.
-        pass
-        # cdef RabinBucket bucket
-        # cdef rabin_offset entry
-        # cdef rabin_offset keep
-        # cdef unsigned int extra
-
-        # for bucket in self.buckets:
-        #     if bucket is None or bucket.count < self.max_bucket_size:
-        #         continue
-        #     # This bucket is over-sized, so lets remove a few entries
-        #     extra = bucket.count - self.max_bucket_size
-        #     entry = bucket.first
-        #     # For now, we just mirror the existing code
-        #     acc = 0
-        #     while entry:
-        #         acc += extra
-        #         if acc > 0:
-        #             keep = entry
-        #             while acc > 0:
-        #                 entry = entry.next
-        #                 acc -= self.max_bucket_size
-        #             keep.next = entry.next
-        #         entry = entry.next
-        #     self.num_entries -= extra
-        #     bucket.count -= extra
-
     def add_source(self, source, extra_offset=0):
         """Add a source of more data to delta against.
 

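For context, the deleted body was already commented out (only a pass
statement remained), and per its own comment it mirrored the existing
bucket-limiting code (the accumulator loop also found in git's diff-delta.c
limit_hash_buckets()): when a bucket exceeds max_bucket_size, walk its chain
and unlink entries so that roughly max_bucket_size evenly spaced survivors
remain, guarding the match search against pathological inputs that pile most
entries into one bucket. A minimal Python sketch of that even-sampling idea
(limit_bucket and the plain-list bucket are illustrative assumptions, not the
bzrlib data structures):

    def limit_bucket(entries, max_size):
        """Evenly sample a hash bucket down to at most max_size entries."""
        count = len(entries)
        if count <= max_size:
            return list(entries)
        # Keep every (count / max_size)-th entry so the survivors stay
        # spread across the whole source rather than clustering at the
        # front of the bucket.
        stride = count / float(max_size)
        return [entries[int(i * stride)] for i in range(max_size)]

    # For example, 100 colliding entries are sampled down to 8 survivors:
    assert len(limit_bucket(list(range(100)), 8)) == 8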

