Rev 4671: Add some comments, etc. discussing the 'is this block full enough' check in http://bazaar.launchpad.net/~jameinel/bzr/2.1b1-pack-on-the-fly

John Arbash Meinel john at arbash-meinel.com
Wed Sep 2 20:24:50 BST 2009


At http://bazaar.launchpad.net/~jameinel/bzr/2.1b1-pack-on-the-fly

------------------------------------------------------------
revno: 4671
revision-id: john at arbash-meinel.com-20090902192442-xacg1ky4pz9mryvd
parent: john at arbash-meinel.com-20090901215814-5x0804myuqf42j87
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.1b1-pack-on-the-fly
timestamp: Wed 2009-09-02 14:24:42 -0500
message:
  Add some comments, etc. discussing the 'is this block full enough' check.
  Change the threshold to 75% utilized (3MB, or 1.5MB mixed) rather than 100% (4MB/2MB).
-------------- next part --------------
=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py	2009-09-01 21:58:14 +0000
+++ b/bzrlib/groupcompress.py	2009-09-02 19:24:42 +0000
@@ -596,25 +596,35 @@
         #       'start_new_block' logic. It would probably be better to factor
         #       out that logic into a shared location, so that it stays
         #       together better
-        if self._block._content_length >= 4*1024*1024:
-            # This only violates the 'large content grows to 2x single content
-            # size' rule. However most of that is probably caught by the
-            # 'len(self._factories) == 1' check.
+        # We currently assume a block is properly utilized whenever it is >75%
+        # of the size of a 'full' block. In normal operation, a block is
+        # considered full when it hits 4MB of same-file content. So any block
+        # >3MB is 'full enough'.
+        # The only time this isn't true is when a given block has large-object
+        # content. (a single file >4MB, etc.)
+        # Under these circumstances, we allow a block to grow to
+        # 2 x largest_content.  Which means that if a given block had a large
+        # object, it may actually be under-utilized. However, given that this
+        # is 'pack-on-the-fly' it is probably reasonable to not repack large
+        # content blobs on-the-fly.
+        if self._block._content_length >= 3*1024*1024:
             return True
-        # TODO: We can get the raw content's real size from the stored data. We
-        #       have to zlib.decompress it, but we don't have to apply the deltas.
+        # If a block is <3MB, it still may be considered 'full' if it contains
+        # mixed content. The current rule is 2MB of mixed content is considered
+        # full. So check to see if this block contains mixed content, and
+        # set the threshold appropriately.
         common_prefix = None
         for factory in self._factories:
             prefix = factory.key[:-1]
             if common_prefix is None:
                 common_prefix = prefix
             elif prefix != common_prefix:
-                # No common prefix
-                common_prefix = None
+                # Mixed content, check the size appropriately
+                if self._block._content_length >= 2*768*1024: #1.5MB
+                    return True
                 break
-        if common_prefix is None and self._block._content_length >= 2*1024*1024:
-            # Mixed content blocks are capped at 2MB
-            return True
+        # The content failed both the mixed check and the single-content check
+        # so obviously it is not fully utilized
         return False
 
     def _check_rebuild_block(self):
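
For readers who want to see the new rule outside the diff, here is a minimal standalone sketch of the check as it reads after this revision. It is not the bzrlib code: the function name is_well_utilized, the constant names, and the plain-tuple keys are hypothetical and used only for illustration. The one thing taken from the patch is that key[:-1] is treated as the per-file prefix, and that the thresholds are 75% of the 4MB same-file and 2MB mixed-content 'full' sizes.

```python
# Hypothetical constants: the 'full' sizes named in the commit message.
FULL_SAME_FILE = 4 * 1024 * 1024   # 4MB of same-file content
FULL_MIXED = 2 * 1024 * 1024       # 2MB of mixed content

# 75% of each 'full' size, using integer arithmetic.
SAME_FILE_THRESHOLD = FULL_SAME_FILE * 3 // 4   # 3MB
MIXED_THRESHOLD = FULL_MIXED * 3 // 4           # 1.5MB


def is_well_utilized(content_length, keys):
    """Return True if a block of content_length bytes is 'full enough'.

    keys is a sequence of tuples; key[:-1] is treated as the prefix that
    identifies which file a text belongs to (an assumption borrowed from
    the factory.key[:-1] expression in the patch).
    """
    # Any block over 3MB is considered full enough, mixed or not.
    if content_length >= SAME_FILE_THRESHOLD:
        return True
    # A smaller block can still be full enough if it holds mixed content.
    common_prefix = None
    for key in keys:
        prefix = key[:-1]
        if common_prefix is None:
            common_prefix = prefix
        elif prefix != common_prefix:
            # Mixed content: apply the lower 1.5MB threshold.
            return content_length >= MIXED_THRESHOLD
    # Single-prefix content under 3MB is not fully utilized.
    return False


if __name__ == "__main__":
    same_file = [("file-a", "rev-1"), ("file-a", "rev-2")]
    mixed = [("file-a", "rev-1"), ("file-b", "rev-1")]
    print(is_well_utilized(2 * 1024 * 1024, same_file))
    print(is_well_utilized(2 * 1024 * 1024, mixed))
```

Like the patch, the sketch does not handle the large-object case (a block allowed to grow to 2 x largest_content); such a block may be reported as under-utilized.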


