Rev 3653: (jam) Updates to the btree indexing code. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Aug 28 04:01:12 BST 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3653
revision-id: pqm at pqm.ubuntu.com-20080828030104-6a87mmhafprj1prs
parent: pqm at pqm.ubuntu.com-20080828023029-r2qwgt7zu9udtla3
parent: john at arbash-meinel.com-20080828021558-5ek9vno64yv7yx96
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2008-08-28 04:01:04 +0100
message:
  (jam) Updates to the btree indexing code.
modified:
  bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
  bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
  bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
  bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
    ------------------------------------------------------------
    revno: 3641.5.19
    revision-id: john at arbash-meinel.com-20080828021558-5ek9vno64yv7yx96
    parent: john at arbash-meinel.com-20080828015958-bvdt8spf2ls57s39
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Wed 2008-08-27 21:15:58 -0500
    message:
      Documentation cleanup pass.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.18
    revision-id: john at arbash-meinel.com-20080828015958-bvdt8spf2ls57s39
    parent: john at arbash-meinel.com-20080826005610-275jq9uqje3prqry
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Wed 2008-08-27 20:59:58 -0500
    message:
      Clean out the global state: it was good for prototyping and
      tuning, but bad for production code. (As recommended by Robert.)
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.17
    revision-id: john at arbash-meinel.com-20080826005610-275jq9uqje3prqry
    parent: john at arbash-meinel.com-20080826004207-0kp25c5a3z3rsd9t
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Mon 2008-08-25 19:56:10 -0500
    message:
      Fix up the test suite now that things don't pack as well.
    modified:
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3641.5.16
    revision-id: john at arbash-meinel.com-20080826004207-0kp25c5a3z3rsd9t
    parent: john at arbash-meinel.com-20080826002330-3ulihyk5qxd6b3dw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Mon 2008-08-25 19:42:07 -0500
    message:
      More benchmarks.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.15
    revision-id: john at arbash-meinel.com-20080826002330-3ulihyk5qxd6b3dw
    parent: john at arbash-meinel.com-20080826001009-16cw7yc88rix9hai
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Mon 2008-08-25 19:23:30 -0500
    message:
      and some new timings for bzr.dev and zsync
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.14
    revision-id: john at arbash-meinel.com-20080826001009-16cw7yc88rix9hai
    parent: john at arbash-meinel.com-20080825200333-s1s2d3fq4oq9igm7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Mon 2008-08-25 19:10:09 -0500
    message:
      Some benchmark updates
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.13
    revision-id: john at arbash-meinel.com-20080825200333-s1s2d3fq4oq9igm7
    parent: john at arbash-meinel.com-20080823174752-qfy5puc1pta4jexo
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Mon 2008-08-25 15:03:33 -0500
    message:
      Include some timings for mysql, which has different
      characteristics, probably because the compression ratio is
      higher (avg 7:1 rather than 5:1 for bzr.dev), which in turn is
      probably because the converter used more compressible ids.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.12
    revision-id: john at arbash-meinel.com-20080823174752-qfy5puc1pta4jexo
    parent: john at arbash-meinel.com-20080823005634-tmjvmlbaf4hmhzwj
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Sat 2008-08-23 12:47:52 -0500
    message:
      Play around with max_repack=0 and limiting work done based on
      the number of Z_SYNC_FLUSH calls. We can shave 1s off 7.5s by doing
      this, though we end up with roughly a 20% increase in storage size
      (for the reasonable values; for Z_SYNC=0 we can get 3x the storage size).
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.11
    revision-id: john at arbash-meinel.com-20080823005634-tmjvmlbaf4hmhzwj
    parent: john at arbash-meinel.com-20080823005056-r2ccmi0zrbuvj0mb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 19:56:34 -0500
    message:
      Some small test cleanup
    modified:
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3641.5.10
    revision-id: john at arbash-meinel.com-20080823005056-r2ccmi0zrbuvj0mb
    parent: john at arbash-meinel.com-20080822211125-p9bmwrug1j4ann6d
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 19:50:56 -0500
    message:
      Only Z_SYNC_FLUSH when we have extra bytes.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.9
    revision-id: john at arbash-meinel.com-20080822211125-p9bmwrug1j4ann6d
    parent: john at arbash-meinel.com-20080822210515-f8qwnjnpqk560gly
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 16:11:25 -0500
    message:
      Shrink the page size so that the three_level and iter_all
      tests don't have to use so many keys to get a three-level index.
    modified:
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3641.5.8
    revision-id: john at arbash-meinel.com-20080822210515-f8qwnjnpqk560gly
    parent: john at arbash-meinel.com-20080822205059-xrra00puh3onekbi
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 16:05:15 -0500
    message:
      Fix up the test suite, now that we pack more efficiently.
      Unfortunately, that means upping the node count for the iter_all and
      three_level tests.
    modified:
      bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 3641.5.7
    revision-id: john at arbash-meinel.com-20080822205059-xrra00puh3onekbi
    parent: john at arbash-meinel.com-20080822205036-7el97kr85o0atqit
    parent: pqm at pqm.ubuntu.com-20080822042630-on3dxyek4ezk0miu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 15:50:59 -0500
    message:
      Merge in bzr.dev to get the pyrex update
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/_btree_serializer_c.pyx _parse_btree_c.pyx-20080703034413-3q25bklkenti3p8p-2
      bzrlib/bzrdir.py               bzrdir.py-20060131065624-156dfea39c4387cb
      bzrlib/tests/branch_implementations/test_permissions.py test_permissions.py-20060210110243-245c01403bf0fde6
    ------------------------------------------------------------
    revno: 3641.5.6
    revision-id: john at arbash-meinel.com-20080822205036-7el97kr85o0atqit
    parent: john at arbash-meinel.com-20080822203551-cnz8r1hpi4wyfamh
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 15:50:36 -0500
    message:
      Include the benchmarks for the new estimator.
      All around, it seems to do a much better job. A bit faster, and a bit smaller.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.5
    revision-id: john at arbash-meinel.com-20080822203551-cnz8r1hpi4wyfamh
    parent: john at arbash-meinel.com-20080822203320-y98xykrjms4r5goj
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 15:35:51 -0500
    message:
      Document my attempt to use copy() as a look-ahead.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.4
    revision-id: john at arbash-meinel.com-20080822203320-y98xykrjms4r5goj
    parent: john at arbash-meinel.com-20080822055444-5kcr0csbbvkqbbiw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 15:33:20 -0500
    message:
      Using a different safety margin for the first repack,
      and using 2 repacks gives us effectively the same result, while
      still making it safe for arbitrary data. (With 1 repack, it does
      affect the results by 3-5%; with 2 repacks the second margin
      gives the same results.)
      Also, we can now get about a 2-3:1 ratio of lines that are 'blindly'
      added versus ones which are added with a SYNC.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 3641.5.3
    revision-id: john at arbash-meinel.com-20080822055444-5kcr0csbbvkqbbiw
    parent: john at arbash-meinel.com-20080822054012-ikrwmq9nm2q4h6q8
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 00:54:44 -0500
    message:
      If we repack earlier, it catches this case.
      Still need to fix the other tests, but at least
      the too_much test passes now.
      Impact on real-world results is measurable
      (2-3% final compression). Is it worth it?
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
    ------------------------------------------------------------
    revno: 3641.5.2
    revision-id: john at arbash-meinel.com-20080822054012-ikrwmq9nm2q4h6q8
    parent: john at arbash-meinel.com-20080822035819-yx19e7qxdvjgaeql
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Fri 2008-08-22 00:40:12 -0500
    message:
      (broken, but hopeful) Change the compact logic.
      Instead of only paying attention to the total bytes read,
      use the fact that we *know* some of that data is already
      compressed well, and only pay attention to the bytes added
      since the last sync. This means we Z_SYNC_FLUSH much less often.
      (After syncing, we end up with more room to add without syncing.)
      This improves both the time to compress and the final compressed
      size. Need to update tests with the new offsets.
      Also, we seem to have found a case where using Z_SYNC_FLUSH
      in the middle of a stream will actually generate *better*
      compression than compressing the whole stream in one pass.
      test_too_much_data_does_not_exceed_size triggers this:
      it *can* be packed into more than 100 bytes less than the
      size given by a full compress.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
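
The idea in 3641.5.2 is that zlib's Z_SYNC_FLUSH makes the compressed length
exact, so the writer only has to track the input added since the last sync
rather than a running total. A minimal sketch of that bookkeeping, using plain
zlib (the class and names here are illustrative, not bzrlib's actual code):

    import zlib

    class SyncTrackingWriter(object):
        """Sketch only: accept bytes blindly until a sync is needed."""

        def __init__(self, capacity=4096):
            self.capacity = capacity
            self.compressor = zlib.compressobj()
            self.out_len = 0    # compressed bytes emitted so far (exact)
            self.unflushed = 0  # input bytes accepted since the last sync

        def add(self, data):
            """Return True if data was accepted into the chunk."""
            # If everything since the last sync would fit even while
            # uncompressed, accept the bytes without forcing a sync.
            if self.unflushed + len(data) < self.capacity - self.out_len - 10:
                self.out_len += len(self.compressor.compress(data))
                self.unflushed += len(data)
                return True
            # Otherwise flush, making out_len exact again. (The real
            # ChunkWriter then repacks or rejects; that is elided here.)
            out = self.compressor.compress(data)
            out += self.compressor.flush(zlib.Z_SYNC_FLUSH)
            self.out_len += len(out)
            self.unflushed = 0
            return self.out_len + 10 <= self.capacity
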
    ------------------------------------------------------------
    revno: 3641.5.1
    revision-id: john at arbash-meinel.com-20080822035819-yx19e7qxdvjgaeql
    parent: john at arbash-meinel.com-20080822022908-420tr0519tdz6pxy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: btree
    timestamp: Thu 2008-08-21 22:58:19 -0500
    message:
      Update the stats for the current code layout.
      This shows why I like _max_repack=2 so much. It is
      the highest value that has 'no waste'.
      At _max_repack=2, you can always sneak in 1 more
      line, which avoids triggering an extra repack.
      Also, updating the timings with the current tuning.
    modified:
      bzrlib/chunk_writer.py         chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py	2008-08-22 02:18:27 +0000
+++ b/bzrlib/btree_index.py	2008-08-28 01:59:58 +0000
@@ -53,13 +53,6 @@
 # 4K per page: 4MB - 1000 entries
 _NODE_CACHE_SIZE = 1000
 
-leaf_value_hits = [0, 0]
-internal_node_hits = [0, 0]
-leaf_node_hits = [0, 0]
-miss_attempts = 0  # Missed this entry while looking up
-bisect_shortcut = [0, 0]
-dupes = [0]
-
 
 class _BuilderRow(object):
     """The stored state accumulated while writing out a row in the index.
@@ -622,7 +615,7 @@
             found[node_pos] = node
         return found
 
-    def _get_nodes(self, cache, node_indexes, counter):
+    def _get_nodes(self, cache, node_indexes):
         found = {}
         needed = []
         for idx in node_indexes:
@@ -631,10 +624,8 @@
                 continue
             try:
                 found[idx] = cache[idx]
-                counter[0] += 1
             except KeyError:
                 needed.append(idx)
-                counter[1] += 1
         found.update(self._cache_nodes(needed, cache))
         return found
 
@@ -643,13 +634,11 @@
 
         After getting it, the node will be cached.
         """
-        return self._get_nodes(self._internal_node_cache, node_indexes,
-                               internal_node_hits)
+        return self._get_nodes(self._internal_node_cache, node_indexes)
 
     def _get_leaf_nodes(self, node_indexes):
         """Get a bunch of nodes, from cache or disk."""
-        found = self._get_nodes(self._leaf_node_cache, node_indexes,
-                                leaf_node_hits)
+        found = self._get_nodes(self._leaf_node_cache, node_indexes)
         if self._leaf_value_cache is not None:
             for node in found.itervalues():
                 for key, value in node.keys.iteritems():
@@ -715,17 +704,13 @@
         # iter_steps = len(in_keys) + len(fixed_keys)
         # bisect_steps = len(in_keys) * math.log(len(fixed_keys), 2)
         if len(in_keys) == 1: # Bisect will always be faster for M = 1
-            bisect_shortcut[0] += 1
             return [(bisect_right(fixed_keys, in_keys[0]), in_keys)]
         # elif bisect_steps < iter_steps:
-        #     bisect_shortcut[0] += len(in_keys)
         #     offsets = {}
         #     for key in in_keys:
         #         offsets.setdefault(bisect_right(fixed_keys, key),
         #                            []).append(key)
         #     return [(o, offsets[o]) for o in sorted(offsets)]
-        else:
-            bisect_shortcut[1] += len(in_keys)
         in_keys_iter = iter(in_keys)
         fixed_keys_iter = enumerate(fixed_keys)
         cur_in_key = in_keys_iter.next()
@@ -794,7 +779,6 @@
         if not keys:
             return
 
-        global leaf_value_hits, miss_attempts, dupes
         if not self.key_count():
             return
 
@@ -805,7 +789,6 @@
             for key in keys:
                 value = self._leaf_value_cache.get(key, None)
                 if value is not None:
-                    leaf_value_hits[0] += 1
                     # The value is cached; we can yield it without reading a leaf
                     value, refs = value
                     if self.node_ref_lists:
@@ -813,7 +796,6 @@
                     else:
                         yield (self, key, value)
                 else:
-                    leaf_value_hits[1] += 1
                     needed_keys.append(key)
 
         last_key = None
@@ -857,8 +839,6 @@
                         yield (self, next_sub_key, value, refs)
                     else:
                         yield (self, next_sub_key, value)
-                else:
-                    miss_attempts += 1
 
     def iter_entries_prefix(self, keys):
         """Iterate over keys within the index using prefix matching.

=== modified file 'bzrlib/chunk_writer.py'
--- a/bzrlib/chunk_writer.py	2008-08-22 02:09:36 +0000
+++ b/bzrlib/chunk_writer.py	2008-08-28 02:15:58 +0000
@@ -36,36 +36,60 @@
         will sometimes start over and compress the whole list to get tighter
         packing. We get diminishing returns after a while, so this limits the
         number of times we will try.
-        In testing, some values for bzr.dev::
-
-                    w/o copy    w/ copy     w/ copy ins w/ copy & save
-            repack  time  MB    time  MB    time  MB    time  MB
-             1       8.8  5.1    8.9  5.1    9.6  4.4   12.5  4.1
-             2       9.6  4.4   10.1  4.3   10.4  4.2   11.1  4.1
-             3      10.6  4.2   11.1  4.1   11.2  4.1   11.3  4.1
-             4      12.0  4.1
-             5      12.6  4.1
-            20      12.9  4.1   12.2  4.1   12.3  4.1
-
-        In testing, some values for mysql-unpacked::
-
-                    w/o copy    w/ copy     w/ copy ins w/ copy & save
-            repack  time  MB    time  MB    time  MB    time  MB
-             1      56.6  16.9              60.7  14.2
-             2      59.3  14.1              62.6  13.5  64.3  13.4
-             3      64.4  13.5
-            20      73.4  13.4
-
-    :cvar _default_min_compression_size: The expected minimum compression.
-        While packing nodes into the page, we won't Z_SYNC_FLUSH until we have
-        received this much input data. This saves time, because we don't bloat
-        the result with SYNC entries (and then need to repack), but if it is
-        set too high we will accept data that will never fit and trigger a
-        fault later.
+        The default is to try to avoid recompressing entirely, but setting this
+        to something like 20 will give maximum compression.
+
+    :cvar _max_zsync: Another tunable knob. If _max_repack is set to 0, then you
+        can limit the number of times we will try to pack more data into a
+        node. This allows us to do a single compression pass, rather than
+        trying until we overflow, and then recompressing again.
     """
-
-    _max_repack = 2
-    _default_min_compression_size = 1.8
+    #    In testing, some values for bzr.dev::
+    #        repack  time  MB   max   full
+    #         1       7.5  4.6  1140  0
+    #         2       8.4  4.2  1036  1          6.8
+    #         3       9.8  4.1  1012  278
+    #         4      10.8  4.1  728   945
+    #        20      11.1  4.1  0     1012
+    #        repack = 0
+    #        zsync   time  MB    repack  max_z   time w/ add_node
+    #         0       6.7  24.7  0       6270    5.0
+    #         1       6.5  13.2  0       3342    4.3
+    #         2       6.6   9.6  0       2414    4.9
+    #         5       6.5   6.2  0       1549    4.8
+    #         6       6.5   5.8  1       1435    4.8
+    #         7       6.6   5.5  19      1337    4.8
+    #         8       6.7   5.3  81      1220    4.4
+    #        10       6.8   5.0  260     967     5.3
+    #        11       6.8   4.9  366     839     5.3
+    #        12       6.9   4.8  454     731     5.1
+    #        15       7.2   4.7  704     450     5.8
+    #        20       7.7   4.6  1133    7       5.8
+
+    #    In testing, some values for mysql-unpacked::
+    #                next_bytes estim
+    #        repack  time  MB    hit_max full
+    #         1      51.7  15.4  3913  0
+    #         2      54.4  13.7  3467  0         35.4
+    #        20      67.0  13.4  0     3380      46.7
+    #        repack=0
+    #        zsync                               time w/ add_node
+    #         0      47.7 116.5  0       29782   29.5
+    #         1      48.5  60.2  0       15356   27.8
+    #         2      48.1  42.4  0       10822   27.8
+    #         5      48.3  25.5  0       6491    26.8
+    #         6      48.0  23.2  13      5896    27.3
+    #         7      48.1  21.6  29      5451    27.5
+    #         8      48.1  20.3  52      5108    27.1
+    #        10      46.9  18.6  195     4526    29.4
+    #        11      48.8  18.0  421     4143    29.2
+    #        12      47.4  17.5  702     3738    28.0
+    #        15      49.6  16.5  1223    2969    28.9
+    #        20      48.9  15.7  2182    1810    29.6
+    #        30            15.4  3891    23      31.4
+
+    _max_repack = 0
+    _max_zsync = 8
 
     def __init__(self, chunk_size, reserved=0):
         """Create a ChunkWriter to write chunk_size chunks.
@@ -73,35 +97,46 @@
         :param chunk_size: The total byte count to emit at the end of the
             chunk.
         :param reserved: How many bytes to allow for reserved data. reserved
-            data space can only be written to via the write_reserved method.
+            data space can only be written to via write(..., reserved=True).
         """
         self.chunk_size = chunk_size
         self.compressor = zlib.compressobj()
         self.bytes_in = []
         self.bytes_list = []
         self.bytes_out_len = 0
-        self.compressed = None
-        self.seen_bytes = 0
+        # bytes that have been seen but not yet flushed to the output
+        self.unflushed_in_bytes = 0
         self.num_repack = 0
+        self.num_zsync = 0
         self.unused_bytes = None
         self.reserved_size = reserved
-        self.min_compress_size = self._default_min_compression_size
 
     def finish(self):
         """Finish the chunk.
 
         This returns the final compressed chunk, and either None, or the
         bytes that did not fit in the chunk.
+
+        :return: (compressed_bytes, unused_bytes, num_nulls_needed)
+            compressed_bytes    a list of bytes that were output from the
+                                compressor. If the compressed length was not
+                                exactly chunk_size, the final string will be a
+                                string of all null bytes to pad this to
+                                chunk_size
+            unused_bytes        None, or the last bytes that were added, which
+                                we could not fit.
+            num_nulls_needed    How many nulls are padded at the end
         """
         self.bytes_in = None # Free the data cached so far, we don't need it
         out = self.compressor.flush(Z_FINISH)
         self.bytes_list.append(out)
         self.bytes_out_len += len(out)
+
         if self.bytes_out_len > self.chunk_size:
             raise AssertionError('Somehow we ended up with too much'
                                  ' compressed data, %d > %d'
                                  % (self.bytes_out_len, self.chunk_size))
-        nulls_needed = self.chunk_size - self.bytes_out_len % self.chunk_size
+        nulls_needed = self.chunk_size - self.bytes_out_len
         if nulls_needed:
             self.bytes_list.append("\x00" * nulls_needed)
         return self.bytes_list, self.unused_bytes, nulls_needed
@@ -109,19 +144,13 @@
     def _recompress_all_bytes_in(self, extra_bytes=None):
         """Recompress the current bytes_in, and optionally more.
 
-        :param extra_bytes: Optional, if supplied we will try to add it with
+        :param extra_bytes: Optional, if supplied we will add it with
             Z_SYNC_FLUSH
-        :return: (bytes_out, compressor, alt_compressed)
+        :return: (bytes_out, bytes_out_len, compressor)
             bytes_out   is the compressed bytes returned from the compressor
+            bytes_out_len the length of the compressed output
             compressor  An object with everything packed in so far, and
                         Z_SYNC_FLUSH called.
-            alt_compressed  If the compressor supports copy(), then this is a
-                            snapshot just before extra_bytes is added.
-                            It is (bytes_out, compressor) as well.
-                            The idea is if you find you cannot fit the new
-                            bytes, you don't have to start over.
-                            And if you *can* you don't have to Z_SYNC_FLUSH
-                            yet.
         """
         compressor = zlib.compressobj()
         bytes_out = []
@@ -134,8 +163,7 @@
         if extra_bytes:
             out = compress(extra_bytes)
             out += compressor.flush(Z_SYNC_FLUSH)
-            if out:
-                append(out)
+            append(out)
         bytes_out_len = sum(map(len, bytes_out))
         return bytes_out, bytes_out_len, compressor
 
@@ -144,60 +172,87 @@
 
         If the bytes fit, False is returned. Otherwise True is returned
         and the bytes have not been added to the chunk.
+
+        :param bytes: The bytes to include
+        :param reserved: If True, we can use the space reserved in the
+            constructor.
         """
+        if self.num_repack > self._max_repack and not reserved:
+            self.unused_bytes = bytes
+            return True
         if reserved:
             capacity = self.chunk_size
         else:
             capacity = self.chunk_size - self.reserved_size
-        # Check quickly to see if this is likely to put us outside of our
-        # budget:
-        next_seen_size = self.seen_bytes + len(bytes)
         comp = self.compressor
-        if (next_seen_size < self.min_compress_size * capacity):
-            # No need, we assume this will "just fit"
+
+        # Check to see if the currently unflushed bytes would fit with a bit of
+        # room to spare, assuming no compression.
+        next_unflushed = self.unflushed_in_bytes + len(bytes)
+        remaining_capacity = capacity - self.bytes_out_len - 10
+        if (next_unflushed < remaining_capacity):
+            # looks like it will fit
             out = comp.compress(bytes)
             if out:
                 self.bytes_list.append(out)
                 self.bytes_out_len += len(out)
             self.bytes_in.append(bytes)
-            self.seen_bytes = next_seen_size
+            self.unflushed_in_bytes += len(bytes)
         else:
-            if self.num_repack >= self._max_repack and not reserved:
-                # We already know we don't want to try to fit more
+            # This may or may not fit, try to add it with Z_SYNC_FLUSH
+            # Note: It is tempting to do this as a look-ahead pass, and to
+            #       'copy()' the compressor before flushing. However, it seems
+            #       that doing so is effectively the same as increasing
+            #       repack: similar cost, same benefit. And this way we still
+            #       have the 'repack' knob that can be adjusted, and we do not
+            #       depend on a platform-specific 'copy()' function.
+            self.num_zsync += 1
+            if self._max_repack == 0 and self.num_zsync > self._max_zsync:
+                self.num_repack += 1
+                self.unused_bytes = bytes
                 return True
-            # This may or may not fit, try to add it with Z_SYNC_FLUSH
             out = comp.compress(bytes)
             out += comp.flush(Z_SYNC_FLUSH)
+            self.unflushed_in_bytes = 0
             if out:
                 self.bytes_list.append(out)
                 self.bytes_out_len += len(out)
-            if self.bytes_out_len + 10 > capacity:
+
+            # We are a bit extra conservative, because it seems that you *can*
+            # get better compression with Z_SYNC_FLUSH than a full compress. It
+            # is probably very rare, but we were able to trigger it.
+            if self.num_repack == 0:
+                safety_margin = 100
+            else:
+                safety_margin = 10
+            if self.bytes_out_len + safety_margin <= capacity:
+                # It fit, so mark it added
+                self.bytes_in.append(bytes)
+            else:
                 # We are over budget, try to squeeze this in without any
                 # Z_SYNC_FLUSH calls
                 self.num_repack += 1
-                bytes_out, this_len, compressor = self._recompress_all_bytes_in(bytes)
+                (bytes_out, this_len,
+                 compressor) = self._recompress_all_bytes_in(bytes)
+                if self.num_repack >= self._max_repack:
+                    # When we get *to* _max_repack, bump num_repack once more
+                    # so that the earlier '> _max_repack' check triggers.
+                    self.num_repack += 1
                 if this_len + 10 > capacity:
-                    # No way we can add anymore, we need to re-pack because our
-                    # compressor is now out of sync.
-                    # This seems to be rarely triggered over
-                    #   num_repack > _max_repack
-                    bytes_out, this_len, compressor = self._recompress_all_bytes_in()
+                    (bytes_out, this_len,
+                     compressor) = self._recompress_all_bytes_in()
                     self.compressor = compressor
+                    # Force us to not allow more data
+                    self.num_repack = self._max_repack + 1
                     self.bytes_list = bytes_out
                     self.bytes_out_len = this_len
                     self.unused_bytes = bytes
                     return True
                 else:
                     # This fits when we pack it tighter, so use the new packing
-                    # There is one Z_SYNC_FLUSH call in
-                    # _recompress_all_bytes_in
                     self.compressor = compressor
                     self.bytes_in.append(bytes)
                     self.bytes_list = bytes_out
                     self.bytes_out_len = this_len
-            else:
-                # It fit, so mark it added
-                self.bytes_in.append(bytes)
-                self.seen_bytes = next_seen_size
         return False
 

=== modified file 'bzrlib/tests/test_btree_index.py'
--- a/bzrlib/tests/test_btree_index.py	2008-08-22 02:18:27 +0000
+++ b/bzrlib/tests/test_btree_index.py	2008-08-26 00:56:10 +0000
@@ -113,6 +113,14 @@
                 keys.append((key, value, refs))
         return keys
 
+    def shrink_page_size(self):
+        """Shrink the default page size so that less fits in a page."""
+        old_page_size = btree_index._PAGE_SIZE
+        def cleanup():
+            btree_index._PAGE_SIZE = old_page_size
+        self.addCleanup(cleanup)
+        btree_index._PAGE_SIZE = 2048
+
 
 class TestBTreeBuilder(BTreeTestCase):
 
@@ -196,16 +204,16 @@
 
     def test_2_leaves_1_0(self):
         builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
-        nodes = self.make_nodes(800, 1, 0)
+        nodes = self.make_nodes(400, 1, 0)
         for node in nodes:
             builder.add_node(*node)
         # NamedTemporaryFile dies on builder.finish().read(). weird.
         temp_file = builder.finish()
         content = temp_file.read()
         del temp_file
-        self.assertEqual(10646, len(content))
+        self.assertEqual(9283, len(content))
         self.assertEqual(
-            "B+Tree Graph Index 2\nnode_ref_lists=0\nkey_elements=1\nlen=800\n"
+            "B+Tree Graph Index 2\nnode_ref_lists=0\nkey_elements=1\nlen=400\n"
             "row_lengths=1,2\n",
             content[:77])
         root = content[77:4096]
@@ -215,19 +223,18 @@
         expected_root = (
             "type=internal\n"
             "offset=0\n"
-            "503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503503\n"
-            )
+            ) + ("307" * 40) + "\n"
         self.assertEqual(expected_root, root_bytes)
         # We already know serialisation works for leaves, check key selection:
         leaf1_bytes = zlib.decompress(leaf1)
         sorted_node_keys = sorted(node[0] for node in nodes)
         node = btree_index._LeafNode(leaf1_bytes, 1, 0)
-        self.assertEqual(448, len(node.keys))
-        self.assertEqual(sorted_node_keys[:448], sorted(node.keys))
+        self.assertEqual(231, len(node.keys))
+        self.assertEqual(sorted_node_keys[:231], sorted(node.keys))
         leaf2_bytes = zlib.decompress(leaf2)
         node = btree_index._LeafNode(leaf2_bytes, 1, 0)
-        self.assertEqual(800 - 448, len(node.keys))
-        self.assertEqual(sorted_node_keys[448:], sorted(node.keys))
+        self.assertEqual(400 - 231, len(node.keys))
+        self.assertEqual(sorted_node_keys[231:], sorted(node.keys))
 
     def test_last_page_rounded_1_layer(self):
         builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
@@ -253,38 +260,36 @@
 
     def test_last_page_not_rounded_2_layer(self):
         builder = btree_index.BTreeBuilder(key_elements=1, reference_lists=0)
-        nodes = self.make_nodes(800, 1, 0)
+        nodes = self.make_nodes(400, 1, 0)
         for node in nodes:
             builder.add_node(*node)
         # NamedTemporaryFile dies on builder.finish().read(). weird.
         temp_file = builder.finish()
         content = temp_file.read()
         del temp_file
-        self.assertEqual(10646, len(content))
+        self.assertEqual(9283, len(content))
         self.assertEqual(
-            "B+Tree Graph Index 2\nnode_ref_lists=0\nkey_elements=1\nlen=800\n"
+            "B+Tree Graph Index 2\nnode_ref_lists=0\nkey_elements=1\nlen=400\n"
             "row_lengths=1,2\n",
             content[:77])
         # Check the last page is well formed
         leaf2 = content[8192:]
         leaf2_bytes = zlib.decompress(leaf2)
         node = btree_index._LeafNode(leaf2_bytes, 1, 0)
-        self.assertEqual(800 - 448, len(node.keys))
+        self.assertEqual(400 - 231, len(node.keys))
         sorted_node_keys = sorted(node[0] for node in nodes)
-        self.assertEqual(sorted_node_keys[448:], sorted(node.keys))
+        self.assertEqual(sorted_node_keys[231:], sorted(node.keys))
 
     def test_three_level_tree_details(self):
         # The left most pointer in the second internal node in a row should
         # pointer to the second node that the internal node is for, _not_
         # the first, otherwise the first node overlaps with the last node of
         # the prior internal node on that row.
-        # We will be adding 100,000 nodes, so spill at 100,001 to prevent
-        # having to flush anything out to disk.
-        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2,
-                                           spill_at=100001)
-        # 100K nodes is *just* enough to create a two internal nodes on the
-        # second level
-        nodes = self.make_nodes(50000, 2, 2)
+        self.shrink_page_size()
+        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
+        # 40K nodes is enough to create two internal nodes on the second
+        # level, with a 2K page size
+        nodes = self.make_nodes(20000, 2, 2)
 
         for node in nodes:
             builder.add_node(*node)
@@ -314,28 +319,28 @@
 
     def test_2_leaves_2_2(self):
         builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
-        nodes = self.make_nodes(200, 2, 2)
+        nodes = self.make_nodes(100, 2, 2)
         for node in nodes:
             builder.add_node(*node)
         # NamedTemporaryFile dies on builder.finish().read(). weird.
         temp_file = builder.finish()
         content = temp_file.read()
         del temp_file
-        self.assertEqual(10574, len(content))
+        self.assertEqual(12643, len(content))
         self.assertEqual(
-            "B+Tree Graph Index 2\nnode_ref_lists=2\nkey_elements=2\nlen=400\n"
-            "row_lengths=1,2\n",
+            "B+Tree Graph Index 2\nnode_ref_lists=2\nkey_elements=2\nlen=200\n"
+            "row_lengths=1,3\n",
             content[:77])
         root = content[77:4096]
         leaf1 = content[4096:8192]
-        leaf2 = content[8192:]
+        leaf2 = content[8192:12288]
+        leaf3 = content[12288:]
         root_bytes = zlib.decompress(root)
         expected_root = (
             "type=internal\n"
             "offset=0\n"
-            "1111111111111111111111111111111111111111\x00"
-            "126126126126126126126126126126126126126126126126126126126"
-            "126126126126126126126126126126126126126126126126126126126126126\n"
+            + ("0" * 40) + "\x00" + ("91" * 40) + "\n"
+            + ("1" * 40) + "\x00" + ("81" * 40) + "\n"
             )
         self.assertEqual(expected_root, root_bytes)
         # We assume the other leaf nodes have been written correctly - layering
@@ -594,7 +599,7 @@
         # The entire index should have been read, as it is one page long.
         self.assertEqual([('readv', 'index', [(0, size)], False, None)],
             transport._activity)
-        self.assertEqual(1593, size)
+        self.assertEqual(1199, size)
 
     def test_2_levels_key_count_2_2(self):
         builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
@@ -603,7 +608,7 @@
             builder.add_node(*node)
         transport = get_transport('trace+' + self.get_url(''))
         size = transport.put_file('index', builder.finish())
-        self.assertEqual(10242, size)
+        self.assertEqual(17692, size)
         index = btree_index.BTreeGraphIndex(transport, 'index', size)
         del transport._activity[:]
         self.assertEqual([], transport._activity)
@@ -614,36 +619,36 @@
 
     def test_validate_one_page(self):
         builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
+        nodes = self.make_nodes(45, 2, 2)
+        for node in nodes:
+            builder.add_node(*node)
+        transport = get_transport('trace+' + self.get_url(''))
+        size = transport.put_file('index', builder.finish())
+        index = btree_index.BTreeGraphIndex(transport, 'index', size)
+        del transport._activity[:]
+        self.assertEqual([], transport._activity)
+        index.validate()
+        # The entire index should have been read linearly.
+        self.assertEqual([('readv', 'index', [(0, size)], False, None)],
+            transport._activity)
+        self.assertEqual(1514, size)
+
+    def test_validate_two_pages(self):
+        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
         nodes = self.make_nodes(80, 2, 2)
         for node in nodes:
             builder.add_node(*node)
         transport = get_transport('trace+' + self.get_url(''))
         size = transport.put_file('index', builder.finish())
-        index = btree_index.BTreeGraphIndex(transport, 'index', size)
-        del transport._activity[:]
-        self.assertEqual([], transport._activity)
-        index.validate()
-        # The entire index should have been read linearly.
-        self.assertEqual([('readv', 'index', [(0, size)], False, None)],
-            transport._activity)
-        self.assertEqual(3846, size)
-
-    def test_validate_two_pages(self):
-        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
-        nodes = self.make_nodes(160, 2, 2)
-        for node in nodes:
-            builder.add_node(*node)
-        transport = get_transport('trace+' + self.get_url(''))
-        size = transport.put_file('index', builder.finish())
         # Root page, 2 leaf pages
-        self.assertEqual(10242, size)
+        self.assertEqual(9339, size)
         index = btree_index.BTreeGraphIndex(transport, 'index', size)
         del transport._activity[:]
         self.assertEqual([], transport._activity)
         index.validate()
         # The entire index should have been read linearly.
         self.assertEqual([('readv', 'index', [(0, 4096)], False, None),
-            ('readv', 'index', [(4096, 4096), (8192, 2050)], False, None)],
+            ('readv', 'index', [(4096, 4096), (8192, 1147)], False, None)],
             transport._activity)
         # XXX: TODO: write some badly-ordered nodes, and some pointers-to-wrong
         # node and make validate find them.
@@ -686,22 +691,23 @@
     def test_iter_all_entries_reads(self):
         # iterating all entries reads the header, then does a linear
         # read.
-        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2,
-                                           spill_at=100001)
-        # 100k nodes is enough to create a three-level index, which shows that
-        # we skip the internal nodes and just read the leaf nodes.
-        nodes = self.make_nodes(50000, 2, 2)
+        self.shrink_page_size()
+        builder = btree_index.BTreeBuilder(key_elements=2, reference_lists=2)
+        # 20k nodes is enough to create two internal nodes on the second
+        # level, with a 2K page size
+        nodes = self.make_nodes(10000, 2, 2)
         for node in nodes:
             builder.add_node(*node)
         transport = get_transport('trace+' + self.get_url(''))
         size = transport.put_file('index', builder.finish())
-        self.assertEqual(2026722, size, 'number of expected bytes in the'
+        self.assertEqual(1303220, size, 'number of expected bytes in the'
                                         ' output changed')
+        page_size = btree_index._PAGE_SIZE
         del builder
         index = btree_index.BTreeGraphIndex(transport, 'index', size)
         del transport._activity[:]
         self.assertEqual([], transport._activity)
-        found_nodes = list(index.iter_all_entries())
+        found_nodes = self.time(list, index.iter_all_entries())
         bare_nodes = []
         for node in found_nodes:
             self.assertTrue(node[0] is index)
@@ -709,7 +715,7 @@
         self.assertEqual(3, len(index._row_lengths),
             "Not enough rows: %r" % index._row_lengths)
         # Should be as long as the nodes we supplied
-        self.assertEqual(100000, len(found_nodes))
+        self.assertEqual(20000, len(found_nodes))
         # Should have the same content
         self.assertEqual(set(nodes), set(bare_nodes))
         # Should have done linear scan IO up the index, ignoring
@@ -717,15 +723,15 @@
         # The entire index should have been read
         total_pages = sum(index._row_lengths)
         self.assertEqual(total_pages, index._row_offsets[-1])
-        self.assertEqual(2026722, size)
+        self.assertEqual(1303220, size)
         # The start of the leaves
-        first_byte = index._row_offsets[-2] * btree_index._PAGE_SIZE
+        first_byte = index._row_offsets[-2] * page_size
         readv_request = []
-        for offset in range(first_byte, size, 4096):
-            readv_request.append((offset, 4096))
+        for offset in range(first_byte, size, page_size):
+            readv_request.append((offset, page_size))
         # The last page is truncated
-        readv_request[-1] = (readv_request[-1][0], 3298)
-        expected = [('readv', 'index', [(0, 4096)], False, None),
+        readv_request[-1] = (readv_request[-1][0], 1303220 % page_size)
+        expected = [('readv', 'index', [(0, page_size)], False, None),
              ('readv',  'index', readv_request, False, None)]
         if expected != transport._activity:
             self.assertEqualDiff(pprint.pformat(expected),
@@ -765,7 +771,7 @@
         self.assertEqual(nodes[30], bare_nodes[0])
         # Should have read the root node, then one leaf page:
         self.assertEqual([('readv', 'index', [(0, 4096)], False, None),
-             ('readv',  'index', [(4096, 4096), ], False, None)],
+             ('readv',  'index', [(8192, 4096), ], False, None)],
             transport._activity)
 
     def test_iter_key_prefix_1_element_key_None(self):

=== modified file 'bzrlib/tests/test_chunk_writer.py'
--- a/bzrlib/tests/test_chunk_writer.py	2008-08-22 02:09:36 +0000
+++ b/bzrlib/tests/test_chunk_writer.py	2008-08-22 05:54:44 +0000
@@ -58,8 +58,9 @@
             # Create a line with this group
             lines.append(''.join(map(str, numbers)) + '\n')
         writer = chunk_writer.ChunkWriter(4096)
-        for line in lines:
+        for idx, line in enumerate(lines):
             if writer.write(line):
+                self.assertEqual(46, idx)
                 break
         bytes_list, unused, _ = writer.finish()
         node_bytes = self.check_chunk(bytes_list, 4096)
@@ -78,9 +79,12 @@
             # Create a line with this group
             lines.append(''.join(map(str, numbers)) + '\n')
         writer = chunk_writer.ChunkWriter(4096, 256)
-        for line in lines:
+        for idx, line in enumerate(lines):
             if writer.write(line):
+                self.assertEqual(44, idx)
                 break
+        else:
+            self.fail('We were able to write all lines')
         self.assertFalse(writer.write("A"*256, reserved=True))
         bytes_list, unused, _ = writer.finish()
         node_bytes = self.check_chunk(bytes_list, 4096)
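
A side note on the new for/else in the reserved-bytes test: Python runs the
else clause only when the loop finishes without hitting break, so the test now
fails loudly if every line fit. In miniature (illustrative, not from bzrlib):

    for line in ['a', 'bb', 'c']:
        if len(line) > 1:
            break  # found one; the else clause is skipped
    else:
        raise AssertionError('no line was long enough to break')
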



