Rev 3676: Using a different safety margin for the first repack, in http://bzr.arbash-meinel.com/branches/bzr/1.7-dev/btree
John Arbash Meinel
john at arbash-meinel.com
Fri Aug 22 21:33:29 BST 2008
At http://bzr.arbash-meinel.com/branches/bzr/1.7-dev/btree
------------------------------------------------------------
revno: 3676
revision-id: john at arbash-meinel.com-20080822203320-y98xykrjms4r5goj
parent: john at arbash-meinel.com-20080822055444-5kcr0csbbvkqbbiw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: btree
timestamp: Fri 2008-08-22 15:33:20 -0500
message:
Using a different safety margin for the first repack,
combined with 2 repacks, gives us effectively the same result
while still being safe for arbitrary data. (With 1 repack, the margin
does affect the results by 3-5%; with 2 repacks, the second margin
gives the same results.)
Also, we now see roughly a 2-3:1 ratio of lines that are 'blindly' added
versus ones that have to be added with a SYNC.
modified:
bzrlib/chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
-------------- next part --------------
=== modified file 'bzrlib/chunk_writer.py'
--- a/bzrlib/chunk_writer.py 2008-08-22 05:54:44 +0000
+++ b/bzrlib/chunk_writer.py 2008-08-22 20:33:20 +0000
@@ -21,8 +21,9 @@
from zlib import Z_FINISH, Z_SYNC_FLUSH
# [max_repack, buffer_full, repacks_with_space, min_compression,
-# total_bytes_in, total_bytes_out, avg_comp]
-_stats = [0, 0, 0, 999, 0, 0, 0]
+# total_bytes_in, total_bytes_out, avg_comp,
+# bytes_autopack, bytes_sync_packed]
+_stats = [0, 0, 0, 999, 0, 0, 0, 0, 0]
class ChunkWriter(object):
"""ChunkWriter allows writing of compressed data with a fixed size.
@@ -169,8 +170,10 @@
self.bytes_in.append(bytes)
self.seen_bytes += len(bytes)
self.unflushed_in_bytes += len(bytes)
+ _stats[7] += 1 # len(bytes)
else:
# This may or may not fit, try to add it with Z_SYNC_FLUSH
+ _stats[8] += 1 # len(bytes)
out = comp.compress(bytes)
out += comp.flush(Z_SYNC_FLUSH)
self.unflushed_in_bytes = 0
@@ -181,7 +184,11 @@
# We are a bit extra conservative, because it seems that you *can*
# get better compression with Z_SYNC_FLUSH than a full compress. It
# is probably very rare, but we were able to trigger it.
- if self.bytes_out_len + 100 <= capacity:
+ if self.num_repack == 0:
+ safety_margin = 100
+ else:
+ safety_margin = 10
+ if self.bytes_out_len + safety_margin <= capacity:
# It fit, so mark it added
self.bytes_in.append(bytes)
self.seen_bytes += len(bytes)
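
The margin exists because ChunkWriter measures its in-progress output via Z_SYNC_FLUSH, and the final repack (a one-shot compress with Z_FINISH) can occasionally come out *larger* than the sync-flushed measurement. The sketch below is not bzrlib code; it illustrates the two measurement styles with plain zlib, plus the patch's two-tier margin as a standalone helper (`fits` and its parameter names are hypothetical, chosen to mirror the diff):

```python
import zlib

def sync_flushed_len(chunks):
    # Feed chunks one at a time, flushing with Z_SYNC_FLUSH after each,
    # the way ChunkWriter tracks bytes_out_len while data is pending.
    comp = zlib.compressobj()
    total = 0
    for c in chunks:
        total += len(comp.compress(c))
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total

def full_compress_len(chunks):
    # One-shot compression ending in Z_FINISH, as done on a repack.
    return len(zlib.compress(b''.join(chunks)))

def fits(bytes_out_len, capacity, num_repack):
    # The patch's rule: be conservative (100 bytes) before the first
    # repack, since the measured length may still shift; afterwards a
    # 10-byte margin suffices.
    safety_margin = 100 if num_repack == 0 else 10
    return bytes_out_len + safety_margin <= capacity

chunks = [(b'line %d of some repetitive index data\n' % i) * 4
          for i in range(50)]
sync_len = sync_flushed_len(chunks)
full_len = full_compress_len(chunks)
print(sync_len, full_len)
print(fits(900, 1000, num_repack=1))   # True: 900 + 10 <= 1000
print(fits(901, 1000, num_repack=0))   # False: 901 + 100 > 1000
```

For typical data each Z_SYNC_FLUSH adds a few bytes of overhead, so `sync_len` usually exceeds `full_len`; the diff's comment notes the rare reverse case, which is exactly why the margin (rather than an exact comparison) is kept.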
More information about the bazaar-commits mailing list