[MERGE] Btree index updates

John Arbash Meinel john at arbash-meinel.com
Thu Aug 28 02:22:38 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Mon, 2008-08-25 at 20:07 -0500, John Arbash Meinel wrote:
>> === modified file 'bzrlib/chunk_writer.py'
>> --- bzrlib/chunk_writer.py      2008-08-22 02:09:36 +0000
>> +++ bzrlib/chunk_writer.py      2008-08-26 00:42:07 +0000
>> @@ -20,6 +20,10 @@
>>  import zlib
>>  from zlib import Z_FINISH, Z_SYNC_FLUSH
>>  
>> +# [max_repack, buffer_full, repacks_with_space, min_compression,
>> +#  total_bytes_in, total_bytes_out, avg_comp,
>> +#  bytes_autopack, bytes_sync_packed, num_full_by_zsync]
>> +_stats = [0, 0, 0, 999, 0, 0, 0, 0, 0, 0]
> 
> Do we need global state for this? I'd rather have a callback or
> something that can be used for instrumentation.

Not at all. I'd even be fine with nuking all of the global state in both
btree_index and chunk_writer. It was useful for a while, but now isn't needed.

> 
>>  class ChunkWriter(object):
>>      """ChunkWriter allows writing of compressed data with a fixed
>> size.
>> @@ -38,34 +42,55 @@
>>          number of times we will try.
>>          In testing, some values for bzr.dev::
>>       2      54.4  13.7  3467  0         35.4
>> +            20      67.0  13.4  0     3380      46.7
> 
> I'd rather have the performance details in comments than docstrings;
> docstrings are the how-to-use documentation and the performance details
> really aren't helpful with that. In fact, I think it hurts because its
> not prose and readers don't know how much to skip.

Sure. It also changes quite a bit with each draft. It was mostly just a tool
for me to look at while tuning. If you prefer, I can remove it.

>>  
>>      def finish(self):
>>          """Finish the chunk.
>> @@ -97,11 +125,22 @@
>>          out = self.compressor.flush(Z_FINISH)
>>          self.bytes_list.append(out)
>>          self.bytes_out_len += len(out)
> 
> If you seed self.seen_bytes as 0.0, it will stay as a float and you
> won't need to cast. Might be faster - or slower :P.
>  
> 
> bb:tweak
> 

I think you trimmed the wrong portion. The only place I cast it is during
statistics generation (during finish().) I thought it would be good to leave
it as an int the rest of the time.

Actually any more, we don't really need to track 'seen_bytes' other than for
stats. If you want, I could trim out some of the extra code there too.

So I'd like a little bit more feedback as to what you would *like*, and I'm
happy to do some cleanup.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFItf3eJdeBCYSNAAMRAux2AJ4jN+GgyH99JPHcbpw09xSnB3irEgCeK6xB
H9F6OJs6wEyCYt0bKVIAyCY=
=Ngaw
-----END PGP SIGNATURE-----



More information about the bazaar mailing list