[MERGE] Btree index updates
John Arbash Meinel
john at arbash-meinel.com
Thu Aug 28 02:22:38 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> On Mon, 2008-08-25 at 20:07 -0500, John Arbash Meinel wrote:
>> === modified file 'bzrlib/chunk_writer.py'
>> --- bzrlib/chunk_writer.py 2008-08-22 02:09:36 +0000
>> +++ bzrlib/chunk_writer.py 2008-08-26 00:42:07 +0000
>> @@ -20,6 +20,10 @@
>> import zlib
>> from zlib import Z_FINISH, Z_SYNC_FLUSH
>>
>> +# [max_repack, buffer_full, repacks_with_space, min_compression,
>> +# total_bytes_in, total_bytes_out, avg_comp,
>> +# bytes_autopack, bytes_sync_packed, num_full_by_zsync]
>> +_stats = [0, 0, 0, 999, 0, 0, 0, 0, 0, 0]
>
> Do we need global state for this? I'd rather have a callback or
> something that can be used for instrumentation.
Not at all. I'd even be fine with nuking all of the global state in both
btree_index and chunk_writer. It was useful for a while, but now isn't needed.
>
>> class ChunkWriter(object):
>> """ChunkWriter allows writing of compressed data with a fixed
>> size.
>> @@ -38,34 +42,55 @@
>> number of times we will try.
>> In testing, some values for bzr.dev::
>> 2 54.4 13.7 3467 0 35.4
>> + 20 67.0 13.4 0 3380 46.7
>
> I'd rather have the performance details in comments than docstrings;
> docstrings are the how-to-use documentation and the performance details
> really aren't helpful with that. In fact, I think it hurts because its
> not prose and readers don't know how much to skip.
Sure. It also changes quite a bit with each draft. It was mostly just a tool
for me to look at while tuning. If you prefer, I can remove it.
>>
>> def finish(self):
>> """Finish the chunk.
>> @@ -97,11 +125,22 @@
>> out = self.compressor.flush(Z_FINISH)
>> self.bytes_list.append(out)
>> self.bytes_out_len += len(out)
>
> If you seed self.seen_bytes as 0.0, it will stay as a float and you
> won't need to cast. Might be faster - or slower :P.
>
>
> bb:tweak
>
I think you trimmed the wrong portion. The only place I cast it is during
statistics generation (during finish().) I thought it would be good to leave
it as an int the rest of the time.
Actually any more, we don't really need to track 'seen_bytes' other than for
stats. If you want, I could trim out some of the extra code there too.
So I'd like a little bit more feedback as to what you would *like*, and I'm
happy to do some cleanup.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFItf3eJdeBCYSNAAMRAux2AJ4jN+GgyH99JPHcbpw09xSnB3irEgCeK6xB
H9F6OJs6wEyCYt0bKVIAyCY=
=Ngaw
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list