[MERGE] Btree index updates
Robert Collins
robertc at robertcollins.net
Thu Aug 28 02:35:40 BST 2008
On Wed, 2008-08-27 at 20:22 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> >>
> >> +# [max_repack, buffer_full, repacks_with_space, min_compression,
> >> +# total_bytes_in, total_bytes_out, avg_comp,
> >> +# bytes_autopack, bytes_sync_packed, num_full_by_zsync]
> >> +_stats = [0, 0, 0, 999, 0, 0, 0, 0, 0, 0]
> >
> > Do we need global state for this? I'd rather have a callback or
> > something that can be used for instrumentation.
>
> Not at all. I'd even be fine with nuking all of the global state in both
> btree_index and chunk_writer. It was useful for a while, but now isn't needed.
Please. global state hurts eventually.
> >
> >> class ChunkWriter(object):
> >> """ChunkWriter allows writing of compressed data with a fixed
> >> size.
> >> @@ -38,34 +42,55 @@
> >> number of times we will try.
> >> In testing, some values for bzr.dev::
> >> 2 54.4 13.7 3467 0 35.4
> >> + 20 67.0 13.4 0 3380 46.7
> >
> > I'd rather have the performance details in comments than docstrings;
> > docstrings are the how-to-use documentation and the performance details
> > really aren't helpful with that. In fact, I think it hurts because its
> > not prose and readers don't know how much to skip.
>
> Sure. It also changes quite a bit with each draft. It was mostly just a tool
> for me to look at while tuning. If you prefer, I can remove it.
Up to you - as long as its not in the docstring, I don't care.
> >> def finish(self):
> >> """Finish the chunk.
> >> @@ -97,11 +125,22 @@
> >> out = self.compressor.flush(Z_FINISH)
> >> self.bytes_list.append(out)
> >> self.bytes_out_len += len(out)
> >
> > If you seed self.seen_bytes as 0.0, it will stay as a float and you
> > won't need to cast. Might be faster - or slower :P.
> >
> >
> > bb:tweak
> >
>
> I think you trimmed the wrong portion. The only place I cast it is during
> statistics generation (during finish().) I thought it would be good to leave
> it as an int the rest of the time.
>
> Actually any more, we don't really need to track 'seen_bytes' other than for
> stats. If you want, I could trim out some of the extra code there too.
>
> So I'd like a little bit more feedback as to what you would *like*, and I'm
> happy to do some cleanup.
I am agnostic about the casting, it was just an observation. There was a
float(self.seen_bytes) / self.out_bytes, or something like that - yes I
did trim it.
For sure though, less code (as long as its clearer) is probably good ;)
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080828/e1936a4f/attachment-0001.pgp
More information about the bazaar
mailing list