[MERGE] Btree index updates

Robert Collins robertc at robertcollins.net
Thu Aug 28 02:35:40 BST 2008


On Wed, 2008-08-27 at 20:22 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1

> >>  
> >> +# [max_repack, buffer_full, repacks_with_space, min_compression,
> >> +#  total_bytes_in, total_bytes_out, avg_comp,
> >> +#  bytes_autopack, bytes_sync_packed, num_full_by_zsync]
> >> +_stats = [0, 0, 0, 999, 0, 0, 0, 0, 0, 0]
> > 
> > Do we need global state for this? I'd rather have a callback or
> > something that can be used for instrumentation.
> 
> Not at all. I'd even be fine with nuking all of the global state in both
> btree_index and chunk_writer. It was useful for a while, but now isn't needed.

Please. global state hurts eventually.

> > 
> >>  class ChunkWriter(object):
> >>      """ChunkWriter allows writing of compressed data with a fixed
> >> size.
> >> @@ -38,34 +42,55 @@
> >>          number of times we will try.
> >>          In testing, some values for bzr.dev::
> >>       2      54.4  13.7  3467  0         35.4
> >> +            20      67.0  13.4  0     3380      46.7
> > 
> > I'd rather have the performance details in comments than docstrings;
> > docstrings are the how-to-use documentation and the performance details
> > really aren't helpful with that. In fact, I think it hurts because its
> > not prose and readers don't know how much to skip.
> 
> Sure. It also changes quite a bit with each draft. It was mostly just a tool
> for me to look at while tuning. If you prefer, I can remove it.

Up to you - as long as its not in the docstring, I don't care.

> >>      def finish(self):
> >>          """Finish the chunk.
> >> @@ -97,11 +125,22 @@
> >>          out = self.compressor.flush(Z_FINISH)
> >>          self.bytes_list.append(out)
> >>          self.bytes_out_len += len(out)
> > 
> > If you seed self.seen_bytes as 0.0, it will stay as a float and you
> > won't need to cast. Might be faster - or slower :P.
> >  
> > 
> > bb:tweak
> > 
> 
> I think you trimmed the wrong portion. The only place I cast it is during
> statistics generation (during finish().) I thought it would be good to leave
> it as an int the rest of the time.
> 
> Actually any more, we don't really need to track 'seen_bytes' other than for
> stats. If you want, I could trim out some of the extra code there too.
> 
> So I'd like a little bit more feedback as to what you would *like*, and I'm
> happy to do some cleanup.

I am agnostic about the casting, it was just an observation. There was a
float(self.seen_bytes) / self.out_bytes, or something like that - yes I
did trim it.

For sure though, less code (as long as its clearer) is probably good ;)

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080828/e1936a4f/attachment-0001.pgp 


More information about the bazaar mailing list