New toy for bzr developers: B+Tree index sketch

Robert Collins robertc at
Tue Jul 1 21:36:18 BST 2008

On Tue, 2008-07-01 at 14:07 -0500, John Arbash Meinel wrote:

> | Not tested yet:
> |  - high latency (NFS or Remote or http) links
> Did you have simple scripts to make it easy to reproduce your results?

yeah, I sent that at midnight; brain not worky so well. I'll add a
couple of extra things and make it more parameterised and attach it
later today.

Thanks for spending time tuning the size. I don't think 4K is
necessarily best; indeed, 64K would clearly be better in terms of
accessing remote servers *if* we get a high hit rate. (Reading 128K to
access one key, or to determine that one key is missing, is bad compared
to 8K; we want lots of accesses to the same node to amortise the
overhead of the read).
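
As a rough illustration of that trade-off, here is a minimal sketch of
the bytes fetched per lookup. It assumes each cold lookup reads two
pages (one reading of the 128K and 8K figures above, for 64K and 4K
pages respectively); the function names and the hit_rate parameter are
hypothetical, and round-trip latency, which is what favours larger
pages on remote links, is deliberately ignored.

```python
def bytes_read_per_lookup(page_size, pages_per_lookup=2, hit_rate=0.0):
    """Average bytes fetched per key lookup.

    hit_rate is the fraction of page accesses served from pages that
    were already read, so they cost no further transfer.
    pages_per_lookup=2 is an assumption, not a measured value.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return page_size * pages_per_lookup * (1.0 - hit_rate)


def break_even_hit_rate(small_page, large_page):
    """Hit rate the large page needs so that it transfers no more
    bytes per lookup than the small page does with no hits at all."""
    return 1.0 - float(small_page) / large_page


cold_4k = bytes_read_per_lookup(4096)     # 8K per cold lookup
cold_64k = bytes_read_per_lookup(65536)   # 128K per cold lookup
needed = break_even_hit_rate(4096, 65536)
```

Under these assumptions a 64K page only matches the transfer volume of
cold 4K lookups once most accesses land on pages that are already in
hand.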

> John
> =:->
> PS> It would all be so much simpler if we could just snapshot the state
> of the compressor. Do the flush, restore the state, etc. zlib itself
> supports it with deflateCopy(). Though it does warn:

there is a .copy() on compressobj - I checked the python code :). But it
depends on whether the zlib present when python is built has
deflateCopy, so I felt it was likely to give us installation issues.

> ... Note that deflateCopy duplicates the internal compression state
> which can be quite large, so this strategy is slow and can consume lots
> of memory.

OTOH zlib is used for very large data sets, and our typical indices are
not large; individual pages definitely are not.
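
For reference, a minimal sketch of the snapshot idea using .copy() on a
compress object: try each line against a copy, flush the copy to see the
final size, and only commit the line to the real compressor if the page
would still fit. PAGE_SIZE, fits_in_page and pack_page are hypothetical
names for illustration, not the code in the index sketch, and the
HAS_COPY check reflects the build-time dependency mentioned above.

```python
import zlib

PAGE_SIZE = 4096  # page size discussed in this thread

# .copy() may be missing if python was built against a zlib
# without deflateCopy, so check before relying on it.
HAS_COPY = hasattr(zlib.compressobj(), "copy")


def fits_in_page(compressor, bytes_out_so_far, line):
    """True if `line` still fits in the page once everything is flushed.

    The flush happens on a copy, so `compressor` is left untouched.
    """
    trial = compressor.copy()
    out = trial.compress(line) + trial.flush()
    return bytes_out_so_far + len(out) <= PAGE_SIZE


def pack_page(lines):
    """Greedily pack byte strings into one compressed page.

    Returns (page_bytes, number_of_lines_used).
    """
    compressor = zlib.compressobj()
    chunks = []
    written = 0
    used = 0
    for line in lines:
        if not fits_in_page(compressor, written, line):
            break
        out = compressor.compress(line)
        chunks.append(out)
        written += len(out)
        used += 1
    chunks.append(compressor.flush())
    return b"".join(chunks), used
```

This copies the compressor state once per line, which is the cost the
deflateCopy warning is about; whether that matters for pages of this
size would need measuring.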

> I also tried switching to 8192-sized pack files. And with my code, it
> drops the size to 3.2MB. So about a 3% savings. Certainly not nearly as
> much as just trying to pack more into the existing 4096 bytes.
