B+Tree indices: ongoing progress

Robert Collins robertc at robertcollins.net
Wed Jul 2 04:42:06 BST 2008


On Tue, 2008-07-01 at 22:33 -0500, John Arbash Meinel wrote:

> | re safety:
> | As long as in the event of an overrun it *at worst* asserts during
> | writing, I'm _ok_. To be happy I'd like it to always work no matter what
> | (short of 4K long values :P). I expect that bzr-git revids will be very
> | high entropy and thus unlikely to get 2:1 compression.
> 
> As in raises an Assert?
> 
> And, IIRC we actually do a hex(git_sha1) which gives us at least 2:1
> because of the expansion to hex digits (len(x)*2 == len(hex(x))).
> 
> It is trivial to error out in .finish() if the compressed bytes exceed chunk_size.
> 
> And further, the caller *could* reset itself, because we have been
> tracking the 'bytes_in'.
> 
> If you are okay with plain 'punt for now', then I'm very happy to bring
> it in.

If punt for now == raise an AssertionError - yes, please bring it in.
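
For anyone following along, here is a rough sketch of what "punt for now"
could look like. It is not the code under review: ChunkWriter, hex_ratio
and the 4096-byte CHUNK_SIZE are hypothetical names and values chosen for
illustration (the size is only inferred from the "4K" remark above). It
shows finish() raising AssertionError on an overrun, keeps the inputs around
so a caller could reset and retry, and checks the roughly 2:1 ratio for
hex-encoded random ids.

```python
import os
import zlib
from binascii import hexlify

CHUNK_SIZE = 4096  # assumption: one 4K page per chunk


class ChunkWriter:
    """Hypothetical sketch: compress data into a single bounded chunk."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.compressor = zlib.compressobj()
        self.parts = []     # compressed output produced so far
        self.bytes_in = []  # raw inputs, kept so a caller could reset and retry

    def write(self, data):
        self.bytes_in.append(data)
        self.parts.append(self.compressor.compress(data))

    def finish(self):
        self.parts.append(self.compressor.flush())
        out = b"".join(self.parts)
        # Fail loudly rather than emit a chunk that overruns its page.
        if len(out) > self.chunk_size:
            raise AssertionError(
                "compressed %d bytes > chunk size %d"
                % (len(out), self.chunk_size))
        return out


def hex_ratio(n_random_bytes):
    """Compressed size / original size for hex-encoded random bytes."""
    text = hexlify(os.urandom(n_random_bytes))
    return len(zlib.compress(text)) / float(len(text))


if __name__ == "__main__":
    writer = ChunkWriter()
    writer.write(hexlify(os.urandom(2000)))  # 4000 hex chars, high entropy
    print(len(writer.finish()), "bytes in chunk")
    print("hex ratio: %.2f" % hex_ratio(2000))
```

Hex text carries only 4 bits of information per 8-bit character, so even
fully random ids should compress to somewhat over half their hex length,
while raw random bytes of more than chunk_size would trip the assertion.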

> |
> | re: compression level - that's because of the window size - we're dealing
> | in a very small stream anyway. We can drop the compression but we can't
> | raise it using zlib.
> |
> | re: performance - I found while tuning commit to be hg-speed that we
> | really need to eliminate fat all the way through the process. At the end
> | of that process I was seeking 0.1 second wins throughout the code base.
> | (On a 20K file tree - or 1/4 the size of this data set. So any expansion
> | here is something we will need to counteract to prevent commit()
> | becoming slower).
> 
> Well, is this a commit that triggers a repack, or a plain commit?
> Because with only 1 node being added to a fresh pack file, we are
> unlikely to hit any of the node packing code.

A plain commit adds potentially very many nodes to the .tix index. It's
true that initial commits are a pathological case, but merge commits do
their own damage :).

> 
> The miss torture is very nice to see. And graph traversal is a decent
> way of showing some of that.
> 
> I'll see if I can't fix up your iter_random_one(), though :).

That would be awesome!

Also, apropos blooms - http://bzr.arbash-meinel.com/plugins/pybloom/ is
the location for anyone wanting index2 rev 9 or so (coming soon).

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.