B+Tree indices: ongoing progress
Robert Collins
robertc at robertcollins.net
Wed Jul 2 04:42:06 BST 2008
On Tue, 2008-07-01 at 22:33 -0500, John Arbash Meinel wrote:
> | re safety:
> | As long as in the event of an overrun it *at worst* asserts during
> | writing, I'm _ok_. To be happy I'd like it to always work no matter what
> | (short of 4K long values :P). I expect that bzr-git revids will be very
> | high entropy and thus unlikely to get 2:1 compression.
>
> As in raises an Assert?
>
> And, IIRC we actually do a hex(git_sha1) which gives us at least 2:1
> because of the expansion to hex digits (len(x)*2 == len(hex(x))).
>
> It is trivial to error out in .finish() if compressed bytes is > chunk_size.
>
> And further, the caller *could* reset itself, because we have been
> tracking the 'bytes_in'.
>
> If you are okay with plain 'punt for now', then I'm very happy to bring
> it in.
If punt for now == raise an AssertionError - yes, please bring it in.
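
A minimal sketch of what "punt for now" could look like, assuming a
writer that compresses into a fixed 4K chunk. The names ChunkWriter,
CHUNK_SIZE and hex_compression_ratio are hypothetical and are not the
actual index2 code. finish() raises AssertionError on overrun, and
bytes_in is kept so that a caller could reset and retry. The helper at
the end is a rough check of the 2:1 claim for hex-encoded sha1s.

```python
import binascii
import os
import zlib

CHUNK_SIZE = 4096  # assumed from the "4K" mentioned in the thread


class ChunkWriter(object):
    """Hypothetical stand-in for the chunk writer under discussion."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.compressor = zlib.compressobj()
        self.bytes_in = []   # raw input, kept so a caller could reset
        self.bytes_out = []  # compressed output collected so far

    def write(self, data):
        self.bytes_in.append(data)
        out = self.compressor.compress(data)
        if out:
            self.bytes_out.append(out)

    def finish(self):
        # Flush the compressor, then refuse to hand back an oversized chunk.
        self.bytes_out.append(self.compressor.flush())
        compressed = b"".join(self.bytes_out)
        if len(compressed) > self.chunk_size:
            raise AssertionError(
                "compressed %d bytes > chunk_size %d"
                % (len(compressed), self.chunk_size))
        return compressed


def hex_compression_ratio(count=1000):
    """Compressed size / hex size for `count` random 20-byte digests."""
    hexed = binascii.hexlify(os.urandom(20 * count))
    return len(zlib.compress(hexed)) / float(len(hexed))
```

An explicit raise is used rather than an assert statement so the check
still fires when Python runs with -O.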
> |
> | re: compression level - that's because of the window size - we're dealing
> | in a very small stream anyway. We can drop the compression but we can't
> | raise it using zlib.
> |
> | re: performance - I found while tuning commit to be hg-speed that we
> | really need to eliminate fat all the way through the process. At the end
> | of that process I was seeking 0.1 second wins throughout the code base.
> | (On a 20K file tree - or 1/4 the size of this data set. So any expansion
> | here is something we will need to counteract to prevent commit()
> | becoming slower).
>
> Well, is this a commit that triggers a repack, or a plain commit?
> Because with only 1 node being added to a fresh pack file, we are
> unlikely to hit any of the node packing code.
A plain commit adds potentially very many nodes to the .tix index. It's
true that initial commits are a pathological case, but merge commits do
their own damage :).
>
> The miss torture is very nice to see. And graph traversal is a decent
> way of showing some of that.
>
> I'll see if I can't fix up your iter_random_one(), though :).
That would be awesome!
Also, apropos blooms - http://bzr.arbash-meinel.com/plugins/pybloom/ is
the location for anyone wanting index2 rev 9 or so (coming soon).
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.