New toy for bzr developers: B+Tree index sketch

Robert Collins robertc at robertcollins.net
Wed Jul 2 00:40:28 BST 2008


On Tue, 2008-07-01 at 16:41 -0500, John Arbash Meinel wrote:
> Robert Collins wrote:
> | On Tue, 2008-07-01 at 14:29 -0500, John Arbash Meinel wrote:
> |
> |> Attached is a patch which moves the page size into a constant, to make
> |> it easier to modify if we want to test for optimal page sizes on large
> |> projects. Specifically, you had a place where you did 4096 - 100 = 3996
> |> and then hard-coded that, which caused weird problems when I replaced
> | all occurrences of 4096 :).
> |
> | Thanks. This is good; I have been wondering about putting this in the
> | header to make it completely tunable; at the cost of needing an extra
> | read if the first read doesn't contain the root node.
> |
> | -Rob
> 
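To sketch the tunable-header idea quoted above (hypothetical header
format and names, not bzrlib's actual layout): the cost is one extra
read only when the header declares a page larger than the speculative
first read.

DEFAULT_PAGE_SIZE = 4096  # speculative size for the first read

def _parse_page_size(data):
    # Hypothetical header format: a first line like "B+Tree page_size=4096".
    header = data.split(b'\n', 1)[0]
    return int(header.split(b'page_size=')[1])

def read_first_page(f):
    """Read the first page, re-reading only if the header declares a
    larger page size than the speculative first read covered."""
    data = f.read(DEFAULT_PAGE_SIZE)
    page_size = _parse_page_size(data)
    if page_size > DEFAULT_PAGE_SIZE:
        f.seek(0)  # root node wasn't in the first read: one extra read
        data = f.read(page_size)
    return data, page_size
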
> I wanted to ask a bit about how these are written out.

_BuilderRow is the class accumulating the data for each row.
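Roughly, as a sketch of the shape only (not the real implementation;
everything here beyond the class name is illustrative): each row owns a
spool file and appends fixed-size pages to it as they fill.

import tempfile

_PAGE_SIZE = 4096

class _BuilderRow(object):
    """Accumulates the serialised pages for one row (level) of the tree."""

    def __init__(self):
        self.spool = tempfile.TemporaryFile()  # one temp file per row
        self.nodes = 0                         # pages written so far

    def write_page(self, page_bytes):
        assert len(page_bytes) <= _PAGE_SIZE
        # Pad to the fixed page size so page offsets stay computable.
        self.spool.write(page_bytes.ljust(_PAGE_SIZE, b'\x00'))
        self.nodes += 1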

> It seems like you write all the leaf nodes, and then work out what the
> next level nodes are, write them, if that spills, work out the next, etc.
> 
> Which means that you write leaves => internal 1 => internal 2.

Yes.

> But isn't the final structure "internal 2 | internal 1 | leaf" ? At
> least, I would assume that gives better forward reads.

Yes.

> So is it just that you stage into a temp file, and then copy the values
> over when finished? Certainly you won't know how many internals you need
> until you are done, because you don't know how much you can fit into a node.

One temp file per row, and then the rows are re-read from there into the
final index.
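
So the finish step just streams each row's temp file into the final
index, root row first; a minimal sketch, assuming the rows list is held
leaf-first in the order the rows were created:

def copy_rows(rows, out):
    """Copy per-row spool files into the final index, root row first,
    giving the on-disk order internal 2 | internal 1 | leaf."""
    for row in reversed(rows):  # rows were built leaf-first
        row.spool.seek(0)
        while True:
            chunk = row.spool.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)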

Another thing that isn't addressed yet is that building the index holds
all the nodes in memory. That scales badly, but we do still access some
newly-added keys during some operations, I think. Anyhow, my plan there
is to have an LRU cache holding recent keys, separately group keys into
batches of 5K or 10K, and drop each batch to a tempfile. At finish, do a
merge sort: read two batches incrementally while writing a double-length
batch to another temp file, then loop with doubling batch sizes until
it's fully sorted. Or variations thereon, such as skipping the final
merge and pulling straight from the nearly-merged streams into the tree
serialiser.
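
That merge loop would look something like this (a sketch under assumed
details: sorted, newline-terminated keys, BATCH keys per spilled run,
and runs loaded whole for brevity where the real thing would stream
them incrementally):

import heapq
import tempfile

BATCH = 5000  # keys per spilled run, per the 5K-10K figure above

def _read_run(f, length):
    """Read up to `length` newline-terminated records from `f`."""
    records = []
    for _ in range(length):
        line = f.readline()
        if not line:
            break
        records.append(line)
    return records

def external_merge_sort(runs_file, n_runs):
    """Merge adjacent pairs of sorted runs into double-length runs in a
    fresh temp file, looping until one fully sorted run remains."""
    run_len = BATCH
    while n_runs > 1:
        runs_file.seek(0)
        merged = tempfile.TemporaryFile()
        for _ in range((n_runs + 1) // 2):
            left = _read_run(runs_file, run_len)
            right = _read_run(runs_file, run_len)
            for record in heapq.merge(left, right):
                merged.write(record)
        runs_file, run_len = merged, run_len * 2
        n_runs = (n_runs + 1) // 2
    return runs_file  # a single sorted run

Skipping the final merge, as above, would just mean handing the last
heapq.merge iterator straight to the tree serialiser instead of writing
it out.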

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.