B+Tree indices: ongoing progress

Robert Collins robertc at robertcollins.net
Wed Jul 2 22:20:42 BST 2008


On Wed, 2008-07-02 at 09:37 -0500, John Arbash Meinel wrote:
> 
> 1) IO gets a lot more expensive after a cache drop, but more
> importantly when accessing over remote. So we *might* end up with a
> bigger win. Though if we are switching over to RemoteRPC calls, then
> local access to indexes is the only real priority.

Launchpad and other hosting sites run tens of thousands of branches, so
disk cache misses are likely to be much more common there than on user
machines, where the first op misses the cache and later ones hit it. So
I felt it was important to be able to get some impression of that cost.

> 2) One problem is that you don't store any of your sha1 values. So at
> each level that there is a bloom, you do *another* sha1 to check if it
> exists. The sha1 would never change, and if all the bloom sizes are
> the same, the offsets don't change either. Actually, we could even
> save the struct.unpack() since the integers derived from the sha1
> don't change. We just mask them to get the right location in the bloom
> itself. So there is some trivial caching that we could do, which
> should (possibly dramatically) decrease the cost of bloom searches.

Indeed. I think the best thing to do immediately is to only check the
second row's bloom, never the higher rows, as we can reasonably expect
the higher ones to be effectively black (nearly all bits set) - or,
alternatively, something like your next point. A rough sketch of the
offset caching you describe is below.
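
Something along these lines, assuming the key hashes to five 32-bit
integers unpacked from its SHA-1 and that every bloom is a power-of-two
number of bits so masking works as a cheap modulo - the names here are
illustrative only, not the actual btree index code:

    import struct
    from hashlib import sha1

    class BloomOffsetCache:
        """Remember the SHA-1 derived integers per key, so checking a
        bloom at every row of the tree doesn't redo the hash or the
        struct.unpack()."""

        def __init__(self):
            self._ints = {}

        def key_ints(self, key):
            # key is a bytes object; the five integers never change,
            # so compute them once and reuse them for every bloom.
            ints = self._ints.get(key)
            if ints is None:
                digest = sha1(key).digest()          # 20 bytes
                ints = struct.unpack('>5L', digest)  # five 32-bit ints
                self._ints[key] = ints
            return ints

        def maybe_present(self, key, bloom_bytes, num_bits):
            # bloom_bytes: the raw bloom bitmap; num_bits is assumed to
            # be a power of two, so masking picks the bit position.
            mask = num_bits - 1
            for value in self.key_ints(key):
                bit = value & mask
                if not (bloom_bytes[bit >> 3] & (1 << (bit & 7))):
                    return False  # definitely absent
            return True  # possibly present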

> 3) What if we put a single :bloom: record into the overall structure,
> instead of having one at each node. (or maybe both). The idea is that
> you can fill a 4096 byte page with a bloom filter, and then just go
> look it up 1 time on first request. If we find it works really well,
> we can change the logic so that page[0] is the root node, and page[1]
> is the bloom. (Or the reverse, but having a fixed-size bloom is
> probably best.)
>
> The bloom could even be spread over multiple pages. We know how many
> nodes are in the index from the header. If we just always allocated N
> bytes for the overall bloom filter, we would get approx 2% false
> positive rate.
> 
> For the 1.1M keys in OOo, that would bloat the index by 1.1MB. But as
> the index itself is 36MB, that is only 3% overhead. And if we *really*
> want, you can just read the pages of the bloom for the keys you have
> actually hit. Put another way...

Yes, this too would work.
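
As a quick sanity check of the ~2% figure: the usual bloom filter
false positive approximation p = (1 - e^(-k*n/m))^k, with n = 1.1M
keys, m = 1.1MB worth of bits (8 bits per key) and k = 5 probes (the
five 32-bit values from the SHA-1), comes out at roughly 2.2%:

    from math import exp

    n = 1.1e6      # keys in the OOo index
    m = n * 8      # bits: 1.1MB of bloom, i.e. 8 bits per key
    k = 5          # probes per key (five ints from the SHA-1)
    p = (1 - exp(-k * n / m)) ** k
    print("%.1f%%" % (p * 100))   # prints 2.2%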

> bloom is an array of fixed length. When we check the offsets for a
> key, those bits will fall at specific locations. Say we hit [10, 20,
> 500, 600], we don't have to read the stuff from 40-400. If we are
> using SHA1 as our hash, we get 5 offsets per key. So to lookup a
> single key would cost (on average) 5 page reads, or 4096*5 = 20480
> bytes, rather than the full 1.1MB.
> 
> Unfortunately, as the offsets are intentionally random, adding more
> keys will tend to increase the # pages linearly. So 2 keys == 10
> pages, etc.

In particular we'd expect 5 pages of IO to find a miss, versus 3 for
three internal nodes plus the bloom before the leaf nodes.
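
Under the paged layout you describe, working out which 4096-byte pages
have to be fetched for a key is just the bit offsets divided by the
page size. A sketch, again with illustrative names only and assuming
the same five offsets per key:

    PAGE_SIZE = 4096                 # bytes per bloom page
    BITS_PER_PAGE = PAGE_SIZE * 8

    def bloom_pages_for_key(key_ints, num_bits):
        # key_ints: the five SHA-1 derived integers for the key;
        # num_bits: total size of the overall bloom filter in bits.
        # Returns the distinct page indices to read for this key -
        # at most five pages (20KB) however large the bloom grows.
        return sorted({(value % num_bits) // BITS_PER_PAGE
                       for value in key_ints})

Checking a second key would touch up to 10 distinct pages, and so on -
the linear growth you point out.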

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.