btree index pre-reading

Thu Oct 9 02:03:01 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> I wrote up this document to outline some ideas about having btree
> indexes do a bit of "pre-reading". At the moment, they read as 4k pages,
> but they never read more than the minimum number of pages to fulfill the
> current request.
> 
> I thought it might make a decent developer documentation for explaining
> why we do it this way, rather than just having an email which explains it.
> 
> I'd appreciate any feedback, especially wrt the algorithm itself, as I
> plan on implementing the ideas here.
> 
> John
> =:->

Nobody has commented on this yet, but I figured I'd post some results.

First, I have 2 sets of indexes. One is the standard index, generated by
the current default algorithm. The second I turned up the 'fit keys per
page' algorithm to its max setting. I also have a "GraphIndex"
repository for comparison.

I'll also note that small differences are probably not significant, as
I'm doing this over the internet, with lots of variation.

bzr log --short -r -100..-1
	regular	packed	reg+expand	packed+exp	graphindex
	9.7	9.8	11.7		11.7		30.3

so 'log --short' is a little bit slower with 64kB reads, because
everything can actually be returned in a single or two from each index.
(We don't save any round trips.) This isn't terribly surprising,
considering all the keys for bzr.dev will be clustered around "pqm@" and
 over 100 revisions, probably not too far apart date-wise.

bzr log --long -r -100..-1
	regular	packed	reg+expand	packed+exp	graphindex
	47.5	41.0	23		21.6		45

Take these numbers with a little salt, as I saw variance from 45s =>
1m12s for some of these cases.

Packing harder made the indexes ~75% the size (3.7MB versus 4.7MB).
This has a larger effect on non-expanded request. I would imagine
because with expansion we end up reading ~ the same amount of data
either way.

Expansion here has a huge effect, though, cutting the total time in
half. In fact, without expansion the plain graph-index code is
approximately as efficient.

log --long is a bit different, though. As it grabs the entire revision
history, but doesn't grab anything but revisions.

bzr pull # of bzr.dev 3759 to 3763

	regular	packed	reg+expand	packed+exp	graphindex
	14.8	14.8	22.6		21.8		49.6

Again interestingly enough, with only 4 revisions being pulled,
expansion doesn't help, because we don't actually need more than one
page of the target. I can ameliorate this effect by using a check for
having read < 2 nodes from the remote repository, then I don't expand
yet. This negatively impacts 'bzr log' times, but positively impacts
'bzr pull' times.

Though again, it shows some big wins overall with getting btree indexes
in a stable form for people to use.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkjtWEUACgkQJdeBCYSNAAPIdACgnLqTiNIjyz5nigxrWDSIBT8u
m30AoNoDtdasrSmz7SIaKbeV6NJ1Dxeb
=Rf0k
-----END PGP SIGNATURE-----