btree index pre-reading
John Arbash Meinel
john at arbash-meinel.com
Thu Oct 9 02:03:01 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> I wrote up this document to outline some ideas about having btree
> indexes do a bit of "pre-reading". At the moment, they read as 4k pages,
> but they never read more than the minimum number of pages to fulfill the
> current request.
>
> I thought it might make a decent developer documentation for explaining
> why we do it this way, rather than just having an email which explains it.
>
> I'd appreciate any feedback, especially wrt the algorithm itself, as I
> plan on implementing the ideas here.
>
> John
> =:->
Nobody has commented on this yet, but I figured I'd post some results.
First, I have 2 sets of indexes. One is the standard index, generated by
the current default algorithm. The second I turned up the 'fit keys per
page' algorithm to its max setting. I also have a "GraphIndex"
repository for comparison.
I'll also note that small differences are probably not significant, as
I'm doing this over the internet, with lots of variation.
bzr log --short -r -100..-1
regular packed reg+expand packed+exp graphindex
9.7 9.8 11.7 11.7 30.3
so 'log --short' is a little bit slower with 64kB reads, because
everything can actually be returned in a single or two from each index.
(We don't save any round trips.) This isn't terribly surprising,
considering all the keys for bzr.dev will be clustered around "pqm@" and
over 100 revisions, probably not too far apart date-wise.
bzr log --long -r -100..-1
regular packed reg+expand packed+exp graphindex
47.5 41.0 23 21.6 45
Take these numbers with a little salt, as I saw variance from 45s =>
1m12s for some of these cases.
Packing harder made the indexes ~75% the size (3.7MB versus 4.7MB).
This has a larger effect on non-expanded request. I would imagine
because with expansion we end up reading ~ the same amount of data
either way.
Expansion here has a huge effect, though, cutting the total time in
half. In fact, without expansion the plain graph-index code is
approximately as efficient.
log --long is a bit different, though. As it grabs the entire revision
history, but doesn't grab anything but revisions.
bzr pull # of bzr.dev 3759 to 3763
regular packed reg+expand packed+exp graphindex
14.8 14.8 22.6 21.8 49.6
Again interestingly enough, with only 4 revisions being pulled,
expansion doesn't help, because we don't actually need more than one
page of the target. I can ameliorate this effect by using a check for
having read < 2 nodes from the remote repository, then I don't expand
yet. This negatively impacts 'bzr log' times, but positively impacts
'bzr pull' times.
Though again, it shows some big wins overall with getting btree indexes
in a stable form for people to use.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkjtWEUACgkQJdeBCYSNAAPIdACgnLqTiNIjyz5nigxrWDSIBT8u
m30AoNoDtdasrSmz7SIaKbeV6NJ1Dxeb
=Rf0k
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list