New toy for bzr developers: B+Tree index sketch
John Arbash Meinel
john at arbash-meinel.com
Wed Jul 2 06:27:03 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
| Robert Collins wrote:
|> I have been tinkering with a B+Tree approach to the disk indices. I
|> prefer this over a hash based index because of the ability to do
|> in-order traversal when desired.
|>
|> In short, I think we can save substantial size with the currently
|> developed B+Tree and it allows addressing some of the memory problems
|> Andrew has spent some time debugging.
|>
|> The attached script (run from the root of any pack style repository you
|> want to analyze) gets the following result for bzr.dev:
|>
|> Original aggregate index size: 10140380
|> B+Tree aggregate index size: 5644360
|> Difference: 4496020
|>
|
| Here are the result for OpenOffice.org (305K revisions) and Mozilla (12.5k
| revisions):
|
| ian at humpback:~/scm-conversions/OOo/OOo.bzr$ python ../../testindex2.py
| 27.022841 4b48589090022a0d3167b2b8ba23bf99.rix 137.1 305550 2228 64763
0.0 -12101512
| 0.004205 4b48589090022a0d3167b2b8ba23bf99.six N/A 0 0 0 0.0 12
| 32.642130 4b48589090022a0d3167b2b8ba23bf99.iix 120.5 305550 2536 76870
0.0 -14081279
| 78.217462 4b48589090022a0d3167b2b8ba23bf99.tix 132.6 538404 4060
115027 0.0 -55270315
| Original aggregate index size: 117596270
| B+Tree aggregate index size: 36143176
| Difference: 81453094
|
| ian at humpback:~/scm-conversions/OOo/OOo.bzr$ cd ../../mozilla/mozilla.bzr
| ian at humpback:~/scm-conversions/mozilla/mozilla.bzr$ python
../../testindex2.py
| 0.939508 0b9f34ffabd7722d88aafde4ac487cf3.rix 92.3 12456 135 5187 0.0
- -355543
| 0.000504 0b9f34ffabd7722d88aafde4ac487cf3.six N/A 0 0 0 0.0 12
| 1.047417 0b9f34ffabd7722d88aafde4ac487cf3.iix 84.7 12456 147 8072 0.0
- -432741
| 2.954725 0b9f34ffabd7722d88aafde4ac487cf3.tix 292.5 34813 119 4456 0.0
- -3861195
| Original aggregate index size: 6292035
| B+Tree aggregate index size: 1642568
| Difference: 4649467
|
| So the improvements are nice. However to put the OOo one in context,
the indexes
| are already small compared to the size of the pack. Trimming 81M off
3.8G is
| heading in the right direction though!
|
| Ian C.
|
|
I think fundamentally the indexes should always be a small portion of
the total data. (For bzr.dev you are talking 90MB vs 3-10MB). However,
they are a *critical* portion, as it is where you start from.
The data sizes are nice to see, I would also be very interested in the
performance numbers. Like, time to 'bzr log', etc.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkhrEacACgkQJdeBCYSNAAMgqACfa0klZeE00Btqaeb1fUQ8/aiA
2fkAoNlcvTnRZl6aRWYCJxrdoL/36aBy
=nCqv
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list