New toy for bzr developers: B+Tree index sketch
Ian Clatworthy
ian.clatworthy at internode.on.net
Wed Jul 2 06:19:02 BST 2008
Robert Collins wrote:
> I have been tinkering with a B+Tree approach to the disk indices. I
> prefer this over a hash based index because of the ability to do
> in-order traversal when desired.
>
> In short, I think we can save substantial size with the currently
> developed B+Tree and it allows addressing some of the memory problems
> Andrew has spent some time debugging.
>
> The attached script (run from the root of any pack style repository you
> want to analyze) gets the following result for bzr.dev:
>
> Original aggregate index size: 10140380
> B+Tree aggregate index size: 5644360
> Difference: 4496020
>
Here are the result for OpenOffice.org (305K revisions) and Mozilla (12.5k
revisions):
ian at humpback:~/scm-conversions/OOo/OOo.bzr$ python ../../testindex2.py
27.022841 4b48589090022a0d3167b2b8ba23bf99.rix 137.1 305550 2228 64763 0.0 -12101512
0.004205 4b48589090022a0d3167b2b8ba23bf99.six N/A 0 0 0 0.0 12
32.642130 4b48589090022a0d3167b2b8ba23bf99.iix 120.5 305550 2536 76870 0.0 -14081279
78.217462 4b48589090022a0d3167b2b8ba23bf99.tix 132.6 538404 4060 115027 0.0 -55270315
Original aggregate index size: 117596270
B+Tree aggregate index size: 36143176
Difference: 81453094
ian at humpback:~/scm-conversions/OOo/OOo.bzr$ cd ../../mozilla/mozilla.bzr
ian at humpback:~/scm-conversions/mozilla/mozilla.bzr$ python ../../testindex2.py
0.939508 0b9f34ffabd7722d88aafde4ac487cf3.rix 92.3 12456 135 5187 0.0 -355543
0.000504 0b9f34ffabd7722d88aafde4ac487cf3.six N/A 0 0 0 0.0 12
1.047417 0b9f34ffabd7722d88aafde4ac487cf3.iix 84.7 12456 147 8072 0.0 -432741
2.954725 0b9f34ffabd7722d88aafde4ac487cf3.tix 292.5 34813 119 4456 0.0 -3861195
Original aggregate index size: 6292035
B+Tree aggregate index size: 1642568
Difference: 4649467
So the improvements are nice. However to put the OOo one in context, the indexes
are already small compared to the size of the pack. Trimming 81M off 3.8G is
heading in the right direction though!
Ian C.
More information about the bazaar
mailing list