Initial results using 'hash trie'
John Arbash Meinel
john at arbash-meinel.com
Tue Dec 23 21:52:52 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> I hacked together some work to use the zlib crc32 instead of the raw
> file-id as the lookup key for split inventories. I got some interesting
> results.
...
> The results for the first 1034 revisions of mysql are as follows:
>
> original (no hash)
> Commits: 1043
> Raw % Compressed % Objects
> Revisions: 3990 KiB 0% 1228 KiB 5% 1043
> Inventories: 17905 KiB 1% 9503 KiB 43% 20970
> Texts: 882272 KiB 97% 11174 KiB 51% 7226
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 904168 KiB 100% 21906 KiB 100% 29239
>
> Extra Info: count total avg stddev min max
> internal node refs 11864 114461 9 8.1 2 27
> internal p_id refs 1256 6840 5 5.7 2 24
> inv depth 6005 35566 5 3.5 1 9
> leaf node items 6005 34646 5 5.2 1 16
> leaf p_id items 802 6566 8 9.1 1 33
> p_id depth 802 7050 8 4.2 1 14
>
>
>
> 16-way fan out
> Commits: 1043
> Raw % Compressed % Objects
> Revisions: 3990 KiB 0% 1228 KiB 6% 1043
> Inventories: 11982 KiB 1% 7882 KiB 38% 19390
> Texts: 882272 KiB 98% 11174 KiB 55% 7226
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 898245 KiB 100% 20285 KiB 100% 27659
>
> Extra Info: count total avg stddev min max
> internal node refs 9906 127314 12 5.3 7 16
> internal p_id refs 621 5373 8 4.6 2 16
> inv depth 7232 28832 3 2.4 1 4
> leaf node items 7232 15841 2 1.7 1 14
> leaf p_id items 588 7550 12 11.3 1 52
> p_id depth 588 2505 4 1.7 1 6
>
>
> 255-way fan out
> Commits: 1043
> Raw % Compressed % Objects
> Revisions: 3990 KiB 0% 1228 KiB 4% 1043
> Inventories: 30982 KiB 3% 16000 KiB 56% 11993
> Texts: 882272 KiB 96% 11174 KiB 39% 7226
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 917245 KiB 100% 28403 KiB 100% 20262
>
> Extra Info: count total avg stddev min max
> internal node refs 1932 279512 144 120.2 12 255
> internal p_id refs 343 26541 77 46.8 2 169
> inv depth 7029 15989 2 1.0 1 3
> leaf node items 7029 54469 7 5.5 1 14
> leaf p_id items 1646 5342 3 6.8 1 52
> p_id depth 1646 5458 3 1.4 1 4
>
3,8-way fan out '%011o'
Commits: 1043
Raw % Compressed % Objects
Revisions: 3990 KiB 0% 1228 KiB 6% 1043
Inventories: 12352 KiB 1% 8015 KiB 39% 21870
Texts: 882272 KiB 98% 11174 KiB 54% 7226
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 898615 KiB 100% 20417 KiB 100% 30139
Extra Info: count total avg stddev min max
internal node refs 12613 91665 7 2.4 2 8
internal p_id refs 921 4501 4 3.7 2 8
inv depth 6734 33573 4 2.8 1 5
leaf node items 6734 22515 3 2.2 1 14
leaf p_id items 559 8491 15 11.9 1 52
p_id depth 559 3146 5 2.8 1 8
8-way fan out '%011o'[1:]
Commits: 1043
Raw % Compressed % Objects
Revisions: 3990 KiB 0% 1228 KiB 5% 1043
Inventories: 16069 KiB 1% 8979 KiB 41% 18495
Texts: 882272 KiB 97% 11174 KiB 52% 7226
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 902332 KiB 100% 21382 KiB 100% 26764
Extra Info: count total avg stddev min max
internal node refs 9900 79147 7 3.7 7 8
internal p_id refs 728 4536 6 2.6 2 8
inv depth 6260 25037 3 2.4 1 4
leaf node items 6260 37408 5 4.0 1 13
leaf p_id items 564 8164 14 10.9 1 52
p_id depth 564 2551 4 2.2 1 6
So interestingly, the number of redundant leaf items went down in both
ways, the overall depth increased (inv depth 3 => 4 for 3,8 breakout).
But the total bytes actually did go up (12MB => 12.4MB).
Next up is going to be bringing in the knit-delta hack, just to see how
things end up. It also ends up bringing a change in the serializer, so
that a line-delta can be more optimal on these texts.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAklRXbQACgkQJdeBCYSNAAN88QCgyCC2IJ/54hmraeziSEJSEnYX
hsMAoNl+o8U+bd4K6etQ7YN2uB9Lhpgk
=tDcb
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list