Effects of customizing search-key hash
John Arbash Meinel
john at arbash-meinel.com
Thu Feb 26 21:40:43 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Some might remember my earlier work playing with the various 'hash'
functions for splitting up the CHK pages. I don't remember my specific
equation, but basically, the total consumed size was a factor of the
size of the root node (which changes every commit), and then the depth
of the tree.
There are other factors, like average number of changes, etc. With a
hashed tree, you expect each change to change a leaf node, and then
every internal node up to the top. And you expect that internal nodes
are unlikely to overlap (except the root).
I played with a few more hash functions. We have the original '16' which
has 16-way fan out at each level, and 255 which has 255-way at every level.
I then played with several that had a smaller fan-out at the root (to
shrink that page) and then the 255-way fan out for all other nodes.
16-way Inventories: 21897 KiB 2% 10834 KiB 46% 15374
255-way Inventories: 31012 KiB 3% 15987 KiB 56% 12090
16/255 Inventories: 31601 KiB 3% 19090 KiB 60% 15197
63/255 Inventories: 18043 KiB 1% 11604 KiB 48% 17151
127/255 Inventories: 16763 KiB 1% 10909 KiB 46% 17712
127-way Inventories: 16647 KiB 1% 10769 KiB 46% 17627
So 127 seems to be the 'sweet spot' for this tree. Also all of this is
without compression. I still want to test this with compression. Though
I noticed recently that the 'gc' format repositories weren't getting
better compression than plain zlib compression for the chk pages. I
think I know why, I just need to spend some time to fix it.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkmnDFsACgkQJdeBCYSNAAMSagCfZJImHW5IQLG03UcSZrcQvARH
MCwAnj6Sf6Vagkhd7EkVGofa+SDtyCUe
=isA+
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list