Effects of customizing search-key hash

John Arbash Meinel john at arbash-meinel.com
Thu Feb 26 21:40:43 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Some might remember my earlier work playing with the various 'hash'
functions for splitting up the CHK pages. I don't remember my specific
equation, but basically, the total consumed size was a factor of the
size of the root node (which changes every commit), and then the depth
of the tree.

There are other factors, like average number of changes, etc. With a
hashed tree, you expect each change to change a leaf node, and then
every internal node up to the top. And you expect that internal nodes
are unlikely to overlap (except the root).

I played with a few more hash functions. We have the original '16' which
has 16-way fan out at each level, and 255 which has 255-way at every level.

I then played with several that had a smaller fan-out at the root (to
shrink that page) and then the 255-way fan out for all other nodes.

16-way  Inventories:    21897 KiB   2%     10834 KiB  46%    15374
255-way Inventories:    31012 KiB   3%     15987 KiB  56%    12090

16/255	Inventories:    31601 KiB   3%     19090 KiB  60%    15197
63/255  Inventories:    18043 KiB   1%     11604 KiB  48%    17151
127/255 Inventories:    16763 KiB   1%     10909 KiB  46%    17712
127-way Inventories:    16647 KiB   1%     10769 KiB  46%    17627

So 127 seems to be the 'sweet spot' for this tree. Also all of this is
without compression. I still want to test this with compression. Though
I noticed recently that the 'gc' format repositories weren't getting
better compression than plain zlib compression for the chk pages. I
think I know why, I just need to spend some time to fix it.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmnDFsACgkQJdeBCYSNAAMSagCfZJImHW5IQLG03UcSZrcQvARH
MCwAnj6Sf6Vagkhd7EkVGofa+SDtyCUe
=isA+
-----END PGP SIGNATURE-----



More information about the bazaar mailing list