Initial results using 'hash trie'

John Arbash Meinel john at arbash-meinel.com
Tue Dec 23 21:52:52 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> I hacked together some work to use the zlib crc32 instead of the raw
> file-id as the lookup key for split inventories. I got some interesting
> results.

...

> The results for the first 1034 revisions of mysql are as follows:
> 
> original (no hash)
> Commits: 1043
>                       Raw    %    Compressed    %  Objects
> Revisions:       3990 KiB   0%      1228 KiB   5%     1043
> Inventories:    17905 KiB   1%      9503 KiB  43%    20970
> Texts:         882272 KiB  97%     11174 KiB  51%     7226
> Signatures:         0 KiB   0%         0 KiB   0%        0
> Total:         904168 KiB 100%     21906 KiB 100%    29239
> 
> Extra Info:           count    total  avg stddev  min  max
> internal node refs    11864   114461    9    8.1    2   27
> internal p_id refs     1256     6840    5    5.7    2   24
> inv depth              6005    35566    5    3.5    1    9
> leaf node items        6005    34646    5    5.2    1   16
> leaf p_id items         802     6566    8    9.1    1   33
> p_id depth              802     7050    8    4.2    1   14
> 
> 
> 
> 16-way fan out
> Commits: 1043
>                       Raw    %    Compressed    %  Objects
> Revisions:       3990 KiB   0%      1228 KiB   6%     1043
> Inventories:    11982 KiB   1%      7882 KiB  38%    19390
> Texts:         882272 KiB  98%     11174 KiB  55%     7226
> Signatures:         0 KiB   0%         0 KiB   0%        0
> Total:         898245 KiB 100%     20285 KiB 100%    27659
> 
> Extra Info:           count    total  avg stddev  min  max
> internal node refs     9906   127314   12    5.3    7   16
> internal p_id refs      621     5373    8    4.6    2   16
> inv depth              7232    28832    3    2.4    1    4
> leaf node items        7232    15841    2    1.7    1   14
> leaf p_id items         588     7550   12   11.3    1   52
> p_id depth              588     2505    4    1.7    1    6
> 
> 
> 255-way fan out
> Commits: 1043
>                       Raw    %    Compressed    %  Objects
> Revisions:       3990 KiB   0%      1228 KiB   4%     1043
> Inventories:    30982 KiB   3%     16000 KiB  56%    11993
> Texts:         882272 KiB  96%     11174 KiB  39%     7226
> Signatures:         0 KiB   0%         0 KiB   0%        0
> Total:         917245 KiB 100%     28403 KiB 100%    20262
> 
> Extra Info:           count    total  avg stddev  min  max
> internal node refs     1932   279512  144  120.2   12  255
> internal p_id refs      343    26541   77   46.8    2  169
> inv depth              7029    15989    2    1.0    1    3
> leaf node items        7029    54469    7    5.5    1   14
> leaf p_id items        1646     5342    3    6.8    1   52
> p_id depth             1646     5458    3    1.4    1    4
> 

3,8-way fan out '%011o'
Commits: 1043
                      Raw    %    Compressed    %  Objects
Revisions:       3990 KiB   0%      1228 KiB   6%     1043
Inventories:    12352 KiB   1%      8015 KiB  39%    21870
Texts:         882272 KiB  98%     11174 KiB  54%     7226
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:         898615 KiB 100%     20417 KiB 100%    30139

Extra Info:           count    total  avg stddev  min  max
internal node refs    12613    91665    7    2.4    2    8
internal p_id refs      921     4501    4    3.7    2    8
inv depth              6734    33573    4    2.8    1    5
leaf node items        6734    22515    3    2.2    1   14
leaf p_id items         559     8491   15   11.9    1   52
p_id depth              559     3146    5    2.8    1    8



8-way fan out '%011o'[1:]
Commits: 1043
                      Raw    %    Compressed    %  Objects
Revisions:       3990 KiB   0%      1228 KiB   5%     1043
Inventories:    16069 KiB   1%      8979 KiB  41%    18495
Texts:         882272 KiB  97%     11174 KiB  52%     7226
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:         902332 KiB 100%     21382 KiB 100%    26764

Extra Info:           count    total  avg stddev  min  max
internal node refs     9900    79147    7    3.7    7    8
internal p_id refs      728     4536    6    2.6    2    8
inv depth              6260    25037    3    2.4    1    4
leaf node items        6260    37408    5    4.0    1   13
leaf p_id items         564     8164   14   10.9    1   52
p_id depth              564     2551    4    2.2    1    6


So interestingly, the number of redundant leaf items went down in both
ways, the overall depth increased (inv depth 3 => 4 for 3,8 breakout).
But the total bytes actually did go up (12MB => 12.4MB).


Next up is going to be bringing in the knit-delta hack, just to see how
things end up. It also ends up bringing a change in the serializer, so
that a line-delta can be more optimal on these texts.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklRXbQACgkQJdeBCYSNAAN88QCgyCC2IJ/54hmraeziSEJSEnYX
hsMAoNl+o8U+bd4K6etQ7YN2uB9Lhpgk
=tDcb
-----END PGP SIGNATURE-----



More information about the bazaar mailing list