Initial results using 'hash trie'

John Arbash Meinel john at arbash-meinel.com
Tue Dec 23 23:51:16 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:

>> 16-way fan out
>> Commits: 1043
>>                       Raw    %    Compressed    %  Objects
>> Revisions:       3990 KiB   0%      1228 KiB   6%     1043
>> Inventories:    11982 KiB   1%      7882 KiB  38%    19390
>> Texts:         882272 KiB  98%     11174 KiB  55%     7226
>> Signatures:         0 KiB   0%         0 KiB   0%        0
>> Total:         898245 KiB 100%     20285 KiB 100%    27659
> 
>> Extra Info:           count    total  avg stddev  min  max
>> internal node refs     9906   127314   12    5.3    7   16
>> internal p_id refs      621     5373    8    4.6    2   16
>> inv depth              7232    28832    3    2.4    1    4
>> leaf node items        7232    15841    2    1.7    1   14
>> leaf p_id items         588     7550   12   11.3    1   52
>> p_id depth              588     2505    4    1.7    1    6
> 

16-way fan out, delta-compressed
Commits: 1043
                      Raw    %    Compressed    %  Objects
Revisions:       3990 KiB   0%      1228 KiB   7%     1043
Inventories:    11967 KiB   1%      4507 KiB  26%    19412
Texts:         882272 KiB  98%     11174 KiB  66%     7226
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:         898230 KiB 100%     16910 KiB 100%    27681

Extra Info:           count    total  avg stddev  min  max
internal node refs     9926   127495   12    5.3    7   16
internal p_id refs      623     5396    8    4.7    2   16
inv depth              7233    28856    3    2.4    1    4
leaf node items        7233    15616    2    1.4    1   14
leaf p_id items         587     7474   12   11.0    1   52
p_id depth              587     2501    4    1.7    1    6

> 
>> 255-way fan out
>> Commits: 1043
>>                       Raw    %    Compressed    %  Objects
>> Revisions:       3990 KiB   0%      1228 KiB   4%     1043
>> Inventories:    30982 KiB   3%     16000 KiB  56%    11993
>> Texts:         882272 KiB  96%     11174 KiB  39%     7226
>> Signatures:         0 KiB   0%         0 KiB   0%        0
>> Total:         917245 KiB 100%     28403 KiB 100%    20262
> 
>> Extra Info:           count    total  avg stddev  min  max
>> internal node refs     1932   279512  144  120.2   12  255
>> internal p_id refs      343    26541   77   46.8    2  169
>> inv depth              7029    15989    2    1.0    1    3
>> leaf node items        7029    54469    7    5.5    1   14
>> leaf p_id items        1646     5342    3    6.8    1   52
>> p_id depth             1646     5458    3    1.4    1    4
> 

255-way fan out, delta compressed
$ time wbzr repository-details d4-255-delta-mysql/
Commits: 1043
                      Raw    %    Compressed    %  Objects
Revisions:       3990 KiB   0%      1228 KiB   7%     1043
Inventories:    30981 KiB   3%      4063 KiB  24%    12090
Texts:         882272 KiB  96%     11174 KiB  67%     7226
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:         917244 KiB 100%     16466 KiB 100%    20359

Extra Info:           count    total  avg stddev  min  max
internal node refs     1976   280105  141  120.5   12  255
internal p_id refs      344    26582   77   46.6    2  169
inv depth              7081    16189    2    1.0    1    3
leaf node items        7081    53958    7    5.3    1   14
leaf p_id items        1646     5297    3    6.7    1   52
p_id depth             1646     5459    3    1.4    1    4

255-way fan out, delta compressed, no prefix extraction


So it seems that adding delta compression does a lot to "homogenize" the
results. With the 255-way fan out, compressed inventories had been using
16MB and now drop to 4MB, while with the 16-way fan out they were 7.8MB
and only drop to 4.5MB.
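
As a quick check of that arithmetic, here is a minimal Python sketch that
computes how much delta compression shrinks the compressed inventories.
The sizes are copied from the tables above; the names inventories_kib,
percent_saved and summarize are made up for this illustration and are
not part of bzr.

```python
# Compressed inventory sizes in KiB, copied from the tables above.
inventories_kib = {
    "16-way": {"plain": 7882, "delta": 4507},
    "255-way": {"plain": 16000, "delta": 4063},
}


def percent_saved(plain, delta):
    """Percentage by which delta compression shrinks the compressed size."""
    return 100.0 * (plain - delta) / plain


def summarize(sizes):
    """Map each fan out to its saving, rounded to one decimal place."""
    return {
        name: round(percent_saved(s["plain"], s["delta"]), 1)
        for name, s in sizes.items()
    }


savings = summarize(inventories_kib)
for name, pct in savings.items():
    print("%s inventories: %.1f%% smaller with delta compression" % (name, pct))
```

The 255-way layout saves far more in relative terms, which is why the
two layouts end up within about 450 KiB of each other.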

Also worth noting, though, is that the time to convert with delta
compression increases a lot, from about 2min to 3m30s (for the 16-way
it is 2m40s). It was still bad even when I tried to add code to use any
cached parent texts. So the cost seems to be the time to compute the
delta, rather than the time to extract the previous text.


It also turns out that using delta compression breaks the pack-to-pack
fetching. The fetch code was reading the lines and finding the
references directly, but now it doesn't have the fulltext, so it can't
do the same tricks.
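
To illustrate why scanning stored lines stops working, here is a toy
Python sketch. It assumes a made-up record format in which each line
looks like "ref: <key>", and it uses difflib to build a simple line
delta. Neither the format nor the helpers find_refs and line_delta
come from bzr; they only show the general effect.

```python
import difflib


def find_refs(lines):
    """Collect keys from lines of the hypothetical form 'ref: <key>'."""
    return {line.split(": ", 1)[1] for line in lines if line.startswith("ref: ")}


def line_delta(old, new):
    """Return only the lines that were inserted or replaced going old -> new."""
    matcher = difflib.SequenceMatcher(None, old, new)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


# Hypothetical parent and child node texts that differ in one reference.
parent = ["ref: sha1:aaa", "ref: sha1:bbb", "ref: sha1:ccc"]
child = ["ref: sha1:aaa", "ref: sha1:bbb", "ref: sha1:ddd"]

# Scanning the fulltext sees every reference the child holds.
refs_from_fulltext = find_refs(child)

# Scanning only the stored delta sees just the changed line.
refs_from_delta = find_refs(line_delta(parent, child))

missed = refs_from_fulltext - refs_from_delta
print("missed when scanning the delta:", sorted(missed))
```

In this toy case the unchanged references are simply absent from the
delta, so a scanner that reads stored lines directly would have to
rebuild the fulltext first.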


I also tested whether prefix extraction helps or hurts, and in the end
it helps. My guess is that it is so beneficial for the p_id map that the
gains outweigh the losses in delta compression.
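
For readers unfamiliar with the idea, here is a minimal Python sketch of
prefix extraction: the prefix shared by all keys in a node is stored
once and each entry keeps only its suffix. The keys below are invented,
and extract_prefix and restore are hypothetical helpers, not the actual
bzr serialisation code.

```python
import os.path


def extract_prefix(keys):
    """Split keys into their common leading string and per-key suffixes."""
    prefix = os.path.commonprefix(keys)  # character-wise common prefix
    return prefix, [key[len(prefix):] for key in keys]


def restore(prefix, suffixes):
    """Rebuild the original keys from the stored prefix and suffixes."""
    return [prefix + suffix for suffix in suffixes]


# Invented keys that share a long common prefix.
keys = [
    "file-id-20081223-aaaa",
    "file-id-20081223-aabb",
    "file-id-20081223-abcd",
]

prefix, suffixes = extract_prefix(keys)
raw_size = sum(len(key) for key in keys)
packed_size = len(prefix) + sum(len(suffix) for suffix in suffixes)

print("prefix:", prefix)
print("suffixes:", suffixes)
print("characters stored: %d -> %d" % (raw_size, packed_size))
```

The saving grows with the number of keys that share the prefix, which
fits the guess that maps with many similar keys benefit the most.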

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklReXQACgkQJdeBCYSNAAPHcwCglFw88ta2bTQqVCIB6IbfL8Q7
9b8AnjLD7OXh5MSxCsNysvoo7kwmvI5a
=O2TL
-----END PGP SIGNATURE-----


