Initial results using 'hash trie'
John Arbash Meinel
john at arbash-meinel.com
Tue Dec 23 23:51:16 GMT 2008
John Arbash Meinel wrote:
>> 16-way fan out
>> Commits: 1043
>>                     Raw    %  Compressed    %  Objects
>> Revisions:     3990 KiB   0%    1228 KiB   6%     1043
>> Inventories:  11982 KiB   1%    7882 KiB  38%    19390
>> Texts:       882272 KiB  98%   11174 KiB  55%     7226
>> Signatures:       0 KiB   0%       0 KiB   0%        0
>> Total:       898245 KiB 100%   20285 KiB 100%    27659
>>
>> Extra Info:          count   total  avg  stddev  min  max
>> internal node refs    9906  127314   12     5.3    7   16
>> internal p_id refs     621    5373    8     4.6    2   16
>> inv depth             7232   28832    3     2.4    1    4
>> leaf node items       7232   15841    2     1.7    1   14
>> leaf p_id items        588    7550   12    11.3    1   52
>> p_id depth             588    2505    4     1.7    1    6
>
16-way fan out, delta-compressed
Commits: 1043
                    Raw    %  Compressed    %  Objects
Revisions:     3990 KiB   0%    1228 KiB   7%     1043
Inventories:  11967 KiB   1%    4507 KiB  26%    19412
Texts:       882272 KiB  98%   11174 KiB  66%     7226
Signatures:       0 KiB   0%       0 KiB   0%        0
Total:       898230 KiB 100%   16910 KiB 100%    27681

Extra Info:          count   total  avg  stddev  min  max
internal node refs    9926  127495   12     5.3    7   16
internal p_id refs     623    5396    8     4.7    2   16
inv depth             7233   28856    3     2.4    1    4
leaf node items       7233   15616    2     1.4    1   14
leaf p_id items        587    7474   12    11.0    1   52
p_id depth             587    2501    4     1.7    1    6
>
>> 255-way fan out
>> Commits: 1043
>>                     Raw    %  Compressed    %  Objects
>> Revisions:     3990 KiB   0%    1228 KiB   4%     1043
>> Inventories:  30982 KiB   3%   16000 KiB  56%    11993
>> Texts:       882272 KiB  96%   11174 KiB  39%     7226
>> Signatures:       0 KiB   0%       0 KiB   0%        0
>> Total:       917245 KiB 100%   28403 KiB 100%    20262
>>
>> Extra Info:          count   total  avg  stddev  min  max
>> internal node refs    1932  279512  144   120.2   12  255
>> internal p_id refs     343   26541   77    46.8    2  169
>> inv depth             7029   15989    2     1.0    1    3
>> leaf node items       7029   54469    7     5.5    1   14
>> leaf p_id items       1646    5342    3     6.8    1   52
>> p_id depth            1646    5458    3     1.4    1    4
>
255-way fan out, delta compressed
$ time wbzr repository-details d4-255-delta-mysql/
Commits: 1043
                    Raw    %  Compressed    %  Objects
Revisions:     3990 KiB   0%    1228 KiB   7%     1043
Inventories:  30981 KiB   3%    4063 KiB  24%    12090
Texts:       882272 KiB  96%   11174 KiB  67%     7226
Signatures:       0 KiB   0%       0 KiB   0%        0
Total:       917244 KiB 100%   16466 KiB 100%    20359

Extra Info:          count   total  avg  stddev  min  max
internal node refs    1976  280105  141   120.5   12  255
internal p_id refs     344   26582   77    46.6    2  169
inv depth             7081   16189    2     1.0    1    3
leaf node items       7081   53958    7     5.3    1   14
leaf p_id items       1646    5297    3     6.7    1   52
p_id depth            1646    5459    3     1.4    1    4
255-way fan out, delta compressed, no prefix extraction

So it seems that adding delta compression does a lot to "homogenize" the
results. The 255-way inventories had been using 16MB compressed and now
drop to 4MB, while the 16-way inventories were 7.8MB and only drop to
4.5MB.
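
As a rough check on those figures, the relative savings can be worked out
from the compressed inventory sizes in the tables above. This is only a
small arithmetic sketch; the dictionary layout and the helper name
reduction_percent are made up for illustration.

```python
# Compressed inventory sizes in KiB, copied from the tables above.
inventories_kib = {
    "16-way": {"plain": 7882, "delta": 4507},
    "255-way": {"plain": 16000, "delta": 4063},
}


def reduction_percent(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return 100.0 * (before - after) / before


for name, sizes in inventories_kib.items():
    pct = reduction_percent(sizes["plain"], sizes["delta"])
    print(f"{name}: {sizes['plain']} KiB -> {sizes['delta']} KiB "
          f"({pct:.1f}% smaller)")
# 16-way: 7882 KiB -> 4507 KiB (42.8% smaller)
# 255-way: 16000 KiB -> 4063 KiB (74.6% smaller)
```

Before delta compression the 255-way inventories were about twice the
size of the 16-way ones; afterwards they are slightly smaller.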

Also worth noting, though, is that the time to convert increases a lot
with delta compression: from about 2min to 3m30s (for the 16-way it is
2m40s). It was still slow even when I added code to reuse any cached
parent texts, so the cost seems to be the time to compute the delta
rather than the time to extract the previous text.

It also turns out that using delta compression breaks pack-to-pack
fetching. The fetch code was reading the stored lines and finding the
references directly, but with deltas it no longer has the fulltext, so
it can't do the same tricks.

I also tested whether prefix extraction helps or hurts, and in the end
it helps. My guess is that it is so beneficial for the p_id map that the
gains there outweigh the losses in delta compression.
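
For readers unfamiliar with the term, here is a minimal sketch of prefix
extraction, assuming it means storing the longest common prefix of a
node's keys once and keeping only the suffix for each item. The function
names and the sample keys are hypothetical and are not taken from the
actual bzr code.

```python
import os.path


def extract_prefix(keys):
    """Split keys into (common_prefix, suffixes).

    os.path.commonprefix compares character by character, so it works
    for arbitrary strings, not just filesystem paths.
    """
    prefix = os.path.commonprefix(keys)
    return prefix, [key[len(prefix):] for key in keys]


def restore_keys(prefix, suffixes):
    """Rebuild the original keys from the stored prefix and suffixes."""
    return [prefix + suffix for suffix in suffixes]


# Hypothetical keys that share a long leading segment.
keys = ["dir-20081223-a", "dir-20081223-b", "dir-20081223-c"]
prefix, suffixes = extract_prefix(keys)
print(prefix, suffixes)
print(restore_keys(prefix, suffixes) == keys)
```

When many keys in a node share a long leading segment, the shared part
is written once per node instead of once per item.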
John
=:->