Initial results of chk+gc+split-inv
John Arbash Meinel
john at arbash-meinel.com
Fri Feb 13 22:53:48 GMT 2009
So I finally put together the combination of bits that we wanted. I have
a brisbane-core branch, which uses an inventory layout that uses '\n' as
the separator, and then shoves all of that into a groupcompress pack.
At the moment, I'm only testing on a branch of bzrtools going through
rev 599 (because groupcompress autopack is currently broken).
So for 998 revs of bzrtools, the results are decent.
One interesting bit: revision texts compress better than you would
expect in groupcompress repos. (484 KiB goes down to 355 KiB in plain
repositories, but down to 135 KiB in a groupcompress repo.)
There is one bit you have to be careful with: the *conversion* does
things one inventory at a time, so you get pretty much zero inter-text
benefit. You then have to branch that repository into a fresh
repository, which causes it to re-groupcompress all of the texts.
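As a rough illustration of why compressing one text at a time loses the
inter-text benefit, here is a minimal sketch that uses plain zlib as a
stand-in for groupcompress (it is not the groupcompress algorithm). The
inventory-like texts and the helper names `separate_size` and
`grouped_size` are hypothetical.

```python
import zlib


def separate_size(texts):
    """Total size when every text is compressed on its own."""
    return sum(len(zlib.compress(t)) for t in texts)


def grouped_size(texts):
    """Size when all texts are compressed together as one stream."""
    return len(zlib.compress(b"".join(texts)))


# Hypothetical stand-in for successive inventories: 200 newline-separated
# entries, with exactly one entry changed in each revision.
base = [b"file-%04d-id\x00dir/file_%04d.py\x00rev-%04d\n" % (i, i, i)
        for i in range(200)]
texts = []
for rev in range(20):
    lines = list(base)
    lines[rev] = b"file-%04d-id\x00dir/file_%04d.py\x00changed-%d\n" % (rev, rev, rev)
    texts.append(b"".join(lines))

print("one at a time:", separate_size(texts), "bytes")
print("grouped:      ", grouped_size(texts), "bytes")
```

Compressed together, each later text can refer back to the nearly
identical one before it; compressed separately, every text pays the full
cost again.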
--1.9 format:
Commits: 998
                    Raw    % Compressed    % Objects
Revisions:      484 KiB   1%    355 KiB  16%     998
Inventories:  16445 KiB  51%    617 KiB  28%     998
Texts:        15168 KiB  47%   1169 KiB  54%    1895
Signatures:      19 KiB   0%     18 KiB   0%      51
Total:        32118 KiB 100%   2159 KiB 100%    3942
--16-way fan out, no delta compression
Commits: 998
                    Raw    % Compressed    % Objects
Revisions:      484 KiB   2%    355 KiB   8%     998
Inventories:   5215 KiB  24%   2607 KiB  61%    4736
Texts:        15168 KiB  72%   1237 KiB  29%    1895
Signatures:      19 KiB   0%     18 KiB   0%      51
Total:        20888 KiB 100%   4218 KiB 100%    7680
--16-way fan out, knit-delta compression
Commits: 998
                    Raw    % Compressed    % Objects
Revisions:      484 KiB   2%    355 KiB  13%     998
Inventories:   5187 KiB  24%   1139 KiB  42%    4736
Texts:        15168 KiB  72%   1169 KiB  43%    1895
Signatures:      19 KiB   0%     18 KiB   0%      51
Total:        20859 KiB 100%   2682 KiB 100%    7680
--16-way fan out, groupcompress
Commits: 998
                    Raw    % Compressed    % Objects
Revisions:      484 KiB   2%    135 KiB   3%     998
Inventories:   5215 KiB  24%   2409 KiB  66%    4736
Texts:        15168 KiB  72%   1075 KiB  29%    1895
Signatures:      19 KiB   0%      7 KiB   0%      51
Total:        20888 KiB 100%   3627 KiB 100%    7680
--255-way fan out, groupcompress
Commits: 998
                    Raw    % Compressed    % Objects
Revisions:      484 KiB   2%    135 KiB   4%     998
Inventories:   4428 KiB  22%   1667 KiB  57%    4642
Texts:        15168 KiB  75%   1069 KiB  37%    1895
Signatures:      19 KiB   0%      7 KiB   0%      51
Total:        20101 KiB 100%   2880 KiB 100%    7586
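To make the totals easier to compare, here is a small sketch that
expresses each layout's compressed total relative to --1.9. The KiB
figures are copied from the tables above; the dictionary keys and the
helper name `relative_to` are just labels chosen for this example.

```python
# Compressed totals in KiB, copied from the tables above.
compressed_totals = {
    "--1.9": 2159,
    "16-way, no delta": 4218,
    "16-way, knit-delta": 2682,
    "16-way, groupcompress": 3627,
    "255-way, groupcompress": 2880,
}


def relative_to(totals, baseline_key):
    """Return each total as a multiple of the baseline layout's total."""
    baseline = totals[baseline_key]
    return {name: size / baseline for name, size in totals.items()}


ratios = relative_to(compressed_totals, "--1.9")
for name, ratio in ratios.items():
    print(f"{name:<24} {ratio:.2f}x the size of --1.9")
```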
Now, the 16-way fan out has a few other tricks about extraction order,
to make insertion order better. Also, the texts actually go down to
960 KiB during the initial conversion, and then expand when we
recompress. So it seems that we need to do a little bit better about our
GC insertion ordering (roughly a 10% difference).
The results are rather early, and probably within the noise margin. But
I *do* see that GC repositories compress texts noticeably better than
--1.9 format. The inventory is still bloated versus --1.9. And
groupcompress isn't compressing those pages any better than the
knit-delta version (though I spent a lot more time tweaking that code).
I'll try to submit my '--development5' branch on Monday next week. (This
changes the inventory texts to use '\n' in them, which allows the delta
algorithms to do a decent job.)
I think it is ready; I just want to look over the patch again before
submitting it.
After that, I'd like to get the groupcompress autopack code working
again, and then start testing with a mysql repository.
And then I'll look at changing the insertion order for groupcompress data.
Also, I think I mentioned in the past that we might want to consider
writing offsets as the distance from the current position. (Consider
writing 2 texts in a row, right after each other, but very far from the
first text. You may have a delta that is ~100 bytes, versus an offset
that is 100,000 bytes from the beginning.)
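A minimal sketch of the saving, assuming offsets are stored as base-128
variable-length integers (an assumption made for this example, not
something stated above). The positions and the helper `varint_len` are
hypothetical.

```python
def varint_len(n):
    """Number of bytes needed to store n as a base-128 varint."""
    if n < 0:
        raise ValueError("offsets must be non-negative")
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


# Hypothetical layout: the text being copied from starts 100,000 bytes
# into the group, and the delta is written 100 bytes after it.
source_start = 100_000
current_pos = 100_100

absolute_offset = source_start                # distance from the beginning
relative_offset = current_pos - source_start  # distance from current position

print("absolute:", varint_len(absolute_offset), "bytes per offset")
print("relative:", varint_len(relative_offset), "bytes per offset")
```

With this encoding the absolute offset needs 3 bytes and the relative
one needs 1, which adds up when a delta contains many copy instructions.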
John
=:->