B+Tree indices: ongoing progress
John Arbash Meinel
john at arbash-meinel.com
Wed Jul 2 06:41:54 BST 2008
Robert Collins wrote:
| On Tue, 2008-07-01 at 21:59 -0500, John Arbash Meinel wrote:
|> So... If you trust that we will always get at least 2:1 compression
|> (it seems reasonable given the amount of entropy in our revision_ids),
|> then I can cut the time down quite a bit, and retain *most* of the
|> saving.
|>
|> (9b) gives 4,612,096 versus 6,172,672 without repacking, which is
|> 1:1.34 instead of the maximum 1:1.4.
|>
|> Note that this is also one of those places where you would probably
|> like "bzr pack" to set max_repacks = 100, and have it go to town.
|>
|> Interestingly, if I set compression_level = 9 I get exactly the same
|> compression.
|
| re safety:
| As long as in the event of an overrun it *at worst* asserts during
| writing, I'm _ok_. To be happy I'd like it to always work no matter what
| (short of 4K long values :P). I expect that bzr-git revids will be very
| high entropy and thus unlikely to get 2:1 compression.
|
| re: compression level - that's because of the window size - we're dealing
| in a very small stream anyway. We can drop the compression but we can't
| raise it using zlib.
|
| re: performance - I found while tuning commit to be hg-speed that we
| really need to eliminate fat all the way through the process. At the end
| of that process I was seeking 0.1 second wins throughout the code base.
| (On a 20K file tree - or 1/4 the size of this data set. So any expansion
| here is something we will need to counteract to prevent commit()
| becoming slower).
Ok, this is done in the attached patch (9b). It will only try to
repack 2 times, and it assumes that the compressor will do a 2:1 job on
the input data.
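
To make the idea concrete, here is a rough Python sketch of the
approach. It is not the code in the attached patch: the class name
ChunkWriter, the raw_budget attribute, and the rule for extending the
budget after a repack are all assumptions made for illustration. It
accepts raw bytes without compressing while it is under the assumed 2:1
budget, repacks at most max_repacks times to measure the real size, and
raises in finish() if the assumption turned out to be wrong.

```python
import zlib


class ChunkWriter:
    """Illustrative only: pack lines into one fixed-size zlib page."""

    def __init__(self, chunk_size=4096, max_repacks=2, min_ratio=2):
        self.chunk_size = chunk_size
        self.max_repacks = max_repacks
        self.min_ratio = min_ratio
        self.lines = []      # raw lines accepted so far
        self.raw_len = 0     # total raw bytes accepted
        self.repacks = 0     # full recompressions done so far
        # Raw bytes accepted without checking, trusting min_ratio:1.
        self.raw_budget = chunk_size * min_ratio

    def write(self, line):
        """Return True if line was accepted, False if the page is full."""
        new_len = self.raw_len + len(line)
        if new_len > self.raw_budget:
            if self.repacks >= self.max_repacks:
                return False
            # Repack: compress everything to see how much room is left.
            self.repacks += 1
            packed = len(zlib.compress(b"".join(self.lines) + line))
            if packed > self.chunk_size:
                return False
            # Assume the remaining space also compresses min_ratio:1.
            spare = self.chunk_size - packed
            self.raw_budget = new_len + spare * self.min_ratio
        self.lines.append(line)
        self.raw_len = new_len
        return True

    def finish(self):
        """Return the page padded to chunk_size; raise on overrun."""
        packed = zlib.compress(b"".join(self.lines))
        if len(packed) > self.chunk_size:
            raise AssertionError("data compressed worse than assumed")
        return packed + b"\x00" * (self.chunk_size - len(packed))


# Example use with made-up, low-entropy keys.
writer = ChunkWriter()
for i in range(100000):
    if not writer.write(b"john@example.com-20080702-%08d\n" % i):
        break
page = writer.finish()
```

With high-entropy keys (the bzr-git case mentioned above) the fast path
in this sketch would accept more than fits, and finish() would raise
rather than write a bad page.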
John
=:->
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: faster_chunk_writer.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080702/0ce9f22c/attachment-0001.diff