groupcompress extraction 10x faster
John Arbash Meinel
john at arbash-meinel.com
Thu Feb 19 19:10:41 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert observed a while back that gc extraction was faster than knits
*if* you didn't abuse the "_group_cache". I've been playing with some
conversions, and I noticed that getting the texts out was pretty slow.
When I looked closer, I found out why: we were using "dict-sorted" order
(i.e. arbitrary dict iteration order) for texts during
get_record_stream(..., 'unordered').
I went ahead and committed the attached patch, which makes a huge
difference on my mysql test repository. Specifically, "time bzr
repository-details" dropped from 7m10s down to 35s (about 12x faster).
(I was able to get the same performance by increasing the group_cache to
250MB, but obviously that uses a lot more RAM during processing.)
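To illustrate why the extraction order matters, here is a minimal Python sketch. It is not bzrlib code: `GroupCache`, `extract` and the 20-group layout are hypothetical. A small LRU cache of decompressed groups counts how often a group has to be decompressed. Shuffled keys stand in for arbitrary dict order, sorting by group keeps each group's texts adjacent, and a cache big enough to hold every group plays the role of the 250MB workaround.

```python
import random
from collections import OrderedDict


class GroupCache:
    """Hypothetical LRU cache of decompressed groups (a stand-in for _group_cache)."""

    def __init__(self, max_groups):
        self.max_groups = max_groups
        self._groups = OrderedDict()
        self.decompressions = 0

    def fetch(self, group_id):
        if group_id in self._groups:
            self._groups.move_to_end(group_id)  # hit: mark as most recently used
            return
        self.decompressions += 1  # miss: this is where the group would be decompressed
        self._groups[group_id] = True
        if len(self._groups) > self.max_groups:
            self._groups.popitem(last=False)  # evict the least recently used group


def extract(keys, key_to_group, max_groups):
    """Return how many group decompressions it takes to extract `keys` in order."""
    cache = GroupCache(max_groups)
    for key in keys:
        cache.fetch(key_to_group[key])
    return cache.decompressions


# Hypothetical repository: 200 texts spread over 20 groups of 10.
key_to_group = {"text-%03d" % i: i // 10 for i in range(200)}

keys = list(key_to_group)
random.Random(42).shuffle(keys)  # stands in for arbitrary dict iteration order
grouped_keys = sorted(keys, key=key_to_group.get)  # texts from one group stay adjacent

small_cache_arbitrary = extract(keys, key_to_group, max_groups=2)
small_cache_grouped = extract(grouped_keys, key_to_group, max_groups=2)
big_cache_arbitrary = extract(keys, key_to_group, max_groups=20)
```

With this layout `small_cache_grouped` and `big_cache_arbitrary` are both 20 (one decompression per group), while `small_cache_arbitrary` is much higher: sorting buys the same effect as a huge cache without the memory.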
Anyway, I thought Robert especially would like to know about this
change. I'm also probably going to play around with a "gc-optimal"
ordering, just to see what happens.
As it is now, because of the semi-random ordering, gc actually ends up a
net loss in size for my test of "mysql-5.1 -r525" (39720 KiB compressed
for gc+chk255 versus 28390 KiB for chk255).
gc+chk255:
Commits: 1043
               Raw (KiB)     %  Compressed (KiB)     %  Objects
Revisions:          3990    0%               826    2%     1043
Inventories:       31012    3%             15328   38%    12090
Texts:            882272   96%             23565   59%     7226
Signatures:            0    0%                 0    0%        0
Total:            917275  100%             39720  100%    20359
chk255:
Commits: 1043
               Raw (KiB)     %  Compressed (KiB)     %  Objects
Revisions:          3990    0%              1228    4%     1043
Inventories:       31012    3%             15987   56%    12090
Texts:            882272   96%             11174   39%     7226
Signatures:            0    0%                 0    0%        0
Total:            917275  100%             28390  100%    20359
and versus the original knit repo:
Commits: 1043
               Raw (KiB)     %  Compressed (KiB)     %  Objects
Revisions:          3949    0%              1201    8%     1043
Inventories:      842115   48%              1840   12%     1043
Texts:            882272   51%             11174   78%     7226
Signatures:            0    0%                 0    0%        0
Total:           1728337  100%             14216  100%     9312
Something still isn't right with the gc+chk255 repository: we should
certainly be getting noticeably better compression for inventories than
the chk255 repository does, but 15328 KiB versus 15987 KiB is only about
4% smaller. I mostly just wanted to point out that without proper
ordering the gc-compressed texts actually double in size (23565 KiB
versus 11174 KiB).
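A rough illustration of the size effect, assuming a deliberately simplified model rather than bzr's actual groupcompress: texts are packed into fixed-size groups in the order given, and each group is compressed independently with zlib. The corpus and the helpers `make_versions` and `compressed_size` are hypothetical. When versions of the same file sit next to each other they share a group and compress against each other; in a shuffled order they mostly land in different groups.

```python
import random
import zlib


def make_versions(rng, n_files, n_versions, size):
    """Hypothetical corpus: each file starts as random bytes, each new version tweaks a few."""
    texts = []
    for f in range(n_files):
        data = bytearray(rng.getrandbits(8) for _ in range(size))
        for v in range(n_versions):
            for _ in range(8):  # small edit between consecutive versions
                data[rng.randrange(size)] = rng.getrandbits(8)
            texts.append((f, v, bytes(data)))
    return texts


def compressed_size(texts, group_size):
    """Compress each run of `group_size` consecutive texts on its own, like separate groups."""
    total = 0
    for i in range(0, len(texts), group_size):
        group = b"".join(t[2] for t in texts[i:i + group_size])
        total += len(zlib.compress(group, 9))
    return total


rng = random.Random(0)
texts = make_versions(rng, n_files=8, n_versions=4, size=2048)

shuffled = list(texts)
rng.shuffle(shuffled)  # stands in for a semi-random insertion order
by_file = sorted(texts, key=lambda t: (t[0], t[1]))  # versions of one file stay adjacent
```

Here `compressed_size(by_file, 4)` comes out smaller than `compressed_size(shuffled, 4)`; the exact ratio depends entirely on the made-up corpus and says nothing quantitative about the mysql numbers above.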
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkmdrrEACgkQJdeBCYSNAANJkgCeKo3CmyWVtZXP5e0cxh1SACWH
aN8AoNaWFNVdMsifQTpWFMCRV7vkwYWl
=C3EG
-----END PGP SIGNATURE-----
Name: gc_sorting.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090219/6526c9de/attachment.diff