Compressing packages with bzip2 instead of gzip?
Phillip Susi
psusi at cfl.rr.com
Wed Dec 7 16:37:21 GMT 2005
Patrick McFarland wrote:
> On Wednesday 07 December 2005 09:03, Stefan Glasenhardt wrote:
> Except you usually don't see that kind of gain on binary files. In fact, I've
> seen in a minority of binary files that gzip outperformed bzip2. The only
> place I can see bzip2 making sense is on docs, manual pages, xml files, and
> anything that isn't an executable, a library, an image/sound/video/other
> media file, or the holy visage of the flying spaghetti monster.
>
I'm not sure where you got that idea, but as far as I can see, it is not
correct. To test, I tarred up and compressed my /bin directory with both
gzip and bzip2 at maximum compression. Uncompressed, the tar is 8,028,160
bytes. gzip got it down to 3,573,887 bytes and bzip2 got it down to
3,176,206 bytes. That's more than 10% better compression on binary files.
Then I tried 7zip using the recommended settings for maximum
compression. It shrank the archive down to 1,288,041 bytes, which is
less than half the size of the bzip2 archive! Then I did some simple CPU
usage comparisons. I decompressed all three to /dev/null and timed the
results. gzip was the fastest, with only 0.2 seconds of CPU time used.
Interestingly though, 7zip was 3x faster than bzip2, with a time of 0.5
seconds vs. 1.5 seconds.
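
For anyone who wants to repeat this kind of comparison, here is a rough Python sketch of the same idea. It is not the procedure I ran by hand: the function name benchmark is made up for illustration, and Python's lzma module (xz/LZMA) only stands in for 7zip here, on the assumption that 7zip's default codec is LZMA. It does not write a .7z container, so sizes and timings will differ somewhat from the real tools.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Compress data with each codec at its highest level and time decompression."""
    codecs = {
        "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
        "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
        # Assumption: xz/LZMA is used as a stand-in for 7zip's LZMA codec.
        "lzma": (
            lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME),
            lzma.decompress,
        ),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        start = time.process_time()  # CPU time of this process, not wall-clock
        restored = decompress(packed)
        elapsed = time.process_time() - start
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {"size": len(packed), "cpu_seconds": elapsed}
    return results
```

To use it, read an uncompressed tar into memory (for example open("bin.tar", "rb").read(), where bin.tar is a hypothetical file name) and pass the bytes to benchmark(). A single run on a small archive is noisy, so repeat the timing a few times before drawing conclusions.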
Next I found the largest .deb package in my apt cache, which was
linux-image-2.6.12-10-386_2.6.12-10.24_i386.deb at 18 MB. I extracted
it with dpkg -x and tarred the resulting directory tree, and again
compressed it with gzip, bzip2, and 7zip. Here are the results:
Original .deb: 18,025,624
.tar.gz: 18,070,956
.tar.bz2: 16,337,633
.tar.7z: 12,631,291
Interestingly, the gzip archive was slightly larger than the original
.deb, but only slightly. bzip2 again managed about 10% better
compression than gzip, and 7zip managed 30% better compression. When
decompressing, 7zip was twice as fast as bzip2.
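
The percentages can be checked directly from the byte counts listed for the kernel package. A small sketch, where saving_vs is just a helper name chosen for illustration:

```python
def saving_vs(baseline: int, size: int) -> float:
    """Return how much smaller `size` is than `baseline`, in percent."""
    return (baseline - size) / baseline * 100


# Sizes in bytes, copied from the kernel package results listed above.
kernel = {"gzip": 18_070_956, "bzip2": 16_337_633, "7zip": 12_631_291}

for name in ("bzip2", "7zip"):
    pct = saving_vs(kernel["gzip"], kernel[name])
    print(f"{name}: {pct:.1f}% smaller than gzip")
```

The baseline here is the .tar.gz size, not the original .deb, since that is the comparison being made.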
I repeated the test on
openoffice.org2-help-en-us_1.9.129-0.1ubuntu5_all.deb:
Original .deb: 10,941,694
.tar.gz: 10,965,302
.tar.bz2: 9,880,228
.tar.7z: 7,036,209
Again, bzip2 got about 10% better compression, and 7zip managed 35%
better compression than gzip, and again, 7zip was about twice as fast
extracting as bzip2.
I think these tests are pretty fair, and the conclusion is that we
should be using 7zip because it provides much better compression than
either gzip or bzip2, and is faster at extracting than bzip2.