Compressing packages with bzip2 instead of gzip?

Phillip Susi psusi at cfl.rr.com
Wed Dec 7 16:37:21 GMT 2005


Patrick McFarland wrote:
> On Wednesday 07 December 2005 09:03, Stefan Glasenhardt wrote:
> Except you usually don't see that kind of gain on binary files. In fact, I've 
> seen in a minority of binary files that gzip outperformed bzip2. The only 
> place I can see bzip2 making sense is on docs, manual pages, xml files, and 
> anything that isn't an executable, a library, an image/sound/video/other 
> media file, or the holy visage of the flying spaghetti monster.
> 

I'm not sure where you got that idea, but as far as I can see, it is not 
correct.  To test, I tarred up my /bin directory and compressed it with 
both gzip and bzip2 at maximum compression.  Uncompressed, the tar is 
8,028,160 bytes.  gzip got it down to 3,573,887 bytes and bzip2 got it 
down to 3,176,206 bytes.  That's about 11% smaller than the gzip result, 
on binary files.
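
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch. It is not the procedure used above (that used the command-line tools); it assumes Python's standard gzip and bz2 modules at level 9 are a reasonable stand-in for "max compression", and the helper names tar_directory and compressed_sizes are made up for illustration.

```python
import bz2
import gzip
import io
import os
import tarfile


def tar_directory(path: str) -> bytes:
    """Return an uncompressed tar archive of `path` as bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Resolve symlinks first: on some systems /bin is a link to /usr/bin.
        tar.add(os.path.realpath(path), arcname=".")
    return buf.getvalue()


def compressed_sizes(data: bytes) -> dict:
    """Compress `data` at each codec's highest level; return sizes in bytes."""
    return {
        "uncompressed": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }
```

Calling compressed_sizes(tar_directory("/bin")) should give figures comparable in spirit to the ones above, though the exact byte counts will depend on the system and on any files the current user cannot read.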

Then I tried 7zip using the recommended settings for maximum 
compression.  It shrank the archive down to 1,288,041 bytes, which is 
less than half the size of the bzip2 archive!  Then I did some simple 
CPU usage comparisons.  I decompressed all three to /dev/null and timed 
the results.  gzip was the fastest, with only 0.2 seconds of CPU time 
used.  Interestingly though, 7zip was 3x faster than bzip2, with a time 
of 0.5 seconds vs. 1.5 seconds.
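
A rough way to reproduce the CPU-time comparison in Python is sketched below. The assumptions are significant: the timings above came from the command-line tools, whereas this uses Python's standard library, and the lzma module (which writes the xz container, not .7z) stands in for 7zip only because both are based on LZMA. The function names are hypothetical, and each codec is used at its library default settings.

```python
import bz2
import gzip
import lzma
import time


def decompress_cpu_seconds(decompress, blob: bytes) -> float:
    """CPU seconds spent in `decompress(blob)`; the output is discarded."""
    start = time.process_time()
    decompress(blob)
    return time.process_time() - start


def time_codecs(data: bytes) -> dict:
    """Compress `data` with each codec, then time only the decompression."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),  # stand-in for 7zip
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(data)
        results[name] = decompress_cpu_seconds(decompress, blob)
    return results
```

time.process_time() counts CPU time rather than wall-clock time, which matches the "cpu time used" measurement described above. On small inputs the numbers will be too noisy to mean much.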

Next I found the largest .deb package in my apt cache, which was 
linux-image-2.6.12-10-386_2.6.12-10.24_i386.deb at 18 MB.  I extracted 
it with dpkg -x and tarred the resulting directory tree, and again 
compressed it with gzip, bzip2, and 7zip.  Here are the results (sizes 
in bytes):

Original .deb:    18,025,624
.tar.gz:          18,070,956
.tar.bz2:         16,337,633
.tar.7z:          12,631,291

Interestingly, the gzip archive was slightly larger than the original 
.deb, but only slightly.  bzip2 again managed about 10% better 
compression than gzip, and 7zip managed about 30% better.  When 
decompressing, 7zip was twice as fast as bzip2.
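
The percentages quoted for the kernel package can be checked against the byte counts in the table. The snippet below takes the .tar.gz size as the baseline, which is an assumption about how the "better compression" figures were derived; percent_smaller is just an illustrative helper.

```python
def percent_smaller(size: int, baseline: int) -> float:
    """How much smaller `size` is than `baseline`, as a percentage."""
    return 100.0 * (baseline - size) / baseline


# Byte counts from the linux-image table above.
sizes = {"gzip": 18_070_956, "bzip2": 16_337_633, "7zip": 12_631_291}

for name in ("bzip2", "7zip"):
    saving = percent_smaller(sizes[name], sizes["gzip"])
    print(f"{name}: {saving:.1f}% smaller than gzip")
# prints:
# bzip2: 9.6% smaller than gzip
# 7zip: 30.1% smaller than gzip
```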

I repeated the test on 
openoffice.org2-help-en-us_1.9.129-0.1ubuntu5_all.deb:

Original .deb:    10,941,694
.tar.gz:          10,965,302
.tar.bz2:         9,880,228
.tar.7z:          7,036,209

Again, bzip2 got about 10% better compression than gzip, and 7zip 
managed about 35% better.  Again, 7zip was also about twice as fast at 
extracting as bzip2.


I think these tests are pretty fair, and the conclusion is that we 
should be using 7zip because it provides much better compression than 
either gzip or bzip2, and is faster at extracting than bzip2.


