More LiveCD space optimizations

John McCabe-Dansted gmatht at gmail.com
Thu Oct 7 16:07:24 UTC 2010


On Thu, Oct 7, 2010 at 10:05 AM, Louis Simard <louis.simard at gmail.com> wrote:
> Hey :)
>
> Thanks for the interest in this optimisation! Unfortunately I wasn't
> pushy enough in my thread from May-June and it wasn't included in the
> Maverick LiveCD. A pending question is what to do to include the
> recompressed files into the archive's packages [1].

I think this will be discussed at UDS-N, see:
http://archives.free.net.ph/message/20101004.065026.e553efd1.en.html

> 2010-10-06 16:08 GMT John McCabe-Dansted <gmatht at gmail.com>:
>> In May, Louis Simard proposed re-encoding PNG files and SVG files to
>> reduce their size [Quoted 1]. I note that we can save further space by:
>>
>> 1) Using advdef on the png files in addition to optipng. This is what
>> optimizegraphics does, and this shrinks the pngs on the Maverick RC
>> liveCD from about 100.1MB to 85.3MB providing a saving of 14.8MB.
>
> So it does; I didn't know about that. Reading the man file for advpng,
> it gave a warning that it was only supported for AdvanceMAME-generated
> PNG files, so I was skeptical, but it does shave off about 4% more
> filesize on average with 'advpng -z4'.

We could test each file to ensure the decoded image is identical to
the original, perhaps using pngtopnm and md5sum. This would be
especially important for jpegrescan/jpgcrush, which is at version
0.0.0-1.
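
A minimal sketch of that check, assuming pngtopnm (from netpbm) is
installed and that comparing the decoded data is good enough. The names
same_pixels and DECODER are made up for illustration. As far as I know
pngtopnm ignores the alpha channel unless asked for it, so images with
transparency would need a second pass.

```shell
# Hypothetical helper, not part of any existing script.
# DECODER is an assumption; pngtopnm ships with the netpbm package.
DECODER=${DECODER:-pngtopnm}

# Returns 0 if both files decode to identical data, 1 if they differ,
# and 2 if either file could not be decoded.
same_pixels() {
    tmp=$(mktemp -d) || return 2
    rc=2    # stays 2 if either file fails to decode
    if "$DECODER" "$1" > "$tmp/a" && "$DECODER" "$2" > "$tmp/b"; then
        if [ "$(md5sum < "$tmp/a")" = "$(md5sum < "$tmp/b")" ]; then
            rc=0
        else
            rc=1
        fi
    fi
    rm -rf "$tmp"
    return "$rc"
}

# Example use (file names are placeholders):
#   same_pixels original.png optimised.png || echo "MISMATCH: optimised.png"
```

Decoding to temporary files rather than piping straight into md5sum
means a decoder failure is noticed instead of being hashed as empty
output.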

>> 2) Recompressing gz files with advdef. Using advdef, we can shrink the
>> gz files from 89.5MB to 84.8MB, a saving of 4.7MB.
>
> That's an interesting optimisation; I didn't really know about it
> either. However, I did use 7zip's Deflate compressor to recompress a
> .zip file of OpenOffice.org's from 5.9 MB to 5.4 MB. The method was
> rather crude, but it did the job:
>
> mkdir extracted
> cd extracted
> unzip ../file.zip
> 7z a -tzip -mx=9 -mfb=258 ../file.repack.zip *
> cd ..
> rm -r extracted

You mean images_human.zip? I have a hunch that compressing that file
wouldn't actually save space on the liveCD as I can gzip it down to
3.9MB. It may be better to leave it as an uncompressed zip, and let
squashfs deal with it. Recompressing the pngs contained in the zip
sounds worthwhile though. Strangely, even running "advzip -z -0" on
images_human.zip shrinks it by 3%, and also shrinks the corresponding
images_human.zip.gz file.

Also, there are 12MB of jar files, which are basically zip files. We
can also shrink those by 5MB or so with advzip, but that doesn't seem
to shrink a .tgz of them so it may not shrink the liveCD. Since zip
files compress file by file, we may be able to save space on the
liveCD by running "advzip -z -0" on them. That would expand them to
24MB, but reduces the size of a .tgz of them to 4.6MB, possibly saving
space on the liveCD if squashfs is similarly efficient.
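
A rough way to measure this, assuming a gzip'd tarball is a fair
stand-in for what squashfs would do (it may not be). tgz_size is a
made-up name, and the directory names in the example are placeholders.

```shell
# Hypothetical helper: print the size in bytes of a gzip -9 tarball
# of the contents of directory $1.
tgz_size() {
    tar -cf - -C "$1" . | gzip -9 | wc -c
}

# Example use, comparing the original jars with stored (-0) copies:
#   cp -r jars jars-stored
#   for f in jars-stored/*.jar; do advzip -z -0 "$f"; done
#   echo "deflated: $(tgz_size jars)  stored: $(tgz_size jars-stored)"
```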

>> 3) Recompressing jpeg files with jpegrescan. This only saves 0.5MB,
>> but implementing this would add just a couple more lines of code, and
>> jpegrescan does not lose any picture quality [Quoted 2].
>
> jpegoptim indeed performs lossless optimisation of JPEG files by
> editing Huffman tables, and it's used as the basis of jpegrescan.
> However, jpegoptim doesn't make non-progressive files progressive, as
> I understand jpegrescan does. This may make jpegoptim's optimisations
> more transparent to applications that, for some reason, can't decode
> progressive JPEGs and thus have non-progressive JPEGs in their
> packages. However, most applications should be using libjpeg anyway,
> so perhaps this point is moot.
>
>>
>> Together these should shrink the liveCD by over 20MB. This is without
>> even considering the .xml and .svg optimizations Louis proposed.
>>
>> A further 10MB could be saved by recompressing the gz files as lzma.
>
> At what LZMA compression level? Default (7) or --best (9)?

--best

Also, if we want to take replacing deflate with lzma to extremes, we
could replace the deflate compression in the png files with lzma. A
command that does this is "advpng -z -0 $f && lzma --best $f". I found
that this could save 18.7MB. However, it may degrade performance
slightly, though I doubt the cost would be significant on modern CPUs.
Running unlzma on all 66MB of the .png.lzma files takes:
real	1m2.666s
user	0m6.540s
sys	0m5.610s

I think user and sys are the relevant figures, and taking about 12s of
CPU time to read every png doesn't seem too bad. The main problem is
that I doubt applications would read the .png.lzma files out of the
box.

If we use lzma in the squashfs, just storing the pngs uncompressed
with "advpng -z -0" could reduce the liveCD size. It probably wouldn't
help the installed size though.

>> This seems reasonable as lzma has reasonable decompression times (e.g.
>> 7ms to decompress a largish manpage like lsof).
>
> 7 ms? What's your CPU? :)

Core2Duo E7200  @ 2.53GHz

>> Since the liveCD is
>> compressed anyway, it seems that if a file is compressed with gzip, it
>> is worth compressing with lzma.  The command "man" already seems to
>> have lzma support, but we'd want to test each application to ensure
>> that it functions correctly when its .gz files are replaced with lzma
>> files. We could also selectively recompress the gz files, as some .gz
>> files are actually smaller (by about 40 bytes) than the corresponding
>> lzma file.
>
> I hadn't considered this type of "transcoding" for the LiveCD. We may
> want to ourselves test which programs accept .lzma files in their
> directories in addition to .gz. Shall you do it, shall I, or shall we
> both do it? Also, is anyone else interested?

There are over a dozen different types of file to be tested (and
there may be more than one application that wants to read each). For
reference, I have attached a list of them. Probably the most important
thing to check is that printing still works, as many of the gz files
seem to be e.g. ppd files.

Maybe if you added it to your script and just gave the resulting iso a
spin in a VM to see if there was obvious breakage?

> Your point about files being compressed anyway is kind of interesting:
> both Deflate and LZMA recompress very poorly, so saving bytes by
> switching from one to the other makes sense. That would not shrink the
> *installed size* of these man pages much, though, because of default 4
> KB blocks for ext[2-4].
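
To put numbers on the block-size point: assuming a file's footprint is
simply its size rounded up to whole 4096-byte blocks (ignoring metadata
and anything cleverer the filesystem might do), a saving only shows up
when it crosses a block boundary. on_disk is a made-up name and the
sizes are invented examples.

```shell
# Hypothetical helper: round a byte count up to whole 4096-byte blocks.
on_disk() {
    echo $(( ($1 + 4095) / 4096 * 4096 ))
}

on_disk 5000   # 8192
on_disk 4200   # 8192: 800 bytes smaller, same footprint
on_disk 4000   # 4096: crossing the boundary frees a block
```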

Hmm, the biggest difference I could find was libidn11/AUTHORS.gz,
which advdef can make 170 bytes smaller than lzma can. In total 4720
files are smaller as .gz, so we can save another 165KB by letting
those remain gz files (and this is not counting the 9K we save in
the directories as ".gz" is two bytes smaller than ".lzma" ;)

Still something feels a bit unclean about arbitrarily picking gz or lzma.
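
For what it's worth, the per-file choice itself could be as simple as
comparing byte counts. A sketch, assuming both candidates have already
been produced (e.g. by advdef and "lzma --best"); smaller_of is a
made-up name, and ties go to the first argument.

```shell
# Hypothetical helper: print whichever of two files is smaller.
# The first argument wins ties.
smaller_of() {
    a=$(wc -c < "$1") || return 1
    b=$(wc -c < "$2") || return 1
    # $((...)) strips any padding some wc implementations add.
    if [ $((a)) -le $((b)) ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example use (file names are placeholders):
#   keep=$(smaller_of AUTHORS.gz AUTHORS.lzma)
```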

>> I attach the script I used to check how much space would be saved.
>> This is purely for reproduction of my results, it is not integrated
>> into Louis's script.
>
> Do you want me to add to my script any of the optimisations discussed
> in your email? They are: Using AdvanceCOMP to recompress .png images
> and gzipped files; using either of jpegoptim or jpegrescan to
> losslessly recompress .jpg images; "transcoding" man pages from .gz to
> .lzma. I'm not going to add untested optimisations yet, such as
> transcoding *all* .gz files to .lzma.

Sure. This could help with testing that these actually work ;).

> I'm still very interested in this, despite the lack of posting about
> the subject in the last 4 months! I've just been waiting for the guys
> at Debian to advise me on how to best integrate these optimisations
> into packages. Perhaps I should just devise a set of suitable
> build-depends additions (optipng, advancecomp, jpegoptim) and makefile
> rules for .png/.jpg/.gz, then file a single bug report on all of the
> packages that would benefit the most from optimisations? That way,
> package maintainers could opt in rather easily.

I wouldn't file bug reports until it has been discussed at UDS-N at
the end of October. However, it does seem this could be useful for
upstream, e.g. if OO could drop the size of their 150MB Windows
installer by a few MB.

P.S. You mentioned html files previously. I tried running Webpack with
the HTML::Clean backend. This shrank the html files by 1MB, but only
shrank the corresponding .tgz file by 100k. Also, on many files it gave
warnings that it was removing whitespace even though the file had a
<pre> tag which made whitespace important. We could fix this, but it
seems like a low priority.

-- 
John C. McCabe-Dansted
-------------- next part --------------
languagelists
/usr/lib/ubiquity/localechooser/languagelist.data.gz
/usr/share/localechooser/languagelist.data.gz

Changelogs
/usr/share/doc/libsamplerate0/changelog.gz
/usr/share/doc/libsamplerate0/changelog.Debian.gz

PPD, LDL and 
/usr/share/ppd/hplip/HP/HP-Fax2-hpijs.ppd.gz
/usr/share/hplip/data/ldl/cbpcal.ldl.gz
/usr/share/hplip/data/pcl/crbcal.pcl.gz
/usr/share/hplip/data/ps/clean_page.pdf.gz

FONTS
/usr/share/fonts/X11/75dpi/luBIS14-ISO8859-1.pcf.gz

CHARMAPS
/usr/share/cups/charmaps/euc-kr.txt.gz

CONSOLETRANS?
/usr/share/consoletrans/KOI8-U.acm.gz

CONSOLEFONTS?
/usr/share/consolefonts/CyrSlav-Terminus24x12.psf.gz

INFOFILES
/usr/share/info/ssip.info.gz

ASPELLFILES
/usr/share/aspell/en_GB-ise-w_accents-only.cwl.gz

i18n charmaps
/usr/share/i18n/charmaps/ISO_6937-2-ADD.gz

kbdnames
/usr/share/console-setup-mini/kbdnames.gz

gedit plugins
/usr/share/gedit-2/plugins/taglist/XSLT.tags.gz

alternatives
/etc/alternatives/testparm.1.gz

/etc/console-setup/cached.kmap.gz

/var/lib/dpkg/alternatives/builtins.7.gz


