More LiveCD space optimizations

John McCabe-Dansted gmatht at gmail.com
Sun Oct 17 05:22:39 UTC 2010


On Fri, Oct 8, 2010 at 6:22 AM, Louis Simard <louis.simard at gmail.com> wrote:
>>>> A further 10MB could be saved by recompressing the gz files as lzma.
>>> At what LZMA compression level? Default (7) or --best (9)?
>> --best
>
> I just want to add that blanket recompression of gzip files as lzma
> with --best could be harmful, but with small files it's probably OK.
> LZMA uses a huge dictionary to do its work, which needs to be
> allocated even on the decompressing side, and --best may overrun the
> memory of low-end computers on larger files.
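The dictionary a .lzma file makes the decompressor allocate is recorded in the file's 13-byte header. That header holds a properties byte encoding lc/lp/pb, a 32-bit little-endian dictionary size, and a 64-bit uncompressed size. The minimal sketch below decodes it. It uses Python's standard `lzma` module (a binding to xz's liblzma) only to produce sample data. `lzma_alone_header` is a hypothetical helper name. The decoder's real memory use is this dictionary plus a small, version-dependent overhead.

```python
import lzma
import struct

def lzma_alone_header(blob: bytes) -> dict:
    """Decode the 13-byte header at the start of a .lzma (LZMA_Alone) file."""
    if len(blob) < 13:
        raise ValueError("too short to be a .lzma file")
    # 1-byte properties, 4-byte dictionary size, 8-byte uncompressed size,
    # all little-endian with no padding.
    props, dict_size, size = struct.unpack("<BIQ", blob[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    # The properties byte packs the three parameters as (pb * 5 + lp) * 9 + lc.
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        # The decoder has to allocate roughly this much for its dictionary.
        "dict_size": dict_size,
        # All 0xFF bytes mean "unknown size; the stream ends with an end marker".
        "uncompressed_size": None if size == 0xFFFFFFFFFFFFFFFF else size,
    }

# Example: inspect a file compressed at liblzma preset 6.
sample = lzma.compress(b"hello " * 1000, format=lzma.FORMAT_ALONE, preset=6)
print(lzma_alone_header(sample))
```

Running this over the files produced by --best would show directly how large a dictionary each one demands at decompression time.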

Also, I noticed that --best does not always produce the smallest files.

I compared compressing everything with lzma -7 against compressing each
file with -1 through -7 and picking whichever produced the smallest
file. Generally -3 and -5 produced the smallest files, and this saved
40K. More importantly, it also limits the amount of memory used in
recompression.
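The per-file search described above can be sketched as follows. The sketch assumes Python's `lzma` module writing the legacy .lzma container (`FORMAT_ALONE`). These are liblzma's (xz's) presets, which need not match the levels of the 2010 `lzma` command, so the winning levels may differ from the ones reported here. `smallest_lzma` is a hypothetical name.

```python
import lzma
from typing import Iterable, Tuple

def smallest_lzma(data: bytes, levels: Iterable[int] = range(1, 8)) -> Tuple[int, bytes]:
    """Compress `data` at every preset in `levels` and keep the smallest
    .lzma output, returning (level, compressed_bytes)."""
    best_level, best_blob = -1, b""
    for level in levels:
        blob = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=level)
        # Strictly smaller only: on a tie the earlier (lower) level wins.
        if best_level < 0 or len(blob) < len(best_blob):
            best_level, best_blob = level, blob
    return best_level, best_blob
```

Levels are tried in ascending order, and only a strictly smaller output replaces the current best. Ties therefore go to the lower preset, which is also the cheaper one to decompress.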

I also noted that xz is newer than lzma, and can write to the lzma
format. Strangely, it usually produces larger files than lzma, and
stranger still, the "extreme" compression option -e produces larger
files again. However, it does sometimes produce smaller lzma files,
so selectively using xz or xz -e can save an extra 150K.
Interestingly, the optimum compression option for xz is -6 (the maximum
I allowed), while for xz -e it was usually -1 or -2. So while the -e
option increases average file size, it also decreases the average
memory required for decompression.
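The same search can be extended to the extreme flag. In Python's `lzma` module, the counterpart of `xz -e` is OR-ing `lzma.PRESET_EXTREME` into the preset. Levels are capped at 6 here to mirror the limit mentioned above. The dictionary size is read back from bytes 1 to 4 of the .lzma header as a rough proxy for decompression memory. `best_variant` is a hypothetical helper. Because it drives liblzma directly, it cannot reproduce the lzma-vs-xz size differences themselves.

```python
import lzma
from typing import Optional, Tuple

def best_variant(data: bytes, max_level: int = 6) -> Optional[Tuple[int, bool, int, bytes]]:
    """Try presets 1..max_level with and without PRESET_EXTREME and return
    (level, extreme, dict_size, compressed_bytes) for the smallest output."""
    best = None
    for level in range(1, max_level + 1):
        for extreme in (False, True):
            preset = (level | lzma.PRESET_EXTREME) if extreme else level
            blob = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=preset)
            if best is None or len(blob) < len(best[3]):
                # Bytes 1..4 of a .lzma header hold the dictionary size,
                # a rough proxy for the memory needed to decompress.
                dict_size = int.from_bytes(blob[1:5], "little")
                best = (level, extreme, dict_size, blob)
    return best
```

A lower winning level implies a smaller dictionary. This is why a low-level `-e` result can be cheaper to decompress even when it is not smaller on average.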

xz also has the option to set the filters manually rather than pick
one of the presets -1..-9. We could probably write an equivalent of
optipng to pick the best filters for xz dynamically and gain a little
more compression, but I have not done this (at least not yet).
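Such an optipng-style tool might start by brute-forcing the LZMA1 literal/position parameters (lc, lp, pb) on top of a preset. The sketch below does this through the `filters` argument of Python's `lzma.compress`. These parameters are only one of the knobs xz exposes; dictionary size, match finder and nice_len are others. `tune_lzma1` is a hypothetical name, and the search space is an assumption.

```python
import itertools
import lzma
from typing import Tuple

def tune_lzma1(data: bytes, preset: int = 6) -> Tuple[Tuple[int, int, int], bytes]:
    """Brute-force lc/lp/pb for a single LZMA1 filter on top of `preset`,
    returning ((lc, lp, pb), compressed_bytes) for the smallest .lzma output."""
    best_params, best_blob = (3, 0, 2), b""
    for lc, lp, pb in itertools.product(range(5), range(5), range(5)):
        if lc + lp > 4:  # liblzma requires lc + lp <= 4
            continue
        # "preset" fills in the defaults; the explicit keys override them.
        filters = [{"id": lzma.FILTER_LZMA1, "preset": preset,
                    "lc": lc, "lp": lp, "pb": pb}]
        blob = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)
        if not best_blob or len(blob) < len(best_blob):
            best_params, best_blob = (lc, lp, pb), blob
    return best_params, best_blob
```

Each candidate costs a full compression, so the 75 valid combinations make this far slower than a single preset. A real tool would probably prune the search.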

Limiting ourselves to compression options that produce files that
can be decompressed in 1MB of RAM increases the total size by 417K to
75.68MB, still much smaller than the 91MB used by gzip.
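One way to apply such a memory cap automatically is to let liblzma enforce it. Python's `lzma.LZMADecompressor` accepts a `memlimit` for the .lzma format and raises `LZMAError` when decoding would need more memory. The sketch assumes a cap of exactly 1 MiB and xz's presets 0 to 6. With current liblzma, only preset 0 (256 KiB dictionary) leaves room under that cap. The function names are hypothetical.

```python
import lzma
from typing import Iterable, Optional, Tuple

ONE_MIB = 1 << 20  # assumed cap: "1MB" read as 1 MiB

def fits_in_memory(blob: bytes, memlimit: int = ONE_MIB) -> bool:
    """True if liblzma can decode this .lzma blob without exceeding memlimit."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE, memlimit=memlimit)
    try:
        decoder.decompress(blob)
    except lzma.LZMAError:
        # Raised for "Memory usage limit exceeded" (and also for corrupt input).
        return False
    return True

def smallest_within_limit(data: bytes, levels: Iterable[int] = range(0, 7),
                          memlimit: int = ONE_MIB) -> Optional[Tuple[int, bytes]]:
    """Smallest .lzma encoding of `data` among `levels` that still decodes
    within `memlimit`, as (level, compressed_bytes), or None if none fits."""
    best = None
    for level in levels:
        blob = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=level)
        if fits_in_memory(blob, memlimit) and (best is None or len(blob) < len(best[1])):
            best = (level, blob)
    return best
```

Checking the limit with the real decoder, rather than estimating from the dictionary size, keeps the test honest about the decoder's own overhead.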

I also note that decompressing the gzip files actually saves space on
the squashfs relative to leaving them as gz files, but only 8K, so this
is not significant.

Some files are not likely to be used very often and are good
candidates for recompression as lzma. For example, recompressing the
gz files in /usr/share/doc drops them from 48.47MB to 41.99MB, saving
about 6.5MB. The changelogs alone take up 31.78MB, but can be
compressed into 27.44MB of lzma files, saving about 4.3MB.
Recompressing the man pages was not as effective, shrinking them from
15.58MB to 15.14MB, a saving of just 438K.
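Totals like these can be gathered with a short walk over the tree. This sketch assumes Python's `gzip` and `lzma` modules and an illustrative default of preset 6. It sums the on-disk size of each `.gz` file and the size its contents would take as .lzma, without writing anything. It also skips symlinks so that linked files are not counted twice. `gz_vs_lzma` is a hypothetical name.

```python
import gzip
import lzma
import os
import zlib
from typing import Tuple

def gz_vs_lzma(root: str, preset: int = 6) -> Tuple[int, int]:
    """Return (total size of *.gz files under root, total size their
    contents would take as .lzma at `preset`). Nothing is written."""
    gz_total = lzma_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # count each real file only once
            with open(path, "rb") as f:
                packed = f.read()
            try:
                raw = gzip.decompress(packed)
            except (OSError, EOFError, zlib.error):
                continue  # skip corrupt or truncated archives
            gz_total += len(packed)
            lzma_total += len(lzma.compress(raw, format=lzma.FORMAT_ALONE, preset=preset))
    return gz_total, lzma_total
```

For example, `gz_vs_lzma("/usr/share/doc")` returns the two totals in bytes. Narrowing the filename test to names containing `changelog` would give a changelog-only figure. Like the numbers above, these are plain sums of file sizes, not squashfs sizes.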

Disclaimer: The file sizes above are a simple total of the compressed
sizes. The squashfs itself is usually only about 65% of this, perhaps
because squashfs deduplicates files. For example, it seems that
switching from gz to lzma saves only 8MB, instead of the 15MB one might
expect from the relative sizes of the compressed files.

For the record, I attach the script "ultralzma" I used to recompress
the files. The spreadsheet containing the results is at
   http://dansted.co.cc/Summary.xlsx.gz
(I chose xlsx format due to bug #661836.)

-- 
John C. McCabe-Dansted
-------------- next part --------------
Name: ultralzma
Type: application/octet-stream
Size: 2319 bytes
URL: <https://lists.ubuntu.com/archives/ubuntu-devel-discuss/attachments/20101017/b12947b5/attachment.obj>


More information about the Ubuntu-devel-discuss mailing list