[ubuntu-art] Recompressing PNGs to save space?

Frank Schoep frank at ffnn.nl
Tue Apr 25 10:00:16 BST 2006

Thanks for the warm welcome, Mark. Matt, I CCed you because I wasn't subscribed 
to ubuntu-devel yet and so couldn't reply directly to your mail there.

First of all, some new data and a word of caution:

Today I used MD5 hashes to see if there were any duplicate PNGs in my 
collection, and indeed there were quite a lot: out of all 12,023 images, 
4,514 were duplicates (37.5% of the total count).

Using OpenOffice.org Calc I calculated the savings from getting rid of the 
duplicates: 10,545,222 bytes (about 10 MB).
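
For anyone who wants to repeat the check, here is a minimal sketch of one way 
to do it in Python. It is not the exact procedure I used; the function names 
are made up for illustration. Note that os.walk, like "find", does not descend 
into symlinked directories by default but does list symlinked files, so this 
sketch would show the same effect I worry about below.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pngs(root):
    """Group PNG files under `root` by digest; keep groups with 2+ files."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".png"):
                path = os.path.join(dirpath, name)
                groups[md5_of_file(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


def duplicate_savings(groups):
    """Return (redundant file count, bytes used by the redundant copies)."""
    count = 0
    size = 0
    for paths in groups.values():
        extra = paths[1:]  # keep one copy of each image, drop the rest
        count += len(extra)
        size += sum(os.path.getsize(p) for p in extra)
    return count, size
```

Calling duplicate_savings(find_duplicate_pngs(some_dir)) gives the number of 
redundant copies and the bytes they occupy, counting every copy beyond the 
first in each group as redundant.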

At this point I got suspicious and thought about mistakes I could have made 
during my initial data gathering. One candidate is the effect of symlinks in 
the directory structure: although "find" doesn't follow symlinks that point 
to directories, it does report symlinked files.

I will try to investigate whether or not symlinks are causing the many 
duplicate image files and report back once I have the new, possibly 
corrected, numbers.
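
As a rough sketch of how the symlinks could be kept out of the count (again 
in Python, with an illustrative function name rather than anything I have 
already run), the file listing can skip any path that is itself a symlink:

```python
import os


def list_pngs(root, include_symlinks=False):
    """Return PNG paths under `root`, optionally skipping symlinked files."""
    found = []
    # followlinks=False (the default) keeps os.walk out of symlinked
    # directories; symlinks to files still show up in `filenames`.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in sorted(filenames):
            if not name.lower().endswith(".png"):
                continue
            path = os.path.join(dirpath, name)
            if not include_symlinks and os.path.islink(path):
                continue  # a symlink adds no data of its own
            found.append(path)
    return found
```

Comparing len(list_pngs(d, include_symlinks=True)) with len(list_pngs(d)) 
shows how many of the reported PNGs are really symlinks. On the command line, 
adding "-type f" to "find" should have the same effect, since it matches 
regular files only.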

Matt, I also wanted to address your comment about not gaining much space on 
the CDs: in my original mail I wrote that the savings are roughly eight 
megabytes when using bzip2 to compress all PNGs into a single archive. Eight 
megabytes is quite a lot I think, especially because this makes room for some 
new or updated packages on the CD.

To summarize: before we make any decisions based on my initial analysis, I'd 
like to verify the numbers I provided. I hope this won't pose a problem; it 
should take about two or three days.

In the meantime, I'll be looking forward to discussing the matter on 
ubuntu-art and ubuntu-devel with anyone interested in this subject.

With kind regards,

Frank Schoep

On Tuesday 25 April 2006 02:03, Mark Shuttleworth wrote:
> This is a very cool bit of work from Frank Schoep over on
> ubuntu-art at l.u.c that I think might be of interest to ubuntu-devel at .
> Very late for such a pervasive change BUT 15MB is not to be sniffed at.
> Frank Schoep wrote:
> > ...

