Rsyncable LiveCD overview

Paul Sladen ubuntu at paul.sladen.org
Sat Jan 15 15:45:48 CST 2005


This morning somebody pointed out that a small change (kernel rev) on the
Ubuntu LiveCD images was producing a major churn and resulting in rsync
needing to fetch down 175MB (nearly a third of the image).

  aaaabbbbccccdd|   |ddeeeeffffgggg   <--- old image
               /     \
  aaaabbbbcccc|NEWDATA|eeeeffffgggg   <--- new image

Even when new information is added to a file, rsync has the magic and
intelligence to look backwards and forwards through the old file to see if
it can find a match and avoid any re-downloading.  In the example above,
rsync copies slightly less of the first half, inserts the NEWDATA, skips a
bit and finds a match in the second half.  Rsyncing uncompressed files works
great.
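
To make the matching concrete, here is a toy sketch of the idea in Python.
It is not rsync's actual code: real rsync uses a rolling checksum plus a
stronger hash, whereas this just looks up whole blocks in a dictionary.
The function name 'delta' and the 4-byte block size are made up for the
illustration.

```python
def delta(old: bytes, new: bytes, block: int = 4):
    """Describe `new` as copies of blocks from `old` plus literal bytes."""
    # Index every aligned block of the old file by its content.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)

    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        off = index.get(chunk) if len(chunk) == block else None
        if off is not None:
            # Known block: flush pending literal bytes, then reference it.
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", off))
            i += block
        else:
            # Unknown data: send one byte and retry at the next offset.
            literal.append(new[i])
            i += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


old = b"aaaabbbbccccddddeeeeffffgggg"
new = b"aaaabbbbccccNEWDATAeeeeffffgggg"
ops = delta(old, new)
sent = b"".join(data for kind, data in ops if kind == "literal")
print(ops)
print("bytes that had to be transferred:", sent)
```

With the example images from the diagram, only the literal bytes need to
cross the wire; everything else is a reference into the old file.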

However, in the case of the Ubuntu LiveCD, 95% of the disk is actually a
single huge file which is itself compressed (this is what is mounted using
the kernel 'cloop' driver; for reference, 'cloop' is read-only).

  aaaabbbbccccddddeeeeffffgggg      <--- old image data
  A4(^4)*6                          <--- old image compressed
  |----|
  A4(^4)*3NEWDATAE4^4^4             <--- new image compressed
  aaaabbbbccccNEWDATAeeeeffffgggg   <--- new image data

What you can see here is that whilst the actual images (old+new) are not
much different (only the 'NEWDATA' would need copying), the compressed
versions are *vastly* different and after the first 6 bytes there is no
apparent correlation *at all*.
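
This is easy to try for yourself with Python's 'zlib' module.  The sketch
below is only an illustration under my own assumptions: 'sample_text' is a
made-up stand-in for the filesystem image, and raw deflate (wbits=-15) is
used so that no header or checksum trailer muddies the comparison.  It
prints how many leading and trailing bytes the two compressed streams
share.

```python
import random
import zlib


def sample_text(n_words: int, seed: int = 2005) -> bytes:
    """Deterministic, compressible stand-in for the image contents."""
    rng = random.Random(seed)
    vocab = [b"kernel", b"module", b"ubuntu", b"livecd", b"cloop",
             b"image", b"rsync", b"deflate", b"block", b"stream"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def raw_deflate(data: bytes) -> bytes:
    """Ordinary deflate at level 6, without zlib header or trailer."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def common_prefix(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that a and b have in common."""
    return common_prefix(a[::-1], b[::-1])


old = sample_text(50000)
mid = len(old) // 2
new = old[:mid] + b"NEWDATA" + old[mid:]     # small change in the middle

c_old, c_new = raw_deflate(old), raw_deflate(new)
print("compressed sizes:", len(c_old), len(c_new))
print("shared leading bytes: ", common_prefix(c_old, c_new))
print("shared trailing bytes:", common_suffix(c_old, c_new))
```

The leading part can match up to roughly where the change lands; what
matters for rsync is how little of the remainder lines up afterwards.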

One solution that would work is to rsync an /uncompressed/ version of the
'big file' on the LiveCD.  This could potentially be done by using a loop
driver at each end of the rsync.  The second solution is to ensure that the
compressed image that is produced does not suffer like the example above.

  aaaabbbbccccddddeeeeffffgggg      <--- old image data
  A4^4^4^4^     4^4^4               <--- old image compressed (ignore spaces)
  |----|        |---|
  A4^4^4NEWDATAE4^4^4               <--- new image compressed
  aaaabbbbccccNEWDATAeeeeffffgggg   <--- new image data

Here, the decompressor has not changed at all, but the compressor has been
told to synchronize the stream every 4 input bytes.  This has made a larger
compressed version (about 0.07% larger in the real world), but rsync can now
see matches before and after the New Data in the middle.
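
A rough sketch of the trick, again in Python.  This is not the actual
'--rsyncable' patch: it uses zlib's existing Z_FULL_FLUSH as a stand-in for
the patch's reset, and WINDOW, MODULUS, 'sample_text' and
'deflate_with_sync_points' are names and values invented for the demo (they
are deliberately small, so the size overhead here is far worse than the
real thing).  The point is that the sync positions are chosen from the
content itself (a rolling sum over the last WINDOW bytes), so they land in
the same places relative to the data even after an insertion.

```python
import random
import zlib

WINDOW = 64     # bytes covered by the rolling sum (illustrative value)
MODULUS = 512   # sync whenever the sum is a multiple of this (illustrative)


def sample_text(n_words: int, seed: int = 2005) -> bytes:
    """Deterministic, compressible stand-in for the image contents."""
    rng = random.Random(seed)
    vocab = [b"kernel", b"module", b"ubuntu", b"livecd", b"cloop",
             b"image", b"rsync", b"deflate", b"block", b"stream"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def deflate_with_sync_points(data: bytes, level: int = 6) -> bytes:
    """Raw deflate, resetting the compressor at content-defined points."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out, rolling, start = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte leaving the window
        if i + 1 >= WINDOW and rolling % MODULUS == 0:
            out.append(comp.compress(data[start:i + 1]))
            # Pad to a byte boundary and forget all history, so what
            # follows depends only on the data that follows.
            out.append(comp.flush(zlib.Z_FULL_FLUSH))
            start = i + 1
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())              # finish the stream
    return b"".join(out)


def common_suffix(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that a and b have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


old = sample_text(60000)
cut = len(old) // 3
new = old[:cut] + b"NEWDATA" + old[cut:]

s_old = deflate_with_sync_points(old)
s_new = deflate_with_sync_points(new)
print("compressed sizes:", len(s_old), len(s_new))
print("shared trailing bytes:", common_suffix(s_old, s_new))
```

Both outputs are still plain deflate streams, so an unmodified decompressor
(here zlib.decompress(s_new, -15)) reads them without knowing anything
about the sync points.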

This idea is already built into 'gzip' and can be enabled with the
'--rsyncable' argument.  In Ubuntu and Debian, all the '.deb's are built
this way to reduce rsync-traffic.  The install CD ISO images are built by
Jigdo by concatenating (connecting together) all these '.deb's and adding
the filesystem magic around them.  This is why the install CD syncs fine
between revisions.

The gzip 'deflate' algorithm is used all over the place and the library
'zlib' that does the actual work has spread further than gzip;  it's used by
PNG, compressed webpages and in the kernel for PPP, compressing the boot
kernel and by 'cloop'---the compressed loopback driver that handles the
LiveCD filesystem.  (Gzip is compiled with its own copy of 'deflate.c' and
is not linked against the shared zlib library.)

Since there are no changes to the decompressor, only zlib itself needs
patching for the compression stage (and only the copy linked to the cloop
tools);  the 'create_compressed_fs' command in 'cloop-utils' can then use
this without any modifications.  Matt added that the compressor in use for
Ubuntu images is really the newer 'advfs' installed with a different name.

A Hungarian distribution hit something similar (with apt-rsync) a couple of
years ago, ported the original '--rsyncable' patch and added a simple hack
to test an environment variable to transparently enable the synchronisation
code without the application needing to know or be recompiled:

  http://lists.debian.org/debian-devel/2003/07/msg00462.html

Patches 2a, 2b & 3 are the patches to pull in, the last of which is the
environment variable patch, but I'm not sure if that would get accepted
upstream, whereas adding an extra flag to the 'level' parameter of
zlib's compress2() might be more palatable.

Lamont says he's going to try and work on it (I guess not having broadband
is a motivator!)

I hope that's useful to everyone who listened this far,

	-Paul
-- 
Is there no safe way to travel?  London, GB




