Compressing packages with bzip2 instead of gzip?

Paul Sladen ubuntu at paul.sladen.org
Wed Jan 18 13:41:04 GMT 2006


On Tue, 17 Jan 2006, Michael Vogt wrote:
> On Tue, Jan 17, 2006 at 12:14:36PM +0000, Paul Sladen wrote:
> > Delta-debs on Mirrors. [..] zsync
> Using dpkg-repack scares me a bit TBH.

The purpose is to generate a big file of potential building-blocks, so:

  $ dpkg -L "$package" | xargs tar --no-recursion -cf lego-bricks.bigblob

would be equivalent.  As would 'xargs | cat' ;-).  The timestamps and order
of the files in the tarball don't really matter, but the files should be ones
/likely/ to /resemble/ files in the update;  the list of files can be passed
to 'zsync' directly, rather than building the blob.

Any pieces not matched locally are fetched using HTTP partial-content.
Therefore, the failure mode is to fetch the whole '.deb' verbatim (as would
have happened anyway).
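
A rough sketch of that matching step, to make the failure mode concrete.
This is not zsync's code:  the tiny block size, the function names and the
naive every-offset scan (zsync uses a rolling checksum) are all assumptions
made for illustration.

```python
import hashlib

BLOCK = 4  # absurdly small block size, for illustration only


def block_digests(target, block=BLOCK):
    """Server side: one digest per fixed-size block of the target file."""
    return [hashlib.sha256(target[i:i + block]).digest()
            for i in range(0, len(target), block)]


def missing_ranges(target_digests, target_len, seed, block=BLOCK):
    """Client side: inclusive (first, last) byte ranges not found in the seed."""
    # Naive scan of every offset in the local seed data.
    have = {hashlib.sha256(seed[i:i + block]).digest()
            for i in range(0, max(len(seed) - block + 1, 0))}
    ranges = []
    for n, digest in enumerate(target_digests):
        if digest in have:
            continue  # this block can be copied from local data
        first = n * block
        last = min(first + block, target_len) - 1
        if ranges and ranges[-1][1] + 1 == first:
            ranges[-1] = (ranges[-1][0], last)  # merge with the previous range
        else:
            ranges.append((first, last))
    return ranges


def range_header(ranges):
    """Format the ranges as the value of an HTTP 'Range' request header."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in ranges)
```

With an empty seed every block is missing, the ranges merge into one, and
the request degenerates into fetching the whole file.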

The big problem is that the server-side data also needs to be uncompressed
(or faked-uncompressed with gzip --rsyncable, which 'zsyncmake' can generate
itself).  The mirrors likely cannot take the ten-fold increase from holding
uncompressed .debs, but can probably take the X% hit of rsyncable .debs.

The reason is that you have to be able to address a particular byte-range.
Compression changes the length of the file in a non-linear fashion, which
means there isn't an instant mapping from file_offset ->
uncompressed_file_offset.  The points where that mapping is known (identity
points) can be indexed;  zsync does this internally and calls them 'Z-Maps';
they get embedded in the '.zsync'.
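
To illustrate what an identity point is, here is a minimal sketch using
Python's zlib.  It is not zsync's Z-Map format:  the function names, the
chunk size and the use of a full flush to create restart points are my
assumptions, chosen only to show an offset map in action.

```python
import zlib


def compress_with_map(data, chunk=1024):
    """Raw-deflate data; return (compressed, [(comp_off, uncomp_off), ...])."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15: raw deflate, no header
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk):
        offsets.append((len(out), start))           # record an identity point
        out += comp.compress(data[start:start + chunk])
        out += comp.flush(zlib.Z_FULL_FLUSH)        # byte-align, forget history
    out += comp.flush()                             # finish the stream
    return bytes(out), offsets


def read_from(compressed, comp_off):
    """Decompress from an identity point without touching earlier bytes."""
    return zlib.decompressobj(-15).decompress(compressed[comp_off:])
```

Each pair in the map says "compressed byte N corresponds to uncompressed
byte M", so a client holding the map can turn a wanted uncompressed range
into a compressed byte-range to request.  More flush points mean finer
addressing and slightly worse compression.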

If bzip2 (a block encoder) is used, then identity points only exist every
900kB.

> zsync needs to be told how to deal with the data.tar and that we need a
> way to either recontruct the same md5 sum of the zsynced deb

The checksums are used for two things: (a) sanity checking, (b) signing:

  Release.gpg -> .deb [->] control/md5sums -> data.tar:/usr/bin/foo

Currently, signing the whole '.deb' implicitly takes care of (1) timestamps
and permissions in 'data.tar', and (2) signing all control scripts.
Ideally the following trust relationships would hold instead:

  Release.gpg -> control.tar
  Release.gpg -> data.tar

Then the package-format version and compression encoding can be changed
without breaking the trust.
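
A small sketch of why that works, assuming the digest that ends up signed
is taken over the uncompressed tar member.  Neither dpkg nor apt does this
today as far as this mail is concerned;  'payload_digest' is a hypothetical
helper and SHA-256 is just a stand-in hash.

```python
import bz2
import gzip
import hashlib


def payload_digest(blob):
    """Digest of the uncompressed payload, whatever the outer compression."""
    if blob[:2] == b"\x1f\x8b":      # gzip magic number
        raw = gzip.decompress(blob)
    elif blob[:3] == b"BZh":         # bzip2 magic number
        raw = bz2.decompress(blob)
    else:                            # assume it is already uncompressed
        raw = blob
    return hashlib.sha256(raw).hexdigest()
```

The same 'data.tar' shipped as '.gz', as '.bz2' or uncompressed gives one
digest, so a signature over that digest survives a change of compressor.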

  (Security Note: Yes, an attacker could potentially use a corrupted gzip
  stream to attempt to overflow 'dpkg' to cause a Denial of Service or
  privilege elevation...).

> We can do that today in apt with the
> michael.vogt at ubuntu.com--2005/apt--pdiff--0 branch, it just requires
> the server to publish a suitable Index and patches like on
> ftp://ftp.debian.org/debian/dists/unstable/main/binary-i386/Packages.diff/ 

I like the approach of patch files.  It makes good sense in the case of the
'Packages' file, although I'm guessing they are cumulative.  With 'zsync',
only one extra file exists (associated with the latest revision),
regardless of the number of changes.

Using 'zsync' for the Packages file would be much easier to do.

Conclusion: 10% extra data on the mirrors in exchange for 90% less bandwidth
for the user (with an already-installed version).  Some thinking is still
required about signature handling and bzip2 packages.

	-Paul
-- 
This country is covered in white fluffy snow.  Helsinki, FI
