bzip2 and gzip --rsyncable internals

Paul Sladen ubuntu at paul.sladen.org
Fri Jan 20 15:07:00 GMT 2006


On Thu, 19 Jan 2006, Phillip Susi wrote:
> Paul Sladen wrote:
> > > > The data stream DOESN'T restart every 900 KB,
> > The advantage of Free Software is that you get the source code...
> Not sure what your point was with that. 

I believe a better understanding of the algorithms can be furthered by
studying the source code of the implementation.  The following pointers are
good starting points for working out what the bzip2 code is up to:

  $ grep \'9\' bzip2-1.0.3/bzip2.c
  case '9': blockSize100k    = 9; break;

  $ grep -m1 nblockMAX bzip2-1.0.3/bzlib.c
  s->nblockMAX = 100000 * blockSize100k - 19;

This explains why, if you run 'bzip2' with the arguments '-vvv', you'll see
the number 899981 come up frequently: 100000 * 9 - 19 = 899981.
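
The same expression gives the block limit for the other compression levels.
A minimal sketch in C, where nblock_max() is a hypothetical helper of mine
(not a function in bzlib.c) wrapping the line quoted above:

```c
#include <assert.h>

/* Hypothetical helper, not part of bzlib.c: wraps the expression quoted
   above to give the maximum number of input bytes bzip2 places in one
   block for a compression level of -1 .. -9. */
static int nblock_max(int blockSize100k)
{
    assert(blockSize100k >= 1 && blockSize100k <= 9);
    return 100000 * blockSize100k - 19;
}
```

Calling nblock_max(9) yields the 899981 seen in the '-vvv' output, and
nblock_max(1) yields 99981.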

> > > > That's what gzip --rsyncable does,
> I do not believe that is the case

The question of what black-magic is in use originally came up when looking
at how to make the LiveCDs efficiently rsyncable:

  $ grep -m1 -A1 RSYNC_SUM_MATCH gzip-1.3.5/deflate.c
  #define RSYNC_SUM_MATCH(sum) ((sum) % RSYNC_WIN == 0)
  /* Whether window sum matches magic value */
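
To make the macro less magical, here is a rough sketch of how such a rolling
window sum can pick content-dependent cut points.  It is not the code from
deflate.c: next_sync_point() is a hypothetical helper, the value 4096 for
RSYNC_WIN is an assumption (the grep output above doesn't show it), and
waiting for a full window before testing is my own simplification.

```c
#include <assert.h>
#include <stddef.h>

#define RSYNC_WIN 4096  /* assumed window size, not shown in the grep above */
#define RSYNC_SUM_MATCH(sum) ((sum) % RSYNC_WIN == 0)

/* Hypothetical helper: return the offset just past the first position where
   the sum of the preceding RSYNC_WIN bytes matches, or len if there is no
   such position.  A compressor could end its current block at that offset. */
static size_t next_sync_point(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        sum += buf[i];                      /* byte entering the window */
        if (i >= RSYNC_WIN)
            sum -= buf[i - RSYNC_WIN];      /* byte leaving the window */
        if (i + 1 >= RSYNC_WIN && RSYNC_SUM_MATCH(sum))
            return i + 1;
    }
    return len;
}
```

Because the decision depends only on the last RSYNC_WIN bytes of input, an
insertion early in a file shifts the cut points along with the data instead
of changing every later one, which is what lets rsync match the unchanged
parts again.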

	-Paul
-- 
This country is covered in white fluffy snow.  Helsinki, FI



