bzip2 and gzip --rsyncable internals
Paul Sladen
ubuntu at paul.sladen.org
Fri Jan 20 15:07:00 GMT 2006
On Thu, 19 Jan 2006, Phillip Susi wrote:
> Paul Sladen wrote:
> > > > The data stream DOESN'T restart every 900 KB,
> > The advantage of Free Software is that you get the source code...
> Not sure what your point was with that.
I believe a better understanding of the algorithms can be furthered by
studying the source code of the implementation. The following pointers are
good starting points for working out what the bzip2 code is up to:
$ grep \'9\' bzip2-1.0.3/bzip2.c
case '9': blockSize100k = 9; break;
$ grep -m1 nblockMAX bzip2-1.0.3/bzlib.c
s->nblockMAX = 100000 * blockSize100k - 19;
This explains why if you execute 'bzip2' with arguments '-vvv' you'll see
the number 899981 come up frequently.
> > > > That's what gzip --rsyncable does,
> I do not believe that is the case
The question of what black-magic is in use originally came up when looking
at how to make the LiveCDs efficiently rsyncable:
$ grep -m1 -A1 RSYNC_SUM_MATCH gzip-1.3.5/deflate.c
#define RSYNC_SUM_MATCH(sum) ((sum) % RSYNC_WIN == 0)
/* Whether window sum matches magic value */
-Paul
--
This country is covered in white fluffy snow. Helsinki, FI