zsync and --rsyncable - Was: Compressing data with bzip2 ...
John C. McCabe-Dansted
gmatht at gmail.com
Thu Jan 19 05:58:44 GMT 2006
On Thursday 19 January 2006 16:07, Phillip Susi wrote:
> Aigars Mahinovs wrote:
> >That would require either a huge amount of disk space or additional
> >software on the server + a significant load on the server. That is no
> >better than the rsync situation today as this is simply not acceptable
> >to most mirrors.
>
> An apache module could be written to decompress the deb on demand, cache
> it, and serve the decompressed contents out of the cache. That would
> not require a whole lot of cpu ( assuming a good cache hit rate ) or
> disk resources on the server, just some added software. In the end it
> would save disk space on the servers because it would allow the packages
> to be compressed with 7zip instead of gzip, and would save bandwidth
> because people would not be downloading nearly as much data.
>
> John C. McCabe-Dansted wrote:
> >This does *not* require that the uncompressed tar be stored on the server.
> > In fact the zsync manual recommends that you do not decompress the tar,
> > so that zsync can read the smaller compressed data.
>
> The zsync manual is foolish then.
Well, then the statistics at
http://zsync.moria.org.uk/paper/ch03s05.html
are equally foolish.
> It requires the tar to be
> uncompressed on the server
no
> because you want to zsync the uncompressed
> data, not the compressed data.
Zsync reads compressed data from the server, but reads and writes uncompressed
data to/from a local file.
> You change one byte in the uncompressed
> data and it ripples changes through the compressed stream from there to
> the end of the block,
yes
> causing zsync to have to send a LOT more data.
no, see http://zsync.moria.org.uk/paper/
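The ripple itself is easy to reproduce. The following is a minimal Python sketch, not taken from zsync: it uses zlib (the same DEFLATE algorithm as gzip) on made-up sample data, and the helper name first_difference is my own. It only shows that a one-byte change in the uncompressed data alters the compressed stream; it says nothing about how much data zsync then transfers.

```python
import random
import zlib


def first_difference(a: bytes, b: bytes) -> int:
    """Return the first index where a and b differ, or -1 if they are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Made-up, compressible sample data (seeded so the run is repeatable).
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
original = b" ".join(rng.choice(words) for _ in range(20000))

# Change exactly one byte in the middle of the uncompressed data.
middle = len(original) // 2
modified = original[:middle] + b"#" + original[middle + 1:]

packed_original = zlib.compress(original)
packed_modified = zlib.compress(modified)

print("uncompressed bytes that differ:",
      sum(x != y for x, y in zip(original, modified)))
print("compressed sizes:", len(packed_original), len(packed_modified))
print("compressed streams first differ at byte:",
      first_difference(packed_original, packed_modified))
```

Comparing the reported offset with the compressed size gives a rough idea of how much of the stream a tool that compared compressed bytes directly would see as changed.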
> The gzip --rsyncable sets gzip to use small blocks to contain the
> ripples,
This is necessary for rsync, not zsync. My tests indicate that --rsyncable does
not assist zsync's performance (nor do the 4KB blocks that dpkg uses).
> but if you want good compression, you compress the entire file
> in one pass, and use a better algorithm like LZMA, but then zsync won't
> work very well because of the massive change ripples.
It should be possible to get zsync-look-inside to work with LZMA, if it is a
stream compressor like gzip rather than a block compressor like bzip2 (as I
understand it is).
> >Zsync uses a clever trick which I think goes basically like this:
> >
> >$gzip_opts=detect which options were used to create data.tar.gz
> >tail -f data.tar data.tar | gzip $gzip_opts > data.tar.gz &
> >if next block b matches
> >then
> > cat b >> data.tar
> >else
> > read data from http://server/data.tar.gz until there is enough data in
> > the local copy of data.tar.gz to reconstruct block b.
> >fi
>
> That code contradicts your original statement by using the uncompressed
> tars, which you said the server did not need to store. Other than that,
> I can't make much sense out of it.
data.tar and data.tar.gz are local files; http://.../data.tar.gz is a remote
file. Hence we only need to generate an uncompressed file on the local
machine. There is no uncompressed remote file.
The dataflow is:
file://.../new/data.tar <- zsync (
    file://.../old/data.tar,
    http://.../new/data.tar.zsync,
    http://.../new/data.tar.gz
)
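The dataflow above can be sketched in Python. This is a simplified illustration under my own assumptions, not zsync's actual control-file format or code: the block size, the Adler-32/MD5 checksum pair, and the names make_control, sync and fetch_block are all hypothetical. It also leaves out the look-inside step that maps a missing block to a byte range of the remote data.tar.gz; here fetch_block simply stands in for that download.

```python
import hashlib
import zlib
from typing import Callable, List, Optional, Tuple

BLOCK = 16  # hypothetical block size, kept tiny for the demo

Control = List[Tuple[int, bytes]]


def make_control(new: bytes) -> Control:
    """Server side: a (weak, strong) checksum pair for each block of the new file."""
    return [
        (zlib.adler32(new[i:i + BLOCK]), hashlib.md5(new[i:i + BLOCK]).digest())
        for i in range(0, len(new), BLOCK)
    ]


def sync(old: bytes, control: Control,
         fetch_block: Callable[[int], bytes]) -> Tuple[bytes, int]:
    """Client side: rebuild the new file, fetching only blocks not found in old."""
    by_weak = {}
    for index, (weak, _strong) in enumerate(control):
        by_weak.setdefault(weak, []).append(index)

    blocks: List[Optional[bytes]] = [None] * len(control)
    # Try every byte offset of the old file, so shifted data still matches.
    for offset in range(max(len(old) - BLOCK + 1, 0)):
        window = old[offset:offset + BLOCK]
        candidates = by_weak.get(zlib.adler32(window))
        if not candidates:
            continue  # the cheap checksum rules this window out
        strong = hashlib.md5(window).digest()
        for index in candidates:
            if control[index][1] == strong:
                blocks[index] = window

    fetched = 0
    for index, block in enumerate(blocks):
        if block is None:
            blocks[index] = fetch_block(index)  # stands in for a remote download
            fetched += 1
    return b"".join(blocks), fetched
```

Calling sync(old, make_control(new), fetch_block) returns the rebuilt file together with the number of blocks that had to come from the server; only the client ever holds uncompressed data.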
If my explanation is unclear you should probably read the full paper (above).
--
John C. McCabe-Dansted
Masters Student