Compressing packages with bzip2 instead of gzip?
psusi at cfl.rr.com
Wed Jan 18 17:04:31 GMT 2006
Paul Sladen wrote:
> Assuming there is a small change and zsync, there are three options:
> gzip, treated as data: Download ~100% of file
> gzip with look-inside: Download ~30%-40% of file
> gzip with --rsyncable: Download ~ 5%-10% of file
You got that backwards. gzip with look-inside will give smaller deltas
than gzip with --rsyncable. The look-inside method is also usable with
bzip2 and 7zip, so you can get better compression.
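As a rough illustration of why the "treated as data" case is the worst, here is a toy experiment in Python. It is only a sketch under several assumptions: Python's zlib module stands in for gzip (same deflate algorithm, different container), the 512-byte block size and the test text are made up, and the hypothetical `missing_fraction` helper brute-forces block matching instead of using zsync's rolling checksum. It counts what fraction of the new file's blocks cannot be found anywhere in the old file after a one-byte edit, once on the compressed bytes and once on the uncompressed bytes.

```python
import hashlib
import random
import zlib

BLOCK = 512  # hypothetical block size, chosen for the demo only


def missing_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of `new`'s fixed-size blocks that occur nowhere in `old`.

    As in zsync, a block of the target may be found at any byte offset of
    the seed file. Unlike zsync, this hashes every window by brute force
    instead of using a rolling checksum, which is fine for a small demo.
    """
    # Slices near the end are shorter than `block`, so every suffix of `old`
    # is included; that lets a short final block of `new` match as well.
    seen = {hashlib.sha1(old[i:i + block]).digest() for i in range(len(old))}
    blocks = [new[i:i + block] for i in range(0, len(new), block)]
    missing = sum(hashlib.sha1(b).digest() not in seen for b in blocks)
    return missing / len(blocks)


def make_text(n_words: int, seed: int = 0) -> bytes:
    """Compressible stand-in for a package payload (hypothetical test data)."""
    rng = random.Random(seed)
    vocab = ["package", "debian", "control", "data", "tar", "gzip", "bzip2",
             "zsync", "delta", "mirror", "upgrade", "archive", "install"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


old = make_text(12000)
edited = bytearray(old)
edited[100] ^= 0x20  # a one-byte edit near the start of the file
new = bytes(edited)

# "gzip, treated as data": match the compressed bytes directly.
opaque = missing_fraction(zlib.compress(old, 9), zlib.compress(new, 9))
# "look-inside": match the uncompressed bytes instead.
inside = missing_fraction(old, new)

print(f"compressed, treated as data: {opaque:.1%} of blocks to fetch")
print(f"look-inside:                 {inside:.1%} of blocks to fetch")
```

With the edit near the start, the compressed comparison typically has to fetch most of the file, while the uncompressed comparison misses only the block holding the edit. The sketch does not model --rsyncable, nor the cost of recompressing the reconstructed data on the client.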
> (zsync needs teaching where to find the start of the two zlib streams within
> a .deb for the second two. The .debs need repacking for the last one).
Also not true. zsync does not need to be taught to find the zlib streams
unless it is going to look inside them. If it isn't looking inside the
gzip streams, then it is syncing the entire .deb, which is just an ar
archive of the two gzipped tarballs, so there are just a few extra bytes
of header, which won't bother zsync at all. You also don't need to repack
the debs for the last option, since dpkg already basically builds with
--rsyncable.
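For the look-inside case, the tool first has to know where each compressed member starts. A .deb is an ar archive whose members are debian-binary, control.tar.gz and data.tar.gz, and the ar format is simple: an 8-byte `!<arch>\n` magic, then for each member a 60-byte text header (name, mtime, uid, gid, mode, size and a two-byte terminator) followed by the payload, padded to an even length. The Python sketch below lists the member offsets and picks out those that begin with the gzip magic bytes. The function names are hypothetical (not part of zsync or dpkg), and it builds a toy archive instead of reading a real package.

```python
import gzip

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60          # fixed size of each ar member header
GZIP_MAGIC = b"\x1f\x8b"


def ar_members(blob: bytes) -> list[tuple[str, int, int]]:
    """Return (name, payload_offset, payload_size) for each ar member."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError(f"corrupt member header at offset {pos}")
        # The name is space-padded; GNU ar may also append a '/' terminator.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii"))
        start = pos + HEADER_LEN
        members.append((name, start, size))
        pos = start + size + (size % 2)  # payloads are padded to even length
    return members


def gzip_streams(blob: bytes) -> list[tuple[str, int, int]]:
    """Members whose payload starts with the gzip magic number."""
    return [m for m in ar_members(blob) if blob[m[1]:m[1] + 2] == GZIP_MAGIC]


def ar_pack(members: list[tuple[str, bytes]]) -> bytes:
    """Build a minimal ar archive for the demo (mtime, uid, gid all zero)."""
    out = [AR_MAGIC]
    for name, payload in members:
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(payload):<10}`\n"
        out.append(header.encode("ascii"))
        out.append(payload)
        if len(payload) % 2:
            out.append(b"\n")
    return b"".join(out)


# A toy .deb-shaped archive; the payloads are not real tarballs.
deb = ar_pack([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", gzip.compress(b"hypothetical control.tar contents")),
    ("data.tar.gz", gzip.compress(b"hypothetical data.tar contents" * 100)),
])
for name, offset, size in gzip_streams(deb):
    print(f"{name}: gzip stream at byte {offset}, {size} bytes")
```

Against a real package you would pass the file's bytes (for example `open(path, "rb").read()`); members compressed with something other than gzip are simply not reported.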
> Zsync can be taught that restart points occur every 900kB, likewise zsync
> could be taught to find the restart markers in LZMA, but there's even less
> of those. You have a choice of optimising for two cases:
The data stream DOESN'T restart every 900 KB; that's the problem.
Restarting the stream is what gzip --rsyncable does, and bzip2 and 7zip
don't support that (and don't want to, since it sacrifices compression).
When you do make the data stream restart every x KB, zsync doesn't
require any special handling to sync it efficiently: the restarts confine
the fallout from a small change in the original uncompressed stream to
the region around that change, which means there is less changed data for
zsync to find and send.
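The restart idea can be sketched in Python with the zlib module. Restart points are chosen from a rolling sum of the last W bytes, in the spirit of the --rsyncable patch to gzip; the 4096-byte window and the "sum divisible by the window size" rule are assumptions modelled on that patch, not a description of its exact code, and the helper names are hypothetical. At each restart point the compressor gets a `Z_FULL_FLUSH`, which byte-aligns the output and discards the compression history.

```python
import random
import zlib

WINDOW = 4096  # assumed rolling-window size


def restart_points(data: bytes, window: int = WINDOW) -> list[int]:
    """Offsets at which to restart the compressor.

    A restart is placed after byte i when the sum of the last `window` bytes
    is divisible by `window`. The decision depends only on those bytes, so an
    edit cannot move restart points located more than `window` bytes past it;
    they simply shift along with the data.
    """
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        end = i + 1
        if end >= window and rolling % window == 0:
            points.append(end)
    return points


def rsyncable_compress(data: bytes, window: int = WINDOW, level: int = 9) -> bytes:
    """zlib-compress `data`, resetting the compressor history at each restart point."""
    comp = zlib.compressobj(level)
    out = []
    start = 0
    for end in restart_points(data, window):
        out.append(comp.compress(data[start:end]))
        # Z_FULL_FLUSH byte-aligns the output and discards the dictionary,
        # so later output no longer depends on bytes before this point.
        out.append(comp.flush(zlib.Z_FULL_FLUSH))
        start = end
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())
    return b"".join(out)


rng = random.Random(1)
text = bytes(rng.choice(b"abcdefghijklmnopqrstuvwxyz \n") for _ in range(200_000))
edited = b"X" + text  # insert one byte at the front

old_points = restart_points(text)
new_points = restart_points(edited)
print(f"{len(old_points)} restart points in the original")
print(f"plain: {len(zlib.compress(text, 9))} bytes, "
      f"with restarts: {len(rsyncable_compress(text))} bytes")
```

Because each decision depends only on the bytes in the window, inserting a byte at the front moves every restart point beyond the first window by exactly one byte instead of reshuffling them, so the compressed output after those points should be matchable by zsync at shifted offsets. Comparing the two printed sizes shows the compression cost of the extra flushes.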
> Smallest size of .debs on the CD image
> Least bandwidth used during a network upgrade
> Pick either.
I'll take both ;)
>> My solution would be to just use bsdiff patches against data.tar and
> This is O(N^2), right? Whereas zsync is O(1).
> Yes, 'apt-torrent' could be used to fetch the chunks (instead of partial
> HTTP), once 'zsync' has figured out what they are.
apt-torrent doesn't make any sense, since torrents need to be rather
large to be effective, and packages are rather small. Fetching small
packages via torrents would take forever.
BitTorrent makes sense when downloading the entire install CD image, not
for downloading individual packages.