Diff and merge of archives - proposal

Martin (gzlist) gzlist at googlemail.com
Wed Oct 13 21:15:53 BST 2010


On 13/10/2010, John Arbash Meinel <john at arbash-meinel.com> wrote:
> ...
>> If an archive has changed, the contents will be decompressed and
>> diffed (note that the contained files and folders will not have their
>> own file-ids, so renames will not be detected!).
>
> Note that if someone is versioning a 'tar.gz' file, often they need the
> exact binary content. As an example, Debian packages build from a tar.gz
> file. However they use a sha/md5sum to make sure that the tarball is
> valid. But they use the shasum of the compressed content. Because of
> this, there have been a lot of hacks (pristine-tar), because gzip is not
> deterministic.
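To make the checksum problem concrete, here is a small Python sketch (needs Python 3.8+ for the `mtime` argument). It compresses the same payload twice with different gzip header timestamps. The compressed bytes, and therefore their sha256, differ, even though the decompressed content is identical. The payload and timestamps are made up for illustration. Different compression levels or gzip implementations can cause the same effect.

```python
import gzip
import hashlib

# Made-up stand-in for the contents of a .tar file.
payload = b"example tar contents\n" * 100

# Same payload, compressed twice; only the MTIME field in the gzip header differs.
a = gzip.compress(payload, mtime=0)
b = gzip.compress(payload, mtime=1286999753)

def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

print(sha(a) == sha(b))                                     # False: compressed bytes differ
print(sha(gzip.decompress(a)) == sha(gzip.decompress(b)))   # True: same content
```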

My thought was: why not store the uncompressed archive contents in the
repo, using pristine-tar style hacks to reproduce the original archive
as needed? Versioning the content plus a small hunk of archive metadata
would be better for bzr, and is along the lines of Martin Pool's
thoughts on content filtering. It would mean existing binary blobs
wouldn't magically gain nice diff and merge behaviour, but it would
save repo bloat from blob changes.
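Here is a minimal Python sketch of the idea. It assumes the .gz was produced by zlib-based gzip (here Python's own `gzip` module) at some compression level that can be found by trying each one. The repo would keep the uncompressed archive plus a few fields of gzip metadata. On checkout it would regenerate the exact original bytes and verify them against the stored checksum. `ArchiveRecord`, `store_archive` and `restore_archive` are hypothetical names, not bzr API. Archives made by other gzip implementations will often not reproduce this way. In that case `store_archive` returns None, meaning the blob would be kept as-is. Real pristine-tar style tools work harder at this step.

```python
import gzip
import hashlib
import struct
from dataclasses import dataclass
from typing import Optional

@dataclass
class ArchiveRecord:
    # Hypothetical repo record, versioned instead of the compressed blob.
    content: bytes       # uncompressed archive (e.g. the .tar stream)
    mtime: int           # MTIME field from the gzip header
    compresslevel: int   # a level found to reproduce the original bytes
    sha256: str          # checksum of the ORIGINAL compressed file

def store_archive(gz_bytes: bytes) -> Optional[ArchiveRecord]:
    content = gzip.decompress(gz_bytes)
    # Bytes 4..7 of a gzip member header hold MTIME, little-endian (RFC 1952).
    (mtime,) = struct.unpack("<I", gz_bytes[4:8])
    digest = hashlib.sha256(gz_bytes).hexdigest()
    # Try each level until recompression regenerates the exact original bytes.
    for level in range(10):
        if gzip.compress(content, compresslevel=level, mtime=mtime) == gz_bytes:
            return ArchiveRecord(content, mtime, level, digest)
    return None  # not reproducible this way: fall back to storing the blob

def restore_archive(rec: ArchiveRecord) -> bytes:
    gz_bytes = gzip.compress(rec.content, compresslevel=rec.compresslevel,
                             mtime=rec.mtime)
    # The stored checksum guards against silently producing different bytes.
    if hashlib.sha256(gz_bytes).hexdigest() != rec.sha256:
        raise ValueError("recompressed archive does not match original checksum")
    return gz_bytes
```

Between revisions of a tarball only `content` would normally change, so diff and merge would work on that. `mtime`, `compresslevel` and `sha256` are the small hunk of archive metadata.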

Martin


