Binary file support

Martin Pool martinpool at gmail.com
Thu Oct 13 05:51:56 BST 2005


On 13/10/05, John Arbash Meinel <john at arbash-meinel.com> wrote:
> I know Aaron mentioned a patch in the past, to add a binary flag to
> files, so that we can more properly handle diff and merge.

I'd rather have bzr just notice that the file is binary and therefore
shouldn't be run through a text diff or merge.
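
As a rough illustration of what "just notice" could mean, here is a
minimal sketch in Python.  The NUL-byte heuristic, the name
looks_binary and the 8000-byte sample size are all assumptions made
for this example, not a description of what bzr does.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Guess whether content is binary.

    Hypothetical heuristic: text files rarely contain a NUL byte, so
    treat any NUL in the first `sample_size` bytes as a sign of binary
    content.
    """
    return b"\x00" in data[:sample_size]
```

A caller could check looks_binary(content) before choosing between a
text diff/merge and a plain "file changed" report.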

> I also know that Robert Collins was doing something about "rolling
> checksums", so maybe I just need to hear what he was doing, and I'll
> have no more to say.

Rolling checksums are a way of splitting binary files into chunks,
such that the contents of chunks tend to stay the same even if data is
inserted or removed between them.  Where you would use lines in a text
file, you can use chunks in a binary file.  This is part of the
algorithm rsync and rdiff use.  We can use it to store diffs between
binaries.
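
The chunk-splitting idea can be sketched as follows.  This is only an
illustration under stated assumptions: it uses a plain byte sum over a
sliding window rather than the checksum rsync actually uses, and the
names split_chunks, WINDOW and DIVISOR are made up for the example.  A
chunk boundary is declared wherever the sum over the last WINDOW bytes
hits a chosen value, so boundaries depend on nearby content and not on
the absolute offset in the file.

```python
import random

WINDOW = 16    # number of trailing bytes covered by the rolling sum
DIVISOR = 64   # a boundary occurs roughly once every DIVISOR bytes


def split_chunks(data: bytes, window: int = WINDOW, divisor: int = DIVISOR) -> list:
    """Split data into chunks at content-defined boundaries."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, drop the one that fell out.
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        # Only test once the window is full, so the decision depends
        # solely on the last `window` bytes.
        if i + 1 >= window and rolling % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Insert a few bytes in the middle and see how many chunks are unchanged.
rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(4096))
edited = data[:2048] + b"INSERTED" + data[2048:]
shared = set(split_chunks(data)) & set(split_chunks(edited))
print(len(shared), "chunks are identical in both versions")
```

Chunks that lie wholly before the insertion, or that start after the
window has moved past it, come out byte-for-byte identical, so only
the chunks around the edit would need to be stored again.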

However, many binaries are going to change entirely from one version
to the next, with no content in common, so perhaps it would be better
just to store their full text.

I think a fast weave-like format needs to allow storing full copies
from time to time (like arch cacherevs), so that you don't need to
traverse all of history.  For binary files (or some binary files) we
could just store a full copy every time, which avoids calculating
useless diffs while still using just a single format.
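
A minimal sketch of that storage policy, assuming a simple chain of
deltas rather than a real weave: every revision is stored either as a
full copy or as a delta against the previous one, a full copy is
forced at a fixed interval, and binary content is always stored in
full.  RevisionStore, FULL_EVERY and the delta encoding are
hypothetical names and choices made for this example only.

```python
import difflib

FULL_EVERY = 4  # hypothetical: at most FULL_EVERY - 1 deltas in a row


def make_delta(old: list, new: list) -> list:
    """Encode `new` as copy/insert instructions against `old`."""
    delta = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: those units are simply not copied.
    return delta


def apply_delta(old: list, delta: list) -> list:
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class RevisionStore:
    """Stores revisions as lists of units (lines, or chunks of a binary)."""

    def __init__(self, full_every: int = FULL_EVERY):
        self.full_every = full_every
        self.records = []  # each entry is ("full", units) or ("delta", delta)

    def add(self, units: list, binary: bool = False) -> int:
        # Count how many deltas have been stored since the last full copy.
        since_full = 0
        for kind, _ in reversed(self.records):
            if kind == "full":
                break
            since_full += 1
        if not self.records or binary or since_full + 1 >= self.full_every:
            self.records.append(("full", list(units)))
        else:
            previous = self.get(len(self.records) - 1)
            self.records.append(("delta", make_delta(previous, list(units))))
        return len(self.records) - 1

    def get(self, index: int) -> list:
        # Walk back only as far as the nearest full copy, not all of history.
        base = index
        while self.records[base][0] != "full":
            base -= 1
        units = list(self.records[base][1])
        for i in range(base + 1, index + 1):
            units = apply_delta(units, self.records[i][1])
        return units
```

Reading a revision never applies more than full_every - 1 deltas, and
a file flagged as binary costs no diff computation at all, yet both
kinds of file share the same record layout.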

--
Martin



