Binary file support
Aaron Bentley
aaron.bentley at utoronto.ca
Thu Oct 13 19:08:18 BST 2005
John Arbash Meinel wrote:
> Aaron Bentley wrote:
>
>>diff's heuristic for 'binary' is reported to be 'contains NUL in the
>>first 1k'. For text diffing, another useful test is 'contains VT-102
>>control characters'.
>
>
> This also fails for UTF-16 files, which could be good candidates for
> diff & patch.
Yeah, I know. Doing this really properly would involve assigning a
character encoding to files, which I'm not ready to do yet.
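
To make the failure mode concrete, here is a minimal sketch of the NUL-in-the-first-1k heuristic as it is described above. The function name looks_binary and the probe_size parameter are illustrative assumptions, not code from diff or from bzr.

```python
def looks_binary(data, probe_size=1024):
    """Guess 'binary' if a NUL byte appears in the first probe_size bytes.

    Hypothetical helper: it mirrors the heuristic as reported in this
    thread, not the actual implementation in GNU diff.
    """
    return b"\x00" in data[:probe_size]


ascii_text = "hello\nworld\n".encode("ascii")
utf16_text = "hello\nworld\n".encode("utf-16")
elf_magic = b"\x7fELF\x02\x01\x01\x00"

ascii_is_binary = looks_binary(ascii_text)   # plain text, no NUL bytes
utf16_is_binary = looks_binary(utf16_text)   # text, but every other byte is NUL
elf_is_binary = looks_binary(elf_magic)      # genuinely binary
```

The UTF-16 sample is ordinary text, yet it is flagged as binary because Western characters encode with a NUL in every other byte. Avoiding that would need the per-file encoding information mentioned above.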
> I believe a lot of Java files are UTF-16. I don't know the
> specifics, other than because the character size is 16-bits if you are
> writing in a western language every other byte is NUL.
> I'm guessing difflib wouldn't have any problems, it is just an issue of
> how to detect the "newline" character.
Yeah, I expect you can patch an executable using difflib. Or we could
use unicode strings as the input.
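
As a rough sketch of the "unicode strings as the input" option: if the encoding is known, the bytes can be decoded first and the resulting lines handed to difflib, which sidesteps the question of what a newline looks like in the raw bytes. The helper name diff_utf16 is hypothetical, and the encoding is assumed to be known in advance, which is exactly the part that is not solved here.

```python
import difflib


def diff_utf16(old_bytes, new_bytes):
    """Return a unified diff of two UTF-16 encoded texts as a list of lines.

    Hypothetical helper: assumes the caller already knows both inputs
    are UTF-16 (with a byte-order mark), which bzr does not track.
    """
    # Decode first, so line splitting happens on characters, not bytes.
    old_lines = old_bytes.decode("utf-16").splitlines(keepends=True)
    new_lines = new_bytes.decode("utf-16").splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new"))


old = "alpha\nbeta\n".encode("utf-16")
new = "alpha\ngamma\n".encode("utf-16")
result = diff_utf16(old, new)
```

The diff is expressed in terms of decoded text, so applying it would also require decoding the target and re-encoding the result.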
>>Also, it allows you to truncate history.
>
>
> So this is talking about the idea of a revised weave format, right? The
> question is what would need to be put in a cached revision. Because
> wouldn't you still want annotations for every line that is present? Now,
> you might get some compaction because not all ancestors contribute to
> the current text.
Right.
> Unless you really just want to truncate the ancestry. And pretend like
> the current text is the baseline.
I meant removing old revisions from the weave and having the 'cacherev'
be annotated.
> I think for a lot of binaries you could get some compression out of a
> weave, though I don't know if it is worth it or not.
It's easier that way, and it might be nice sometimes.
Aaron