Binary file support

Aaron Bentley aaron.bentley at utoronto.ca
Thu Oct 13 19:08:18 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> Aaron Bentley wrote:
> 
>>diff's heuristic for 'binary' is reported to be 'contains NUL in the
>>first 1k'.  For text diffing, another useful test is 'contains VT-102
>>control characters'.
> 
> 
> This also fails for UTF-16 files, which could be good candidates for
> diff & patch. 

Yeah, I know.  Doing this really properly would involve assigning a
character encoding to files, which I'm not ready to do yet.
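
For illustration, here is a minimal sketch of the NUL-in-the-first-1k heuristic described above, showing why it misclassifies UTF-16 text. The function name `looks_binary` and the `probe_size` parameter are made up for this example; this is not diff's or bzr's actual code.

```python
def looks_binary(data: bytes, probe_size: int = 1024) -> bool:
    """Guess 'binary' if a NUL byte occurs in the first probe_size bytes.

    Hypothetical helper: mirrors the heuristic as reported in this thread,
    not any real implementation.
    """
    return b"\x00" in data[:probe_size]


# ASCII text has no NUL bytes, so it is treated as text.
ascii_text = b"plain ASCII text\n"

# The same kind of text in UTF-16 has a NUL in every other byte for
# western characters, so the heuristic calls it binary.
utf16_text = "plain UTF-16 text\n".encode("utf-16")

if __name__ == "__main__":
    print(looks_binary(ascii_text))
    print(looks_binary(utf16_text))
```

A NUL that appears only after the probe window is not noticed, so the result also depends on where in the file the first NUL happens to fall.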

> I believe a lot of Java files are UTF-16. I don't know the
> specifics, other than that, because the character size is 16 bits, every
> other byte is NUL if you are writing in a western language.

> I'm guessing difflib wouldn't have any problems; it is just an issue of
> how to detect the "newline" character.

Yeah, I expect you can patch an executable using difflib.  Or we could
use unicode strings as the input.
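
Both options can be sketched with the standard library. The first function compares raw bytes with `difflib.SequenceMatcher`, which accepts any sequence of hashable items; the second decodes UTF-16 to unicode strings first so that line splitting sees real newlines. The function names are hypothetical, and the second assumes the encoding is already known, which is exactly the part that is not solved yet.

```python
import difflib


def byte_opcodes(old: bytes, new: bytes):
    """Return the non-equal opcodes between two byte strings.

    Works on arbitrary binary data because bytes are just a sequence
    of small integers as far as SequenceMatcher is concerned.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def utf16_line_diff(old: bytes, new: bytes):
    """Decode UTF-16 input (assumed encoding) and produce a unified diff."""
    old_lines = old.decode("utf-16").splitlines(keepends=True)
    new_lines = new.decode("utf-16").splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new"))


if __name__ == "__main__":
    print(byte_opcodes(b"\x00\x01\x02\x03", b"\x00\x01\xff\x03"))
    for line in utf16_line_diff("a\nb\n".encode("utf-16"),
                                "a\nc\n".encode("utf-16")):
        print(line, end="")
```

Byte-level matching is quadratic in the worst case, so this is only a demonstration that it works, not a claim that it is fast enough for large binaries.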

>>Also, it allows you to truncate history.
> 
> 
> So this is talking about the idea of a revised weave format, right? The
> question is what would need to be put in a cached revision. Because
> wouldn't you still want annotations for every line that is present? Now,
> you might get some compaction because not all ancestors contribute to
> the current text.

Right.

> Unless you really just want to truncate the ancestry and pretend that
> the current text is the baseline.

I meant removing old revisions from the weave and having the 'cacherev'
be annotated.
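
A rough sketch of that idea, under heavy assumptions: history is modelled as a plain list of `(rev_id, lines)` pairs rather than a real weave, and the 'cacherev' is just the annotated text of the last revision being dropped. The names `annotate` and `truncate` are invented for this example and are not bzr APIs.

```python
import difflib


def annotate(history, base=None):
    """Annotate each line of the newest text with the revision that added it.

    history: list of (rev_id, lines), oldest first.
    base: optional already-annotated text [(rev_id, line), ...] to start from.
    """
    annotated = list(base or [])
    for rev_id, lines in history:
        old_lines = [line for _, line in annotated]
        matcher = difflib.SequenceMatcher(None, old_lines, lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their original annotation.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines are credited to this revision.
                updated.extend((rev_id, line) for line in lines[j1:j2])
        annotated = updated
    return annotated


def truncate(history, keep_from):
    """Drop revisions before index keep_from, keeping an annotated snapshot.

    Returns (cacherev, remaining_history).
    """
    cacherev = annotate(history[:keep_from])
    return cacherev, history[keep_from:]


if __name__ == "__main__":
    history = [
        ("r1", ["a\n", "b\n"]),
        ("r2", ["a\n", "c\n"]),
        ("r3", ["a\n", "c\n", "d\n"]),
    ]
    cacherev, rest = truncate(history, 2)
    print(cacherev)
    print(annotate(rest, base=cacherev))
```

In this toy model the truncated form still credits surviving lines to revisions that are no longer stored, while lines that did not survive (such as "b" from r1) take up no space, which is the compaction mentioned above.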

> I think for a lot of binaries you could get some compression out of a
> weave, though I don't know if it is worth it or not.

It's easier that way, and it might be nice sometimes.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDTqKS0F+nu1YWqI0RAslvAJ9AkWP49pC+YUgk88xoIrK3+3wPdQCffXAU
ijj0v8KhxldM1RV5UXJGNMk=
=db+v
-----END PGP SIGNATURE-----



