[RFC] binary files and merging

Aaron Bentley aaron.bentley at utoronto.ca
Tue Jul 18 15:42:08 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> We just had some discussion on IRC, because PDF files may have a header
> longer than 1K before they get into their compressed content. So just
> reading the first 1K bytes may yield a false negative as text data.

Wow, that's surprising.  I get NULs very early in PDFs:

%PDF-1.4
%äãÏÒ
9 0 obj
<</Length 10 0 R
/Filter/FlateDecode
>>
stream
x~\^C^@^@^@^@^A

But in general, the NUL check is a heuristic that may be wrong.  It's
possible to have binary files with no NULs, for example.
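
To make the failure mode concrete, here is a minimal sketch of a
NUL-in-the-first-1K check.  The function name and the `limit` parameter
are illustrative assumptions, not the actual bzr code:

```python
def looks_binary(data, limit=1024):
    """Heuristic: treat data as binary if a NUL byte appears in the
    first `limit` bytes.  Hypothetical helper, for illustration only."""
    return b"\x00" in data[:limit]


# A made-up file with a long NUL-free header followed by binary content.
long_header = b"%PDF-1.4\n" + b"%" * 2000 + b"\x00\x01"

print(looks_binary(long_header))                          # False: the NUL is past 1K
print(looks_binary(long_header, limit=len(long_header)))  # True once everything is scanned
```

A file with no NULs at all would pass as text no matter how much of it
is scanned, which is the other way the heuristic can be wrong.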

> On IRC the question was brought up as to whether it would be worthwhile
> to check all lines while you are merging. We already have to read the
> data, and the code seems to be structured such that you could raise
> BinaryFile at any time.

It sounds pretty reasonable.  We'd want to make sure we checked all lines
we read, rather than a subset (like conflicting lines).  Also, we should
consider whether it's valuable to have a consistent definition of
'binary file'.
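
A rough sketch of what checking every line as it is read might look
like.  The `BinaryFile` class below is a local stand-in for the
exception mentioned above, and `checked_lines` and `read_for_merge` are
hypothetical names, not existing bzr functions:

```python
class BinaryFile(Exception):
    """Stand-in for the BinaryFile error discussed in this thread."""


def checked_lines(lines):
    """Yield each line unchanged, raising BinaryFile on the first NUL.

    Every line that is read passes through the check, not only the
    lines that end up in a conflict region.
    """
    for line in lines:
        if b"\x00" in line:
            raise BinaryFile()
        yield line


def read_for_merge(base, this, other):
    """Read all three inputs of a merge through the same check, so the
    definition of 'binary' is the same for each of them."""
    return [list(checked_lines(side)) for side in (base, this, other)]


base = [b"a\n", b"b\n"]
this = [b"a\n", b"c\n"]
other = [b"a\n", b"\x00\x01\n"]   # NUL shows up only in OTHER

try:
    read_for_merge(base, this, other)
except BinaryFile:
    print("binary content detected; fall back to a non-text merge")
```

Since the check is a generator wrapped around the read, it adds no
second pass over the data.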

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEvPNA0F+nu1YWqI0RAlzbAJ0WOk0G8oFdKFsmY39nTk5bgZDBzgCdHgB3
kh0+ruXldZkqZEk0fRn9KRU=
=eKTI
-----END PGP SIGNATURE-----



