[MERGE REVIEW] Binary file handling

Robert Collins robertc at robertcollins.net
Wed Apr 19 05:59:07 BST 2006


On Tue, 2006-04-18 at 16:14 +1000, Martin Pool wrote:
> On 16 Apr 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> > Matthieu Moy wrote:
> > | Aaron Bentley <aaron.bentley at utoronto.ca> writes:
> 
> > |>Binary files are defined as files containing the NUL character (\x00) in
> > |>their first 1024 bytes.  Reportedly, this is the heuristic used by diff.
> > |>This does, unfortunately, mean that UTF-16 files will be treated as
> > |>binary.
> 
> Have you ever seen a UTF-16/UCS-2 source file in a tree?  I know they
> might occur on Windows but it seems unlikely even there.  I suppose the
> current diff code will (unknowingly) probably do the right thing with
> them by seeing the end of lines.

It's probably more common in locales that use CJK character sets.

There is a defined way to detect UTF-16, though: look for the byte
order mark (BOM).
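Below is a minimal Python sketch of one way to combine the two checks discussed in this thread. `looks_binary` and `UTF16_BOMS` are hypothetical names, not bzr's actual API. The sketch keeps the proposed rule (a NUL byte in the first 1024 bytes means binary) but first treats a leading UTF-16 BOM as a sign of text.

```python
import codecs

# Byte order marks that introduce UTF-16 text (little- and big-endian).
# Note: the UTF-32-LE BOM (FF FE 00 00) also begins with FF FE, so it matches too.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_binary(data: bytes, sniff_len: int = 1024) -> bool:
    """Hypothetical helper: guess whether `data` is a binary file.

    If the data starts with a UTF-16 BOM, it is treated as text.
    Otherwise, fall back to the NUL-byte heuristic: any b"\\x00"
    within the first `sniff_len` bytes marks the data as binary.
    """
    if data.startswith(UTF16_BOMS):
        return False
    return b"\x00" in data[:sniff_len]
```

For example, `looks_binary("hello".encode("utf-16"))` returns False, because Python's `utf-16` codec writes a BOM. `looks_binary("hello".encode("utf-16-le"))` returns True, because that codec omits the BOM and the output contains NUL bytes. The BOM is optional in UTF-16, so BOM-less UTF-16 files would still be classed as binary under this sketch. A genuinely binary file that happens to start with FF FE or FE FF would be misclassified as text.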

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.