Unicode (UTF-16) files on Windows

Alexander Belchenko bialix at ukr.net
Thu Aug 20 10:56:52 BST 2009

Philippe Lhoste wrote:
> I was puzzled because I had a simple .reg file (exported by regedit) 
> which I hacked to add support for a new source code extension (icon, 
> editor/compiler, etc.), and Bazaar saw it as binary although my 
> editor showed only CR and LF control chars...
> The Bazaar User Reference mentions (casually) that binary status is 
> guessed by content (I suppose by looking for some control chars in the 
> first bytes, as usual).

UTF-16 encoding typically uses 2 bytes for each character, so any 
ASCII letter becomes 'letter' '\x00' (in the little-endian form that 
Windows writes). This extra NUL byte makes bzr think the file is 
binary.
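
A small Python sketch of the effect. Note that looks_binary() below is 
only a simplified stand-in for a content-based check (search for a NUL 
byte in the first chunk of the file); the function name and the 
1024-byte window are assumptions for illustration, not bzr's actual code.

```python
import codecs


def looks_binary(data, window=1024):
    """Hypothetical heuristic: content is 'binary' if a NUL byte
    appears in the first `window` bytes. Not bzr's real implementation."""
    return b"\x00" in data[:window]


text = "Windows Registry Editor Version 5.00\r\n"

# The same text in two encodings.
as_utf8 = text.encode("utf-8")
# Little-endian UTF-16 with a leading BOM (0xFF 0xFE).
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(looks_binary(as_utf8))   # False: no NUL bytes in ASCII/UTF-8 text
print(looks_binary(as_utf16))  # True: every ASCII char is followed by NUL
print(as_utf16[:6])            # b'\xff\xfeW\x00i\x00'
```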

> When I opened the file with a hex editor, I saw the reason: it is a 
> UTF-16 file with a BOM (0xFF 0xFE).

A BOM is good, but bzr actually looks for NUL bytes.

> It is annoying because I cannot do diffs (it just says "Binary files ... 
> differ" and qdiff shows nothing -- at least I can do an external diff), 
> cats are strange (letters are double spaced -- qcat shows a hex view), etc.

We can support UTF-16 better in QBzr (separately from bzr's internal 
behavior): please file 2 bug reports with feature requests, one for 
qdiff and one for qcat. They are separate code paths, so I'd like to 
track (and fix) them separately. Thanks.
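
As a rough idea of what a viewer could do (a sketch only: 
decode_for_display() is a hypothetical helper, not existing QBzr code), 
the content can be checked for a UTF-16 BOM and decoded before display:

```python
import codecs


def decode_for_display(data):
    """Hypothetical helper: return decoded text if `data` starts with a
    UTF-16 BOM, otherwise None so the caller can fall back to its
    usual handling (e.g. a hex view)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and
        # strips it from the result; malformed input is replaced
        # instead of raising an exception.
        return data.decode("utf-16", errors="replace")
    return None


sample = codecs.BOM_UTF16_LE + "abc\r\n".encode("utf-16-le")
print(repr(decode_for_display(sample)))          # 'abc\r\n'
print(repr(decode_for_display(b"plain ascii")))  # None
```

One caveat: the UTF-32 little-endian BOM (0xFF 0xFE 0x00 0x00) begins 
with the same two bytes, so a real implementation would probably need 
to tell these cases apart.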

> How come Bazaar doesn't properly handle UTF-16 with a BOM? Maybe you can 
> add detection of the BOM to the binary file detection heuristic? 
> Of course, it means other commands (like cat) should understand UTF-16 
> as well, so it might imply more work than it seems.
> I found a similar earlier case: 
> https://lists.ubuntu.com/archives/bazaar/2006q2/010794.html
> "Have you ever seen a UTF-16/UCS-2 source file in a tree? I know they 
> might occur on Windows but it seems unlikely even there."
> Well, we have here a "typical" case. That, and text files (documents) 
> written with Notepad (an error, I know...) which defaults (?) to UTF-16, 
> for example.
> I suppose such support is low priority (after all I can use WinDiff, my 
> editor and other external tools) but that's the kind of glitch that 
> makes some users say that Bazaar's support of Windows is lacking (saw 
> that on StackOverflow: http://stackoverflow.com/questions/995636 ).

That comment on Stack Overflow is typical FUD and bullshit.
But it does create the wrong perception, that's true, and it's a pity.
