Unicode (UTF-16) files on Windows
Philippe Lhoste
PhiLho at GMX.net
Thu Aug 20 09:38:39 BST 2009
I was puzzled because I had a simple .reg file (exported by regedit) which I hacked to add
support for a new source code extension (icon, editor/compiler, etc.), and Bazaar was
seeing it as binary although my editor showed only CR and LF control chars...
The Bazaar User Reference mentions (casually) that binary status is guessed by content (I
suppose by looking for some control chars in the first bytes, as usual).
When I opened the file with a hex editor, I saw the reason: it is a UTF-16 file with a BOM
(0xFF 0xFE).
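
To illustrate why such a file gets flagged, here is a minimal Python sketch. It assumes the
heuristic is "a NUL byte in the first chunk means binary"; that is only my guess at what
Bazaar does, and looks_binary and probe_size are hypothetical names, not Bazaar's API.

```python
import codecs


def looks_binary(data, probe_size=1024):
    """Hypothetical heuristic: a NUL byte near the start means 'binary'.

    This is an assumption for illustration, not Bazaar's actual code.
    """
    return b"\x00" in data[:probe_size]


sample = "Windows Registry Editor Version 5.00\r\n"

# What regedit writes: a UTF-16 little-endian BOM (0xFF 0xFE) followed by
# two bytes per character, so every ASCII letter is followed by 0x00.
utf16_bytes = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# The same text in a single-byte encoding contains no NUL bytes.
ascii_bytes = sample.encode("ascii")

print(looks_binary(utf16_bytes))
print(looks_binary(ascii_bytes))
```

The interleaved 0x00 bytes would also explain why cat shows the letters double spaced.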
It is annoying because I cannot do diffs (it just says "Binary files ... differ" and qdiff
shows nothing -- at least I can do an external diff), cats are strange (letters are double
spaced -- qcat shows a hex view), etc.
How come Bazaar doesn't properly handle UTF-16 with a BOM? Maybe you can add detection
of the BOM to the binary file detection heuristic? Of course, it means other commands
(like cat) should understand UTF-16 as well, so it might imply more work than it seems.
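
A rough sketch of what I have in mind, in Python. All the function names here are
hypothetical and nothing is taken from Bazaar's code; the NUL-byte fallback is again only
my assumption about the existing check.

```python
import codecs

# BOMs this sketch recognises. Note that the UTF-32 LE BOM (FF FE 00 00)
# also starts with FF FE; a real implementation would have to rule it out.
_UTF16_BOMS = (
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_utf16_bom(data):
    """Return (encoding, bom_length) if data starts with a UTF-16 BOM, else None."""
    for bom, encoding in _UTF16_BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def is_probably_text(data, probe_size=1024):
    """Treat BOM-marked UTF-16 as text before applying the NUL-byte check."""
    if detect_utf16_bom(data) is not None:
        return True
    # Assumed existing heuristic: a NUL byte near the start means binary.
    return b"\x00" not in data[:probe_size]


def decode_for_display(data):
    """Decode file content so that cat/diff could show it as ordinary text."""
    found = detect_utf16_bom(data)
    if found is None:
        # Fallback encoding is an arbitrary choice for this sketch.
        return data.decode("utf-8", errors="replace")
    encoding, bom_length = found
    return data[bom_length:].decode(encoding, errors="replace")


reg_file = codecs.BOM_UTF16_LE + "[HKEY_CLASSES_ROOT\\.foo]\r\n".encode("utf-16-le")
print(is_probably_text(reg_file))
print(decode_for_display(reg_file))
```

The decoding step is the "more work" part: every command that prints or compares content
would need to go through something like decode_for_display.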
I found a similar earlier case: https://lists.ubuntu.com/archives/bazaar/2006q2/010794.html
"Have you ever seen a UTF-16/UCS-2 source file in a tree? I know they might occur on
Windows but it seems unlikely even there."
Well, here we have a "typical" case. That, and text files (documents) written with Notepad
(an error, I know...) which defaults (?) to UTF-16, for example.
I suppose such support is low priority (after all I can use WinDiff, my editor and other
external tools) but that's the kind of glitch that makes some users say that Bazaar's
support of Windows is lacking (I saw that on StackOverflow:
http://stackoverflow.com/questions/995636 ).
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --