Unicode (UTF-16) files on Windows
bialix at ukr.net
Thu Aug 20 10:56:52 BST 2009
Philippe Lhoste wrote:
> I was puzzled because I had a simple .reg file (exported by regedit)
> which I hacked to add support for a new source code extension (icon,
> editor/compiler, etc.), and Bazaar was seeing it as binary although my
> editor showed only CR and LF control chars...
> The Bazaar User Reference mentions (casually) that binary status is
> guessed by content (I suppose by looking for some control chars in the
> first bytes, as usual).
UTF-16 typically uses 2 bytes per character, so every ASCII letter is
stored as 'letter' '\x00' (in little-endian order). That extra NUL byte
makes bzr treat the file as binary.
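
To illustrate, here is a minimal Python sketch of a NUL-byte check. The
name looks_binary and the 1024-byte window are assumptions made for this
example; it is not bzr's actual code, only the general idea.

```python
# Hypothetical helper for illustration, not bzr's real implementation:
# treat content as binary if a NUL byte shows up in the first chunk.
def looks_binary(data: bytes, chunk_size: int = 1024) -> bool:
    return b"\x00" in data[:chunk_size]


text = "Windows Registry Editor Version 5.00\r\n"
utf8 = text.encode("utf-8")    # one byte per ASCII character, no NULs
utf16 = text.encode("utf-16")  # BOM plus two bytes per ASCII character

print(looks_binary(utf8))   # False
print(looks_binary(utf16))  # True
```

The same text is reported as binary only because of the encoding, which
matches what you saw with the .reg file.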
> When I opened the file with a hex editor, I saw the reason: it is a
> UTF-16 file with a BOM (0xFF 0xFE).
The BOM is good, but bzr actually looks for NUL bytes, not for a BOM.
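
For comparison, a BOM check could look roughly like the sketch below. The
function name utf16_bom_encoding is hypothetical and nothing like it is
claimed to exist in bzr or QBzr; it only uses the BOM constants from
Python's standard codecs module.

```python
import codecs


# Hypothetical helper for illustration only.
def utf16_bom_encoding(data: bytes):
    """Return a codec name if data starts with a UTF-16 BOM, else None.

    Caveat: the UTF-32 little-endian BOM (FF FE 00 00) also starts with
    FF FE, so a real detector would need to rule that out first.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


sample = codecs.BOM_UTF16_LE + "REGEDIT4\r\n".encode("utf-16-le")
encoding = utf16_bom_encoding(sample)
if encoding is not None:
    # Skip the 2-byte BOM, then decode the rest as text.
    print(repr(sample[2:].decode(encoding)))
```

A tool that finds such a BOM could decode the content before diffing or
displaying it instead of treating it as binary.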
> It is annoying because I cannot do diffs (it just says "Binary files ...
> differ" and qdiff shows nothing -- at least I can do an external diff),
> cat output is strange (letters are double spaced -- qcat shows a hex
> view), etc.
We can support UTF-16 better in QBzr (separately from bzr's internal
behavior): please file 2 bug reports with feature requests, one for qdiff
and one for qcat. They are separate code paths, so I'd like to track (and
fix) them separately.
> How come Bazaar doesn't properly handle UTF-16 with a BOM? Maybe you can
> add detection of the BOM to the binary file detection heuristic?
> Of course, it means other commands (like cat) should understand UTF-16
> as well, so it might imply more work than it seems.
> I found a similar earlier case:
> "Have you ever seen a UTF-16/UCS-2 source file in a tree? I know they
> might occur on Windows but it seems unlikely even there."
> Well, we have here a "typical" case. That, and text files (documents)
> written with Notepad (an error, I know...) which defaults (?) to UTF-16,
> for example.
> I suppose such support is low priority (after all I can use WinDiff, my
> editor and other external tools) but that's the kind of glitch that
> makes some users say that Bazaar's support for Windows is lacking (I saw
> that on StackOverflow: http://stackoverflow.com/questions/995636 ).
That comment on Stack Overflow is typical FUD and bullshit.
But it does create the wrong perception, that's true, and it's a pity.