Unicode error on Windows
Philippe Lhoste
PhiLho at GMX.net
Thu Jul 9 07:37:25 BST 2009
On 09/07/2009 04:06, Martin Pool wrote:
> What do you think should happen when we have a character that can't be
> shown on the console? Maybe we should map it to '?' or similar?
The minimum I ask, of course, is no crash... :-)
Perhaps display something like:
How do I use @require and @resource ? Userscripts.org   <== undisplayable character there
so the user knows he cannot use that name in a script.
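A minimal sketch of that idea in Python, the language Bazaar is written in. It assumes the console encoding is already known. `console_safe` is a hypothetical helper and is not part of bzrlib:

```python
# Hypothetical helper (not bzrlib API): show a file name on a console whose
# encoding may not cover every character, and flag the name if anything is lost.
def console_safe(name: str, encoding: str) -> str:
    try:
        # Strict encoding succeeds only if every character is representable.
        return name.encode(encoding).decode(encoding)
    except UnicodeEncodeError:
        # errors="replace" puts "?" in place of each unencodable character.
        shown = name.encode(encoding, errors="replace").decode(encoding)
        return f"{shown}  <== undisplayable character(s)"


if __name__ == "__main__":
    print(console_safe("B Bzr \u2042 Test.txt", "cp850"))
    print(console_safe("C Bzr \u00a7 Test.txt", "cp850"))
```

The program never crashes, and the marker tells the user that the "?" is a stand-in, not part of the real name.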
More information:
- I use a French Windows XP Pro SP3, with UnxUtils providing a useful
set of Unix commands like ls or cat.
- I have set Lucida Console as display font of the CMD window.
- The file was created by dragging and dropping the favicon of the Web page
(to the left of the address bar in Firefox 3) onto Windows Explorer: FF
replaces some forbidden characters (e.g. > becomes ») but keeps the
others. I often do that to bookmark pages that are useful coding
references: double-clicking the file opens the page. That's why I have
funky chars in "source files"...
After a few more experiments:
- I created a file with – U+2013 (EN DASH), another with ⁂ U+2042
(ASTERISM), and, for good measure, one with § U+00A7 (SECTION SIGN), which
is in the upper (non-ASCII) half of my code page (CP1252) and is available
on my keyboard.
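This small Python sketch checks which of these characters each code page can represent. Using cp850 for the console's OEM code page is an assumption. It is the usual default on Western European Windows XP installs, but the message does not confirm it:

```python
# Which test characters exist in the Windows "ANSI" code page (cp1252) and
# in the console "OEM" code page (assumed here to be cp850)?
CHARS = {
    "EN DASH": "\u2013",
    "ASTERISM": "\u2042",
    "SECTION SIGN": "\u00a7",
}


def encodable(ch: str, encoding: str) -> bool:
    """True if `ch` has a code point in the given single-byte code page."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for label, ch in CHARS.items():
        print(f"{label:13} cp1252={encodable(ch, 'cp1252')} cp850={encodable(ch, 'cp850')}")
```

With these code pages, the en dash exists only in CP1252, the section sign exists in both, and the asterism exists in neither.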
- In Windows Explorer, the asterism (three stacked stars) is displayed
as a blank box; it isn't available in the font used for display (Tahoma?).
I can open the file with the asterism in Notepad and WordPad but not in
SciTE, my editor of choice...
- Some commands:
> dir
[skip stuff...]
08/07/2009 07:30 0 .bzrignore
01/09/2008 08:12 4 390 A Bzr – Test.txt
01/09/2008 08:12 4 390 B Bzr ⁂ Test.txt
01/09/2008 08:12 4 390 C Bzr § Test.txt
# I just noticed: the asterism is displayed as a blank box on the console,
but when I copy/paste the output into this e-mail (Thunderbird) I can see
the character, so it is preserved in the output!
> ls -1
A Bzr û Test.txt
B Bzr ? Test.txt
C Bzr º Test.txt
D:\Temp\TestBzr\RunApp
> ls -l
ls: B Bzr ? Test.txt: No such file or directory
total 26
-rw-rw-rw- 1 user group 4390 Sep 1 2008 A Bzr û Test.txt
-rw-rw-rw- 1 user group 4390 Sep 1 2008 C Bzr º Test.txt
# ls (from UnxUtils) garbles these characters, and fails when trying
to get stats on the file with the asterism...
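Here is one possible explanation, which the tests above do not confirm. ls may receive the names as CP1252 bytes, with "?" for characters CP1252 lacks, and the console may then display those bytes as CP850. This Python sketch reproduces the `ls -1` output above under that assumption. The helper names are hypothetical:

```python
# Hypothesis: ls gets names as cp1252 ("ANSI") bytes, and the console
# renders those same bytes using its OEM code page (assumed cp850).
def ansi_name(name: str) -> bytes:
    # Characters missing from cp1252 become "?".
    return name.encode("cp1252", errors="replace")


def as_console_shows(raw: bytes) -> str:
    # Same bytes, reinterpreted in the console's code page.
    return raw.decode("cp850")


if __name__ == "__main__":
    for name in ("A Bzr \u2013 Test.txt", "B Bzr \u2042 Test.txt", "C Bzr \u00a7 Test.txt"):
        print(as_console_shows(ansi_name(name)))
```

If this is right, the name ls passes back for the B file contains a literal "?". Windows does not allow that character in file names, which would explain the "No such file or directory" and "Invalid argument" errors.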
> type "B Bzr ⁂ Test.txt" # Copy/paste from DIR or auto-completion
[output of file content]
> ls "B Bzr ⁂ Test.txt"
ls: B Bzr ? Test.txt: No such file or directory
> cat "B Bzr ⁂ Test.txt"
cat: B Bzr ? Test.txt: Invalid argument
ls and cat work with the A and C files, using either copy/paste (of the DIR
output) or auto-completion. But cat doesn't work with the name output by
ls...
Not sure what the conclusion is, but it looks like on the XP console
you can preserve the Unicode information and reuse it... At least,
native Windows tools can do that.
Ah, also:
> bzr add
adding "A Bzr ? Test.txt"
adding "B Bzr ? Test.txt"
adding "C Bzr § Test.txt"
> bzr mv "A Bzr – Test.txt" "A Bzr -- Test.txt"
A Bzr ? Test.txt => A Bzr -- Test.txt
> bzr mv "B Bzr ⁂ Test.txt" "B Bzr @ Test.txt"
B Bzr ? Test.txt => B Bzr @ Test.txt
> bzr ls
.bzrignore
A Bzr -- Test.txt
B Bzr @ Test.txt
C Bzr § Test.txt
I used auto-completion above.
Somehow, Bazaar still manages to handle these files correctly...
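My guess at why this works, which I have not checked against the bzrlib source: the file operations use the names as Unicode strings, and only the printed messages go through the lossy console encoding. Here is a minimal Python sketch of that separation. The helper names are hypothetical, and cp850 is assumed as the console encoding:

```python
import os
import tempfile


def show(name: str, console_encoding: str = "cp850") -> str:
    # Lossy conversion used only for printing; "?" replaces missing characters.
    return name.encode(console_encoding, errors="replace").decode(console_encoding)


def rename_and_report(directory: str, old: str, new: str) -> str:
    # The real operation uses the full Unicode names, so nothing is lost...
    os.rename(os.path.join(directory, old), os.path.join(directory, new))
    # ...and only the message shown to the user is degraded.
    return f"{show(old)} => {show(new)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = "B Bzr \u2042 Test.txt"
        open(os.path.join(d, old), "w").close()
        print(rename_and_report(d, old, "B Bzr @ Test.txt"))
        print(sorted(os.listdir(d)))
```

The printed line looks like the `bzr mv` output above, with "?" in the old name, but the rename itself succeeds.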
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
More information about the bazaar mailing list