Unicode error on Windows

Philippe Lhoste PhiLho at GMX.net
Thu Jul 9 07:37:25 BST 2009


On 09/07/2009 04:06, Martin Pool wrote:
> What do you think should happen when we have a character that can't be
> shown on the console?  Maybe we should map it to '?' or similar?

The minimum I ask, of course, is no crash... :-)
Perhaps display:

How do I use @require and @resource ? Userscripts.org <== Undisplayable 
character there

or something, so the user can know he cannot use that in a script.
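
One way to get this "no crash, but flag it" behaviour is to encode the text for the console with a replacement error handler and append a note whenever something was lost. Below is a minimal Python sketch of the idea. The function name, the marker text and the cp850 console encoding are all assumptions made for illustration; this is not Bazaar's actual code.

```python
def console_safe(text, encoding="cp850",
                 marker=" <== undisplayable character there"):
    """Return text in a form the console can show, flagged if it was lossy."""
    # errors="replace" turns every character missing from the
    # console codepage into "?" instead of raising UnicodeEncodeError.
    shown = text.encode(encoding, errors="replace").decode(encoding)
    if shown != text:
        # Something was replaced: tell the user the name shown is not exact.
        shown += marker
    return shown


# The asterism (U+2042) is not in cp850, so it is replaced and flagged.
print(console_safe("B Bzr \u2042 Test.txt"))
```

A name made only of characters that exist in the console codepage comes back unchanged and without the marker.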

More information:
- I use a French Windows XP Pro SP3, with UnxUtils providing a useful 
set of Unix commands like ls or cat.
- I have set Lucida Console as display font of the CMD window.
- The file was created by drag'n'dropping the favicon of the Web page 
(on the left of the address bar of Firefox 3) to Windows Explorer: FF 
transcodes some forbidden characters (e.g. > becomes ») but keeps the 
others. I do that routinely to bookmark pages that are useful coding 
references: double-click on the file to open the page. That's why I have 
funky chars in "source files"...

Doing a few more experiments:
- I created a file with – U+2013 (EN DASH) and another with ⁂ U+2042 
(ASTERISM), and for good measure, one with § U+00A7 (SECTION SIGN) which 
is in the high-ASCII part of my codepage (CP1252) (and available on my 
keyboard).
- In Windows Explorer, the asterism (three stars stacked) is displayed 
as a blank box: it isn't available in the font used for display (Tahoma?).
I can open the file with asterism in Notepad and WordPad but not in 
SciTE, my editor of choice...

- Some commands:
 > dir
[skip stuff...]
08/07/2009  07:30                 0 .bzrignore
01/09/2008  08:12             4 390 A Bzr – Test.txt
01/09/2008  08:12             4 390 B Bzr ⁂ Test.txt
01/09/2008  08:12             4 390 C Bzr § Test.txt

# I just noticed: the asterism is displayed as a blank box on the console. 
I copy/pasted the output to this e-mail (Thunderbird) and I can see the 
character, so it is preserved on output!

 > ls -1
A Bzr û Test.txt
B Bzr ? Test.txt
C Bzr º Test.txt

D:\Temp\TestBzr\RunApp
 > ls -l
ls: B Bzr ? Test.txt: No such file or directory
total 26
-rw-rw-rw-   1 user     group        4390 Sep  1  2008 A Bzr û Test.txt
-rw-rw-rw-   1 user     group        4390 Sep  1  2008 C Bzr º Test.txt

# ls (from UnxUtils) garbles these characters, and gets lost when 
trying to stat the file with the asterism...
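
The û and º printed by ls are what one would expect if ls receives the names as CP1252 bytes while the console renders those bytes in its OEM codepage. The Python sketch below reproduces the effect. It assumes the console codepage is CP850 (usual on a French Windows, but not checked here) and that ls goes through the byte-oriented file API; the function name is made up for illustration.

```python
def as_seen_through_ls(name, ansi="cp1252", oem="cp850"):
    # The byte-oriented API hands out the name as ANSI (cp1252) bytes;
    # a character outside cp1252 is replaced by "?" at that point.
    raw = name.encode(ansi, errors="replace")
    # The console then interprets the very same bytes in its OEM codepage.
    return raw.decode(oem)


# en dash, asterism, section sign
for ch in ("\u2013", "\u2042", "\u00a7"):
    print(ascii(ch), "->", ascii(as_seen_through_ls(ch)))
```

Under these assumptions the asterism becomes a literal "?", which would explain why ls -l then cannot find the file: no file with that name exists.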

 > type "B Bzr ⁂ Test.txt" # Copy/paste from DIR or auto-completion
[output of file content]

 > ls "B Bzr ⁂ Test.txt"
ls: B Bzr ? Test.txt: No such file or directory

 > cat "B Bzr ⁂ Test.txt"
cat: B Bzr ? Test.txt: Invalid argument

ls and cat work with A and C files, using either copy/paste (of DIR 
result) or auto-completion. But cat doesn't work with the name output by 
ls...

Not sure what the conclusion is, but it looks like the XP console can 
preserve the Unicode information and re-use it... At least, 
native Windows tools can do that.

Ah, also:

 > bzr add
adding "A Bzr ? Test.txt"
adding "B Bzr ? Test.txt"
adding "C Bzr § Test.txt"

 > bzr mv "A Bzr – Test.txt" "A Bzr -- Test.txt"
A Bzr ? Test.txt => A Bzr -- Test.txt

 > bzr mv "B Bzr ⁂ Test.txt" "B Bzr @ Test.txt"
B Bzr ? Test.txt => B Bzr @ Test.txt

 > bzr ls
.bzrignore
A Bzr -- Test.txt
B Bzr @ Test.txt
C Bzr § Test.txt

I used auto-completion above.
Somehow, Bazaar still manages to handle these files correctly...
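
The bzr output above is consistent with the names being kept as Unicode internally and only degraded when they are printed. Here is a rough Python sketch of that idea. It is not taken from Bazaar's source, and cp850 as the console encoding is an assumption.

```python
# Names as the program would hold them internally (full Unicode).
names = [
    "A Bzr \u2013 Test.txt",  # en dash: not in cp850
    "B Bzr \u2042 Test.txt",  # asterism: not in cp850
    "C Bzr \u00a7 Test.txt",  # section sign: present in cp850
]


def for_display(name, console_encoding="cp850"):
    # Lossy step: characters missing from the console codepage become "?".
    return name.encode(console_encoding, errors="replace").decode(console_encoding)


# Only the printed form is degraded; file operations keep using the
# original names, so nothing is lost on disk.
shown = {name: for_display(name) for name in names}
```

With these assumptions the en dash and the asterism both show up as "?" while the section sign survives, which matches what bzr add printed.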

-- 
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --



