[BUG] Unicode string must be always used with encodings

John A Meinel john at arbash-meinel.com
Tue Sep 27 04:24:37 BST 2005


Andrew Bennetts wrote:
> On Mon, Sep 26, 2005 at 02:11:23PM -0500, John A Meinel wrote:
> 
>>Alexander Belchenko wrote:
>>
>>>I suppose that:
>>>
>>>* control files must always use utf-8,
>>>* input from the user must use input_encoding (sys.stdin.encoding 
>>>or user_encoding),
>>>* output to the user must use output_encoding 
>>>(sys.stdout.encoding or user_encoding),
>>>* decoding filenames to unicode strings must use user_encoding.
>>
>>I'm not sure about this last one. For instance, most Linux systems use 
>>utf-8 as the encoding. And Windows uses UTF-16 (which python doesn't 
>>seem able to read).
>>
>>I'm not sure about other characters, but I know that I'm not able to read 
>>Arabic filenames in python (native or cygwin). Now, Arabic is extra 
>>crazy because of right-to-left vs left-to-right, so there might be 
>>better support for other languages. (But IDLE can print Arabic 
>>characters correctly, and os.listdir() still shows the files as "??????")
> 
> 
> os.listdir has no way to know what encoding the filenames have.  It just
> returns byte strings (str type), not unicode strings (unicode type).
> 
> If you know of a way to fix this on all platforms, python-dev would be
> interested to hear about it ;)
> 
> There's nothing preventing a directory, on Linux at least, from holding files
> with names in different encodings, so I'm not sure it's easily solvable at all,
> and ideally tools like bzr need to be able to cope with that :(
> 
> -Andrew.
> 
> 
Well, on Windows, filenames are stored as UTF-16 strings, so it seems that if 
you supply a unicode string to os.listdir() it will return unicode entries.
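
A minimal sketch of that behaviour, assuming Python 3, where bytes and str
play the roles that str and unicode play in this thread. The helper name
list_both_ways, the temporary directory and the file name are made up for
illustration:

```python
import os
import tempfile

def list_both_ways(directory):
    """Return (names_as_bytes, names_as_text) for a text directory path."""
    # A bytes path makes os.listdir() return bytes names.
    as_bytes = os.listdir(os.fsencode(directory))
    # A text path makes it return text (decoded) names.
    as_text = os.listdir(directory)
    return as_bytes, as_text

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical ASCII file name, so the sketch runs on any filesystem.
    open(os.path.join(tmp, "example.txt"), "w").close()
    raw_names, text_names = list_both_ways(tmp)
    print(raw_names, text_names)
```

The type of the argument decides the type of the result; nothing here tells
you which encoding the bytes on disk were written in.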

But I agree, on Linux a filename is just a sequence of bytes, and the system 
really doesn't care what encoding you use.
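
One way a tool might cope with that, as a rough sketch only (Python 3; the
helper decode_name and its user_encoding default are assumptions for
illustration, not anything bzr does): try the expected encoding, and if the
bytes don't fit, keep them visible as escapes instead of guessing.

```python
def decode_name(raw, user_encoding="utf-8"):
    """Decode a raw filename, escaping bytes that do not fit the encoding."""
    try:
        return raw.decode(user_encoding)
    except UnicodeDecodeError:
        # Unknown encoding: show the offending bytes as \xNN escapes
        # rather than silently turning them into "?" characters.
        return raw.decode("ascii", "backslashreplace")

# Two names for the same word, as they could sit side by side in one
# directory: one written as utf-8, the other as latin-1.
utf8_name = "caf\u00e9".encode("utf-8")
latin1_name = "caf\u00e9".encode("latin-1")

decoded_utf8 = decode_name(utf8_name)
decoded_latin1 = decode_name(latin1_name)
print(repr(decoded_utf8), repr(decoded_latin1))
```

The escaped form is lossy for display only; the original bytes still have to
be kept around if the file is to be opened again.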

I thought most systems tried to translate into utf-8, but I could be wrong.

John
=:->
