[BUG] Unicode string must be always used with encodings
John A Meinel
john at arbash-meinel.com
Tue Sep 27 04:24:37 BST 2005
Andrew Bennetts wrote:
> On Mon, Sep 26, 2005 at 02:11:23PM -0500, John A Meinel wrote:
>
>>Alexander Belchenko wrote:
>>
>>>I suppose that:
>>>
>>>* for control files it must be used utf-8 always,
>>>* for input from user it must be used input_encoding (sys.stdin.encoding
>>>or user_encoding),
>>>* for output to user it must be used output_encoding
>>>(sys.stdout.encoding or user_encoding)
>>>* for decode filenames to unicode strings it must be used user_encoding
>>
>>I'm not sure about this last one. For instance, most Linux systems use
>>utf-8 as the encoding. And Windows uses UTF-16 (which Python doesn't
>>seem able to read).
>>
>>I'm not sure about some characters, but I know that I'm not able to read
>>arabic filenames in python (native or cygwin). Now, arabic is extra
>>crazy because of right-to-left vs left-to-right, so there might be
>>better support for other languages. (But IDLE can print arabic
>>characters correctly, and still os.listdir() shows the files as "??????")
>
>
> os.listdir has no way to know what encoding the filenames have. It just
> returns byte strings (str type), not unicode strings (unicode type).
>
> If you know of a way to fix this on all platforms, python-dev would be
> interested to hear about it ;)
>
> There's nothing preventing a directory, on Linux at least, from holding files with
> names in different encodings, so I'm not sure it's easily solvable at all, and
> ideally tools like bzr need to be able to cope with that :(
>
> -Andrew.
>
>
Well, on Windows, file names are stored as UTF-16 strings, so it seems that if you
supply a unicode string to os.listdir() it will return unicode entries.
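A rough sketch of what I mean, assuming a current Python 3 (where the text type is str rather than unicode, and os.fsencode() exists). The helper names list_both_ways() and demo() and the file name hello.txt are made up for illustration:

```python
import os
import tempfile


def list_both_ways(path):
    """Return (text_names, byte_names) for the entries of a directory.

    os.listdir() answers in the same type it was asked in: a text path
    gives text names, a byte-string path gives byte-string names.
    """
    text_names = sorted(os.listdir(path))
    byte_names = sorted(os.listdir(os.fsencode(path)))
    return text_names, byte_names


def demo():
    """Create a throwaway directory with one file and list it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name, used only for this illustration.
        open(os.path.join(tmp, "hello.txt"), "w").close()
        return list_both_ways(tmp)
```

Calling demo() should give one text name and one byte-string name for the same file. With a non-ASCII file name the two results can differ in more than type, depending on the platform and locale.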
But I agree, on Linux a file name is just a sequence of bytes, and the filesystem
really doesn't care what encoding you use.
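So a tool that gets raw byte names back has to guess. One possible approach, purely as a sketch and not anything bzr does today: try a list of candidate encodings in order and keep the raw bytes if none of them fits. The function name decode_name() and the default candidate list are my own assumptions:

```python
def decode_name(raw, encodings=("utf-8",)):
    """Try to decode a byte-string file name.

    Returns (text, encoding) for the first candidate encoding that
    decodes cleanly, or (None, None) if none of them does, so the
    caller can fall back to handling the raw bytes.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None, None
```

Note that an 8-bit encoding such as latin-1 accepts any byte sequence, so it only makes sense as the last candidate, and a clean decode does not prove the guess was right.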
I thought most systems tried to translate into utf-8, but I could be wrong.
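For what it's worth, Python can at least report what it believes each encoding is. A minimal sketch, again assuming Python 3; the function name guess_encodings() and the dictionary keys are hypothetical, and using sys.getfilesystemencoding() for file names is my assumption rather than something settled in this thread:

```python
import locale
import sys


def guess_encodings():
    """Collect the encodings Python reports for this process."""
    # The locale's preferred encoding plays the role of "user_encoding".
    user_encoding = locale.getpreferredencoding(False) or "ascii"
    return {
        # Control files: always utf-8, as proposed earlier in the thread.
        "control_files": "utf-8",
        # stdin/stdout may be missing or lack an encoding (e.g. when piped
        # on some setups), so fall back to the user encoding.
        "input": getattr(sys.stdin, "encoding", None) or user_encoding,
        "output": getattr(sys.stdout, "encoding", None) or user_encoding,
        # What Python itself uses to convert file names to and from text.
        "filenames": sys.getfilesystemencoding() or user_encoding,
    }
```

On a typical Linux box with a UTF-8 locale I would expect all four to come out as some spelling of utf-8, but that is a property of the configuration, not of the files on disk.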
John
=:->