bug: files with non-ascii chars?

Gábor Farkas gabor at nekomancer.net
Thu Jan 11 10:11:24 GMT 2007


John Arbash Meinel wrote:
> As near as I can tell, we have found a bug in python. Specifically if we do:
> 
> # touch foo, fooç
> python
> >>> open('foo', 'wb').close()
> >>> open(u'foo\xe7', 'wb').close()
> 
> LANG=C python
> >>> import os
> >>> os.listdir(u'.')
> [u'foo', 'foo\xc3\xa7']
>          ^^- Notice that this is a plain string, not a unicode string.
> And it is storing the utf-8 bytestream of the filename, not a unicode
> string.
> 
> I would have expected that if we did "os.listdir('.')" since that always
> returns the bytestreams. But u'.' is supposed to return decoded strings.
> 

yes, it is a bug (you could say that it's a bug in the documentation,
because it does not mention this)... basically, when a filename cannot
be decoded with the filesystem encoding (ascii under LANG=C), os.listdir
just returns the raw bytestring for that entry. for a discussion, see:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/ab5f367f51e07c1f/a11971e2ddd8beaf
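
to make the fallback concrete, here is a minimal sketch of the behaviour
described above. it is not python's actual implementation: decode_names
is a hypothetical helper, and the explicit encoding argument is an
assumption that stands in for the filesystem encoding.

```python
def decode_names(raw_names, encoding):
    """Decode each raw filename; keep the bytes when decoding fails.

    Hypothetical helper that only models the fallback described above.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Not decodable in this encoding: hand back the raw bytes.
            result.append(raw)
    return result


# The two files from the example: 'foo' and 'foo' + c-cedilla stored as utf-8.
raw_names = [b'foo', b'foo\xc3\xa7']

mixed = decode_names(raw_names, 'ascii')    # second entry stays a bytestring
decoded = decode_names(raw_names, 'utf-8')  # both entries decode
```

so a caller that cannot control the locale probably has to check the
type of every entry that comes back, not assume they are all unicode.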

gabor
