bug: files with non-ascii chars?
Gábor Farkas
gabor at nekomancer.net
Thu Jan 11 10:11:24 GMT 2007
John Arbash Meinel wrote:
> As near as I can tell, we have found a bug in python. Specifically if we do:
>
> # touch foo, fooç
> python
> >>> open('foo', 'wb').close()
> >>> open(u'foo\xe7', 'wb').close()
>
> LANG=C python
> >>> import os
> >>> os.listdir(u'.')
> [u'foo', 'foo\xc3\xa7']
> ^^- Notice that this is a plain string, not a unicode string.
> And it is storing the utf-8 bytestream of the filename, not a unicode
> string.
>
> I would have expected that if we did "os.listdir('.')" since that always
> returns the bytestreams. But u'.' is supposed to return decoded strings.
>
yes, it is a bug (you could say that it's a bug in the documentation,
because it does not mention it)... basically, when a filename cannot
be decoded with the filesystem encoding (ascii under LANG=C, so the
utf-8 bytes of fooç fail), os.listdir just returns the bytestring for
that entry. for a discussion, see:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/ab5f367f51e07c1f/a11971e2ddd8beaf
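
a caller that has to cope with this mixed result could sort the entries itself. what follows is only a minimal sketch, assuming the caller knows the real on-disk encoding (utf-8 here); split_listdir_result is a hypothetical helper name, not something python or bzr provides:

```python
def split_listdir_result(names, encoding="utf-8"):
    """Split a mixed listdir-style result into decoded and undecodable names.

    Hypothetical helper: `encoding` is an assumption the caller supplies.
    """
    decoded, undecodable = [], []
    for name in names:
        if isinstance(name, bytes):
            # Entry came back as a raw bytestring: try the assumed encoding.
            try:
                decoded.append(name.decode(encoding))
            except UnicodeDecodeError:
                # Still not decodable; keep the raw bytes for the caller.
                undecodable.append(name)
        else:
            # Entry was already returned as a unicode string.
            decoded.append(name)
    return decoded, undecodable


# The mixed result quoted above, written with an explicit bytes literal.
mixed = [u'foo', b'foo\xc3\xa7']
decoded, undecodable = split_listdir_result(mixed)
```

names that still fail to decode are handed back as bytes rather than dropped, so the caller can decide whether to warn about them or skip them.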
gabor