Bazaar-NG traffic #2

Joel Rosdahl joel at rosdahl.net
Tue Oct 11 16:46:43 BST 2005


Magnus Therning <magnus at therning.org> writes:

> On Tue, Oct 11, 2005 at 04:19:44PM +0200, Joel Rosdahl wrote:
>
> [...]
>> I have made some notes about how to use Unicode in a Python-based
>> project of mine:
>>
>>    http://kofoto.rosdahl.net/trac/wiki/UnicodeInPython
>>
>> They may be useful for others too.
>
> Useful indeed.
>
>  Note 2: os.listdir(u"path") returns Unicode strings for names that
>  can be decoded with sys.getfilesystemencoding() but silently
>  returns byte strings for names that can't be decoded. That is, the
>  return value of os.listdir(u"path") is potentially a mixed list of
>  Unicode and byte strings.
>
> Is the following non-ASCII enough?
>
>  $ ls | cat
>  Hallå_där
>  Köp_blåbär
>
> With unicode:
>  [u'Hall\xe5_d\xe4r', u'K\xf6p_bl\xe5b\xe4r']
>
> Without:
>  ['Hall\xc3\xa5_d\xc3\xa4r', 'K\xc3\xb6p_bl\xc3\xa5b\xc3\xa4r']

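If you're trying to test my note about the mixed list of Unicode and
byte strings, then: no. :-) Both of your file names decode cleanly
with the file system encoding, so you only get Unicode strings back.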

Try this program:

===[cut here]=========================================================
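import os
import shutil

os.mkdir("test")
os.chdir("test")
# Byte string name: 0xe5 is "å" in ISO-8859-1 but is not valid UTF-8 here.
open("r\xe5ka", "w").close()
# Unicode name: encoded with the file system encoding when the file is created.
open(u"r\xe4v", "w").close()
# A Unicode argument asks os.listdir for Unicode names where possible.
print os.listdir(u".")
os.chdir("..")
shutil.rmtree("test")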
===[cut here]=========================================================

In a UTF-8 environment, the byte string name "r\xe5ka" is not valid
UTF-8, so the above program will print a mixed list:

    ['r\xe5ka', u'r\xe4v']

In an ISO-8859-1 environment, every byte sequence can be decoded, so
the program will print only Unicode strings:

    [u'r\xe5ka', u'r\xe4v']
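
To see why the two environments differ, here is a minimal sketch that
models only the decoding step, without touching the file system. The
helper names names_on_disk and split_by_decodability are hypothetical,
and the sketch assumes that a Unicode file name is stored on disk as its
encoding in the file system encoding while a byte string name is stored
unchanged.

```python
def names_on_disk(encoding):
    """Return the raw byte names the test program would create.

    Assumption: the byte string name is stored as-is, and the Unicode
    name is stored encoded with the given file system encoding.
    """
    return [b"r\xe5ka", u"r\xe4v".encode(encoding)]


def split_by_decodability(raw_names, encoding):
    """Split raw byte names into (decoded, undecodable) lists."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # This is the kind of name that comes back as a byte string.
            undecodable.append(raw)
    return decoded, undecodable


# UTF-8: 0xe5 followed by "k" is not a valid UTF-8 sequence, so the
# first name cannot be decoded and the result is split.
utf8_result = split_by_decodability(names_on_disk("utf-8"), "utf-8")
# utf8_result == ([u"r\xe4v"], [b"r\xe5ka"])

# ISO-8859-1: every byte maps to a character, so nothing is left over.
latin1_result = split_by_decodability(
    names_on_disk("iso-8859-1"), "iso-8859-1")
# latin1_result == ([u"r\xe5ka", u"r\xe4v"], [])
```

A caller that needs a uniform list could use the second element of the
result to decide how to report or skip the undecodable names.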

> Hej Joel!

Hallå Magnus!

-- 
Regards,
Joel Rosdahl <joel at rosdahl.net>
Key BB845E97; fingerprint 9F4B D780 6EF4 5700 778D  8B22 0064 F9FF BB84 5E97



