Bazaar-NG traffic #2

John A Meinel john at arbash-meinel.com
Tue Oct 11 16:13:53 BST 2005


Magnus Therning wrote:
> On Tue, Oct 11, 2005 at 04:19:44PM +0200, Joel Rosdahl wrote:
> 
>>John A Meinel <john at arbash-meinel.com> writes:
>>
>>
>>>Magnus Therning wrote:
>>>
>>>
>>>>On Tue, Oct 11, 2005 at 08:33:17AM -0500, John A Meinel wrote:
>>>>
>>>>
>>>>>Do you have any of these directories/files available? I would be
>>>>>curious what this returns:
>>>>>
>>>>>python -c "import os; print os.listdir(u'.')"
>>>>>versus
>>>>>python -c "import os; print os.listdir('.')"
>>>>>
>>>>>The first should try and interpret the names and return unicode,
>>>>>the second should just do ascii names (possibly just byte-stream
>>>>>names).
>>>>
>>>>Both return a list containing all the files in the current dir on
>>>>my Linux machine. The first one is a list of unicode strings
>>>>([u'str1', u'str2']). The second is a list of regular strings
>>>>(['str1', 'str2']). I.e. exactly what you predicted.
>>>
>>>Naturally, I would expect that. :) What I wanted to know was what
>>>happens when you have non-ascii characters in that directory?
>>>[...]
>>
>>os.listdir(u".") returns regular strings for names that can't be
>>decoded using the filesystem encoding.
>>
>>I have made some notes about how to use Unicode in a Python-based
>>project of mine:
>>
>>   http://kofoto.rosdahl.net/trac/wiki/UnicodeInPython
>>
>>They may be useful for others too.
> 
> 
> Useful indeed.
> 
>  Note 2: os.listdir(u"path") returns Unicode strings for names that can
>  be decoded with sys.getfilesystemencoding() but silently returns byte
>  strings for names that can't be decoded. That is, the return value of
>  os.listdir(u"path") is potentially a mixed list of Unicode and byte
>  strings.
> 
> 
> Is the following non-ascii enough?
> 
>  $ ls | cat
>  Hallå_där
>  Köp_blåbär
> 
> With unicode:
>  [u'Hall\xe5_d\xe4r', u'K\xf6p_bl\xe5b\xe4r']
>
> Without:
>  ['Hall\xc3\xa5_d\xc3\xa4r', 'K\xc3\xb6p_bl\xc3\xa5b\xc3\xa4r']

Actually, these are non-ascii, but they are properly encoded. I have
tested that case as well, and found that python works just fine (as long
as you specify u'.').
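
To make "properly encoded" concrete: the byte strings in the second
listing are simply the utf-8 encoding of the unicode strings in the first.
A minimal sketch of that check, using only the two names quoted above
(the variable names are mine, and I'm assuming utf-8 because that is what
the \xc3 bytes indicate):

```python
# Byte-string names exactly as returned by os.listdir('.') in the quoted output.
raw = [b'Hall\xc3\xa5_d\xc3\xa4r', b'K\xc3\xb6p_bl\xc3\xa5b\xc3\xa4r']

# Decoding them as utf-8 (assumed filesystem encoding) should give the
# unicode names returned by os.listdir(u'.').
decoded = [name.decode('utf-8') for name in raw]

# Every name decodes cleanly, which is why this directory causes no trouble.
matches = decoded == [u'Hall\xe5_d\xe4r', u'K\xf6p_bl\xe5b\xe4r']
```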

I need to hear back from David, since he is the one who seems to have
incorrectly encoded filenames. Though I guess you could do:

>>> import os
>>> open('H\x01\x01\x01.txt', 'w').write('hello\n')
>>> print os.listdir(u'.')
>>> print os.listdir('.')

I'm pretty sure that the filename is valid on Linux. Note, though, that
\x01 is an ASCII control character, so the name is still perfectly valid
utf-8 and just doesn't correspond to a printable "character". A byte such
as \xff would be needed to get a name that cannot be decoded.
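
For anyone who wants to see the mixed-list behaviour from Joel's Note 2
without touching a real directory, here is a rough sketch that imitates
it on a plain list of byte strings. The function name decode_names, the
sample names, and the choice of utf-8 as the encoding are all my own
assumptions for illustration; this is not how os.listdir is implemented.

```python
def decode_names(raw_names, encoding='utf-8'):
    """Decode each byte-string name; keep undecodable names as bytes.

    Hypothetical helper that imitates the behaviour described in Note 2.
    """
    result = []
    for name in raw_names:
        try:
            result.append(name.decode(encoding))
        except UnicodeDecodeError:
            # Silently fall back to the raw byte string.
            result.append(name)
    return result


# Sample names (made up): one utf-8 name, one with control characters,
# and one containing bytes that are never valid in utf-8.
names = [b'Hall\xc3\xa5_d\xc3\xa4r', b'H\x01\x01\x01.txt', b'H\xff\xfe.txt']
listing = decode_names(names)

# Names that stayed as byte strings are the ones that could not be decoded.
undecodable = [n for n in listing if isinstance(n, bytes)]
```

If the sketch is right, only the \xff\xfe name should end up in
undecodable, so any code consuming such a list has to be prepared for
both string types.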

John
=:->

> 
> /M
> 
> Hej Joel!
> 
>>Joel Rosdahl <joel at rosdahl.net>
>>Key BB845E97; fingerprint 9F4B D780 6EF4 5700 778D  8B22 0064 F9FF BB84 5E97
>>
> 
> 



