Bazaar-NG traffic #2

John A Meinel john at arbash-meinel.com
Tue Oct 11 14:33:17 BST 2005


David Allouche wrote:
> On Tue, 2005-10-11 at 03:21 -0400, James Blackwell wrote:
>
>>= _Always_ Unicode =
>>
>>The Unicode discussions continued this week. Last week, Alexander Belchenko
>>referred to some bzr code that didn't handle Russian filenames properly.
>>This week Belchenko followed up a couple more times without a response.
>
>
> Something which has been somewhat nagging me...
>
> I would like it to be possible to have byte-stream file names. In some
> situations (e.g. automated imports from CVS) you might end up with file
> names containing non-ASCII characters without any encoding information.
> Trying to interpret those names as unicode is haphazard at best, and
> likely incorrect.
>
> Generally, when getting data from legacy sources, you cannot expect to
> have encoding information. I would like to read about how CVS handles
> non-ascii file names, from people who have direct experience with that.
>
> To be honest we only once had non-ascii file names in source code
> repositories in a few hundred mainline imports, but the numbers are
> biased since we have been focusing on increasing the number of
> successful imports, disregarding (numerous) import failures.

Do you have any of these directories/files available? I would be curious
what this returns:

python -c "import os; print os.listdir(u'.')"
versus
python -c "import os; print os.listdir('.')"

The first should try and interpret the names and return unicode, the
second should just do ascii names (possibly just byte-stream names).

I know for sure that on Windows, if you have a non-ascii name, the
former returns [u'\u07077\u71070'], while the latter returns ['???????'].
I believe the Windows filesystem is all UTF-16.
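
For anyone repeating the comparison on a newer interpreter, here is a rough
sketch of the same two calls. It assumes Python 3, where passing a str path
to os.listdir returns str (unicode) names and passing a bytes path returns
the raw byte-stream names. The helper name list_both_ways and the sample
file name are made up for illustration.

```python
import os
import tempfile


def list_both_ways(path):
    """Return (text_names, byte_names) for the entries in `path`.

    Assumes Python 3: a str argument yields str names, a bytes
    argument yields the undecoded byte-stream names.
    """
    text_names = sorted(os.listdir(path))
    byte_names = sorted(os.listdir(os.fsencode(path)))
    return text_names, byte_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Hypothetical sample file with a Russian name.
        open(os.path.join(scratch, "файл.txt"), "w").close()
        text_names, byte_names = list_both_ways(scratch)
        print(text_names)   # names as str
        print(byte_names)   # the same names as raw bytes
```

The bytes form is the one that survives names with no known encoding, since
nothing has to be decoded. os.fsdecode turns such a name back into str using
the filesystem encoding if a text form is needed later.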

>
> Just something I think might be troublesome in the future.

I agree that it is something we need to be aware of. Can you see how
Python handles it? We will most likely let Python determine the
filenames, though we might need separate code paths on Linux and Windows
if it turns out we have to do something platform-specific in our code.

John
=:->

>
>
>>= How to do tabs in bzr =
>
>               ^^^^ tags?
>
>>The thread from last week on how to do threads in bzr wrapped up this week.
>
>                                          ^^^^^^^ tags?
>
