test_nonascii: two unicode a's

John Arbash Meinel john at arbash-meinel.com
Mon Jul 3 18:31:30 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> John Arbash Meinel wrote:
>> Wouter van Heyst wrote:
>>> On Sun, Jul 02, 2006 at 09:26:09AM -0500, John Arbash Meinel wrote:
>>>> Also, what version of Windows are you using?
>>> From IRC and the beginning of the thread,
>>> <bialix>: I have Windows 2000 on FAT32. The same behaviour with
>>> cygwin on the same FS.
>>
>> Well, on WinXP and FAT32 I can still create both å and ä in the same
>> directory, and they both show up properly in Explorer.
>>
>> So maybe it is a Win2k issue.
> 
> I don't agree here. I think this is a FAT32 vs. NTFS
> difference.

No, I don't think so. If you look at Explorer, it is actually creating
the correct files.
The problem lies somewhere between Python and the operating system.

Again I ask, what does this return:

python -c "import sys; print(sys.getfilesystemencoding())"

I'm guessing it might be (incorrectly) returning your Russian encoding
(cp1251), rather than returning MBCS.
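A slightly fuller version of the same check, as a sketch: it also reports the locale's preferred encoding, which on Windows normally corresponds to the ANSI code page (cp1251 on a Russian install). The helper name `report_encodings` is hypothetical. Note that on Python 3.6+ for Windows the filesystem encoding is reported as 'utf-8' (PEP 529), so the 'mbcs' answer expected here applies to the Python 2 interpreters of this thread.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python uses for filenames and locale text.

    Hypothetical helper for diagnosis only.
    """
    return {
        # Encoding used to convert between str and bytes filenames.
        "filesystem": sys.getfilesystemencoding(),
        # Error handler paired with the filesystem encoding (Python 3.6+).
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Encoding the current locale prefers for text (the ANSI code page
        # on a typical Windows install).
        "locale_preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```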

...

> My hypothesis is that FAT32 cannot distinguish between filenames that
> cannot be represented in the current code page. I suppose that John works
> in the latin-1 code page, so for him these files are different. But on a
> Russian (Cyrillic) machine these filenames are not representable in cp1251,
> so Windows chooses the most similar character, and that character is a plain
> ASCII 'a' (as you can see in the output of os.listdir('.')).
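Part of this hypothesis can be checked from Python. In the sketch below, Python's own codecs confirm that neither 'å' (U+00E5) nor 'ä' (U+00E4) exists in cp1251, while latin-1 contains both. Python's codecs do not implement the Windows "most similar character" fallback, so the hypothetical `approximate_best_fit` helper only approximates it by stripping accents through Unicode decomposition. This is an assumption about how the substitution behaves, not the actual Windows mapping table.

```python
import unicodedata


def representable(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def approximate_best_fit(text):
    """Hypothetical stand-in for a 'most similar character' fallback:
    decompose each character (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


a_ring, a_umlaut = "\u00e5", "\u00e4"  # 'å' and 'ä'
for ch in (a_ring, a_umlaut):
    print(ch,
          "latin-1:", representable(ch, "latin-1"),
          "cp1251:", representable(ch, "cp1251"),
          "fallback:", approximate_best_fit(ch))
```

Under this approximation both names collapse to the same plain 'a', which would match the single ASCII 'a' reported by os.listdir('.').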

If it were a FAT32 issue, then Explorer couldn't show you the correct
filenames, because they couldn't be physically represented on disk.

Obviously it has a way to store the exact filename in the filesystem.
There is just an issue with how Python is using the Win32 APIs (or maybe
it is the Win32 APIs themselves that are at fault).
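One way to separate the filesystem from Python's use of the APIs is to create both names in an empty directory and list it back, once with a str path and once with a bytes path. A sketch under stated assumptions: `probe_listing` is a hypothetical helper; on Windows, str paths go through the wide-character Win32 calls, whereas Python 2 byte-string paths went through the ANSI calls and the current code page. Since Python 3.6, bytes paths on Windows are treated as UTF-8 (PEP 529), so the contrast may not reproduce on a modern interpreter.

```python
import os
import tempfile


def probe_listing(names):
    """Create one empty file per name in a fresh directory and return what
    os.listdir reports for a str path and for a bytes path.

    Hypothetical diagnostic helper.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            # Opening with "w" creates the file; if the filesystem treats two
            # names as equal, the second call just reopens the first file.
            with open(os.path.join(tmp, name), "w"):
                pass
        as_str = sorted(os.listdir(tmp))
        as_bytes = sorted(os.listdir(os.fsencode(tmp)))
    return as_str, as_bytes


if __name__ == "__main__":
    str_names, bytes_names = probe_listing(["\u00e5", "\u00e4"])
    print("str listing:  ", str_names)
    print("bytes listing:", bytes_names)
```

Two entries in the str listing but one in a code-page-based listing would point at the API layer rather than at FAT32 itself.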

We can try to track down where the bug lies, but I really don't think it
is an explicit FAT32 issue.
If FAT32 couldn't handle the character, then it would be munged, and
Explorer wouldn't show the correct characters.
FAT32 can store the right characters; this seems more like a
case-insensitivity issue, where \xe5 (å) is considered the same letter
as \xe4 (ä), just in a different case.

That actually makes the most sense to me: we are running into a bug
in how Win32 handles case-insensitive matching when you are using characters
that are not in your locale. Or maybe it is just the Russian locale that
has odd rules for which characters match when ignoring case.
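The case-insensitivity theory can be phrased as a test: group names by a comparison key and see which ones collide. In this sketch, `find_collisions` is a hypothetical helper. Under Python's `str.casefold()`, which applies Unicode's default, locale-independent case folding, 'å' and 'ä' stay distinct and only a genuine case pair such as 'å'/'Å' collides. A collision between 'å' and 'ä' on the real system would therefore suggest the platform's own comparison rules rather than standard Unicode case folding.

```python
from collections import defaultdict


def find_collisions(names, key):
    """Group names by key(name) and return only groups with 2+ members.

    Hypothetical helper for checking which filenames a given
    comparison rule would treat as identical.
    """
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


names = ["\u00e5", "\u00e4", "\u00c5"]  # 'å', 'ä', 'Å'
# Unicode default case folding: only the genuine case pair collides.
print(find_collisions(names, str.casefold))
```

Passing a different `key` (for example an accent-stripping function) shows how a looser comparison rule would merge the two names.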

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEqVRyJdeBCYSNAAMRAg3PAJ9zgdV7cQ1KDmejWnQThG43iH6w4ACfbQGg
kjblz3O93PP45DUHG1KDoAg=
=vhZx
-----END PGP SIGNATURE-----

More information about the bazaar mailing list