test_nonascii: two unicode a's

Alexander Belchenko bialix at ukr.net
Mon Jul 3 20:01:38 BST 2006


John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> John Arbash Meinel wrote:
>> Wouter van Heyst wrote:
>>>> On Sun, Jul 02, 2006 at 09:26:09AM -0500, John Arbash Meinel wrote:
>>>>> Also, what version of Windows are you using?
>>>> From IRC and the beginning of the thread,
>>>> <bialix>: I have windows 2000 on FAT32. The same behaviour with
>>>> cygwin on the same FS.
>>> Well, on WinXP and FAT32 I can still create both å and ä in the same
>>> directory, and they both show up properly in Explorer.
>>>
>>> So maybe it is a Win2k issue.
>> I don't agree here. My opinion is that this is a FAT32 vs. NTFS
>> difference.
> 
> No, I don't think so. If you look at Explorer, it is actually creating
> the correct files.
> The problem is something between python and the operating system.

But this problem exists only on FAT32 with Russian Windows. On NTFS the
problem doesn't exist.

> Again I ask, what does this return:
> 
> python -c 'import sys; print sys.getfilesystemencoding()'

It returns 'mbcs'.

> I'm guessing it might be (incorrectly) returning your russian encoding,
> rather than returning MBCS.

No. It returns the correct thing.

>> My hypothesis is that FAT32 cannot distinguish between filenames that
>> cannot be represented in the current code page. I suppose that John works
>> in a latin-1 code page, so for him these files differ. But on a Russian
>> (Cyrillic) machine these filenames are not representable in cp1251, so
>> Windows chooses the most similar character, and this character is plain
>> ASCII 'a' (as you can see in the output of os.listdir('.')).
> 
> If it was a FAT32 issue, then Explorer couldn't show you the correct
> filenames. Because it couldn't physically represent them on disk.

Well, it seems to me that Windows has two forms of file entry in the
filesystem: some internal form and an external Unicode form with
preserved case. IIRC, FAT32 was introduced in the Windows 95 days, when
Unicode was not properly supported.

> Obviously it has a way to store the exact filename in the filesystem.
> There is just an issue with how python is using the win32 apis (or maybe
> it is the win32 apis themselves that are at fault)
> 
> We can try to track down where the bug lies, but I really don't think it
> is an explicit FAT32 issue.
> If FAT32 couldn't handle the character, then it would be munged, and
> explorer wouldn't show the correct characters.
> FAT32 can store the right characters, this seems more like a
> 'case-insensitive' issue, where \xe5 is considered the same character,
> but in a different case than \xe4.

But it works correctly in a latin-1 environment; only non-latin-1
environments have the problem.
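
A rough way to see the difference between the two environments, using
only Python's own codecs. Note the assumption: Python's codecs are strict
and raise an error for an unrepresentable character, so this shows only
that the characters fit in latin-1 and not in cp1251; it does not
reproduce whatever substitution Windows performs. The helper name
`representable` is made up for this illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: checks whether a character can be encoded at all.
# It does not reproduce any Windows "closest character" substitution.

def representable(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

a_ring = u'\xe5'   # LATIN SMALL LETTER A WITH RING ABOVE
a_uml = u'\xe4'    # LATIN SMALL LETTER A WITH DIAERESIS

for enc in ('latin-1', 'cp1251'):
    print('%s: a-ring=%s a-umlaut=%s' % (
        enc, representable(a_ring, enc), representable(a_uml, enc)))
```

Both characters are encodable in latin-1 and neither is in cp1251, which
fits the observation that only the non-latin-1 machine has the problem.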

> That actually makes the most sense to me. That we are running into a bug
> with how win32 handles case sensitivity when you are using characters
> that are not in your locale. Or maybe it is just the russian locale that
> has weird settings for what characters match when ignoring case.

So why would a 'weird Russian locale' depend on the filesystem? I repeat:
on NTFS it works correctly.

I really don't think this small issue deserves much effort to get to the
bottom of it. I propose to slightly change the filenames in these tests
so that they differ even on my weird Russian machine, for example by
appending the numbers 1, 2, 3 to the filenames, and to explain this
strange effect in the test comments. I really want to see all the tests
pass on my crazy machine too.
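
A minimal sketch of the idea, not the actual test code. The filenames
below are hypothetical stand-ins, and `fold_to_ascii` only imitates a
lossy "closest character" mapping by dropping accents; it is an
assumption for illustration, not the table Windows really uses.

```python
# -*- coding: utf-8 -*-
import unicodedata

def fold_to_ascii(name):
    """Drop combining accents so that accented letters collapse to their
    base letter. Assumption: this merely mimics the observed effect."""
    decomposed = unicodedata.normalize('NFKD', name)
    return u''.join(ch for ch in decomposed
                    if not unicodedata.combining(ch))

def all_distinct_after_folding(names):
    """Return True if no two names collide once accents are dropped."""
    folded = [fold_to_ascii(n) for n in names]
    return len(set(folded)) == len(folded)

# Hypothetical test filenames: a-ring and a-umlaut.
original = [u'\xe5', u'\xe4']

# Proposed change: append a distinct number to each filename.
suffixed = [u'%s%d' % (name, i) for i, name in enumerate(original, 1)]

print(all_distinct_after_folding(original))
print(all_distinct_after_folding(suffixed))
```

With the numeric suffix the names stay different even if both accented
letters end up as a plain 'a', so the test no longer depends on how the
filesystem or locale treats them.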

--
Alexander




