test_nonascii: two unicode a's

Alexander Belchenko bialix at ukr.net
Mon Jul 3 17:46:14 BST 2006


John Arbash Meinel wrote:
> Wouter van Heyst wrote:
>> On Sun, Jul 02, 2006 at 09:26:09AM -0500, John Arbash Meinel wrote:
>>> Also, what version of Windows are you using?
>> From irc and the beginning of the thread,
>>
>> <bialix>: I have windows 2000 on FAT32. The same behaviour with cygwin on the same FS.
> 
> Well, on WinXP and FAT32 I can still create both å and ä in the same
> directory, and they both show up properly in Explorer.
> 
> So maybe it is a Win2k issue.

I don't agree here. In my opinion this is a FAT32 vs. NTFS
difference.

I asked on a Russian Python users' forum for a little help: to test the
following code:

import os

f1 = file(u'\xe4', 'w')
f1.write('foo')
f1.close()

f2 = file(u'\xe5', 'w')
f2.write('bar')
f2.close()

print os.listdir('.')
print os.listdir(u'.')

print file(u'\xe4').read()
print file(u'\xe5').read()

I asked people to run this code on Russian Windows with FAT32 and NTFS.
Even on Russian WinXP + FAT32 this test fails the same way as on my
machine, i.e. the result is:

['test.py', 'a']
[u'test.py', u'\xe4']
bar
bar

On Russian WinXP + NTFS everything is OK:

['test.py', 'a', 'a']
[u'test.py', u'\xe4', u'\xe5']
foo
bar

Even on Win2k + NTFS it seems to work, but I would not bet on it.

So I think it's a FAT32 limitation.

Even if I change the order in which the files are created, it behaves
the same way:

import os

f2 = file(u'\xe5', 'w')
f2.write('bar')
f2.close()

f1 = file(u'\xe4', 'w')
f1.write('foo')
f1.close()

print os.listdir('.')
print os.listdir(u'.')

print file(u'\xe4').read()
print file(u'\xe5').read()

--

['test.py', 'a']
[u'test.py', u'\xe5']
foo
foo
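
Both runs above come down to one check: after writing two differently
named files, does the first one still hold its own content? A minimal
sketch of that check as a reusable probe, written for Python 3 (the
scripts in this message are Python 2). The function name names_collide
and its parameters are hypothetical, introduced only for illustration:

```python
import os
import tempfile


def names_collide(name_a, name_b, directory=None):
    """Return True if writing name_b overwrote the file written as name_a.

    If directory is None, a throwaway temporary directory is used, which
    only tests whatever filesystem the temp dir lives on. Pass a directory
    on the volume you want to probe (the two files are left there).
    """
    if directory is None:
        with tempfile.TemporaryDirectory() as tmp:
            return names_collide(name_a, name_b, tmp)

    path_a = os.path.join(directory, name_a)
    path_b = os.path.join(directory, name_b)

    # Same sequence as the test script: write 'foo', then 'bar'.
    with open(path_a, "w") as f:
        f.write("foo")
    with open(path_b, "w") as f:
        f.write("bar")

    # If the filesystem treated both names as one file, the first file
    # now contains 'bar' instead of 'foo'.
    with open(path_a) as f:
        return f.read() != "foo"
```

Calling names_collide(u'\xe4', u'\xe5', some_dir) with some_dir on a
FAT32 volume would be expected to return True on the machines described
above and False on the NTFS ones, but that is an expectation drawn from
the reported output, not something this sketch verifies by itself.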

My hypothesis is that FAT32 cannot distinguish between filenames that
cannot be represented in the current code page. I suppose that John works
in a Latin-1 code page, so for him these files are different. But on a
Russian (Cyrillic) machine these filenames are not representable in
cp1251, so Windows chooses the most similar character, and that character
is plain ASCII 'a' (as you can see in the output of os.listdir('.')).

I made another test to show that these characters can be distinguished
if the filenames also differ in some other character:

import os

for ix, fn in enumerate([u'\xe0', u'\xe1', u'\xe2', u'\xe3',
                           u'\xe4', u'\xe5', u'\xe6']):
      file(fn + '.' + str(ix), 'wb').write(str(ix))

print os.listdir('.')
print os.listdir(u'.')

--

['test3.py', 'a.0', 'a.1', 'a.2', 'a.3', 'a.4', 'a.5', '?.6']
[u'test3.py', u'\xe0.0', u'\xe1.1', u'\xe2.2', u'\xe3.3', u'\xe4.4',
u'\xe5.5', u'\xe6.6']
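
The byte-string listing above can be approximated without a FAT32 volume.
The sketch below is Python 3 and does NOT use Windows' real best-fit
table: as an assumption, it imitates the "most similar character" step by
Unicode decomposition (NFKD) with the combining marks dropped, and falls
back to '?' when nothing encodable remains. The function name
approx_best_fit is hypothetical:

```python
import unicodedata


def approx_best_fit(name, codepage="cp1251"):
    """Approximate how a name might look after squeezing it into codepage.

    Assumption: accent stripping via NFKD stands in for the system's
    best-fit mapping. The real mapping may differ for other characters.
    """
    out = []
    for ch in name:
        try:
            ch.encode(codepage)
            out.append(ch)          # representable as-is
            continue
        except UnicodeEncodeError:
            pass

        # Decompose (e.g. a-umlaut -> 'a' + combining diaeresis) and
        # drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
        try:
            stripped.encode(codepage)
            out.append(stripped if stripped else "?")
        except UnicodeEncodeError:
            out.append("?")         # no similar character available
    return "".join(out)


names = [u"\xe0", u"\xe1", u"\xe2", u"\xe3", u"\xe4", u"\xe5", u"\xe6"]
print([approx_best_fit(fn + "." + str(ix)) for ix, fn in enumerate(names)])
print([approx_best_fit(fn, "latin-1") for fn in names])
```

Under cp1251 the first six names all reduce to 'a' plus their suffix and
the last one to '?', matching the listing; under latin-1 every name
survives unchanged, which fits the guess about why the files stay
distinct on a Latin-1 machine.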

See the attached screenshot: the view from Explorer.

I'll send further proposals for dealing with this issue later.

--
Alexander

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode-a.png
Type: image/png
Size: 5732 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060703/ac0caf88/attachment.png 

