test_nonascii: two unicode a's
Alexander Belchenko
bialix at ukr.net
Mon Jul 3 17:46:14 BST 2006
John Arbash Meinel wrote:
> Wouter van Heyst wrote:
>> On Sun, Jul 02, 2006 at 09:26:09AM -0500, John Arbash Meinel wrote:
>>> Also, what version of Windows are you using?
>> From IRC and the beginning of the thread,
>>
>> <bialix>: I have windows 2000 on FAT32. The same behaviour with cygwin on the same FS.
>
> Well, on WinXP and FAT32 I can still create both å and ä in the same
> directory, and they both show up properly in Explorer.
>
> So maybe it is a Win2k issue.
I don't agree. In my opinion this is a FAT32 vs. NTFS difference.
I asked on a Russian Python users' forum for a little help with testing
the following code:
import os
f1 = file(u'\xe4', 'w')
f1.write('foo')
f1.close()
f2 = file(u'\xe5', 'w')
f2.write('bar')
f2.close()
print os.listdir('.')
print os.listdir(u'.')
print file(u'\xe4').read()
print file(u'\xe5').read()
I asked people to run this code on Russian Windows with FAT32 and NTFS.
Even on Russian WinXP + FAT32 this test fails the same way as on my
machine, i.e. the result is:
['test.py', 'a']
[u'test.py', u'\xe4']
bar
bar
On Russian WinXP + NTFS everything works:
['test.py', 'a', 'a']
[u'test.py', u'\xe4', u'\xe5']
foo
bar
It seems to work even on Win2k + NTFS, but I cannot be sure.
So I think it's a FAT32 limitation.
Even if I change the order in which the files are created, it behaves
the same way:
import os
f2 = file(u'\xe5', 'w')
f2.write('bar')
f2.close()
f1 = file(u'\xe4', 'w')
f1.write('foo')
f1.close()
print os.listdir('.')
print os.listdir(u'.')
print file(u'\xe4').read()
print file(u'\xe5').read()
--
['test.py', 'a']
[u'test.py', u'\xe5']
foo
foo
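The two scripts above can be folded into one check that runs in a scratch directory. This is only a sketch, written for Python 3 rather than the Python 2 used in this message. The helper name `names_collide` and its `directory` parameter are made up for illustration.

```python
import os
import tempfile

def names_collide(name1, name2, directory=None):
    """Create two files and report whether the filesystem merged them.

    Hypothetical helper: `directory` is where the scratch folder is
    created (None means the system default temp location).
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        # Write different contents under each name, first name first.
        for name, data in ((name1, 'foo'), (name2, 'bar')):
            with open(os.path.join(tmp, name), 'w') as f:
                f.write(data)
        # Read the first file back; if the second write overwrote it,
        # the contents will no longer be 'foo'.
        with open(os.path.join(tmp, name1)) as f:
            first = f.read()
        return len(os.listdir(tmp)) < 2 or first != 'foo'

print(names_collide('a1', 'a2'))
```

Pointing `directory` at a folder on the FAT32 volume under test and passing u'\xe4' and u'\xe5' would repeat the experiment above without leaving files behind.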
My hypothesis is that FAT32 cannot distinguish between filenames that
cannot be represented in the current code page. I suppose that John works
in a Latin-1 code page, so for him these files are different. But on a
Russian (Cyrillic) machine these filenames are not representable in
cp1251, so Windows chooses the most similar character, and that character
is plain ASCII 'a' (as you can see in the output of os.listdir('.')).
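The representability part of this hypothesis can be checked with Python's own codecs. The sketch below is Python 3 and only tests whether a character exists in a code page. It says nothing about what FAT32 itself does. The helper name `representable` is invented for illustration, and cp1252 stands in here for a Western European ("Latin-1"-like) Windows code page.

```python
def representable(name, codepage):
    """Return True if every character of `name` encodes in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True

for ch in (u'\xe4', u'\xe5'):
    # ascii() keeps the output printable on any console encoding.
    print(ascii(ch),
          'cp1251:', representable(ch, 'cp1251'),
          'cp1252:', representable(ch, 'cp1252'))
```

Both characters encode in cp1252 but not in cp1251, which fits the idea that the two names only become indistinguishable on a Cyrillic system.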
I made another test to show that these characters can be told apart if
the filenames also differ in some other character:
import os
for ix, fn in enumerate([u'\xe0', u'\xe1', u'\xe2', u'\xe3',
                         u'\xe4', u'\xe5', u'\xe6']):
    file(fn + '.' + str(ix), 'wb').write(str(ix))
print os.listdir('.')
print os.listdir(u'.')
--
['test3.py', 'a.0', 'a.1', 'a.2', 'a.3', 'a.4', 'a.5', '?.6']
[u'test3.py', u'\xe0.0', u'\xe1.1', u'\xe2.2', u'\xe3.3', u'\xe4.4',
u'\xe5.5', u'\xe6.6']
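The byte-string listing above ('a.0' to 'a.5', then '?.6') looks like what a "most similar character" conversion would produce. The Python 3 sketch below imitates such a conversion by stripping accents with Unicode NFKD decomposition. This is an approximation chosen for illustration, not the actual table Windows uses, and the helper name `approx_best_fit` is hypothetical.

```python
import unicodedata

def approx_best_fit(name, codepage='cp1251'):
    """Approximate a lossy conversion of `name` to `codepage`.

    Assumption: accent stripping via NFKD stands in for the real
    Windows mapping, which is not reproduced here.
    """
    out = []
    for ch in name:
        # Try the character itself, then its base letter without accents.
        base = unicodedata.normalize('NFKD', ch)[0]
        for candidate in (ch, base):
            try:
                candidate.encode(codepage)
            except UnicodeEncodeError:
                continue
            out.append(candidate)
            break
        else:
            # Nothing similar exists in the code page.
            out.append('?')
    return ''.join(out)

names = [u'\xe0', u'\xe1', u'\xe2', u'\xe3', u'\xe4', u'\xe5', u'\xe6']
print([approx_best_fit(fn + '.' + str(ix)) for ix, fn in enumerate(names)])
```

Under this approximation the first six names reduce to 'a' plus their suffix, while u'\xe6' has no decomposition and falls back to '?', the same pattern as in the listing.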
See the attached screenshot for the view from Explorer.
I'll send more proposals for dealing with this issue later.
--
Alexander
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode-a.png
Type: image/png
Size: 5732 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060703/ac0caf88/attachment.png