is utf-8 the standard filename encoding?

Roberto Alsina roberto.alsina at canonical.com
Wed Dec 21 15:39:01 UTC 2011


On 12/21/2011 12:26 PM, John Arbash Meinel wrote:
> ...
>
>> On U1 we have a lot of code to handle this, because we also deal
>> with windows, where things are completely different.
> I'd be interested to hear more, given that on Windows the filesystem
> encoding of NTFS is officially UTF-16 (without surrogate pairs, I
> believe, making it UCS-2).
>
> So while there are lots of 8-bit mappings on Windows (code page, ANSI,
> OEM, file-content vs filename, etc, etc.), the filenames on disk are
> all Unicode. (FAT-32 is probably a different story, though.)
>

Well, the main problem, IIRC, was that since this was Linux code, it
assumed paths were either UTF-8 byte strings or unicode objects, and ...
well, on Windows, they are never UTF-8 byte strings.
On Windows, most FS APIs will just give you Unicode strings encoded in
UTF-16.
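
To make the mismatch concrete, here is a rough sketch (Python 3 syntax, not
the actual U1 code) of the kind of helper that Linux-style code tends to
rely on. The names to_text_path and decodes_as_utf8 are made up for
illustration. It shows that the same filename has different bytes in UTF-8
and UTF-16, and that UTF-16 data does not survive a "just decode as UTF-8"
assumption.

```python
def to_text_path(path, encoding="utf-8"):
    """Hypothetical helper: return the path as text.

    Bytes are decoded with the given encoding (UTF-8 by default, which
    is the assumption typical Linux code makes); text is returned as-is.
    """
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "caf\u00e9.txt"                   # example filename with one non-ASCII char
utf8_bytes = name.encode("utf-8")        # what Linux code usually expects
utf16_bytes = name.encode("utf-16-le")   # roughly what Windows APIs hand back

print(to_text_path(utf8_bytes) == name)                 # UTF-8 assumption holds
print(decodes_as_utf8(utf16_bytes))                     # UTF-8 assumption fails
print(to_text_path(utf16_bytes, "utf-16-le") == name)   # needs the right codec
```

The point is only that the encoding has to be known at the boundary; code
that hard-wires UTF-8 needs a separate path for Windows.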

Also, in Python there are lots of "fun" things, like listdir(".") and
listdir(u".") giving different results (byte strings for the first,
unicode objects for the second), of course :-)
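
A small sketch of that listdir behaviour, written for Python 3, where the
Python 2 str/unicode split shows up as bytes/str instead. The temporary
directory and the file name example.txt are just for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # Text path in, text names out.
    text_names = os.listdir(tmp)

    # Bytes path in, bytes names out (undecoded, as stored by the OS).
    byte_names = os.listdir(os.fsencode(tmp))

print(text_names)
print(byte_names)

# os.fsdecode maps the bytes names back to text using the filesystem encoding.
print([os.fsdecode(n) for n in byte_names] == text_names)
```

So the type of the argument decides the type of the results, and code that
mixes the two ends up comparing names that never match.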



More information about the ubuntu-devel mailing list