is utf-8 the standard filename encoding?

Roberto Alsina roberto.alsina at
Wed Dec 21 15:39:01 UTC 2011

On 12/21/2011 12:26 PM, John Arbash Meinel wrote:
> ...
>> On U1 we have a lot of code to handle this, because we also deal
>> with windows, where things are completely different.
> I'd be interested to hear more, given that on Windows the filesystem
> encoding of NTFS is officially UTF-16 (without surrogate pairs, I
> believe, making it UCS-2).
> So while there are lots of 8-bit mappings on Windows (code page, ANSI,
> OEM, file-content vs filename, etc, etc.), the filenames on disk are
> all Unicode. (FAT-32 is probably a different story, though.)

Well, the main problem, IIRC, was that since this was Linux code, it
assumed things were either UTF-8 strings or unicode, and ... well, on
Windows, they are never UTF-8 strings.
On Windows, most FS APIs will just give you unicode strings encoded in

Also, in Python there are lots of "fun" things, like listdir(".") and
listdir(u".") giving different results, of course :-)
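
To make the listdir point concrete: in Python 2, listdir(".") returns byte strings and listdir(u".") returns unicode objects. The sketch below shows the closest Python 3 equivalent, where the type of the argument (str or bytes) decides the type of the names you get back. The helper name list_both_ways and the file name report.txt are made up for illustration, and this is only a minimal sketch, not the code U1 uses.

```python
import os
import tempfile


def list_both_ways(path):
    """Return (text_names, byte_names) for the same directory.

    Hypothetical helper: os.listdir returns str names for a str path
    and bytes names for a bytes path.
    """
    text_names = sorted(os.listdir(path))
    byte_names = sorted(os.listdir(os.fsencode(path)))
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # An ASCII name keeps the sketch independent of the locale in use.
    open(os.path.join(tmp, "report.txt"), "w").close()
    text_names, byte_names = list_both_ways(tmp)
    print(type(text_names[0]).__name__, text_names)
    print(type(byte_names[0]).__name__, byte_names)
    # os.fsdecode maps the bytes names back to str using the
    # filesystem encoding, so both listings can be compared.
    print([os.fsdecode(name) for name in byte_names] == text_names)
```

Code that mixes the two forms ends up comparing bytes with text, which is why it helps to pick one representation at the boundary and convert with os.fsencode / os.fsdecode.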

More information about the ubuntu-devel mailing list