is utf-8 the standard filename encoding?

Wed Dec 21 19:18:11 UTC 2011

On Wed, 2011-12-21 at 09:42 -0800, Steve Langasek wrote:
> It's possible I'm mistaken about the default behavior on Ubuntu Server,
> though - someone please correct me if I'm wrong.  Maybe this is another
> reason why we need to get the C.UTF-8 locale going everywhere.

It is definitely not using C.UTF-8 everywhere. And plain C is not
UTF-8. Is it even valid to specify a charset for the C locale? Doesn't
POSIX define it as always being ASCII?

> Notwithstanding the above (which indeed also explains why using the locale's
> charset value is a poor heuristic for interpreting filenames on the Linux
> filesystem), it's my understanding that the GNOME vfs stack has refused for
> several years now to work with any filenames that aren't UTF-8.  So desktop
> users with non-utf8 filenames are going to have a hard time of it.
> 
This isn't quite true. There is a complicated set of environment
variables, and checks in the code, to ensure that what gets displayed is
always UTF-8, but GNOME generally handles non-UTF-8 filenames gracefully.
Python, on the other hand, just raises Unicode encoding/decoding
exceptions, and apps have to handle these to be graceful themselves.
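
As a minimal sketch of what "handling these" could look like in an
application: the helper below decodes a raw filename strictly, and falls
back to replacement characters for display if that fails. The name
display_name is hypothetical, and treating UTF-8 as the expected encoding
is an assumption made for the example.

```python
def display_name(raw: bytes) -> str:
    """Return a printable form of a filename given as raw bytes.

    Hypothetical helper, not taken from GNOME or any real application.
    """
    try:
        # Strict decoding: raises UnicodeDecodeError on invalid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Degrade gracefully: substitute U+FFFD for undecodable bytes.
        return raw.decode("utf-8", errors="replace")


utf8_name = "caf\u00e9.txt".encode("utf-8")      # valid UTF-8
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt', invalid UTF-8

print(ascii(display_name(utf8_name)))
print(ascii(display_name(latin1_name)))
```

The fallback string is only suitable for showing to the user. It cannot
be turned back into the original bytes, so the app still has to keep the
raw name around for any actual file operation.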

I think Python 3 might make this a bit better though, by using Unicode
as the default string type, rather than the byte strings of 2.x.
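
For illustration, Python 3 also provides the "surrogateescape" error
handler (PEP 383), which os.fsdecode() and os.fsencode() use on POSIX
systems so that undecodable filename bytes survive a round trip through
str. The sketch below hard-codes UTF-8 instead of the real filesystem
encoding, purely so the behaviour is reproducible; to_text and to_bytes
are hypothetical names.

```python
def to_text(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF).
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(name: str) -> bytes:
    # Lone surrogates are turned back into the original bytes.
    return name.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"          # Latin-1 encoded name, not valid UTF-8
name = to_text(raw)

assert to_bytes(name) == raw  # lossless round trip
print(ascii(name))
```

The resulting str is fine for passing back to file APIs, but encoding it
strictly as UTF-8 (for example, to print it) still raises
UnicodeEncodeError, so display code needs its own fallback.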
