is utf-8 the standard filename encoding?
Rodney Dawes
rodney.dawes at canonical.com
Wed Dec 21 19:18:11 UTC 2011
On Wed, 2011-12-21 at 09:42 -0800, Steve Langasek wrote:
> It's possible I'm mistaken about the default behavior on Ubuntu Server,
> though - someone please correct me if I'm wrong. Maybe this is another
> reason why we need to get the C.UTF-8 locale going everywhere.
It is definitely not using C.UTF-8 everywhere, and plain C is not
UTF-8. Is it even valid to specify a charset for the C locale? Doesn't
POSIX define it as always being ASCII?
> Notwithstanding the above (which indeed also explains why using the
> locale's charset value is a poor heuristic for interpreting filenames on
> the Linux filesystem), it's my understanding that the GNOME vfs stack has
> refused for several years now to work with any filenames that aren't
> UTF-8. So desktop users with non-utf8 filenames are going to have a hard
> time of it.
>
This isn't quite true. There is a complicated set of environment
variables, and checks in the code, to ensure that display is always
UTF-8, but it generally handles non-UTF-8 filenames gracefully.
Python, on the other hand, just raises Unicode encoding/decoding
exceptions, and apps have to handle these themselves to be graceful.
I think Python 3 might make this a bit better, though, by using Unicode
as the default string type rather than the byte strings of 2.x.
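A minimal Python 3 sketch of the two behaviours described above, assuming a hypothetical filename `b"caf\xe9.txt"` ("café.txt" written by a Latin-1 system, so it is not valid UTF-8). Strict decoding raises, as in the Python 2 case; the PEP 383 `surrogateescape` error handler (used by `os.fsdecode()`/`os.fsencode()` on POSIX) keeps a lossless round trip to the original bytes; a separate lossy `replace` decode gives a display-only name, roughly the "display is always UTF-8" idea. The variable names are illustrative only.

```python
# Hypothetical non-UTF-8 filename: byte 0xE9 is Latin-1 "é".
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding fails; apps must catch this to stay graceful.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# PEP 383 "surrogateescape": each undecodable byte becomes a lone
# surrogate (0xE9 -> U+DCE9), so encoding the str the same way
# gives back exactly the original bytes.
text_name = raw_name.decode("utf-8", "surrogateescape")
round_trip = text_name.encode("utf-8", "surrogateescape")

# Display-only name: bad bytes become U+FFFD instead of raising.
# This is lossy, so keep raw_name/text_name for opening the file.
display_name = raw_name.decode("utf-8", "replace")

# ascii() avoids errors when printing lone surrogates to a UTF-8 terminal.
print("strict decode ok:", strict_ok)
print("text name:", ascii(text_name))
print("round trip preserved:", round_trip == raw_name)
print("display name:", ascii(display_name))
```

The design point is to keep two representations: one that round-trips to the on-disk bytes for filesystem calls, and one that is always valid UTF-8 for showing to the user.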
More information about the ubuntu-devel mailing list