is utf-8 the standard filename encoding?
Roberto Alsina
roberto.alsina at canonical.com
Wed Dec 21 12:16:37 UTC 2011
On 12/20/2011 11:51 PM, Martin Pool wrote:
> We have a question in <https://bugs.launchpad.net/bugs/794353> and
> <http://bugs.python.org/issue13643> about what encoding bzr and Python
> ought to assume for file names if there is no locale configured.
>
> As a specific example, if you run a Python program from cron, it has
> no locale by default. It tries to decode filenames as ascii. If it
> encounters a non-ascii filename, it will likely crash. People hit
> this kind of thing a lot with bzr; we have put in a workaround but it
> seems it would be better to fix it in Python.
>
> My impression is the vast majority of filesystems use utf-8 names, and
> that other Ubuntu software (Nautilus? U1?) assumes this will generally
> be true. Does Ubuntu have any policy that filenames ought to be in
> UTF-8?
>
> (I see a bit of discussion in
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux but nothing more.)
>
Yes, on Linux, most of the time it's UTF-8. However:
* Most of the time is not all the time ;-)
* People sometimes have files where the name is not valid UTF-8, even on
filesystems where UTF-8 is the standard
On U1 we have a lot of code to handle this, because we also deal with
Windows, where things are completely different.
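
To make the second point concrete, here is a rough sketch of one way to cope with names that are not valid UTF-8. It assumes Python 3 and is not the code U1 actually uses; decode_filename and encode_filename are names made up for this example. The idea is to try strict UTF-8 first and, if that fails, fall back to the "surrogateescape" error handler so the original bytes can still be recovered.

```python
def decode_filename(raw):
    """Return (text, is_valid_utf8) for a raw filename given as bytes.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Undecodable bytes become lone surrogates (U+DC80..U+DCFF),
        # so no information is lost.
        return raw.decode("utf-8", errors="surrogateescape"), False


def encode_filename(text):
    """Inverse of decode_filename: turn the text back into the raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    for raw in ("café.txt".encode("utf-8"), b"caf\xe9.txt"):
        text, ok = decode_filename(raw)
        # The round trip gives back the original bytes either way.
        print(ok, encode_filename(text) == raw)
```

Passing bytes to os.listdir() (for example os.listdir(b".")) gives back raw names that can be fed to a helper like this, so nothing gets decoded with the locale's encoding along the way.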