user_encoding fix
John A Meinel
john at arbash-meinel.com
Mon Feb 20 18:17:05 GMT 2006
Nir Soffer wrote:
>
> On 20/02/2006, at 18:04, John A Meinel wrote:
>
>> Users need to set LANG on Mac OSX anyway. Otherwise 'ls' and friends
>> won't do the right thing. I came across that before I had problems with
>> bzr. (I've never used bzr to control unicode filenames in a real
>> application, but I have created unicode filenames for personal stuff).
>
> Requiring users to set LANG is a problem; bzr should simply work out of
> the box for the common case. But if it really works, maybe it's a better
> solution.
>
> I tested LANG on 10.3 (same result in x11 terminal and Terminal.app):
>
> $ LANG=en_US.UTF-8 python -c 'import locale; print locale.getpreferredencoding()'
> mac-roman
>
> So it does not help to get the input encoding.
>
> $ LANG=en_US.UTF-8 python -c 'import sys; print sys.stdout.encoding'
> UTF-8
>
> Works for the output encoding, so the special checks for darwin can be
> eliminated.
>
>
>
> Best Regards,
>
> Nir Soffer
>
We already have a workaround for the fact that getpreferredencoding()
doesn't return the right value on darwin. Specifically we have:

    if sys.platform == 'darwin':
        # work around egregious python 2.4 bug
        sys.platform = 'posix'
        import locale
        sys.platform = 'darwin'
    else:
        import locale

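This forces Python to honor the LANG setting even on Mac, because
locale.py chooses its getpreferredencoding() implementation from
sys.platform when it is imported. On other platforms, it already does
the right thing. On Mac it is (incorrectly?) hardcoded to Mac-Roman
because of legacy issues.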
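
One weakness of the snippet above is that sys.platform stays wrong if the
import raises. A minimal sketch of the same save/override/restore pattern
with a guaranteed restore follows. The names pretend_platform and
import_locale_as_posix are hypothetical helpers for illustration, not
anything that exists in bzr or Python.

```python
import contextlib
import sys


@contextlib.contextmanager
def pretend_platform(name):
    """Temporarily report `name` as sys.platform (hypothetical helper)."""
    original = sys.platform
    sys.platform = name
    try:
        yield
    finally:
        # Restore the real value even if the body raises.
        sys.platform = original


def import_locale_as_posix():
    """Import locale while sys.platform claims to be 'posix' on darwin.

    The override only matters the first time locale is imported in the
    process; later imports return the cached module unchanged.
    """
    if sys.platform == 'darwin':
        with pretend_platform('posix'):
            import locale
    else:
        import locale
    return locale
```

The override has to happen before anything else imports locale, which is
why the original workaround sits at the point of first import.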
The code in question is in locale.py. For python2.4 we have:

    if sys.platform in ('win32', 'darwin', 'mac'):
        # On Win32, this will return the ANSI code page
        # On the Mac, it should return the system encoding;
        # it might return "ascii" instead
        def getpreferredencoding(do_setlocale = True):
            """Return the charset that the user is likely using."""
            import _locale
            return _locale._getdefaultlocale()[1]
    else:
        # On Unix, if CODESET is available, use that.
        try:
            CODESET
        except NameError:
            # Fall back to parsing environment variables :-(
            def getpreferredencoding(do_setlocale = True):
                """Return the charset that the user is likely using,
                by looking at environment variables."""
                return getdefaultlocale()[1]
        else:
            def getpreferredencoding(do_setlocale = True):
                """Return the charset that the user is likely using,
                according to the system configuration."""
                if do_setlocale:
                    oldloc = setlocale(LC_CTYPE)
                    setlocale(LC_CTYPE, "")
                    result = nl_langinfo(CODESET)
                    setlocale(LC_CTYPE, oldloc)
                    return result
                else:
                    return nl_langinfo(CODESET)

In other words: use the compiled-in default for win32 and darwin, but
use LANG or LC_CTYPE on all other platforms. At least that is how I read
the above code.
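
To make that reading concrete, here is a small model of the branch
structure only. It is a sketch of my interpretation, not code from
locale.py: pick_strategy and its string labels are made up for
illustration, and has_codeset stands in for whether the CODESET name is
defined.

```python
def pick_strategy(platform, has_codeset):
    """Model which getpreferredencoding() variant gets defined.

    Hypothetical helper mirroring the if/else and try/except/else
    structure of the Python 2.4 code quoted above.
    """
    if platform in ('win32', 'darwin', 'mac'):
        # LANG and LC_CTYPE are never consulted on this branch.
        return 'compiled-in default'
    if has_codeset:
        return 'nl_langinfo(CODESET)'
    return 'environment variables'


for platform in ('darwin', 'posix'):
    print(platform, '->', pick_strategy(platform, has_codeset=True))
```

Seen this way, setting sys.platform to 'posix' before the import simply
moves darwin from the first branch to the second.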
Specifically, you can test it with this:

    LANG=en_US.UTF-8 python -c "import sys; sys.platform = 'posix'; import locale; print locale.getpreferredencoding()"

mac-roman is a really poor encoding, and doesn't match up to what the
terminal can do.
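
A quick way to see the gap is to try encoding a few strings with both
codecs. This is only an illustrative sketch: can_encode is a hypothetical
helper and the sample strings are arbitrary choices, written with escapes
so the snippet itself stays ASCII.

```python
def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Accented Latin, Japanese, and Hebrew sample strings.
samples = ['caf\u00e9', '\u65e5\u672c\u8a9e', '\u05e9\u05dc\u05d5\u05dd']

for s in samples:
    # ascii() keeps the output printable whatever the terminal encoding is.
    print(ascii(s), can_encode(s, 'mac-roman'), can_encode(s, 'utf-8'))
```

Only the accented Latin sample fits in mac-roman, while UTF-8 handles
all three.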
John
=:->