user_encoding fix

John A Meinel john at arbash-meinel.com
Mon Feb 20 16:04:33 GMT 2006


Alexander Belchenko wrote:
> Nir Soffer wrote:
>>
>> On 20/02/2006, at 12:42, Alexander Belchenko wrote:
>>>
>>> So, I think right code should be as following:
>>>
>>> if getattr(sys.stdout, "encoding", "ascii") not in (None, "ascii"):
>>>     encoding = sys.stdout.encoding
>>> else:
>>>     encoding = bzrlib.user_encoding
>>
>> Sounds good, but on Mac OS X I get 'US-ASCII', so you need this:
>>
>> if getattr(sys.stdout, 'encoding', None) not in (None, 'ascii',
>> 'US-ASCII'):
>>     encoding = sys.stdout.encoding
>> else:
>>     encoding = bzrlib.user_encoding
> 
> +1 on this variant. John, what do you say about this?
> 
>> btw, when running with a pipe, sys.stdout does have encoding and is
>> set to None.
> 
> Probably, you're right. I don't remember exactly in which case sys.stdout
> or sys.stdin(?) does not have an encoding attribute at all.
> 
> -- 
> Alexander

I'm okay adding some more ASCII statements, but I'm also thinking that
we should probably just special case OS X. So we just do:

if sys.platform == 'darwin':
  encoding = 'utf-8'
elif getattr(sys.stdout, 'encoding', None) not in (None, 'ascii'):
  encoding = sys.stdout.encoding
else:
  encoding = bzrlib.user_encoding

The reason I say that is that I think the OS X platform is small enough,
and it explicitly states how the terminal will act.
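
To make the variants above easier to compare, here is a minimal sketch of
the selection logic as a standalone function. It is not bzrlib code: the
names choose_encoding and is_ascii_or_missing are hypothetical, and
user_encoding is passed in as a parameter standing in for
bzrlib.user_encoding. Instead of listing 'ascii' and 'US-ASCII' by hand, it
asks Python's codecs module to resolve aliases. Treating an unknown codec
name like a missing one is an assumption of this sketch, not something
decided in this thread.

```python
import codecs
import sys


def is_ascii_or_missing(name):
    """Return True if name is None, unknown, or any alias of ASCII."""
    if name is None:
        return True
    try:
        # codecs.lookup() resolves aliases such as 'US-ASCII' to 'ascii'.
        return codecs.lookup(name).name == 'ascii'
    except LookupError:
        # Assumption: an unrecognised codec name is as useless as None.
        return True


def choose_encoding(stream, user_encoding, platform=sys.platform):
    """Pick an output encoding following the rules discussed above."""
    if platform == 'darwin':
        # The proposed OS X special case: always use UTF-8.
        return 'utf-8'
    stream_encoding = getattr(stream, 'encoding', None)
    if not is_ascii_or_missing(stream_encoding):
        return stream_encoding
    # Fall back to the user's configured encoding.
    return user_encoding
```

Calling choose_encoding(sys.stdout, some_user_encoding) would give the
encoding for the current process. Passing platform explicitly makes it
possible to try the 'darwin' branch on another machine.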

Now, I want to test this against the X terminal, as well as the built-in
terminal, and any other terminals on Mac.

I do believe that I force LANG=en_US.UTF-8 on my mac, because programs
other than bzr required it, and Mac doesn't set LANG on its own. (So I
haven't had these problems, because I fixed it with my configuration,
rather than changing bzr.)

I just did a quick test of XDarwin, and it turns out that the X server
doesn't support unicode at all. So the output really is ASCII.

And we have another problem. If I create a unicode filename on the mac,
and then do LANG=en_US.UTF-8 ls, I get a different result than if I use
LANG=US-ASCII ls

Now, there seem to be more problems, because the Mac native 'ls' acts
differently than the Fink version of ls. (Mac native fails to display
unicode filenames properly, though it almost gets it right if you set
LANG=*.UTF-8).

Attached are some screenshots to explain what I'm finding.

Now, what is really weird is that 'ls *.txt | vim -' is always
interpreted correctly as utf-8 encoded. Apparently if you pipe the
output of ls it assumes utf-8 encoding. And what is even weirder is that
native ls gets it right, even though it gets it wrong if you just set
LANG and have it print to the terminal.



So here is my summary:

Users need to set LANG on Mac OS X anyway. Otherwise 'ls' and friends
won't do the right thing. I came across that before I had problems with
bzr. (I've never used bzr to control unicode filenames in a real
application, but I have created unicode filenames for personal stuff).

We could force output to be UTF-8 on Mac, but I don't think that would
work correctly when piping the output into something else.

If we can show that it would work, then I'm okay with forcing it. I need
people to test it to make sure, though.
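
For anyone willing to test, a small diagnostic along these lines might help
collect comparable reports. It is only a sketch: describe_output is a
hypothetical helper, not part of bzr. It reports the platform, the stream's
encoding attribute, whether the stream is a terminal, and the locale's
preferred encoding. The report goes to stderr so that it stays visible when
stdout is piped into something else.

```python
import locale
import sys


def describe_output(stream, platform=sys.platform):
    """Collect the facts relevant to choosing an output encoding."""
    isatty = getattr(stream, 'isatty', None)
    return {
        'platform': platform,
        # May be None or missing entirely, as noted earlier in the thread.
        'encoding': getattr(stream, 'encoding', None),
        # Streams without an isatty() method are treated as non-terminals.
        'is_tty': bool(isatty()) if callable(isatty) else False,
        # What the locale settings (for example LANG) suggest.
        'preferred': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    # Write to stderr so the report survives 'script | something'.
    for key, value in sorted(describe_output(sys.stdout).items()):
        sys.stderr.write('%s: %s\n' % (key, value))
```

Running it once directly in each terminal and once piped (for example into
cat), with and without LANG set, should show whether the values differ
between the two cases. Newer Python versions may report a different
encoding for piped output than the None mentioned above.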

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: x-term.png
Type: image/png
Size: 4078 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060220/4a377801/attachment.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: term.png
Type: image/png
Size: 7600 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060220/4a377801/attachment-0001.png 