[merge] Unicode Exception fixes

John Arbash Meinel john at arbash-meinel.com
Mon Aug 21 22:24:11 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>
>>>> I would think we should
>>>> continue, and let get_cmd_object raise "unknown command" as usual.
>>>
>>> I could go either way. But raising an explicit error helps people
>>> realize that we don't support unicode command names.
>>>
>>> This could also be done on the register_command() side.
>>> At the very least this gives a nicer traceback.
> 
> I think
> 1. regular users should not care whether we support unicode command
> names.  When we get internationalized, I expect we *will* support
> unicode command names, anyhow.

I'm not as positive about this. But I would be okay leaving it as you
say, and just letting the standard 'command doesn't exist' handle it.

I'm a little concerned that it might cause errors later on since the
code doesn't expect a unicode string.

> 
> 2. yes, it would be nice to let developers know we don't support unicode
> command names at register_command time.
> 
> 
>>>> I think this is going too far; a string may be unicode while still being
>>>> convertible-to-ascii.  In any case, we may well be able to print errors
>>>> in unicode, which would be preferable.
>>>>
>>>> How about:
>>>
>>>
>>> You forget that utf-8 is a strict superset of ascii. All ascii
>>> characters are unchanged in utf-8. 
> 
> You're right; I got confuzzled by the earlier repr discussion.
> 
>>> So your proposal actually always
>>> gives the same result, only you have to catch yet another exception.
> 
> Well, not exactly the same result, because it also provides a
> __unicode__ builtin, which would allow us to get the unicode
> representation.  OTOH, if we somehow get 8-bit arguments to an error, it
> will fail more often.
> 
> Aaron

Well, the other bug in what you proposed is that sometimes
'__unicode__' will actually return a plain string.

I think we need to change how we do error logging anyway, because we
have other issues with displaying raw ascii URLs as part of errors,
rather than nicely encoded urls. So it would be nice to have some sort of:

BzrNewError.format_error(encoding='ascii')

sort of a thing.
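
As a rough sketch of what that could look like: the class SketchError, its _fmt template, and the exact signature of format_error below are assumptions for illustration, not the existing BzrNewError code. The idea is to build the message as text first and encode it once, for whatever encoding the caller asks for.

```python
class SketchError(Exception):
    """Hypothetical stand-in for a BzrNewError-style exception."""

    _fmt = u"Unable to open %(url)s"

    def __init__(self, **kwargs):
        Exception.__init__(self)
        self.params = kwargs

    def format_error(self, encoding="ascii"):
        # Build the message as text, then encode it once for the target
        # stream. Characters the encoding cannot represent become '?'.
        text = self._fmt % self.params
        return text.encode(encoding, "replace")


err = SketchError(url=u"http://host/caf\u00e9")
ascii_msg = err.format_error(encoding="ascii")
utf8_msg = err.format_error(encoding="utf8")
```

Whether unrepresentable characters should be replaced, escaped, or cause a failure is a separate design question; "replace" is used here only so that reporting an error never raises a second one.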

But I'm also not 100% happy with how we do encoding reporting, because
of all the things that Alexander has commented on (should diff use
user_encoding or terminal encoding, what about log | less versus
diff > ,,foo.diff, etc.).

Also, we have the problem that:

import codecs
f = codecs.open('foo', 'wb', 'utf8')
f.write('\xb5\xb5')

will fail, because the codec wrapper always tries to convert its argument,
even when that argument is already an 8-bit string.

Which is why we can't pass 'diff' a codec wrapper, because codec
wrappers translate everything, not just unicode strings.
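
To illustrate the difference, here is a small sketch. The first half uses the standard codecs.getwriter wrapper on an in-memory stream and shows it rejecting raw bytes. The second half is a hypothetical TextOnlyEncoder class (my name, not something in bzrlib) that encodes text but passes byte strings through untouched, which is roughly what 'diff' would need.

```python
import codecs
import io

# A standard codec wrapper around an in-memory byte stream.
raw = io.BytesIO()
wrapped = codecs.getwriter("utf8")(raw)
wrapped.write(u"\xb5")  # text is encoded on the way through

try:
    # Already-encoded bytes are still handed to the encoder, which fails.
    wrapped.write(b"\xb5\xb5")
    bytes_rejected = False
except (TypeError, UnicodeError):
    bytes_rejected = True


class TextOnlyEncoder(object):
    """Hypothetical wrapper: encode text, pass byte strings through as-is."""

    def __init__(self, stream, encoding):
        self._stream = stream
        self._encoding = encoding

    def write(self, data):
        if isinstance(data, bytes):
            self._stream.write(data)
        else:
            self._stream.write(data.encode(self._encoding))


out = io.BytesIO()
mixed = TextOnlyEncoder(out, "utf8")
mixed.write(u"\xb5")      # encoded to UTF-8
mixed.write(b"\xb5\xb5")  # written unchanged
```

The exception type differs between Python versions (a UnicodeDecodeError from the implicit ascii decode on older interpreters, a TypeError on newer ones), so the sketch catches both.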

John
=:->
