Encoding woes

Mon Jan 2 20:54:45 GMT 2006

On Fri, 2005-12-30 at 15:15 -0600, John A Meinel wrote:
> > Which is exactly the reason unicode should be required. Because otherwise I,
> > as a front-end writer, can easily forget to decode the input and not notice
> > until much later.
> 
> What I would prefer, rather than "isinstance(param, unicode)" is to have:
> 
> param = unicode(param)
> 
> Which means that internally the string would be converted into a unicode
> string. If it was non-ascii, python will fail to decode it.
> 
> There are a lot of places (especially in test code) where it is far
> easier to just write ascii strings.
> I really would prefer not to require adding u everywhere.

Yeah. The problem with a bare 'unicode(foo)' is (IIRC) that it decodes
using the default encoding, which is exactly how we can get into
trouble.
So I'm proposing that every 'str' we need to coerce into unicode be
treated as UTF-8, and that this be the documented API for the
library.
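
As a rough sketch of the UTF-8 option (the helper name to_unicode_utf8 is
hypothetical, not something the library provides; it is written so that it
also runs on later Pythons, where 'bytes' plays the role of today's 'str'):

```python
text_type = type(u"")    # 'unicode' on Python 2, 'str' on Python 3
binary_type = type(b"")  # 'str' on Python 2, 'bytes' on Python 3


def to_unicode_utf8(value):
    """Hypothetical helper: coerce a byte string to text, always as UTF-8."""
    if isinstance(value, text_type):
        # Already decoded; pass it through untouched.
        return value
    if isinstance(value, binary_type):
        # Name the codec explicitly instead of relying on the
        # interpreter's default encoding.
        return value.decode("utf-8")
    raise TypeError("expected a string, got %r" % (type(value),))
```

Bytes that are not valid UTF-8 raise UnicodeDecodeError at the API boundary
rather than somewhere deep inside the library.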

Alternatively, we can say that all 'str' strings must be ASCII only, which
will trap any encoded (non-ASCII) strings being sent in as 'str' instances.
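
A sketch of the ASCII-only option, under the same caveats (to_unicode_ascii
is a made-up name for illustration, and 'bytes' stands in for 'str' on later
Pythons):

```python
text_type = type(u"")    # 'unicode' on Python 2, 'str' on Python 3
binary_type = type(b"")  # 'str' on Python 2, 'bytes' on Python 3


def to_unicode_ascii(value):
    """Hypothetical helper: accept byte strings only if they are pure ASCII."""
    if isinstance(value, text_type):
        # Already decoded; pass it through untouched.
        return value
    if isinstance(value, binary_type):
        # Any byte >= 0x80 raises UnicodeDecodeError, so an encoded
        # string passed in by mistake is caught immediately.
        return value.decode("ascii")
    raise TypeError("expected a string, got %r" % (type(value),))
```

Test code that only writes plain ASCII literals keeps working without a u
prefix, while a forgotten decode in a front-end fails loudly.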

I've no particular preference on that aspect of the solution.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.