Encoding woes

Jan Hudec bulb at ucw.cz
Fri Dec 30 21:25:33 GMT 2005


On Fri, Dec 30, 2005 at 15:15:50 -0600, John A Meinel wrote:
> What I would prefer, rather than "isinstance(param, unicode)" is to have:
> 
> param = unicode(param)
> 
> Which means that internally the string would be converted into a unicode
> string. If it was non-ascii, python will fail to decode it.

Except that this is unfortunately false :-(. AFAIK the unicode constructor
decodes using an encoding that depends on the phase of the moon, the value
of the kilogramagesalad field and, last but not least, the site init script. See
the default /etc/python2.?/site.py on Debian/Ubuntu: it first sets the default
encoding (to ascii, but there is a comment allowing the user to change that to
the locale's) and then deletes the setdefaultencoding function, which makes the
whole default encoding thing completely useless.
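
To illustrate the alternative of not relying on any process-wide default, here
is a minimal sketch that names the encoding at the call site. It is written for
a Python 3 interpreter (an assumption, since the discussion above is about
Python 2, where the type would be unicode rather than str), and to_text is
a hypothetical helper name, not something from bzr or the standard library.

```python
def to_text(param, encoding="ascii"):
    """Return param as text, decoding bytes with an explicitly named encoding.

    The encoding is passed in by the caller, so the result does not depend
    on any interpreter-wide or site-specific default.
    """
    if isinstance(param, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        return param.decode(encoding)
    return param  # already text, nothing to decode


# Pure-ASCII bytes decode fine with the strict default.
plain = to_text(b"plain ascii")

# Non-ASCII bytes are rejected instead of being silently misinterpreted.
try:
    to_text(b"caf\xc3\xa9")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# The same bytes decode once the caller states the real encoding.
decoded = to_text(b"caf\xc3\xa9", encoding="utf-8")
```

With the encoding spelled out like this, the behaviour is the same on every
machine, whatever the site init script does.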

> There is a lot of places (especially in test code) where it is far
> easier to just write ascii strings.
> I really would prefer not to require adding u everywhere.

Personally I prefer the perl way. If I tell perl:

use utf8;

then for the rest of the source, ALL string literals are unicode. Python has
a way of telling it the source encoding, but no way of telling it that I want
string literals to default to unicode. Since there is no way to do that,
I prefer to write u everywhere to maintain my sanity. But I can't speak for
anyone else, of course.
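
For illustration only, the distinction between declaring the source encoding
and choosing the literal type can be sketched as follows. This assumes
a Python 3 interpreter, where unprefixed literals are already text and a b
prefix marks bytes; under the Python 2 discussed here the unprefixed literal
would be a byte string and the u prefix would be needed to get text.

```python
# -*- coding: utf-8 -*-
# The coding line above only tells the interpreter how to decode this file.
# It does not, by itself, decide whether a literal is text or bytes.

text = "caf\u00e9"       # text literal: four characters
raw = b"caf\xc3\xa9"     # bytes literal: the same word as five UTF-8 bytes

# The two are different types and only meet through an explicit decode.
same_word = raw.decode("utf-8") == text
lengths = (len(text), len(raw))
```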

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>