Identify automatic str/unicode coercions
Andrew Bennetts
andrew at canonical.com
Wed Jun 11 07:45:48 BST 2008
Martin von Gagern wrote:
[...]
> The idea is that automatic conversions between byte and unicode strings
> should be avoided, as they are bound to fail if a string contains
> non-ASCII characters. Instead, all conversions should be done
> explicitly.
>
> I liked the idea, especially as str/unicode conversions currently give
> me headaches in https://bugs.launchpad.net/bzr/+bug/128496 (in
> combination with bzr-svn). The main problem is the question of how to
> enforce this policy. A solution is to override the default encoding by a
> special encoding, which logs all access before performing default ascii
> encoding. That's an idea originally proposed by Jan Hudec, but I found
> no implementation for it yet. Now there is a basic proof of concept:
> https://code.launchpad.net/~gagern/bzr/str-unicode
A simpler option is to set the default encoding to "undefined" and run the test
suite. The "undefined" codec raises an error on any implicit conversion:
>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('undefined')
>>> str(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/encodings/undefined.py", line 19, in encode
    raise UnicodeError("undefined encoding")
UnicodeError: undefined encoding
Unfortunately, this currently breaks a lot of things, including the
running of the test suite :)
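
For comparison, the logging codec described in the quoted message might look
roughly like the sketch below. This is not the code from the str-unicode
branch; the names logging_ascii, conversion_log, logging_encode,
logging_decode and find_logging_ascii are all made up for illustration. It
only registers the codec and uses it by explicit name. On Python 2 the name
could presumably then be passed to sys.setdefaultencoding() in the same way
as "undefined" above.

```python
import codecs

# Hypothetical in-memory log; a real tool would more likely record a stack trace.
conversion_log = []

def logging_encode(text, errors="strict"):
    # Note the conversion, then fall back to plain ASCII encoding.
    conversion_log.append(("encode", text))
    return codecs.ascii_encode(text, errors)

def logging_decode(data, errors="strict"):
    # Note the conversion, then fall back to plain ASCII decoding.
    conversion_log.append(("decode", data))
    return codecs.ascii_decode(data, errors)

def find_logging_ascii(name):
    # Codec search function: answer only for our made-up codec name.
    if name == "logging_ascii":
        return codecs.CodecInfo(logging_encode, logging_decode,
                                name="logging_ascii")
    return None

codecs.register(find_logging_ascii)

# Exercise the codec explicitly; each call is appended to conversion_log.
encoded = u"abc".encode("logging_ascii")
decoded = b"xyz".decode("logging_ascii")
```

Unlike "undefined", this keeps the ASCII behaviour, so code that only handles
ASCII data keeps working while the log shows where conversions happen.
Non-ASCII input still fails with the usual UnicodeError from the ASCII codec.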
-Andrew.