Identify automatic str/unicode coercions

Andrew Bennetts andrew at canonical.com
Wed Jun 11 07:45:48 BST 2008


Martin von Gagern wrote:
[...]
> The idea is that automatic conversions between byte and unicode strings  
> should be avoided, as they are bound to fail if a string contains  
> non-ASCII characters. Instead, all conversions should be done
> explicitly.
>
> I liked the idea, especially as str/unicode conversions currently give  
> me headaches in https://bugs.launchpad.net/bzr/+bug/128496 (in  
> combination with bzr-svn). The main problem is the question of how to  
> enforce this policy. A solution is to override the default encoding by a  
> special encoding, which logs all access before performing default ascii
> encoding. That's an idea originally proposed by Jan Hudec, but I found  
> no implementation for it yet. Now there is a basic proof of concept:  
> https://code.launchpad.net/~gagern/bzr/str-unicode

A simpler option is to set the default encoding to "undefined" and run the
test suite.  "undefined" raises an error on any implicit conversion:

    >>> import sys
    >>> reload(sys)
    >>> sys.setdefaultencoding('undefined')
    >>> str(u'')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/encodings/undefined.py", line 19, in encode
        raise UnicodeError("undefined encoding")
    UnicodeError: undefined encoding

Unfortunately, this currently breaks a lot of things, including the
running of the test suite :)
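
For comparison, the logging-codec idea quoted above could be sketched roughly
as follows. This is not the code from the str-unicode branch, only a minimal
illustration: the codec name "logging_ascii", the coercion_log list and the
helper functions are all made up for this example. It registers a codec that
records who called it and then delegates to plain ASCII:

```python
import codecs
import traceback

# Hypothetical codec name, chosen for this sketch only.
CODEC_NAME = "logging_ascii"

# Each entry is (direction, filename, lineno) of the code that triggered
# the conversion. A real tool would more likely write to a log file.
coercion_log = []

_ascii = codecs.lookup("ascii")


def _record(direction):
    # The three innermost frames are: caller, codec wrapper, _record.
    # extract_stack returns them oldest first, so index 0 is the caller.
    caller = traceback.extract_stack(limit=3)[0]
    coercion_log.append((direction, caller.filename, caller.lineno))


def _encode(text, errors="strict"):
    _record("encode")
    return _ascii.encode(text, errors)


def _decode(data, errors="strict"):
    _record("decode")
    return _ascii.decode(data, errors)


def _search(name):
    # The codec registry calls this with a normalized codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

# Explicit use, just to show that the codec logs and still behaves like ASCII.
encoded = u"abc".encode(CODEC_NAME)
print(encoded, coercion_log[-1][0])
```

Under Python 2 the codec would then presumably be installed as the default
with sys.setdefaultencoding(CODEC_NAME), after the same reload(sys) step as in
the session above, so that implicit conversions get logged instead of raising.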

-Andrew.



