About encoding issues

Martin Pool mbp at sourcefrog.net
Mon Apr 24 03:59:53 BST 2006


On 24/04/2006, at 2:32 AM, Jan Hudec wrote:

> Hello,
>
> Hearing about encoding issues and seeing some myself I thought about
> whether it is possible to disable automatic conversion between string and
> unicode.
> And alas it is. One can do:
>
> sys.setdefaultencoding('undefined')
>
> from which moment on all automatic conversion attempts raise an error.
> That is unless one's /etc/python2.4/site.py contains:
>
>     if hasattr(sys, "setdefaultencoding"):
>         del sys.setdefaultencoding
>
> (which I commented out here -- I wonder who invented that)

If I recall correctly, it's there because it's considered unsafe to  
change it while Python is running.

http://blog.ianbicking.org/illusive-setdefaultencoding.html

I like your approach - since all conversions should be explicitly
handled, we might as well try to make it really be all, not just
non-ASCII conversions.

You can put it in sitecustomize.py in your pythonpath (e.g. in
~/lib/python).  I'll try running with this for a while.
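
Something like the following, as a rough sketch. As far as I know the
stock site.py imports sitecustomize before it deletes
sys.setdefaultencoding, so the hook should still be there at that point.
The names disable_implicit_conversion and CHANGED are made up for
illustration; the guard only makes the file harmless on interpreters
that do not expose the hook at all.

```python
# sitecustomize.py -- sketch only; put it in a directory on PYTHONPATH.
import sys


def disable_implicit_conversion(encoding="undefined"):
    """Set the default encoding if the interpreter still exposes the hook.

    Returns True if the default encoding was changed, False otherwise.
    (Hypothetical helper, not part of bzrlib or the standard library.)
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        # The hook has already been removed, or this Python never had it.
        return False
    setter(encoding)
    return True


# Record whether the change took effect, so it can be checked later.
CHANGED = disable_implicit_conversion()
```

You can check afterwards with sys.getdefaultencoding() whether it
actually took effect.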

> Therefore I tried to do:
>
> === modified file 'a/bzrlib/tests/__init__.py'
> --- a/bzrlib/tests/__init__.py	
> +++ b/bzrlib/tests/__init__.py	
> @@ -887,6 +887,7 @@
>  def run_suite(suite, name='test', verbose=False, pattern=".*",
>                stop_on_failure=False, keep_output=False,
>                transport=None):
> +    sys.setdefaultencoding('undefined')
>      TestCaseInTempDir._TEST_NAME = name
>      if verbose:
>          verbosity = 2
>
> and see what happens. Unfortunately the errors it gave were pretty useless:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
>   File "/usr/lib/python2.4/encodings/undefined.py", line 22, in decode
>     raise UnicodeError, "undefined encoding"
> UnicodeError: undefined encoding

Perhaps we're initializing logging in a way that provokes this?  It  
does seem possible to load and use logging with the default encoding  
of undefined.
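
For reference, the codec itself is easy to poke at in isolation, without
touching the default encoding. A minimal sketch (the helper name
raises_unicode_error is mine, purely for illustration): any explicit
conversion through 'undefined' fails, even for plain ASCII text, which is
the same error the logging traceback ends in.

```python
def raises_unicode_error(text):
    """Return True if converting `text` through 'undefined' fails.

    Hypothetical helper: the 'undefined' codec rejects every conversion,
    so this is expected to be True regardless of the input's content.
    """
    try:
        text.encode("undefined")
    except UnicodeError:
        return True
    return False


# Pure ASCII input still fails; the codec never inspects the data.
ascii_fails = raises_unicode_error(u"abc")
```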

> But if someone could look into it and manage to get proper backtraces
> out of it, I think it would catch many of the encoding problems.

We can also try using rot_13 encoding, which may trap some bugs (or  
make them more obvious) but in a more delayed way.
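
To illustrate what I mean by "delayed": rot_13 never raises, it just
scrambles the letters, so a stray implicit conversion would show up later
as visibly garbled text rather than as an exception at the point of
conversion. The sketch below only applies the codec explicitly through
codecs.encode; it does not change the default encoding, and scramble is
a made-up name.

```python
import codecs


def scramble(text):
    """Apply the rot_13 codec explicitly (hypothetical helper).

    This mimics what a stray implicit conversion would do to a string
    if rot_13 were the default encoding: no error, just rotated letters.
    """
    return codecs.encode(text, "rot_13")


garbled = scramble(u"Hello")   # letters rotated by 13 places
restored = scramble(garbled)   # rot_13 is its own inverse
```

So a string that has passed through one unintended conversion is obviously
wrong when it reaches the screen or a file, but you still have to work out
where the conversion happened.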

-- 
Martin






