About encoding issues
Martin Pool
mbp at sourcefrog.net
Mon Apr 24 03:59:53 BST 2006
On 24/04/2006, at 2:32 AM, Jan Hudec wrote:
> Hello,
>
> Hearing about encoding issues and seeing some myself I thought
> about whether
> it is possible to disable automatic conversion between string and
> unicode.
> And alas it is. One can do:
>
> sys.setdefaultencoding('undefined')
>
> from which moment on all automatic conversion attempts raise an error.
> That is unless one's /etc/python2.4/site.py contains:
>
> if hasattr(sys, "setdefaultencoding"):
>     del sys.setdefaultencoding
>
> (which I commented out here -- I wonder who invented that)
If I recall correctly, it's there because it's considered unsafe to
change it while Python is running.
http://blog.ianbicking.org/illusive-setdefaultencoding.html
I like your approach: since all conversions should be handled
explicitly, we might as well make that really mean all of them, not
just non-ASCII conversions.
You can put it in sitecustomize.py on your PYTHONPATH (e.g. in
~/lib/python). I'll try running with this for a while.
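A minimal sketch of such a sitecustomize.py follows. It relies on site.py importing sitecustomize before it deletes sys.setdefaultencoding. The helper name disable_implicit_conversion is hypothetical, and the hasattr-style guard is an assumption added so the file is harmless on interpreters where the function is missing.

```python
# sitecustomize.py (sketch): place it in a directory on PYTHONPATH.
import sys


def disable_implicit_conversion(encoding="undefined"):
    """Set the default encoding if this interpreter still allows it.

    Returns True when the encoding was changed, and False when
    sys.setdefaultencoding is not available (already deleted by
    site.py, or absent from this Python version).
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False
    setter(encoding)
    return True


# Hypothetical helper, called once at interpreter startup.
disable_implicit_conversion()
```

If the call returns False, the default encoding is left untouched, so the same file can sit on the path of several Python versions.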
> Therefore I tried to do:
>
> === modified file 'a/bzrlib/tests/__init__.py'
> --- a/bzrlib/tests/__init__.py
> +++ b/bzrlib/tests/__init__.py
> @@ -887,6 +887,7 @@
>  def run_suite(suite, name='test', verbose=False, pattern=".*",
>                stop_on_failure=False, keep_output=False,
>                transport=None):
> +    sys.setdefaultencoding('undefined')
>      TestCaseInTempDir._TEST_NAME = name
>      if verbose:
>          verbosity = 2
>
> and see what happens. Unfortunately the errors it gave were pretty
> useless:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
>   File "/usr/lib/python2.4/encodings/undefined.py", line 22, in decode
>     raise UnicodeError, "undefined encoding"
> UnicodeError: undefined encoding
Perhaps we're initializing logging in a way that provokes this? It
does seem possible to load and use logging with the default encoding
of undefined.
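For reference, the error in that traceback comes from the 'undefined' codec itself, which raises on any input, so it can be reproduced without changing the default encoding. This is only a sketch; the function name conversion_is_blocked is made up for illustration.

```python
import codecs


def conversion_is_blocked(data, encoding="undefined"):
    """Return True if decoding `data` with `encoding` raises UnicodeError."""
    try:
        codecs.decode(data, encoding)
    except UnicodeError:
        # The 'undefined' codec lands here for every input, even pure ASCII.
        return True
    return False
```

With the default argument this reports a blocked conversion even for plain ASCII bytes, which is the "really all conversions" behaviour discussed above; with encoding="ascii" the same bytes decode normally.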
> But if someone could look into it and manage to get proper
> backtraces out of
> it, I think it would catch many of the encoding problems.
We can also try using rot_13 encoding, which may trap some bugs (or
make them more obvious) but in a more delayed way.
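As a rough illustration of what "more obvious" could look like, the rot_13 codec scrambles text instead of raising. So with rot_13 as the default, an implicit conversion would presumably show up as garbled output some time later rather than as an immediate exception. The helper name scramble is hypothetical.

```python
import codecs


def scramble(text):
    """Apply the rot_13 codec, as an implicit conversion would under
    a rot_13 default encoding (an assumption, not tested in bzr)."""
    return codecs.encode(text, "rot_13")


# rot_13 is its own inverse, so applying it twice restores the text.
print(scramble("hello"))
print(scramble(scramble("hello")))
```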
--
Martin