About encoding issues

Jan Hudec bulb at ucw.cz
Mon Apr 24 09:52:42 BST 2006


On Mon, Apr 24, 2006 at 12:59:53 +1000, Martin Pool wrote:
> On 24/04/2006, at 2:32 AM, Jan Hudec wrote:
> 
> >Hello,
> >
> >Having heard about encoding issues, and having seen some myself, I
> >wondered whether it is possible to disable automatic conversion
> >between str and unicode. And indeed it is. One can do:
> >
> >sys.setdefaultencoding('undefined')
> >
> >From that moment on, all automatic conversion attempts raise an error.
> >That is, unless one's /etc/python2.4/site.py contains:
> >
> >    if hasattr(sys, "setdefaultencoding"):
> >        del sys.setdefaultencoding
> >
> >(which I commented out here -- I wonder who invented that)
> 
> If I recall correctly, it's there because it's considered unsafe to  
> change it while Python is running.
> 
> http://blog.ianbicking.org/illusive-setdefaultencoding.html
> 
> I like your approach: since all conversions should be handled
> explicitly, we might as well make sure it really is all of them, not
> just the non-ASCII ones.
> 
> You can put it in sitecustomize.py somewhere on your PYTHONPATH (e.g.
> in ~/lib/python). I'll try running with this for a while.
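A minimal sitecustomize.py along those lines, assuming the stock Python 2 site.py, which imports sitecustomize before it deletes sys.setdefaultencoding, so site.py itself does not need editing. The hasattr guard is an addition of this sketch so the file is a no-op on interpreters that lack the function, such as Python 3:

```python
# sitecustomize.py: imported by site.py at interpreter start-up.
import sys

# In Python 2, site.py removes sys.setdefaultencoding only after
# sitecustomize has been imported, so it is still available here.
# The guard keeps this file harmless where the function does not exist.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding("undefined")
```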
> 
> >So I tried the following change:
> >
> >=== modified file 'a/bzrlib/tests/__init__.py'
> >--- a/bzrlib/tests/__init__.py	
> >+++ b/bzrlib/tests/__init__.py	
> >@@ -887,6 +887,7 @@
> > def run_suite(suite, name='test', verbose=False, pattern=".*",
> >               stop_on_failure=False, keep_output=False,
> >               transport=None):
> >+    sys.setdefaultencoding('undefined')
> >     TestCaseInTempDir._TEST_NAME = name
> >     if verbose:
> >         verbosity = 2
> >
> >to see what would happen. Unfortunately, the errors it produced were
> >pretty useless:
> >
> >Traceback (most recent call last):
> >  File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
> >  File "/usr/lib/python2.4/encodings/undefined.py", line 22, in decode
> >    raise UnicodeError, "undefined encoding"
> >UnicodeError: undefined encoding
> 
> Perhaps we're initializing logging in a way that provokes this? It
> does seem possible to load and use logging with the default encoding
> set to 'undefined'.
> 
> >But if someone could look into it and manage to get proper
> >backtraces out of it, I think it would catch many of the encoding
> >problems.
> 
> We can also try using the rot_13 encoding, which may trap some bugs
> (or make them more obvious), though they would surface later rather
> than at the point of conversion.

Yes. I thought about this as well. But it would be harder to debug.

What we could do is write a special codec that logs all its uses to a
special log (it would have to keep its output from being hidden by the
test runner) and then calls the ascii codec. But it needs someone to
find out how to register a codec and implement it.
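A minimal sketch of such a codec, assuming Python 3 so it can be run today. The codec name "logged_ascii" and the names CODEC_NAME and codec_log are hypothetical, not anything in bzrlib. It registers a search function with codecs.register, records every encode/decode together with a stack trace, writes that to sys.__stderr__ (which a runner that merely swaps out sys.stderr does not capture), and then delegates to the ascii codec functions. On Python 2.4 the search function would instead return a plain (encoder, decoder, stream_reader, stream_writer) tuple, since codecs.CodecInfo only arrived in 2.5, and the codec would be installed with sys.setdefaultencoding('logged_ascii') from sitecustomize.py.

```python
import codecs
import sys
import traceback

# Hypothetical codec name for this sketch; nothing in bzrlib uses it.
CODEC_NAME = "logged_ascii"

# Every use of the codec is recorded here as (direction, object, stack).
codec_log = []


def _record(direction, obj):
    # Drop the last two frames (_record itself and the codec wrapper) so
    # the stack ends at the code that triggered the conversion.
    stack = "".join(traceback.format_stack()[:-2])
    codec_log.append((direction, obj, stack))
    # The original stderr stream, untouched if a runner replaces sys.stderr.
    if sys.__stderr__ is not None:
        sys.__stderr__.write("%s of %r\n%s" % (direction, obj, stack))


def _encode(text, errors="strict"):
    _record("encode", text)
    return codecs.ascii_encode(text, errors)


def _decode(data, errors="strict"):
    # bytes.decode may hand the codec a memoryview; keep a bytes copy.
    _record("decode", bytes(data))
    return codecs.ascii_decode(data, errors)


def _search(name):
    # The registry passes the requested name in lower case.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

if __name__ == "__main__":
    "abc".encode(CODEC_NAME)
    b"xyz".decode(CODEC_NAME)
    for direction, obj, _ in codec_log:
        print(direction, repr(obj))
```

Because each entry carries the caller's stack, the log points at the code that asked for the conversion rather than only at the codec, which is what the 'undefined' traceback from logging was missing.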

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>

