[BUG] Unicode string must be always used with encodings

Alexander Belchenko bialix at ukr.net
Thu Sep 29 12:10:34 BST 2005


John A Meinel wrote:
> Well, on Mac, I'm getting really weird errors. Apparently, the terminal
> knows how to handle utf-8 encoded names, so I can do:
...
> added:
>   bzr: ERROR: 'ascii' codec can't encode characters in position 0-7:
> ordinal not in range(128)
>   at /Users/jameinel/dev/bzr/bzr.dev/bzrlib/delta.py line 98, in show_list()
>   see ~/.bzr.log for debug information
> 
> Now, if I just check locale.getpreferredencoding() it says "mac-roman".
> But if I try to print a unicode string, it tells me the same "ascii
> codec can't encode characters" stuff.

That is one more reason why I started this thread.
On Russian Windows, when I run bzr from the console, sys.stdout.encoding is 
'cp866'.
But when I run it from a text editor as a subprocess, sys.stdout.encoding is 
None. I would prefer to force the use of locale.getpreferredencoding(), 
i.e. 'cp1251' on my system.

> What I'm seeing, is that the original bzr status is treating the file as
> just a blob of characters (which happen to be utf-8 encoded), same with
> the initial add.
> 
> However, these characters are then sent to ElementTree, and when they
> are read back in, they are considered "Unicode" characters, not utf-8
> characters.

Maybe the standard XML encoding declaration needs to be written to the XML file:

<?xml version="1.0" encoding="utf-8" ?>

BTW, the standard elementtree package from Fredrik Lundh contains a module 
called SimpleXMLWriter:

"The SimpleXMLWriter module contains a simple helper class for 
applications that need to generate well-formed XML data."

Maybe it is worth using this module to generate well-formed XML output?
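
As an illustration of writing the encoding declaration, here is a sketch 
that uses xml.etree.ElementTree from the current Python 3 standard library 
instead of SimpleXMLWriter. The function name serialize_with_declaration and 
the "entry" element are made up for the example and do not reflect the bzr 
inventory format:

```python
import io
import xml.etree.ElementTree as ET


def serialize_with_declaration(root):
    """Serialize an element to UTF-8 bytes, starting with an XML declaration."""
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
    return buf.getvalue()


# Hypothetical element carrying a non-ASCII file name.
entry = ET.Element("entry", {"name": "файл.txt"})
data = serialize_with_declaration(entry)

# Reading the bytes back yields the same Unicode name.
parsed = ET.fromstring(data)
```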

> There is still a problem if you try to commit without a commit message,
> because it creates a StringIO and grabs the output of show_status(). The
> problem is that a cStringIO.StringIO() is an ascii codec. The good news
> is that StringIO.StringIO() seems capable of handling unicode.
...
> Now, we still have some problems when writing to files, so we might need
> to fall back to using "codecs.open()"

I also ran into a StringIO-related problem in the test case for the log 
command. For now, as a simple solution, I changed that part of the test to 
use a real file on disk, opened with codecs.open() for transparent decoding.
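
Roughly, the workaround looks like the sketch below. It is written for 
current Python 3, and the function name write_and_read_back and the sample 
text are assumptions for illustration, not the actual test code:

```python
import codecs
import os
import tempfile


def write_and_read_back(text, encoding="utf-8"):
    """Write text to a real temporary file and read it back via codecs.open()."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        # codecs.open() encodes on write and decodes on read transparently.
        with codecs.open(path, "w", encoding=encoding) as f:
            f.write(text)
        with codecs.open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)
```

In current Python the built-in open() accepts an encoding argument and does 
the same job.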

Alexander.




