[BUG] Unicode string must be always used with encodings
Alexander Belchenko
bialix at ukr.net
Thu Sep 29 12:10:34 BST 2005
John A Meinel wrote:
> Well, on Mac, I'm getting really weird errors. Apparently, the terminal
> knows how to handle utf-8 encoded names, so I can do:
...
> added:
> bzr: ERROR: 'ascii' codec can't encode characters in position 0-7:
> ordinal not in range(128)
> at /Users/jameinel/dev/bzr/bzr.dev/bzrlib/delta.py line 98, in show_list()
> see ~/.bzr.log for debug information
>
> Now, if I just check locale.getpreferredencoding() it says "mac-roman".
> But if I try to print a unicode string, it tells me the same "ascii
> codec can't encode characters" stuff.
That is one more reason why I started this thread.
On Russian Windows, when I run bzr from the console, sys.stdout.encoding
is 'cp866'.
But when I run it from a text editor as a subprocess, sys.stdout.encoding
is None. And I would prefer to force the use of
locale.getpreferredencoding(), i.e. 'cp1251' on my system.
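A minimal sketch of that fallback, written for current Python rather than
the interpreter bzr ran on in 2005. The helper names and the final 'ascii'
fallback are my own assumptions for illustration, not bzrlib code:

```python
import locale
import sys


def pick_output_encoding(stream=None):
    """Return the encoding to use when writing text to ``stream``.

    Hypothetical helper: prefer the stream's own encoding and fall back
    to the locale's preferred encoding when the stream reports None
    (for example, when bzr runs as a subprocess of an editor).
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        encoding = locale.getpreferredencoding()
    # Last resort if the locale reports nothing usable (an assumption).
    return encoding or "ascii"


def encode_for_output(text, stream=None):
    """Encode text for the stream, replacing characters it cannot represent."""
    return text.encode(pick_output_encoding(stream), "replace")
```

The "replace" error handler is one possible choice: it avoids the
"'ascii' codec can't encode" traceback at the cost of printing "?" for
characters the target encoding lacks.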
> What I'm seeing, is that the original bzr status is treating the file as
> just a blob of characters (which happen to be utf-8 encoded), same with
> the initial add.
>
> However, these characters are then sent to ElementTree, and when they
> are read back in, they are considered "Unicode" characters, not utf-8
> characters.
Maybe we need to write the standard XML encoding declaration to the XML file:
<?xml version="1.0" encoding="utf-8" ?>
BTW, the standard elementtree package from Fredrik Lundh contains a module
called SimpleXMLWriter:
"The SimpleXMLWriter module contains a simple helper class for
applications that need to generate well-formed XML data."
Maybe it is worth using this module to generate well-formed XML output?
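As a rough illustration of writing the encoding declaration, here is a
sketch using xml.etree.ElementTree from today's Python standard library
(the descendant of the elementtree package). The xml_declaration argument
belongs to that current API and may not exist in the 2005 elementtree
release; the "entry" element and the file name are made up for the example:

```python
import io
import xml.etree.ElementTree as ET


def serialize_with_declaration(root):
    """Return the tree as UTF-8 bytes, with the XML declaration included."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


# Hypothetical inventory-like entry with a non-ASCII file name.
original_name = u"\u0444\u0430\u0439\u043b.txt"
entry = ET.Element("entry", name=original_name)
data = serialize_with_declaration(entry)

# Parsing the bytes back yields the same Unicode name.
restored = ET.fromstring(data)
```

With the declaration present, the reader knows the bytes are utf-8 and
hands back Unicode strings, so only the code that prints them has to pick
an output encoding.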
> There is still a problem if you try to commit without a commit message,
> because it creates a StringIO and grabs the output of show_status(). The
> problem is that a cStringIO.StringIO() is an ascii codec. The good news
> is that StringIO.StringIO() seems capable of handling unicode.
...
> Now, we still have some problems when writing to files, so we might need
> to fall back to using "codecs.open()"
I also ran into a problem related to StringIO in the test case for the log
command. For now, as a simple solution, I have changed this part of the
test to use a real file on disk and codecs.open() for transparent decoding.
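The workaround looks roughly like this. It is a sketch only, not the actual
test code: the helper name, the ".log" suffix and the utf-8 default are
assumptions made for the example.

```python
import codecs
import os
import tempfile


def roundtrip_through_file(text, encoding="utf-8"):
    """Write text to a temporary file and read it back as Unicode."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        # codecs.open() encodes the text on write ...
        with codecs.open(path, "w", encoding=encoding) as out:
            out.write(text)
        # ... and decodes it on read, so the caller only sees Unicode.
        with codecs.open(path, "r", encoding=encoding) as src:
            return src.read()
    finally:
        os.remove(path)
```

In current Python the built-in open(path, encoding=...) gives the same
transparent encoding and decoding.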
Alexander.