[BUG] 54173 Output encoding
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 18 07:35:33 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Richard Wilbur wrote:
> John,
>
> You commented on bug 54173 in Launchpad,
>
> "This is because note() et al assume the output encoding is utf8, rather
> than actually checking the console encoding.
> In general, we need to update so that note() and error() use the right
> encoding (mutter only goes to the log file, so utf8 is okay)."
>
> Which note() are you referring to? I found a list of 'def note('
Alexander correctly pointed you to 'bzrlib.trace.note()'; mutter(),
warning(), etc. are also in there.
...
> And when you mention 'note() et al', what are the others? Is this
> specifically console output from commands that we need to properly
> encode? I assume this generally refers only to filesystem entities like
> path names and file names since bzr is currently not translated into
> other languages.
>
> Do you recommend using bzrlib.osutils.get_user_encoding() as the source
> of the proper encoding?
>
> This bug looked interesting and it was unassigned so I took it on.
>
> Thanks,
>
> Richard
I think Alexander pointed out the bulk of it. Basically, note(),
warning(), and possibly error() should take Unicode strings and
write them out in the terminal's encoding. The ultimate fix is
probably to refactor how we write things, using a richer API to indicate
that we are writing out a URL, a local path, etc. But the
short-term fix would be to wrap stderr so that Unicode strings get
translated based on the terminal encoding.
Also, since note() and such are used for informing the user of things,
they shouldn't generally be used by scripts, so if a character cannot be
represented then it should just be replaced, rather than raising an
exception. In Python terms, it means you use
  str.encode(encoding, 'replace')
rather than
  str.encode(encoding) or str.encode(encoding, 'strict') or 'ignore'.
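As a rough illustration of how those error handlers differ, here is a
minimal sketch. It uses Python 3 syntax, and the sample path and the
ASCII target encoding are assumptions chosen only to force an
unencodable character; nothing here is taken from bzrlib.

```python
# Illustrative path containing a character that ASCII cannot represent.
text = "caf\u00e9/readme.txt"

# 'replace' substitutes '?' for anything the target encoding lacks.
replaced = text.encode("ascii", "replace")

# 'ignore' silently drops the character, so the user sees a wrong name.
ignored = text.encode("ascii", "ignore")

# 'strict' (the default) raises UnicodeEncodeError, which is what we
# want to avoid for purely informational output.
try:
    text.encode("ascii")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

print(replaced, ignored, strict_raised)
```

With 'replace' the user at least sees that a character was there, which
seems like the better trade-off for informational messages.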
Probably this would want to use a stream wrapper from the standard
module 'codecs', rather than doing a direct .encode() on the strings.
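A minimal sketch of what such a wrapper might look like, assuming
Python 3 syntax. wrap_stream() is a hypothetical helper name, not
something in bzrlib, and the in-memory byte stream stands in for the
real stderr (in Python 3 that would presumably be sys.stderr.buffer).

```python
import codecs
import io


def wrap_stream(byte_stream, encoding):
    """Return a writer that accepts Unicode and emits `encoding` bytes.

    Hypothetical helper for illustration only. Characters that the
    encoding cannot represent are replaced instead of raising.
    """
    writer_factory = codecs.getwriter(encoding)
    return writer_factory(byte_stream, errors="replace")


# Stand-in for the terminal: collect the bytes in memory.
raw = io.BytesIO()
err = wrap_stream(raw, "ascii")
err.write("added caf\u00e9.txt\n")
output = raw.getvalue()
print(output)
```

The advantage over calling .encode() at every call site is that the
encoding decision lives in one place, where the stream is set up.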
Also, this would certainly need tests. They should probably exercise
note(), warning(), and error() with encodings that can encode the
character and with ones that can't, and check that the output is
properly formatted.
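The kind of test described above could be sketched roughly as follows.
This is only an assumption about the shape of such a test: make_note()
is a hypothetical stand-in for bzrlib.trace.note(), written in Python 3
syntax, and the output goes to an in-memory stream so nothing reaches
the screen.

```python
import codecs
import io
import unittest


def make_note(byte_stream, encoding):
    """Build a stand-in for note() that writes to byte_stream.

    Hypothetical helper; the real note() lives in bzrlib.trace.
    """
    writer = codecs.getwriter(encoding)(byte_stream, errors="replace")

    def note(fmt, *args):
        writer.write(fmt % args + "\n")

    return note


class TestNoteEncoding(unittest.TestCase):

    def run_note(self, encoding):
        # Capture the encoded bytes instead of printing them.
        raw = io.BytesIO()
        make_note(raw, encoding)("added %s", "caf\u00e9.txt")
        return raw.getvalue()

    def test_encodable(self):
        # Encodings that can represent the character keep it intact.
        self.assertEqual(b"added caf\xc3\xa9.txt\n", self.run_note("utf-8"))
        self.assertEqual(b"added caf\xe9.txt\n", self.run_note("latin-1"))

    def test_unencodable_is_replaced(self):
        # ASCII cannot represent the character, so it becomes '?'.
        self.assertEqual(b"added caf?.txt\n", self.run_note("ascii"))
```

Similar cases for warning() and error() would follow the same pattern.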
A little care needs to be taken because of how the TestSuite hooks into
the logging system. (We don't want to actually print anything to the
screen while running the tests, but we record it for later in case there
was a failure).
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFNcs1JdeBCYSNAAMRAs+7AJ9UjlIg91oxtvXGffqj9kKANtOPRgCfT5GT
EjxLxd4QmvmEEjJ5UAzmkbM=
=lSNQ
-----END PGP SIGNATURE-----