format string should be unicode instead of byte string

Martin Pool mbp at canonical.com
Mon Sep 7 08:22:57 BST 2009


2009/9/7 Robert Collins <robertc at robertcollins.net>:
> Implicit ascii->unicode conversions are definitely bad.
>
> We should rather, I think, teach error.py and other places that str
> should be utf8-decoded, than splatter u'' on all our strings,

I'm not sure when you want this decoding to happen, or which
strs would be assumed to be in utf8.  When formatting the error
messages, for example?

We could for instance define all the _fmts as byte strings and then
convert them to unicode before %-interpolation.
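
As a rough sketch of that option (the class name, message and
format_message method here are made up for illustration, not taken
from bzrlib's errors.py), the decode would happen once, at the point
where the message is formatted:

```python
# Hypothetical sketch: SketchError and format_message are illustrative
# names, not bzrlib's actual API.
class SketchError(Exception):

    # The format is kept as a byte string, assumed to be UTF-8 encoded.
    # The bytes \xc2\xab and \xc2\xbb are the UTF-8 encodings of the
    # guillemets U+00AB and U+00BB.
    _fmt = b"Cannot lock \xc2\xab%(path)s\xc2\xbb: %(reason)s"

    def __init__(self, **kwargs):
        Exception.__init__(self)
        self._params = kwargs

    def format_message(self):
        fmt = self._fmt
        # Decode explicitly before %-interpolation, so no implicit
        # ascii->unicode conversion is ever needed.
        if isinstance(fmt, bytes):
            fmt = fmt.decode('utf-8')
        return fmt % self._params


err = SketchError(path=u'caf\xe9', reason=u'held by another process')
message = err.format_message()
```

With this shape the static strings stay as plain byte strings in the
source, and only the code that formats them has to know they are utf8.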

> for a
> couple of reasons:
>
>  - we have less code that processes strings, than strings.
>  - it's likely to be noticeably expensive to construct all the static
> strings we need when compiling and loading pyc files.
>
> The latter we should fix by doing less work at startup *anyway*, but
> that is then, this is now.



-- 
Martin <http://launchpad.net/~mbp/>


