[MERGE] Make annotate behave in a non-ASCII world
Goffredo Baroncelli
kreijack at tiscalinet.it
Wed Jul 11 20:45:00 BST 2007
On Wednesday 11 July 2007, Aaron Bentley wrote:
> Goffredo Baroncelli wrote:
> > this class always returns a unicode string. If the string passed to this
> > wrapper is internal bazaar [meta]data (such as a user name or path, which are
> > unicode strings), the data is stored "as is". If the data is an 8-bit string,
> > it is _decoded_ with the selected encoding and then stored.
> >
> > Any thoughts?
>
> I think this is backwards. You frequently don't know what the encoding
> of your data is, e.g. with patches. And even if you do know, it may vary.
>
> It's much saner to turn unicode into the terminal encoding, which we
> always do know.
>
My code doesn't solve the problem of which encoding we should use (and when).
My code solves the problem of writing both unicode data and 8-bit data to a
stream like StringIO (as is done in the diff code and in the annotate code).
StringIO has trouble mixing unicode data and 8-bit strings.
Try:
$ python
Python 2.5.1 (r251:54863, May 2 2007, 16:27:44)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO
>>> s=StringIO.StringIO();print
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)
My code tries to solve the problem above. How you encode/decode the 8-bit string
is another question, which you can solve by passing the correct encoding to the
UnicodeStringIO constructor.
If you would like to use the terminal encoding for the 8-bit string ("\xd8"), you
can:
>>> s=UnicodeStringIO(decoding=sys.stdout.encoding or "ascii")
>>> s.write(u"\u00d8");s.write("\xd8")
>>> print s.getvalue()
�
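
For readers who want to try the idea, here is a minimal sketch of such a wrapper. It is not the code from the patch: it is written for Python 3 (where io.StringIO rejects bytes outright instead of failing later in getvalue), the class body is a guess at the behaviour described above, and the "errors" parameter and the "latin-1" choice are assumptions made only for illustration. Only the class name and the "decoding" keyword come from the session above.

```python
import io


class UnicodeStringIO(io.StringIO):
    """Hypothetical sketch: a text buffer that also accepts 8-bit strings.

    Unicode strings are stored as is; byte strings are decoded with the
    encoding given at construction time before being stored.
    """

    def __init__(self, decoding="ascii", errors="replace"):
        super().__init__()
        self._decoding = decoding  # encoding used for 8-bit input
        self._errors = errors      # assumption: replace undecodable bytes

    def write(self, data):
        # Decode 8-bit data first so the buffer only ever holds text.
        if isinstance(data, bytes):
            data = data.decode(self._decoding, self._errors)
        return super().write(data)


s = UnicodeStringIO(decoding="latin-1")  # latin-1 chosen for the example
s.write("\u00d8")   # unicode data, stored as is
s.write(b"\xd8")    # 8-bit data, decoded as latin-1
result = s.getvalue()
```

With latin-1 as the decoding, both writes end up as the same character, so result holds two copies of U+00D8. Passing a different encoding changes only how the 8-bit input is interpreted, which is the question the wrapper deliberately leaves to the caller.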
Goffredo
> Aaron
>
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87 87C0 BB86 505C 6B2A CFF9