[MERGE] Make annotate behave in a non-ASCII world

Goffredo Baroncelli kreijack at tiscalinet.it
Wed Jul 11 20:45:00 BST 2007


On Wednesday 11 July 2007, Aaron Bentley wrote:
> Goffredo Baroncelli wrote:
> > This class always returns a unicode string. If the string passed to this
> > wrapper is internal bazaar [meta]data (such as a user name or path, which
> > are unicode strings), the data is stored "as is". If the data is an 8-bit
> > string, it is _decoded_ with the selected encoding and then stored.
> >
> > Any thoughts?
> 
> I think this is backwards.  You frequently don't know what the encoding
> of your data is, e.g. with patches.  And even if you do know, it may vary.
> 
> It's much saner to turn unicode into the terminal encoding, which we
> always do know.
> 
My code doesn't address the question of which encoding we should use, or
when.

My code solves the problem of writing both unicode data and 8-bit data to a 
stream like StringIO (as is done in the diff code and in the annotate code).

StringIO has trouble when unicode data and 8-bit strings are mixed in the
same buffer.

Try:

  $ python
  Python 2.5.1 (r251:54863, May  2 2007, 16:27:44)
  [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import StringIO
  >>> s=StringIO.StringIO();s.write(u"\u00d8");s.write("\xd8");print s.getvalue()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.5/StringIO.py", line 270, in getvalue
      self.buf += ''.join(self.buflist)
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

My code tries to solve the problem above. How you encode/decode the 8-bit
string is a separate question, which you can settle by passing the correct
encoding to the UnicodeStringIO constructor.
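
To make the idea concrete, here is a minimal sketch of such a wrapper. It is
not the code from the patch: the class body, the `errors` parameter and the
use of `io.StringIO` (which needs a Python newer than the 2.5.1 shown above)
are assumptions made for illustration. Only the `decoding` keyword matches
the constructor call used in this mail.

```python
import io


class UnicodeStringIO(object):
    """Text buffer that accepts both unicode and 8-bit strings (sketch)."""

    def __init__(self, decoding="ascii", errors="replace"):
        # `errors` is a hypothetical parameter, not part of the patch.
        self._buffer = io.StringIO()
        self._decoding = decoding
        self._errors = errors

    def write(self, data):
        # 8-bit data is decoded first; unicode data is stored "as is".
        if isinstance(data, bytes):
            data = data.decode(self._decoding, self._errors)
        self._buffer.write(data)

    def getvalue(self):
        # The buffer only ever holds unicode, so no implicit ASCII
        # decoding can happen when the pieces are joined.
        return self._buffer.getvalue()


s = UnicodeStringIO(decoding="latin-1")
s.write(u"\u00d8")   # unicode data
s.write(b"\xd8")     # 8-bit data, decoded as latin-1
value = s.getvalue()
```

Because every write ends up as unicode, getvalue() cannot hit the implicit
ASCII decode that raises the UnicodeDecodeError shown above.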

If you would like to use the terminal encoding for the 8-bit string ("\xd8"),
you can do:

>>> import sys
>>> s=UnicodeStringIO(decoding=sys.stdout.encoding or "ascii")
>>> s.write(u"\u00d8");s.write("\xd8")
>>> print s.getvalue()
�

Goffredo

> Aaron
> 



-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9


