[MERGE] Make annotate behave in a non-ASCII world

Aaron Bentley aaron.bentley at utoronto.ca
Wed Jul 11 22:47:47 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Goffredo Baroncelli wrote:
>>
> My code doesn't solve the problem of which encoding we should use (and 
> when)...

True, but it also increases the scope of that problem, by introducing
the requirement that the output be unicode.

> My code solves the problem of writing both unicode data and 8-bit data to a 
> stream like StringIO (as is done in the diff code and in the annotate code).

No, your code makes that problem worse, because it forces the diff,
which is binary, to be decoded into text.

> StringIO has trouble mixing unicode data and 8-bit strings.
> 
> try:
> 
>   $ python
>   Python 2.5.1 (r251:54863, May  2 2007, 16:27:44)
>   [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>   Type "help", "copyright", "credits" or "license" for more information.
>   >>> import StringIO
>   >>> s=StringIO.StringIO();print 
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>     File "/usr/lib/python2.5/StringIO.py", line 270, in getvalue
>       self.buf += ''.join(self.buflist)
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

Your example code does not have the effect you claim:
~$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO
>>> s=StringIO.StringIO();print

>>>
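For readers on a modern interpreter: the failure described above comes from Python 2's StringIO.StringIO, which accepts both kinds of string and only complains when getvalue() joins them (see the quoted traceback). The sketch below assumes Python 3, not the 2.5 interpreter used in this thread; there, io.StringIO and io.BytesIO each reject the other type at write() time, so the two cannot be mixed silently. try_write is a hypothetical helper for illustration.

```python
import io


def try_write(stream, data):
    """Hypothetical helper: return None on success, or the exception's type
    name if the stream refuses this kind of string."""
    try:
        stream.write(data)
        return None
    except TypeError as exc:
        return type(exc).__name__


text_stream = io.StringIO()   # accepts only str (unicode)
byte_stream = io.BytesIO()    # accepts only bytes (8-bit data)

text_ok = try_write(text_stream, "caf\u00e9")    # None
text_bad = try_write(text_stream, b"caf\xe9")    # 'TypeError'
bytes_ok = try_write(byte_stream, b"caf\xe9")    # None
bytes_bad = try_write(byte_stream, "caf\u00e9")  # 'TypeError'
```

Either way, a single stream has to hold a single kind of data; the question is only which one.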


> My code tries to solve the problem above. How you encode/decode the 8-bit 
> strings is another question, which you can solve by passing the correct 
> encoding to the UnicodeStringIO constructor.

You can't pass a correct encoding to the UnicodeStringIO constructor.
Diffs may be in many encodings, or no single encoding.
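To make this concrete, here is a minimal Python 3 sketch with made-up diff content: one hunk comes from a Latin-1 file and one from a UTF-8 file. No single codec decodes the result faithfully. UTF-8 rejects it outright, and Latin-1 accepts any byte sequence but turns the UTF-8 line into mojibake. try_decode is a hypothetical helper.

```python
# Hypothetical diff content: the same text, "+café", taken from two files
# that use different encodings.
latin1_line = "+caf\u00e9\n".encode("latin-1")   # b'+caf\xe9\n'
utf8_line = "+caf\u00e9\n".encode("utf-8")       # b'+caf\xc3\xa9\n'
diff_bytes = latin1_line + utf8_line


def try_decode(data, encoding):
    """Decode data with the given codec, returning None if it fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(diff_bytes, "utf-8")      # None: byte 0xe9 is not valid UTF-8 here
as_latin1 = try_decode(diff_bytes, "latin-1")  # never fails, but the 2nd line becomes '+cafÃ©'
```

Latin-1 does round-trip the bytes exactly, but only because it maps every byte to some character; the text it produces is wrong for the UTF-8 hunk. The only representation that is correct for both hunks is the bytes themselves.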

However, your output will always be eight-bit.  The diffs will always be
eight-bit.  Instead of decoding the eight-bit strings, you should encode
the unicode strings.
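A minimal Python 3 sketch of that suggestion, applied to annotate-style output: the unicode metadata (here an author name) is encoded to the output encoding, while the file content, which is already bytes, is written through unchanged. write_annotate_line, its " | " separator and the UTF-8 default are assumptions for illustration, not bzr's actual API; in practice the target encoding would presumably come from the user's terminal or locale.

```python
import io


def write_annotate_line(out, author, line_bytes, encoding="utf-8"):
    """Hypothetical helper: write one annotate-style line to a byte stream.

    The unicode author name is *encoded* to the output encoding (characters
    it cannot represent become '?'), while line_bytes, the raw file content,
    is written through untouched, whatever encoding it happens to be in.
    """
    out.write(author.encode(encoding, "replace"))
    out.write(b" | ")
    out.write(line_bytes)


buf = io.BytesIO()
write_annotate_line(buf, "Goffredo Baroncelli", b"+caf\xe9\n")  # Latin-1 content
write_annotate_line(buf, "J\u00f6rg", b"+caf\xc3\xa9\n")        # UTF-8 content
result = buf.getvalue()
```

Nothing is ever decoded, so a file in an unknown or mixed encoding cannot cause a UnicodeDecodeError; the only lossy step is encoding metadata into an output encoding that lacks some character, and the "replace" handler makes that degrade to '?' instead of failing.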

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlVAD0F+nu1YWqI0RAhHQAJ0bvfpmlAx39KPwwRisxM5FSc4X/ACcDIG4
b7gDpvi0ivbb37HXbTg2Ksk=
=PbEi
-----END PGP SIGNATURE-----



More information about the bazaar mailing list