RFC: make ~/.bzr.log utf8
John Arbash Meinel
john at arbash-meinel.com
Sat Dec 5 22:25:19 GMT 2009
Robert Collins wrote:
> We can use errors=backslashreplace, or is that too new?
>
> -Rob
So if the string is already Unicode, then we obviously won't have
problems encoding it as UTF-8. If it is an 8-bit string, the problem is
that we may not be able to decode it, which we would need to do before
we could encode it as UTF-8. (We could try .decode('utf-8'), but I don't
think that can handle invalid data.)
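To make the decoding problem concrete, here is a small sketch. It is written for Python 3 (an assumption; the thread itself is about Python 2), where `bytes` plays the role of the 8-bit `str`. The sample value `raw` is made up for illustration.

```python
# Python 3 sketch: `bytes` stands in for the Python 2 8-bit str.
# `raw` is a made-up sample: "café" encoded as Latin-1, which is not valid UTF-8.
raw = b"caf\xe9"

try:
    raw.decode("utf-8")          # strict decoding rejects the stray 0xE9 byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Latin-1 maps every byte 0x00-0xFF to a code point, so this decode cannot
# fail. Whether the resulting text is what the writer meant is another matter.
as_utf8 = raw.decode("latin-1").encode("utf-8")
```

Here the strict decode fails, while the Latin-1 round trip always yields valid UTF-8, at the cost of guessing the original encoding.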
The best option seems to be:
  1) If already UTF-8, write it out
  2) If Unicode, encode to UTF-8
  3) If .decode('utf-8') fails, either .decode('latin-1').encode('utf-8')
     *or* .encode('string_escape')
So something like:
if type(bytes) is unicode:
    outf.write(bytes.encode('utf-8'))
elif type(bytes) is str:
    try:
        bytes.decode('utf8')
    except UnicodeDecodeError:
        # not valid UTF-8, so escape the non-ASCII bytes
        outf.write(bytes.encode('string_escape'))
    else:
        outf.write(bytes) # it is utf-8 compatible
else:
    ???
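For comparison, a rough Python 3 version of the same idea might look like the sketch below. This is not the code proposed for bzr. The helper name `to_utf8_log_bytes` is hypothetical, `bytes` replaces the 8-bit `str`, and because Python 3 has no 'string_escape' codec the sketch swaps in errors='backslashreplace' (usable for decoding since Python 3.5). Raising TypeError in the last branch is also only an assumption, since that case is left open above.

```python
def to_utf8_log_bytes(value):
    """Return bytes that are safe to write to a UTF-8 log (hypothetical helper)."""
    if isinstance(value, str):
        # Text encodes directly; backslashreplace only matters for oddities
        # such as lone surrogates, which strict UTF-8 encoding would reject.
        return value.encode("utf-8", errors="backslashreplace")
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the valid parts and turn each invalid
            # byte into a literal \xNN escape, then re-encode as UTF-8.
            text = value.decode("utf-8", errors="backslashreplace")
            return text.encode("utf-8")
        return value  # already UTF-8 compatible, write it unchanged
    # Assumption: reject anything that is neither text nor bytes.
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

Unlike 'string_escape', which escapes every non-printable or non-ASCII byte, this variant only escapes the bytes that are actually invalid, so a value that is mostly valid UTF-8 stays readable in the log.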
John
=:->