RFC: make ~/.bzr.log utf8

John Arbash Meinel john at arbash-meinel.com
Sat Dec 5 22:25:19 GMT 2009


Robert Collins wrote:
> We can use errors=backslashreplace, or is that too new?
> 
> -Rob

If the string is already Unicode, then we obviously won't have problems
encoding it as UTF-8. If it is an 8-bit string, the problem is that we
don't know its encoding, so we can't reliably decode it in order to
re-encode it as UTF-8. (We could try .decode('utf-8'), but by default that
raises UnicodeDecodeError on invalid data.)
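
To make the decoding problem concrete, here is a small sketch. It is written
for Python 3 (an assumption; the code in this thread targets Python 2), and
the sample byte string is made up for illustration. It shows that a strict
UTF-8 decode raises on invalid data, while errors='replace' and a Latin-1
decode both succeed with different trade-offs.

```python
# Hypothetical sample: b'caf\xe9' is "café" in Latin-1, which is not valid UTF-8.
raw = b'caf\xe9'

# A strict decode raises UnicodeDecodeError on the stray 0xE9 byte.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='replace' never raises, but the original byte is lost (it becomes U+FFFD).
replaced = raw.decode('utf-8', errors='replace')

# Latin-1 maps every byte value to a code point, so this decode cannot fail.
as_latin1 = raw.decode('latin-1')
```

Whether the Latin-1 guess is *right* is a separate question; it only
guarantees that the result can be encoded as UTF-8.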

The best option seems to be:

1) If it is an 8-bit string that is already valid UTF-8, write it out as-is
2) If it is Unicode, encode it to UTF-8
3) If .decode('utf-8') fails, use .decode('latin-1').encode('utf-8')
   *or* .encode('string_escape')


So something like:

if isinstance(bytes, unicode):
    outf.write(bytes.encode('utf-8'))
elif isinstance(bytes, str):
    try:
        bytes.decode('utf-8')
    except UnicodeDecodeError:
        # not valid UTF-8, so escape the non-ASCII bytes instead
        outf.write(bytes.encode('string_escape'))
    else:
        outf.write(bytes)  # it is utf-8 compatible
else:
    pass  # ??? (neither unicode nor str)
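
For comparison, a rough Python 3 sketch of the same policy follows. The helper
name to_log_bytes, its escape flag, and the repr() fallback for other types
are all assumptions made for illustration, not anything bzr provides. Python 3
has no 'string_escape' codec, so the escaping branch uses the
'backslashreplace' error handler on decode (available since Python 3.5), which
is similar in spirit but not identical.

```python
def to_log_bytes(value, escape=False):
    """Hypothetical helper: return UTF-8 bytes for a log file opened in binary mode."""
    if isinstance(value, str):
        # Text encodes directly; backslashreplace guards against lone surrogates.
        return value.encode('utf-8', 'backslashreplace')
    if isinstance(value, bytes):
        try:
            value.decode('utf-8')
        except UnicodeDecodeError:
            if escape:
                # Invalid bytes become literal \xNN sequences (Python 3.5+).
                return value.decode('utf-8', 'backslashreplace').encode('utf-8')
            # Latin-1 maps every byte to a code point, so this cannot fail.
            return value.decode('latin-1').encode('utf-8')
        return value  # already valid UTF-8
    # Assumption: anything else is logged via its repr().
    return to_log_bytes(repr(value), escape)
```

Used like this, to_log_bytes(b'caf\xe9') takes the Latin-1 route, while
to_log_bytes(b'caf\xe9', escape=True) keeps the raw byte visible as an escape
sequence.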

John
=:->


