Encoding woes

Tue Dec 27 02:57:46 GMT 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| I think that our internal code should generally use plain strings. Even
| if we were to require u'' everywhere, other library users will not realise
| this, and chaos will ensue. And requiring isinstance(foo, unicode)
| everywhere would just be nasty.

I think that if our internal code uses bytestrings everywhere, then we
need to specify the encoding everywhere, and we need redundant
encode/decode steps.  I think that's worse than using unicode everywhere
we can.
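A minimal sketch of that trade-off, written in Python 3, where `str` plays the role of `unicode` in this thread and `bytes` the role of plain bytestrings. The function names and the UTF-8 encoding are hypothetical, chosen only for illustration. With bytestrings internally, every text helper has to be told the encoding and decode/encode around its real work; with unicode internally, the decode happens once at the input boundary.

```python
def title_case_bytes(data: bytes, encoding: str) -> bytes:
    # Bytestring-internal style: the helper must be told the encoding
    # and has to decode, do the real work, then re-encode.
    return data.decode(encoding).title().encode(encoding)


def title_case_text(text: str) -> str:
    # Unicode-internal style: the helper only ever sees text.
    return text.title()


raw = "élan vital".encode("utf-8")  # bytes as they might arrive from a file

# Decode once at the input boundary, work on text, encode once on output.
text = raw.decode("utf-8")
result = title_case_text(text)
print(result)  # Élan Vital
print(result.encode("utf-8") == title_case_bytes(raw, "utf-8"))  # True

# Skipping the decode is not an option: bytes.title() only understands
# ASCII letters, so the two UTF-8 bytes of "é" are left alone and the
# following "l" is treated as the start of a new word.
print(raw.title())  # b'\xc3\xa9Lan Vital'
```

Both styles give the same answer here, but only the unicode-internal one can be called without knowing which encoding the caller used.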

| So code that uses public APIs should *always* be safe when passing in
| ASCII strings from within Python.

I don't think ASCII strings are at issue.  We can easily force those
into unicode.  It's the bytestrings with high-bit characters that are
the problem.
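A minimal sketch of why ASCII input is harmless while high-bit bytes are not, again in Python 3 (`str` for unicode, `bytes` for bytestrings). The helper `to_text` is hypothetical: it passes `str` through, decodes `bytes` only when they are pure ASCII, and refuses to guess an encoding for anything else.

```python
def to_text(value):
    """Hypothetical boundary helper: coerce ASCII bytestrings to str.

    Pure-ASCII bytes decode the same way under any ASCII-compatible
    encoding (UTF-8, Latin-1, ...), so forcing them into unicode is safe.
    Bytes with the high bit set mean different characters in different
    encodings, so this helper refuses to guess.
    """
    if isinstance(value, str):
        return value
    try:
        return value.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError(
            f"non-ASCII bytestring needs an explicit encoding: {value!r}"
        ) from None


print(to_text(b"hello"))  # hello

high_bit = b"caf\xe9"
# Byte 0xE9 is "é" in Latin-1, but a lone 0xE9 is invalid UTF-8.
print(high_bit.decode("latin-1"))  # café
try:
    to_text(high_bit)
except ValueError as exc:
    print(exc)
```

Raising rather than silently picking an encoding keeps the ambiguity visible at the API boundary instead of corrupting data further in.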

I'd like to see unicode for semantically textual data and bytestrings
for binary data.
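One way that split might look in practice, as a Python 3 sketch (`str` for unicode, `bytes` for binary). The `FileRecord` class and its fields are hypothetical: textual metadata is held as `str`, opaque file contents as `bytes`, and the types are checked once where data enters the object rather than in every function that uses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record showing the text/binary split.

    path and author are semantically textual, so they are str (unicode);
    content is opaque file data, so it stays bytes.
    """

    path: str
    author: str
    content: bytes

    def __post_init__(self) -> None:
        # Check the types once, where data enters the object, instead of
        # repeating isinstance() tests in every function that uses it.
        for name in ("path", "author"):
            value = getattr(self, name)
            if not isinstance(value, str):
                raise TypeError(f"{name} must be str, got {type(value).__name__}")
        if not isinstance(self.content, bytes):
            raise TypeError(f"content must be bytes, got {type(self.content).__name__}")


record = FileRecord(path="notes/café.txt", author="Jane Doe", content=b"\x89PNG\r\n")
# Text is encoded only at an output boundary that demands bytes.
print(record.path.encode("utf-8"))  # b'notes/caf\xc3\xa9.txt'
print(len(record.content))  # 6
```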

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDsKx40F+nu1YWqI0RAoLlAJ0TJHMUmb7CvM+7M6jO6xex3qZPSgCePB45
f0o1H/NuqTLum/wZsaTV7jc=
=UEb1
-----END PGP SIGNATURE-----