[Bulk] Re: Python 3

Wed Jun 23 16:45:46 BST 2010

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gordon Tyler wrote:
> On 22/06/2010 9:42 PM, Robert Collins wrote:
>> For the bytes stuff I've mentioned - see
>> http://bugs.python.org/issue5425 - 2to3 converts str to 'string' not
>> 'bytes', so we have to mark up everything, and only Python 2.6
>> supports b'foo'. If we change the importer to go to bytes for 'str',
>> then we end up with all our literals that are meant for humans as
>> bytes, which is also wrong. And while we support < 2.6, we can't
>> annotate strings properly for bytes recognition without something like
>> a domain specific language - e.g. a _b() function.
> 
> Why aren't all string literals meant for users marked as unicode in the
> current codebase?
> 
> Ciao,
> Gordon
> 
> 

I would say the #1 reason (by a large margin) is that they "didn't need
to be". As such, someone needs to put in the time to evaluate whether it
is worthwhile.

There is also the fact that "unicode + str == unicode": mixing the two
silently promotes the result to unicode. If your hard-coded strings are
plain 'str', then concatenating them with other 'str' values doesn't
accidentally auto-cast.
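
As a rough illustration of that auto-cast, here is a minimal sketch that
emulates the Python 2 rule (8-bit operands are implicitly decoded as
ASCII when mixed with unicode). It is written so that it runs on Python 3,
where bytes stands in for the old 'str'. The helper name py2_concat is
made up for this example; it is not anything in bzrlib.

```python
def py2_concat(left, right):
    """Emulate Python 2 concatenation of unicode and 8-bit strings.

    Hypothetical helper for illustration only. Two 8-bit strings stay
    8-bit; as soon as one side is unicode, the other side is decoded
    with the 'ascii' codec, which is what Python 2 did implicitly.
    """
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right  # str + str stays str, no auto-cast
    if isinstance(left, bytes):
        left = left.decode('ascii')  # implicit promotion to unicode
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right


# Two plain 8-bit literals: the result is still 8-bit.
plain = py2_concat(b'diff ', b'header')

# One unicode operand: the whole result becomes unicode.
mixed = py2_concat(u'path: ', b'foo')
```

The promotion only works while the 8-bit side is pure ASCII; a byte such
as b'\xe9' makes the implicit decode raise UnicodeDecodeError.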

There are also things like diff headers, where it is a bit more unclear
whether it is valid to have them as Unicode. (even though users see them)

I also don't know how long __unicode__ has existed, but certainly in the
*bzrlib* codebase we haven't done unicode(exception); we've done
str(exception).

This also gets a bit messy when dealing with OS errors, where it is very
common to have the OS return a localized error message as an 8-bit
string, which we then want to combine with, say, a Unicode path.
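
One way to handle that combination, as a sketch only: decode the 8-bit
message explicitly, with a replacement policy for undecodable bytes,
before joining it to the Unicode path. The function name
describe_os_error and the choice of the locale's preferred encoding are
assumptions made for this example, not what bzrlib does.

```python
import locale


def describe_os_error(message, path, encoding=None):
    """Combine an OS error message with a Unicode path.

    Hypothetical helper for illustration only. 'message' may be an
    8-bit string in the user's locale encoding; it is decoded with
    errors='replace' so a bad byte can never raise while we are
    already reporting an error.
    """
    if encoding is None:
        # Assumption: the OS message uses the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False) or 'ascii'
    if isinstance(message, bytes):
        message = message.decode(encoding, 'replace')
    return u'%s: %s' % (path, message)


# A localized 8-bit message joined with a non-ASCII path.
text = describe_os_error(b'Datei nicht gefunden', u'caf\xe9.txt',
                         encoding='latin-1')
```

Decoding with 'replace' loses information when the guessed encoding is
wrong, but it keeps the error report itself from failing.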

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwiLCoACgkQJdeBCYSNAANYjACfTq4F2R8ipftqCefbjM8vKmog
4MsAoMGaFHqFIgt2GJxvl+d+i4dG/ZJN
=SoIe
-----END PGP SIGNATURE-----