[Merge] Not all whitespace is unicode

Thu Feb 1 15:00:30 GMT 2007

On Thu, Feb 01, 2007 at 08:50:43AM -0600, John Arbash Meinel wrote:
> Specifically, I can see that here we have:
> whitespace = ' \t\n\r\v\f'
> 
> Even weirder, though, is that if I do:
> 
> python -c "import string; print repr(string.whitespace)"
> 
> I get:
> '\t\n\x0b\x0c\r '
> 
> Where somehow the 0x20 has moved from the beginning of the string to the
> end. I can understand that \v == \x0b and \f == \x0c, but I really
> don't understand how the ' ' moved from being at the beginning to being
> at the end. And '\r' has moved, too. My best guess is that some other
> module (locale?) is overwriting string.whitespace based on the current
> locale, which would also explain how '\xa0' shows up.

/usr/lib/python2.4/string.py, lines 522--531:

    # Try importing optional built-in module "strop" -- if it exists,
    # it redefines some string operations that are 100-1000 times faster.
    # It also defines values for whitespace, lowercase and uppercase
    # that match <ctype.h>'s definitions.

    try:
        from strop import maketrans, lowercase, uppercase, whitespace
        letters = lowercase + uppercase
    except ImportError:
        pass                                          # Use the original versions
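
To check whether two whitespace strings differ only in ordering, or whether one of them carries extra characters such as '\xa0', comparing them as sets is enough. The following is a minimal sketch, not code from bzr or from string.py. It assumes a Python 3 interpreter (where strop no longer exists), and the names SOURCE_LITERAL, REPORTED_VALUE, same_characters and extra_characters are made up for illustration.

```python
import string

# The literal assigned in string.py, and the value that
# `print repr(string.whitespace)` reported in the quoted message.
SOURCE_LITERAL = ' \t\n\r\v\f'
REPORTED_VALUE = '\t\n\x0b\x0c\r '


def same_characters(a, b):
    """Return True if both strings contain the same set of characters."""
    return set(a) == set(b)


def extra_characters(candidate, baseline=SOURCE_LITERAL):
    """Return the characters in `candidate` missing from `baseline`, sorted."""
    return sorted(set(candidate) - set(baseline))


if __name__ == "__main__":
    # Only the ordering differs between the two values from the thread.
    print(same_characters(SOURCE_LITERAL, REPORTED_VALUE))
    # Anything the running interpreter adds beyond the six ASCII characters.
    print(extra_characters(string.whitespace))
    # A locale-supplied '\xa0' would show up as an extra character.
    print(extra_characters(REPORTED_VALUE + '\xa0'))
```

Since membership tests and str.strip(chars) do not depend on the order of the characters, the reordering by itself is harmless; only an extra character like '\xa0' changes behaviour.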

Marius Gedminas
-- 
Computers are not intelligent.  They only think they are.