unicode issue in osutils.contains_whitespace
John Arbash Meinel
john at arbash-meinel.com
Sat Jun 17 23:08:43 BST 2006
Aaron Bentley wrote:
...
> Hmm, so string.whitespace is a bytestring that contains characters which
> may or may not be whitespace, depending on how you decode it, and it
> doesn't enforce that you decode it using the right encoding. Urgh.
>
> I think at minimum, we should decode string.whitespace to Unicode. We
> should also consider detecting other Unicode whitespace characters.
>
> Aaron
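To make Aaron's point concrete, here is a small Python 3 sketch (not bzrlib code). It shows that `string.whitespace` holds only the ASCII whitespace characters, while `str.isspace()` also recognizes Unicode whitespace such as NO-BREAK SPACE (U+00A0), EM SPACE (U+2003) and IDEOGRAPHIC SPACE (U+3000). The helper name `missed_by_string_whitespace` is hypothetical and exists only for illustration.

```python
import string

# In Python 3, string.whitespace is a str holding only the six ASCII
# whitespace characters: space, \t, \n, \r, \x0b (VT) and \x0c (FF).
ASCII_WS = set(string.whitespace)

# A few non-ASCII characters that Unicode classifies as whitespace.
SAMPLES = ['\u00a0', '\u2003', '\u3000']


def missed_by_string_whitespace(chars):
    """Hypothetical helper: return the characters that str.isspace()
    accepts but that do not appear in string.whitespace."""
    return [c for c in chars if c.isspace() and c not in ASCII_WS]


if __name__ == '__main__':
    for c in missed_by_string_whitespace(SAMPLES):
        print('U+%04X is whitespace but not in string.whitespace' % ord(c))
```

Any check built only from `string.whitespace` would therefore let these characters through.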
If we really care, I think we should use a regex and then do:
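    import re
    # \s with re.UNICODE matches any character Python treats as whitespace.
    _whitespace_re = re.compile(r'\s', re.UNICODE)

    def contains_whitespace(s):
        if _whitespace_re.search(s):
            return True
        return False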
That lets us check for any Unicode whitespace character that Python
recognizes. It also needs only a single pass over the string, rather than
one pass per possible whitespace character.
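The difference can be sketched in Python 3 as follows. `contains_whitespace_per_char` is a hypothetical stand-in for a loop over `string.whitespace` (the actual osutils implementation is not shown in this thread), and `contains_whitespace_regex` uses the compiled pattern proposed above. Only the regex version catches non-ASCII whitespace, and it scans the string once instead of once per whitespace character.

```python
import re
import string

# Compiled once at import time. In Python 3, re.UNICODE is already the
# default for str patterns; it is kept here to mirror the proposal.
_whitespace_re = re.compile(r'\s', re.UNICODE)


def contains_whitespace_per_char(s):
    """Hypothetical old-style check: one substring search over s for each
    character in string.whitespace (up to six passes)."""
    for ch in string.whitespace:
        if ch in s:
            return True
    return False


def contains_whitespace_regex(s):
    """Single pass over s; matches any character Python treats as
    Unicode whitespace."""
    return _whitespace_re.search(s) is not None


if __name__ == '__main__':
    for text in ['foo bar', 'foo\tbar', 'foo\u00a0bar', 'foobar']:
        print(repr(text), contains_whitespace_per_char(text),
              contains_whitespace_regex(text))
```

Returning `search(...) is not None` directly is equivalent to the `if ...: return True` / `return False` form in the message; it just avoids the explicit branch.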
John
=:->
More information about the bazaar mailing list