unicode issue in osutils.contains_whitespace

John Arbash Meinel john at arbash-meinel.com
Sat Jun 17 23:08:43 BST 2006


Aaron Bentley wrote:

...

> Hmm, so string.whitespace is a bytestring that contains characters which
> may or may not be whitespace, depending on how you decode it, and it
> doesn't enforce that you decode it using the right encoding.  Urgh.
> 
> I think at minimum, we should decode string.whitespace to Unicode.  We
> should also consider detecting other Unicode whitespace characters.
> 
> Aaron

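I think if we really care, we should use a regex and do something like:

import re

_whitespace_re = re.compile(r'\s', re.UNICODE)

def contains_whitespace(s):
    # True if s contains any whitespace character, False otherwise.
    return _whitespace_re.search(s) is not None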

That lets us check for any Unicode whitespace character that Python
recognizes. It also means a single pass over the string, rather than
one pass for every possible whitespace character.
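
As a rough illustration of the difference, here is a minimal sketch. It
assumes modern Python 3, where str is always Unicode and string.whitespace
holds only the six ASCII whitespace characters. The function names are
made up for the example and are not the actual bzrlib code.

```python
import re
import string

# Hypothetical names for illustration only; not taken from bzrlib.
_whitespace_re = re.compile(r'\s', re.UNICODE)


def contains_whitespace_loop(s):
    """Scan s once per candidate character (ASCII whitespace only)."""
    for ch in string.whitespace:
        if ch in s:
            return True
    return False


def contains_whitespace_re(s):
    """Scan s once; matches any whitespace the re module recognizes."""
    return _whitespace_re.search(s) is not None


samples = [
    'plain',          # no whitespace at all
    'tab\there',      # ASCII tab
    'no\u00a0break',  # U+00A0 NO-BREAK SPACE
    'em\u2003space',  # U+2003 EM SPACE
]

# One (loop_result, regex_result) pair per sample.
results = [(contains_whitespace_loop(s), contains_whitespace_re(s))
           for s in samples]
```

The two versions should agree on the first two samples and differ on the
last two, where only the regex reports whitespace.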

John
=:->
