unicode issue in osutils.contains_whitespace
Aaron Bentley
aaron.bentley at utoronto.ca
Sat Jun 17 23:03:14 BST 2006
Lukáš Lalinský wrote:
> I tried to convert one Subversion repository into Bzr using Tailor and I was
> getting a Unicode TypeError from osutils.contains_whitespace.
It would help if you included the traceback from ~/.bzr.log. But I'm
going to assume it came from KnitVersionedFile._check_add, and in that
context, unicode arguments are permitted.
> I'm not sure if
> this function is supposed to be Unicode safe. If not, then this is a Tailor bug
> and you can just ignore this e-mail. :)
Supporting Unicode properly is important to bzr.
> The problem is that Tailor sends unicode strings to bzrlib, and when
> contains_whitespace is looking for whitespace characters it uses
> string.whitespace. But string.whitespace contains a '\xa0' (non-breaking space
> in ISO-8859-1), which is not convertible to unicode using the default 'ascii'
> codec and so it raises a TypeError exception ("TypeError: 'in <string>' requires
> string as left operand").
Hmm, so string.whitespace is a bytestring that contains characters which
may or may not be whitespace, depending on how you decode it, and it
doesn't enforce that you decode it using the right encoding. Urgh.
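To make that ambiguity concrete, here is a small sketch. It is written for Python 3 rather than the Python 2 this thread is about, and the byte value and codec names are only illustrative choices:

```python
# The same byte means different things depending on the codec used to decode it.
raw = b"\xa0"

as_latin1 = raw.decode("iso-8859-1")  # U+00A0, NO-BREAK SPACE
as_cp437 = raw.decode("cp437")        # U+00E1, LATIN SMALL LETTER A WITH ACUTE

print(as_latin1.isspace())  # whitespace when read as ISO-8859-1
print(as_cp437.isspace())   # an ordinary letter when read as CP437

# The 'ascii' codec cannot decode the byte at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

Nothing in a bare bytestring records which of these readings was intended, which is why mixing it with unicode input is fragile.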
I think at minimum, we should decode string.whitespace to Unicode. We
should also consider detecting other Unicode whitespace characters.
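As a rough sketch of what that could look like (not the bzrlib implementation, and written for Python 3, where text and bytes are separate types): the function name is borrowed from this thread, and the choice to treat bytes as ASCII-only is an assumption made for illustration.

```python
import string


def contains_whitespace(s):
    """Return True if s contains any whitespace character.

    Hypothetical sketch only; the behaviour for bytes is an assumption.
    """
    if isinstance(s, bytes):
        # Compare against the ASCII whitespace bytes only, so no implicit
        # decoding with a guessed codec ever takes place.
        ascii_ws = string.whitespace.encode("ascii")
        return any(b in ascii_ws for b in bytearray(s))
    # Text input: str.isspace() recognises ASCII whitespace as well as
    # other Unicode whitespace such as U+00A0 and U+2003.
    return any(ch.isspace() for ch in s)
```

With this split, unicode input is checked character by character against Unicode's own notion of whitespace, and a byte like 0xA0 in a bytestring is never silently interpreted under some assumed encoding.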
Aaron