[MERGE] switch to using utf-8 revision ids

John Arbash Meinel john at arbash-meinel.com
Tue Feb 13 16:33:25 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> The attached patch changes the internals to assume that revision ids are
> utf-8 strings, rather than being Unicode strings.
> 

In merging with bzr.dev, the only questionable thing I had to change was
osutils.contains_whitespace().

The reason is that when we pass in an 8-bit string, Python automatically
upcasts it to unicode. It seems "unicode in str" upcasts the str to
unicode first, and I'm guessing the bug we used to have was that "str in
unicode" would also upcast the str.

The problem is that I'm now passing str objects that are utf-8, and so
can't be upcast automatically. The simple fix is to change the
whitespace list to a plain string:

def contains_whitespace(s):
    # Plain (8-bit) whitespace list, so a utf-8 str argument is never upcast.
    for ch in ' \t\n\r\v\f':
        if ch in s:
            return True
    return False
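
To illustrate why the utf-8 str objects fail to upcast: Python 2's implicit
str-to-unicode coercion decodes with the default codec, which is normally
ascii. The sketch below performs that decode explicitly so it also runs on
Python 3, where the 8-bit str of this message corresponds to bytes. The
revision id values and the helper name upcasts_cleanly are made up for the
example.

```python
# Hypothetical revision ids; the second one contains a non-ASCII character
# and is stored as utf-8 bytes.
plain_ascii = b'john@example.com-20070213-abc'
rev_id = u'j\xe5n@example.com-20070213-abc'.encode('utf-8')


def upcasts_cleanly(raw):
    """Return True if raw decodes as ascii, the codec Python 2 uses implicitly."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True


print(upcasts_cleanly(plain_ascii))  # pure ascii ids decode fine
print(upcasts_cleanly(rev_id))       # bytes above 0x7f make the decode fail
```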

A different way to do it would be to make the whitespace list
unicode or ascii depending on the type of the incoming string:

def contains_whitespace(s):
    whitespace = ' \t\n\r\v\f'
    if isinstance(s, unicode):
        # Match the argument's type so no implicit upcast happens.
        whitespace = unicode(whitespace)
    for ch in whitespace:
        if ch in s:
            return True
    return False


We could also switch the whole thing to a regex:

import re

_whitespace = re.compile(r'\s')

def contains_whitespace(s):
    return _whitespace.search(s) is not None


That would find the 'standard' whitespace. If we really wanted unicode
whitespace we could do:

_whitespace = re.compile(r'\s', re.UNICODE)

I have the feeling the regex version would actually be faster, but it
isn't a function that we call very often.
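
One rough way to check that feeling is a timeit comparison. This is only a
sketch: the function names and sample ids are hypothetical, it is written
for Python 3 (hence re.ASCII, to keep the regex on the same six characters
as the loop), and the numbers will vary by machine and interpreter.

```python
import re
import timeit

_WHITESPACE = ' \t\n\r\v\f'
_whitespace_re = re.compile(r'\s', re.ASCII)


def contains_whitespace_loop(s):
    """Check each whitespace character against s in turn."""
    for ch in _WHITESPACE:
        if ch in s:
            return True
    return False


def contains_whitespace_regex(s):
    """Check s with a single precompiled regex search."""
    return _whitespace_re.search(s) is not None


# Hypothetical inputs shaped like revision ids and property keys.
samples = ['john@example.com-20070213163325-abcdef', 'has a space', 'tab\there']

for name, func in [('loop', contains_whitespace_loop),
                   ('regex', contains_whitespace_regex)]:
    elapsed = timeit.timeit(lambda: [func(s) for s in samples], number=10000)
    print('%-5s %.4f s' % (name, elapsed))
```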

I do believe we primarily call it for things like file-ids and
revision-ids, so it should be set up for working on utf-8 strings,
rather than just unicode strings.
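
A byte-oriented check is safe on utf-8 data because every byte of a
multi-byte utf-8 sequence is 0x80 or above, so an ASCII whitespace byte can
never show up inside an encoded non-ASCII character. A minimal sketch,
assuming Python 3 bytes stand in for the 8-bit str; the name
contains_whitespace_utf8 and the sample ids are hypothetical.

```python
_WHITESPACE_BYTES = (b' ', b'\t', b'\n', b'\r', b'\v', b'\f')


def contains_whitespace_utf8(raw):
    """Return True if the utf-8 encoded bytes contain ASCII whitespace."""
    return any(ch in raw for ch in _WHITESPACE_BYTES)


# Hypothetical ids with a non-ASCII character, encoded as utf-8.
with_space = u'r\xe9v id'.encode('utf-8')
without_space = u'r\xe9v-id'.encode('utf-8')

print(contains_whitespace_utf8(with_space))
print(contains_whitespace_utf8(without_space))
```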

The only other place it is used is for the key portion of revision
properties.

For now, I've just switched to using a plain string for the whitespace
list. But I thought I would bring it up.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF0ehVJdeBCYSNAAMRAhk0AJ46MpKBdyScPbMmjTBc5zdQJwC+fACg2CYA
ndze+ehnhxqF2oHrhVwh0cU=
=y/Ik
-----END PGP SIGNATURE-----


