[MERGE] More support for non-ascii revision ids

John Arbash Meinel john at arbash-meinel.com
Sat Feb 17 20:51:16 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've started looking into converting our file-id code to use utf-8 file
ids instead of Unicode file ids. And the further I dig into the
internals, the more I'm becoming aware that we don't actually support
Unicode revision ids or file ids.

For example, file_ids_affected_by_revision_ids doesn't do any actual utf8
decoding, nor does it support xml escaped unicode characters (é).
This patch adds support for both, though mostly as groundwork for
changing file-ids to be utf8.
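
As a rough illustration of the kind of handling meant here (not the code
in the attached patch), an XML numeric character reference can be
unescaped and the result kept as a utf8 8-bit string. The function name
unescape_to_utf8 is hypothetical, and the sketch is written for Python 3,
whereas bzr at the time ran on Python 2:

```python
import re

# Matches XML numeric character references: decimal (&#233;) or hex (&#xe9;).
_CHAR_REF = re.compile(r'&#(x[0-9a-fA-F]+|\d+);')


def unescape_to_utf8(escaped):
    """Hypothetical helper: expand numeric references, return UTF-8 bytes."""
    def _replace(match):
        ref = match.group(1)
        if ref.startswith('x'):
            return chr(int(ref[1:], 16))  # hexadecimal reference
        return chr(int(ref))              # decimal reference
    return _CHAR_REF.sub(_replace, escaped).encode('utf-8')
```

Plain ascii ids pass through unchanged, so the same path can serve both
ascii and non-ascii revision ids.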

Having done all of this work, it is pretty clear to me that bzr <= 0.15
never really fully supported Unicode revision or file ids. So I think we
have the opportunity to say that they must be ascii if we really want.

In testing, there is an interesting discrepancy, though. If I take a
reasonable length string (approx the length of a file id) and do:

import codecs
coder = codecs.getencoder('ascii')
coder(test_id)  # test_id: a string about the length of a file id

Then I see the following

        utf8     ascii
encode  0.853us  0.967us
decode  2.230us  1.130us

So decoding is more expensive than encoding. But utf8 decode is
much more expensive than ascii decode, even though utf8 encode is
*cheaper* than ascii encode.
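
For anyone wanting to repeat the comparison, a minimal sketch follows. It
is not the script used for the numbers above: the sample id and the
time_codec helper are made up for illustration, and it targets Python 3,
so absolute timings (and possibly the ordering) will differ from the
2007 figures.

```python
import codecs
import timeit

# Hypothetical sample, roughly the length of a file id.
test_id = 'foo.txt-20070217205116-abcdefghijklmnop-1'


def time_codec(name, number=100000):
    """Return (encode_us, decode_us): mean microseconds per call."""
    encoder = codecs.getencoder(name)
    decoder = codecs.getdecoder(name)
    encoded, _length = encoder(test_id)  # bytes to feed the decoder
    enc = timeit.timeit(lambda: encoder(test_id), number=number) / number
    dec = timeit.timeit(lambda: decoder(encoded), number=number) / number
    return enc * 1e6, dec * 1e6


if __name__ == '__main__':
    for codec_name in ('utf8', 'ascii'):
        enc_us, dec_us = time_codec(codec_name)
        print('%-6s encode %.3fus  decode %.3fus'
              % (codec_name, enc_us, dec_us))
```

The lambda call adds a small constant overhead to every measurement, so
the differences between rows are more meaningful than the absolute values.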

Anyway, the attached patch adds some tests and ensures support in a few
more places for non-ascii file ids and revision ids. I'm planning on
adding on to this, to actually process file ids like I did revision ids,
and leave everything in 8-bit strings.

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF12rDJdeBCYSNAAMRApIFAJ9wUcJmY5VMQBmzR0egRzHMh25T/QCfYP5D
io+NhucSFf3IhZTXNdNMFaA=
=5rKi
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8_revision_ids.patch
Type: text/x-patch
Size: 11931 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070217/cdd39a41/attachment.bin 

