[RFC] Use utf-8 revision ids

Thu Feb 1 05:27:59 GMT 2007

On 31 Jan 2007, John Arbash Meinel <john at arbash-meinel.com> wrote:
> This patch is not ready to be completely merged, but I think the
> performance results show that it is worth being considered.
> 
> Right now all of our natively generated bzr revision ids and file ids
> are ascii only, because we explicitly strip out the other characters.
> We've made the statement that they are Unicode, but we can pretty easily
> change that to saying they must be utf-8. Or even ascii only (you would
> have to encode your information somehow).
> 
> The attached patch changes the Knit reading code so that it does not
> decode revision ids, both when parsing the line deltas and fulltexts
> and when parsing KnitIndex files.

This patch is ok with me.  It's a nice saving.

It does seem there may be some breakage if people have somehow generated
non-ascii ids, and their default encoding is not utf-8 -- they may get
an exception when trying to compare the str value to a unicode revision
id they got from somewhere else.  But such data should be rare, and if
we generally go towards treating these ids just as strs then it should
be ok.
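
To make the failure case concrete, here is a rough sketch. It is not
code from the patch. It is written for Python 3, where bytes stands in
for the str type discussed above, and the revision id and the helper
names same_revision and decode_with_default are made up for
illustration.

```python
# Illustrative sketch only; nothing here is taken from the patch.
# Python 3 `bytes` plays the role of the `str` type discussed above,
# and Python 3 `str` plays the role of `unicode`.

# Hypothetical revision id with one non-ASCII character (not a real bzr id).
UNICODE_ID = "jos\u00e9@example.com-20070201052759-abc123"

# The same id as it would be stored if ids are defined to be utf-8.
UTF8_ID = UNICODE_ID.encode("utf-8")


def same_revision(stored, other):
    """Compare a stored utf-8 id with an id of either type.

    A text id is encoded to utf-8 explicitly, so the result does not
    depend on any process-wide default encoding.
    """
    if isinstance(other, str):
        other = other.encode("utf-8")
    return stored == other


def decode_with_default(stored, default_encoding):
    """Mimic an implicit decode that uses some default encoding.

    With "ascii" this raises UnicodeDecodeError for a non-ASCII id.
    With "latin-1" it succeeds but yields different text, so a later
    comparison with the original id fails.
    """
    return stored.decode(default_encoding)
```

With these definitions, same_revision(UTF8_ID, UNICODE_ID) is true
whatever the default encoding is, while decode_with_default(UTF8_ID,
"ascii") raises UnicodeDecodeError. An ascii-only id round-trips the
same way under ascii, latin-1 and utf-8, which is why only ids with
non-ascii characters are at risk.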

-- 
Martin