[merge] cache encoding

holger krekel holger at merlinux.de
Sat Aug 12 07:14:12 BST 2006

On Thu, Aug 10, 2006 at 09:07 -0500, John Arbash Meinel wrote:
> Attached is a bundle that caches encode/decode from and to utf8. The
> biggest application for this is the fact that when you commit a new
> kernel tree, it has to annotate every line in the tree with the current
> revision. The specific location that I saw was this line in knit:
>     return ['%s %s' % (o.encode('utf-8'), t) for o, t in content._lines]
> So basically, it was doing a new encode for *every* line, and a new
> kernel tree has 7.7M lines. This doesn't account for a huge
> portion of the overall time (only about 45s/10min). But it doesn't hurt
> to do it faster.

Ouch.  Btw, is there documentation on the general strategy for how
bzr deals with unicode?  It does not use the somewhat common scheme
of "always use unicode, only convert at specified barriers", does it?
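
To make clear what I mean by that scheme, here is a minimal sketch, assuming Python 3 and UTF-8 at both ends. The function names (`read_lines`, `tag_lines`, `write_lines`) are made up for illustration and say nothing about how bzr is actually structured.

```python
import io


def read_lines(raw_stream, encoding='utf-8'):
    """Input barrier: bytes come in, text goes out."""
    return [line.decode(encoding) for line in raw_stream]


def tag_lines(lines, revision_id):
    """Internal code: only ever sees and produces text."""
    return ['%s %s' % (revision_id, line) for line in lines]


def write_lines(raw_stream, lines, encoding='utf-8'):
    """Output barrier: text comes in, bytes go out."""
    for line in lines:
        raw_stream.write(line.encode(encoding))


# Usage: decode once on the way in, encode once on the way out.
src = io.BytesIO('h\u00e9llo\nworld\n'.encode('utf-8'))
dst = io.BytesIO()
write_lines(dst, tag_lines(read_lines(src), 'rev-1'))
```

With this layout, encoding happens only in `write_lines`, so there is a single place to look when conversion cost or encoding errors show up.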



