[merge] cache encoding
Martin Pool
mbp at canonical.com
Tue Aug 15 05:49:33 BST 2006
On 14 Aug 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Sure. But we don't check that, so we should update the email-address
> parser to require it. (Or at least the email => revision_id code).
>
> There are also punycode domain names, which might be nicer as real
> unicode strings internally.
I can see presenting them to the user as unicode, but I don't see why
we'd want them internally as unicode. They're just ids; the computer
doesn't care.
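
A minimal sketch of that split, assuming Python 3 and its built-in "idna"
codec (the domain name here is only an illustration, not one from the
thread): the ASCII punycode form is what gets stored, and the unicode form
is produced only for display.

```python
# Stored form: plain ASCII bytes, safe to use as part of an id.
stored = "bücher.example".encode("idna")
print(stored)        # b'xn--bcher-kva.example'

# Display form: decoded back to unicode only when shown to the user.
shown = stored.decode("idna")
print(shown)         # bücher.example
```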
> > Right, but if someone is converting from a source which has nonascii
> > bits, they can always do the escaping themselves, in the code which
> > specifically has to support it.
>
> Sure. But the only Unicode => ascii escaping I know of is URL escaping,
> though maybe UTF-7 would fall under that category as well. The problem
> with URL escaping is that it has to be escaped again at the next layer.
I'd suggest something like '_%04x' on the unicode values - concise and
not needing double escaping.
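
A rough sketch of what that could look like in Python 3. The function names
escape_id and unescape_id are hypothetical, and two choices here are
assumptions that go beyond the one-line suggestion: '_' itself is escaped so
the mapping can be reversed, and code points above U+FFFF are rejected
because they would not fit in four hex digits.

```python
import re
from urllib.parse import quote


def escape_id(text):
    """Replace non-ASCII characters (and '_' itself) with '_%04x' codes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Assumption: this sketch only covers four-hex-digit code points.
            raise ValueError("code point above U+FFFF: %r" % ch)
        if cp > 0x7F or ch == "_":
            out.append("_%04x" % cp)
        else:
            out.append(ch)
    return "".join(out)


def unescape_id(escaped):
    """Reverse escape_id: every '_' is followed by exactly four hex digits."""
    return re.sub(r"_([0-9a-f]{4})",
                  lambda m: chr(int(m.group(1), 16)),
                  escaped)


print(escape_id("jörg"))             # j_00f6rg
print(unescape_id("j_00f6rg"))       # jörg

# The escaped form has no '%', so a later URL-escaping layer leaves it alone,
# while URL-escaping twice mangles the '%' signs from the first pass.
print(quote(escape_id("jörg")))      # j_00f6rg
print(quote(quote("jörg")))          # j%25C3%25B6rg
```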
> At this point, I don't think we can do filesystem-safe ascii (no :, /,
> ", <, >, etc.), and I think it would be overly restrictive to require
> them to be lowercase (especially thinking of SVN conversions, which
> then have to escape all sorts of things).
>
> I would be a little happier just having them be utf-8. But I'm okay with
> them being ASCII. I'm probably +0.25 on it, though. Being unicode gives
> us flexibility, but it might be reasonable for a while to assert that
> they are ascii.
I would be a little happier only supporting ascii, just to be
conservative and avoid trouble later on. But I won't insist on it.
--
Martin