[merge] cache encoding
Martin Pool
mbp at canonical.com
Mon Aug 14 02:01:34 BST 2006
On 12 Aug 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Well, bzr itself has not been able to create anything but restricted
> ascii revision ids and file ids for a while.
>
> I guess we could get non-ascii revision ids if the user's email
> contained a non-ascii character. The file-id generator removes
> everything that matches '[^\w.]' which I believe expands to something
> like "a-zA-Z0-9_." (It should even remove '-')
It's fairly common to have non-ASCII characters in the 'real name' part
of the email address, but I don't think they're valid, or at least
they're very rare, in the actual address itself, which is all we use.
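For illustration, here is a minimal sketch of what the '[^\w.]' pattern
quoted above does. The sanitize() helper and the sample names are made
up, and this is not bzrlib's actual file-id code; re.ASCII is used on
the assumption that \w should only match [a-zA-Z0-9_].

```python
import re

# Pattern quoted above: anything that is not a word character or '.'
# is dropped. re.ASCII limits \w to [a-zA-Z0-9_] (assumed intent).
_unsafe = re.compile(r'[^\w.]', re.ASCII)


def sanitize(name):
    """Hypothetical helper: strip every character the pattern rejects."""
    return _unsafe.sub('', name)


print(sanitize('my-file name.txt'))   # myfilename.txt
print(sanitize('caf\u00e9.txt'))      # caf.txt
```

Note that '-' and spaces are removed as well as the non-ASCII letter.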
> For revision ids we use:
> s = '%s-%s-' % (self._config.user_email(),
>                 compact_date(self._timestamp))
> s += hexlify(rand_bytes(8))
>
> Obviously timestamp and hexlify won't generate non-ascii.
>
> The problem, though is code like 'Tailor', et al. I know we are safe for
> baz=>bzr, because Arch never supported anything other than ASCII (at
> least not officially).
I don't know of any other system which supports non-ASCII ids, so I
don't see why Tailor would be trying to support them.
We shouldn't be constrained to keep permitting everything that simply
wasn't prohibited before.
> I realize that revision ids are mostly arbitrary. But people are more
> and more assigning meaning to them. Mostly as part of the conversion
> process.
Right, but if someone is converting from a source which has nonascii
bits, they can always do the escaping themselves, in the code which
specifically has to support it.
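As a rough sketch of what that converter-side escaping could look like
(escape_revision_id() is a hypothetical name and the percent-encoding
scheme is only one possible choice, not something bzr or Tailor
defines):

```python
def escape_revision_id(raw):
    """Hypothetical converter helper: return an ASCII-only form of raw.

    Non-ASCII characters are UTF-8 encoded and each byte is written as
    %XX. A literal '%' is escaped too, so the mapping stays reversible.
    """
    out = []
    for byte in raw.encode('utf-8'):
        ch = chr(byte)
        if byte < 0x80 and ch != '%':
            out.append(ch)                # plain ASCII passes through
        else:
            out.append('%%%02X' % byte)   # escape everything else
    return ''.join(out)


print(escape_revision_id('jos\u00e9@example.com-2006'))
# jos%C3%A9@example.com-2006
```

The point is only that the escaping lives in the converter, so the core
code never sees anything but ASCII.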
> I do believe the utf8=>unicode conversion is down in the noise right
> now, but that doesn't mean it won't become a bigger deal as we do more work.
Making sure non-ASCII ids get handled correctly is one more thing to get
right, and one more thing that can take time. Unless we really need
them, and I don't think we do, we might as well restrict ids to ASCII.
So I would say, just require them to be ascii. If we can agree on that,
we should put it in developer documentation somewhere (bzrlib/doc/api or
HACKING.)
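The rule itself could be enforced with something as small as the
following; check_valid_id() is a made-up name for illustration, not an
existing bzrlib function.

```python
def check_valid_id(an_id):
    """Hypothetical check: return an_id unchanged if it is pure ASCII.

    Raises ValueError otherwise, so bad ids are rejected at the point
    where they are created rather than deep inside the storage code.
    """
    try:
        an_id.encode('ascii')
    except UnicodeEncodeError:
        raise ValueError('id must be ASCII: %r' % (an_id,))
    return an_id


check_valid_id('mbp@example.com-20060814020134-0123456789abcdef')
```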
--
Martin