New format checklist

Jan Hudec bulb at ucw.cz
Tue Jan 10 13:07:56 GMT 2006


On Mon, Jan 09, 2006 at 20:03:42 -0600, John A Meinel wrote:
> Martin Pool wrote:
> > On Tue, 2006-01-03 at 10:28 +1100, Robert Collins wrote:
> Well, actually, utf-8 at least specifies the possibility for characters
> to go out to 5 bytes.

Actually utf-8, as a general encoding scheme, can go up to *7* bytes
per codepoint, allowing 36-bit codepoints.
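
As a rough sketch of where the 36 bits come from, assuming the usual
utf-8 bit pattern is simply extended to longer lead bytes (the function
name below is made up for illustration and is not part of any library):

```python
def generalized_utf8_payload_bits(n_bytes):
    """Payload bits carried by an n-byte sequence of the generalized pattern."""
    if n_bytes == 1:
        return 7  # single byte: 0xxxxxxx
    if not 2 <= n_bytes <= 7:
        raise ValueError("the lead-byte pattern only allows 1 to 7 bytes")
    # The lead byte spends n one-bits plus a zero-bit on the length marker,
    # leaving 7 - n payload bits (none at all for the 7-byte lead 11111110).
    lead_bits = 7 - n_bytes
    # Every continuation byte is 10xxxxxx and carries 6 payload bits.
    return lead_bits + 6 * (n_bytes - 1)


for n in range(1, 8):
    print(n, generalized_utf8_payload_bits(n))
# 4 bytes give 21 bits (enough for Unicode), 7 bytes give 36 bits
```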

> So you could theoretically have one unicode
> character expand to 25 characters.

No, it can't. Unicode has 21-bit codepoints (the highest unicode codepoint
is actually 0x10ffff, which seems related to how utf-16 was defined) and
all letters of all live languages are within 16 bits. A 21-bit codepoint
takes at most 4 bytes in utf-8 and a 16-bit one at most 3, and double
uri-escaping turns each byte into 5 characters (%25XX). That means any
unicode character will, after double uri-escaping, have at most 20
bytes and any letter will have at most 15 bytes.
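
The bounds can be checked with a few lines of Python. This is only a
sketch: it assumes "double uri-escaping" means percent-encoding the
utf-8 bytes and then percent-encoding the result again, and the helper
name is hypothetical.

```python
from urllib.parse import quote


def double_escaped_length(ch):
    """Length of a character after two rounds of percent-encoding."""
    once = quote(ch, safe="")     # utf-8 bytes become %XX, 3 chars per byte
    twice = quote(once, safe="")  # each '%' becomes %25, 5 chars per byte
    return len(twice)


print(double_escaped_length("\U0010FFFF"))  # 4 utf-8 bytes -> 20
print(double_escaped_length("\u20ac"))      # 3 utf-8 bytes -> 15
print(double_escaped_length("a"))           # unreserved, stays 1
```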

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>