UTF-16 versus UCS2
Dennis Benzinger
Dennis.Benzinger at gmx.net
Wed May 23 11:13:10 BST 2007
On Wed, 23 May 2007 10:40:15 +0200,
John Arbash Meinel <john at arbash-meinel.com> wrote:
> I just checked around and found this page:
> http://en.wikipedia.org/wiki/UTF-16
>
> Which basically says that UTF-16 == UCS-2 for everything that isn't
> in the "extended" character set.
>
> Basically, Unicode extended how many codes there were going to be, so
> you need >65,000 values in your serialized data. UCS2 *doesn't*
> support those. UTF-16 supports them by having some codes be encoded
> with 4-bytes instead of just 2. (Like how UTF-8 can use up to 6?)
>
> Anyway, I would guess that you could write a trivial UTF-16 => UCS2
> converter. If you want to be really safe, it should check for the
> special codes, and either complain or just flatten them.
>
> Either that, or just accept some small chance for inaccuracy when
> passing UTF-16 to code that is expecting a UCS-2 string. (The extra
> codes are probably extremely rare, and UCS2 can't handle them anyway).
>
> John
> =:->
A quote from the Basic Questions part of the Unicode FAQ
<http://www.unicode.org/faq/basic_q.html#25>:
"In particular, for the purposes of data exchange, UCS-2 and UTF-16 are
identical formats. Both are 16-bit, and have exactly the same code unit
representation."
You didn't give much context info in your mail so I don't really know
if that applies to your use case.
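For what it's worth, the "trivial UTF-16 => UCS2 converter" John mentions
could be sketched roughly like this in Python. This is only an illustration
under my own assumptions: the function name utf16_to_ucs2, the errors
parameter, and the choice of little-endian input without a BOM are all made
up for the example and are not taken from bzr or any other library.

```python
import struct

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf16_to_ucs2(data, errors="strict"):
    """Convert UTF-16-LE bytes (no BOM) to UCS-2-LE bytes.

    Hypothetical helper: with errors="strict" any surrogate code unit
    raises ValueError ("complain"); with any other value each surrogate
    pair or lone surrogate is replaced by U+FFFD ("flatten").
    """
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    # Split the byte string into 16-bit little-endian code units.
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDFFF:
            # Surrogates are the "special codes" that UCS-2 cannot represent.
            if errors == "strict":
                raise ValueError(
                    "surrogate code unit 0x%04X at index %d" % (unit, i))
            out.append(REPLACEMENT)
            # A high surrogate followed by a low surrogate is one
            # character, so consume both units for a single replacement.
            if (unit <= 0xDBFF and i + 1 < len(units)
                    and 0xDC00 <= units[i + 1] <= 0xDFFF):
                i += 1
        else:
            # Everything outside the surrogate range is the same 16-bit
            # code unit in both encodings and is copied through unchanged.
            out.append(unit)
        i += 1
    return struct.pack("<%dH" % len(out), *out)
```

Text that stays within the Basic Multilingual Plane comes back byte for
byte unchanged, which is the case the FAQ quote describes. Only input
containing surrogate pairs is rejected or flattened.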
Dennis Benzinger