[Bug 821951] Re: sort -u erase some utf8 characters

Bug Watch Updater 821951 at bugs.launchpad.net
Sat Aug 6 18:28:58 UTC 2011


Launchpad has imported 2 comments from the remote bug at
http://sourceware.org/bugzilla/show_bug.cgi?id=13063.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2011-08-06T17:21:16+00:00 An Yang wrote:

Hi,

According to glibc/localedata/locales/zh_CN and iso14651_t1_pinyin (or
iso14651_t1), glibc's collation data only supports Unicode 3.0.

The current version of Unicode is 6.0. It extends CJK UNIFIED IDEOGRAPH
with extensions A/B/C/D, and extension A is included in GB18030:2005 (the
Chinese national character set standard).

So, at the very least, glibc should sort all Chinese characters in CJK
UNIFIED IDEOGRAPH and EXTENSION A (U+3400-U+4DBF).

The visible effect is in sort -u.
If you run sort -u examples_CJK_extensionA.txt (see attachment), you
will get only one Chinese character, "㑗".
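
One way to check whether the locale's collation data is responsible is to
repeat the command in the C locale, where sort compares raw bytes instead
of collation weights. The sketch below is only an illustration: the file
name cjk_ext_a_sample.txt and its three lines are made up for this example
(they are not the attachment), and it assumes a zh_CN.UTF-8 locale may or
may not be installed on the machine.

```shell
# Hypothetical sample file: three distinct CJK Extension A characters,
# one per line (the last one is the character quoted in this report).
printf '%s\n' '㐀' '㐁' '㑗' > cjk_ext_a_sample.txt

# In the C locale sort compares bytes, so every distinct line is kept.
count=$(LC_ALL=C sort -u cjk_ext_a_sample.txt | wc -l | tr -d ' ')
echo "unique lines in C locale: $count"

# For comparison, the same command under a UTF-8 locale. If that locale is
# missing, sort falls back to the C locale and both counts match.
LC_ALL=zh_CN.UTF-8 sort -u cjk_ext_a_sample.txt | wc -l
```

If the second count is lower than the first, the lines were merged because
they compare as equal under the locale's collation rules, not because the
input contained duplicates.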


Regards,
An Yang

Reply at: https://bugs.launchpad.net/eglibc/+bug/821951/comments/9

------------------------------------------------------------------------
On 2011-08-06T17:24:33+00:00 An Yang wrote:

Created attachment 5880
example characters in CJK extension A.

Reply at: https://bugs.launchpad.net/eglibc/+bug/821951/comments/10


** Changed in: eglibc
       Status: Unknown => Confirmed

** Changed in: eglibc
   Importance: Unknown => Critical

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to eglibc in Ubuntu.
https://bugs.launchpad.net/bugs/821951

Title:
  sort -u erase some utf8 characters

Status in Embedded GLIBC:
  Confirmed
Status in “eglibc” package in Ubuntu:
  Confirmed

Bug description:
  sort -u erases some UTF-8 characters.

  See the attachment for detailed data.
  sort -u x.sorted.utf8 > x.sorted.uniq.utf8
  diff x.sorted.uniq.utf8 x.sorted.utf8 > x.diff

To manage notifications about this bug go to:
https://bugs.launchpad.net/eglibc/+bug/821951/+subscriptions



