[Bug 885250] Re: libc iconv does not reject surrogates when transcoding from UTF-32le to UTF-8

Dmitry Shachnev Mitya57 at gmail.com
Wed Nov 2 14:28:35 UTC 2011


** Package changed: ubuntu => eglibc (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to eglibc in Ubuntu.
https://bugs.launchpad.net/bugs/885250

Title:
  libc iconv does not reject surrogates when transcoding from UTF-32le
  to UTF-8

Status in “eglibc” package in Ubuntu:
  New

Bug description:
  Compile and run the following program:

  """

  #include <stdio.h>
  #include <errno.h>
  #include <iconv.h>

  int main(int argc, char **argv) {
  	iconv_t cd = iconv_open("UTF-8", "UTF-32LE");
  	//iconv_t cd = iconv_open("UCS-2LE", "UCS-2LE");
  	if (cd == (iconv_t)-1) {
  		printf("Could not open: %d\n", errno);
  		return 1;
  	}
  	
  	//char in_buf[] = { 0xA1, 0xDC, 0xA5, 0xDC };
  	//char in_buf[] = { 0xDC, 0xA1, 0xDC, 0xA5 };
  	char in_buf[] = { 0xA1, 0xDC, 0x00, 0x00, 0xA5, 0xDC, 0x00, 0x00 };
  	char out_buf[20];

  	char *in_buf_p = in_buf;   size_t in_buf_left = sizeof(in_buf)/sizeof(char);
  	char *out_buf_p = out_buf; size_t out_buf_left = 20;
  	size_t conv_count = iconv(cd, &in_buf_p, &in_buf_left, &out_buf_p, &out_buf_left);

  	if (conv_count == (size_t)-1) {
  		switch (errno) {
  		// Triggered by invalid multibyte sequence in input
  		case EILSEQ:	printf("Conversion error: EILSEQ\n"); break;
  		// Not enough space in output buffer
  		case E2BIG:	printf("Conversion error: E2BIG\n"); break;
  		// Incomplete multibyte sequence in input
  		case EINVAL:	printf("Conversion error: EINVAL\n"); break;
  		// Some other unknown error
  		default:	printf("Conversion error: %d\n", errno);
  		}
  		return 2;
  	}

  	printf("Consumed %d, produced %d, converted %d\n", (in_buf_p-in_buf)/sizeof(char), (out_buf_p-out_buf)/sizeof(char), conv_count);
  	for (char *out_buf_read = out_buf; out_buf_read < out_buf_p; out_buf_read++) {
  		printf("\t%x\n", (unsigned char)*out_buf_read);
  	}

  	if (iconv_close(cd) != 0) {
  		printf("Could not close: %d\n", errno);
  		return 3;
  	}

  	return 0;
  }
  """

  Expected result:
  """
  Conversion error: EILSEQ
  """

  Actual result:
  """
  Consumed 8, produced 6, converted 0
  	ed
  	b2
  	a1
  	ed
  	b2
  	a5
  """

  This UTF-8 byte sequence is invalid according to the standard because
  it encodes a surrogate code point.

  Note that if you take this output byte sequence and run it through
  iconv *again* (with both input and output encodings as UTF-8) then
  EILSEQ is reported as expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/885250/+subscriptions




More information about the foundations-bugs mailing list