[Bug 1828282] Comment bridged from LTC Bugzilla

Mon May 13 11:29:18 UTC 2019

------- Comment From STLI at de.ibm.com 2019-05-13 07:27 EDT-------
Hi xnox,

this issue has nothing to do with the s390x-specific setjmp/longjmp implementation!
Setjmp/longjmp is only used for error handling inside the bunzip2 implementation in busybox.
Due to a bug in the busybox implementation, longjmp is called on s390x but not on e.g. x86.
Please report this bug to busybox with the detailed information below!

According to bunzip2.tests:
bunzip2: bunzip error -5 => PASS
bunzip2: bunzip error -3 => XFAIL

As a side note:
Error -3 also occurs on s390x Ubuntu 18.04.2 LTS!

According to archival/libarchive/decompress_bunzip2.c:
62 #define RETVAL_UNEXPECTED_INPUT_EOF     (dbg("%d", __LINE__), -3)
64 #define RETVAL_DATA_ERROR               (dbg("%d", __LINE__), -5)

RETVAL_UNEXPECTED_INPUT_EOF is used only in get_bits():
128                        bd->inbufCount = read(bd->in_fd, bd->inbuf, IOBUF_SIZE);
129                        if (bd->inbufCount <= 0)
130                                longjmp(*bd->jmpbuf, RETVAL_UNEXPECTED_INPUT_EOF);
If you start gdb and set a breakpoint there ...:
busybox-1.30.1/testsuite$ gdb ../busybox_unstripped
(gdb) b decompress_bunzip2.c:130
(gdb) run bunzip2 <bz2_issue_11.bz2 2>&1 >/dev/null
... it is hit: bd->inbufCount is zero and longjmp jumps back to the setjmp in unpack_bz2_stream().
i is then -3 and "bunzip error -3" is reported:
788                i = setjmp(jmpbuf);
789                if (i == 0)
790                        i = start_bunzip(&jmpbuf, &bd, xstate->src_fd, outbuf + 2, len);
791
792                if (i == 0) {
793                        while (1) { /* "Produce some output bytes" loop */
794                                i = read_bunzip(bd, outbuf, IOBUF_SIZE);
795                                if (i < 0) /* error? */
796                                        break;
...
808                if (i != RETVAL_LAST_BLOCK
809                /* Observed case when i == RETVAL_OK:
810                 * "bzcat z.bz2", where "z.bz2" is a bzipped zero-length file
811                 * (to be exact, z.bz2 is exactly these 14 bytes:
812                 * 42 5a 68 39 17 72 45 38 50 90 00 00 00 00).
813                 */
814                 && i != RETVAL_OK
815                ) {
816                        bb_error_msg("bunzip error %d", i);
817                        break;
818                }
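
The error path described above can be reduced to a small sketch. All names here (sketch_reader, sketch_get_byte, sketch_unpack) are made up for illustration and are not busybox code; only the value -3 and the "i = setjmp(jmpbuf)" pattern are taken from the excerpts above. It is meant to show how running out of input deep inside a reader surfaces as a plain return code in the caller:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define SKETCH_UNEXPECTED_INPUT_EOF (-3) /* same value as RETVAL_UNEXPECTED_INPUT_EOF */

/* Hypothetical stand-in for bunzip_data: only what this sketch needs. */
struct sketch_reader {
    jmp_buf *jmpbuf;  /* where to jump when input runs out */
    int bytes_left;   /* simulated amount of remaining input */
};

/* Stand-in for get_bits(): leaves via longjmp when no input is left. */
static int sketch_get_byte(struct sketch_reader *r)
{
    assert(r != NULL && r->jmpbuf != NULL);
    if (r->bytes_left <= 0)
        longjmp(*r->jmpbuf, SKETCH_UNEXPECTED_INPUT_EOF);
    r->bytes_left--;
    return 0x42; /* dummy payload */
}

/* Stand-in for unpack_bz2_stream(): 0 on success, else the longjmp value. */
static int sketch_unpack(int input_len, int bytes_wanted)
{
    jmp_buf jmpbuf;
    struct sketch_reader r;
    int code, k;

    /* Mirrors "i = setjmp(jmpbuf);" from the excerpt above. */
    code = setjmp(jmpbuf);
    if (code != 0)
        return code; /* reached via longjmp from sketch_get_byte() */

    r.jmpbuf = &jmpbuf;
    r.bytes_left = input_len;
    for (k = 0; k < bytes_wanted; k++)
        (void)sketch_get_byte(&r);
    return 0;
}
```

Calling sketch_unpack(2, 4) asks for more bytes than are available, so the third read jumps back to the setjmp and the function returns -3. No architecture-specific behaviour is involved in that path.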

Whether -5 or -3 is reported depends on uninitialized values on the stack during the call read_bunzip()->get_next_block().
There, the array mtfSymbol lives on the stack:
156 /* Unpacks the next block and sets up for the inverse Burrows-Wheeler step. */
157 static int get_next_block(bunzip_data *bd)
158 {
159        int groupCount, selector,
160                i, j, symCount, symTotal, nSelectors, byteCount[256];
161        uint8_t uc, symToByte[256], mtfSymbol[256], *selectors;
...

groupCount is read and only the first groupCount entries of mtfSymbol are initialized:
...
219        /* How many different Huffman coding groups does this block use? */
220        groupCount = get_bits(bd, 3);
221        if (groupCount < 2 || groupCount > MAX_GROUPS)
222                return RETVAL_DATA_ERROR;
...
228        for (i = 0; i < groupCount; i++)
229                mtfSymbol[i] = i;
...
=> In the relevant case, groupCount == 6 and mtfSymbol[0..5] are initialized to 0..5.

For each selector, the group (see variable n) is determined
and tmp_byte is set to the value of mtfSymbol[n]:
233        for (i = 0; i < nSelectors; i++) {
234                uint8_t tmp_byte;
235                /* Get next value */
236                int n = 0;
237                while (get_bits(bd, 1)) {
=> For each "1" bit, n is incremented. Unfortunately, the "too large" check is done before n is incremented, so n can end up equal to groupCount!
If the "n++" line is moved before the check, the bz2_issue_11.bz2 testcase also passes on s390x!
238                        if (n >= groupCount)
239                                return RETVAL_DATA_ERROR;
240                        n++;
241                }
242                /* Decode MTF to get the next selector */
243                tmp_byte = mtfSymbol[n];
=> In this testcase, for selector i==395, n is 6, so the uninitialized value of mtfSymbol[6] is first stored to tmp_byte and afterwards to selectors[395], although groupCount == 6!
(Note: there is also a commented-out check which would return -5 if tmp_byte >= groupCount!)
244                while (--n >= 0)
245                        mtfSymbol[n + 1] = mtfSymbol[n];
246 //We catch it later, in the second loop where we use selectors[i].
247 //Maybe this is a better place, though?
248 //              if (tmp_byte >= groupCount) {
249 //                      dbg("%d: selectors[%d]:%d groupCount:%d",
250 //                                      __LINE__, i, tmp_byte, groupCount);
251 //                      return RETVAL_DATA_ERROR;
252 //              }
253                mtfSymbol[0] = selectors[i] = tmp_byte;
254        }
=> Note: in the s390x case, selectors[395] == 0 whereas on x86 it was selectors[395] == 20! This value depends on previous operations on the stack!
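
The effect of the check order can be tried in isolation. The following is only a sketch: sketch_bits and sketch_next_bit are hypothetical stand-ins for get_bits(bd, 1), and the two decode functions copy just the inner while loop from lines 237-241, once as it is in 1.30.1 and once with "n++" moved before the check as suggested above:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DATA_ERROR (-5) /* same value as RETVAL_DATA_ERROR */

/* Hypothetical bit source standing in for get_bits(bd, 1). */
struct sketch_bits {
    const unsigned char *bits; /* one 0/1 value per element */
    size_t len;
    size_t pos;
};

static int sketch_next_bit(struct sketch_bits *s)
{
    assert(s != NULL);
    /* Treat end of input as a 0 bit so the loops below always terminate. */
    return s->pos < s->len ? s->bits[s->pos++] : 0;
}

/* Order as in busybox 1.30.1: check first, increment afterwards. */
static int sketch_decode_original(struct sketch_bits *s, int groupCount)
{
    int n = 0;
    while (sketch_next_bit(s)) {
        if (n >= groupCount)
            return SKETCH_DATA_ERROR;
        n++;
    }
    return n; /* can be equal to groupCount */
}

/* Suggested order: increment first, then check. */
static int sketch_decode_fixed(struct sketch_bits *s, int groupCount)
{
    int n = 0;
    while (sketch_next_bit(s)) {
        n++;
        if (n >= groupCount)
            return SKETCH_DATA_ERROR;
    }
    return n; /* always smaller than groupCount */
}
```

With six "1" bits followed by a "0" bit and groupCount == 6, the original order returns n == 6, which is the index of the first uninitialized mtfSymbol entry, while the reordered loop returns -5. For five "1" bits both variants return 5.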

Afterwards each selector is processed:
382        for (;;) {
383                struct group_data *hufGroup;
384                int *base, *limit;
385                int nextSym;
386                uint8_t ngrp;
387
388                /* Fetch next Huffman coding group from list. */
389                symCount = GROUP_SIZE - 1;
390                if (selector >= nSelectors)
391                        return RETVAL_DATA_ERROR;
392                ngrp = selectors[selector++];
393                if (ngrp >= groupCount) {
394                        dbg("%d selectors[%d]:%d groupCount:%d",
395                                __LINE__, selector-1, ngrp, groupCount);
396                        return RETVAL_DATA_ERROR;
397                }
...
=> In the relevant case, groupCount == 6 and we look at selector == 395:
On x86, ngrp == 20 => RETVAL_DATA_ERROR (=-5) is returned.
On s390x, ngrp == 0 => No error is reported and processing continues
until the input stream reaches end of file; the next get_bits() call then triggers the longjmp with value -3 (see above)!
418                                if (bd->inbufPos == bd->inbufCount) {
419                                        nextSym = get_bits(bd, hufGroup->maxLen);
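
To make the dependence on stale stack content reproducible, the uninitialized slot can be modelled with an explicit fill value. This is a sketch under that assumption only: sketch_selector_outcome and SKETCH_ACCEPTED are hypothetical names, and the real code reads whatever happens to be on the stack rather than a chosen value:

```c
#include <assert.h>
#include <string.h>

#define SKETCH_DATA_ERROR (-5) /* same value as RETVAL_DATA_ERROR */
#define SKETCH_ACCEPTED   0    /* hypothetical: selector passed the check */

/* "stale" models the leftover stack byte in the slot at index groupCount. */
static int sketch_selector_outcome(unsigned char stale, int groupCount)
{
    unsigned char mtfSymbol[256];
    unsigned char ngrp;
    int i;

    assert(groupCount >= 2 && groupCount < 256);

    /* Model "whatever was on the stack" with a known fill value. */
    memset(mtfSymbol, stale, sizeof mtfSymbol);
    for (i = 0; i < groupCount; i++)
        mtfSymbol[i] = (unsigned char)i;

    /* The off-by-one lets n reach groupCount, so a stale slot is read. */
    ngrp = mtfSymbol[groupCount];

    /* Same comparison as at line 393 of decompress_bunzip2.c. */
    if (ngrp >= groupCount)
        return SKETCH_DATA_ERROR;
    return SKETCH_ACCEPTED; /* decoding would continue with bogus data */
}
```

With groupCount == 6, a stale value of 20 (as seen on x86) yields -5, while a stale value of 0 (as seen on s390x) passes the check, so decoding continues until the input is exhausted.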

Note: This bug is also observable with valgrind on s390x and x86:
busybox-1.30.1/testsuite$ valgrind ../busybox_unstripped bunzip2 <bz2_issue_11.bz2 2>&1 >/dev/null
...
==58836== Conditional jump or move depends on uninitialised value(s)
==58836==    at 0x1C3D2C: get_next_block (decompress_bunzip2.c:393)
==58836==    by 0x1C3ED7: get_next_block (decompress_bunzip2.c:419)
==58836==  Uninitialised value was created by a stack allocation
==58836==    at 0x1C392A: get_next_block (decompress_bunzip2.c:158)

@xnox: As mentioned at the beginning: Please report this bug to busybox
and integrate the fix into the Ubuntu busybox package!

https://bugs.launchpad.net/bugs/1828282

Title:
  busybox 1.30.1 crashes bzip2 test case with glibc 2.29, always

Status in Ubuntu on IBM z Systems:
  New
Status in busybox package in Ubuntu:
  New
Status in glibc package in Ubuntu:
  New

Bug description:
  Steps to reproduce:

  1) Get a system with glibc 2.29

  2) Get busybox 1.30.1 installed (e.g. eoan, or download busybox
  package from
  https://launchpad.net/ubuntu/+source/busybox/1:1.30.1-4ubuntu3/+build/16724246
  and use $ apt install ./busybox*.deb to install)

  3) Get busybox 1.30.1 source code, e.g. $ pull-lp-source busybox
  Or download the orig tarball from https://launchpad.net/ubuntu/+source/busybox/1:1.30.1-4ubuntu3

  4) Run the bunzip2 testsuite:

  cd testsuite/
  ECHO=/bin/echo ./bunzip2.tests

  Observe that with glibc 2.29 the:
  PASS: bunzip2: bz2_issue_11.bz2 corrupted example

  is XFAIL or FAIL, on s390x, whereas it passes on all other arches.

  If one uses glibc 2.28 (i.e. use Cosmic, and install busybox & use the
  matching test suite from eoan using the links above) one can observe that
  the testcase always passes.

  We suspect this might be a glibc 2.29 s390x-specific setjmp
  regression. Probably due to setjmp usage in
  ./archival/libarchive/decompress_bunzip2.c

  The tests were done on a z13 machine.
