[Bug 655463] Re: strstr broken for some inputs on pre-SSE4 machines
Bug Watch Updater
655463 at bugs.launchpad.net
Wed May 25 18:07:34 UTC 2011
Launchpad has imported 11 comments from the remote bug at
http://sourceware.org/bugzilla/show_bug.cgi?id=12092.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2010-10-05T05:56:48+00:00 Ppluzhnikov-google wrote:
Created attachment 5035
what appears to be minimal test case
Attached test case fails with glibc-2.11.1 and with current git trunk;
passes with glibc-2.7.
The failure I see on SSE2 and SSE3 machines is:
BUG: 55 vs. 115
The bug does *not* show up on SSE4_2 machines (either 32 or 64-bit
mode).
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/0
------------------------------------------------------------------------
On 2010-10-05T17:08:33+00:00 Ppluzhnikov-google wrote:
Additional analysis from iant at google.com:
I'm not completely sure, but this is what I see so far. The bug can
only occur when the second argument to strstr (the needle) is periodic,
which is to say that it consists entirely of some repeated string. When
that happens, the code can fail to match if the first argument to strstr
(the haystack) contains two or more repetitions of the needle's periodic
string, but not as many as the number of occurrences as are in the
needle. In that case strstr can sometimes return a pointer to the
smaller number of repetitions, when it should properly return NULL or a
later pointer. Also, the needle has to be 32 bytes or more.
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/1
------------------------------------------------------------------------
On 2010-10-05T17:36:31+00:00 Ppluzhnikov-google wrote:
Created attachment 5037
slightly simplified test case
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/2
------------------------------------------------------------------------
On 2010-10-05T18:17:44+00:00 Ian Lance Taylor wrote:
I think the problem is the Boyer-Moore shift in two_way_long_needle in
str-two-way.h. It does not correctly update MEMORY. I think we need something
like
if (memory && shift < period)
{
/* Since needle is periodic, but the last period has
a byte out of place, there can be no match until
after the mismatch. */
shift = needle_len - period;
memory = 0;
}
else if (memory > shift)
memory = memory - shift;
else
memory = 0;
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/3
------------------------------------------------------------------------
On 2010-10-05T18:31:36+00:00 Eric Blake wrote:
Yep, resetting 'memory' after a large shift is required; I'm testing
your idea now, but think you have the right patch in mind.
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/4
------------------------------------------------------------------------
On 2010-10-05T22:10:16+00:00 Eric Blake wrote:
Your test for (memory > shift) will never be reached. Other than the
assignment added by your proposed patch, memory is only ever assigned to
be 0 or needle_len - period. And for a periodic needle, shift is either
needle_len or < period, by virtue of how the shift table is constructed.
Therefore, if memory is non-zero but shift >= period, then shift is
necessarily > memory at that point.
Which means your code can be reduced to this simpler patch:
diff --git i/string/str-two-way.h w/string/str-two-way.h
index 502af47..76044b3 100644
--- i/string/str-two-way.h
+++ w/string/str-two-way.h
@@ -350,8 +350,8 @@ two_way_long_needle (const unsigned char *haystack,
a byte out of place, there can be no match until
after the mismatch. */
shift = needle_len - period;
- memory = 0;
}
+ memory = 0;
j += shift;
continue;
}
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/5
------------------------------------------------------------------------
On 2010-10-05T22:23:54+00:00 Eric Blake wrote:
Created attachment 5039
fix strstr and memmem
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/6
------------------------------------------------------------------------
On 2010-10-05T22:29:09+00:00 Jakub Jelinek wrote:
Please add a testcase and post to libc-alpha at sources.redhat.com.
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/7
------------------------------------------------------------------------
On 2010-10-06T15:43:59+00:00 Eric Blake wrote:
Interestingly enough:
strstr() and strcasestr() are only broken pre-SSE4, but memmem() is
broken even on SSE4 machines.
On the other hand, on SSE4 machines, strstr() and strcasestr() are quadratic in behavior; in other words, the use of an assembly implementation has actually caused a performance regression over the fix for
http://sourceware.org/bugzilla/show_bug.cgi?id=5514
$ cat foo.c
#define _GNU_SOURCE
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#define P ":012345678-"
static void quit (int sig) { exit (sig + 128); }
int main(int argc, char **argv)
{
const char *hay = ";" ":013245678-" P P ":012345678." P ":012345678." P;
const char *needle = P P P;
size_t m = 1000000;
char *largehay = malloc (2 * m + 2);
char *largeneedle = malloc (m + 2);
signal (SIGALRM, quit);
alarm (5);
if (!largehay || !largeneedle)
return 2;
memset (largehay, 'A', 2 * m);
largehay[2 * m] = 'B';
largehay[2 * m + 1] = 0;
memset (largeneedle, 'A', m);
largeneedle[m] = 'B';
largeneedle[m + 1] = 0;
switch (argc > 1 ? atoi (argv[1]) : 0)
{
/* Demonstrate str-two-way.h bug. */
case 1:
return !!memmem (hay, strlen (hay), needle, strlen (needle));
case 2:
return !!strstr (hay, needle);
case 3:
return !!strcasestr (hay, needle);
/* Demonstrate quadratic behavior. */
case 4:
return !memmem (largehay, strlen (largehay),
largeneedle, strlen (largeneedle));
case 5:
return !strstr (largehay, largeneedle);
case 6:
return !strcasestr (largehay, largeneedle);
/* Usage error. */
default:
return 2;
}
}
$ for i in $(seq 6); do ./foo $i; echo $?; done
1
0
0
0
142
142
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/10
------------------------------------------------------------------------
On 2010-10-06T17:50:05+00:00 Drepper-fsp wrote:
Fixed in git.
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/11
------------------------------------------------------------------------
On 2010-10-06T17:58:36+00:00 Eric Blake wrote:
(In reply to comment #9)
> Fixed in git.
The incorrect results of memmem() and of non-SSE4 strstr() are fixed.
However, the glibc 2.11 regression of reintroducing quadratic behavior
for SSE4 strstr is not yet fixed.
Reply at: https://bugs.launchpad.net/eglibc/+bug/655463/comments/12
** Changed in: eglibc
Status: Unknown => Fix Released
** Changed in: eglibc
Importance: Unknown => Medium
** Bug watch added: Sourceware.org Bugzilla #5514
http://sourceware.org/bugzilla/show_bug.cgi?id=5514
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to eglibc in Ubuntu.
https://bugs.launchpad.net/bugs/655463
Title:
strstr broken for some inputs on pre-SSE4 machines
Status in Embedded GLIBC:
Fix Released
Status in Release Notes for Ubuntu:
Won't Fix
Status in “eglibc” package in Ubuntu:
Fix Released
Status in “eglibc” source package in Lucid:
Fix Released
Status in “eglibc” source package in Maverick:
Fix Released
Bug description:
# lsb_release -rd
Description: Ubuntu 10.04.1 LTS
Release: 10.04
# apt-cache policy libc6
libc6:
Installed: 2.11.1-0ubuntu7.2
Candidate: 2.11.1-0ubuntu7.2
The strstr function is broken for certain classes of inputs. This has
already been reported in upstream glibc at
http://sourceware.org/bugzilla/show_bug.cgi?id=12092. The bug
includes a proposed fix.
I've verified that libc6 2.11.1-0ubuntu7.2 on Lucid exhibits this
broken behavior on a pre-SSE4 machine (a Xeon L5335).
I'm attaching Paul Pluzhnikov's code snippet from the upstream bug
that demonstrates the broken behavior.
More information about the foundations-bugs
mailing list