[Bug 1434222] Re: Spurious valgrind errors due to memcpy replacement getting autovectorised
Mathew Hodson
mathew.hodson at gmail.com
Tue Dec 1 04:53:58 UTC 2015
** Changed in: valgrind (Ubuntu)
Importance: Undecided => Medium
** Changed in: valgrind (Ubuntu Trusty)
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to valgrind in Ubuntu.
https://bugs.launchpad.net/bugs/1434222
Title:
Spurious valgrind errors due to memcpy replacement getting
autovectorised
Status in valgrind package in Ubuntu:
Fix Released
Status in valgrind source package in Trusty:
Fix Committed
Bug description:
== Comment: #0 - Anton Blanchard <antonb at au1.ibm.com> - 2015-03-18 00:05:56 ==
I'm seeing enormous numbers of this type of error when using valgrind:
==95540== Invalid read of size 8
==95540== at 0x408D038: memcpy (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==95540== by 0x10414F5B: mem_file_write (in /usr/bin/gdb)
==95540== by 0x10414CF3: null_file_fputs (in /usr/bin/gdb)
==95540== by 0x104160E3: fputs_unfiltered (in /usr/bin/gdb)
==95540== by 0x1040E20B: fprintf_unfiltered (in /usr/bin/gdb)
In this case I ran "valgrind gdb". The issue here is that the valgrind
memcpy replacement code is getting autovectorised (since we are
building the package with -O3 on Ubuntu):
0x000000000408d034: rldicr r9,r5,0,59
=> 0x000000000408d038: lxvd2x vs33,0,r9
0x000000000408d03c: xxswapd vs33,vs33
0x000000000408d040: vperm v13,v1,v0,v12
0x000000000408d044: xxlor vs32,vs33,vs33
0x000000000408d048: xxswapd vs0,vs45
0x000000000408d04c: stxvd2x vs0,r10,r5
0x000000000408d050: addi r5,r5,16
0x000000000408d054: bdnz 0x408d034
In this case the source and destination are not 16-byte aligned, and gcc
has decided to realign things via a permute. The problem is that this
code always reads more data than was requested (and just throws the
excess away). This is a safe optimisation, but one that confuses valgrind.
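To make the over-read concrete, here is a minimal sketch in C of the address arithmetic involved. It is a simplified model, not valgrind or gcc code: the names aligned_span and struct span are hypothetical, and the model assumes the loop only issues aligned 16-byte loads covering the requested range (the rldicr r9,r5,0,59 above clears the low four address bits in the same way). The loop gcc actually generates may touch a further vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u /* width of one vector load such as lxvd2x */

/* Hypothetical helper type: a half-open byte range [start, end). */
struct span {
    uintptr_t start;
    uintptr_t end;
};

/* Byte range touched when [addr, addr + len) is fetched using only
   16-byte-aligned vector loads (simplified model, see the lead-in). */
static struct span aligned_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(VEC_BYTES - 1);
    struct span s;

    assert(len > 0);
    /* Round the first byte down to a 16-byte boundary. */
    s.start = addr & mask;
    /* Round one-past-the-last byte up to a 16-byte boundary. */
    s.end = (addr + len + VEC_BYTES - 1) & mask;
    return s;
}
```

For a 20-byte copy from address 0x1003, aligned_span(0x1003, 20) yields the range 0x1000 to 0x1020: 3 bytes before the buffer and 9 bytes after it are loaded and discarded. Those extra bytes never leave their 16-byte aligned block, which is why the access is harmless to the program, but memcheck can still see them as reads of memory the caller does not own.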
The simple fix is to override any optimise flags and build
shared/vg_replace_strmem.c with -O2.
Some of the commit messages on shared/vg_replace_strmem.c suggest we
would like these loops to be autovectorised for performance, but I'm
not sure if we can do that and avoid gcc tricks that read in too much
data.
== Comment: #1 - William J. Schmidt <wschmidt at us.ibm.com> - 2015-03-18 09:23:23 ==
Hi Anton,
Note there is a pending fix to GCC that will avoid the realignment
code for POWER8, where unaligned load cost is much lower than
previously. See
https://bugzilla.linux.ibm.com/show_bug.cgi?id=122395.
The current status is that the GCC trunk is closed until GCC 5
releases. Once that occurs, I will be backporting the fix to 5, 4.9,
and 4.8 where it can get picked up at the next opportunity by each of
the distros. We will also provide it in the next releases of the
Advance Toolchain (AT7, AT8, AT9).
== Comment: #2 - David Heller <hellerda at us.ibm.com> - 2015-03-19 01:28:25 ==
So is the short-term fix to build valgrind (or at least the one module) with -O2, and is that what we want to ask Canonical to do?
== Comment: #3 - William J. Schmidt <wschmidt at us.ibm.com> - 2015-03-19 09:37:57 ==
For 15.04, yes, that would be best. The GCC schedules make it impossible for us to fix the compiler in time for 15.04.
Note that a less impactful change to the compile would be to replace
-O3 with -O3 -fno-tree-vectorize. I'd predict this will still solve
the problem.
We will be fixing this properly in time for 15.10, so Canonical should
treat this as a one-time change.
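As an illustration of how narrowly the -fno-tree-vectorize suggestion above could be scoped, the sketch below disables vectorisation for a single function rather than for the whole build. This is not the fix proposed in this bug and not valgrind's code: bytewise_copy and the NO_AUTOVEC macro are hypothetical names, and the sketch assumes GCC's per-function optimize attribute, which the GCC documentation describes as intended for debugging rather than production use. On other compilers the macro expands to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC's per-function optimize attribute. Options given
   without a leading "O" are treated as -f options, so this requests
   -fno-tree-vectorize for the annotated function only. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_AUTOVEC __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_AUTOVEC
#endif

/* Hypothetical stand-in for a memcpy replacement: copies n bytes one
   at a time, so no load is wider than the data requested. */
NO_AUTOVEC
void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Changing the flags for the one source file, as discussed above, keeps the change in the packaging instead of the source, which is probably the simpler option for a one-time workaround.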