[Bug 1434222] Re: Spurious valgrind errors due to memcpy replacement getting autovectorised

Tue Dec 1 04:53:58 UTC 2015

** Changed in: valgrind (Ubuntu)
   Importance: Undecided => Medium

** Changed in: valgrind (Ubuntu Trusty)
   Importance: Undecided => Medium

https://bugs.launchpad.net/bugs/1434222

Title:
  Spurious valgrind errors due to memcpy replacement getting
  autovectorised

Status in valgrind package in Ubuntu:
  Fix Released
Status in valgrind source package in Trusty:
  Fix Committed

Bug description:
  == Comment: #0 - Anton Blanchard <antonb at au1.ibm.com> - 2015-03-18 00:05:56 ==
  I'm seeing enormous numbers of this type of error when using valgrind:

  ==95540== Invalid read of size 8
  ==95540==    at 0x408D038: memcpy (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
  ==95540==    by 0x10414F5B: mem_file_write (in /usr/bin/gdb)
  ==95540==    by 0x10414CF3: null_file_fputs (in /usr/bin/gdb)
  ==95540==    by 0x104160E3: fputs_unfiltered (in /usr/bin/gdb)
  ==95540==    by 0x1040E20B: fprintf_unfiltered (in /usr/bin/gdb)

  In this case I ran "valgrind gdb". The issue here is the valgrind
  memcpy replacement code is getting autovectorised (since we are
  building the package with -O3 on Ubuntu):

     0x000000000408d034:  rldicr  r9,r5,0,59
  => 0x000000000408d038:  lxvd2x  vs33,0,r9
     0x000000000408d03c:  xxswapd vs33,vs33
     0x000000000408d040:  vperm   v13,v1,v0,v12
     0x000000000408d044:  xxlor   vs32,vs33,vs33
     0x000000000408d048:  xxswapd vs0,vs45
     0x000000000408d04c:  stxvd2x vs0,r10,r5
     0x000000000408d050:  addi    r5,r5,16
     0x000000000408d054:  bdnz    0x408d034

  In this case the source and destination are not 16B aligned, and gcc
  has decided to realign things via a permute. The problem is that this
  code always reads more data than was requested (the excess is just
  thrown away). This is a safe optimisation, but one which confuses
  valgrind.
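
  The over-read described above can be modelled with a little address
  arithmetic. The following is only a sketch: the helper name
  realigned_overread and the struct are hypothetical, and it assumes each
  load is a 16-byte vector load from an address rounded down to a 16-byte
  boundary (as the rldicr in the disassembly suggests). GCC's actual
  realignment scheme may touch a different number of vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u /* assumed width of one vector load */

/* Hypothetical helper type: bytes touched outside [src, src + n). */
struct overread {
    size_t before; /* bytes read below src */
    size_t after;  /* bytes read at or above src + n */
};

/* Simplified model: every load covers a whole aligned 16-byte block,
 * so the copy touches everything from src rounded down to the end of
 * the last block containing a requested byte. */
static struct overread realigned_overread(uintptr_t src, size_t n)
{
    assert(n > 0);
    uintptr_t mask  = ~(uintptr_t)(VEC_BYTES - 1);
    uintptr_t first = src & mask;                       /* round down */
    uintptr_t end   = (src + n + VEC_BYTES - 1) & mask; /* round up */
    struct overread r = { (size_t)(src - first),
                          (size_t)(end - (src + n)) };
    return r;
}
```

  Under this model a 16-byte copy from an address ending in 0x8 touches 8
  bytes on either side of the buffer. Those bytes sit in the same aligned
  block as valid data, so the load cannot fault, but memcheck tracks
  validity per byte and reports the access.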

  The simple fix is to override any optimise flags and build
  shared/vg_replace_strmem.c with -O2.

  Some of the commit messages on shared/vg_replace_strmem.c suggest we
  would like these loops to be autovectorised for performance, but I'm
  not sure if we can do that and avoid gcc tricks that read in too much
  data.

  == Comment: #1 - William J. Schmidt <wschmidt at us.ibm.com> - 2015-03-18 09:23:23 ==
  Hi Anton,

  Note there is a pending fix to GCC that will avoid the realignment
  code for POWER8, where unaligned load cost is much lower than
  previously.  See
  https://bugzilla.linux.ibm.com/show_bug.cgi?id=122395.

  The current status is that the GCC trunk is closed until GCC 5
  releases.  Once that occurs, I will be backporting the fix to 5, 4.9,
  and 4.8 where it can get picked up at the next opportunity by each of
  the distros.  We will also provide it in the next releases of the
  Advance Toolchain (AT7, AT8, AT9).

  == Comment: #2 - David Heller <hellerda at us.ibm.com> - 2015-03-19 01:28:25 ==
  So is the short-term fix to build valgrind (or at least the one
  module) with -O2? And is that what we want to ask Canonical to do?

  == Comment: #3 - William J. Schmidt <wschmidt at us.ibm.com> - 2015-03-19 09:37:57 ==
  For 15.04, yes, that would be best.  The GCC schedules make it
  impossible for us to fix the compiler in time for 15.04.

  Note that a less invasive change to the build would be to replace
  -O3 with -O3 -fno-tree-vectorize.  I'd predict this will still solve
  the problem.
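
  For illustration only, the same effect can be scoped to a single
  function rather than a whole file by using GCC's optimize function
  attribute. This is not what the thread proposes (the suggestion above is
  a compile-flag change), and bytewise_copy and NO_VECTORIZE are
  hypothetical names, not valgrind's actual replacement code. The GCC
  manual also advises that this attribute is intended for debugging rather
  than production builds, so treat this as a sketch.

```c
#include <stddef.h>

/* The optimize attribute is GCC-specific; expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Hypothetical byte-at-a-time copy. With the attribute applied, GCC is
 * asked not to run the tree vectoriser on this function even when the
 * rest of the file is built with -O3. */
NO_VECTORIZE
static void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

  The command-line form, -O3 -fno-tree-vectorize, applies the same setting
  to every function in the translation unit and does not require touching
  the source.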

  We will be fixing this properly in time for 15.10, so Canonical should
  treat this as a one-time change.
