[Bug 1640518] Comment bridged from LTC Bugzilla

bugproxy bugproxy at us.ibm.com
Thu Nov 10 00:48:04 UTC 2016


- python ./buildscripts/scons.py CC=/usr/bin/gcc CXX=/usr/bin/g++
CCFLAGS="-mcpu=power8 -mtune=power8 -mcmodel=medium" --ssl
--implicit-cache --build-fast-and-loose -j ./mongo ./mongod ./mongos

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to gcc-5 in Ubuntu.
https://bugs.launchpad.net/bugs/1640518

Title:
  MongoDB Memory corruption

Status in gcc-5 package in Ubuntu:
  New

Bug description:
  == Comment: #0 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-01 23:09:10 ==
  The team has moved to bare-metal Ubuntu 16.04. The problem still exists, so it is not related to virtualization.

  Since the bug is complicated to reproduce, could we use a set of tools
  to collect data when it happens?

  
  ---Problem Description---
  MongoDB has memory corruption issues that occur only on Ubuntu 16.04; they do not occur on Ubuntu 15.
   
  Contact Information =Calvin Sze/Austin/IBM
   
  ---uname output---
  Linux master 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = Model: 2.1 (pvr 004b 0201); Model name: POWER8E (raw), altivec supported
   
  ---System Hang---
   the system is still alive
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Unfortunately, not very easily. I had a test case that I was running on ubuntu1604-ppc-dev.pic.build.10gen.cc and xxxx-ppc-dev.pic.build.10gen.cc. I understand these to be two VMs running on the same physical host.

  About 3.5% of the test runs on ubuntu1604-ppc-dev.pic.build.10gen.cc
  would fail, but all of the runs on the other machine passed.
  Originally, this failure manifested as the GCC stack protector (from
  -fstack-protector-strong) claiming stack corruption.

  Hoping to be able to see the data that was being written and
  corrupting the stack, I manually injected a guard region into the
  stack of the failing functions as follows:


  
  +namespace {
  +
  +class Canary {
  +public:
  +
  +    static constexpr size_t kSize = 1024;
  +
  +    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
  +        ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
  +    }
  +
  +    ~Canary() {
  +        _verify();
  +    }
  +
  +private:
  +    static constexpr uint8_t kBits = 0xCD;
  +    static constexpr size_t kChecksum = kSize * size_t(kBits);
  +
  +    void _verify() const noexcept {
  +        invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
  +    }
  +
  +    const volatile unsigned char* const _t;
  +};
  +
  +}  // namespace
  +

   Status bsonExtractField(const BSONObj& object, StringData fieldName, BSONElement* outElement) {
  +
  +    volatile unsigned char* const cookie = static_cast<unsigned char *>(alloca(Canary::kSize));
  +    const Canary c(cookie);
  + 

  When running with this, the invariant would sometimes fire. Examining
  the stack cookie under the debugger would show two consecutive bytes,
  always at an offset ending 0x...e, written as either 0 0, or 0 1,
  somewhere at random within the middle of the cookie.

  This indicated that it was not a conventional stack smash, where we
  were writing past the end of a contiguous buffer. Instead it appeared
  that either the currently running thread had reached up some arbitrary
  and random amount on the stack and done either two one-byte writes, or
  an unaligned 2-byte write. Another possibility was that a local
  variable had been transferred to another thread, which had written to
  it.

  However, while looking at the code to find such a thing, I realized
  that there was another possibility, which was that the bytes had never
  been written correctly in the first place. I changed the stack canary
  constructor to be:


  +    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
  +        ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
  +        _verify();
  +    }  

  So that immediately after writing the byte pattern to the stack
  buffer, we verified the contents we wrote. Amazingly, this *failed*,
  with the same corruption as seen before. This means that either
  between the time we called memset to write the bytes and when we read
  them back, something either overwrote the stack cookie region, or that
  the bytes were never written correctly by memset, or that memset wrote
  the bytes, but the underlying physical memory never took the write.
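
  The fill-then-verify experiment described above can be sketched without any MongoDB code, along the lines of the standalone test case requested in this thread. This is only a sketch under assumptions: the names fillAndVerify and runTrials are hypothetical, a fixed-size local array stands in for alloca, and nothing here is known to reproduce the failure.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kSize = 1024;
constexpr unsigned char kBits = 0xCD;
constexpr unsigned long kChecksum = kSize * static_cast<unsigned long>(kBits);

// Poison a stack buffer with memset, then immediately sum it back.
// Reading through a volatile pointer keeps the compiler from eliding the loads.
bool fillAndVerify() {
    unsigned char buf[kSize];
    std::memset(buf, kBits, kSize);

    const volatile unsigned char* p = buf;
    unsigned long sum = 0;
    for (std::size_t i = 0; i < kSize; ++i) {
        sum += p[i];
    }
    return sum == kChecksum;
}

// Hypothetical driver: repeat the check and count how often it fails.
unsigned long runTrials(unsigned long iterations) {
    unsigned long failures = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        if (!fillAndVerify()) {
            ++failures;
        }
    }
    return failures;
}
```

  A small main() that calls runTrials() with a large iteration count and prints the result would turn this into a long-running soak test. Since the real failure is reported only under MongoDB's multi-threaded workload, a single-threaded loop like this may well never fail.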



   
  Stack trace output:
   no
   
  Oops output:
   no
   
  Userspace tool common name: MongoDB 

  Userspace rpm: mongod 
   
  The userspace tool has the following bit modes: 64bit 
   
  System Dump Info:
    The system is not configured to capture a system dump.

  Userspace tool obtained from project website:  na 
   
  *Additional Instructions for Lilian Romero/Austin/IBM: 
  -Post a private note with access information to the machine that the bug is occurring on. 
  -Attach sysctl -a output to the bug.
  -Attach ltrace and strace of userspace application.

  == Comment: #1 - Luciano Chavez <chavez at us.ibm.com> - 2016-11-02 08:41:47 ==
  Normally, for userspace memory corruption problems I would recommend Valgrind's memcheck tool. However, if this works on other versions of Linux, one would want to compare the differences, such as whether you are using the same versions of MongoDB, gcc, glibc, and the kernel.

  Has a standalone testcase been produced that shows the issue without
  mongodb?

  == Comment: #2 - Steven J. Munroe <sjmunroe at us.ibm.com> - 2016-11-02 10:27:40 ==
  We really need that standalone test case.

  Need to look at WHAT c++ is doing with memset. I suspect the compiler
  is short circuiting the function and inlining. That is what you would
  want for optimization, but we need to know so we can steer this to the
  correct team.

  == Comment: #3 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-02 13:17:30 ==
  Hi Luciano and Steve, thanks for the advice.

  They don't have a standalone test case without MongoDB. I imagine it
  would take a while and would not be easy to produce. I am seeking
  your advice on how to approach this. The failure takes at least 24 - 48
  hours of running to reproduce. Steve, do you have what you need for the
  C++ test, or is there something I need to ask the Mongo development team?

  Thanks

  == Comment: #4 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-02 16:29:26 ==
  (In reply to comment #3)
  > Hi Luciano and Steve, Thanks for the advise,
  > 
  > They don't have a standalone test case without Mongodb,  I could image it
  > take a while and probably not that easy to produce.  I am seeking your
  > advise how to approach this.  The failure takes at least 24 - 48 hours
  > running to reproduce.  Steve, do you have what you needed for C++ test,  or
  > there is something I need to ask Mongo development team?
  > 
  > Thanks

  It's unclear to me yet that we have evidence of this being a problem
  in the toolchain.  Does the last experiment (revised Canary
  constructor) ALWAYS fail, or does it also fail only every 24 - 48
  hours?  If the latter, then all we know is that stack corruption
  happens.  There's no indication of where the wild pointer is coming
  from (application problem, compiler problem, etc.).  If it does always
  fail, however, then I question the assertion that they can't provide a
  standalone test case.

  We need something more concrete to work with.

  Bill

  == Comment: #5 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-03 18:08:33 ==
  Could this ticket be viewed by an external customer/ISV?
  I am thinking about how to establish direct communication between the MongoDB development team and the experts/owners of the ticket, to bypass the middle man, me :-)

  Here are the answers from Andrew, the MongoDB development director, to my 3
  questions. He also added some comments.

  Basically, there are 3 questions,

  > 1. Is the MongoDB binary built with the gcc that came with the Linux
  > distribution, or with the IBM Advance Toolchain gcc?

  
  We build our own GCC, but we have reproduced the issue with both our custom GCC, and the builtin linux distribution GCC. We have also reproduced with clang 3.9 built from source on the Ubuntu 16.04 POWER machine, so we do not think that this is a compiler issue (could still be a std library issue).

  
  > 2. Does the last experiment (revised Canary constructor) ALWAYS fail, or does it also fail only ever 24 - 48 hours?

  No, we have never been able to construct a deterministic repro. We are
  only able to get it to fail after running the test a very large number
  of times.


  > 3. Is there any way we can have a standalone test case without
  > MongoDB?

  We do not have such a repro at this time.

  I do understand the position they are taking - it isn't a lot of
  information to go on, and most of the time the correct response to a
  mysterious software crash is to blame the software itself, not the
  surrounding ecosystem. However, we have a lot of *indirect* evidence
  that has made us skeptical that this is our bug. We would love to be
  proved wrong!

  
  - The stack corruption has not reproduced on any other systems. We are running these same tests on every commit across dozens of Linux variants, and across four cpu architectures (x86_64, POWER, zSeries, ARMv8).
  - We don't see crashes on other POWER, but we do on Ubuntu POWER.
  - We don't see crashes on Windows, Solaris, OS X
  - We have run the tests under the clang address sanitizer, with no reports.
  - We have enabled the clang address sanitizer use-after-return detector, and found no results.

  
  If this were a wild pointer in the MongoDB server process that was writing to the stack of other threads, we would expect to see corruption show up elsewhere, but we simply do not. 

  However, let's assume that this is a bug in our code that, for whatever
  reason, only reveals itself on POWER, and only on Ubuntu. We would
  still be interested in learning from the kernel team whether there are
  additional POWER-specific debugging techniques that we might be able
  to apply, in particular the ability to programmatically set/unset
  hardware watchpoints over the stack canary. Another possibility would
  be to mprotect the stack canary, but it is not clear to us whether it
  is valid to mprotect part of the stack, either in general or on
  POWER.
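
  As background for the mprotect idea: mprotect works on whole pages and needs a page-aligned address, so a 1 KiB canary in the middle of a stack frame cannot be protected by itself. The sketch below therefore uses a separately mmap'd page as a stand-in for the canary. It does not answer whether protecting part of the stack is valid, and the helper name guardPageRoundTrip is hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>

// Map one anonymous page, poison it, make it read-only, and confirm the
// pattern is still intact. While the page is read-only, any stray write to it
// raises SIGSEGV at the faulting instruction instead of corrupting it silently.
bool guardPageRoundTrip() {
    const long pageSize = ::sysconf(_SC_PAGESIZE);
    if (pageSize <= 0) {
        return false;
    }
    const std::size_t len = static_cast<std::size_t>(pageSize);

    void* mem = ::mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }

    std::memset(mem, 0xCD, len);
    bool ok = (::mprotect(mem, len, PROT_READ) == 0);

    if (ok) {
        const unsigned char* p = static_cast<const unsigned char*>(mem);
        for (std::size_t i = 0; i < len; ++i) {
            if (p[i] != 0xCD) {
                ok = false;
                break;
            }
        }
    }

    ::munmap(mem, len);
    return ok;
}
```

  If the pattern is already wrong right after the memset, the read-back check reports it. If a later writer is responsible, the resulting fault would identify it directly.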

  We would be happy to hear any suggestions on how to proceed.

  
  Thanks,
  Andrew

  == Comment: #6 - Steven J. Munroe <sjmunroe at us.ibm.com> - 2016-11-03 18:34:30 ==
  You could tell us which specific GCC version you are based on, and its configure options.

  You could also provide the disassembly of the canary code.

  == Comment: #7 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-03 23:01:55 ==
  It would be useful to see what the Canary is compiled into, as Steve suggested.  Let's make sure it's doing what we think it is.

  Given we have multiple compilers producing the same results, we may
  want to think more about the runtime environment -- are you using the
  same glibc and libstdc++ in all cases?  Clang at least would pick up
  the distro versions, as it doesn't provide its own.

  One reason you see this on Ubuntu 16.04 and not on another linux
  distro is likely because of glibc level.  The other linux's glibc is
  quite old by comparison.  glibc 2.23, which appears on Ubuntu 16.04,
  is the first version to be compiled with -fstack-protector-strong by
  default.  So this doesn't necessarily mean that the bug doesn't exist
  elsewhere; it just means that the stack protector code isn't enabled
  to spot the problem.  If the stack corruption is benign, then it
  wouldn't be noticed otherwise.

  I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5
  that ships with the system, in case that becomes relevant.

  I don't personally have a lot of experience with trying to debug
  something of this nature, in case we don't see something obvious from
  the disassembly of the canary.  CCing Ulrich Weigand in case he has
  some ideas of other approaches to try.

  == Comment: #9 - Ulrich Weigand <Ulrich.Weigand at de.ibm.com> - 2016-11-04 12:21:48 ==
  I don't really have any other great ideas either.   Just two comments:

  - Even though the original reporter mentioned they already tried
  clang's address sanitizer, I'd definitely still also try reproducing
  the problem under valgrind -- the two are different in what exactly
  they detect, and using both tools in a complex problem can only help.

  - The Canary code sample above has strictly speaking undefined
  behavior, I think: it is calling memset on a const *.  (The const_cast
  makes the warning go away, but doesn't actually cure the undefined
  behavior.)  I don't *think* this will cause codegen changes in this
  example, but it cannot hurt to try to fix this and see if anything
  changes.
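
  One way to sidestep the cast question entirely is to write the pattern through the volatile pointer itself, so that no const_cast is needed. The sketch below is an illustration only: it is not the change the MongoDB team made, the names CanarySketch and canaryIntactOnFreshBuffer are hypothetical, and because it replaces memset with a byte loop it no longer exercises glibc's memset.

```cpp
#include <cstddef>

class CanarySketch {
public:
    static constexpr std::size_t kSize = 1024;

    // Fill the region with volatile byte stores; no cast is required.
    explicit CanarySketch(volatile unsigned char* t) noexcept : _t(t) {
        for (std::size_t i = 0; i < kSize; ++i) {
            _t[i] = kBits;
        }
    }

    // Sum the region back and compare against the expected checksum.
    bool intact() const noexcept {
        unsigned long sum = 0;
        for (std::size_t i = 0; i < kSize; ++i) {
            sum += _t[i];
        }
        return sum == kChecksum;
    }

private:
    static constexpr unsigned char kBits = 0xCD;
    static constexpr unsigned long kChecksum =
        kSize * static_cast<unsigned long>(kBits);

    volatile unsigned char* const _t;
};

// Hypothetical usage: guard a local buffer and check it straight away.
bool canaryIntactOnFreshBuffer() {
    unsigned char buf[CanarySketch::kSize];
    CanarySketch canary(buf);
    return canary.intact();
}
```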

  == Comment: #12 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 10:32:25 ==
  Hi Bill, Thanks

  I have asked Andrew, waiting for his confirmation.

  == Comment: #14 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 10:56:49 ==
  Hi Calvin -

  
  I can provide the assembly of the function that contains the canary (the canary itself gets inlined), but I think it might just be easier if I uploaded a binary and an associated corefile? That way your engineers could disassemble the crashing function themselves in the debugger and see exactly what the state was at the time of the crash.

  
  What is the best way for me to get that information to you?

  
  Thanks,
  Andrew

  == Comment: #15 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 10:58:54 ==
  Provided the binary and core information.

  Note from Mongo;

  I've uploaded a sample core file and the associated binary to your ftp
  server as detailed above.  The binary is named `mongod.power` and the core is
  named `mongod.power.core`.

  You should expect to see a backtrace on the faulting thread which looks
  like this (for the first few frames):

  (gdb) bt
  #0  0x00003fff997be5d0 in __libc_signal_restore_set (set=0x3fff5814c1f0)
      at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
  #1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
  #2  0x00003fff997c0c00 in __GI_abort () at abort.c:89
  #3  0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>, 
      file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp", 
      line=<optimized out>) at src/mongo/util/assert_util.cpp:154
  #4  0x00000000224bbc48 in mongo::(anonymous namespace)::Canary::_verify (
      this=<optimized out>) at src/mongo/bson/util/bson_extract.cpp:58

  
  The "Canary::_verify" frame (number 4) has a local variable "_t" which is an 
  on-the-stack array and filled with "0xcd" for a span of 1024 bytes.  Near the 
  end of this block we see two bytes of poisoned memory which were altered:

  0x3fff5814c858: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
  0x3fff5814c860: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
  0x3fff5814c868: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0x01    0x00
  0x3fff5814c870: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
  0x3fff5814c878: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd

  
  Note the two bytes set to values "0x01" and "0x00".

  At the time of core-dump all the other threads seemed to be paused on system 
  calls such as "recv" or "__pthread_cond_wait".  The verify function is called 
  when setting up our software canary, and checks the memory immediately after 
  its setup.  We do not run any other functions on this thread between the 
  memory poisoning and the verification of the poisoning.  All other threads 
  appear to be paused at this time.
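
  A checksum only says that something changed. To see which bytes changed without opening a core file, the verification step could record the offsets that differ from the poison value. The helper below is a hypothetical sketch of that idea; findCorruptOffsets is not part of the MongoDB canary.

```cpp
#include <cstddef>
#include <vector>

// Return the offsets of all bytes in [buf, buf + size) that differ from the
// expected poison value.
std::vector<std::size_t> findCorruptOffsets(const unsigned char* buf,
                                            std::size_t size,
                                            unsigned char expected) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < size; ++i) {
        if (buf[i] != expected) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

  For the dump above, such a scan would report the two bytes at addresses 0x3fff5814c86e and 0x3fff5814c86f, matching the earlier observation that the corruption starts at an offset ending in 0x...e.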

  == Comment: #16 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 10:59:40 ==
  A follow up message from Mongo

  The function calling the canary code, which you'll want to possibly 
  disassemble is in frame 6:

  #6  mongo::bsonExtractStringField (object=..., fieldName=..., 
      out=0x3fff5814caa8) at src/mongo/bson/util/bson_extract.cpp:138

  The lower numbered frames deal with the canary code
  itself.

  == Comment: #17 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 11:03:46 ==
  From Andrew,

  >Given we have multiple compilers producing the same results, we may want to
  >think more about the runtime environment -- are you using the same glibc and
  >libstdc++ in all cases? Clang at least would pick up the distro versions, as
  >it doesn't provide its own.

  We have repro'd with three compilers:

  - The system GCC, using system libstdc++ and system glibc
  - Our hand-rolled GCC, using its own libstdc++, and system glibc
  - One off clang-3.9 build, using system libstdc++, and system glibc.

  
  Coincidentally, both system and hand-rolled GCC are 5.4.0, so there may not be as much variation there as hoped. We could try building with clang and libc++ to at least rule out libstdc++ as a factor.
   

  >One reason you see this on Ubuntu 16.04 and not on the other linux distro is likely because of
  >glibc level. The other linux distro's glibc is quite old by comparison. glibc 2.23, which
  >appears on Ubuntu 16.04, is the first version to be compiled with
  >-fstack-protector-strong by default.

  I'm not sure I follow. Our software has been built with -fstack-protector-strong on both platforms, whether or not glibc has been, and the invocation of the __stack_chk_fail function is always from our code, not from glibc, or libstdc++. So, I'd expect that if there were stack corruption taking place as a result of our code, we would see the stack protector trip on both platforms. Or are you saying that on platforms where glibc itself wasn't built with -fstack-protector-whatever that user code built with that same flag won't report errors?
   
  >So this doesn't necessarily mean that the
  >bug doesn't exist elsewhere; it just means that the stack protector code isn't
  >enabled to spot the problem. If the stack corruption is benign, then it
  >wouldn't be noticed otherwise.

  Yeah, still confused. I can definitely make the other linux distro box
  report a stack corruption:

  
  [amorrow at xxxx-ppc-dev.pic.build ~]$ cat > boom.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  
  struct no_chars {
      unsigned int len;
      unsigned int data;
  };

  
  int main(int argc, char * argv[])
  {
      struct no_chars info = { };

  
      if (argc < 3) {
          fprintf(stderr, "Usage: %s LENGTH DATA...\n", argv[0]);
          return 1;
      }

  
      info.len = atoi(argv[1]);
      memcpy(&info.data, argv[2], info.len);

  
      return 0;
  }
  [amorrow at rhel71-ppc-dev.pic.build ~]$ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-strong boom.c -o boom

  
  [amorrow at rhel71-ppc-dev.pic.build ~]$ ./boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  *** stack smashing detected ***: ./boom terminated
  Segmentation fault


  
  >I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5 that ships
  >with the system, in case that becomes relevant.

  
  Correct, we have not made any changes to glibc - we are using the stock version that ships on the system.

  == Comment: #18 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 11:04:24 ==
  From Andrew

  Also, I want to re-iterate that while we have definitely observed
  cases where the stack protector detects the stack corruption, we have
  also observed stack corruption within our own hand-rolled stack
  buffer, per the code posted earlier. The core dump that Adam provided
  is of this latter sort. So to some extent, this is independent of
  -fstack-protector-strong.

  
  One thing that I have not yet ruled out is whether -fstack-protector-strong could itself be at fault, somehow, though I find that unlikely given that we have reproduced with clang as well.

  
  Still, it sounds like a worthwhile experiment, so I will see if I can still detect corruption in our hand-rolled stack canary when building without any form of -fstack-protector enabled.

  == Comment: #19 - Calvin L. Sze <calvins at us.ibm.com> - 2016-11-06 11:05:58 ==
  From Andrew,

  
  I've performed this experiment, replacing our use of -fstack-protector-strong with -fno-stack-protector when building MongoDB, and I can confirm that we still observe stack corruption in our hand-rolled canary, per the code posted earlier.

  
  I have a core file and executable. Let me know if you would be interested in my providing those in addition to the files provided yesterday by Adam.

  == Comment: #21 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-07 11:10:54 ==
  Andrew, thanks for all the details, and for the binary and core file!  I'll start poking through them this morning.  I've just been absorbing all the notes that Calvin dumped into our bug tracking system yesterday.

  You can ignore what I was saying about -fstack-protector-strong.  My
  thought at the time was that *if* the flow of control entered glibc,
  whether or not the code *there* was compiled with
  -fstack-protector-strong might prove to make a difference.  Reading back
  through today, I see that was off base, so sorry for the distraction.

  While I'm looking at the binary, there are a couple of other things you might want to try:
   - Replace ::memset with __builtin_memset with GCC to see whether that makes any difference;
   - Try Ulrich Weigand's suggestions from comment #9;
   - As you suggested, try clang + libc++ to try to rule libstdc++ in or out.

  A couple of questions that may or may not prove relevant:  
   - You've mentioned you don't get the crashes on the other linux distro.  Have you tried your modified canary on the other linux distro anyway?  If we're certain the two systems behave differently with the canary that may help us in narrowing things down.
   - Which version of the C++ standard are you compiling against?  Is it just the default on all systems, or are you forcing a specific -std=...?

  == Comment: #22 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-07 12:18:41 ==
  I'm having some difficulties with core file compatibility.  I put your files on an Ubuntu 16.04.1 system, but I don't see quite the same results as you report under gdb, with libc and libgcc shared libs not at the correct address and a problem with the stack.  There's a transcript below.  I'm particularly concerned about the warning that the core file and executable may not match.  Note also the report of stack corruption above frame #4, so I can't get to frame #6 to look at the register state.  The library frames at #0-#3 are reporting the wrong information, which I assume to be because the libraries are at the wrong address.

  For debug purposes it would probably be best to use the system
  compiler, just in case that wasn't the case here.

  $ ls -l
  total 1950688
  -rw-r--r-- 1 wschmidt wschmidt  700141992 Nov  7 14:37 mongod.power
  -rw-r--r-- 1 wschmidt wschmidt 1297350656 Nov  7 14:39 mongod.power.core
  $ gdb mongod.power mongod.power.core
  GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
  Copyright (C) 2016 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "powerpc64le-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>.
  Find the GDB manual and other documentation resources online at:
  <http://www.gnu.org/software/gdb/documentation/>.
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from mongod.power...done.

  warning: core file may not match specified executable file.
  [New LWP 101461]
  [New LWP 100045]
  [New LWP 100062]
  [New LWP 100056]
  [New LWP 99983]
  [New LWP 100052]
  [New LWP 100054]
  [New LWP 99892]
  [New LWP 100051]
  [New LWP 100048]
  [New LWP 100007]
  [New LWP 99868]
  [New LWP 100059]
  [New LWP 101459]
  [New LWP 100001]
  [New LWP 99986]
  [New LWP 101403]
  [New LWP 99980]
  [New LWP 99882]
  [New LWP 99893]
  [New LWP 99877]
  [New LWP 99872]
  [New LWP 101462]
  [New LWP 99874]
  [New LWP 100058]
  [New LWP 100231]
  [New LWP 99994]
  [New LWP 99873]
  [New LWP 100003]
  [New LWP 99993]
  [New LWP 99879]
  [New LWP 101398]
  [New LWP 99891]
  [New LWP 99880]
  [New LWP 99910]
  [New LWP 99895]
  [New LWP 99901]
  [New LWP 100011]
  [New LWP 99974]
  [New LWP 100049]
  [New LWP 99898]
  [New LWP 99875]
  [New LWP 101460]
  [New LWP 99878]
  [New LWP 99871]
  [New LWP 99896]
  [New LWP 101954]
  [New LWP 101406]
  [New LWP 100015]
  [New LWP 100068]
  [New LWP 99984]
  [New LWP 101519]
  [New LWP 100053]
  [New LWP 99996]
  [New LWP 100050]
  [New LWP 100055]
  [New LWP 100057]
  [New LWP 101807]
  [New LWP 99890]
  [New LWP 100004]
  [New LWP 99884]
  [New LWP 101437]
  [New LWP 101455]
  [New LWP 100013]
  [New LWP 99894]
  [New LWP 101411]
  [New LWP 101457]
  [New LWP 101431]
  [New LWP 101458]
  [New LWP 100443]
  [New LWP 101438]
  [New LWP 101414]
  [New LWP 101433]
  [New LWP 101784]
  [New LWP 99979]
  [New LWP 101397]
  [New LWP 101402]
  [New LWP 101401]
  [New LWP 101435]
  [New LWP 101405]
  [New LWP 101423]
  [New LWP 101425]
  [New LWP 99897]
  [New LWP 101419]
  [New LWP 99989]
  [New LWP 101409]
  [New LWP 100008]
  [New LWP 101410]
  [New LWP 99998]
  [New LWP 101413]
  [New LWP 101469]
  [New LWP 101418]
  [New LWP 101427]
  [New LWP 101399]
  [New LWP 101235]
  [New LWP 101396]
  [New LWP 101421]
  [New LWP 99990]
  [New LWP 101407]
  [New LWP 101480]
  [New LWP 100060]
  [New LWP 101499]
  [New LWP 101506]
  [New LWP 101395]
  [New LWP 101415]
  [New LWP 101400]
  [New LWP 101412]
  [New LWP 101408]
  [New LWP 101420]
  [New LWP 101416]
  [New LWP 101492]
  [New LWP 101513]
  [New LWP 101782]
  [New LWP 101404]
  [New LWP 101481]
  [New LWP 101417]
  [New LWP 100067]
  [New LWP 101429]
  [New LWP 99883]
  [New LWP 101430]
  [New LWP 101436]
  [New LWP 101454]
  [New LWP 101428]
  [New LWP 101422]
  [New LWP 100108]
  [New LWP 101434]
  [New LWP 100064]
  [New LWP 101453]
  [New LWP 100061]
  [New LWP 101426]
  [New LWP 100066]
  [New LWP 101452]
  [New LWP 101439]
  [New LWP 101456]
  [New LWP 101451]
  [New LWP 101450]
  [New LWP 101432]
  [New LWP 101449]
  [New LWP 101424]
  [New LWP 100065]
  [New LWP 100063]
  [New LWP 101448]
  [New LWP 101447]
  [New LWP 101446]
  [New LWP 101445]
  [New LWP 101444]
  [New LWP 101443]
  [New LWP 101442]
  [New LWP 101441]
  [New LWP 101440]

  warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)

  warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libc.so.6" is not at the expected address (wrong library or version mismatch?)
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
  Core was generated by `/home/pic1user/proj/mongo-repro/mongod --oplogSize 1024 --port 30012 --nopreall'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
      at ../sysdeps/generic/math_private.h:233
  233	../sysdeps/generic/math_private.h: No such file or directory.
  [Current thread is 1 (Thread 0x3fff5814ec20 (LWP 101461))]
  (gdb) bt
  #0  0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
      at ../sysdeps/generic/math_private.h:233
  #1  __modf_power5plus (x=-6.2774385622041925e+66, iptr=0x3fff5814c1f0)
      at ../sysdeps/powerpc/power5+/fpu/s_modf.c:44
  #2  0x00003fff997be4f0 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
  #3  0x00003fff997c0c00 in ?? () at ../signal/allocrtsig.c:45
     from /lib/powerpc64le-linux-gnu/libc.so.6
  #4  0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>, 
      file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp", 
      line=<optimized out>) at src/mongo/util/assert_util.cpp:154
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)
  (gdb) quit
  $ gcc -v
  Using built-in specs.
  COLLECT_GCC=gcc
  COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc64le-linux-gnu/5/lto-wrapper
  Target: powerpc64le-linux-gnu
  Configured with: ../src/configure -v --with-pkgversion='Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-ppc64el --with-arch-directory=ppc64le --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-secureplt --with-cpu=power8 --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror --with-long-double-128 --enable-checking=release --build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu
  Thread model: posix
  gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) 
  $ lsb_release -a
  No LSB modules are available.
  Distributor ID:	Ubuntu
  Description:	Ubuntu 16.04.1 LTS
  Release:	16.04
  Codename:	xenial
  $ 

  
  I'll disassemble the binary and see if I can spot anything without the state information.

  Oh, still waiting on permission to mirror the bug.

  == Comment: #23 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-07 13:39:45 ==
  A little more information:

  I've been looking at bsonExtractStringField's disassembly.  It appears
  that this binary inlines the call to the Canary constructor as well as
  the call to _verify.  As evidence, I see the PLT call to glibc's
  memset:

    8ebb3c:       71 c9 06 48     bl      9584ac <00000d72.plt_call.memset@@GLIBC_2.17>

  And later I see the call to invariantFailed:

    8ebc44:       e9 75 f0 4b     bl      7f322c <_ZN5mongo15invariantFailedEPKcS1_j+0x8>

  So we've answered Steve's initial question about which memset we're
  using.  The call isn't being inlined by the compiler; it is an
  out-of-line dynamic call to the GLIBC_2.17 version.

  I'm not sure whether GCC would inline a 1024-byte memset using
  __builtin_memset, or just end up calling out the same way, but it
  might be worth trying out that replacement, and disassembling
  bsonExtractStringField again to see if the PLT call has gone away.
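
  To compare the two spellings in isolation, a small translation unit like the one below could be compiled with the same flags as the MongoDB build and then disassembled. The function names are hypothetical. GCC normally treats a plain memset call as the builtin already (unless -fno-builtin is given), so the two functions may well compile to identical code. That is something to confirm in the disassembly, not to assume.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kSize = 1024;

// Library spelling: may become a PLT call to glibc's memset.
void fillWithLibraryCall(unsigned char* buf) {
    std::memset(buf, 0xCD, kSize);
}

// Builtin spelling: the compiler may expand it inline or still call memset.
void fillWithBuiltin(unsigned char* buf) {
    __builtin_memset(buf, 0xCD, kSize);
}

// Check that every byte of the buffer holds the poison value.
bool allPoisoned(const unsigned char* buf) {
    for (std::size_t i = 0; i < kSize; ++i) {
        if (buf[i] != 0xCD) {
            return false;
        }
    }
    return true;
}
```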

  == Comment: #24 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-07 13:50:04 ==
  I forgot to mention that the ensuing code generation to accumulate the checksum and test it is completely straightforward and looks correct.  So this looks like pretty strong evidence that the problem is in the GLIBC memset implementation.

    8ebb3c:       71 c9 06 48     bl      9584ac <00000d72.plt_call.memset@@GLIBC_2.17>
    8ebb40:       18 00 41 e8     ld      r2,24(r1)
    8ebb44:       00 04 40 39     li      r10,1024
    8ebb48:       00 00 20 39     li      r9,0
    8ebb4c:       a6 03 49 7d     mtctr   r10
    8ebb50:       00 00 43 89     lbz     r10,0(r3)
    8ebb54:       01 00 63 38     addi    r3,r3,1
    8ebb58:       14 52 29 7d     add     r9,r9,r10
    8ebb5c:       f4 ff 00 42     bdnz    8ebb50 <_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x80>
    8ebb60:       03 00 40 3d     lis     r10,3
    8ebb64:       00 34 4a 61     ori     r10,r10,13312
    8ebb68:       00 50 a9 7f     cmpd    cr7,r9,r10
    8ebb6c:       c4 00 9e 40     bne     cr7,8ebc30 <_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x160>

  ...

    8ebc30:       44 ff 82 3c     addis   r4,r2,-188
    8ebc34:       44 ff 62 3c     addis   r3,r2,-188
    8ebc38:       3a 00 a0 38     li      r5,58
    8ebc3c:       38 aa 84 38     addi    r4,r4,-21960
    8ebc40:       60 aa 63 38     addi    r3,r3,-21920
    8ebc44:       e9 75 f0 4b     bl      7f322c <_ZN5mongo15invariantFailedEPKcS1_j+0x8>
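
  The constant in the compare can be checked by hand: lis r10,3
  followed by ori r10,r10,13312 builds (3 << 16) | 13312 = 209920 =
  0x33400, which equals 1024 * 0xCD, the expected sum of 1024 bytes of
  0xCD.  A small sketch of the same check in C++ follows; the names
  byteSum, kExpectedFromAsm and canaryIntact are illustrative and not
  taken from the MongoDB source.

```cpp
#include <cstddef>
#include <cstdint>

// Sum of `size` bytes starting at `p`, mirroring the lbz/add/bdnz loop.
std::uint64_t byteSum(const unsigned char* p, std::size_t size) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += p[i];
    }
    return sum;
}

// Constant built by "lis r10,3; ori r10,r10,13312".
constexpr std::uint64_t kExpectedFromAsm = (std::uint64_t(3) << 16) | 13312;

static_assert(kExpectedFromAsm == 1024 * 0xCD, "immediate equals 1024 * 0xCD");
static_assert(kExpectedFromAsm == 0x33400, "same value in hex");

// True if a 1024-byte buffer still sums to the expected checksum.
bool canaryIntact(const unsigned char* p) {
    return byteSum(p, 1024) == kExpectedFromAsm;
}
```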

  == Comment: #28 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-08 11:02:18 ==
  Recording some information from email discussions.

  (1) The customer is planning to attempt to use valgrind memcheck.
  (2) The const cast problem with the canary has been fixed without changing the results.
  (3) Prior to that fix, the canary was used on the RHEL system with no corruption detected, so this does seem to be Ubuntu-specific.
  (4) -std=c++11 is used everywhere.
  (5) The core and binary compatibility issues appear to be that they were generated on 16.10, not 16.04.  New ones coming.
  (6) The canary code now looks like:

  +namespace {
  +
  +class Canary {
  +public:
  +
  +    static constexpr size_t kSize = 2048;
  +
  +    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
  +        __builtin_memset(const_cast<unsigned char*>(t), kBits, kSize);
  +        _verify();
  +    }
  +
  +    ~Canary() {
  +        _verify();
  +    }
  +
  +private:
  +    static constexpr uint8_t kBits = 0xCD;
  +    static constexpr size_t kChecksum = kSize * size_t(kBits);
  +
  +    void _verify() const noexcept {
  +        invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
  +    }
  +
  +    const volatile unsigned char* const _t;
  +};
  +
  +}  // namespace
  +

  And its application in bsonExtractTypedField looks like:

  @@ -47,6 +82,10 @@ Status bsonExtractTypedField(const BSONObj& object,
                                StringData fieldName,
                                BSONType type,
                                BSONElement* outElement) {
  +
  +    volatile unsigned char* const cookie = static_cast<unsigned char *>(alloca(Canary::kSize));
  +    const Canary c(cookie);
  +
       Status status = bsonExtractField(object, fieldName, outElement);

  (7) Steve Munroe investigated memset and he and Andrew are in
  agreement that we can rule it out:

  I looked at the memset_power8 code (memset is just an IFUNC resolver
  stub), and I don't see how this problem could be caused by
  memset_power8.

  First some observations:

  The canary is allocated with alloca for a large power of 2 (1024 bytes).
  alloca returns quadword-aligned memory, as required to maintain quadword stack alignment.
  For this case memset_power8 will quickly jump to the vector store loop (quadword x 8), with every store coming from the same register (a vector splat of the fill char).

  With this code the failure modes could only be:
  Overwrite by N*quadwords,
  Underwrite by N*quadwords,
  A repeated pattern every quadword.

  But we are not seeing this. I also think we are back to a clobber by
  some other code.
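
  One way to test the failure modes listed above against a captured
  canary buffer is to find the bytes that differ from the fill value
  and check whether the damaged range is made of whole, aligned
  quadwords.  The sketch below assumes the buffer has been copied out
  of a core file and that offset 0 is quadword aligned; the names
  Damage and classifyDamage are hypothetical and not part of the
  MongoDB patch.  It covers only the quadword-granularity check, not
  the repeated-pattern case.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kQuadword = 16;  // bytes in one quadword

// Summary of the bytes that no longer hold the fill value.
struct Damage {
    std::size_t first = 0;  // offset of the first differing byte
    std::size_t last = 0;   // offset of the last differing byte
    std::size_t count = 0;  // number of differing bytes

    bool any() const { return count != 0; }

    // True if the damaged range starts and ends on quadword boundaries,
    // assuming offset 0 of the buffer is itself quadword aligned.
    bool quadwordGranular() const {
        return any() && first % kQuadword == 0 && (last + 1) % kQuadword == 0;
    }
};

// Scans the buffer once and records where it differs from `fill`.
Damage classifyDamage(const std::vector<unsigned char>& buf, unsigned char fill) {
    Damage d;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == fill) {
            continue;
        }
        if (d.count == 0) {
            d.first = i;
        }
        d.last = i;
        ++d.count;
    }
    return d;
}
```

  A result where quadwordGranular() is false, such as a few bytes
  damaged in the middle of the buffer, would be consistent with a
  stray write from other code rather than with memset_power8.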

  == Comment: #29 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-08 11:03:33 ==
  From Andrew, difficulties with Valgrind:

  I did try the valgrind repro. However, I'm not able to make valgrind
  work:

  The first try resulted in lots of "mismatched free/delete" reports,
  which is sort of odd, because they all seem to be from within the
  standard library:

  $ valgrind --soname-synonyms=somalloc=NONE --track-origins=yes --leak-check=no ./mongos
  ==17387== Memcheck, a memory error detector
  ==17387== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
  ==17387== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
  ==17387== Command: ./mongos
  ==17387==
  ==17387== Mismatched free() / delete / delete []
  ==17387==    at 0x4895888: free (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
  ==17387==    by 0x59514F: deallocate (new_allocator.h:110)
  ==17387==    by 0x59514F: deallocate (alloc_traits.h:517)
  ==17387==    by 0x59514F: _M_deallocate_buckets (hashtable_policy.h:2010)
  ==17387==    by 0x59514F: _M_deallocate_buckets (hashtable.h:356)
  ==17387==    by 0x59514F: _M_deallocate_buckets (hashtable.h:361)
  ==17387==    by 0x59514F: _M_rehash_aux (hashtable.h:1999)
  ==17387==    by 0x59514F: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&) (hashtable.h:1953)
  ==17387==    by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
  ==17387==    by 0x5954D3: std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
  ==17387==    by 0x593693: operator[] (unordered_map.h:668)
  ==17387==    by 0x593693: mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (initializer_dependency_graph.cpp:58)
  ==17387==    by 0x591057: mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (global_initializer_registerer.cpp:44)
  ==17387==    by 0x52D46F: __static_initialization_and_destruction_0(int, int) [clone .constprop.34] (mongos_options_init.cpp:39)
  ==17387==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
  ==17387==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
  ==17387==    by 0x4F83337: (below main) (libc-start.c:116)
  ==17387==  Address 0x5151fb0 is 0 bytes inside a block of size 16 alloc'd
  ==17387==    at 0x48951D4: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
  ==17387==    by 0x59328F: allocate (new_allocator.h:104)
  ==17387==    by 0x59328F: allocate (alloc_traits.h:491)
  ==17387==    by 0x59328F: std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > >::_M_allocate_buckets(unsigned long) [clone .isra.108] (hashtable_policy.h:1996)
  ==17387==    by 0x595093: _M_allocate_buckets (hashtable.h:347)
  ==17387==    by 0x595093: _M_rehash_aux (hashtable.h:1974)
  ==17387==    by 0x595093: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&) (hashtable.h:1953)
  ==17387==    by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
  ==17387==    by 0x5954D3: std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
  ==17387==    by 0x59356B: operator[] (unordered_map.h:668)
  ==17387==    by 0x59356B: mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (initializer_dependency_graph.cpp:46)
  ==17387==    by 0x591057: mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<mongo::Status (mongo::InitializerContext*)> const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (global_initializer_registerer.cpp:44)
  ==17387==    by 0x52D46F: __static_initialization_and_destruction_0(int, int) [clone .constprop.34] (mongos_options_init.cpp:39)
  ==17387==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
  ==17387==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
  ==17387==    by 0x4F83337: (below main) (libc-start.c:116)

  
  So, that is a puzzle. I can instruct valgrind to ignore those reports, but it still fails to start, now with something odder:

  $ valgrind --show-mismatched-frees=no --soname-synonyms=somalloc=NONE --track-origins=yes --leak-check=no ./mongos
  ==19834== Memcheck, a memory error detector
  ==19834== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
  ==19834== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
  ==19834== Command: ./mongos
  ==19834==
  MC_(get_otrack_shadow_offset)(ppc64)(off=1688,sz=8)

  Memcheck: mc_machine.c:329 (get_otrack_shadow_offset_wrk): the 'impossible' happened.

  host stacktrace:
  ==19834==    at 0x3808D9B8: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x3808DB5F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x3808DCDB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x38078CE3: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x38076FAB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x380BAA2B: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x381B9BB7: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x380BE19F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x3810D04F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x3810FFEF: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
  ==19834==    by 0x3812BB97: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)

  sched status:
    running_tid=1

  Thread 1: status = VgTs_Runnable (lwpid 19834)
  ==19834==    at 0x4F3AC14: __lll_lock_elision (elision-lock.c:60)
  ==19834==    by 0x4F2BBC7: pthread_mutex_lock (pthread_mutex_lock.c:92)
  ==19834==    by 0x602753: mongo::DBConnectionPool::DBConnectionPool() (connpool.cpp:196)
  ==19834==    by 0x5319EB: __static_initialization_and_destruction_0 (global_conn_pool.cpp:35)
  ==19834==    by 0x5319EB: _GLOBAL__sub_I__ZN5mongo14globalConnPoolE (global_conn_pool.cpp:39)
  ==19834==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
  ==19834==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
  ==19834==    by 0x4F83337: (below main) (libc-start.c:116)

  
  Note: see also the FAQ in the source distribution.
  It contains workarounds to several common problems.
  In particular, if Valgrind aborted or crashed after
  identifying problems in your program, there's a good chance
  that fixing those problems will prevent Valgrind aborting or
  crashing, especially if it happened in m_mallocfree.c.

  If that doesn't help, please report this bug to: www.valgrind.org

  In the bug report, send all the above text, the valgrind
  version, and what OS and version you are using.  Thanks.

  
  I'm not really sure what to make of that, except that I did see something die in the same place (__lll_lock_elision) once or twice when running with clang ASAN with stack-use-after-return checking enabled. It is interesting that this has turned up twice. I presume this is related to hardware lock elision?

  Anyway, it doesn't seem like I can get this running with valgrind.
  Happy to try again if anyone is aware of a workaround.

  == Comment: #30 - William J. Schmidt <wschmidt at us.ibm.com> - 2016-11-08 11:06:00 ==
  CCing Carl Love.  Carl, have you seen this sort of interaction between valgrind and lock elision before?  (Comment #29, you can ignore the rest of this bugzilla for now.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/gcc-5/+bug/1640518/+subscriptions


