[Bug 1354114] Re: multipath segmentation Fault (libmultipath: update waiter handling)

Rafael David Tinoco rafael.tinoco at canonical.com
Thu Aug 7 20:07:57 UTC 2014


So probably to last commits touching libmultipath/waiter.c (and fixing
this issue) would be:

1)

commit e1fcc5933ac44683cdee1a02304e1115abec3ff5
Author: Benjamin Marzinski <bmarzins at redhat.com>
Date:   Sat May 19 01:37:03 2012 -0500

    multipath: clean up code for stopping the waiter threads

    The way multipathd currently stops the waiter threads needs some work.
    Right now they are stopped by being sent the SIGUSR1 signal. However their
    cleanup code assumes that they are being cancelled, just like all the other
    threads are.  There's no reason for them to be so unnecessarily
    complicated and different from the other threads

    This patch does a couple of things.  First, it removes the mutex from
    the event_thread.  This wasn't doing anything. It was designed to protect
    the wp->mapname variable, which the waiter threads were checking to see
    if they should quit. However, the mutex was only ever being used by the
    thread itself, and it clearly didn't need to serialize with itself.  Also,
    the function to clear the mapname, signal_waiter(), was set with
    pthread_cleanup_push(), which never got called early, since the threads
    weren't being cancelled.  Thus, the mapname never got cleared
    until the pthreads were about to shut down.

    The patch also rips out all the signal stopping code, and just uses
    pthread_cancel.  There already are cancellation points in the waiter
    thread code. Between the cancellation points, both explicit and implicit,
    and the fact that the waiter threads will never be killed except when the
    killer is holding the vecs lock, there shouldn't be any place where the
    waiter thread can access freed data.

    To make sure the waiter thread cleans itself up properly, the dmt
    has been moved into the event_thread structure, and is destroyed in
    free_waiter() if necessary.

    Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1354114

Title:
  multipath segmentation Fault (libmultipath: update waiter handling)

Status in “multipath-tools” package in Ubuntu:
  Confirmed

Bug description:
  
  [Impact]

   * Multipath can cause segmentation fault due to wrong code and can 
     possibly cause user to loose access to multipath devices.

  [Test Case]

   * Working on it.

  [Regression Potential]

   * Fix based on upstream code (96f8146) Tag 0.5.0 already functioning. 
   * Introducing mutex, logic to deal with already dead pthread and other
     way to access same data (instead of accessing other time lived 
     structure). 

  [Other Info]

   * Original bug description:

  ----------------

  It was brought to me (~inaddy) the following situation with
  multipathd:

  #####

  Program terminated with signal 6, Aborted.
  #0 0x00007fbc6ae09445 in raise () from /lib/x86_64linuxgnu/
  libc.so.6
  (gdb) bt
  #0 0x00007fbc6ae09445 in raise () from /lib/x86_64linuxgnu/
  libc.so.6
  #1 0x00007fbc6ae0cbab in abort () from /lib/x86_64linuxgnu/
  libc.so.6
  #2 0x00007fbc6ae0210e in ?? () from /lib/x86_64linuxgnu/
  libc.so.6
  #3 0x00007fbc6ae021b2 in __assert_fail () from /lib/x86_64linuxgnu/
  libc.so.6
  #4 0x00007fbc6b849efb in pthread_mutex_lock () from /lib/x86_64linuxgnu/
  libpthread.so.0
  #5 0x00007fbc6b1cba5f in free_waiter (data=0x1691de0) at waiter.c:44
  #6 0x00007fbc6b1cc25a in waitevent (et=0x1691de0) at waiter.c:204
  #7 0x00007fbc6b847e9a in start_thread () from /lib/x86_64linuxgnu/
  libpthread.so.0
  #8 0x00007fbc6aec54bd in clone () from /lib/x86_64linuxgnu/
  libc.so.6
  #9 0x0000000000000000 in ?? ()

  --------------------------------------------------------------------------------------------
  #5 0x00007fbc6b1cba5f in free_waiter (data=0x1691de0) at waiter.c:44
  44 lock(wp>
  vecs>
  lock);
  (gdb) print wp>
  vecs>
  lock
  $1 = {mutex = 0x168c280, depth = 1}
  In pthread_mutex_lock.c:62 there's an assert that fails:
  #4 0x00007fbc6b849efb in __pthread_mutex_lock (mutex=0xfefefefefefefeff) at pthread_mutex_lock.c:62
  62 assert (mutex>_
  data._owner == 0);
  In this run:
  (gdb) p *wp>
  vecs>
  lock>
  mutex
  $3 = {_data = {lock = 1, __count = 0, __owner = 49, __nusers = 0, __kind = 0, __spins = 0, __list = {_prev = 0x0, __next = 0xffffffff}},
  __size = "\001\000\000\000\000\000\000\000\061", '\000' <repeats 23 times>"\377, \377\377\377\000\000\000", __align = 1}
  so __owner is 49 and not 0.
  Note that 49 is somewhat strange; it's expected to be a pid_t obtained via
  pid_t id = THREAD_GETMEM (THREAD_SELF, tid);

  According to https://bugzilla.redhat.com/show_bug.cgi?id=570278 , this
  assert failure could be an expected behaviour if, for some reason the
  multipath code was trying to release a mutex that has already been
  freed.

  The multipath-tools package is up to date (0.4.9-3ubuntu5)

  I do not find obvious thing related in http://git.opensvc.com/gitweb.cgi?p=multipath-tools%2F.git except may be
  http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commitdiff;h=5ee9f716549d913aeb9800041f78ee9c6a50d860

  #####

  In between Precise's version and Upstream there are the following
  patches touching waiter.c:

  d887f4a = signal waiter thread to stop waiting on dm events
  5ee9f71 = simplify multipath signal handlers
  af4fd6d = Fix race condition in stop_waiter_thread()
  e1fcc59 = multipath: clean up code for stopping the waiter threads
  03ec4ef = multipath: fix shutdown crashes
  4dfdaf2 = multipath: Update multipath device on show topology
  c301a3f = Race condition when calling stop_waiter_thread()
  96f8146 = libmultipath: update waiter handling

  This specific one: 96f8146 (libmultipath: update waiter handling)

  """
  The current 'waiter' structure accesses fields which belong
  to the main 'mpp' structure, which has a totally different
  lifetime.
  """

  Shows that due to different lifetime between different structures,
  there can be use-after-free segfaults (what seems to be happening).

  waiter.c:44 = lock(wp->vecs->lock);

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1354114/+subscriptions



More information about the foundations-bugs mailing list