[Bug 1535898] Re: Trusty & Vivid multipath-tools (multipathd) seg-fault core dump
Nish Aravamudan
nish.aravamudan at canonical.com
Wed Jun 7 20:28:06 UTC 2017
Hello, Precise is EOL and we are no longer providing bug fixes for it.
It would appear this particular issue is fixed in Trusty (the only
current release in which it is present). In Bug 1629644, it was
determined that this version did not regress Trusty (a different upload
did), and it has since expired due to inactivity, unfortunately. I am
unsubscribing the server team and marking the precise task as "Won't
Fix". Thank you for your contributions to Ubuntu!
** Changed in: multipath-tools (Ubuntu Precise)
Status: In Progress => Won't Fix
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1535898
Title:
Trusty & Vivid multipath-tools (multipathd) seg-fault core dump
Status in multipath-tools package in Ubuntu:
Incomplete
Status in multipath-tools source package in Precise:
Won't Fix
Status in multipath-tools source package in Trusty:
Fix Released
Bug description:
[SRU justification]
Without this patch, multipathd may exit with SIGSEGV while trying to add a map that already exists.
[Impact]
multipathd crashes with SIGSEGV
A typical sign of this situation is a message similar to this one in /var/log/syslog:
multipathd: 360060160164034004cd59cfdb22ce611: failed in domap for
addition of new path sdr
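To check whether a host has logged this message, the syslog can be scanned for it. The following is only a minimal sketch and is not part of multipath-tools. The function name find_domap_failures and the regular expression are assumptions derived from the single sample message above, which is assumed to be one line in the actual log (it is wrapped here).

```python
import re

# Pattern assumed from the sample message in this report: a WWID, the fixed
# text, then the name of the path that could not be added.
DOMAP_FAILURE = re.compile(
    r"multipathd: (?P<wwid>\S+): failed in domap for addition of new path (?P<path>\S+)"
)


def find_domap_failures(lines):
    """Return a (wwid, path) pair for every log line matching the pattern."""
    hits = []
    for line in lines:
        match = DOMAP_FAILURE.search(line)
        if match:
            hits.append((match.group("wwid"), match.group("path")))
    return hits
```

It accepts any iterable of lines, so an open file object for /var/log/syslog can be passed in directly.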
[Fix]
Check if the map already exists and do a RELOAD in domap() instead of failing.
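The decision described above can be modelled roughly as follows. This is a sketch of the idea only, not the backported patch: the real change is in the C function domap(), and the names Action, choose_action and existing_maps are hypothetical stand-ins for however the real code looks up an existing map.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    RELOAD = "reload"


def choose_action(map_name, existing_maps):
    """Pick the operation for a map instead of failing when it already exists."""
    if map_name in existing_maps:
        # The map is already present: reload it rather than creating it again.
        return Action.RELOAD
    return Action.CREATE
```

The point of the sketch is that an already existing map becomes a normal case with its own action, rather than an error path.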
[Test Case]
The problem was encountered in a complex OpenStack test environment, running a test tool which:
- first boots a number of virtual machines;
- then creates a number of threads, and in each thread
creates volumes, takes snapshots of the volumes, and attaches the volumes to the initially booted virtual machines. After a short while the volumes are detached, and the snapshots and volumes are deleted.
Running this tool overnight normally results in the
multipathd SEGV situation.
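The shape of such a test tool could look roughly like the sketch below. It is not the tool that was actually used. The cloud object and its methods (boot_vm, create_volume, create_snapshot, attach, detach, delete_snapshot, delete_volume) are hypothetical placeholders rather than a real OpenStack client API, and the counts and the hold time are assumptions, since the description gives no numbers.

```python
import threading
import time


def volume_cycle(cloud, vm, iterations, hold_seconds):
    """One worker thread: create, snapshot, attach, wait, detach, delete."""
    for _ in range(iterations):
        volume = cloud.create_volume()
        snapshot = cloud.create_snapshot(volume)
        cloud.attach(volume, vm)
        time.sleep(hold_seconds)  # the "short while" before detaching
        cloud.detach(volume, vm)
        cloud.delete_snapshot(snapshot)
        cloud.delete_volume(volume)


def run_stress(cloud, vm_count, thread_count, iterations, hold_seconds=1.0):
    """Boot the VMs first, then run the volume cycle in parallel threads."""
    vms = [cloud.boot_vm() for _ in range(vm_count)]  # vm_count must be >= 1
    workers = [
        threading.Thread(
            target=volume_cycle,
            args=(cloud, vms[i % vm_count], iterations, hold_seconds),
        )
        for i in range(thread_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return vms
```

Each thread attaches and detaches volumes independently, which is what produces the many concurrent path additions and removals on the host.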
[Regression]
This is a straight backport of the code being used in 0.5.0. No regression is to be expected.
It is important to note that the reproducer in the original
description did not lead to such a problem.
[Original description of the problem]
We have a problem with multipath-tools.
Usually after a path removal and a re-scan, the multipathd process
dies.
I created 2 hosts:
iscsi-server
iscsi-client
There are 4 NICs between them and a simple multibus multipath setup.
With that I was able to confirm that there is a regression in
multipath-tools.
It looks like the patches brought from upstream:
0017-multipath-get-right-sysfs-value-for-checker_timeout.patch
0018-multipath-handle-offlined-paths.patch
#
# from here
#
0019-multipath-fix-scsi-timeout-code.patch
0020-multipath-make-tgt_node_name-work-for-iscsi-devices.patch
0021-multipath-cleanup-dev_loss_tmo-issues.patch
0022-Fix-for-setting-0-to-fast_io_fail.patch
0023-Fix-fast_io_fail-capping.patch
0024-multipath-enable-getting-uevents-through-libudev.patch
0025-Use-devpath-as-argument-for-sysfs-functions.patch
0026-multipathd-remove-references-to-sysfs_device.patch
0027-multipathd-use-struct-path-as-argument-for-event-pro.patch
0028-Add-global-udev-reference-pointer-to-config.patch
0029-Use-udev-enumeration-during-discovery.patch
0030-use-struct-udev_device-during-discovery.patch
0031-More-debugging-output-when-synchronizing-path-states.patch
0032-Use-struct-udev_device-instead-of-sysdev.patch
0033-discovery-Fixup-cciss-discovery.patch
0035-Use-udev-devices-during-discovery.patch
0036-Remove-all-references-to-hand-craftes-sysfs-code.patch
#
# to here
#
# 0037-multipath-libudev-cleanup-and-bugfixes.patch
# 0038-multipath-check-if-a-device-belongs-to-multipath.patch
# 0039-multipath-and-wwids_file-multipath.conf-option.patch
# 0040-multipath-Check-blacklists-as-soon-as-possible.patch
# 0041-add-wwids-file-cleanup-options.patch
# 0042-add-find_multipaths-option.patch
# 0043-alloc-keywords.patch
# lp1503305_libmultipath_info_on_1st_path_down_dbd131e.patch
Of these, the patches in the range 19-36 (between the "from here" and "to here" markers) caused a regression.
Whenever I build the package (for Trusty) with those patches included,
I'm able to generate a core dump indicating a possible double-free or
null-dereference related to a path removal (that is why I can
reproduce with the test case). Unfortunately the crash usually happens
inside malloc() or somewhere else in glibc.
Using valgrind I was able to verify some free() errors:
==30415== Invalid free() / delete / delete[] / realloc()
==30415== at 0x4C2BDEC: free (vg_replace_malloc.c:473)
==30415== by 0x54E243C: vector_del_slot (vector.c:95)
==30415== by 0x550A516: _remove_map (structs_vec.c:139)
==30415== by 0x550A5C3: _remove_maps (structs_vec.c:170)
==30415== by 0x550A64B: remove_maps (structs_vec.c:181)
==30415== by 0x40713F: configure (main.c:1153)
==30415== by 0x407A74: child (main.c:1419)
==30415== by 0x40837D: main (main.c:1618)
These errors match exactly a core dump (multipathd) I got from
another user (the wrong free was coming from _remove_map).
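To compare a valgrind trace like the one above with a backtrace from a core dump, it can help to reduce the trace to its function names and source locations. The following is a minimal sketch. The function name valgrind_frames and the regular expression are assumptions based only on the frame lines quoted in this report, and frames whose function name contains spaces (for example C++ signatures) would not be matched.

```python
import re

# Matches frame lines such as:
#   ==30415== by 0x550A516: _remove_map (structs_vec.c:139)
FRAME = re.compile(
    r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (?P<func>\S+) \((?P<where>[^)]*)\)"
)


def valgrind_frames(lines):
    """Return (function, location) pairs for each stack frame line."""
    frames = []
    for line in lines:
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("where")))
    return frames
```

Lines that are not stack frames, such as the "Invalid free()" header, are skipped.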
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1535898/+subscriptions