[Bug 1842730] Re: glibc: dlopen crash after a previously failed call to dlopen
Bug Watch Updater
1842730 at bugs.launchpad.net
Tue Oct 8 16:13:54 UTC 2019
Launchpad has imported 7 comments from the remote bug at
https://sourceware.org/bugzilla/show_bug.cgi?id=20839.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2016-11-18T13:16:20+00:00 Florian Weimer wrote:
There are some cases in the implementation of dlopen where
_dl_signal_error is called without removing all partially-initialized
link maps. The downstream bug report refers to an error raised from
_dl_map_object in response to a missing file (the final call to
_dl_signal_error). We do some cleanup, but it seems we skip removal of
a NODELETE object.
It's not clear to me if we should complete the initialization of the
NODELETE object, or somehow arrange that we are always in a situation in
which we can remove the NODELETE object without observable effects if we
have to. The latter probably means that we cannot start running
constructors and IFUNCs until all objects in the current link operation
have been found, mapped, and all required ld.so data structures have at
least been allocated.
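
The second option can be pictured as a two-phase link operation. What follows is a toy model in C, not glibc code: struct toy_obj, toy_link and all field names are hypothetical, and the model assumes every earlier link operation ran to completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a link map entry; NOT glibc's struct link_map. */
struct toy_obj {
    const char *name;
    bool exists;      /* can the file be found and mapped? */
    bool nodelete;    /* built with -z nodelete; deliberately ignored in phase 1 */
    bool mapped;      /* phase 1 done */
    bool initialized; /* phase 2 done: relocated, constructors run */
};

/* Phase 1 finds and maps every object. On the first missing file it
   unmaps everything this operation mapped, NODELETE or not: nothing
   observable has run yet, so removal has no side effects.
   Phase 2 (relocation, constructors) starts only if phase 1 succeeded
   for all objects. Returns true on success. */
bool toy_link(struct toy_obj *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i].mapped)
            continue; /* already loaded by an earlier operation */
        if (!objs[i].exists) {
            for (size_t j = 0; j < i; j++)
                if (!objs[j].initialized)
                    objs[j].mapped = false; /* roll back */
            return false;
        }
        objs[i].mapped = true;
    }
    for (size_t i = 0; i < n; i++) {
        assert(objs[i].mapped);
        objs[i].initialized = true;
    }
    return true;
}
```

In this model a failed operation never leaves an object mapped but uninitialized, which is the state the downstream report runs into.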
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/0
------------------------------------------------------------------------
On 2017-10-11T04:42:35+00:00 Ben Woodard wrote:
*** Bug 22280 has been marked as a duplicate of this bug. ***
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/1
------------------------------------------------------------------------
On 2017-10-11T04:44:53+00:00 Ben Woodard wrote:
Created attachment 10522
a fairly simple reproducer
As a case in point here is a reproducer that I whipped up based upon a
customer report.
[ben@Mustang dl-bug]$ make all
cc -g -c -o main.o main.c
cc -g -o main main.o -ldl
cc -g -c -fpic a.c
cc -g -c -fpic d.c
cc -g -fpic -shared -Wl,-z,nodelete -o libd.so d.o
cc -g -c -fpic e.c
cc -g -fpic -shared -o libe.so e.o
cc -g -fpic -shared -o liba.so a.o -L. -ld -le
cc -g -c -fpic b.c
cc -g -fpic -shared -o libb.so b.o -L. -ld
[ben@Mustang dl-bug]$ make run
LD_LIBRARY_PATH=. ./main
d_fn x=12
inside b_fn
rm libe.so
LD_LIBRARY_PATH=. ./main
Could not open liba.so - libe.so: cannot open shared object file: No such file or directory
make: *** [Makefile:38: run] Segmentation fault (core dumped)
Note that libd.so is marked NODELETE.
So when main dlopens liba.so, which needs libd.so and libe.so, the load of liba.so fails because libe.so is missing. This is expected. However, when libb.so, which also needs libd.so, is loaded afterwards, the application crashes because the relocations for libd.so have not been done.
[ben@Mustang dl-bug]$ LD_LIBRARY_PATH=. LD_DEBUG=reloc,files ./main 2> foo
d_fn x=12
inside b_fn
[ben@Mustang dl-bug]$ egrep file\|reloc foo
10901: file=libdl.so.2 [0]; needed by ./main [0]
10901: file=libdl.so.2 [0]; generating link map
10901: file=libc.so.6 [0]; needed by ./main [0]
10901: file=libc.so.6 [0]; generating link map
10901: relocation processing: /lib64/libc.so.6
10901: relocation processing: /lib64/libdl.so.2
10901: relocation processing: ./main (lazy)
10901: relocation processing: /lib64/ld-linux-x86-64.so.2
10901: file=liba.so [0]; dynamically loaded by ./main [0]
10901: file=liba.so [0]; generating link map
10901: file=libd.so [0]; needed by ./liba.so [0]
10901: file=libd.so [0]; generating link map
10901: file=libe.so [0]; needed by ./liba.so [0]
10901: file=libe.so [0]; generating link map
10901: relocation processing: ./libe.so
10901: relocation processing: ./libd.so
10901: relocation processing: ./liba.so
10901: opening file=./liba.so [0]; direct_opencount=1
10901: file=libb.so [0]; dynamically loaded by ./main [0]
10901: file=libb.so [0]; generating link map
10901: relocation processing: ./libb.so
10901: opening file=./libb.so [0]; direct_opencount=1
vs.
[ben@Mustang dl-bug]$ rm libe.so
[ben@Mustang dl-bug]$ LD_LIBRARY_PATH=. LD_DEBUG=reloc,files ./main 2> foo
Segmentation fault (core dumped)
[ben@Mustang dl-bug]$ egrep file\|reloc foo
10965: file=libdl.so.2 [0]; needed by ./main [0]
10965: file=libdl.so.2 [0]; generating link map
10965: file=libc.so.6 [0]; needed by ./main [0]
10965: file=libc.so.6 [0]; generating link map
10965: relocation processing: /lib64/libc.so.6
10965: relocation processing: /lib64/libdl.so.2
10965: relocation processing: ./main (lazy)
10965: relocation processing: /lib64/ld-linux-x86-64.so.2
10965: file=liba.so [0]; dynamically loaded by ./main [0]
10965: file=liba.so [0]; generating link map
10965: file=libd.so [0]; needed by ./liba.so [0]
10965: file=libd.so [0]; generating link map
10965: file=libe.so [0]; needed by ./liba.so [0]
10965: file=./liba.so [0]; destroying link map
Could not open liba.so - libe.so: cannot open shared object file: No such file or directory
10965: file=libb.so [0]; dynamically loaded by ./main [0]
10965: file=libb.so [0]; generating link map
10965: relocation processing: ./libb.so
Note that in the failing case the relocations are never done on libd.so.
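
The attachment itself is not reproduced in this message. As a rough sketch of what a driver such as main.c might look like, assuming it simply dlopens liba.so, reports any failure, then dlopens libb.so and calls into it: the function names open_or_report and run_sequence, the RTLD_NOW flag and the exported symbol name b_fn are assumptions inferred from the transcript, not taken from the attachment.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* dlopen `name`; on failure, report it in the style of the transcript. */
void *open_or_report(const char *name)
{
    assert(name != NULL);
    void *handle = dlopen(name, RTLD_NOW); /* flag is an assumption */
    if (handle == NULL)
        fprintf(stderr, "Could not open %s - %s\n", name, dlerror());
    return handle;
}

/* Returns 0 if b_fn was found and called, 1 otherwise. */
int run_sequence(void)
{
    /* Expected to fail once libe.so has been removed; the failure is
       reported and otherwise ignored. */
    void *a = open_or_report("liba.so");
    (void) a;

    /* libb.so also needs libd.so; this is where the crash is observed. */
    void *b = open_or_report("libb.so");
    if (b == NULL)
        return 1;

    /* "b_fn" is an assumed symbol name, based on the "inside b_fn" output. */
    void (*b_fn)(void) = (void (*)(void)) dlsym(b, "b_fn");
    if (b_fn == NULL)
        return 1;
    b_fn();
    return 0;
}
```

A main that returns run_sequence(), linked with -ldl as in the make output above (needed on glibc older than 2.34), would follow the same order of dlopen calls as the LD_DEBUG traces.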
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/2
------------------------------------------------------------------------
On 2017-10-11T05:19:54+00:00 Ben Woodard wrote:
It is my opinion that the NODELETE flag should not be honored at this
early stage. The reason for the NODELETE flag is that the library may
have side effects that are irreversible. However, because the library
has not been relocated, it cannot even have had its constructor run.
Therefore, its ability to cause irreversible side effects is
practically nil. Therefore it is safe to remove it as if the NODELETE
flag had not been set.
Earlier testing with my reproducer demonstrated that without the
-Wl,-z,nodelete command line option, the problem does not manifest.
Therefore, we have one case, without the NODELETE flag, that works
correctly, and another case, where we honor the NODELETE flag, that
crashes. It therefore seems to make sense that the NODELETE flag
only takes effect after the relocations have been done. Or maybe only
after the library's constructor has been run. It is only then that the
library could have made a change that would prevent it from being
deleted.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/3
------------------------------------------------------------------------
On 2018-10-23T18:59:04+00:00 Lion-y wrote:
*** Bug 23810 has been marked as a duplicate of this bug. ***
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/4
------------------------------------------------------------------------
On 2018-12-15T23:28:09+00:00 Github-e wrote:
Created attachment 11463
Honor NODELETE only after relocation
On our side, this bug manifests itself as a "libgcc_s.so.1 must be
installed for pthread_cancel to work" message.
I agree with Ben that we should not honor NODELETE too early. The
attached patch makes our production use case work. I might have time to
add a test case too in the coming days.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/5
------------------------------------------------------------------------
On 2019-07-01T19:45:41+00:00 Cvs-commit wrote:
The master branch has been updated by H.J. Lu <hjl at sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d0093c5cefb7f7a4143f3bb03743633823229cc6
commit d0093c5cefb7f7a4143f3bb03743633823229cc6
Author: H.J. Lu <hjl.tools at gmail.com>
Date: Mon Jul 1 12:23:10 2019 -0700
Call _dl_open_check after relocation [BZ #24259]
This is a workaround for [BZ #20839] which doesn't remove the NODELETE
object when _dl_open_check throws an exception. Move it after relocation
in dl_open_worker to avoid leaving the NODELETE object mapped without
relocation.
[BZ #24259]
* elf/dl-open.c (dl_open_worker): Call _dl_open_check after
relocation.
* sysdeps/x86/Makefile (tests): Add tst-cet-legacy-5a,
tst-cet-legacy-5b, tst-cet-legacy-6a and tst-cet-legacy-6b.
(modules-names): Add tst-cet-legacy-mod-5a, tst-cet-legacy-mod-5b,
tst-cet-legacy-mod-5c, tst-cet-legacy-mod-6a, tst-cet-legacy-mod-6b
and tst-cet-legacy-mod-6c.
(CFLAGS-tst-cet-legacy-5a.c): New.
(CFLAGS-tst-cet-legacy-5b.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-5a.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-5b.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-5c.c): Likewise.
(CFLAGS-tst-cet-legacy-6a.c): Likewise.
(CFLAGS-tst-cet-legacy-6b.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-6a.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-6b.c): Likewise.
(CFLAGS-tst-cet-legacy-mod-6c.c): Likewise.
($(objpfx)tst-cet-legacy-5a): Likewise.
($(objpfx)tst-cet-legacy-5a.out): Likewise.
($(objpfx)tst-cet-legacy-mod-5a.so): Likewise.
($(objpfx)tst-cet-legacy-mod-5b.so): Likewise.
($(objpfx)tst-cet-legacy-5b): Likewise.
($(objpfx)tst-cet-legacy-5b.out): Likewise.
(tst-cet-legacy-5b-ENV): Likewise.
($(objpfx)tst-cet-legacy-6a): Likewise.
($(objpfx)tst-cet-legacy-6a.out): Likewise.
($(objpfx)tst-cet-legacy-mod-6a.so): Likewise.
($(objpfx)tst-cet-legacy-mod-6b.so): Likewise.
($(objpfx)tst-cet-legacy-6b): Likewise.
($(objpfx)tst-cet-legacy-6b.out): Likewise.
(tst-cet-legacy-6b-ENV): Likewise.
* sysdeps/x86/tst-cet-legacy-5.c: New file.
* sysdeps/x86/tst-cet-legacy-5a.c: Likewise.
* sysdeps/x86/tst-cet-legacy-5b.c: Likewise.
* sysdeps/x86/tst-cet-legacy-6.c: Likewise.
* sysdeps/x86/tst-cet-legacy-6a.c: Likewise.
* sysdeps/x86/tst-cet-legacy-6b.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-5.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-5a.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-5b.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-5c.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-6.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-6a.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-6b.c: Likewise.
* sysdeps/x86/tst-cet-legacy-mod-6c.c: Likewise.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1842730/comments/6
** Changed in: glibc
Status: Unknown => Confirmed
** Changed in: glibc
Importance: Unknown => Medium
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1842730
Title:
glibc: dlopen crash after a previously failed call to dlopen
Status in GLibC:
Confirmed
Status in glibc package in Ubuntu:
New
Bug description:
Environment
===========
Ubuntu 18.04.3 LTS
Linux 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
libc6:amd64 2.27-3ubuntu1
gcc 4:7.4.0-1ubuntu2.3
Steps to reproduce the crash
============================
(note: all libraries are linked with --no-as-needed to keep them as
DT_NEEDED entries in the dynamic section, even though they are
unused.)
1) create an empty library libNOTFOUND.so
2) create an empty library libB.so, linked to libNOTFOUND.so
3) create an empty library libA.so, linked to glibc's librt.so
4) create an empty library libPLUGIN.so, linked to libA.so and libB.so, set DT_RUNPATH to '$ORIGIN'
5) create an empty library libMAIN.so
6) create an executable, linked to libMAIN.so and libdl.so, set DT_RUNPATH to '$ORIGIN', this program calls:
a) dlopen("<absolute path to>/libPLUGIN.so")
b) dlopen("<absolute path to>/libMAIN.so")
Behaviour
=========
a) dlopen("<absolute path to>/libPLUGIN.so") fails because it cannot find libNOTFOUND.so via default search methods. This is wanted and OK!
b) dlopen("<absolute path to>/libMAIN.so") raises SIGSEGV somewhere deep inside the dynamic linking code of glibc (backtrace attached). Expected result: returns a valid handle to libMAIN.so.
Comments
========
Attached is a simple test script which does all the steps from above
and also shows the workaround: Ensure that librt.so is loaded and
fully initialized before the failing call to
dlopen("<...>/libPLUGIN.so") happens. This can be done either via
LD_PRELOAD or by linking the executable to librt.so.
You can also replace librt.so with libpthread.so to reproduce this
behaviour. Any other library I tried instead of librt.so (e.g.
libm.so) does not trigger this bug.
I also attached a trace with LD_DEBUG=all. Here you can see that glibc
tries to relocate librt.so while it loads libMAIN.so. I would expect
that librt.so is loaded/relocated when libPLUGIN.so is dlopen'ed or
that it is neither loaded nor relocated because libPLUGIN.so has unmet
dependencies.
This example is a stripped down version of a real scenario where an
application was misconfigured.
To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/1842730/+subscriptions