[Bug 1634989] Re: Segfault on rabbitmq-server start
Jon Grimm
jon.grimm at canonical.com
Wed Mar 22 20:25:13 UTC 2017
I did a bit more reading into the root cause. It appears to be
undefined behavior in erlang when a port is opened multiple times with
the same fd. This is a path that rabbitmq-server currently triggers, and
in _some_ versions of erlang it segfaults.
A standalone example of the faulty code in rabbitmq-server is:
erl -noshell -eval 'Port = open_port({fd, 0, 2}, [out]), Port2 =
open_port({fd, 2, 2}, [out]), port_command(Port, "a"), port_close(Port),
erlang:halt(10)'
Standalone test:
Trusty: OK, Xenial: segfault, Yakkety: segfault, Zesty: OK
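The OK/segfault outcomes above can be told apart by the exit status of
the erl command. The following is only a minimal sketch, assuming a POSIX
shell that reports 128 + signal number for a killed process (so SIGSEGV,
signal 11, gives 139, matching the status=139 in the systemctl output
quoted in the bug description) and that a surviving VM reaches
erlang:halt(10). The function names classify_status and
run_standalone_test are hypothetical helpers, not part of
rabbitmq-server or erlang.

```shell
#!/bin/sh
# Hypothetical helper: map an exit status from the standalone example
# to the labels used in the results above.
classify_status() {
    case "$1" in
        10)  echo "OK" ;;        # erlang:halt(10) was reached
        139) echo "segfault" ;;  # 128 + 11 (SIGSEGV), as reported by the shell
        *)   echo "unexpected exit status $1" ;;
    esac
}

# Hypothetical wrapper: run the standalone example from this mail
# (requires erl on PATH) and classify its exit status.
run_standalone_test() {
    erl -noshell -eval 'Port = open_port({fd, 0, 2}, [out]), Port2 = open_port({fd, 2, 2}, [out]), port_command(Port, "a"), port_close(Port), erlang:halt(10)'
    classify_status "$?"
}
```

Calling run_standalone_test on each release should then print one of the
labels; any other status is reported as-is rather than guessed at.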
rabbitmq-server has offending code:
Trusty: yes, Xenial: yes, Yakkety: yes, Zesty: no
Note: The offending code path in rabbitmq-server is essentially anything
that uses the format_stderr(Fmt, Args) helper function. The testcase
provided in #1 is just a single specific instance that can trigger the
segfault. IOW, the bug is somewhat broader than that testcase and
description suggest, which makes it more worthwhile to SRU a fix into
Xenial/Yakkety.
As yakkety and xenial contain both a rabbitmq-server with the offending
code && an erlang that will segfault with it, we should SRU there.
While trusty contains the offending codepath, 1) it cannot be triggered
with the erlang version in trusty (1.16.x) and 2) the proposed upstream
commit for the fix claims that it is safe given changes now made in
erlang-17 or later, so this fix is not certain to be free of other
issues on trusty. IOW, best to leave trusty alone.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to rabbitmq-server in Ubuntu.
https://bugs.launchpad.net/bugs/1634989
Title:
Segfault on rabbitmq-server start
Status in rabbitmq-server package in Ubuntu:
Fix Released
Status in rabbitmq-server source package in Xenial:
In Progress
Status in rabbitmq-server source package in Yakkety:
In Progress
Bug description:
---Problem Description---
Starting rabbitmq-server triggers segfault.
The segfault happens when the host is not reachable, for instance, which breaks the installation of rabbitmq-server package.
It is understandable that an error occurs here, but a segfault should not be the default behaviour.
This has been tested on 16.04 and 16.10, archs ppc64el and x86_64
---uname output---
Linux vm1 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:14:41 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
---Steps to Reproduce---
# More reliably reproducible on a machine with 1 CPU
root@yakkety:~# echo "192.168.1.1 blah" >> /etc/hosts
root@yakkety:~# hostname blah
root@yakkety:~# apt-get install rabbitmq-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
rabbitmq-server
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 0 B/4,251 kB of archives.
After this operation, 5,243 kB of additional disk space will be used.
Selecting previously unselected package rabbitmq-server.
(Reading database ... 63962 files and directories currently installed.)
Preparing to unpack .../rabbitmq-server_3.5.7-1_all.deb ...
Unpacking rabbitmq-server (3.5.7-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up rabbitmq-server (3.5.7-1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service → /lib/systemd/system/rabbitmq-server.service.
Job for rabbitmq-server.service failed because the control process exited with error code.
See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
invoke-rc.d: initscript rabbitmq-server, action "start" failed.
● rabbitmq-server.service - RabbitMQ Messaging Server
Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2016-10-19 11:13:46 EDT; 7ms ago
Process: 2818 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=exited, status=139)
Process: 2817 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
Main PID: 2817 (code=exited, status=1/FAILURE)
Oct 19 11:13:13 blah systemd[1]: Starting RabbitMQ Messaging Server...
Oct 19 11:13:13 blah rabbitmq[2818]: Waiting for rabbit@blah ...
Oct 19 11:13:13 blah rabbitmq[2818]: pid is 2826 ...
Oct 19 11:13:43 blah systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Oct 19 11:13:46 blah rabbitmq[2818]: Segmentation fault
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=139
Oct 19 11:13:46 blah systemd[1]: Failed to start RabbitMQ Messaging Server.
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Unit entered failed state.
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
dpkg: error processing package rabbitmq-server (--configure):
subprocess installed post-installation script returned error exit status 1
Processing triggers for systemd (231-9git1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Errors were encountered while processing:
rabbitmq-server
E: Sub-process /usr/bin/dpkg returned an error code (1)
root@yakkety:~# dmesg -T
[Wed Oct 19 11:11:55 2016] async_10[2334]: unhandled signal 11 at 0000000000000000 nip 00000000206867bc lr 0000000020635648 code 30001
[Wed Oct 19 11:13:02 2016] random: crng init done
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 3h 37min 32.381328s random time.
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 11h 5min 8.314218s random time.
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 11h 7min 37.045127s random time.
[Wed Oct 19 11:13:03 2016] systemd[1]: apt-daily.timer: Adding 8h 43min 50.771575s random time.
[Wed Oct 19 11:13:03 2016] systemd[1]: apt-daily.timer: Adding 2h 31min 33.179443s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 4h 22min 42.585438s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 36min 58.644429s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 9h 16min 4.769857s random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 7h 48min 614.372ms random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 3h 13min 41.779132s random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 9h 39min 46.023823s random time.
[Wed Oct 19 11:13:45 2016] async_10[2912]: unhandled signal 11 at 0000000000000000 nip 000000004f0d67bc lr 000000004f085648 code 30001
[Wed Oct 19 11:13:45 2016] systemd[1]: apt-daily.timer: Adding 9h 5min 5.067674s random time.
Userspace tool common name: rabbitmq-server
The userspace tool has the following bit modes: 64
Userspace package: rabbitmq-server
I have just tested the patch in https://github.com/rabbitmq/rabbitmq-common/pull/54, which is included in v3.6.1 and prevents the segfault. The patch works and can be easily backported.
Thanks