[Bug 1784757] Re: [SRU] rabbitmq-server does not properly shutdown

Sergio Durigan Junior 1784757 at bugs.launchpad.net
Mon Jul 13 20:54:17 UTC 2020


** Description changed:

  [Impact]
- TBD
+ 
+ The systemd file rabbitmq-server.service on Bionic uses "Type=simple"
+ when defining the service, but unfortunately this doesn't work very well
+ for rabbitmq-server. In certain situations, systemd will fail to keep
+ track of a start/stop/restart event, and will hang for 90 seconds before
+ giving the prompt back to the user. Another problem is that
+ rabbitmq-server must start after the epmd service, so we need to
+ explicitly declare this dependency in the service file.
  
  [Test Case]
- TBD
+ 
+ Although I was able to reproduce this almost 100% of the time, there
+ were rare occasions when the restart procedure finished normally. I was
+ also only able to reproduce it using a bionic VM, not a container. If
+ you have multipass or lxd configured to launch VMs, that should be easy.
+ 
+ The steps are:
+ 
+ $ lxc launch ubuntu-daily:bionic --vm bug1784757-rabbitmq-server # or use multipass
+ $ lxc shell bug1784757-rabbitmq-server
+ # apt update
+ # apt install rabbitmq-server -y
+ # systemctl restart rabbitmq-server.service
+ 
+ In a normal scenario, the restart should take around 3 seconds or less.
+ With the bug, it takes around 90 seconds. If you can't reproduce it, try
+ running "systemctl restart" again. A quick way to trigger it is to run a
+ for loop like:
+ 
+ # for i in $(seq 10); do time systemctl restart rabbitmq-server.service; done
  
  [Regression Potential]
- TBD
  
- [Fix]
- TBD
+ * Because rabbitmq-server implements systemd's "Type=notify" using socat
+ to communicate with systemd-notify over a socket, we will be introducing
+ another point of failure (socat) in the mix.
  
- The fix is available upstream in Debian as of 3.7.6, so would need
- backported for bionic and cosmic:
- 
-  rabbitmq-server | 3.5.7-1                | xenial          | source, all
-  rabbitmq-server | 3.6.10-1               | bionic          | source, all
-  rabbitmq-server | 3.6.10-1               | cosmic          | source, all
-  rabbitmq-server | 3.7.8-4ubuntu2         | disco           | source, all
-  rabbitmq-server | 3.7.8-4ubuntu2         | eoan            | source, all
- 
- [Discussion]
- TBD
+ * Although unlikely, socat itself could fail. That would not be a
+ regression, though, since the outcome would be the same as we have
+ today: "systemctl restart" would not work properly, even though the
+ service did restart.
  
  [Original Report]
- When I run `systemctl restart rabbitmq-server` it waits for 90 seconds then systemd sends SIGKILL to it.
+ 
+ When I run `systemctl restart rabbitmq-server` it waits for 90 seconds
+ then systemd sends SIGKILL to it.
  
  Presumably the `epmd` process does not receive SIGTERM, since if I run
  `kill 1493` (or whatever pid it currently is) then restart happens
  straight after that successfully
  
  ● rabbitmq-server.service - RabbitMQ Messaging Server
     Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
     Active: deactivating (final-sigterm) since Wed 2018-08-01 01:17:04 UTC; 7s ago
    Process: 1183 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
    Process: 178 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=exited, status=0/SUCCESS)
    Process: 177 ExecStart=/usr/sbin/rabbitmq-server (code=killed, signal=TERM)
   Main PID: 177 (code=killed, signal=TERM)
      Tasks: 1 (limit: 4915)
     CGroup: /system.slice/rabbitmq-server.service
             └─1493 /usr/lib/erlang/erts-9.2/bin/epmd -daemon
  
  Aug 01 01:11:20 rmq-1 systemd[1]: rabbitmq-server.service: Failed to reset devices.list: Operation not permitted
  Aug 01 01:11:20 rmq-1 systemd[1]: Starting RabbitMQ Messaging Server...
  Aug 01 01:11:25 rmq-1 rabbitmq[178]: Waiting for 'rabbit@rmq-1'
  Aug 01 01:11:25 rmq-1 rabbitmq[178]: pid is 204
  Aug 01 01:11:30 rmq-1 systemd[1]: Started RabbitMQ Messaging Server.
  Aug 01 01:17:04 rmq-1 systemd[1]: Stopping RabbitMQ Messaging Server...
  Aug 01 01:17:06 rmq-1 rabbitmq[1183]: Stopping and halting node 'rabbit@rmq-1'

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to rabbitmq-server in Ubuntu.
https://bugs.launchpad.net/bugs/1784757

Title:
  [SRU] rabbitmq-server does not properly shutdown

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Bionic:
  Triaged
Status in rabbitmq-server source package in Cosmic:
  Won't Fix
Status in rabbitmq-server package in Debian:
  Fix Released

Bug description:
  [Impact]

  The systemd file rabbitmq-server.service on Bionic uses "Type=simple"
  when defining the service, but unfortunately this doesn't work very
  well for rabbitmq-server. In certain situations, systemd will fail to
  keep track of a start/stop/restart event, and will hang for 90 seconds
  before giving the prompt back to the user. Another problem is that
  rabbitmq-server must start after the epmd service, so we need to
  explicitly declare this dependency in the service file.
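
  As a rough illustration of the two changes described above, the
  following sketch prints a systemd drop-in that switches the service
  to "Type=notify" and orders it after epmd. The unit name
  "epmd.service", the added "Wants=" line, the "NotifyAccess=all"
  setting and the helper name print_dropin are assumptions made for
  this example; the unit file actually shipped by the fix may differ.

```shell
#!/bin/sh
# Print a hypothetical drop-in for rabbitmq-server.service.
# $1: name of the epmd unit to order after (assumed default: epmd.service).
print_dropin() {
    epmd_unit="${1:-epmd.service}"
    cat <<EOF
[Unit]
# Start rabbitmq-server only after epmd (unit name is an assumption).
After=${epmd_unit}
Wants=${epmd_unit}

[Service]
# Wait for an explicit readiness notification instead of Type=simple.
Type=notify
# Assumption: the notification may come from a helper process (socat),
# not from the main PID, so accept it from any process in the service.
NotifyAccess=all
EOF
}

print_dropin "$@"
```

  The output could be saved under a directory such as
  /etc/systemd/system/rabbitmq-server.service.d/ and activated with
  "systemctl daemon-reload", but only as an experiment, not as a
  replacement for the packaged fix.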

  [Test Case]

  Although I was able to reproduce this almost 100% of the time, there
  were rare occasions when the restart procedure finished normally. I
  was also only able to reproduce it using a bionic VM, not a container.
  If you have multipass or lxd configured to launch VMs, that should be
  easy.

  The steps are:

  $ lxc launch ubuntu-daily:bionic --vm bug1784757-rabbitmq-server # or use multipass
  $ lxc shell bug1784757-rabbitmq-server
  # apt update
  # apt install rabbitmq-server -y
  # systemctl restart rabbitmq-server.service

  In a normal scenario, the restart should take around 3 seconds or
  less. With the bug, it takes around 90 seconds. If you can't reproduce
  it, try running "systemctl restart" again. A quick way to trigger it
  is to run a for loop like:

  # for i in $(seq 10); do time systemctl restart rabbitmq-server.service; done
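
  When running the loop several times, it can help to count the slow
  restarts automatically. The sketch below wraps any command, times
  each run with one-second resolution, and reports how many runs
  exceeded a threshold. The helper name time_runs and the 30-second
  threshold in the usage comment are assumptions for illustration; any
  value between the normal 3 seconds and the buggy 90 seconds should
  separate the two cases.

```shell
#!/bin/sh
# Run a command repeatedly and report how long each run took.
# Usage: time_runs COUNT THRESHOLD_SECONDS COMMAND [ARGS...]
time_runs() {
    count="$1"
    threshold="$2"
    shift 2
    slow=0
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -gt "$threshold" ]; then
            # Longer than the threshold: count it as a slow run.
            slow=$((slow + 1))
            echo "run $i: ${elapsed}s (slow)"
        else
            echo "run $i: ${elapsed}s"
        fi
        i=$((i + 1))
    done
    echo "slow runs: $slow/$count"
}

# Example invocation as root on the test VM (not executed here):
# time_runs 10 30 systemctl restart rabbitmq-server.service
```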

  [Regression Potential]

  * Because rabbitmq-server implements systemd's "Type=notify" using
  socat to communicate with systemd-notify over a socket, we will be
  introducing another point of failure (socat) in the mix.

  * Although unlikely, socat itself could fail. That would not be a
  regression, though, since the outcome would be the same as we have
  today: "systemctl restart" would not work properly, even though the
  service did restart.
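
  To make the socat dependency concrete, here is a minimal sketch of
  how a readiness message can be sent to systemd's notification socket
  from a shell. It is not the code rabbitmq-server uses; the helper
  name notify_ready is hypothetical, and the sketch assumes
  NOTIFY_SOCKET holds a filesystem path to a datagram socket (abstract
  sockets, whose names start with "@", are not handled).

```shell
#!/bin/sh
# Send "READY=1" to the socket systemd passes in NOTIFY_SOCKET.
notify_ready() {
    # Nothing to do when not started by systemd with a notify socket.
    [ -n "${NOTIFY_SOCKET:-}" ] || return 0
    # socat is the extra point of failure: without it, no notification.
    if ! command -v socat >/dev/null 2>&1; then
        echo "socat not found, cannot notify systemd" >&2
        return 1
    fi
    # -u: one-way transfer from stdin to the datagram socket.
    printf 'READY=1' | socat -u STDIN "UNIX-SENDTO:${NOTIFY_SOCKET}"
}
```

  If socat is missing or fails, the function returns non-zero and
  systemd never sees the readiness message, which is the failure mode
  described in the bullets above.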

  [Original Report]

  When I run `systemctl restart rabbitmq-server` it waits for 90 seconds
  then systemd sends SIGKILL to it.

  Presumably the `epmd` process does not receive SIGTERM, since if I run
  `kill 1493` (or whatever pid it currently is) then restart happens
  straight after that successfully

  ● rabbitmq-server.service - RabbitMQ Messaging Server
     Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
     Active: deactivating (final-sigterm) since Wed 2018-08-01 01:17:04 UTC; 7s ago
    Process: 1183 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
    Process: 178 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=exited, status=0/SUCCESS)
    Process: 177 ExecStart=/usr/sbin/rabbitmq-server (code=killed, signal=TERM)
   Main PID: 177 (code=killed, signal=TERM)
      Tasks: 1 (limit: 4915)
     CGroup: /system.slice/rabbitmq-server.service
             └─1493 /usr/lib/erlang/erts-9.2/bin/epmd -daemon

  Aug 01 01:11:20 rmq-1 systemd[1]: rabbitmq-server.service: Failed to reset devices.list: Operation not permitted
  Aug 01 01:11:20 rmq-1 systemd[1]: Starting RabbitMQ Messaging Server...
  Aug 01 01:11:25 rmq-1 rabbitmq[178]: Waiting for 'rabbit@rmq-1'
  Aug 01 01:11:25 rmq-1 rabbitmq[178]: pid is 204
  Aug 01 01:11:30 rmq-1 systemd[1]: Started RabbitMQ Messaging Server.
  Aug 01 01:17:04 rmq-1 systemd[1]: Stopping RabbitMQ Messaging Server...
  Aug 01 01:17:06 rmq-1 rabbitmq[1183]: Stopping and halting node 'rabbit@rmq-1'

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1784757/+subscriptions


