[Bug 1772236] Re: rabbit died and everything else died

 Christian Ehrhardt  1772236 at bugs.launchpad.net
Fri Jun 15 10:23:20 UTC 2018


I first thought we could log some per-queue data via cron, for example:
 $ rabbitmqctl list_queues name durable owner_pid messages_ready messages_unacknowledged messages messages_ready_ram messages_unacknowledged_ram messages_ram messages_persistent message_bytes message_bytes_ram message_bytes_persistent memory state
But we don't know yet what exactly we are looking for.

Then I found that the service-oriented
 $ rabbitmqctl report
has all the data you could want.
If we don't gather it too often, and maybe even gzip it, the storage cost should stay small.

In my test the output was 7.5k raw and 2.6k gzipped.
A real case might be bigger, but if we collect it hourly or so we would see which element grows over time.
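
To make the idea concrete, here is a minimal sketch of such a collector. The function name, the OUT_DIR and REPORT_CMD variables, and the default output directory are my assumptions for illustration and not part of rabbitmq-server; only `rabbitmqctl report` itself comes from the above.

```shell
#!/bin/sh
# Sketch: snapshot the output of `rabbitmqctl report` into a timestamped,
# gzipped file. OUT_DIR and REPORT_CMD are hypothetical knobs so the
# function can be pointed elsewhere (or tried without a broker).

collect_rabbit_report() {
    out_dir="${OUT_DIR:-/var/log/rabbitmq-reports}"
    report_cmd="${REPORT_CMD:-rabbitmqctl report}"
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    out_file="$out_dir/report-$stamp.txt.gz"

    mkdir -p "$out_dir" || return 1

    # $report_cmd is deliberately unquoted so that the default value
    # splits into the command and its argument. stderr is captured too,
    # so a "nodedown" error ends up in the snapshot instead of being lost.
    $report_cmd 2>&1 | gzip -c > "$out_file" || return 1

    # Print the path of the snapshot that was just written.
    echo "$out_file"
}
```

Sourced from an hourly root cron job and followed by a call to collect_rabbit_report, this would leave one small file per hour; two snapshots can then be compared with zcat and diff to see which counters grew.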

Especially interesting is the definition of the base memory counter:
  memory Bytes of memory consumed by the Erlang process associated with the queue, including 
         stack, heap and internal structures.
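
If we only care about that counter, a small filter could rank queues by it. This is just a sketch: the function name is made up, and it assumes tab-separated "name<TAB>memory" lines on stdin, which is what I would expect from `rabbitmqctl -q list_queues name memory`.

```shell
#!/bin/sh
# Sketch: print the N queues with the largest "memory" value.
# Reads "name<TAB>memory" lines on stdin; N defaults to 10.

top_queues_by_memory() {
    tab="$(printf '\t')"
    # Sort numerically, largest first, on the second tab-separated field.
    sort -t "$tab" -k2,2nr | head -n "${1:-10}"
}
```

Example use, assuming the broker is up:
 $ rabbitmqctl -q list_queues name memory | top_queues_by_memory 5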

So yes, this could be useful the next time this happens.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to rabbitmq-server in Ubuntu.
https://bugs.launchpad.net/bugs/1772236

Title:
  rabbit died and everything else died

Status in Auto Package Testing:
  New
Status in rabbitmq-server package in Ubuntu:
  New

Bug description:
  Why did it die?

  Should it have self-restarted?

  ubuntu@juju-prod-ues-proposed-migration-machine-1:~$ journalctl -u rabbitmq-server.service -n1000 | cat
  -- Logs begin at Sun 2018-05-20 00:18:25 UTC, end at Sun 2018-05-20 08:58:27 UTC. --
  May 20 04:00:11 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=137/n/a
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: Stopping and halting node 'rabbit@ps45-10-25-180-146' ...
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: Error: unable to connect to node 'rabbit@ps45-10-25-180-146': nodedown
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: DIAGNOSTICS
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: ===========
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: attempted to contact: ['rabbit@ps45-10-25-180-146']
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: rabbit@ps45-10-25-180-146:
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]:   * connected to epmd (port 4369) on ps45-10-25-180-146
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]:   * epmd reports: node 'rabbit' not running at all
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]:                   other nodes on ps45-10-25-180-146: ['rabbitmq-cli-28979']
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]:   * suggestion: start the node
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: current node details:
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - node name: 'rabbitmq-cli-28979@juju-prod-ues-proposed-migration-machine-1'
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - home dir: .
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - cookie hash: 7+AChRZDewWFJK8SEUhx+Q==
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=2
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Unit entered failed state.
  May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/1772236/+subscriptions


