[Bug 1496409] Re: Race condition in mnesia_locker after node down

James Page james.page at ubuntu.com
Tue Oct 24 09:16:54 UTC 2017


Jorge

Have you seen this issue in Xenial or later releases of the RabbitMQ
server package?  Xenial included a 3.5.x series rabbitmq which should
contain the fixes referenced.

Note that we also provide this version of RMQ for trusty users via the
Icehouse Ubuntu Cloud Archive.

** Changed in: rabbitmq-server (Ubuntu)
       Status: New => Triaged

** Changed in: rabbitmq-server (Ubuntu)
   Importance: Undecided => Medium

** Changed in: rabbitmq-server (Ubuntu)
   Importance: Medium => Low

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to rabbitmq-server in Ubuntu.
https://bugs.launchpad.net/bugs/1496409

Title:
  Race condition in mnesia_locker after node down

Status in rabbitmq-server package in Ubuntu:
  Triaged

Bug description:
  [Description]

  On a 3-node RabbitMQ cluster, after a failure on one of the nodes (kernel crash), the following sequence
  of events was recorded:

  ===
  The initial crash report:

  $ ack-grep -i mnesia_locker  | wc -l
  16

  60-=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
  61-  crasher:
  --
  66:                        {mnesia_locker,'rabbit@10-10-34-2',granted}}
  67-      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
  68-    ancestors: [worker_pool_sup,rabbit_sup,<0.110.0>]
  69-    messages: []
  70-    links: [<0.114.0>]
  71-    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
  72-                  {{xtype_to_module,topic},rabbit_exchange_type_topic},
  73-                  {random_seed,{5643,25632,27953}},
  74-                  {{xtype_to_module,fanout},rabbit_exchange_type_fanout},
  75-                  {worker_pool_worker,true},
  76-                  {fhc_age_tree,{0,nil}},
  --
  88:     Reason:     {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}}
  89-     Offender:   [{pid,<0.142.0>},
  90-                  {name,25},
  91-                  {mfargs,{worker_pool_worker,start_link,[25]}},
  92-                  {restart_type,transient},
  93-                  {shutdown,4294967295},
  94-                  {child_type,worker}]

  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1.log
  5166:Mnesia('rabbit@10-10-34-1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-2'}

  sosreport-svz-op-fdc-os-3.00087142-20150909140820/var/log/rabbitmq/rabbit@10-10-34-3.log.1
  545:Mnesia('rabbit@10-10-34-3'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-1'}

  After this I started seeing these traces:

  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:572:=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:573:  crasher:
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-574-    initial call: gen:init_it/6
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-575-    pid: <0.2346.189>
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-576-    registered_name: []
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-577-    exception exit: {{unexpected_cast,{next_job_from,<0.4960.181>}},
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-578-                     {gen_server2,call,
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-579-                         [<0.22014.255>,
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-580-                          {submit,#Fun<rabbit_misc.6.25154013>,<0.2862.189>},
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-581-                          infinity]}}
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-582-      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-583-    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.110.0>]
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-584-    messages: []
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-585-    links: [<0.259.0>]
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-586-    dictionary: [{guid,{{3735536962,2437967587,3023977752,60868675},0}}]
  sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-587-    trap_exit: true
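
For anyone triaging similar logs, the grep-and-count step used on the first crash report can be approximated with a small script. This is only a sketch: the function name `count_stray_grants` and the sample lines are made up for illustration, and the pattern assumes the `unexpected_info` reason and the `mnesia_locker` tuple appear on the same log line, as in the supervisor "Reason:" line quoted earlier.

```python
import re
from collections import Counter

# Matches the stray lock reply reported as a crash reason, e.g.
# {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}}
STRAY_GRANT = re.compile(
    r"\{unexpected_info,\s*\{mnesia_locker,\s*'([^']+)',\s*granted\}\}"
)

def count_stray_grants(lines):
    """Return a Counter mapping node name -> number of stray 'granted' replies."""
    counts = Counter()
    for line in lines:
        for node in STRAY_GRANT.findall(line):
            counts[node] += 1
    return counts

# Illustrative input only; real use would iterate over the sasl log files.
sample = [
    "    Reason:     {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}}",
    "     Offender:   [{pid,<0.142.0>},",
]
print(count_stray_grants(sample))
```

Grouping by node name makes it easier to see which peer's lock replies arrived late, which a plain line count does not show.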

  Related Bugs:

  - https://bugs.launchpad.net/mos/+bug/1401948
  - https://github.com/erlang/otp/compare/maint...dgud:dgud%3Bmnesia%3Bsticky-race%3BOTP-11375

  Possible related upstream commit:

  - http://hg.rabbitmq.com/rabbitmq-server/rev/5a63c9e273cc

  This seems to be related to [1] and [2]; both fixes have been available
  since rabbitmq-server 3.4.0.

  [1] https://github.com/rabbitmq/rabbitmq-server/commit/1a32616b744f4dab09cba4e7a7e747c2b6550361#diff-3b9dc5e3c18be9549b0ab00763e4e123

  [2] https://github.com/rabbitmq/rabbitmq-server/commit/238243b10ba6f666fcd8e84289525961fe6e68b9#diff-3b9dc5e3c18be9549b0ab00763e4e123
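
Since the description places both fixes in rabbitmq-server 3.4.0 and later, a quick way to check whether a given package version should already carry them is to compare its upstream version against that threshold. The sketch below is an assumption-laden illustration: the helper names and the sample version strings are hypothetical, and the parsing only handles a plain `major.minor.patch` prefix with an optional epoch.

```python
import re

# Per the bug description, both fixes are available since 3.4.0.
FIXED_IN = (3, 4, 0)

def upstream_version(pkg_version):
    """Extract (major, minor, patch) from a string such as '3.5.7-1'."""
    match = re.match(r"(?:\d+:)?(\d+)\.(\d+)\.(\d+)", pkg_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {pkg_version!r}")
    return tuple(int(part) for part in match.groups())

def has_fixes(pkg_version):
    """True if the upstream version is at or above the 3.4.0 threshold."""
    return upstream_version(pkg_version) >= FIXED_IN

# Hypothetical example version strings, not taken from the bug report.
for version in ("3.2.4-1", "3.4.0", "3.5.7-1"):
    print(version, has_fixes(version))
```

A distribution may backport fixes to an older upstream version, so a False result here only means the fixes are not guaranteed by the version number alone.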

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1496409/+subscriptions


