[Bug 1890334] [NEW] ceph: nautilus: backport fixes for msgr/eventcenter

Mauricio Faria de Oliveira 1890334 at bugs.launchpad.net
Tue Aug 4 19:09:48 UTC 2020


Public bug reported:

Ceph Nautilus in bionic-train may hit daemon crashes (e.g., ceph-mgr)
in msgr/eventcenter, because it lacks backports of the following set of fixes:

  https://github.com/ceph/ceph/pull/33820

Reporting the bug against the Ubuntu Cloud Archive (UCA), since Ubuntu Eoan (Train) is EOL.
Working on the debdiffs and tests.

Example stack trace as reported by 'ceph crash info' and GDB:

$ sudo ceph crash info <crash ID>
...
    "process_name": "ceph-mgr",
...
    "backtrace": [
        "(()+0x128a0) [0x7f8e4ae928a0]",
        "(bool ProtocolV2::append_frame<ceph::msgr::v2::MessageFrame>(ceph::msgr::v2::MessageFrame&)+0x48a) [0x7f8e4bf4219a]",
        "(ProtocolV2::write_message(Message*, bool)+0x4dd) [0x7f8e4bf249dd]",
        "(ProtocolV2::write_event()+0x2c5) [0x7f8e4bf39d55]",
        "(AsyncConnection::handle_write()+0x43) [0x7f8e4bef89e3]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xd57) [0x7f8e4bf51157]",
        "(()+0x59b848) [0x7f8e4bf55848]",
        "(()+0xbd6df) [0x7f8e4a9b06df]",
        "(()+0x76db) [0x7f8e4ae876db]",
        "(clone()+0x3f) [0x7f8e4a06da3f]"
    ]
...
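
When several `ceph crash` reports need triage, the backtrace array can be reduced to symbol names and checked for the ProtocolV2 write path shown above. The sketch below is illustrative only. It assumes the report is the JSON object printed by `ceph crash info`, with a `backtrace` list of `(symbol+0xoffset) [0xaddress]` strings as in this example. The helper names `frame_symbol` and `matches_signature`, and the choice of frames in `SIGNATURE`, are hypothetical and not part of Ceph's tooling.

```python
import re
from typing import Sequence

# Each backtrace entry looks like "(symbol+0xoffset) [0xaddress]". Frames
# without symbol information show "()" as the symbol, e.g. "(()+0x128a0) [...]".
FRAME_RE = re.compile(
    r"^\((?P<sym>.*)\+0x[0-9a-fA-F]+\) \[0x[0-9a-fA-F]+\]$"
)

# Hypothetical signature for this report: the symbolized frames of the
# ProtocolV2 write path, listed innermost first as in the crash report.
SIGNATURE = (
    "ProtocolV2::append_frame",
    "ProtocolV2::write_message",
    "ProtocolV2::write_event",
    "AsyncConnection::handle_write",
    "EventCenter::process_events",
)


def frame_symbol(frame: str) -> str:
    """Return the symbol of one backtrace entry, or '' if it has none."""
    m = FRAME_RE.match(frame)
    if not m or m.group("sym") == "()":
        return ""
    return m.group("sym")


def matches_signature(report: dict, signature: Sequence[str] = SIGNATURE) -> bool:
    """True if `signature` appears as consecutive frames of report['backtrace'].

    Matching is by substring, so return types, template arguments and
    parameter lists in the demangled symbol do not affect the result.
    """
    symbols = [frame_symbol(f) for f in report.get("backtrace", [])]
    n = len(signature)
    for start in range(len(symbols) - n + 1):
        window = symbols[start:start + n]
        if all(name in sym for name, sym in zip(signature, window)):
            return True
    return False
```

For example, the output of `sudo ceph crash info <crash ID>` could be passed through `json.loads()` and then to `matches_signature()`. Matching a run of consecutive frames, rather than a single function, avoids grouping unrelated crashes that merely pass through `EventCenter::process_events` with this one.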

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x000055b9deda9140 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  ceph::msgr::v2::Frame<ceph::msgr::v2::MessageFrame, (unsigned short)8, (unsigned short)8, (unsigned short)8, (unsigned short)4096>::get_buffer (session_stream_handlers=..., this=<optimized out>) at ./src/msg/async/frames_v2.h:273
#5  ProtocolV2::append_frame<ceph::msgr::v2::MessageFrame> (this=this@entry=0x55b9e4830680, frame=...) at ./src/msg/async/ProtocolV2.cc:552
#6  0x00007f8e4bf249dd in ProtocolV2::write_message (this=this@entry=0x55b9e4830680, m=m@entry=0x55b9e596da40, more=more@entry=false)
    at ./src/msg/async/ProtocolV2.cc:515
#7  0x00007f8e4bf39d55 in ProtocolV2::write_event (this=0x55b9e4830680) at ./src/msg/async/ProtocolV2.cc:627
#8  0x00007f8e4bef89e3 in AsyncConnection::handle_write (this=0x55b9e73ec480) at ./src/msg/async/AsyncConnection.cc:692
#9  0x00007f8e4bf51157 in EventCenter::process_events (this=this@entry=0x55b9e05502c0, timeout_microseconds=<optimized out>,
    timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f8e466d5828) at ./src/msg/async/Event.cc:441
#10 0x00007f8e4bf55848 in NetworkStack::<lambda()>::operator() (__closure=0x55b9e05feff8) at ./src/msg/async/Stack.cc:53
#11 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/7/bits/std_function.h:316
#12 0x00007f8e4a9b06df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f8e4ae876db in start_thread (arg=0x7f8e466d8700) at pthread_create.c:463
#14 0x00007f8e4a06da3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
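
The GDB backtrace adds the source locations that the crash report lacks. Frame #4, the first frame after `<signal handler called>`, places the SIGSEGV (signal 11) in `Frame<...>::get_buffer()` at `frames_v2.h:273`, reached from `ProtocolV2::append_frame()`. When comparing several core dumps, a sketch like the following could extract that faulting frame from saved `bt` text. It assumes the `bt` layout pasted here, where long frames wrap onto indented continuation lines (as frames #6, #9 and #11 do above), and that the faulting frame is the first one after the signal handler, which typically holds for a synchronous signal such as SIGSEGV. The helper names `split_frames` and `faulting_frame` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

FRAME_START = re.compile(r"^#(\d+)\s+(.*)$")
# A source location is the trailing "at <file>:<line>" of a frame.
SOURCE_LOC = re.compile(r"\bat (\S+):(\d+)\s*$")


def split_frames(bt_text: str) -> List[Tuple[int, str]]:
    """Group `bt` output into (frame number, text) pairs.

    Lines that do not start with '#N' are treated as wrapped continuations
    of the previous frame and joined to it with a single space.
    """
    frames: List[Tuple[int, str]] = []
    for line in bt_text.splitlines():
        m = FRAME_START.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2).strip()))
        elif frames and line.strip():
            num, text = frames[-1]
            frames[-1] = (num, text + " " + line.strip())
    return frames


def faulting_frame(bt_text: str) -> Optional[Tuple[int, str, int]]:
    """Return (frame number, source file, line) of the first frame after
    '<signal handler called>' that has a source location, or None."""
    seen_handler = False
    for num, text in split_frames(bt_text):
        if text == "<signal handler called>":
            seen_handler = True
            continue
        if seen_handler:
            m = SOURCE_LOC.search(text)
            if m:
                return num, m.group(1), int(m.group(2))
    return None
```

Applied to the backtrace above, `faulting_frame()` would be expected to report frame 4 at `./src/msg/async/frames_v2.h`, line 273.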

** Affects: cloud-archive
     Importance: Undecided
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: In Progress

** Changed in: cloud-archive
     Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: cloud-archive
       Status: New => In Progress

-- 
https://bugs.launchpad.net/bugs/1890334




More information about the Ubuntu-openstack-bugs mailing list