[Bug 1890334] [NEW] ceph: nautilus: backport fixes for msgr/eventcenter
Mauricio Faria de Oliveira
1890334 at bugs.launchpad.net
Tue Aug 4 19:09:48 UTC 2020
Public bug reported:
Ceph Nautilus in bionic-train may hit daemon crashes (e.g., ceph-mgr)
in msgr/eventcenter because it lacks backports of the following set of fixes:
https://github.com/ceph/ceph/pull/33820
Reporting the bug against UCA since Ubuntu Eoan (Train) is EOL.
Working on the debdiffs and tests.
Example stack trace as reported by 'ceph crash info' and GDB:
$ sudo ceph crash info <crash ID>
...
"process_name": "ceph-mgr",
...
"backtrace": [
"(()+0x128a0) [0x7f8e4ae928a0]",
"(bool ProtocolV2::append_frame<ceph::msgr::v2::MessageFrame>(ceph::msgr::v2::MessageFrame&)+0x48a) [0x7f8e4bf4219a]",
"(ProtocolV2::write_message(Message*, bool)+0x4dd) [0x7f8e4bf249dd]",
"(ProtocolV2::write_event()+0x2c5) [0x7f8e4bf39d55]",
"(AsyncConnection::handle_write()+0x43) [0x7f8e4bef89e3]",
"(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xd57) [0x7f8e4bf51157]",
"(()+0x59b848) [0x7f8e4bf55848]",
"(()+0xbd6df) [0x7f8e4a9b06df]",
"(()+0x76db) [0x7f8e4ae876db]",
"(clone()+0x3f) [0x7f8e4a06da3f]"
]
...
(gdb) bt
#0 raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x000055b9deda9140 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:81
#2 handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:326
#3 <signal handler called>
#4 ceph::msgr::v2::Frame<ceph::msgr::v2::MessageFrame, (unsigned short)8, (unsigned short)8, (unsigned short)8, (unsigned short)4096>::get_buffer (session_stream_handlers=..., this=<optimized out>) at ./src/msg/async/frames_v2.h:273
#5 ProtocolV2::append_frame<ceph::msgr::v2::MessageFrame> (this=this@entry=0x55b9e4830680, frame=...) at ./src/msg/async/ProtocolV2.cc:552
#6 0x00007f8e4bf249dd in ProtocolV2::write_message (this=this@entry=0x55b9e4830680, m=m@entry=0x55b9e596da40, more=more@entry=false)
at ./src/msg/async/ProtocolV2.cc:515
#7 0x00007f8e4bf39d55 in ProtocolV2::write_event (this=0x55b9e4830680) at ./src/msg/async/ProtocolV2.cc:627
#8 0x00007f8e4bef89e3 in AsyncConnection::handle_write (this=0x55b9e73ec480) at ./src/msg/async/AsyncConnection.cc:692
#9 0x00007f8e4bf51157 in EventCenter::process_events (this=this@entry=0x55b9e05502c0, timeout_microseconds=<optimized out>,
timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f8e466d5828) at ./src/msg/async/Event.cc:441
#10 0x00007f8e4bf55848 in NetworkStack::<lambda()>::operator() (__closure=0x55b9e05feff8) at ./src/msg/async/Stack.cc:53
#11 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/7/bits/std_function.h:316
#12 0x00007f8e4a9b06df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f8e4ae876db in start_thread (arg=0x7f8e466d8700) at pthread_create.c:463
#14 0x00007f8e4a06da3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
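For triage, the symbol names can be pulled out of the 'ceph crash info' JSON with a short script. This is only a sketch: it assumes the report is a JSON object with the "process_name" and "backtrace" keys shown above, and that each backtrace entry has the "(symbol+0xoffset) [0xaddress]" shape seen in this report. The function name summarize_crash and the SAMPLE data (a shortened copy of the trace above) are illustrative, not part of Ceph.

```python
import json
import re

# Matches entries such as "(ProtocolV2::write_event()+0x2c5) [0x7f8e4bf39d55]".
# Assumption: every entry follows this "(symbol+0xoffset) [0xaddress]" shape.
FRAME_RE = re.compile(
    r"^\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


def summarize_crash(report_text):
    """Return (process_name, list of symbols) from a crash report in JSON form."""
    report = json.loads(report_text)
    frames = []
    for entry in report.get("backtrace", []):
        match = FRAME_RE.match(entry)
        if match is None:
            continue  # skip entries that do not look like a frame
        symbol = match.group("symbol")
        # A bare "()" means the symbol name was not resolved for that frame.
        frames.append(symbol if symbol != "()" else "<unresolved>")
    return report.get("process_name"), frames


# Shortened copy of the backtrace from this report, for illustration only.
SAMPLE = """
{
  "process_name": "ceph-mgr",
  "backtrace": [
    "(()+0x128a0) [0x7f8e4ae928a0]",
    "(ProtocolV2::write_event()+0x2c5) [0x7f8e4bf39d55]",
    "(clone()+0x3f) [0x7f8e4a06da3f]"
  ]
}
"""

if __name__ == "__main__":
    name, frames = summarize_crash(SAMPLE)
    print(name)
    for frame in frames:
        print("  " + frame)
```

The same function could be fed the saved output of 'sudo ceph crash info <crash ID>' to compare crashes across daemons by their resolved frames.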
** Affects: cloud-archive
Importance: Undecided
Assignee: Mauricio Faria de Oliveira (mfo)
Status: In Progress
** Changed in: cloud-archive
Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)
** Changed in: cloud-archive
Status: New => In Progress
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1890334