[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

Maksym Medvied 2089565 at bugs.launchpad.net
Sat Dec 21 20:15:22 UTC 2024


Now we see that the dir with the Ceph source is ceph-19.2.0. Let's
create a symlink so that gdb can find it:

> sudo ln -sv ceph-19.2.0 ceph-19.2.0-0ubuntu0.24.04.1
'ceph-19.2.0-0ubuntu0.24.04.1' -> 'ceph-19.2.0'

Let's restart gdb with ceph-mon again:

(gdb) start
Temporary breakpoint 1 at 0x32c670: file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc, line 250.
Starting program: /tmp/2/usr/bin/ceph-mon 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdf98)
    at /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc:250
250     {
(gdb) l
245       }
246       return addrs;
247     }
248
249     int main(int argc, const char **argv)
250     {
251       // reset our process name, in case we did a respawn, so that it's not
252       // left as "exe".
253       ceph_pthread_setname(pthread_self(), "ceph-mon");
254

Now we see the sources. The part of the backtrace that we want to know
more about is

 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x7497532c3ab1]

Let's see what's there:

(gdb) set pagination off
(gdb) disassemble 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)'
Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE:
Address range 0x7ffff7cc2e10 to 0x7ffff7cc3c4d:
   0x00007ffff7cc2e10 <+0>:     endbr64
   0x00007ffff7cc2e14 <+4>:     push   %rbp
   0x00007ffff7cc2e15 <+5>:     mov    %rsp,%rbp
   0x00007ffff7cc2e18 <+8>:     push   %r15
   0x00007ffff7cc2e1a <+10>:    push   %r14
   0x00007ffff7cc2e1c <+12>:    lea    -0x2f3(%rbp),%rdx
...

The offsets in the disassembly are in decimal, while the offsets in the
stack backtrace are in hex. We need decimal, so let's convert:

(gdb) p 0xca1
$1 = 3233
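
The same conversion can be scripted when several frames need to be
looked up. The following is only a minimal sketch, not part of the
original debugging session: the regular expression and the helper name
parse_frame are assumptions made for illustration, and the function
start and end addresses are copied from the "Address range" line of the
disassembly above. It extracts the hex offset from frame 10, prints it
in decimal, and computes the address to look for in the gdb listing.

```python
import re

# Hypothetical pattern for frames of the form "(symbol+0xOFF) [0xADDR]".
FRAME_RE = re.compile(
    r"\((?P<symbol>.+)\+0x(?P<offset>[0-9a-fA-F]+)\)\s+\[0x(?P<addr>[0-9a-fA-F]+)\]"
)


def parse_frame(frame: str):
    """Return (symbol, offset, address) parsed from one backtrace frame."""
    match = FRAME_RE.search(frame)
    if match is None:
        raise ValueError(f"unrecognized frame: {frame!r}")
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("addr"), 16),
    )


frame = (
    "10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)"
    "+0xca1) [0x7497532c3ab1]"
)
symbol, offset, crash_addr = parse_frame(frame)

# Taken from the "Address range" line printed by gdb above.
function_start = 0x7FFFF7CC2E10
function_end = 0x7FFFF7CC3C4D

# Address of the instruction at <+offset> in the gdb session.
target = function_start + offset

print(symbol)
print(f"decimal offset: {offset}")          # the <+N> label in the disassembly
print(f"address in gdb: {hex(target)}")

# Sanity checks: the target must lie inside the function, and because
# shared libraries are mapped at page-aligned addresses, the low 12 bits
# should match those of the address recorded in the crash.
assert function_start <= target < function_end
assert (target & 0xFFF) == (crash_addr & 0xFFF)
```

For frame 10 this gives a decimal offset of 3233, matching the gdb
result above, so the instruction of interest is the one labelled <+3233>
in the disassembly.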

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2089565

Title:
  MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

Status in ceph package in Ubuntu:
  Confirmed

Bug description:
  This issue is a continuation of
  https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515

  
  On Ubuntu 24.04 LTS we upgraded Ceph to 19.2.0-0ubuntu0.24.04.1.

  The previous release was 19.2.0~git20240301.4c76c50-0ubuntu6.

  Whenever we upgrade (tested on 2 different clusters), ceph-mon
  ends up crashing repeatedly with the stack trace below:

  ```
   ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
   1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x788409245320]
   2: pthread_kill()
   3: gsignal()
   4: abort()
   5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7884096a5ff5]
   6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7884096bb0da]
   7: (std::unexpected()+0) [0x7884096a5a55]
   8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7884096bb391]
   9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x78840a293593]
   10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x78840a4c3ab1]
   11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x78840a4e4303]
   12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x78840a4e6ef0]
   13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x631ac5dea801]
   14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x631ac5b7a164]
   15: (Monitor::preinit()+0x98e) [0x631ac5bb2fbe]
   16: main()
   17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x78840922a1ca]
   18: __libc_start_main()
   19: _start()
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  ```

  
  Mitigation:
  Rolling back to the previous release 19.2.0~git20240301.4c76c50-0ubuntu6 is still possible and restores service.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2089565/+subscriptions



