[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS
Maksym Medvied
2089565 at bugs.launchpad.net
Sat Dec 21 20:17:13 UTC 2024
The addresses here are not contiguous, so it makes sense to look at the
full disassembly as well (i.e. disassemble without /m):
(gdb) disassemble 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)'
Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE:
Address range 0x7ffff7cc2e10 to 0x7ffff7cc3c4d:
0x00007ffff7cc2e10 <+0>: endbr64
0x00007ffff7cc2e14 <+4>: push %rbp
...
0x00007ffff7cc3a65 <+3157>: cmp $0x10,%r12w
0x00007ffff7cc3a6a <+3162>: je 0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>
0x00007ffff7cc3a70 <+3168>: lea -0x2a4(%rbp),%rdx
0x00007ffff7cc3a77 <+3175>: mov $0x4,%esi
0x00007ffff7cc3a7c <+3180>: mov %r13,%rdi
0x00007ffff7cc3a7f <+3183>: lea 0x1c0(%rbx),%r14
0x00007ffff7cc3a86 <+3190>: call 0x7ffff7a93320 <_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjPc>
0x00007ffff7cc3a8b <+3195>: mov 0x1c0(%rbx),%rax
0x00007ffff7cc3a92 <+3202>: mov -0x2a4(%rbp),%esi
0x00007ffff7cc3a98 <+3208>: mov %r14,%rdx
0x00007ffff7cc3a9b <+3211>: mov %r13,%rdi
0x00007ffff7cc3a9e <+3214>: movq $0x0,0x1c8(%rbx)
0x00007ffff7cc3aa9 <+3225>: movb $0x0,(%rax)
0x00007ffff7cc3aac <+3228>: call 0x7ffff7a93400 <_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
0x00007ffff7cc3ab1 <+3233>: cmp $0x11,%r12w
...
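As a sanity check on the listing above, the offsets can be tied back to
frame 10 of the crash backtrace (MDSMap::decode(...)+0xca1). The sketch
below only redoes the arithmetic with numbers taken from the gdb output and
the backtrace. The constant and function names are mine, and the 4 KiB page
size used for the ASLR comparison is an assumption.

```python
# Values copied from the gdb listing and from frame 10 of the backtrace.
FUNC_START = 0x7ffff7cc2e10      # start of MDSMap::decode in the gdb session
FRAME_OFFSET = 0xca1             # "+0xca1" reported in frame 10
CALL_OFFSET = 3228               # <+3228>: call ...copy(unsigned int, std::string&)
RETURN_OFFSET = 3233             # <+3233>: cmp $0x11,%r12w

# Frame 10 holds a return address, so it points at the instruction
# that follows the call, not at the call itself.
return_address = FUNC_START + FRAME_OFFSET
call_length = RETURN_OFFSET - CALL_OFFSET


def page_offset(addr, page_size=4096):
    """Low bits of an address; unchanged by page-aligned ASLR relocation."""
    return addr & (page_size - 1)


print(hex(return_address), FRAME_OFFSET, call_length)
```

With these numbers, 0xca1 is 3233 in decimal, so the return address in
frame 10 is the cmp at <+3233>, directly after the call at <+3228>. The
crash address 0x78840a4c3ab1 and the gdb address 0x7ffff7cc3ab1 also share
the same page offset, which is what one would expect if both come from the
same build of the library loaded at different base addresses.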
The function that was called just before that has the mangled name
_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE,
which demangles to
(gdb) demangle _ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
and it appears to be the function that we see in frame 9:
9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned
int, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >&)+0x193) [0x749753093593]
so we're on the right track. The call comes right after the
0x00007ffff7cc3a65 <+3157>: cmp $0x10,%r12w
0x00007ffff7cc3a6a <+3162>: je 0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>
branch, which appears to correspond to the if statement in
963 if (ev >= 17) {
964 decode(bal_rank_mask, p);
965 }
If ev equals 16 the jump is taken and the decode is skipped; otherwise decode(bal_rank_mask, p) is called. (Presumably the compiler already knows that ev >= 16 on this path, so ev >= 17 reduces to ev != 16.)
bal_rank_mask is a std::string, and the called function takes a basic_string parameter, so it seems we're still on the right track:
697 std::string bal_rank_mask = "-1";
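To make the instruction sequence at <+3168>..<+3228> easier to follow: it
looks like a length-prefixed string decode, i.e. copy 4 bytes into a local,
reset the destination string, then copy that many bytes into it. The sketch
below models that pattern together with the ev >= 17 gate. It is not Ceph's
code: EndOfBuffer, copy, decode_string and decode_tail are hypothetical
names, and the little-endian 32-bit length prefix is an assumption based on
the 4-byte copy seen in the disassembly on x86_64.

```python
import struct


class EndOfBuffer(Exception):
    """Stand-in for the error a decoder raises when the input runs out."""


def copy(buf, pos, n):
    """Return n bytes starting at pos and the new position, or raise."""
    if pos + n > len(buf):
        raise EndOfBuffer(f"need {n} bytes at offset {pos}, have {len(buf) - pos}")
    return buf[pos:pos + n], pos + n


def decode_string(buf, pos):
    """Decode a string stored as a 32-bit length followed by the bytes."""
    raw_len, pos = copy(buf, pos, 4)             # like copy(4, &len)
    (length,) = struct.unpack("<I", raw_len)
    raw, pos = copy(buf, pos, length)            # like copy(len, str)
    return raw.decode("utf-8"), pos


def decode_tail(buf, pos, ev, bal_rank_mask="-1"):
    """Version-gated field: only present when the encoding version is >= 17."""
    if ev >= 17:
        bal_rank_mask, pos = decode_string(buf, pos)
    return bal_rank_mask, pos
```

In this model an ev of 16 leaves the default "-1" untouched, while an ev of
17 with a length prefix larger than the remaining input raises from the
second copy, which is the same call that shows up in frame 9. Whether that
is what actually happens in the crashing monitor still needs to be confirmed.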
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2089565
Title:
MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS
Status in ceph package in Ubuntu:
Confirmed
Bug description:
This issue is a continuation of
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515
On Ubuntu 24.04 LTS we upgraded Ceph to 19.2.0-0ubuntu0.24.04.1.
The previous release was 19.2.0~git20240301.4c76c50-0ubuntu6.
Whenever we upgrade (tested on 2 different clusters), ceph-mon
ends up crashing repeatedly with the stack trace below:
```
ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x788409245320]
2: pthread_kill()
3: gsignal()
4: abort()
5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7884096a5ff5]
6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7884096bb0da]
7: (std::unexpected()+0) [0x7884096a5a55]
8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7884096bb391]
9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x78840a293593]
10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x78840a4c3ab1]
11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x78840a4e4303]
12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x78840a4e6ef0]
13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x631ac5dea801]
14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x631ac5b7a164]
15: (Monitor::preinit()+0x98e) [0x631ac5bb2fbe]
16: main()
17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x78840922a1ca]
18: __libc_start_main()
19: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
Mitigation:
Rolling back to the previous release 19.2.0~git20240301.4c76c50-0ubuntu6 still restores service.