[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS
Maksym Medvied
2089565 at bugs.launchpad.net
Sat Dec 21 20:17:22 UTC 2024
As we see in the diff above
if (ev >= 17) {
- decode(max_xattr_size, p);
+ decode(bal_rank_mask, p);
}
if (ev >= 18) {
- decode(bal_rank_mask, p);
+ decode(max_xattr_size, p);
+ }
+
these two decode() calls were swapped. Let's find out why.
To do so we need to clone the upstream repo and run git blame on the file to see when and why the lines were changed:
> git clone https://github.com/ceph/ceph ceph-upstream
> cd ceph-upstream/
> git blame src/mds/MDSMap.cc
...
e134c8907013 (Yongseok Oh 2022-10-11 20:47:32 +0900 963) if (ev >= 17) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500 964) decode(bal_rank_mask, p);
36ee8e7ed365 (Venky Shankar 2023-12-01 04:32:20 -0500 965) }
36ee8e7ed365 (Venky Shankar 2023-12-01 04:32:20 -0500 966)
36ee8e7ed365 (Venky Shankar 2023-12-01 04:32:20 -0500 967) if (ev >= 18) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500 968) decode(max_xattr_size, p);
e134c8907013 (Yongseok Oh 2022-10-11 20:47:32 +0900 969) }
...
We see that both decode() calls were changed in the same commit. If
we look at it with
> git show 78abfeaff27f
we'll see that this is what we were looking for. A link to the commit:
https://github.com/ceph/ceph/commit/78abfeaff27fee343fb664db633de5b221699a73.
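
To illustrate why the order of these two decode() calls matters, here is a minimal sketch in Python. It is not Ceph code: it only models a length-prefixed string and a 64-bit integer written back to back, and it assumes the encoded version is at least 18 so that both fields are present. The helper names (`Reader`, `encode_old_order`, `decode_old_order`, `decode_new_order`) and the sample values are made up for the example. Data written in one order and read in the other makes the string decoder treat part of the integer as a length, which then runs past the end of the buffer. That would be consistent with the exception thrown from the string copy() in frame 9 of the stack trace below.

```python
import struct


def encode_old_order(max_xattr_size: int, bal_rank_mask: str) -> bytes:
    """Write the integer first, then the string (the '-' side of the diff)."""
    mask = bal_rank_mask.encode()
    return (
        struct.pack("<Q", max_xattr_size)   # 8-byte little-endian integer
        + struct.pack("<I", len(mask))      # 4-byte string length prefix
        + mask                              # string payload
    )


class Reader:
    """Tiny cursor over a byte buffer that refuses to read past the end."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.buf):
            raise EOFError(f"need {n} bytes, only {len(self.buf) - self.pos} left")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u64(self) -> int:
        return struct.unpack("<Q", self.take(8))[0]

    def string(self) -> str:
        (length,) = struct.unpack("<I", self.take(4))
        return self.take(length).decode()


def decode_old_order(buf: bytes):
    """Read in the same order the data was written: integer, then string."""
    r = Reader(buf)
    max_xattr_size = r.u64()
    bal_rank_mask = r.string()
    return max_xattr_size, bal_rank_mask


def decode_new_order(buf: bytes):
    """Read in the swapped order (the '+' side of the diff): string, then integer."""
    r = Reader(buf)
    bal_rank_mask = r.string()   # interprets the low 4 bytes of the integer as a length
    max_xattr_size = r.u64()
    return max_xattr_size, bal_rank_mask


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    buf = encode_old_order(65536, "0x1")
    print(decode_old_order(buf))
    try:
        decode_new_order(buf)
    except EOFError as exc:
        print("swapped decode failed:", exc)
```

With these sample values the swapped reader takes the first four bytes of the integer as a string length of 65536, far more than the buffer holds, so the read fails instead of returning garbage.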
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2089565
Title:
MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS
Status in ceph package in Ubuntu:
Confirmed
Bug description:
This issue is a continuation of
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515
On Ubuntu 24.04 LTS we upgraded Ceph to 19.2.0-0ubuntu0.24.04.1.
The previous release was 19.2.0~git20240301.4c76c50-0ubuntu6.
After every upgrade (tested on 2 different clusters), ceph-mon
crashes repeatedly with the stack trace below:
```
ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x788409245320]
2: pthread_kill()
3: gsignal()
4: abort()
5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7884096a5ff5]
6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7884096bb0da]
7: (std::unexpected()+0) [0x7884096a5a55]
8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7884096bb391]
9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x78840a293593]
10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x78840a4c3ab1]
11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x78840a4e4303]
12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x78840a4e6ef0]
13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x631ac5dea801]
14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x631ac5b7a164]
15: (Monitor::preinit()+0x98e) [0x631ac5bb2fbe]
16: main()
17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x78840922a1ca]
18: __libc_start_main()
19: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
Mitigation:
Rolling back to the previous release 19.2.0~git20240301.4c76c50-0ubuntu6 still restores service.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2089565/+subscriptions