[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

Maksym Medvied 2089565 at bugs.launchpad.net
Sat Dec 21 20:17:22 UTC 2024


As we see in the diff above

   if (ev >= 17) {
-    decode(max_xattr_size, p);
+    decode(bal_rank_mask, p);
   }
 
   if (ev >= 18) {
-    decode(bal_rank_mask, p);
+    decode(max_xattr_size, p);
+  }
+

these two decode() calls were swapped. Let's find out why.
To do so we need to clone the upstream repo and run git blame on the file to see when and why the lines were changed:

> git clone https://github.com/ceph/ceph ceph-upstream
> cd ceph-upstream/
> git blame src/mds/MDSMap.cc
...
e134c8907013 (Yongseok Oh      2022-10-11 20:47:32 +0900  963)   if (ev >= 17) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500  964)     decode(bal_rank_mask, p);
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  965)   }
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  966) 
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  967)   if (ev >= 18) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500  968)     decode(max_xattr_size, p);
e134c8907013 (Yongseok Oh      2022-10-11 20:47:32 +0900  969)   }
...

We see that both decode() functions where changed in the same commit. If
we look at it with

> git show 78abfeaff27f

we'll see that this is what we were looking for. A link to the commit:
https://github.com/ceph/ceph/commit/78abfeaff27fee343fb664db633de5b221699a73.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2089565

Title:
  MON and MDS crash upgrading  CEPH  on ubuntu 24.04 LTS

Status in ceph package in Ubuntu:
  Confirmed

Bug description:
  This issue is a continuation of
  https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515

  
  On Ubuntu 24.04 lts we did upgrade Ceph to  19.2.0-0ubuntu0.24.04.1

  Previous release is : 19.2.0~git20240301.4c76c50-0ubuntu6

  whenever  upgrading (tested on 2 different clusters)  the ceph-mon
  ends up crashing repeatedly with the below stack error

  ```
   ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
   1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x788409245320]
   2: pthread_kill()
   3: gsignal()
   4: abort()
   5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7884096a5ff5]
   6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7884096bb0da]
   7: (std::unexpected()+0) [0x7884096a5a55]
   8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7884096bb391]
   9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x78840a293593]
   10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x78840a4c3ab1]
   11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x78840a4e4303]
   12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x78840a4e6ef0]
   13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x631ac5dea801]
   14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x631ac5b7a164]
   15: (Monitor::preinit()+0x98e) [0x631ac5bb2fbe]
   16: main()
   17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x78840922a1ca]
   18: __libc_start_main()
   19: _start()
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  ```

  
  mitigation:
  a rollback to the previous release 19.2.0~git20240301.4c76c50-0ubuntu6 is still possible to restore service

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2089565/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list