[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

Maksym Medvied 2089565 at bugs.launchpad.net
Sat Dec 21 19:28:04 UTC 2024


This is the SIGABRT stack backtrace:

 1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x749752045320]
 2: pthread_kill()
 3: gsignal()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7497524a5ff5]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7497524bb0da]
 7: (std::unexpected()+0) [0x7497524a5a55]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7497524bb391]
 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x749753093593]
 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x7497532c3ab1]
 11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x7497532e4303]
 12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x7497532e6ef0]
 13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x600eddf89801]
 14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x600eddd19164]
 15: (Monitor::preinit()+0x98e) [0x600eddd51fbe]
 16: main()
 17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x74975202a1ca]
 18: __libc_start_main()
 19: _start()

From what we can see here, the innermost Ceph-related frame is 9, and
list::iterator_impl looks like generic buffer code. The next frame out,
10, is in MDSMap::decode(), which is a likely place for a version
incompatibility. Let's dig deeper into that frame.

First, we need to figure out which binary crashed. We see

0> 2024-11-25T13:53:45.524+0000 74975268ba80 -1 *** Caught signal (Aborted) **
in thread 74975268ba80 thread_name:ceph-mon

just before the stack backtrace, and by searching backward for "ceph-mon"
we find

-365> 2024-11-25T13:53:45.514+0000 74975268ba80  0 ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable), process ceph-mon, pid 1304994

so the binary is most likely ceph-mon, built from git commit 16063ff2022298c9300e49a547a16ffda59baf13.
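If you have to do this for many crash logs, the version banner can be picked apart programmatically. A minimal C++ sketch, assuming every banner has the same shape as the -365> line above; parse_version_banner and CephVersionInfo are hypothetical names, not part of Ceph, and the regex is inferred from that single line rather than from a documented log format:

```cpp
#include <optional>
#include <regex>
#include <string>

// Fields of interest from a Ceph startup banner such as
// "... ceph version 19.2.0 (<sha>) squid (stable), process ceph-mon, pid 1304994".
struct CephVersionInfo {
    std::string version;  // e.g. "19.2.0"
    std::string git_sha;  // 40 hex digits
    std::string release;  // e.g. "squid"
    std::string process;  // e.g. "ceph-mon"
    long pid = 0;
};

// Hypothetical helper. The pattern is an assumption based on the one banner
// quoted above, not on a documented Ceph log format.
std::optional<CephVersionInfo> parse_version_banner(const std::string& line) {
    static const std::regex re(
        R"(ceph version (\S+) \(([0-9a-f]{40})\) (\w+) \([^)]*\), process ([\w-]+), pid (\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return std::nullopt;  // not a version banner
    }
    CephVersionInfo info;
    info.version = m[1].str();
    info.git_sha = m[2].str();
    info.release = m[3].str();
    info.process = m[4].str();
    info.pid = std::stol(m[5].str());
    return info;
}
```

Fed the -365> line, this yields version 19.2.0, the commit 16063ff2022298c9300e49a547a16ffda59baf13, release squid and process ceph-mon; any other line yields std::nullopt.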
Next, let's see whether there is a separate package for the binary:

> apt search ceph-mon
Sorting... Done
Full Text Search... Done
ceph-base/noble-updates 19.2.0-0ubuntu0.24.04.1 amd64
  common ceph daemon libraries and management tools

ceph-mon/noble-updates 19.2.0-0ubuntu0.24.04.1 amd64
  monitor server for the ceph storage system

Let's check whether the binary is in that package:

> apt download ceph-mon

> dpkg-deb --verbose --raw-extract ./ceph-mon_19.2.0-0ubuntu0.24.04.1_amd64.deb ./
...
./usr/bin/ceph-mon
...

We were lucky: the binary has the same name as the package, and it is in that package.
Now we know the exact package version that is currently in the archive, 19.2.0-0ubuntu0.24.04.1. This is the same version that the bug report mentions as the "new" Ceph version; the "old" version in the bug report is 19.2.0~git20240301.4c76c50-0ubuntu6.
Let's compare the sources of MDSMap::decode() to see whether it changed between the two versions. If it did, it would be a good suspect.

The source for the Ubuntu Ceph packages is in the git repository at
https://git.launchpad.net/ubuntu/+source/ceph.

git clone https://git.launchpad.net/ubuntu/+source/ceph
cd ceph

> git grep -n MDSMap::decode
src/mds/FSMap.cc:1086:   * Insert INLINE; see comment in MDSMap::decode.
src/mds/MDSMap.cc:836:void MDSMap::decode(bufferlist::const_iterator& p)

So we're interested in src/mds/MDSMap.cc (assuming the file was not
renamed and the function was not moved between the two versions).

Let's get the file at both revisions, extract the MDSMap::decode()
function from each, and compare them.
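The extraction step can be done by hand in an editor; here's a minimal C++ sketch of it instead. extract_function is a hypothetical helper, not a git or Ceph facility, and it assumes naive brace counting is good enough: braces inside comments or string literals would confuse it.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Return the text from `signature` up to and including the brace that closes
// the function body, or std::nullopt if the signature is missing or the braces
// never balance. Naive brace counting, not a C++ parser.
std::optional<std::string> extract_function(const std::string& source,
                                            const std::string& signature) {
    const std::size_t start = source.find(signature);
    if (start == std::string::npos) return std::nullopt;
    const std::size_t open = source.find('{', start + signature.size());
    if (open == std::string::npos) return std::nullopt;
    int depth = 0;
    for (std::size_t i = open; i < source.size(); ++i) {
        if (source[i] == '{') {
            ++depth;
        } else if (source[i] == '}' && --depth == 0) {
            return source.substr(start, i - start + 1);
        }
    }
    return std::nullopt;  // unbalanced braces
}
```

For example, after reading /tmp/MDSMap.cc.old and /tmp/MDSMap.cc.new (created by the git show commands that follow) into strings, calling extract_function() with "void MDSMap::decode(" on each gives two snippets that can be diffed, assuming the signature is the same in both versions.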

> git tag | grep 19.2.0-0ubuntu0.24.04.1
applied/19.2.0-0ubuntu0.24.04.1
import/19.2.0-0ubuntu0.24.04.1
> git show applied/19.2.0-0ubuntu0.24.04.1:src/mds/MDSMap.cc > /tmp/MDSMap.cc.new

The old version is 19.2.0~git20240301.4c76c50-0ubuntu6. Git does not
allow "~" in tag names, and the closest tag (by name) in the repo is
applied/19.2.0_git20240301.4c76c50-0ubuntu6:

> git show applied/19.2.0_git20240301.4c76c50-0ubuntu6:src/mds/MDSMap.cc > /tmp/MDSMap.cc.old

After running diff on the two files, we see that both the encode and
the decode functions were changed. This is the relevant part for the
decode function:

> diff -u /tmp/MDSMap.cc.old /tmp/MDSMap.cc.new
...
@@ -852,7 +863,8 @@
     decode(cas_pool, p);
   }
 
-  // kclient ignores everything from here
+  // kclient skips most of what's below
+  // see fs/ceph/mdsmap.c for current decoding
   __u16 ev = 1;
   if (struct_v >= 2)
     decode(ev, p);
@@ -949,11 +961,16 @@
   }
 
   if (ev >= 17) {
-    decode(max_xattr_size, p);
+    decode(bal_rank_mask, p);
   }
 
   if (ev >= 18) {
-    decode(bal_rank_mask, p);
+    decode(max_xattr_size, p);
+  }
+
+  if (ev >= 19) {
+    decode(qdb_cluster_leader, p);
+    decode(qdb_cluster_members, p);
   }
 
   /* All MDS since at least v14.0.0 understand INLINE */

We see that both the order and the number of fields changed in
decode(), and there doesn't seem to be any error handling for the case
where the encoded data doesn't match the expected format.
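To see how a swapped pair of fields can end in abort(), here's a toy C++ model of just these two fields. It is not Ceph's encoder: it assumes little-endian integers, a 64-bit max_xattr_size, a std::string encoded as a 32-bit length followed by its bytes, and a hypothetical max_xattr_size of 65536 (the values in the crashing clusters aren't known here). Reading old-order data with the new order makes the string decoder take the low 32 bits of max_xattr_size as a length and run past the end of the buffer. Frames 5 to 8 of the backtrace (libstdc++, including std::unexpected()) look consistent with such an exception going uncaught and ending in std::terminate() and abort(), although the exception type isn't visible in the backtrace.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Assumed wire format: little-endian integers; strings as u32 length + bytes.
void put_u64(Buffer& b, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) b.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

void put_string(Buffer& b, const std::string& s) {
    const auto len = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    b.insert(b.end(), s.begin(), s.end());
}

// Minimal reader that throws when asked for more bytes than remain,
// standing in for the exception thrown from iterator_impl::copy().
struct Reader {
    const Buffer& b;
    std::size_t pos = 0;

    void need(std::size_t n) const {
        if (n > b.size() - pos) throw std::out_of_range("end of buffer");
    }
    std::uint64_t get_uint(std::size_t bytes) {
        need(bytes);
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < bytes; ++i) v |= std::uint64_t{b[pos++]} << (8 * i);
        return v;
    }
    std::string get_string() {
        const auto len = static_cast<std::size_t>(get_uint(4));
        need(len);
        std::string s(reinterpret_cast<const char*>(b.data() + pos), len);
        pos += len;
        return s;
    }
};

struct Fields {
    std::uint64_t max_xattr_size;
    std::string bal_rank_mask;
};

// Old order: max_xattr_size (ev >= 17), then bal_rank_mask (ev >= 18).
Buffer encode_old(const Fields& f) {
    Buffer b;
    put_u64(b, f.max_xattr_size);
    put_string(b, f.bal_rank_mask);
    return b;
}

Fields decode_old(const Buffer& b) {
    Reader r{b};
    Fields f{};
    f.max_xattr_size = r.get_uint(8);
    f.bal_rank_mask = r.get_string();
    return f;
}

// New order: bal_rank_mask (ev >= 17), then max_xattr_size (ev >= 18).
Fields decode_new(const Buffer& b) {
    Reader r{b};
    Fields f{};
    f.bal_rank_mask = r.get_string();
    f.max_xattr_size = r.get_uint(8);
    return f;
}
```

With max_xattr_size == 65536, decode_new() reads a string length of 65536 from a 14-byte buffer and throws. With max_xattr_size == 0 it doesn't throw at all: bal_rank_mask comes back empty and max_xattr_size picks up the string's length prefix (8589934592), so depending on the stored values the mismatch can also go unnoticed.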

Now let's explore the binary to see where exactly in MDSMap::decode()
the abort happens.

We have the ceph-mon binary that we extracted earlier. We can load it in
gdb, which can show disassembled versions of its functions. We can also
load the debug info and put the source tree in the right place to get
better symbols and source references.

> gdb ./usr/bin/ceph-mon
...
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
...
(gdb) start
Downloading source file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc
Temporary breakpoint 1 at 0x32c670: file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc, line 250.
...
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdf98)
    at /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc:250
warning: 250    /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc: No such file or directory
(gdb)

Now we know that gdb is looking for the source tree in
/usr/src/ceph-19.2.0-0ubuntu0.24.04.1/. Let's put the tree there (you
may first need to change "Types: deb" to "Types: deb deb-src" in
/etc/apt/sources.list.d/ubuntu.sources and run "sudo apt update"):

> cd /usr/src/
> sudo apt source ceph

Now we see that the directory with the Ceph source is ceph-19.2.0. Let's
create a symlink so that gdb can find it:

> sudo ln -sv ceph-19.2.0 ceph-19.2.0-0ubuntu0.24.04.1
'ceph-19.2.0-0ubuntu0.24.04.1' -> 'ceph-19.2.0'

Let's start ceph-mon in gdb again:

(gdb) start
Temporary breakpoint 1 at 0x32c670: file /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc, line 250.
Starting program: /tmp/2/usr/bin/ceph-mon 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdf98)
    at /usr/src/ceph-19.2.0-0ubuntu0.24.04.1/src/ceph_mon.cc:250
250     {
(gdb) l
245       }
246       return addrs;
247     }
248
249     int main(int argc, const char **argv)
250     {
251       // reset our process name, in case we did a respawn, so that it's not
252       // left as "exe".
253       ceph_pthread_setname(pthread_self(), "ceph-mon");
254

Now we see the sources. The part of the backtrace that we want to know
more about is

 10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x7497532c3ab1]

Let's see what's there:

(gdb) set pagination off
(gdb) disassemble 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)'
Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE:
Address range 0x7ffff7cc2e10 to 0x7ffff7cc3c4d:
   0x00007ffff7cc2e10 <+0>:     endbr64
   0x00007ffff7cc2e14 <+4>:     push   %rbp
   0x00007ffff7cc2e15 <+5>:     mov    %rsp,%rbp
   0x00007ffff7cc2e18 <+8>:     push   %r15
   0x00007ffff7cc2e1a <+10>:    push   %r14
   0x00007ffff7cc2e1c <+12>:    lea    -0x2f3(%rbp),%rdx
...

The offsets in the disassembly are in decimal, while the offsets in the
stack backtrace are in hex. We need decimal, so

(gdb) p 0xca1
$1 = 3233
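The library is loaded at a different address in the crashed process than under gdb, so only the function-relative offset carries over. A small C++ sketch of that arithmetic, assuming the frame format shown above (Frame, parse_frame and function_start are hypothetical names): subtracting the offset from the frame address gives where MDSMap::decode() started in the crashed process, and adding the same offset to the start address gdb reports (0x7ffff7cc2e10) gives the address to look for in the disassembly.

```cpp
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// One frame of a Ceph backtrace, e.g.
// " 10: (MDSMap::decode(...)+0xca1) [0x7497532c3ab1]".
struct Frame {
    std::string symbol;
    std::uint64_t offset = 0;   // offset from the start of the function
    std::uint64_t address = 0;  // absolute address in the crashed process
};

// Hypothetical parser; the pattern is inferred from the frames quoted above.
std::optional<Frame> parse_frame(const std::string& line) {
    static const std::regex re(R"(^\s*\d+: \((.+)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\])");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    Frame f;
    f.symbol = m[1].str();
    f.offset = std::stoull(m[2].str(), nullptr, 16);
    f.address = std::stoull(m[3].str(), nullptr, 16);
    return f;
}

// Where the function began in the process that produced the backtrace.
std::uint64_t function_start(const Frame& f) { return f.address - f.offset; }
```

For frame 10 this gives 0x7497532c2e10 as the start of MDSMap::decode() in the crashed process, and 0x7ffff7cc2e10 + 0xca1 = 0x7ffff7cc3ab1, which is the <+3233> instruction in the listing that follows.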

Let's find this offset in the disassembled function:

(gdb) disassemble/m 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)'
Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE:
Address range 0x7ffff7cc2e10 to 0x7ffff7cc3c4d:
837     {
   0x00007ffff7cc2e10 <+0>:     endbr64
   0x00007ffff7cc2e14 <+4>:     push   %rbp
   0x00007ffff7cc2e15 <+5>:     mov    %rsp,%rbp
...
963       if (ev >= 17) {                                                           
   0x00007ffff7cc3a65 <+3157>:  cmp    $0x10,%r12w                                  
   0x00007ffff7cc3a6a <+3162>:  je     0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>    
                                                                                    
964         decode(bal_rank_mask, p);                                               
   0x00007ffff7cc3a70 <+3168>:  lea    -0x2a4(%rbp),%rdx                            
   0x00007ffff7cc3a77 <+3175>:  mov    $0x4,%esi                                    
   0x00007ffff7cc3a7c <+3180>:  mov    %r13,%rdi                                    
   0x00007ffff7cc3a7f <+3183>:  lea    0x1c0(%rbx),%r14                             
                                                                                    
965       }                                                                         
966                                                                                 
967       if (ev >= 18) {                                                           
   0x00007ffff7cc3ab1 <+3233>:  cmp    $0x11,%r12w                                  
   0x00007ffff7cc3ab6 <+3238>:  je     0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>    
                                                                                    
968         decode(max_xattr_size, p);                                              
969       }                                                                         
970                                                                                 
971       if (ev >= 19) {                                                           
   0x00007ffff7cc3ade <+3278>:  cmp    $0x12,%r12w                                  
...

The address in frame 10 is a return address (the instruction right after a call), i.e. 0x00007ffff7cc3ab1 <+3233>, so we're looking for a call just before it.
The addresses in this listing are not contiguous, so it makes sense to look at the plain disassembly as well (i.e. disassemble without /m):

(gdb) disassemble 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)'
Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE:
Address range 0x7ffff7cc2e10 to 0x7ffff7cc3c4d:
   0x00007ffff7cc2e10 <+0>:     endbr64
   0x00007ffff7cc2e14 <+4>:     push   %rbp
...
   0x00007ffff7cc3a65 <+3157>:  cmp    $0x10,%r12w
   0x00007ffff7cc3a6a <+3162>:  je     0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>
   0x00007ffff7cc3a70 <+3168>:  lea    -0x2a4(%rbp),%rdx
   0x00007ffff7cc3a77 <+3175>:  mov    $0x4,%esi
   0x00007ffff7cc3a7c <+3180>:  mov    %r13,%rdi
   0x00007ffff7cc3a7f <+3183>:  lea    0x1c0(%rbx),%r14
   0x00007ffff7cc3a86 <+3190>:  call   0x7ffff7a93320 <_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjPc>
   0x00007ffff7cc3a8b <+3195>:  mov    0x1c0(%rbx),%rax
   0x00007ffff7cc3a92 <+3202>:  mov    -0x2a4(%rbp),%esi
   0x00007ffff7cc3a98 <+3208>:  mov    %r14,%rdx
   0x00007ffff7cc3a9b <+3211>:  mov    %r13,%rdi
   0x00007ffff7cc3a9e <+3214>:  movq   $0x0,0x1c8(%rbx)
   0x00007ffff7cc3aa9 <+3225>:  movb   $0x0,(%rax)
   0x00007ffff7cc3aac <+3228>:  call   0x7ffff7a93400 <_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE>
   0x00007ffff7cc3ab1 <+3233>:  cmp    $0x11,%r12w
...
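Finding the call that precedes a return address is mechanical, so here's a C++ sketch that scans gdb's disassembly text for the last call below a given offset and demangles its target with abi::__cxa_demangle() from <cxxabi.h> (provided by the GCC and Clang C++ runtimes). The line format is assumed from the listing above, and call_before and demangle are hypothetical helpers, not gdb features:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Return the target symbol of the last "call" whose <+N> offset is below
// return_offset, scanning gdb "disassemble" output line by line.
std::optional<std::string> call_before(const std::string& listing, long return_offset) {
    static const std::regex re(R"(<\+(\d+)>:\s+call\s+0x[0-9a-f]+\s+<([^>]+)>)");
    std::istringstream in(listing);
    std::string line;
    std::optional<std::string> found;
    long best = -1;
    while (std::getline(in, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, re)) continue;  // not a call
        const long off = std::stol(m[1].str());
        if (off < return_offset && off > best) {
            best = off;
            found = m[2].str();
        }
    }
    return found;
}

// Demangle an Itanium C++ ABI symbol; fall back to the input on failure.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? std::string(out) : mangled;
    std::free(out);  // __cxa_demangle returns malloc'ed memory
    return result;
}
```

Given the four lines from <+3190> to <+3233> and a return offset of 3233, call_before() picks the call at <+3228>, and demangle() turns its target into the iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<...>&) overload rather than the copy(unsigned int, char*) one called at <+3190>.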

The function called just before that has the mangled name
_ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE,
which demangles to

(gdb) demangle _ZN4ceph6buffer7v15_2_04list13iterator_implILb1EE4copyEjRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)

and it matches the function that we see in frame 9:

 9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x749753093593]

so we're on the right track. The call comes just after the

   0x00007ffff7cc3a65 <+3157>:  cmp    $0x10,%r12w
   0x00007ffff7cc3a6a <+3162>:  je     0x7ffff7cc3371 <_ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE+1377>

branch, which appears to be the if condition from

 963   if (ev >= 17) {
 964     decode(bal_rank_mask, p);
 965   }

If ev equals 16, the jump is taken and the decode is skipped; otherwise decode(bal_rank_mask, p) is called.
bal_rank_mask is a std::string, and the called function takes a basic_string parameter, so we still seem to be on the right track:

697   std::string bal_rank_mask = "-1";
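Note that cmp $0x10 followed by je tests ev == 16, not ev >= 17. The two only agree if the compiler has already established ev >= 16 on this path, presumably through the earlier ev checks, which we haven't looked at here. A quick C++ check of that equivalence for a 16-bit ev (ev is declared __u16 in the diff); source_condition, compiled_condition and equivalent_from are hypothetical names for illustration:

```cpp
#include <cstdint>

// Source-level condition on line 963: if (ev >= 17).
bool source_condition(std::uint16_t ev) { return ev >= 17; }

// What "cmp $0x10,%r12w; je <skip>" computes: decode unless ev == 16.
bool compiled_condition(std::uint16_t ev) { return ev != 16; }

// True if both conditions agree for every 16-bit ev >= lowest.
bool equivalent_from(std::uint16_t lowest) {
    for (std::uint32_t ev = lowest; ev <= 0xFFFF; ++ev) {
        const auto v = static_cast<std::uint16_t>(ev);
        if (source_condition(v) != compiled_condition(v)) return false;
    }
    return true;
}
```

equivalent_from(16) holds, while equivalent_from(15) doesn't, because ev == 15 would skip the decode in the source but not in the compiled form. The same reasoning presumably applies to cmp $0x11 for ev >= 18.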

As we see in the diff above

   if (ev >= 17) {
-    decode(max_xattr_size, p);
+    decode(bal_rank_mask, p);
   }
 
   if (ev >= 18) {
-    decode(bal_rank_mask, p);
+    decode(max_xattr_size, p);
+  }
+

these two decode() calls were swapped. Let's find out why.
To do that, we need to clone the upstream repo and run git blame on the file to see when and why these lines were changed:

> git clone https://github.com/ceph/ceph ceph-upstream
> cd ceph-upstream/
> git blame src/mds/MDSMap.cc
...
e134c8907013 (Yongseok Oh      2022-10-11 20:47:32 +0900  963)   if (ev >= 17) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500  964)     decode(bal_rank_mask, p);
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  965)   }
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  966) 
36ee8e7ed365 (Venky Shankar    2023-12-01 04:32:20 -0500  967)   if (ev >= 18) {
78abfeaff27f (Patrick Donnelly 2024-02-15 10:28:32 -0500  968)     decode(max_xattr_size, p);
e134c8907013 (Yongseok Oh      2022-10-11 20:47:32 +0900  969)   }
...

We see that both decode() calls were changed in the same commit. If
we look at it with

> git show 78abfeaff27f

we'll see that this is what we were looking for. A link to the commit:
https://github.com/ceph/ceph/commit/78abfeaff27fee343fb664db633de5b221699a73.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2089565

Title:
  MON and MDS crash upgrading  CEPH  on ubuntu 24.04 LTS

Status in ceph package in Ubuntu:
  Confirmed

Bug description:
  This issue is a continuation of
  https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515

  
  On Ubuntu 24.04 LTS we upgraded Ceph to 19.2.0-0ubuntu0.24.04.1

  Previous release is: 19.2.0~git20240301.4c76c50-0ubuntu6

  whenever upgrading (tested on 2 different clusters), ceph-mon
  ends up crashing repeatedly with the stack trace below

  ```
   ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
   1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x788409245320]
   2: pthread_kill()
   3: gsignal()
   4: abort()
   5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7884096a5ff5]
   6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7884096bb0da]
   7: (std::unexpected()+0) [0x7884096a5a55]
   8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb391) [0x7884096bb391]
   9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)+0x193) [0x78840a293593]
   10: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xca1) [0x78840a4c3ab1]
   11: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1c3) [0x78840a4e4303]
   12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x280) [0x78840a4e6ef0]
   13: (MDSMonitor::update_from_paxos(bool*)+0x291) [0x631ac5dea801]
   14: (Monitor::refresh_from_paxos(bool*)+0x124) [0x631ac5b7a164]
   15: (Monitor::preinit()+0x98e) [0x631ac5bb2fbe]
   16: main()
   17: /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x78840922a1ca]
   18: __libc_start_main()
   19: _start()
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  ```

  
  Mitigation:
  rolling back to the previous release 19.2.0~git20240301.4c76c50-0ubuntu6 is still possible and restores service

More information about the Ubuntu-openstack-bugs mailing list