[Bug 1891567] Re: [SRU] ceph_osd crash in _committed_osd_maps when failed to encode first inc map
Ponnuvel Palaniyappan
1891567 at bugs.launchpad.net
Thu Sep 10 08:14:03 UTC 2020
Tests for Focal:
$ for osd in {0..2}; do juju ssh ceph-osd/$osd 'sudo dpkg -l | grep ceph'; done
ii ceph 15.2.3-0ubuntu0.20.04.2 amd64 distributed storage and file system
ii ceph-base 15.2.3-0ubuntu0.20.04.2 amd64 common ceph daemon libraries and management tools
ii ceph-common 15.2.3-0ubuntu0.20.04.2 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 15.2.3-0ubuntu0.20.04.2 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 15.2.3-0ubuntu0.20.04.2 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 15.2.3-0ubuntu0.20.04.2 all ceph manager modules which are always enabled
ii ceph-mon 15.2.3-0ubuntu0.20.04.2 amd64 monitor server for the ceph storage system
ii ceph-osd 15.2.3-0ubuntu0.20.04.2 amd64 OSD server for the ceph storage system
ii libcephfs2 15.2.3-0ubuntu0.20.04.2 amd64 Ceph distributed file system client library
ii python3-ceph-argparse 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 15.2.3-0ubuntu0.20.04.2 all Python 3 utility libraries for Ceph
ii python3-cephfs 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 libraries for the Ceph libcephfs library
Connection to 10.5.2.78 closed.
ii ceph 15.2.3-0ubuntu0.20.04.2 amd64 distributed storage and file system
ii ceph-base 15.2.3-0ubuntu0.20.04.2 amd64 common ceph daemon libraries and management tools
ii ceph-common 15.2.3-0ubuntu0.20.04.2 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 15.2.3-0ubuntu0.20.04.2 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 15.2.3-0ubuntu0.20.04.2 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 15.2.3-0ubuntu0.20.04.2 all ceph manager modules which are always enabled
ii ceph-mon 15.2.3-0ubuntu0.20.04.2 amd64 monitor server for the ceph storage system
ii ceph-osd 15.2.3-0ubuntu0.20.04.2 amd64 OSD server for the ceph storage system
ii libcephfs2 15.2.3-0ubuntu0.20.04.2 amd64 Ceph distributed file system client library
ii python3-ceph-argparse 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 15.2.3-0ubuntu0.20.04.2 all Python 3 utility libraries for Ceph
ii python3-cephfs 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 libraries for the Ceph libcephfs library
Connection to 10.5.0.35 closed.
ii ceph 15.2.3-0ubuntu0.20.04.2 amd64 distributed storage and file system
ii ceph-base 15.2.3-0ubuntu0.20.04.2 amd64 common ceph daemon libraries and management tools
ii ceph-common 15.2.3-0ubuntu0.20.04.2 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 15.2.3-0ubuntu0.20.04.2 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 15.2.3-0ubuntu0.20.04.2 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 15.2.3-0ubuntu0.20.04.2 all ceph manager modules which are always enabled
ii ceph-mon 15.2.3-0ubuntu0.20.04.2 amd64 monitor server for the ceph storage system
ii ceph-osd 15.2.3-0ubuntu0.20.04.2 amd64 OSD server for the ceph storage system
ii libcephfs2 15.2.3-0ubuntu0.20.04.2 amd64 Ceph distributed file system client library
ii python3-ceph-argparse 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 15.2.3-0ubuntu0.20.04.2 all Python 3 utility libraries for Ceph
ii python3-cephfs 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 libraries for the Ceph libcephfs library
Connection to 10.5.2.144 closed.
$ juju ssh ceph-mon/0 'sudo dpkg -l | grep ceph'
ii ceph 15.2.3-0ubuntu0.20.04.2 amd64 distributed storage and file system
ii ceph-base 15.2.3-0ubuntu0.20.04.2 amd64 common ceph daemon libraries and management tools
ii ceph-common 15.2.3-0ubuntu0.20.04.2 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 15.2.3-0ubuntu0.20.04.2 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 15.2.3-0ubuntu0.20.04.2 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 15.2.3-0ubuntu0.20.04.2 all ceph manager modules which are always enabled
ii ceph-mon 15.2.3-0ubuntu0.20.04.2 amd64 monitor server for the ceph storage system
ii ceph-osd 15.2.3-0ubuntu0.20.04.2 amd64 OSD server for the ceph storage system
ii libcephfs2 15.2.3-0ubuntu0.20.04.2 amd64 Ceph distributed file system client library
ii python3-ceph-argparse 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 15.2.3-0ubuntu0.20.04.2 all Python 3 utility libraries for Ceph
ii python3-cephfs 15.2.3-0ubuntu0.20.04.2 amd64 Python 3 libraries for the Ceph libcephfs library
Connection to 10.5.1.29 closed.
$ juju ssh ceph-mon/0 'sudo ceph versions'
{
"mon": {
"ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)": 1
},
"mgr": {
"ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)": 1
},
"osd": {
"ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)": 3
},
"mds": {},
"overall": {
"ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)": 5
}
}
Connection to 10.5.1.29 closed.
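The package checks above were compared by eye. A minimal sketch of automating the same comparison follows. The helper name check_ceph_versions is hypothetical and not part of any Ceph or Juju tooling. It assumes the input is plain `dpkg -l` output with the package name in column 2 and the version in column 3, as in the listings above.

```shell
# check_ceph_versions EXPECTED
# Hypothetical helper: reads `dpkg -l` output on stdin and fails if any
# installed ("ii") package whose name contains "ceph" is not at EXPECTED,
# or if no such package is listed at all.
check_ceph_versions() {
    awk -v want="$1" '
        $1 == "ii" && $2 ~ /ceph/ {
            seen++
            if ($3 != want) {
                print "mismatch: " $2 " " $3
                bad++
            }
        }
        END { exit (bad > 0 || seen == 0) ? 1 : 0 }
    '
}
```

It could be fed from the same command used above, for example:
juju ssh ceph-osd/0 'sudo dpkg -l' | check_ceph_versions 15.2.3-0ubuntu0.20.04.2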
** Tags removed: verification-needed-focal
** Tags added: verification-needed-done
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1891567
Title:
[SRU] ceph_osd crash in _committed_osd_maps when failed to encode
first inc map
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive ussuri series:
Fix Committed
Status in Ubuntu Cloud Archive victoria series:
Invalid
Status in ceph package in Ubuntu:
Fix Released
Status in ceph source package in Focal:
Fix Committed
Status in ceph source package in Groovy:
Fix Released
Bug description:
[Impact]
Upstream tracker: issue#46443 [0].
The ceph-osd service can crash when processing osd map updates.
When the osd encounters a CRC error while processing an incremental
map update, it requests a full map update from its peers. A recently
introduced uninitialized variable in this code path is then
dereferenced, causing a crash.
The uninitialized variable was introduced in nautilus 14.2.10, and
octopus 15.2.1.
[Test Case]
# Inject osd_inject_bad_map_crc_probability = 1
sudo ceph daemon osd.{id} config set osd_inject_bad_map_crc_probability 1
# Trigger some osd map updates by restarting a different osd
sudo systemctl restart ceph-osd@{diff-id}
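The two steps above can be wrapped in a small script. This is only a sketch under stated assumptions, not the procedure used for the SRU verification. TARGET_ID, OTHER_ID, DRY_RUN and the run/reproduce helpers are names introduced here for illustration. The systemd unit name ceph-osd@<id> is assumed, as is both OSDs living on the host where the script runs. The final is-active check is an addition to the test case. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the [Test Case] steps. All variable and function names here are
# hypothetical; set DRY_RUN=0 to actually execute the commands.
TARGET_ID="${TARGET_ID:-0}"   # OSD that gets the bad-CRC fault injected
OTHER_ID="${OTHER_ID:-1}"     # a different OSD, restarted to trigger map updates
DRY_RUN="${DRY_RUN:-1}"       # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # Make every incremental map fail its CRC check on the target OSD.
    run sudo ceph daemon "osd.${TARGET_ID}" config set osd_inject_bad_map_crc_probability 1
    # Restart a different OSD so that new osd maps are published.
    run sudo systemctl restart "ceph-osd@${OTHER_ID}"
    # Assumed check: an affected build is expected to crash the target OSD,
    # a fixed build should keep it active. A crash may take a moment to show.
    run sudo systemctl is-active "ceph-osd@${TARGET_ID}"
}

reproduce
```

With the defaults this prints the three commands for osd.0 and ceph-osd@1 without touching the cluster.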
[Regression Potential]
The code now leaves handle_osd_maps() early when a CRC error is encountered, so the map is not committed if the failure occurs while processing an incremental map update. This makes the full map update take longer but should prevent the crash reported in this bug. Additionally, _committed_osd_maps() is now coded to assert if first <= last, but it is assumed that code should never be reached.
[Other Info]
Upstream has released a fix for this issue in Nautilus 14.2.11. The SRU for this point release is being tracked by LP: #1891077
Upstream has merged a fix for this issue in Octopus [1], but there is
no current release target. The ceph packages in focal, groovy, and the
ussuri cloud archive are exposed to this critical regression.
[0] https://tracker.ceph.com/issues/46443
[1] https://github.com/ceph/ceph/pull/36340
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1891567/+subscriptions