[Bug 1838400] Re: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
James Page
james.page at ubuntu.com
Thu Apr 9 09:38:20 UTC 2020
Upstream activity seems to have dropped off on reproduction and
resolution of this issue in Luminous - is this still an issue?
I'd also note that it's usual to upgrade a xenial deployment to the most
recent updates in the UCA for Queens before upgrading the underlying
Ubuntu series to avoid any version changes in deployed components during
the series upgrade process.
** Changed in: ceph (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1838400
Title:
OSD crashes when loading pgs with "FAILED assert(interval.last >
last)"
Status in OpenStack ceph-osd charm:
Invalid
Status in ceph package in Ubuntu:
Incomplete
Bug description:
This issue is tracked at https://tracker.ceph.com/issues/21142
Today, I hit it when running the following procedure:
0) nova-compute openstack-origin="cloud:xenial-queens"
1) juju upgrade-series 19 prepare bionic
2) apt-get update, dist-upgrade, do-release-upgrade, reboot
Machine 19 is hyperconverged, and runs nova-compute and ceph-osd.
Ceph was initially 12.2.8, and was updated to the latest in the
xenial-queens repo (12.2.12).
3) "ceph -s" showed half of the cluster with OSDs down due to "FAILED assert(interval.last > last)"
3.1) I tried to upgrade all ceph packages from 12.2.8 to 12.2.12 but the issue remained.
In the end, the suggestion from the linked bug was:
"""
# set [DEFAULT] ceph.conf section to
debug osd = 10/5
# from the CLI (for osd.N)
service ceph-osd@N restart
# wait until coredump is seen in the logs
service ceph-osd@N stop
perl -nle '/pg (\S+) first map.*,\ssame_interval_since/ && print$1' /var/log/ceph/ceph-osd.N.log | sort -u | xargs -I@ bash -xc 'ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-N/ --op rm-past-intervals --pgid @'
"""
Note: running the above on all the OSDs that were down fixed them.
It seems the fix is still pending backport and has yet to be included in
the Ubuntu ceph packaging.
4) once the upgrade of that single compute-storage node was over,
4.0) juju config nova-compute openstack-origin=distro
4.1) juju config ceph-osd source=distro
4.2) juju upgrade-series 19 complete
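For anyone who wants to review what the workaround quoted in step 3 would touch before running it, here is a rough dry-run sketch. It only prints the ceph-objectstore-tool commands rather than executing them. The function names list_affected_pgs and print_repair_commands are made up for illustration, and the sed expression is my own translation of the perl pattern from the linked bug, so treat it as an assumption and check it against your own logs. The log path, data path and --op rm-past-intervals arguments are the ones quoted above.

```shell
#!/bin/sh
# Dry-run sketch: nothing here modifies an OSD; it only prints commands.
# list_affected_pgs and print_repair_commands are hypothetical helper names.

# Print the unique PG IDs found on "pg <id> first map ..., same_interval_since"
# log lines (sed translation of the perl pattern quoted in the bug description).
list_affected_pgs() {
    log_file=$1
    sed -nE 's/.*pg ([^[:space:]]+) first map.*,[[:space:]]same_interval_since.*/\1/p' \
        "$log_file" | sort -u
}

# Print one ceph-objectstore-tool invocation per affected PG for osd.<id>.
# The second argument (log file) defaults to the path used in the bug report.
print_repair_commands() {
    osd_id=$1
    log_file=${2:-/var/log/ceph/ceph-osd.$osd_id.log}
    list_affected_pgs "$log_file" | while read -r pgid; do
        echo "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$osd_id/ --op rm-past-intervals --pgid $pgid"
    done
}
```

After sourcing the file, "print_repair_commands N" (with the OSD stopped and debug osd = 10/5 already captured in the log) lists the commands for osd.N; they can then be inspected and run by hand.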
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/1838400/+subscriptions