[Bug 1804261] Re: Ceph OSD units requires reboot if they boot before vault (and if not unsealed with 150s)
Drew Freiberger
1804261 at bugs.launchpad.net
Tue Sep 29 17:04:15 UTC 2020
The bionic-ussuri package has the retries set to 10000. In my case the
time from machine start to vault unseal was about 18 hours. We should
have this set so the units can self-heal for up to 5 days after machine
start.
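As a rough sizing sketch: assuming ceph-volume sleeps a fixed interval
between attempts, and assuming that interval is 5 seconds (not stated in
this report; it is consistent with the 150s in the bug title if the
default is 30 tries, but check your release), the retry count needed for
a given window can be estimated as below. tries_for_window is a
hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: number of tries needed to keep retrying for a window.
# Usage: tries_for_window WINDOW_SECONDS INTERVAL_SECONDS
tries_for_window() {
    window=$1
    interval=$2
    # Round up so the whole window is covered.
    echo $(( (window + interval - 1) / interval ))
}

# ASSUMPTION: 5 seconds between attempts (not stated in this report).
INTERVAL=5

# How long the current setting of 10000 tries lasts.
echo "10000 tries cover $(( 10000 * INTERVAL )) seconds"

# How many tries a 5 day window would need.
echo "5 days need $(tries_for_window $(( 5 * 24 * 3600 )) "$INTERVAL") tries"
```

Under that assumption, 10000 tries last 50000 seconds (just under 14
hours), which would explain why an 18 hour gap was not covered, and a 5
day window would need 86400 tries.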
I'm almost wondering whether vaultlocker-decrypt needs its retries
increased as well.
Here's a workaround I've found for anyone experiencing this
operationally:
After unsealing the vault, run the following two loops on each ceph-osd
unit to decrypt the LVM volumes and start them, so that the ceph-osd
services can start up:
for unit in /etc/systemd/system/multi-user.target.wants/vaultlocker-decrypt@*; do sudo systemctl start "$(basename "$unit")"; done
for unit in /etc/systemd/system/multi-user.target.wants/ceph-volume@*; do sudo systemctl start "$(basename "$unit")"; done
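Before starting anything, it may help to preview which unit instances
the two loops will act on. The following is a minimal sketch that only
prints names and starts nothing; list_wanted_units is a hypothetical
helper name, not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: print the instance unit names with a given prefix
# that are enabled (linked) in a *.wants directory. Starts nothing.
# Usage: list_wanted_units WANTS_DIR PREFIX
list_wanted_units() {
    wants_dir=$1
    prefix=$2
    for path in "$wants_dir"/"$prefix"@*; do
        # An unmatched glob stays literal; skip it.
        [ -e "$path" ] || [ -L "$path" ] || continue
        basename "$path"
    done
}

wants=/etc/systemd/system/multi-user.target.wants
list_wanted_units "$wants" vaultlocker-decrypt
list_wanted_units "$wants" ceph-volume
```

Each printed name is what would be passed to systemctl start.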
--
https://bugs.launchpad.net/bugs/1804261
Title:
Ceph OSD units requires reboot if they boot before vault (and if not
unsealed with 150s)
Status in OpenStack ceph-osd charm:
Invalid
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive queens series:
Fix Released
Status in Ubuntu Cloud Archive rocky series:
Fix Released
Status in Ubuntu Cloud Archive stein series:
Fix Released
Status in Ubuntu Cloud Archive train series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in ceph package in Ubuntu:
Fix Released
Status in ceph source package in Bionic:
Fix Released
Status in ceph source package in Disco:
Won't Fix
Status in ceph source package in Eoan:
Fix Released
Status in ceph source package in Focal:
Fix Released
Bug description:
[Impact]
Various configuration option values that are read from environment variables are incorrectly parsed as strings rather than ints. As a result, for certain deployment use cases, the timeouts for starting the ceph-osd volume units cannot be increased to accommodate dependencies that must start first.
[Test Case]
Deploy ceph with vault for key management
Set a systemd override for ceph-volume@:
Environment=CEPH_VOLUME_SYSTEMD_TRIES=2000
Seal vault units (by restarting the vault service)
Reboot ceph-osd machines - the Environment override is ignored as it's not correctly parsed.
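The override step can be sketched as a systemd drop-in file. This is a
minimal sketch, not the procedure used in the report: the helper name
write_tries_override and the file name override.conf are illustrative
assumptions, and the example writes into a scratch directory so it can
be run without root.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets the retry count.
# Usage: write_tries_override DROPIN_DIR TRIES
write_tries_override() {
    dropin_dir=$1
    tries=$2
    mkdir -p "$dropin_dir" || return 1
    # Environment= must sit under a [Service] section in the drop-in.
    printf '[Service]\nEnvironment=CEPH_VOLUME_SYSTEMD_TRIES=%s\n' "$tries" \
        > "$dropin_dir/override.conf"
}

# Dry run into a scratch directory. On a real host the directory would be
# /etc/systemd/system/ceph-volume@.service.d (root required), followed by
# "systemctl daemon-reload" so systemd picks up the change.
scratch=$(mktemp -d)
write_tries_override "$scratch/ceph-volume@.service.d" 2000
cat "$scratch/ceph-volume@.service.d/override.conf"
```

A drop-in placed under the template's .service.d directory applies to
every ceph-volume@ instance, so one file covers all OSD volumes on the
machine.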
[Regression Potential]
Low - this fix has been accepted upstream in later releases.
[Original Bug Report]
In a scenario where Ceph is encrypted and uses Vault as the key manager, and where vault and ceph are both stopped, any OSDs on the affected unit(s) will require a further reboot if they try to start before vault is unsealed.