[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Xav Paice xav.paice at canonical.com
Tue May 28 21:00:42 UTC 2019


Charm is cs:ceph-osd-284
Ceph version is 12.2.11-0ubuntu0.18.04.2

The udev rules are created by curtin during the MAAS install.

Here's an example udev rule:

cat bcache4.rules

# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="7b0e872b-ac78-4c4e-af18-8ccdce5962f6", SYMLINK+="disk/by-dname/bcache4"
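
To make the rule above easier to read: keys compared with "==" are match conditions, and SYMLINK+= adds a link under /dev once every condition holds. The sketch below splits the rule into those two groups. It is a simplified illustration, not a full udev rule parser, and the names RULE, TOKEN and split_rule are made up for this example.

```python
import re

# The rule quoted above, copied verbatim from bcache4.rules.
RULE = ('SUBSYSTEM=="block", ACTION=="add|change", '
        'ENV{CACHED_UUID}=="7b0e872b-ac78-4c4e-af18-8ccdce5962f6", '
        'SYMLINK+="disk/by-dname/bcache4"')

# key, operator, quoted value; two-character operators are tried before "=".
TOKEN = re.compile(r'([A-Z_]+(?:\{[^}]+\})?)\s*(==|!=|\+=|-=|:=|=)\s*"([^"]*)"')


def split_rule(rule):
    """Return (matches, assignments) as lists of (key, operator, value)."""
    matches, assignments = [], []
    for key, op, value in TOKEN.findall(rule):
        # "==" and "!=" test the event; every other operator sets something.
        target = matches if op in ("==", "!=") else assignments
        target.append((key, op, value))
    return matches, assignments


matches, assignments = split_rule(RULE)

if __name__ == "__main__":
    for key, op, value in matches:
        print(f"match : {key} {op} {value}")
    for key, op, value in assignments:
        print(f"assign: {key} {op} {value}")
```

Read this way, the rule only produces /dev/disk/by-dname/bcache4 when udev handles an add or change event for a block device whose CACHED_UUID property equals the UUID in the rule.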

The problem here is that when the host boots, some OSDs (a random set
that changes on each boot) have no symlinks for block.db and block.wal
in /var/lib/ceph/osd/ceph-${thing}.  If I manually create those two
symlinks (and make sure the perms are right for the links themselves),
then the OSD starts.
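
A quick way to see which OSDs are affected after a given boot is to scan the OSD directories for the two links. This is a minimal sketch, assuming the default /var/lib/ceph/osd layout mentioned above; the function name find_missing_links is hypothetical, and the script only reports, it does not create anything.

```python
import os

# The two symlinks that go missing, as described above.
EXPECTED_LINKS = ("block.db", "block.wal")


def find_missing_links(osd_root="/var/lib/ceph/osd"):
    """Map each ceph-* OSD directory to the expected symlinks it lacks."""
    missing = {}
    if not os.path.isdir(osd_root):
        return missing
    for name in sorted(os.listdir(osd_root)):
        osd_dir = os.path.join(osd_root, name)
        if not (name.startswith("ceph-") and os.path.isdir(osd_dir)):
            continue
        # islink() is true for any symlink, including one whose target
        # does not exist yet, so this only flags links that are absent.
        absent = [link for link in EXPECTED_LINKS
                  if not os.path.islink(os.path.join(osd_dir, link))]
        if absent:
            missing[osd_dir] = absent
    return missing


if __name__ == "__main__":
    for osd_dir, links in find_missing_links().items():
        print(f"{osd_dir}: missing {', '.join(links)}")
```

Running it after several reboots and comparing the output would show whether the set of affected OSDs really changes each time.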

Some of the OSDs do get those links though, and that's interesting
because on these hosts, the Ceph WAL and DB for all the OSDs are LVs on
the same NVMe device, in fact on the same partition.  The Ceph OSD
block device is an LV on a different device.


** Changed in: systemd (Ubuntu)
       Status: Incomplete => New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1828617

Title:
  Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Status in systemd package in Ubuntu:
  New

Bug description:
  Ubuntu 18.04.2 Ceph deployment.

  Ceph OSD devices use LVM volumes that point to udev-based physical devices.
  The LVM module is supposed to create PVs from devices using the links in the /dev/disk/by-dname/ directory, which are created by udev.
  However, on some reboots (not always, which suggests a race condition) the Ceph services cannot start, and pvdisplay doesn't show any volumes. By the end of the boot process, though, the /dev/disk/by-dname/ directory contains all the necessary device links.

  The behaviour can be fixed manually by running the "#/sbin/lvm pvscan
  --cache --activate ay /dev/nvme0n1" command to re-activate the LVM
  components, after which the services can be started.
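
  The manual fix above could be wrapped in a small script so it is easy to repeat on each affected host. This is only a sketch: the arguments are exactly those quoted in the report, /dev/nvme0n1 is the device from this report and may differ elsewhere, and the helper names pvscan_command and reactivate are hypothetical. It defaults to a dry run, and a real run needs root.

```python
import subprocess


def pvscan_command(device, lvm="/sbin/lvm"):
    """Build the pvscan invocation quoted in the bug description."""
    return [lvm, "pvscan", "--cache", "--activate", "ay", device]


def reactivate(device, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = pvscan_command(device)
    if not dry_run:
        # Raises CalledProcessError if lvm exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Device path taken from the report; adjust for the host in question.
    print(" ".join(reactivate("/dev/nvme0n1")))
```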

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1828617/+subscriptions


