[Bug 1986747] Re: [quincy] invalid osd_class_dir blocks rados client connections

Launchpad Bug Tracker 1986747 at bugs.launchpad.net
Thu Sep 1 15:29:06 UTC 2022


** Merge proposal linked:
   https://code.launchpad.net/~chris.macnaughton/ubuntu/+source/ceph/+git/ceph/+merge/429304

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1986747

Title:
  [quincy] invalid osd_class_dir blocks rados client connections

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in ceph package in Ubuntu:
  Confirmed
Status in ceph source package in Jammy:
  Confirmed
Status in ceph source package in Kinetic:
  Confirmed

Bug description:
  [Impact]

  A Ceph cluster that has been upgraded from a release earlier than
  19.04 will break when it is upgraded to Ceph Quincy (Jammy and
  Kinetic).

  The Ubuntu packaging configures `osd_class_dir` with the relative path
  `CMAKE_INSTALL_LIBDIR` instead of the required absolute path
  `CMAKE_INSTALL_FULL_LIBDIR` [0].
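
  As a rough illustration of the difference, the sketch below classifies
  two candidate values. The relative value is inferred from the
  `_load_class` log line in this report and the absolute value is the one
  used in the workaround; `is_absolute_path` is a hypothetical helper
  written for this example, not part of Ceph or the packaging.
  ```shell
  #!/bin/sh
  # Hypothetical helper: succeeds only if the argument starts with "/".
  is_absolute_path() {
      case "$1" in
          /*) return 0 ;;
          *)  return 1 ;;
      esac
  }

  # First value: relative form (as produced from CMAKE_INSTALL_LIBDIR).
  # Second value: absolute form (as produced from CMAKE_INSTALL_FULL_LIBDIR).
  for dir in lib/x86_64-linux-gnu/rados-classes \
             /usr/lib/x86_64-linux-gnu/rados-classes; do
      if is_absolute_path "$dir"; then
          echo "absolute: $dir"
      else
          echo "relative: $dir"
      fi
  done
  ```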

  The default value for `osd_class_dir` changed in Quincy, starting with
  v17.1.0 [1].

  The ceph-osd service relies on the `osd_class_dir` path to find and
  load the class libraries that extend RADOS [2]. When the path is set
  incorrectly, RADOS clients fail with repeated "Operation not
  supported" errors (error 95, logged as `r=-95`):
  ```
  2022-08-16T17:42:15.044+0000 7fe375685e40 0 rgw main: ERROR: failed reading data (obj=default.rgw.log:bucket.sync-target-hints.), r=-95
  2022-08-16T17:42:15.048+0000 7fe375685e40 0 rgw main: ERROR: failed to read targets index for bucket=:[]) r=-95
  2022-08-16T17:42:15.048+0000 7fe375685e40 0 rgw main: ERROR: failed to initialize bucket sync policy handler: get_bucket_sync_hints() on bucket=-- returned r=-95
  2022-08-16T17:42:15.048+0000 7fe375685e40 -1 rgw main: ERROR: could not initialize zone policy handler for zone=default
  2022-08-16T17:42:15.048+0000 7fe375685e40 0 rgw main: ERROR: failed to start notify service ((95) Operation not supported
  2022-08-16T17:42:15.048+0000 7fe375685e40 0 rgw main: ERROR: failed to init services (ret=(95) Operation not supported)
  ```

  The ceph-osd service will also report `_load_class` errors:
  ```
  2022-08-16T19:05:55.562+0000 7f4770ff9700 0 _load_class could not stat class lib/x86_64-linux-gnu/rados-classes/libcls_rbd.so: (2) No such file or directory
  ```

  Admins can resolve this issue by manually setting `osd_class_dir` to
  the correct value. Run the following command on a ceph-mon:
  ```
  sudo ceph config set global osd_class_dir /usr/lib/x86_64-linux-gnu/rados-classes
  ```

  Then restart all ceph-osd services to pick up the new `osd_class_dir`
  location.
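
  A minimal sketch for checking the stored value, assuming the `ceph`
  CLI is run on a node with admin credentials and that the expected path
  is the x86_64 one from the workaround. `check_class_dir` and `EXPECTED`
  are hypothetical names used only in this example, and the restart
  command in the comment assumes systemd-managed, package-installed OSDs.
  ```shell
  #!/bin/sh
  # Assumption: path taken from the workaround; differs on other architectures.
  EXPECTED=/usr/lib/x86_64-linux-gnu/rados-classes

  # Hypothetical helper: compare the configured osd_class_dir with EXPECTED.
  check_class_dir() {
      # "ceph config get <who> <key>" prints the value stored for that entity.
      current=$(ceph config get osd osd_class_dir) || return 2
      if [ "$current" = "$EXPECTED" ]; then
          echo "ok: osd_class_dir=$current"
      else
          echo "mismatch: osd_class_dir=$current (expected $EXPECTED)"
          return 1
      fi
  }

  # Example use (not run automatically):
  #   check_class_dir && sudo systemctl restart ceph-osd.target
  ```

  The restart is left as a comment because restarting every OSD on a
  host at once may not be appropriate for a production cluster.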

  [0] https://cmake.org/cmake/help/v3.24/module/GNUInstallDirs.html#result-variables
  [1] https://github.com/ceph/ceph/commit/3bee4b02611459b9ae949cebf5967e4d83ef55de
  [2] https://docs.ceph.com/en/latest/dev/osd-class-path/

  [Test Plan]

  1. Install Ceph on Bionic
  2. Upgrade through to Jammy
    a. Confirm that client usage is broken
  3. Upgrade to Jammy-proposed
    a. Confirm that client usage works again

  In addition to checking client activity, confirm that the OSD logs
  contain no errors about failing to load classes.
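
  One way to check the logs is sketched below. It counts lines matching
  the `_load_class` message quoted in this report;
  `count_load_class_errors` is a hypothetical helper, and the log
  location in the usage comment is an assumption about a default
  package install.
  ```shell
  #!/bin/sh
  # Hypothetical helper: print how many lines in the given log files
  # report a failed class load.
  count_load_class_errors() {
      # -h drops file names so that wc counts matching lines only;
      # tr strips the padding some wc implementations add.
      grep -h '_load_class could not stat class' "$@" 2>/dev/null \
          | wc -l | tr -d ' '
  }

  # Example use (assumed default log path):
  #   count_load_class_errors /var/log/ceph/ceph-osd.*.log
  ```

  A result of 0 after upgrading to the fixed package and restarting the
  OSDs would be consistent with the classes loading correctly.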

  [Where problems could occur]

  Problems could occur as a result of library paths changing, so Ceph
  functionality should be verified. This will be done with functional
  tests of Ceph using the Ceph Juju charms.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1986747/+subscriptions



