[Bug 2003530] Re: Rook mgr module crashes due to missing mgr.nfs

Luciano Lo Giudice 2003530 at bugs.launchpad.net
Mon Jul 10 20:24:06 UTC 2023


Hello everyone, and sorry about the noise before. Here's a test plan
that I think should satisfy all the requirements.

First, we provision the machines for a small Ceph cluster. I used juju
to add each machine with the following command:

$ juju add-machine --series=jammy

Afterwards, we ssh into the target machines and add the -proposed
archives:

$ cat /etc/apt/sources.list.d/ubuntu-jammy-proposed.list
deb http://archive.ubuntu.com/ubuntu jammy-proposed main multiverse restricted universe
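
For anyone repeating this, here is a minimal sketch of how that file
could be generated. The helper name proposed_line is made up for
illustration, and the commented-out usage assumes sudo access on the
target machine; it is not necessarily how I created the file.

```shell
# Hypothetical helper: print the apt source line for the -proposed
# pocket of the given Ubuntu series (e.g. jammy, kinetic).
proposed_line() {
    printf 'deb http://archive.ubuntu.com/ubuntu %s-proposed main multiverse restricted universe\n' "$1"
}

# Example usage on a target machine (needs root, so left commented out):
# proposed_line jammy | sudo tee /etc/apt/sources.list.d/ubuntu-jammy-proposed.list
# sudo apt update
```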

We can verify that version 17.2.6 is going to be installed by running:

$ apt-cache policy ceph
ceph:
  Installed: 17.2.6-0ubuntu0.22.04.1
  Candidate: 17.2.6-0ubuntu0.22.04.1
  Version table:
 *** 17.2.6-0ubuntu0.22.04.1 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     17.2.5-0ubuntu0.22.04.3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     17.1.0-0ubuntu3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
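
If the check needs to be scripted, something along these lines should
work. This is only a sketch: candidate_version is a hypothetical helper
name, and it assumes the "Candidate:" line has the same layout as in
the output above.

```shell
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin
# and print the version listed on the "Candidate:" line.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Example usage:
# apt-cache policy ceph | candidate_version
```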

With that in place, we deploy a small Ceph cluster. I used 3 mons and 3
OSDs, but anything should work.

Once the cluster has been deployed, we once again ssh into one of the
target machines (in this case, one of the mons).

As a precaution, we can test that ceph is running the proposed package:

$ ceph-mon -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

Now, we ensure that rook-mgr isn't installed or running:

$ sudo ceph mgr module ls

MODULE                           
balancer           on (always on)
crash              on (always on)
devicehealth       on (always on)
orchestrator       on (always on)
pg_autoscaler      on (always on)
progress           on (always on)
rbd_support        on (always on)
status             on (always on)
telemetry          on (always on)
volumes            on (always on)
iostat             on            
nfs                on            
restful            on            
alerts             -             
influx             -             
insights           -             
localpool          -             
mirroring          -             
osd_perf_query     -             
osd_support        -             
prometheus         -             
selftest           -             
snap_schedule      -             
stats              -             
telegraf           -             
test_orchestrator  -             
zabbix             - 
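
Rather than reading the table by eye, the state of a single module can
be pulled out with a small filter. This is a sketch that assumes the
two-column layout shown above; module_state is a hypothetical helper
name, and "absent" is just a label I picked for modules that are not
listed at all.

```shell
# Hypothetical helper: read `ceph mgr module ls` table output on stdin
# and print the second column for the named module ("on" or "-"),
# or "absent" if the module does not appear in the table.
module_state() {
    awk -v name="$1" '
        $1 == name { state = $2; found = 1 }
        END { print (found ? state : "absent") }
    '
}

# Example usage:
# sudo ceph mgr module ls | module_state rook
```

The same filter works for any module name, so it can be used both
before and after a module is installed.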

Then, we install the rook-mgr module by hand:

$ sudo apt install ceph-mgr-rook

Next, we enable the module:

$ sudo ceph mgr module enable rook

We then check that the ceph cluster is in a healthy state and no modules
have crashed by running:

ubuntu@juju-9f12b1-ceph-0:~$ sudo ceph -s
  cluster:
    id:     026e3f56-1f5e-11ee-bf28-95ba6942eafd
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum juju-9f12b1-ceph-1,juju-9f12b1-ceph-2,juju-9f12b1-ceph-0 (age 9m)
    mgr: juju-9f12b1-ceph-2(active, since 6m), standbys: juju-9f12b1-ceph-1, juju-9f12b1-ceph-0
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
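
For an automated run, the health line can be checked with a simple
filter. This is a sketch only: cluster_healthy is a hypothetical helper
name, and it looks at nothing but the "health:" line of the status
output, so it does not by itself prove anything about individual
modules.

```shell
# Hypothetical helper: read `ceph -s` output on stdin and succeed only
# if the health line reports HEALTH_OK.
cluster_healthy() {
    grep -Eq '^[[:space:]]*health:[[:space:]]+HEALTH_OK[[:space:]]*$'
}

# Example usage:
# sudo ceph -s | cluster_healthy && echo "cluster is healthy"
```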

And finally, we check that the rook module is up and running:

$ sudo ceph mgr module ls

MODULE                           
balancer           on (always on)
crash              on (always on)
devicehealth       on (always on)
orchestrator       on (always on)
pg_autoscaler      on (always on)
progress           on (always on)
rbd_support        on (always on)
status             on (always on)
telemetry          on (always on)
volumes            on (always on)
iostat             on            
nfs                on            
restful            on            
rook               on            
alerts             -             
influx             -             
insights           -             
localpool          -             
mirroring          -             
osd_perf_query     -             
osd_support        -             
prometheus         -             
selftest           -             
snap_schedule      -             
stats              -             
telegraf           -             
test_orchestrator  -             
zabbix             -

The process can be repeated with Kinetic instead, with identical
results.

Again, I apologize for not providing a correct test plan before. Please
let me know if any further verification is needed.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2003530

Title:
  Rook mgr module crashes due to missing mgr.nfs

Status in ceph package in Ubuntu:
  Fix Released
Status in ceph source package in Jammy:
  Fix Committed
Status in ceph source package in Kinetic:
  Fix Committed
Status in ceph source package in Lunar:
  Fix Released

Bug description:
  [Impact]

  The rook mgr service crashes on installing the ceph-mgr-rook package
  (see the traceback from /var/log/syslog below). This is due to a
  missing ceph mgr package "nfs" which the rook mgr module depends upon.

  This makes the rook mgr module, which is required for integrating
  Ceph with the Rook storage orchestrator, unusable.

  The proposed patch fixes this by including the nfs mgr package in
  the ceph-mgr-modules-core .deb. This is similar to how upstream
  packages nfs for the ceph mgr system.

  
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: debug 2023-01-17T16:39:18.008+0000 7f930419fdc0 -1 mgr[py] Module not found: 'rook'
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: debug 2023-01-17T16:39:18.008+0000 7f930419fdc0 -1 mgr[py] Traceback (most recent call last):
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: File "/usr/share/ceph/mgr/rook/__init__.py", line 5, in <module>
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: from .module import RookOrchestrator
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: File "/usr/share/ceph/mgr/rook/module.py", line 41, in <module>
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: from .rook_cluster import RookCluster
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: File "/usr/share/ceph/mgr/rook/rook_cluster.py", line 29, in <module>
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: from nfs.cluster import create_ganesha_pool
  Jan 17 16:39:18 devcontainer-269785 bash[247610]: ModuleNotFoundError: No module named 'nfs'

  [Test plan]

  The test requires a Ceph cluster. SSH to a system with a running
  ceph-mon service.

  $ sudo ceph mgr module ls  # verify: no rook mgr module
  $ sudo apt-get -q install ceph-mgr-rook
  $ sudo ceph -s  # verify: no crashed modules
  $ sudo ceph mgr module ls  # verify: rook mgr module present and enabled

  
  [Where problems could occur]

  The proposed patch only includes an additional Python package, and
  regression potential should be low.

  Issues could occur due to packaging bugs, such as missing dependencies
  for the nfs mgr package. As the nfs package is currently missing,
  there should not be any additional impact due to this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2003530/+subscriptions



