[Bug 1825843] Re: systemd issues with bionic-rocky causing nagios alert and can't restart daemon

Angel Vargas angelvargas at outlook.es
Thu May 2 22:14:32 UTC 2019


I'm currently trying to get the ceph-radosgw charm (rev 268) running on
Ubuntu Bionic with OpenStack Stein (a base-bundle-like deployment).

I'm hitting an issue where the service seems to be running, then after a
few minutes the unit changes to blocked and reports that the service is
not running. This is a bare-metal deployment with multiple network spaces
(handled by MAAS).
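
For reference, the flapping is easy to watch from the controller side (the
application name is the one used in the deploy command below):

# re-run juju status every 30s and watch the unit flip to blocked
watch -n 30 'juju status ceph-radosgw --format=short'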

I deployed this charm with a single command per required unit:


juju deploy --to lxd:0 --config ceph-radosgw.yaml ceph-radosgw --bind="public admin=admin cluster=cluster internal=internal public=public"
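
The spaces named in --bind were created in MAAS; assuming the model picked
them up, they should all be listed by:

juju spaces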

The config file looks like this:
---
ceph-radosgw:
  ceph-osd-replication-count: 2
  cache-size: 1200
  os-admin-network: 10.101.0.0/24
  os-internal-network: 10.50.0.0/24
  os-public-network: 10.100.0.0/24
  vip: 10.100.0.210 10.101.0.210 10.50.0.210
  pool-prefix: sc
  source: 'cloud:bionic-stein'

juju status shows this:

ceph-radosgw/0*           blocked   idle       0/lxd/0  10.100.0.64     80/tcp                      Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-0-lxd-0
  ha-radosgw/0*           active    executing           10.100.0.64                                 Unit is ready and clustered
ceph-radosgw/1            blocked   executing  6/lxd/0  10.100.0.77     80/tcp                      Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-6-lxd-0
  ha-radosgw/1            active    idle                10.100.0.77                                 Unit is ready and clustered
ceph-radosgw/2            blocked   executing  1/lxd/3  10.100.0.78     80/tcp                      Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-1-lxd-3
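
To get a shell on each unit I go through juju rather than the LXD host:

juju ssh ceph-radosgw/0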


Going into each LXD container this way and checking the juju unit agent service, I got this output on every unit:


ubuntu@juju-a2d93a-0-lxd-0:~$ sudo service jujud-unit-ceph-radosgw-0 status
● jujud-unit-ceph-radosgw-0.service - juju unit agent for ceph-radosgw/0
   Loaded: loaded (/lib/systemd/system/jujud-unit-ceph-radosgw-0/jujud-unit-ceph-radosgw-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-05-02 19:59:07 UTC; 2h 2min ago
 Main PID: 3820 (bash)
    Tasks: 67 (limit: 7372)
   CGroup: /system.slice/jujud-unit-ceph-radosgw-0.service
           ├─3820 bash /lib/systemd/system/jujud-unit-ceph-radosgw-0/exec-start.sh
           └─3824 /var/lib/juju/tools/unit-ceph-radosgw-0/jujud unit --data-dir /var/lib/juju --unit-name ceph-radosgw/0 --debug

May 02 21:57:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:14 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:24 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
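
The repeated "Failed to reset devices.list: Operation not permitted"
messages look like systemd being unable to write to the devices cgroup
from inside an unprivileged LXD container; assuming that is the cause, it
can be confirmed from the LXD host (prints "true" only for privileged
containers, empty otherwise):

lxc config get juju-a2d93a-0-lxd-0 security.privileged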


Then I ran this command:


ubuntu@juju-a2d93a-0-lxd-0:~$ sudo service ceph-radosgw@rgw.juju-a2d93a-6-lxd-0 status
● ceph-radosgw@rgw.juju-a2d93a-6-lxd-0.service - Ceph rados gateway
   Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; indirect; vendor preset: enabled)
   Active: inactive (dead)
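
Presumably the templated unit can also be started by hand as a stopgap
(the instance name here is just the one juju reported for this unit, so
adjust to match):

sudo systemctl start ceph-radosgw@rgw.juju-a2d93a-0-lxd-0
sudo systemctl status ceph-radosgw@rgw.juju-a2d93a-0-lxd-0 --no-pager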


Is there a workaround for this issue?

juju unit log: /var/log/juju/unit-ceph-radosgw-0.log

https://paste.ubuntu.com/p/RhFcHSmx3B/


Thanks guys

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1825843

Title:
  systemd issues with bionic-rocky causing nagios alert and can't
  restart daemon

Status in OpenStack ceph-radosgw charm:
  Fix Committed
Status in ceph package in Ubuntu:
  Invalid

Bug description:
  During deployment of a bionic-rocky cloud on 19.04 charms, we are
  seeing an issue with the ceph-radosgw units related to the systemd
  service definition for radosgw.service.

  If you look through this pastebin, you'll notice that there is a
  running radosgw daemon and the local haproxy unit thinks all radosgw
  backend services are available (via the nagios check), but systemd
  can't control radosgw properly. Note that before a restart with
  systemd, the unit just showed as loaded/inactive; after the restart it
  shows active/exited, but the radosgw service was not actually
  restarted.

  https://pastebin.ubuntu.com/p/Pn3sQ3zHXx/
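
  A quick way to see the mismatch is to compare systemd's view with the
  actual process table on a radosgw unit (assuming the instance name
  follows the rgw.<hostname> pattern shown above):

  systemctl status ceph-radosgw@rgw.$(hostname) --no-pager
  pgrep -a radosgw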

  charm: cs:ceph-radosgw-266
  cloud:bionic-rocky
   *** 13.2.4+dfsg1-0ubuntu0.18.10.1~cloud0 500
          500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 Packages

  ceph-radosgw/0                    active    idle   18/lxd/2  10.20.175.60    80/tcp                                   Unit is ready
    hacluster-radosgw/2             active    idle             10.20.175.60                                             Unit is ready and clustered
  ceph-radosgw/1                    active    idle   19/lxd/2  10.20.175.48    80/tcp                                   Unit is ready
    hacluster-radosgw/1             active    idle             10.20.175.48                                             Unit is ready and clustered
  ceph-radosgw/2*                   active    idle   20/lxd/2  10.20.175.25    80/tcp                                   Unit is ready
    hacluster-radosgw/0*            active    idle             10.20.175.25                                             Unit is ready and clustered

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions


