[Bug 1825843] Re: systemd issues with bionic-rocky causing nagios alert and can't restart daemon
Angel Vargas
angelvargas at outlook.es
Thu May 2 22:14:32 UTC 2019
I'm currently trying to deploy the ceph-radosgw charm rev 268 on Ubuntu
Bionic with OpenStack Stein (similar to the base bundle).
I'm hitting an issue where the service seems to be running, then after a
few minutes the unit changes to blocked and reports that the service is
not running. This is a bare-metal deployment with multiple network
spaces (handled by MAAS).
For this charm I deployed using a single command per unit, as required:
juju deploy --to lxd:0 --config ceph-radosgw.yaml ceph-radosgw --bind="public admin=admin cluster=cluster internal=internal public=public"
The config file looks like this:
---
ceph-radosgw:
  ceph-osd-replication-count: 2
  cache-size: 1200
  os-admin-network: 10.101.0.0/24
  os-internal-network: 10.50.0.0/24
  os-public-network: 10.100.0.0/24
  vip: 10.100.0.210 10.101.0.210 10.50.0.210
  pool-prefix: sc
  source: 'cloud:bionic-stein'
juju shows this:
ceph-radosgw/0*  blocked  idle       0/lxd/0  10.100.0.64  80/tcp  Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-0-lxd-0
ha-radosgw/0*    active   executing           10.100.0.64          Unit is ready and clustered
ceph-radosgw/1   blocked  executing  6/lxd/0  10.100.0.77  80/tcp  Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-6-lxd-0
ha-radosgw/1     active   idle                10.100.0.77          Unit is ready and clustered
ceph-radosgw/2   blocked  executing  1/lxd/3  10.100.0.78  80/tcp  Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-1-lxd-3
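As a side note, the blocked workload message itself names the exact
templated systemd unit to inspect. A small shell sketch (the status line
below is copied from my output above) to pull the unit name out:

```shell
# The blocked workload message names the exact templated systemd unit.
# Strip everything up to the last ": " to recover the unit name.
msg='Services not running that should be: ceph-radosgw@rgw.juju-a2d93a-0-lxd-0'
unit="${msg##*: }"
echo "$unit"
```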
Then, going to each LXD container and checking the service, I got this output on every unit:
ubuntu@juju-a2d93a-0-lxd-0:~$ sudo service jujud-unit-ceph-radosgw-0 status
● jujud-unit-ceph-radosgw-0.service - juju unit agent for ceph-radosgw/0
   Loaded: loaded (/lib/systemd/system/jujud-unit-ceph-radosgw-0/jujud-unit-ceph-radosgw-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-05-02 19:59:07 UTC; 2h 2min ago
 Main PID: 3820 (bash)
    Tasks: 67 (limit: 7372)
   CGroup: /system.slice/jujud-unit-ceph-radosgw-0.service
           ├─3820 bash /lib/systemd/system/jujud-unit-ceph-radosgw-0/exec-start.sh
           └─3824 /var/lib/juju/tools/unit-ceph-radosgw-0/jujud unit --data-dir /var/lib/juju --unit-name ceph-radosgw/0 --debug
May 02 21:57:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:14 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:15 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:24 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
May 02 22:00:27 juju-a2d93a-0-lxd-0 systemd[1]: jujud-unit-ceph-radosgw-0.service: Failed to reset devices.list: Operation not permitted
Then I ran this command:
ubuntu@juju-a2d93a-0-lxd-0:~$ sudo service ceph-radosgw@rgw.juju-a2d93a-6-lxd-0 status
● ceph-radosgw@rgw.juju-a2d93a-6-lxd-0.service - Ceph rados gateway
   Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; indirect; vendor preset: enabled)
   Active: inactive (dead)
Is there some workaround for this issue?
juju unit log: /var/log/juju/unit-ceph-radosgw-0.log
https://paste.ubuntu.com/p/RhFcHSmx3B/
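In case it helps reproduce the check: this is roughly what I run inside
each container to compare systemd's view with the actual process table.
The unit name is derived from the container hostname, and the
systemctl/pgrep calls are guarded, so the snippet degrades gracefully
where those tools aren't present:

```shell
# Derive the templated radosgw unit for this container and compare
# systemd's state with the actual process table.
unit="ceph-radosgw@rgw.$(hostname 2>/dev/null || uname -n).service"
echo "checking ${unit}"
if command -v systemctl >/dev/null 2>&1; then
  systemctl is-active "$unit" || true   # prints "inactive" even when a daemon is up
fi
if command -v pgrep >/dev/null 2>&1; then
  pgrep -a radosgw || echo "no radosgw process found"
fi
```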
Thanks guys
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1825843
Title:
systemd issues with bionic-rocky causing nagios alert and can't
restart daemon
Status in OpenStack ceph-radosgw charm:
Fix Committed
Status in ceph package in Ubuntu:
Invalid
Bug description:
During deployment of a bionic-rocky cloud on 19.04 charms, we are
seeing an issue with the ceph-radosgw units related to the systemd
service definition for radosgw.service.
If you look through this pastebin, you'll notice that there is a
running radosgw daemon and the local haproxy unit thinks all radosgw
backend services are available (via the nagios check), but systemd
can't control radosgw properly. (Note that before a restart with
systemd, systemd showed the unit as loaded/inactive; it now shows
active/exited, but the restart did not actually restart the radosgw
service.)
https://pastebin.ubuntu.com/p/Pn3sQ3zHXx/
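One way to make that mismatch visible is to query the old-style unit
and the templated per-host instance side by side. This is only a
sketch: the unit names follow the usual ceph packaging convention
(radosgw.service vs. ceph-radosgw@rgw.<hostname>.service), and the
systemctl call is guarded so the snippet runs anywhere:

```shell
# Compare the non-templated radosgw unit with the templated
# ceph-radosgw@rgw.<hostname> instance; in this bug, systemd's view of
# the former does not reflect the daemon actually running.
host="$(hostname 2>/dev/null || uname -n)"
for u in radosgw.service "ceph-radosgw@rgw.${host}.service"; do
  if command -v systemctl >/dev/null 2>&1; then
    printf '%s: %s\n' "$u" "$(systemctl is-active "$u" 2>/dev/null || echo unknown)"
  else
    printf '%s: systemctl unavailable\n' "$u"
  fi
done
```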
charm: cs:ceph-radosgw-266
cloud:bionic-rocky
*** 13.2.4+dfsg1-0ubuntu0.18.10.1~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 Packages
ceph-radosgw/0 active idle 18/lxd/2 10.20.175.60 80/tcp Unit is ready
hacluster-radosgw/2 active idle 10.20.175.60 Unit is ready and clustered
ceph-radosgw/1 active idle 19/lxd/2 10.20.175.48 80/tcp Unit is ready
hacluster-radosgw/1 active idle 10.20.175.48 Unit is ready and clustered
ceph-radosgw/2* active idle 20/lxd/2 10.20.175.25 80/tcp Unit is ready
hacluster-radosgw/0* active idle 10.20.175.25 Unit is ready and clustered
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1825843/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list