[Bug 1825843] Re: systemd issues with bionic-rocky causing nagios alert and can't restart daemon

Tue Apr 23 10:05:03 UTC 2019

@james-page

In my case, after the upgrade the charm unit showed as running, but the
OpenStack API/Horizon wasn't able to talk to the service. We went through
the logs and found no obvious information explaining why the service wasn't
up; only the juju controller logs showed that, for some reason, radosgw
was going into a failed state. After rebooting the container, the service
would not start, so we decided to deploy a lean openstack-base r59
(bionic-stein) bundle as usual, and the charm showed the exact same
problem/behavior. I'm not sure what is wrong with the update we did. To
get radosgw working again we had to downgrade to (bionic-rocky):

App                    Version       Status   Scale  Charm                  Store       Rev  OS
ceph-mon               13.2.4+dfsg1  active       3  ceph-mon               jujucharms   32  ubuntu
ceph-osd               13.2.4+dfsg1  active       3  ceph-osd               jujucharms  275  ubuntu
ceph-radosgw           13.2.4+dfsg1  active       1  ceph-radosgw           jujucharms  263  ubuntu

With that, we got the services up again. We are currently working to fix the production environment.

When I first wrote, I thought the way the bug initially affected us was related to the current report.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1825843

Title:
  systemd issues with bionic-rocky causing nagios alert and can't
  restart daemon

Status in OpenStack ceph-radosgw charm:
  Triaged
Status in ceph package in Ubuntu:
  Invalid

Bug description:
  During deployment of a bionic-rocky cloud on 19.04 charms, we are
  seeing an issue with the ceph-radosgw units related to the systemd
  service definition for radosgw.service.

  If you look through this pastebin, you'll notice that there is a
  running radosgw daemon and the local haproxy unit thinks all radosgw
  backend services are available (via nagios check), but systemd can't
  control radosgw properly. Note that before a restart with systemd,
  systemd just showed the unit as loaded and inactive; it now shows
  active (exited), but the restart did not actually restart the radosgw
  service.

  https://pastebin.ubuntu.com/p/Pn3sQ3zHXx/
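
  The mismatch described above (a live radosgw process that systemd does
  not track) could be checked with something like the following. This is
  only a rough sketch, not part of the charm or the nagios check: the
  helper names parse_show, find_mismatch and live_check are hypothetical,
  and it assumes the daemon's process name is "radosgw" and the unit is
  radosgw.service as named in this report.

```python
import subprocess


def parse_show(output):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_mismatch(props, daemon_pids):
    """Return a message if a daemon is running that systemd does not track.

    props       -- dict from parse_show()
    daemon_pids -- set of PIDs found for the daemon (e.g. via pgrep)
    """
    main_pid = int(props.get("MainPID") or 0)
    sub_state = props.get("SubState", "unknown")
    if daemon_pids and (sub_state != "running" or main_pid not in daemon_pids):
        return (
            "daemon running as PID(s) %s but systemd reports "
            "SubState=%s, MainPID=%d" % (sorted(daemon_pids), sub_state, main_pid)
        )
    return None


def live_check(unit="radosgw.service", process="radosgw"):
    """Query the local host; unit and process names are assumptions."""
    show = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,MainPID"],
        capture_output=True, text=True, check=False,
    )
    # pgrep exits non-zero when nothing matches; that just yields an empty set.
    pgrep = subprocess.run(
        ["pgrep", "-x", process], capture_output=True, text=True, check=False,
    )
    pids = {int(pid) for pid in pgrep.stdout.split()}
    return find_mismatch(parse_show(show.stdout), pids)
```

  On a unit in the state described here (active exited while the daemon is
  still running), live_check() would be expected to return a message rather
  than None, but this has not been run against the affected deployment.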

  charm: cs:ceph-radosgw-266
  cloud:bionic-rocky
   *** 13.2.4+dfsg1-0ubuntu0.18.10.1~cloud0 500
          500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/rocky/main amd64 Packages

  ceph-radosgw/0                    active    idle   18/lxd/2  10.20.175.60    80/tcp                                   Unit is ready
    hacluster-radosgw/2             active    idle             10.20.175.60                                             Unit is ready and clustered
  ceph-radosgw/1                    active    idle   19/lxd/2  10.20.175.48    80/tcp                                   Unit is ready
    hacluster-radosgw/1             active    idle             10.20.175.48                                             Unit is ready and clustered
  ceph-radosgw/2*                   active    idle   20/lxd/2  10.20.175.25    80/tcp                                   Unit is ready
    hacluster-radosgw/0*            active    idle             10.20.175.25                                             Unit is ready and clustered
