[Bug 1906280] Re: [SRU] Add support for disabling mlockall() calls in ovs-vswitchd

Michael Skalka 1906280 at bugs.launchpad.net
Wed Jan 27 18:45:47 UTC 2021


We are still seeing this issue using the -next version of the
ovn-chassis charm, as seen during this test run for the charm release:
https://solutions.qa.canonical.com/testruns/testRun/23d8528d-2931-4be6-a0d1-bad21e3d75a5

Artifacts can be found here:
https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/index.html

And specifically the openstack crashdump here:
https://oil-jenkins.canonical.com/artifacts/23d8528d-2931-4be6-a0d1-bad21e3d75a5/generated/generated/openstack/juju-crashdump-openstack-2021-01-27-18.32.08.tar.gz

Symptoms are the same; the ovn-chassis units stay blocked:

ubuntu at production-cpe-23d8528d-2931-4be6-a0d1-bad21e3d75a5:~$ juju status octavia-ovn-chassis
Model      Controller        Cloud/Region        Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud/default  2.8.7    unsupported  18:29:58Z

App                    Version  Status   Scale  Charm             Store       Rev  OS      Notes
hacluster-octavia               active       0  hacluster         jujucharms  161  ubuntu  
logrotated                      active       0  logrotated        jujucharms    2  ubuntu  
octavia                6.1.0    blocked      3  octavia           jujucharms   90  ubuntu  
octavia-ovn-chassis    20.03.1  waiting      3  ovn-chassis       jujucharms   49  ubuntu  
public-policy-routing           active       0  advanced-routing  jujucharms    3  ubuntu  

Unit                        Workload  Agent      Machine  Public address  Ports     Message
octavia/0*                  blocked   idle       1/lxd/8  10.244.40.229   9876/tcp  Awaiting end-user execution of `configure-resources` action to create required resources
  hacluster-octavia/0*      active    idle                10.244.40.229             Unit is ready and clustered
  logrotated/62             active    idle                10.244.40.229             Unit is ready.
  octavia-ovn-chassis/0*    waiting   executing           10.244.40.229             'ovsdb' incomplete
  public-policy-routing/44  active    idle                10.244.40.229             Unit is ready
octavia/1                   blocked   idle       3/lxd/8  10.244.40.244   9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/1       active    idle                10.244.40.244             Unit is ready and clustered
  logrotated/63             active    idle                10.244.40.244             Unit is ready.
  octavia-ovn-chassis/1     waiting   executing           10.244.40.244             'ovsdb' incomplete
  public-policy-routing/45  active    idle                10.244.40.244             Unit is ready
octavia/2                   blocked   idle       5/lxd/8  10.244.40.250   9876/tcp  Awaiting leader to create required resources
  hacluster-octavia/2       active    idle                10.244.40.250             Unit is ready and clustered
  logrotated/64             active    idle                10.244.40.250             Unit is ready.
  octavia-ovn-chassis/2     waiting   executing           10.244.40.250             'ovsdb' incomplete
  public-policy-routing/46  active    idle                10.244.40.250             Unit is ready

Machine  State    DNS            Inst id              Series  AZ     Message
1        started  10.244.41.35   armaldo              bionic  zone1  Deployed
1/lxd/8  started  10.244.40.229  juju-15ff71-1-lxd-8  bionic  zone1  Container started
3        started  10.244.41.17   spearow              bionic  zone2  Deployed
3/lxd/8  started  10.244.40.244  juju-15ff71-3-lxd-8  bionic  zone2  Container started
5        started  10.244.41.18   beartic              bionic  zone3  Deployed
5/lxd/8  started  10.244.40.250  juju-15ff71-5-lxd-8  bionic  zone3  Container started


I confirmed that the --no-mlockall flag was present in /etc/default/openvswitch-switch:

# This is a POSIX shell fragment                -*- sh -*-
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Configuration managed by neutron-openvswitch charm
# Service restart triggered by remote application: 
#                                                  
###############################################################################
OVS_CTL_OPTS='--no-mlockall'
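
As a quick sanity check, the amount of memory the running daemon
actually holds locked can be read from /proc; VmLck should be 0 kB once
--no-mlockall has taken effect. A minimal check, assuming ovs-vswitchd
is running:

pid=$(pidof ovs-vswitchd)
grep VmLck /proc/$pid/status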

Then, at the request of Billy Olsen, I restarted the ovs-vswitchd
service on one of the units, which caused it to go back to ready:

root at juju-15ff71-5-lxd-8:~# service ovs-vswitchd restart
root at juju-15ff71-5-lxd-8:~# service ovs-vswitchd status
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: enabled)
   Active: active (running) since Wed 2021-01-27 18:33:34 UTC; 7s ago
  Process: 70258 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 79634 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random start $OVS_CTL_OPTS (code=exited, status=0/SUCCESS)
    Tasks: 22 (limit: 314572)
   CGroup: /system.slice/ovs-vswitchd.service
           └─79674 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vs

Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Starting Open vSwitch Forwarding Unit...
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]: nice: cannot set niceness: Permission denied
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]:  * Starting ovs-vswitchd
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-vsctl[79702]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=juju-15ff71-5-lxd-8.production.solutionsqa
Jan 27 18:33:34 juju-15ff71-5-lxd-8 ovs-ctl[79634]:  * Enabling remote OVSDB managers
Jan 27 18:33:34 juju-15ff71-5-lxd-8 systemd[1]: Started Open vSwitch Forwarding Unit.

That was shortly before I was booted out of the system by our CI
cleaning up the run.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1906280

Title:
  [SRU] Add support for disabling mlockall() calls in ovs-vswitchd

Status in OpenStack neutron-openvswitch charm:
  Fix Committed
Status in charm-ovn-chassis:
  Fix Committed
Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in openvswitch package in Ubuntu:
  Fix Released
Status in openvswitch source package in Bionic:
  Fix Released
Status in openvswitch source package in Focal:
  Fix Released
Status in openvswitch source package in Groovy:
  Fix Released
Status in openvswitch source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  Recent changes to systemd rlimit handling are resulting in memory
  exhaustion with ovs-vswitchd's use of mlockall(). mlockall() can be
  disabled via /etc/default/openvswitch-switch, however there is
  currently a bug in the shipped ovs-vswitchd systemd unit file that
  prevents this. The package will be fixed in this SRU. Additionally,
  the neutron-openvswitch charm will be updated to allow disabling
  mlockall() use in ovs-vswitchd via a config option.

  More details on the above summary can be found in the following comments:
  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1906280/comments/16
  https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1906280/comments/19
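
  As a sketch of a local workaround until the fixed package lands, a
  systemd drop-in of roughly the following shape should let the defaults
  file take effect; the drop-in path and the exact EnvironmentFile line
  here are assumptions based on this report, not taken from the SRU
  debdiff:

  # /etc/systemd/system/ovs-vswitchd.service.d/override.conf (hypothetical)
  [Service]
  EnvironmentFile=-/etc/default/openvswitch-switch

  After creating it, run systemctl daemon-reload and restart
  ovs-vswitchd so that OVS_CTL_OPTS from the defaults file reaches the
  ExecStart line shown in the systemd status output earlier in this
  message.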

  ==== Original bug details ====
  Original bug title:

  Charm stuck waiting for ovsdb 'no key "ovn-remote" in Open_vSwitch
  record'

  Original bug details:

  As seen during this Focal Ussuri test run: https://solutions.qa.canonical.com/testruns/testRun/5f7ad510-f57e-40ce-beb7-5f39800fa5f0
  Crashdump here: https://oil-jenkins.canonical.com/artifacts/5f7ad510-f57e-40ce-beb7-5f39800fa5f0/generated/generated/openstack/juju-crashdump-openstack-2020-11-28-03.40.36.tar.gz

  Full history of occurrences can be found here:
  https://solutions.qa.canonical.com/bugs/bugs/bug/1906280

  Octavia's ovn-chassis units are stuck waiting:

  octavia/0                             blocked   idle       1/lxd/8   10.244.8.170    9876/tcp           Awaiting leader to create required resources
    hacluster-octavia/1                 active    idle                 10.244.8.170                       Unit is ready and clustered
    logrotated/63                       active    idle                 10.244.8.170                       Unit is ready.
    octavia-ovn-chassis/1               waiting   executing            10.244.8.170                       'ovsdb' incomplete
    public-policy-routing/45            active    idle                 10.244.8.170                       Unit is ready

  When the db is reporting healthy:

  ovn-central/0*                        active    idle       1/lxd/9   10.246.64.225   6641/tcp,6642/tcp  Unit is ready (leader: ovnnb_db, ovnsb_db)
    logrotated/19                       active    idle                 10.246.64.225                      Unit is ready.
  ovn-central/1                         active    idle       3/lxd/9   10.246.64.250   6641/tcp,6642/tcp  Unit is ready (northd: active)
    logrotated/27                       active    idle                 10.246.64.250                      Unit is ready.
  ovn-central/2                         active    idle       5/lxd/9   10.246.65.21    6641/tcp,6642/tcp  Unit is ready
    logrotated/52                       active    idle                 10.246.65.21                       Unit is ready.

  Warning in the juju unit logs indicates that the charm is blocking on
  a missing key in the ovsdb:

  2020-11-27 23:36:57 INFO juju-log ovsdb:195: Invoking reactive handler: hooks/relations/ovsdb-subordinate/provides.py:97:joined:ovsdb-subordinate
  2020-11-27 23:36:57 DEBUG jujuc server.go:211 running hook tool "relation-get"
  2020-11-27 23:36:57 WARNING ovsdb-relation-changed ovs-vsctl: no key "ovn-remote" in Open_vSwitch record "." column external_ids
  2020-11-27 23:36:57 DEBUG jujuc server.go:211 running hook tool "juju-log"
  2020-11-27 23:36:57 INFO juju-log ovsdb:195: Invoking reactive handler: hooks/relations/ovsdb/requires.py:34:joined:ovsdb
  ==============================
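
  As a direct check of the symptom quoted above, the key can be queried
  with ovs-vsctl; this prints the configured value when the key is set,
  and the same "no key" error otherwise:

  ovs-vsctl get Open_vSwitch . external_ids:ovn-remote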

  [Test Case]
  Note: Bionic requires additional testing due to pairing with other SRUs.

  The easiest way to test this is to deploy OpenStack with the
  neutron-openvswitch charm, using the new charm updates. Once deployed,
  edit /usr/share/openvswitch/scripts/ovs-ctl to add an echo showing
  what MLOCKALL is set to (see the sketch after [1] below). Then toggle
  the charm config option [1] and look at journalctl -xe to find the
  echo output, which should correspond to the mlockall setting.

  [1]
  juju config neutron-openvswitch disable-mlockall=true
  juju config neutron-openvswitch disable-mlockall=false
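
  The sketch below shows the kind of echo meant above; the variable name
  MLOCKALL comes from the test description, but exactly where to place
  the line inside ovs-ctl is left to the tester:

  echo "ovs-ctl: MLOCKALL=${MLOCKALL}" >&2

  After toggling the config option, journalctl -xe -u ovs-vswitchd
  should show the line, with the value tracking disable-mlockall.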

  [Regression Potential]
  There's potential that this will break users who have come to depend on the incorrect EnvironmentFile setting and environment variable in the systemd unit file for ovs-vswitchd. If that is the case, they must already be running with modified systemd unit files, so it is probably a moot point.
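
  Whether a host falls into that category can be checked with systemctl
  cat, which prints the unit file actually in effect along with any
  local drop-ins; paths under /etc/systemd/system in the output indicate
  local modifications:

  systemctl cat ovs-vswitchd.service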

  [Discussion]
  == Groovy ==
  Update (16-12-2020): I chatted briefly with Christian and it sounds like the ltmain-whole-archive.diff may be optional, so I've dropped it from this upload. There are now two openvswitch uploads in the groovy unapproved queue. Please reject the upload from 15-12-2020 and consider accepting the upload from 16-12-2020.
  I have a query out to James and Christian about an undocumented commit that is getting picked up in the groovy upload. It is committed to the ubuntu/groovy branch of the package Vcs. See debian/ltmain-whole-archive.diff and debian/rules in the upload debdiff at http://launchpadlibrarian.net/511453613/openvswitch_2.13.1-0ubuntu1_2.13.1-0ubuntu1.1.diff.gz

  == Bionic ==
  The bionic upload is paired with the following SRUs which will also require verification:
  https://bugs.launchpad.net/bugs/1823295
  https://bugs.launchpad.net/bugs/1881077

  == Package details ==
  New package versions are in progress and can be found at:
  hirsute: https://launchpad.net/ubuntu/+source/openvswitch/2.14.0-0ubuntu2
  groovy: https://launchpad.net/ubuntu/groovy/+queue?queue_state=1&queue_text=openvswitch
  focal: https://launchpad.net/ubuntu/focal/+queue?queue_state=1&queue_text=openvswitch
  train: https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/train-staging/+packages?field.name_filter=openvswitch&field.status_filter=published&field.series_filter=
  stein: https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/stein-staging/+packages?field.name_filter=openvswitch&field.status_filter=published&field.series_filter=
  bionic: https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=openvswitch

  == Charm update ==
  https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/767212

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1906280/+subscriptions


