[Bug 1813371] Re: OVS 2.9+ systemd integration issues

James Page james.page at ubuntu.com
Mon Jul 15 08:42:43 UTC 2019


** Also affects: openvswitch (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: openvswitch (Ubuntu Eoan)
   Importance: Undecided
       Status: Confirmed

** Also affects: openvswitch (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Changed in: openvswitch (Ubuntu Eoan)
       Status: Confirmed => In Progress

** Changed in: openvswitch (Ubuntu Eoan)
     Assignee: (unassigned) => James Page (james-page)

** Changed in: openvswitch (Ubuntu Eoan)
   Importance: Undecided => Medium

** Changed in: openvswitch (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: openvswitch (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: openvswitch (Ubuntu Disco)
       Status: New => Triaged

** Changed in: openvswitch (Ubuntu Bionic)
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1813371

Title:
  OVS 2.9+ systemd integration issues

Status in openvswitch package in Ubuntu:
  In Progress
Status in openvswitch source package in Bionic:
  Triaged
Status in openvswitch source package in Disco:
  Triaged
Status in openvswitch source package in Eoan:
  In Progress

Bug description:
  For a few months now, we have been using OVS 2.9 (or newer) on Ubuntu Xenial in OPNFV, both with and without DPDK.
  A while ago, we observed a couple of rare race conditions when multiple Linux interfaces/bridges are mixed with OVS ports/bridges. We also observed races between DPDK binding and openvswitch-switch (actually openvswitch-switch-dpdk configured using alternatives).
  We worked around those issues by using a solution derived from the official OVS Debian readme, which recommends avoiding using `auto` for OVS bridges. Instead, we used `auto` for OVS bridges, but omitted the `auto` for the OVS ports in them. That worked almost perfectly for a while.

  However, we recently bumped a few unrelated software components (since
  we migrated from Queens to Rocky in OPNFV) and we started experiencing
  race conditions again.

  So I dug a bit and found a couple of things:

  1. Broken dependency between ovsdb-server/ovs-vswitchd systemd services and networking.service
  This is probably a copy-paste error from [1]: `Before=network.service` should probably be `Before=networking.service` on Debian systems.
  The consequence is quite serious - on Debian systems, the OVS services start *after* networking.service.
  Changing this leads to a service order change, which turns out to be quite the rabbit hole ...
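
  To check a given unit file for this, a minimal shell sketch follows. The helper name `orders_before_networking` is made up for illustration, and the unit file path in the usage comment is an assumption about where the package installs it. The helper only reads the file it is given, so it does not see drop-ins; `systemctl show -p Before ovsdb-server.service` reports the effective value on a running system.

  ```shell
  # Succeeds if the given unit file has a Before= line that lists
  # networking.service as a whole word (network.service does not count).
  orders_before_networking() {
      grep -Eq '^Before=(.*[[:space:]])?networking\.service([[:space:]]|$)' "$1"
  }

  # Usage (path is an assumption, adjust to the installed unit):
  #   orders_before_networking /lib/systemd/system/ovsdb-server.service \
  #       && echo "ordered before networking.service" \
  #       || echo "no ordering against networking.service"
  ```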

  2. Outdated ifupdown scripts
  For example /etc/network/if-pre-up.d/openvswitch still references the old `openvswitch-nonetwork.service`.
  Luckily, this is not critical, as the fallback uses `service openvswitch-switch [...]`, so I'm not sure this should be changed, but I thought it's worth mentioning.

  3. Debian OVS does *not* handle OVS bridges without `auto`
  Upstream OVS readme recommends omitting `auto` for OVS bridges, as mentioned earlier, to avoid exactly the race conditions we saw.
  Although following the recommendation in the upstream readme leads to a working system (`networking.service` no longer fails to start due to missing OVS bridges and, vice versa, OVS services no longer complain about Linux interfaces being in down state when trying to add them to OVS bridges), OVS bridges end up in DOWN state since nobody bothers to ifup them.
  IMO, networking.service (or some *other* mechanism) should call `/sbin/ifup --allow=ovs -a --read-environment` *after* the initial `/sbin/ifup -a --read-environment` (provided the ordering issue in #1 is fixed so that OVS starts first, of course).
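
  As a rough illustration of which interfaces `ifup --allow=ovs -a` would consider, the sketch below writes a hypothetical interfaces(5) file to a temporary location and lists the names placed in the `ovs` allow class. The interface names (br0, eth0, eth1) and the helper `list_allow_ovs` are placeholders, and the stanza layout is only meant to resemble the style of the OVS Debian readme, not to reproduce a real configuration.

  ```shell
  # Print the interface names listed on allow-ovs lines of an interfaces file.
  list_allow_ovs() {
      awk '$1 == "allow-ovs" { for (i = 2; i <= NF; i++) print $i }' "$1"
  }

  # Hypothetical sample: eth1 is a plain Linux interface brought up by
  # "ifup -a", while br0 is only in the "ovs" allow class (no "auto").
  sample=$(mktemp)
  printf '%s\n' \
      'auto eth1' \
      'iface eth1 inet dhcp' \
      '' \
      'allow-ovs br0' \
      'iface br0 inet manual' \
      '    ovs_type OVSBridge' \
      '    ovs_ports eth0' \
      '' \
      'allow-br0 eth0' \
      'iface eth0 inet manual' \
      '    ovs_bridge br0' \
      '    ovs_type OVSPort' > "$sample"

  list_allow_ovs "$sample"   # prints: br0
  rm -f "$sample"
  ```

  With such a file, a plain `ifup -a` would skip br0, which matches the DOWN bridges described above.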

  4. ovsdb-server should never start before DPDK service if DPDK is installed
  This should actually be easy to fix and I have to admit I haven't run into it lately, although I remember it being an issue a while ago.
  Anyway, a simple `After=dpdk.service` wouldn't hurt.

  5. If OVS starts before networking.service, cloud-init causes cyclic dependencies
  If we configure OVS services to start first, systemd might decide to randomly remove some units to break the following circular dependency:
    ovs-vswitchd --> ovsdb-server -(default dep)-> sysinit.target -->
    cloud-init.service --> networking.service --> ovs-vswitchd
  In my tests, I just set `DefaultDependencies=no` for OVS services, although this might require explicitly adding back some of the indirect dependencies of `sysinit.target`, so it may not be a sensible recommendation as-is.
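
  The cycle can be reproduced on paper with `tsort`, which fails when its input contains a loop. This is only a sketch: the helper `has_cycle` is made up, each argument is one arrow from the chain above, and nothing here queries systemd itself.

  ```shell
  has_cycle() {
      # Each argument is one "A B" edge; tsort exits non-zero on a loop.
      ! printf '%s\n' "$@" | tsort >/dev/null 2>&1
  }

  # The chain from above, including the default sysinit.target dependency.
  if has_cycle \
      'ovs-vswitchd ovsdb-server' \
      'ovsdb-server sysinit.target' \
      'sysinit.target cloud-init.service' \
      'cloud-init.service networking.service' \
      'networking.service ovs-vswitchd'; then
      echo "cycle with default dependencies"
  fi

  # Same chain with the default dependency dropped (DefaultDependencies=no).
  if ! has_cycle \
      'ovs-vswitchd ovsdb-server' \
      'sysinit.target cloud-init.service' \
      'cloud-init.service networking.service' \
      'networking.service ovs-vswitchd'; then
      echo "no cycle without the default dependency"
  fi
  ```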

  On my test systems, I didn't bother handling #2; for the others I
  have some systemd drop-ins (see below), which so far seem to produce
  reproducible working environments.

  # cat /etc/systemd/system/ovsdb-server.service.d/override.conf
  [Unit]
  After=dpdk.service
  Before=networking.service
  DefaultDependencies=no

  # cat /etc/systemd/system/networking.service.d/ovs_workaround.conf
  [Service]
  ExecStart=/sbin/ifup --allow=ovs -a --read-environment

  # cat /etc/systemd/system/ovs-vswitchd.service.d/override.conf
  [Unit]
  Before=networking.service
  DefaultDependencies=no

  # lsb_release -rd
  Description:    Ubuntu 16.04.5 LTS
  Release:        16.04

  # apt-cache policy openvswitch-switch
  openvswitch-switch:
    Installed: 2.9.0-0ubuntu1~cloud0
    Candidate: 2.9.0-0ubuntu1~cloud0
    Version table:
   *** 2.9.0-0ubuntu1~cloud0 500
          500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/queens/main amd64 Packages
          100 /var/lib/dpkg/status

  [1] https://github.com/openvswitch/ovs/blob/master/rhel/usr_lib_systemd_system_ovsdb-server.service#L4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1813371/+subscriptions


