[Bug 1963698] Re: ovn-controller on Wallaby creates high CPU usage after moving port

DUFOUR Olivier 1963698 at bugs.launchpad.net
Fri Mar 4 16:12:51 UTC 2022


** Description changed:

  We are deploying Focal Wallaby for a customer.
  Neutron package version: 2:18.2.0-0ubuntu1~cloud0, GLIBC version: 2.31-0ubuntu9.7
  
  When running Rally/Tempest tests that create VMs, the following symptoms occur:
  1) A large increase in the size of, and the write load on, /var/lib/openvswitch/conf.db
  (If ovsdb-server is restarted while the OVS database is a few GB in size, the unit can fail to start.)
  
  2) Very high CPU usage by the following processes:
  * neutron-ovn-metadata-agent
  * nova-compute
  * ovn-controller
  * ovsdb-server
  
  3) The Nova compute node may experience severe delays and may time out
  when creating any instance (for Nova or Octavia Amphora) on it.
  
- A temporary workaround is to restart the ovn-controller service. 
+ A temporary workaround is to restart the ovn-controller service.
  The issue then reappears after some time on a different hypervisor.
  
  So far it has only been reproducible on a customer deployment with many
  nova-compute units.
  
  ovn-controller.log on the hypervisor:
  2022-03-04T12:54:43.065Z|00479|binding|INFO|Changing chassis for lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from comp04.maas to comp18.maas
  .
  2022-03-04T12:54:43.065Z|00480|binding|INFO|cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53: Claiming fa:16:3e:15:1f:a6 10.218.131.106/18
  2022-03-04T12:54:43.077Z|00481|binding|INFO|Releasing lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from this chassis.
  2022-03-04T12:54:46.798Z|00482|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00483|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00484|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00485|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  
- 
  The full ovn-controller log is available here:
  https://private-fileshare.canonical.com/~alitvinov/random/ovn-controller.txt
+ 
+ The bundle is also available here:
+ https://private-fileshare.canonical.com/~alitvinov/random/bundle-ovn-controller.txt

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ovn in Ubuntu.
https://bugs.launchpad.net/bugs/1963698

Title:
  ovn-controller on Wallaby creates high CPU usage after moving port

Status in ovn package in Ubuntu:
  New

Bug description:
  We are deploying Focal Wallaby for a customer.
  Neutron package version: 2:18.2.0-0ubuntu1~cloud0, GLIBC version: 2.31-0ubuntu9.7

  When running Rally/Tempest tests that create VMs, the following symptoms occur:
  1) A large increase in the size of, and the write load on, /var/lib/openvswitch/conf.db
  (If ovsdb-server is restarted while the OVS database is a few GB in size, the unit can fail to start.)

  2) Very high CPU usage by the following processes:
  * neutron-ovn-metadata-agent
  * nova-compute
  * ovn-controller
  * ovsdb-server

  3) The Nova compute node may experience severe delays and may time out
  when creating any instance (for Nova or Octavia Amphora) on it.
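One way to put a number on symptom 1 is to sample the size of conf.db at a fixed interval and compute an average growth rate. The following is a minimal Python sketch, not something from the original report: it assumes read access to /var/lib/openvswitch/conf.db, and the helper names (sample_sizes, growth_rate) and the 6 x 10 s sampling window are arbitrary choices for illustration.

```python
import os
import time
from typing import List, Tuple

# Path taken from the bug report; adjust if your deployment differs.
CONF_DB = "/var/lib/openvswitch/conf.db"


def sample_sizes(path: str, count: int, interval: float) -> List[Tuple[float, int]]:
    """Take `count` (monotonic timestamp, size in bytes) samples, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), os.path.getsize(path)))
        if i < count - 1:
            time.sleep(interval)
    return samples


def growth_rate(samples: List[Tuple[float, int]]) -> float:
    """Average growth in bytes per second between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    elapsed = t1 - t0
    return (s1 - s0) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    if not os.path.exists(CONF_DB):
        print(f"{CONF_DB} not found; is this an OVS host?")
    else:
        # Six samples, 10 s apart (a hypothetical window chosen for illustration).
        rate = growth_rate(sample_sizes(CONF_DB, count=6, interval=10.0))
        print(f"{CONF_DB}: {os.path.getsize(CONF_DB) / 2**20:.1f} MiB, "
              f"growing {rate / 1024:.1f} KiB/s on average")
```

The rate may come out negative if ovsdb-server compacts the file during the sampling window, so a single reading is only a rough indicator.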

  A temporary workaround is to restart the ovn-controller service.
  The issue then reappears after some time on a different hypervisor.
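Because the problem moves to a different hypervisor after a restart, it can help to check which host currently shows the CPU pattern from symptom 2. The sketch below samples CPU time for the listed processes from /proc on a Linux host. It is a hypothetical helper, not a tool mentioned in the report: it assumes /proc is mounted and readable, matches processes loosely by a substring of their command line (so unrelated processes whose arguments contain the name may also match), and reports usage as a percentage of one core over a one-second window.

```python
import os
import time
from typing import Dict, List

# Process names listed in the bug report.
WATCHED = ["neutron-ovn-metadata-agent", "nova-compute", "ovn-controller", "ovsdb-server"]
CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def cpu_seconds(pid: int) -> float:
    """User + system CPU time (seconds) used so far by `pid`, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces, so only split
    # the text after the last ')'. rest[0] is field 3 (state).
    rest = data[data.rfind(")") + 2:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return (utime + stime) / CLK_TCK


def find_pids(name: str) -> List[int]:
    """PIDs whose command line contains `name` (a loose substring match)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if name in cmdline:
            pids.append(int(entry))
    return pids


def cpu_percent(pids: List[int], interval: float = 1.0) -> Dict[int, float]:
    """CPU usage of each PID over `interval` seconds, as a percentage of one core."""
    before = {}
    for pid in pids:
        try:
            before[pid] = cpu_seconds(pid)
        except OSError:
            pass  # process exited before the first sample
    if not before:
        return {}
    time.sleep(interval)
    usage = {}
    for pid, start in before.items():
        try:
            usage[pid] = (cpu_seconds(pid) - start) / interval * 100.0
        except OSError:
            pass  # process exited during the interval
    return usage


if __name__ == "__main__":
    for name in WATCHED:
        for pid, pct in cpu_percent(find_pids(name)).items():
            print(f"{name:28} pid={pid:<7} {pct:6.1f}%")
```

Values above 100% are possible for multi-threaded processes, since the figure is relative to a single core.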

  So far it has only been reproducible on a customer deployment with
  many nova-compute units.

  ovn-controller.log on the hypervisor:
  2022-03-04T12:54:43.065Z|00479|binding|INFO|Changing chassis for lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from comp04.maas to comp18.maas
  .
  2022-03-04T12:54:43.065Z|00480|binding|INFO|cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53: Claiming fa:16:3e:15:1f:a6 10.218.131.106/18
  2022-03-04T12:54:43.077Z|00481|binding|INFO|Releasing lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from this chassis.
  2022-03-04T12:54:46.798Z|00482|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00483|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00484|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
  2022-03-04T12:54:46.799Z|00485|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
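The two message types quoted above (binding chassis changes and poll_loop wakeups) can be pulled out of a full log with a small parser, which may help correlate a port moving between chassis with the CPU spike that follows. This is an illustrative sketch: the regular expressions are written against the sample lines in this report only and may need adjusting for other ovn-controller versions, and the log path in the usage comment is an assumption.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Patterns written against the sample lines in this report (assumption:
# other OVN versions may format these messages differently).
CHASSIS_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|binding\|INFO\|Changing chassis for lport "
    r"(?P<lport>\S+) from (?P<old>\S+) to (?P<new>\S+)"
)
POLL_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|poll_loop\|INFO\|wakeup due to .*\((?P<cpu>\d+)% CPU usage\)"
)


class ChassisChange(NamedTuple):
    ts: str
    lport: str
    old: str
    new: str


def chassis_changes(lines: Iterable[str]) -> List[ChassisChange]:
    """Return every 'Changing chassis for lport' event, in log order."""
    events = []
    for line in lines:
        m = CHASSIS_RE.match(line.strip())
        if m:
            events.append(ChassisChange(m["ts"], m["lport"], m["old"], m["new"]))
    return events


def poll_cpu_usage(lines: Iterable[str]) -> List[int]:
    """Return the CPU percentage from each poll_loop wakeup message, in log order."""
    usage = []
    for line in lines:
        m = POLL_RE.match(line.strip())
        if m:
            usage.append(int(m["cpu"]))
    return usage


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python parse_ovn_log.py /var/log/ovn/ovn-controller.log
    with open(sys.argv[1]) as f:
        lines = f.readlines()
    for c in chassis_changes(lines):
        print(f"{c.ts} {c.lport}: {c.old} -> {c.new}")
    cpu = poll_cpu_usage(lines)
    if cpu:
        print(f"{len(cpu)} poll_loop wakeups, peak {max(cpu)}% CPU")
```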

  The full ovn-controller log is available here:
  https://private-fileshare.canonical.com/~alitvinov/random/ovn-controller.txt

  The bundle is also available here:
  https://private-fileshare.canonical.com/~alitvinov/random/bundle-ovn-controller.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1963698/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list