[Bug 1977851] [NEW] Netplan is not setting up SRIOV Virtual Functions on Jammy Charmed OpenStack during boot

Itai Levy 1977851 at bugs.launchpad.net
Tue Jun 7 13:36:18 UTC 2022


Public bug reported:

I am trying to deploy Charmed OpenStack (Yoga) on the Jammy series with OVN
hardware offload.

# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"

# uname -a
Linux node3 5.15.0-35-generic #36-Ubuntu SMP Sat May 21 02:24:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      
# cat /etc/openstack-release 
OPENSTACK_CODENAME=yoga


As part of the charm bundle, the following configuration is used:
 ovn-chassis:
    charm: ch:ovn-chassis
    # Please update the `bridge-interface-mappings` to values suitable for the
    # hardware used in your deployment. See the referenced documentation at the
    # top of this file.
    options:
      ovn-bridge-mappings: datacentre:br-ex
      bridge-interface-mappings: *data-port
      enable-hardware-offload: true
      sriov-numvfs:  "ens1f1:8"
    channel: 22.03/stable
    bindings:
      "": *internal-space
      data: *overlay-space

The charm translates this into the following netplan file on the deployed node:
# cat /etc/netplan/150-charm-ovn.yaml
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Config managed by ovn-chassis charm
###############################################################################
network:
  version: 2
  ethernets:
    ens1f1:
      virtual-function-count: 8
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true
    
    

However, after rebooting the deployed servers, the SR-IOV VFs are not enabled on the NVIDIA NIC:
# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
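
One way to confirm this state independently of lspci is to read the SR-IOV counters that the kernel exposes in sysfs. The following is a minimal Python sketch, not part of netplan or the charm. The helper name `read_sriov_counts` and its `sysfs_root` parameter are hypothetical and exist only so the function can be pointed at a test directory. It assumes the standard `sriov_numvfs` and `sriov_totalvfs` attributes under `/sys/class/net/<iface>/device/`.

```python
from pathlib import Path


def read_sriov_counts(iface, sysfs_root="/sys"):
    """Return (numvfs, totalvfs) for a network interface, read from sysfs.

    Hypothetical helper: sysfs_root is a parameter only so the function can
    be exercised against a fake directory tree.
    """
    device = Path(sysfs_root) / "class" / "net" / iface / "device"
    # sriov_numvfs: VFs currently enabled; sriov_totalvfs: maximum supported.
    numvfs = int((device / "sriov_numvfs").read_text().strip())
    totalvfs = int((device / "sriov_totalvfs").read_text().strip())
    return numvfs, totalvfs
```

On the affected node, calling `read_sriov_counts("ens1f1")` after boot would be expected to report zero enabled VFs if the lspci output above reflects the real state.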


When I run netplan manually, the VFs are configured (the switch mode change fails because the NIC is already bound; I believe this is expected):


# netplan --debug apply
.
.
.
   ens1f1:
      delay-virtual-functions-rebind: true
      embedded-switch-mode: switchdev
      match:
        macaddress: 04:3f:72:9e:0b:a1
      mtu: 1500
      set-name: ens1f1
      virtual-function-count: 8
.
.
.

DEBUG:Found VFs of 0000:08:00.1: ['0000:08:02.3', '0000:08:02.4', '0000:08:02.5', '0000:08:02.6', '0000:08:02.7', '0000:08:03.0', '0000:08:03.1', '0000:08:03.2']
Error: mlx5_core: Can't change mode, E-Switch is busy.
kernel answers: Device or resource busy
Traceback (most recent call last):
  File "/usr/sbin/netplan", line 23, in <module>
    netplan.main()
  File "/usr/share/netplan/netplan/cli/core.py", line 50, in main
    self.run_command()
  File "/usr/share/netplan/netplan/cli/utils.py", line 247, in run_command
    self.func()
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 61, in run
    self.run_command()
  File "/usr/share/netplan/netplan/cli/utils.py", line 247, in run_command
    self.func()
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 245, in command_apply
    NetplanApply.process_sriov_config(config_manager, exit_on_error)
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 376, in process_sriov_config
    apply_sriov_config(config_manager)
  File "/usr/share/netplan/netplan/cli/sriov.py", line 492, in apply_sriov_config
    pcidev.devlink_set('eswitch', 'mode', eswitch_mode)
  File "/usr/share/netplan/netplan/cli/sriov.py", line 143, in devlink_set
    subprocess.check_call(
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/sbin/devlink', 'dev', 'eswitch', 'set', 'pci/0000:08:00.1', 'mode', 'switchdev']' returned non-zero exit status 1.
root@node3:/home/ubuntu#
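
The last frames of the traceback show `subprocess.check_call` turning the non-zero exit status of devlink into a `CalledProcessError`, which aborts `netplan apply`. Below is a minimal sketch of that behaviour, using a stand-in child process rather than `/sbin/devlink`. `run_and_report` is a hypothetical helper for illustration and is not netplan's code.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; return its exit status instead of letting the error propagate."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        # check_call raises when the child exits non-zero; the exception
        # carries both the exit status and the command that was run.
        return exc.returncode
    return 0


# Stand-in for the failing devlink call: a child that exits with status 1.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_and_report(failing)
```

Whether netplan should tolerate this particular failure is a separate question; the sketch only illustrates why the command's exit status surfaces as a Python exception.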


# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
08:02.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.1 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
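
The two lspci listings can be compared mechanically by counting the lines that describe a Virtual Function. This is a small Python sketch; `count_virtual_functions` is a hypothetical helper, and the sample strings are short excerpts of the output shown in this report, not complete listings.

```python
def count_virtual_functions(lspci_output):
    """Count lspci lines that describe an SR-IOV Virtual Function."""
    return sum(
        1 for line in lspci_output.splitlines() if "Virtual Function" in line
    )


# Excerpt of the listing taken after reboot (physical functions only).
after_boot = """\
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
"""

# Excerpt of the listing taken after the manual netplan run (first two VFs).
after_apply = """\
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:02.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
"""
```

Applied to the full listing above, the same function would be expected to return the configured `virtual-function-count`.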

** Affects: plan (Ubuntu)
     Importance: Undecided
         Status: New

** Package changed: openvswitch (Ubuntu) => plan (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1977851
