[Bug 1977851] [NEW] Netplan is not setting up SRIOV Virtual Functions on Jammy Charmed OpenStack during boot
Itai Levy
1977851 at bugs.launchpad.net
Tue Jun 7 13:36:18 UTC 2022
Public bug reported:
Trying to deploy Charmed OpenStack (Yoga) on the Jammy series with OVN
Hardware Offload.
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"
# uname -a
Linux node3 5.15.0-35-generic #36-Ubuntu SMP Sat May 21 02:24:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/openstack-release
OPENSTACK_CODENAME=yoga
As part of the charms bundle the following config is used:
ovn-chassis:
  charm: ch:ovn-chassis
  # Please update the `bridge-interface-mappings` to values suitable for the
  # hardware used in your deployment. See the referenced documentation at the
  # top of this file.
  options:
    ovn-bridge-mappings: datacentre:br-ex
    bridge-interface-mappings: *data-port
    enable-hardware-offload: true
    sriov-numvfs: "ens1f1:8"
  channel: 22.03/stable
  bindings:
    "": *internal-space
    data: *overlay-space
This is translated to the following netplan file on the deployed node:
# cat /etc/netplan/150-charm-ovn.yaml
###############################################################################
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Config managed by ovn-chassis charm
###############################################################################
network:
  version: 2
  ethernets:
    ens1f1:
      virtual-function-count: 8
      embedded-switch-mode: switchdev
      delay-virtual-functions-rebind: true
However, after rebooting the deployed servers, the SR-IOV VFs are not enabled on the NVIDIA NIC:
# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
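The same state can be cross-checked without lspci by reading the SR-IOV counters the kernel exposes in sysfs. The following is only a minimal sketch, assuming the standard `sriov_numvfs` and `sriov_totalvfs` files under `/sys/class/net/<iface>/device/`; the helper name `read_sriov_counts` and its `sysfs_root` parameter are illustrative and not part of netplan or the charm.

```python
from pathlib import Path


def read_sriov_counts(iface, sysfs_root="/sys"):
    """Return the current and maximum VF counts for a network interface.

    A value of None means the file was missing or unreadable, for example
    when the interface does not exist or the device has no SR-IOV support.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    counts = {}
    for name in ("sriov_numvfs", "sriov_totalvfs"):
        try:
            counts[name] = int((dev / name).read_text().strip())
        except (OSError, ValueError):
            counts[name] = None
    return counts


if __name__ == "__main__":
    # ens1f1 is the interface named in the netplan file from this report.
    print(read_sriov_counts("ens1f1"))
```

If `sriov_numvfs` reads 0 after boot while the netplan file asks for 8, the VFs were never created at boot, which matches the lspci output above.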
When running netplan manually, the VFs are configured. The switch mode change then fails because the NIC is already bound, which I believe is expected:
# netplan --debug apply
.
.
.
    ens1f1:
      delay-virtual-functions-rebind: true
      embedded-switch-mode: switchdev
      match:
        macaddress: 04:3f:72:9e:0b:a1
      mtu: 1500
      set-name: ens1f1
      virtual-function-count: 8
.
.
.
DEBUG:Found VFs of 0000:08:00.1: ['0000:08:02.3', '0000:08:02.4', '0000:08:02.5', '0000:08:02.6', '0000:08:02.7', '0000:08:03.0', '0000:08:03.1', '0000:08:03.2']
Error: mlx5_core: Can't change mode, E-Switch is busy.
kernel answers: Device or resource busy
Traceback (most recent call last):
  File "/usr/sbin/netplan", line 23, in <module>
    netplan.main()
  File "/usr/share/netplan/netplan/cli/core.py", line 50, in main
    self.run_command()
  File "/usr/share/netplan/netplan/cli/utils.py", line 247, in run_command
    self.func()
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 61, in run
    self.run_command()
  File "/usr/share/netplan/netplan/cli/utils.py", line 247, in run_command
    self.func()
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 245, in command_apply
    NetplanApply.process_sriov_config(config_manager, exit_on_error)
  File "/usr/share/netplan/netplan/cli/commands/apply.py", line 376, in process_sriov_config
    apply_sriov_config(config_manager)
  File "/usr/share/netplan/netplan/cli/sriov.py", line 492, in apply_sriov_config
    pcidev.devlink_set('eswitch', 'mode', eswitch_mode)
  File "/usr/share/netplan/netplan/cli/sriov.py", line 143, in devlink_set
    subprocess.check_call(
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/sbin/devlink', 'dev', 'eswitch', 'set', 'pci/0000:08:00.1', 'mode', 'switchdev']' returned non-zero exit status 1.
root@node3:/home/ubuntu#
# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
08:02.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.1 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
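Since the traceback shows `devlink dev eswitch set` failing, it may help to record which mode the E-Switch is actually in after the manual apply. The following is only a sketch, assuming the `devlink dev eswitch show` subcommand from iproute2 and the `/sbin/devlink` path seen in the traceback; the helper names and the injectable `run` parameter are illustrative, and the output is returned unparsed because its exact format is not shown in this report.

```python
import subprocess


def eswitch_show_cmd(pci_addr, devlink="/sbin/devlink"):
    """Build the read-only counterpart of the 'eswitch set' call in the traceback."""
    return [devlink, "dev", "eswitch", "show", f"pci/{pci_addr}"]


def query_eswitch(pci_addr, run=subprocess.run):
    """Return devlink's raw description of the E-Switch, or None on failure.

    `run` is injectable so the function can be exercised without real hardware.
    """
    try:
        result = run(eswitch_show_cmd(pci_addr), capture_output=True, text=True)
    except OSError:
        # devlink binary missing or not executable
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # 0000:08:00.1 is the PF address from the debug output in this report.
    print(query_eswitch("0000:08:00.1"))
```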
** Affects: plan (Ubuntu)
     Importance: Undecided
         Status: New
** Package changed: openvswitch (Ubuntu) => plan (Ubuntu)
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1977851
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/plan/+bug/1977851/+subscriptions