[Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails
OpenStack Infra
1892361 at bugs.launchpad.net
Tue May 25 10:24:37 UTC 2021
Reviewed: https://review.opendev.org/c/openstack/nova/+/761824
Committed: https://opendev.org/openstack/nova/commit/1fb4cc03e315f5b4dbebc521f0d1299273c7c396
Submitter: "Zuul (22348)"
Branch: stable/rocky
commit 1fb4cc03e315f5b4dbebc521f0d1299273c7c396
Author: Hemanth Nakkina <hemanth.nakkina at canonical.com>
Date: Tue Sep 1 09:36:51 2020 +0530
Update pci stat pools based on PCI device changes
At startup of the nova-compute service, the PCI stat pools are
populated from the pci_devices table in the Nova database. The
pools are updated only when a device is added or removed, not
when an existing device changes, for example in its device type.
If an existing device is configured as SRIOV and nova-compute
is restarted, the pci_devices table gets updated but the device
is still listed under the old pool in pci_tracker.stats.pool
(in-memory object).
This patch looks for device type updates in existing devices
and updates the pools accordingly.
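
The behaviour described above can be illustrated with a minimal sketch.
This is not the Nova code: the Device and Pools classes, the update_device
method and the pooling by dev_type alone are hypothetical simplifications,
and the type-PCI to type-PF transition is an assumed example of a device
type change.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    address: str
    dev_type: str  # e.g. "type-PCI", "type-PF", "type-VF"


@dataclass
class Pools:
    # Maps dev_type -> {address: Device}; a stand-in for the in-memory pools.
    by_type: dict = field(default_factory=dict)

    def add_device(self, dev):
        self.by_type.setdefault(dev.dev_type, {})[dev.address] = dev

    def remove_device(self, address):
        # Drop the address from whichever pool holds it; delete empty pools.
        for dev_type in list(self.by_type):
            pool = self.by_type[dev_type]
            pool.pop(address, None)
            if not pool:
                del self.by_type[dev_type]

    def update_device(self, dev):
        """Move a known device to the pool matching its current dev_type."""
        current = self.by_type.get(dev.dev_type, {})
        if dev.address in current:
            current[dev.address] = dev  # type unchanged, refresh in place
            return
        self.remove_device(dev.address)  # leave the stale pool, if any
        self.add_device(dev)


pools = Pools()
pools.add_device(Device("0000:d8:00.1", "type-PCI"))
# Assumed scenario: SR-IOV is enabled later and the same address is
# now reported with a different device type.
pools.update_device(Device("0000:d8:00.1", "type-PF"))
pool_types = sorted(pools.by_type)
```

Without the update step, the device would stay in the pool it was first
placed in, which matches the stale in-memory state the commit message
describes.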
Conflicts:
nova/tests/functional/libvirt/test_pci_sriov_servers.py
nova/tests/unit/virt/libvirt/fakelibvirt.py
The functional test needs to skip the capabilities of the PCI
device. This can be done by taking the capability template out of
pci_dev_template [1], which was introduced by commit
b927748c257e705903c2aa0ffa47b19914e31ede. That commit could not
be backported cleanly, so the functional test case was
removed.
[1] https://opendev.org/openstack/nova/src/commit/b0a451d4289dae2086b730fde6b0c7b30f3da2e8/nova/tests/unit/virt/libvirt/fakelibvirt.py#L186
Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
Closes-Bug: #1892361
(cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)
(cherry picked from commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b)
(cherry picked from commit f58399cf496566e39d11f82a61e0b47900f2eafa)
(cherry picked from commit 8378785f995dd4bec2a5a20f7bf5946b3075120d)
(cherry picked from commit 73e631862a81e85fdf9305f3d15b201d780c8743)
** Changed in: cloud-archive/rocky
Status: New => Fix Committed
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1892361
Title:
SRIOV instance gets type-PF interface, libvirt kvm fails
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive queens series:
New
Status in Ubuntu Cloud Archive rocky series:
Fix Committed
Status in Ubuntu Cloud Archive stein series:
Fix Committed
Status in Ubuntu Cloud Archive train series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in Ubuntu Cloud Archive victoria series:
Fix Released
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
In Progress
Status in OpenStack Compute (nova) rocky series:
In Progress
Status in OpenStack Compute (nova) stein series:
Fix Committed
Status in OpenStack Compute (nova) train series:
Fix Released
Status in OpenStack Compute (nova) ussuri series:
Fix Released
Status in OpenStack Compute (nova) victoria series:
Fix Released
Status in nova package in Ubuntu:
Fix Released
Status in nova source package in Bionic:
New
Status in nova source package in Focal:
Fix Released
Status in nova source package in Groovy:
Fix Released
Status in nova source package in Hirsute:
Fix Released
Bug description:
When spawning an SR-IOV enabled instance on a newly deployed host,
nova attempts to spawn it with a type-PF PCI device. This fails with
the stack trace below.
After restarting neutron-sriov-agent and nova-compute services on the
compute node and spawning an SR-IOV instance again, a type-VF pci
device is selected, and instance spawning succeeds.
Stack trace:
2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] yield resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] block_device_info=block_device_info)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure=True)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] post_xml_callback=post_xml_callback)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] guest.launch(pause=pause)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self._encoded_xml, errors='ignore')
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] return self._domain.createWithFlags(flags)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = execute(f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(c, e, tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = meth(*args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance
To reproduce, bring up an instance with an SR-IOV port on a freshly
deployed compute:
+ openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
+ openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164
Observe that a PF is selected for the SR-IOV NIC.
From nova-compute.log:
<interface type='hostdev' managed='yes'>
<mac address='98:03:9b:61:22:e9'/>
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
</source>
<vlan>
<tag id='48'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...
2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b]
vif_type=hw_veb ...
vif={"profile":
{"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
"ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [],
"address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257",
"physical_network": "physnet2", "mtu": 9000},
"id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {},
"details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null}
virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572
Device is a PF:
# lspci | grep d8:00.1
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
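
To cross-check the host address in the domain XML against the pci_slot
value in the VIF profile and the lspci output, the address attributes can
be converted to the usual domain:bus:slot.function form. The following is
a small sketch using only the Python standard library; the function name
source_pci_address is made up for this example, and the XML is a trimmed
copy of the interface element logged above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <interface> element from nova-compute.log.
INTERFACE_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='98:03:9b:61:22:e9'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
  </source>
</interface>
"""


def source_pci_address(interface_xml):
    """Return the host PCI address of a hostdev interface as DDDD:BB:SS.F."""
    root = ET.fromstring(interface_xml)
    addr = root.find("./source/address")
    if addr is None or addr.get("type") != "pci":
        raise ValueError("no PCI source address in interface XML")
    # The attributes are hex strings such as '0xd8'.
    domain, bus, slot, function = (
        int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
    )
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


pci_slot = source_pci_address(INTERFACE_XML)
```

For the XML above this yields the same address as the pci_slot entry in
the VIF profile, 0000:d8:00.1, which lspci reports as the physical
function.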
Also, the nova pci_devices table has its dev_type correctly listed:
mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| host | created_at | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:d8:00.1 | type-PF | available | pci_0000_d8_00_1 |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
Restarting services:
# systemctl status neutron-sriov-agent.service
# systemctl restart neutron-sriov-agent.service
When an instance is spawned again, it is allocated a type-VF port (and
spawning succeeds):
<interface type='hostdev' managed='yes'>
<mac address='fa:16:3e:34:d2:99'/>
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
</source>
<vlan>
<tag id='4'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
# lspci | grep d8:05.1
d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
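
Besides reading the lspci description, the function type can also be
checked through sysfs: on Linux a virtual function has a physfn link in
its device directory, and an SR-IOV capable physical function exposes
sriov_totalvfs. The sketch below relies on that; the function name
classify_pci_function is hypothetical, and the mapping to the type-VF,
type-PF and type-PCI names is an assumption made for illustration, not
the logic nova itself uses.

```python
import os


def classify_pci_function(address, sysfs_root="/sys/bus/pci/devices"):
    """Roughly classify a PCI function by looking at its sysfs directory."""
    dev_dir = os.path.join(sysfs_root, address)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(dev_dir)
    # A virtual function links back to its parent physical function.
    if os.path.exists(os.path.join(dev_dir, "physfn")):
        return "type-VF"
    # An SR-IOV capable physical function advertises how many VFs it supports.
    if os.path.exists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "type-PF"
    return "type-PCI"
```

On the compute host this would be called as, for example,
classify_pci_function("0000:d8:05.1"); the sysfs_root parameter exists
only so that the function can be tried against a test directory.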
After spawning an instance, the PF gets marked as "unavailable" in the
nova db:
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| host | created_at | updated_at | instance_uuid | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL | 95 | 0000:d8:00.1 | type-PF | unavailable | pci_0000_d8_00_1 |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
Software versions:
# dpkg -l | grep nova-common
ii nova-common 2:17.0.12-0ubuntu1 all OpenStack Compute - common files
# dpkg -l | grep libvirt0
ii libvirt0:amd64 4.0.0-1ubuntu8.17 amd64 library for interfacing with different virtualization systems
# lsb_release -r
Release: 18.04
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Impact]
Spawning an SR-IOV instance fails on a newly deployed compute.
Nova attempts to spawn the instance with a PCI device of type type-PCI instead of type-VF.
This happened in an OpenStack Queens deployment.
[Test case]
1. Reproduce the issue by following the steps in comment #3
https://bugs.launchpad.net/nova/+bug/1892361/comments/3
2. Install the package with the fixed code
3. Confirm the bug has been fixed
Repeat the steps mentioned in comment #3 and check that the instance with the SR-IOV port is created successfully.
[Where problems could occur]
Upstream CI ran all the functional test cases that trigger this scenario.
Installing the new package will restart the nova-compute service.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892361/+subscriptions