[Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails

Hemanth Nakkina 1892361 at bugs.launchpad.net
Wed Jan 13 08:50:18 UTC 2021


** Description changed:

  When spawning an SR-IOV enabled instance on a newly deployed host, nova
  attempts to spawn it with a type-PF PCI device. This fails with the
  below stack trace.
  
  After restarting neutron-sriov-agent and nova-compute services on the
  compute node and spawning an SR-IOV instance again, a type-VF pci device
  is selected, and instance spawning succeeds.
  
  Stack trace:
  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     post_xml_callback=post_xml_callback)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     guest.launch(pause=pause)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self._encoded_xml, errors='ignore')
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     return self._domain.createWithFlags(flags)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = execute(f, *args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(c, e, tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = meth(*args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
- 2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] 
+ 2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
  2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance
  
+ To reproduce, bring up an instance with an SR-IOV port on a freshly
+ deployed compute:
  
- To reproduce, bring up an instance with an SR-IOV port on a freshly deployed compute:
- 
- + openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port  
- + openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164 
+ + openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
+ + openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164
  
  Observe that a PF is selected for the SR-IOV NIC.
  
  From nova-compute.log:
  
-     <interface type='hostdev' managed='yes'>
-       <mac address='98:03:9b:61:22:e9'/>
-       <source>
-         <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
-       </source>
-       <vlan>
-         <tag id='48'/>
-       </vlan>
-       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
-     </interface>
+     <interface type='hostdev' managed='yes'>
+       <mac address='98:03:9b:61:22:e9'/>
+       <source>
+         <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
+       </source>
+       <vlan>
+         <tag id='48'/>
+       </vlan>
+       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+     </interface>
  ...
- 2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] 
+ 2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b]
  vif_type=hw_veb ...
- vif={"profile": 
-   {"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"}, 
-   "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], 
-   "address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29", 
-   "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257", 
-   "physical_network": "physnet2", "mtu": 9000}, 
-   "id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {}, 
-   "details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null} 
+ vif={"profile":
+   {"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
+   "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [],
+   "address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
+   "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257",
+   "physical_network": "physnet2", "mtu": 9000},
+   "id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {},
+   "details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null}
  virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572
  
  Device is a PF:
  
  # lspci | grep d8:00.1
  d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
  
  Also, the nova pci_devices table has its dev_type correctly listed:
  
  mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where   compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
  | host             | created_at          | compute_node_id | address      | dev_type | status    | dev_id           |
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
  | foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:19:00.1 | type-PF  | available | pci_0000_19_00_1 |
  | foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:d8:00.1 | type-PF  | available | pci_0000_d8_00_1 |
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
  
  Restarting services:
  
- # systemctl status neutron-sriov-agent.service 
- # systemctl restart neutron-sriov-agent.service 
+ # systemctl status neutron-sriov-agent.service
+ # systemctl restart neutron-sriov-agent.service
  
  When an instance is spawned again, it is allocated a type-VF port (and
  spawning succeeds):
  
-     <interface type='hostdev' managed='yes'>
-       <mac address='fa:16:3e:34:d2:99'/>
-       <source>
-         <address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
-       </source>
-       <vlan>
-         <tag id='4'/>
-       </vlan>
-       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
-     </interface>
+     <interface type='hostdev' managed='yes'>
+       <mac address='fa:16:3e:34:d2:99'/>
+       <source>
+         <address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
+       </source>
+       <vlan>
+         <tag id='4'/>
+       </vlan>
+       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+     </interface>
  
  # lspci | grep d8:05.1
  d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
  
- 
- After spawning an instance, the PF get marked as "unavailable" in the nova db:
+ After spawning an instance, the PF gets marked as "unavailable" in the
+ nova db:
  
  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
  | host             | created_at          | updated_at          | instance_uuid | compute_node_id | address      | dev_type | status      | dev_id           |
  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
  | foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL          |              95 | 0000:19:00.1 | type-PF  | available   | pci_0000_19_00_1 |
  | foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL          |              95 | 0000:d8:00.1 | type-PF  | unavailable | pci_0000_d8_00_1 |
  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
- 
  
  Software versions:
  
  # dpkg -l | grep nova-common
  ii  nova-common                            2:17.0.12-0ubuntu1                              all          OpenStack Compute - common files
  # dpkg -l | grep libvirt0
  ii  libvirt0:amd64                         4.0.0-1ubuntu8.17                               amd64        library for interfacing with different virtualization systems
  # lsb_release -r
  Release:        18.04
+ 
+ 
+ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ 
+ [Impact]
+ 
+ Spawning an SR-IOV instance fails on a newly deployed compute. 
+ Nova attempts to spawn a PCI device of type type-PCI instead of type-VF.
+ 
+ This happened in an OpenStack Queens deployment.
+ 
+ [Test case]
+ 
+ 1. The issue can be reproduced by following the steps in comment #3
+    https://bugs.launchpad.net/nova/+bug/1892361/comments/3
+ 
+ 2. Install the package with fixed code
+ 
+ 3. Confirm the bug has been fixed
+    Repeat the steps mentioned in comment #3 and check that the instance with the SR-IOV port is created successfully.
+ 
+ [Where problems could occur]
+ 
+ Upstream CI ran all the functional test cases that trigger this scenario.
+ Installing the new package will restart the nova-compute service.

** Patch added: "Debdiff for focal"
   https://bugs.launchpad.net/ubuntu/focal/+source/nova/+bug/1892361/+attachment/5452615/+files/lp892361_focal.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  New
Status in Ubuntu Cloud Archive train series:
  New
Status in Ubuntu Cloud Archive ussuri series:
  New
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  New
Status in nova source package in Focal:
  New
Status in nova source package in Groovy:
  Fix Released
Status in nova source package in Hirsute:
  Fix Released

Bug description:
  When spawning an SR-IOV enabled instance on a newly deployed host,
  nova attempts to spawn it with a type-PF PCI device. This fails with
  the below stack trace.

  After restarting neutron-sriov-agent and nova-compute services on the
  compute node and spawning an SR-IOV instance again, a type-VF pci
  device is selected, and instance spawning succeeds.

  Stack trace:
  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     post_xml_callback=post_xml_callback)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     guest.launch(pause=pause)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self._encoded_xml, errors='ignore')
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     return self._domain.createWithFlags(flags)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = execute(f, *args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(c, e, tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     rv = meth(*args, **kwargs)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
  2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance

  To reproduce, bring up an instance with an SR-IOV port on a freshly
  deployed compute:

  + openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
  + openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164
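
  In the two commands above, the port ID printed by the first command is
  what the second command passes as --nic port-id=. A minimal Python
  sketch of that chaining, using only the flags shown in this report; the
  function names (port_create_argv, server_create_argv, reproduce) are
  hypothetical helpers, not part of any OpenStack library:

```python
import subprocess


def port_create_argv(network, physnet, port_name):
    """Build the 'openstack port create' command line used in this report."""
    return [
        "openstack", "port", "create", "-f", "value", "-c", "id",
        "--network", network,
        "--vnic-type=direct",
        "--binding-profile", "type=dict",
        "--binding-profile", "physical_network=" + physnet,
        port_name,
    ]


def server_create_argv(flavor, image, port_id, host, server_name):
    """Build the 'openstack server create' command line used in this report."""
    return [
        "openstack", "server", "create",
        "--flavor", flavor,
        "--image", image,
        "--nic", "port-id=" + port_id,
        "--availability-zone", "nova:" + host,
        server_name,
    ]


def reproduce(network, physnet, port_name, flavor, image, host, server_name,
              run=subprocess.check_output):
    """Create the SR-IOV port, then boot a server on it; return the port ID.

    'run' is injectable so the sequence can be exercised without a cloud.
    """
    # '-f value -c id' makes the CLI print only the new port's ID.
    port_id = run(port_create_argv(network, physnet, port_name)).decode().strip()
    run(server_create_argv(flavor, image, port_id, host, server_name))
    return port_id
```

  Calling reproduce("testinstance_net", "physnet2", "testinstance_net-port",
  ...) with the flavor, image, and host values from this report would
  issue the same two commands, assuming the openstack client is installed
  and credentials are loaded in the environment.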

  Observe that a PF is selected for the SR-IOV NIC.

  From nova-compute.log:

      <interface type='hostdev' managed='yes'>
        <mac address='98:03:9b:61:22:e9'/>
        <source>
          <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
        </source>
        <vlan>
          <tag id='48'/>
        </vlan>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      </interface>
  ...
  2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b]
  vif_type=hw_veb ...
  vif={"profile":
    {"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
    "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [],
    "address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
    "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257",
    "physical_network": "physnet2", "mtu": 9000},
    "id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {},
    "details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null}
  virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572
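
  The host device in the <source> element of the interface XML above can
  be converted to the domain:bus:slot.function form that lspci and the
  pci_devices table use. A minimal Python sketch, assuming the interface
  XML is available as a string; source_pci_address is a hypothetical
  helper, not a nova function:

```python
import xml.etree.ElementTree as ET

# Interface definition copied from the nova-compute.log excerpt in this report.
INTERFACE_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='98:03:9b:61:22:e9'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
  </source>
  <vlan>
    <tag id='48'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
"""


def source_pci_address(interface_xml):
    """Return the host PCI address of a hostdev interface as DDDD:BB:SS.F."""
    root = ET.fromstring(interface_xml)
    # The <address> under <source> is the host device; the top-level
    # <address> is where the device appears inside the guest.
    addr = root.find("./source/address")
    if addr is None:
        raise ValueError("interface has no <source><address> element")
    domain = int(addr.get("domain"), 16)
    bus = int(addr.get("bus"), 16)
    slot = int(addr.get("slot"), 16)
    function = int(addr.get("function"), 16)
    return "%04x:%02x:%02x.%x" % (domain, bus, slot, function)


if __name__ == "__main__":
    print(source_pci_address(INTERFACE_XML))  # prints 0000:d8:00.1
```

  The result matches the pci_slot value in the vif profile logged above.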

  Device is a PF:

  # lspci | grep d8:00.1
  d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

  Also, the nova pci_devices table has its dev_type correctly listed:

  mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where   compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
  | host             | created_at          | compute_node_id | address      | dev_type | status    | dev_id           |
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
  | foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:19:00.1 | type-PF  | available | pci_0000_19_00_1 |
  | foo-compute-host | 2020-08-12 17:10:19 |              95 | 0000:d8:00.1 | type-PF  | available | pci_0000_d8_00_1 |
  +------------------+---------------------+-----------------+--------------+----------+-----------+------------------+

  Restarting services:

  # systemctl status neutron-sriov-agent.service
  # systemctl restart neutron-sriov-agent.service

  When an instance is spawned again, it is allocated a type-VF port (and
  spawning succeeds):

      <interface type='hostdev' managed='yes'>
        <mac address='fa:16:3e:34:d2:99'/>
        <source>
          <address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
        </source>
        <vlan>
          <tag id='4'/>
        </vlan>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      </interface>

  # lspci | grep d8:05.1
  d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
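
  Besides reading the lspci description, the PF/VF distinction can be
  checked from Linux sysfs: a VF has a physfn link to its parent, and an
  SR-IOV capable PF exposes a sriov_totalvfs attribute. A minimal Python
  sketch under those assumptions; pci_function_kind and build_fake_sysfs
  are hypothetical names, and the fake tree only imitates the two
  addresses from this report so the example runs without SR-IOV hardware:

```python
import os
import tempfile


def pci_function_kind(address, sysfs_root="/sys/bus/pci/devices"):
    """Classify a PCI address as 'VF', 'PF', or 'other' using sysfs entries."""
    dev_dir = os.path.join(sysfs_root, address)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(dev_dir)
    # Virtual functions carry a 'physfn' symlink pointing at their PF.
    if os.path.lexists(os.path.join(dev_dir, "physfn")):
        return "VF"
    # SR-IOV capable physical functions expose 'sriov_totalvfs'.
    if os.path.exists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "PF"
    return "other"


def build_fake_sysfs(root):
    """Create a tiny stand-in sysfs tree for the two devices in this report."""
    pf = os.path.join(root, "0000:d8:00.1")
    vf = os.path.join(root, "0000:d8:05.1")
    os.makedirs(pf)
    os.makedirs(vf)
    with open(os.path.join(pf, "sriov_totalvfs"), "w") as f:
        f.write("8\n")  # illustrative value, not taken from the report
    # Assumed parent relationship, for illustration only.
    os.symlink(pf, os.path.join(vf, "physfn"))
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_fake_sysfs(tmp)
        for addr in ("0000:d8:00.1", "0000:d8:05.1"):
            print(addr, pci_function_kind(addr, sysfs_root=tmp))
```

  On a real compute node the default sysfs_root would be used, for
  example pci_function_kind("0000:d8:00.1").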

  After spawning an instance, the PF gets marked as "unavailable" in the
  nova db:

  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
  | host             | created_at          | updated_at          | instance_uuid | compute_node_id | address      | dev_type | status      | dev_id           |
  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
  | foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL          |              95 | 0000:19:00.1 | type-PF  | available   | pci_0000_19_00_1 |
  | foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL          |              95 | 0000:d8:00.1 | type-PF  | unavailable | pci_0000_d8_00_1 |
  +------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+

  Software versions:

  # dpkg -l | grep nova-common
  ii  nova-common                            2:17.0.12-0ubuntu1                              all          OpenStack Compute - common files
  # dpkg -l | grep libvirt0
  ii  libvirt0:amd64                         4.0.0-1ubuntu8.17                               amd64        library for interfacing with different virtualization systems
  # lsb_release -r
  Release:        18.04

  
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

  [Impact]

  Spawning an SR-IOV instance fails on a newly deployed compute. 
  Nova attempts to spawn a PCI device of type type-PCI instead of type-VF.

  This happened in an OpenStack Queens deployment.

  [Test case]

  1. The issue can be reproduced by following the steps in comment #3
     https://bugs.launchpad.net/nova/+bug/1892361/comments/3

  2. Install the package with fixed code

  3. Confirm the bug has been fixed
     Repeat the steps mentioned in comment #3 and check that the instance with the SR-IOV port is created successfully.

  [Where problems could occur]

  Upstream CI ran all the functional test cases that trigger this scenario.
  Installing the new package will restart the nova-compute service.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892361/+subscriptions


