[Bug 1633120] Re: [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance

Edward Hope-Morley edward.hope-morley at canonical.com
Thu Aug 1 09:42:08 UTC 2019


The fix is not backportable to Mitaka because of its commit dependencies, so abandoning:

$ git-deps -e mitaka-eol 5c5a6b93a07b0b58f513396254049c17e2883894^!
c2c3b97259258eec3c98feabde3b411b519eae6e

$ git-deps -e mitaka-eol c2c3b97259258eec3c98feabde3b411b519eae6e^!
a023c32c70b5ddbae122636c26ed32e5dcba66b2
74fbff88639891269f6a0752e70b78340cf87e9a
e83842b80b73c451f78a4bb9e7bd5dfcebdefcab
1f259e2a9423a4777f79ca561d5e6a74747a5019
b01187eede3881f72addd997c8fd763ddbc137fc
49d9433c62d74f6ebdcf0832e3a03e544b1d6c83


** Changed in: cloud-archive/mitaka
       Status: Triaged => Won't Fix

** Changed in: nova (Ubuntu Xenial)
       Status: Triaged => Won't Fix

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1633120

Title:
  [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to
  a new instance

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive mitaka series:
  Won't Fix
Status in Ubuntu Cloud Archive ocata series:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Fix Committed
Status in OpenStack Compute (nova) pike series:
  Fix Committed
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Xenial:
  Won't Fix
Status in nova source package in Bionic:
  Fix Released
Status in nova source package in Cosmic:
  Fix Released
Status in nova source package in Disco:
  Fix Released
Status in nova source package in Eoan:
  Fix Released

Bug description:
  [Impact]
  This patch is required to prevent nova from accidentally marking pci_device allocations as deleted when it incorrectly reads the passthrough whitelist.

  [Test Case]
  * deploy openstack (any version that supports sriov)
  * single compute node configured for sriov with at least one device in pci_passthrough_whitelist
  * create a vm and attach sriov port
  * remove device from pci_passthrough_whitelist and restart nova-compute
  * check that pci_devices allocations have not been marked as deleted
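
The last check in the test case could be scripted roughly as follows. This is only a sketch against a toy in-memory SQLite table, not the real nova schema: it assumes that pci_devices follows a soft-delete convention in which a row with deleted != 0 counts as deleted, and the function name, addresses and UUIDs are hypothetical.

```python
import sqlite3


def soft_deleted_allocations(conn):
    """Return (address, instance_uuid) for allocated rows that were soft-deleted."""
    rows = conn.execute(
        "SELECT address, instance_uuid FROM pci_devices "
        "WHERE status = 'allocated' AND deleted != 0 "
        "ORDER BY address"
    )
    return rows.fetchall()


# Toy stand-in for the nova database (assumed, simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices ("
    " id INTEGER PRIMARY KEY,"
    " address TEXT,"
    " status TEXT,"
    " instance_uuid TEXT,"
    " deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO pci_devices (address, status, instance_uuid, deleted) "
    "VALUES (?, ?, ?, ?)",
    [
        ("0000:88:04.6", "allocated", "uuid-a", 0),  # healthy allocation
        ("0000:88:04.7", "allocated", "uuid-b", 2),  # allocation wrongly soft-deleted
        ("0000:88:05.0", "available", None, 0),      # free VF
    ],
)

# Any row returned here is an allocation that was marked as deleted.
bad = soft_deleted_allocations(conn)
print(bad)
```

Against a real deployment the same WHERE clause could be run through the mysql client; an empty result after restarting nova-compute would suggest the allocations survived.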

  [Regression Potential]
  None anticipated
  ----------------------------------------------------------------------------
  When trying to create a VM instance (say A) with one QAT VF, it fails with the following error: “Requested operation is not valid: PCI device 0000:88:04.7 is in use by driver QEMU, domain instance-00000081”. Note that PCI device 0000:88:04.7 is already assigned to another VM (say B). We have installed the OpenStack Mitaka release on a CentOS 7 system. It has two Intel QAT devices (DH895xCC), with 32 VFs available per device. Out of the 64 VFs, only 8 are allocated to VM instances, and the rest should be available.
  But the nova scheduler tries to assign an already-in-use SRIOV VF to a new instance, and the instance creation fails. It appears that the nova database is not tracking which VFs have already been taken. If I shut down VM B, then VM A boots up, and vice versa. The two VM instances cannot run simultaneously because of this issue.

  We should always be able to create as many instances with the
  requested PCI devices as there are available VFs.

  Please let me know if additional information is needed. Can anyone
  suggest why it tries to assign the same PCI device that has already
  been assigned? Is there any way to resolve this issue?
  Thank you in advance for your support and help.

  [root@localhost ~(keystone_admin)]# lspci -d:435
  83:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  88:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  [root@localhost ~(keystone_admin)]#

  [root@localhost ~(keystone_admin)]# lspci -d:443 | grep "QAT Virtual Function" | wc -l
  64
  [root@localhost ~(keystone_admin)]#

  [root@localhost ~(keystone_admin)]# mysql -u root nova -e "SELECT hypervisor_hostname, address, instance_uuid, status FROM pci_devices JOIN compute_nodes ON compute_nodes.id=compute_node_id" | grep 0000:88:04.7
  localhost  0000:88:04.7    e10a76f3-e58e-4071-a4dd-7a545e8000de    allocated
  localhost  0000:88:04.7    c3dbac90-198d-4150-ba0f-a80b912d8021    allocated
  localhost  0000:88:04.7    c7f6adad-83f0-4881-b68f-6d154d565ce3    allocated
  localhost.nfv.benunets.com 0000:88:04.7    0c3c11a5-f9a4-4f0d-b120-40e4dde843d4    allocated
  [root@localhost ~(keystone_admin)]#

  [root@localhost ~(keystone_admin)]# grep -r e10a76f3-e58e-4071-a4dd-7a545e8000de /etc/libvirt/qemu
  /etc/libvirt/qemu/instance-00000081.xml:  <uuid>e10a76f3-e58e-4071-a4dd-7a545e8000de</uuid>
  /etc/libvirt/qemu/instance-00000081.xml:      <entry name='uuid'>e10a76f3-e58e-4071-a4dd-7a545e8000de</entry>
  /etc/libvirt/qemu/instance-00000081.xml:      <source file='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/disk'/>
  /etc/libvirt/qemu/instance-00000081.xml:      <source path='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/console.log'/>
  /etc/libvirt/qemu/instance-00000081.xml:      <source path='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/console.log'/>
  [root@localhost ~(keystone_admin)]#
  [root@localhost ~(keystone_admin)]# grep -r 0c3c11a5-f9a4-4f0d-b120-40e4dde843d4 /etc/libvirt/qemu
  /etc/libvirt/qemu/instance-000000ab.xml:  <uuid>0c3c11a5-f9a4-4f0d-b120-40e4dde843d4</uuid>
  /etc/libvirt/qemu/instance-000000ab.xml:      <entry name='uuid'>0c3c11a5-f9a4-4f0d-b120-40e4dde843d4</entry>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source file='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/disk'/>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source path='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/console.log'/>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source path='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/console.log'/>
  [root@localhost ~(keystone_admin)]#

  On the controller, it appears there are duplicate PCI device entries
  in the database:

  MariaDB [nova]> select hypervisor_hostname,address,count(*) from pci_devices JOIN compute_nodes on compute_nodes.id=compute_node_id group by hypervisor_hostname,address having count(*) > 1;
  +---------------------+--------------+----------+
  | hypervisor_hostname | address      | count(*) |
  +---------------------+--------------+----------+
  | localhost              | 0000:05:00.0 |        3 |
  | localhost              | 0000:05:00.1 |        3 |
  | localhost              | 0000:83:01.0 |        3 |
  | localhost              | 0000:83:01.1 |        3 |
  | localhost              | 0000:83:01.2 |        3 |
  | localhost              | 0000:83:01.3 |        3 |
  | localhost              | 0000:83:01.4 |        3 |
  | localhost              | 0000:83:01.5 |        3 |
  | localhost              | 0000:83:01.6 |        3 |
  | localhost              | 0000:83:01.7 |        3 |
  | localhost              | 0000:83:02.0 |        3 |
  | localhost              | 0000:83:02.1 |        3 |
  | localhost              | 0000:83:02.2 |        3 |
  | localhost              | 0000:83:02.3 |        3 |
  | localhost              | 0000:83:02.4 |        3 |
  | localhost              | 0000:83:02.5 |        3 |
  | localhost              | 0000:83:02.6 |        3 |
  | localhost              | 0000:83:02.7 |        3 |
  | localhost              | 0000:83:03.0 |        3 |
  | localhost              | 0000:83:03.1 |        3 |
  | localhost              | 0000:83:03.2 |        3 |
  | localhost              | 0000:83:03.3 |        3 |
  | localhost              | 0000:83:03.4 |        3 |
  | localhost              | 0000:83:03.5 |        3 |
  | localhost              | 0000:83:03.6 |        3 |
  | localhost              | 0000:83:03.7 |        3 |
  | localhost              | 0000:83:04.0 |        3 |
  | localhost              | 0000:83:04.1 |        3 |
  | localhost              | 0000:83:04.2 |        3 |
  | localhost              | 0000:83:04.3 |        3 |
  | localhost              | 0000:83:04.4 |        3 |
  | localhost              | 0000:83:04.5 |        3 |
  | localhost              | 0000:83:04.6 |        3 |
  | localhost              | 0000:83:04.7 |        3 |
  | localhost              | 0000:88:01.0 |        3 |
  | localhost              | 0000:88:01.1 |        3 |
  | localhost              | 0000:88:01.2 |        3 |
  | localhost              | 0000:88:01.3 |        3 |
  | localhost              | 0000:88:01.4 |        3 |
  | localhost              | 0000:88:01.5 |        3 |
  | localhost              | 0000:88:01.6 |        3 |
  | localhost              | 0000:88:01.7 |        3 |
  | localhost              | 0000:88:02.0 |        3 |
  | localhost              | 0000:88:02.1 |        3 |
  | localhost              | 0000:88:02.2 |        3 |
  | localhost              | 0000:88:02.3 |        3 |
  | localhost              | 0000:88:02.4 |        3 |
  | localhost              | 0000:88:02.5 |        3 |
  | localhost              | 0000:88:02.6 |        3 |
  | localhost              | 0000:88:02.7 |        3 |
  | localhost              | 0000:88:03.0 |        3 |
  | localhost              | 0000:88:03.1 |        3 |
  | localhost              | 0000:88:03.2 |        3 |
  | localhost              | 0000:88:03.3 |        3 |
  | localhost              | 0000:88:03.4 |        3 |
  | localhost              | 0000:88:03.5 |        3 |
  | localhost              | 0000:88:03.6 |        3 |
  | localhost              | 0000:88:03.7 |        3 |
  | localhost              | 0000:88:04.0 |        3 |
  | localhost              | 0000:88:04.1 |        3 |
  | localhost              | 0000:88:04.2 |        3 |
  | localhost              | 0000:88:04.3 |        3 |
  | localhost              | 0000:88:04.4 |        3 |
  | localhost              | 0000:88:04.5 |        3 |
  | localhost              | 0000:88:04.6 |        3 |
  | localhost              | 0000:88:04.7 |        3 |
  +---------------------+--------------+----------+
  66 rows in set (0.00 sec)

  MariaDB [nova]>
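
The duplicate-count query above does not say whether the repeated rows are live or soft-deleted. A hedged sketch of how the two could be told apart is shown here, again on a toy in-memory SQLite database rather than the real nova one; it assumes a deleted column where 0 means live, and the sample rows are invented for illustration.

```python
import sqlite3

# Toy stand-in for the nova tables (assumed, simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY, hypervisor_hostname TEXT);
CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    compute_node_id INTEGER,
    address TEXT,
    status TEXT,
    deleted INTEGER DEFAULT 0
);
INSERT INTO compute_nodes VALUES (1, 'localhost');
INSERT INTO pci_devices VALUES (1, 1, '0000:88:04.7', 'allocated', 1);
INSERT INTO pci_devices VALUES (2, 1, '0000:88:04.7', 'allocated', 2);
INSERT INTO pci_devices VALUES (3, 1, '0000:88:04.7', 'allocated', 0);
INSERT INTO pci_devices VALUES (4, 1, '0000:88:04.6', 'available', 0);
""")

# Same grouping as the original query, plus a count of rows that are still live.
QUERY = """
SELECT hypervisor_hostname, address,
       COUNT(*) AS total_rows,
       SUM(CASE WHEN pci_devices.deleted = 0 THEN 1 ELSE 0 END) AS live_rows
FROM pci_devices
JOIN compute_nodes ON compute_nodes.id = compute_node_id
GROUP BY hypervisor_hostname, address
HAVING COUNT(*) > 1
ORDER BY address
"""

duplicates = conn.execute(QUERY).fetchall()
for host, address, total_rows, live_rows in duplicates:
    print(host, address, total_rows, live_rows)
```

In this toy data the address has three rows but only one live row; a live_rows value above 1 would point at genuinely conflicting entries rather than leftover soft-deleted ones.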

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1633120/+subscriptions
