[Bug 1972028] Re: [SRU] _get_pci_passthrough_devices prone to race condition

Andreas Hasenack 1972028 at bugs.launchpad.net
Thu Jan 30 18:44:43 UTC 2025


Hello Mohammed, or anyone else affected,

Accepted nova into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/nova/3:25.2.1-0ubuntu2.8 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will help us get this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug
mentioning the version of the package you tested and what testing was
performed on it, and change the tag from verification-needed-jammy to
verification-done-jammy. If it does not fix the bug for you, please add
a comment stating that, and change the tag to verification-failed-jammy.
In either case, without details of your testing we will not be able to
proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: nova (Ubuntu Jammy)
       Status: New => Fix Committed

** Tags added: verification-needed verification-needed-jammy

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1972028

Title:
  [SRU] _get_pci_passthrough_devices prone to race condition

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Jammy:
  Fix Committed
Status in nova source package in Noble:
  Fix Released

Bug description:
  [Impact]

  Nova suffers from a race condition during live migration of VMs with
  SRIOV ports: a pre-check of available ports and their capabilities can
  fail if one or more ports become unavailable while the check is
  running. The fix backported here ignores libvirt errors raised while
  checking device capabilities, so any device that throws an error is
  skipped.

  [Test Plan]

  Since the bug is a race condition, it can be hard to reproduce, but a
  succession of live migrations between SRIOV-capable nodes with a
  reasonably large number of VFs should be a reasonable test.

  * deploy OpenStack Yoga with SRIOV-capable hardware
  * create 10 VMs with e.g. 5 SRIOV ports
  * live migrate the VMs between the hosts and check for the Traceback in /var/log/nova/nova-compute.log
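
The log check in the last step could be scripted along these lines. This is only a sketch: the helper name `find_race_errors` is hypothetical, and the pattern is taken from the error message in the traceback quoted in this bug, so it may need adjusting for other libvirt versions.

```python
import re

# Error text as it appears in the traceback quoted in this bug report.
PATTERN = re.compile(r"libvirt\.libvirtError: Node device not found")


def find_race_errors(lines):
    """Return the log lines that contain the 'Node device not found' error."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


# Small in-memory sample so the sketch runs without a real log file.
sample = [
    "2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager "
    "libvirt.libvirtError: Node device not found: no node device with "
    "matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'\n",
    "2022-05-06 20:16:17.000 4053032 INFO nova.compute.manager "
    "Periodic task finished\n",
]

hits = find_race_errors(sample)
print(f"{len(hits)} matching line(s)")

# On a compute node, the same helper could be pointed at the real log:
#   with open("/var/log/nova/nova-compute.log") as log:
#       hits = find_race_errors(log)
```

An empty result after a series of live migrations is consistent with the fix working, though for a race condition it does not prove the bug is gone.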

  [Regression Potential]
  This patch is not anticipated to introduce any regressions.
  -------------------------------------------------

  At the moment, the `_get_pci_passthrough_devices` function is prone to
  race conditions.

  This specific code calls `listCaps()`; however, it is possible that
  the device has disappeared by the time the method is called:

  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

  Which would result in the following traceback:

  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager [req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources for node <snip>.: libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most recent call last):
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 9946, in _update_available_resource_for_node
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 879, in update_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 8937, in get_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7663, in _get_pci_passthrough_devices
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     vdpa_devs = [
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7664, in <listcomp>
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     dev for dev in devices.values() if "vdpa" in dev.listCaps()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/libvirt.py", line 6276, in listCaps
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     raise libvirtError('virNodeDeviceListCaps() failed')
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager

  I think the cleaner way is to loop over all the items and skip a
  device if it raises an error that the device is not found.
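
A minimal sketch of that loop is below. It is not the actual nova patch: `FakeLibvirtError` and `FakeNodeDevice` are hypothetical stand-ins for `libvirt.libvirtError` and a libvirt node device so the example runs without libvirt installed, and `devices_with_cap` is an illustrative helper name.

```python
class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError (assumption, for illustration)."""


class FakeNodeDevice:
    """Hypothetical stand-in for a libvirt node device object."""

    def __init__(self, name, caps, gone=False):
        self._name = name
        self._caps = caps
        self._gone = gone  # simulates a device removed after enumeration

    def name(self):
        return self._name

    def listCaps(self):
        if self._gone:
            raise FakeLibvirtError(
                "Node device not found: no node device with matching "
                f"name '{self._name}'"
            )
        return list(self._caps)


def devices_with_cap(devices, cap):
    """Return devices advertising `cap`, skipping any whose listCaps() fails."""
    matching = []
    for dev in devices.values():
        try:
            caps = dev.listCaps()
        except FakeLibvirtError:
            # The device vanished between enumeration and the capability
            # query; skip it instead of failing the whole resource update.
            continue
        if cap in caps:
            matching.append(dev)
    return matching


devices = {
    "vdpa_ok": FakeNodeDevice("vdpa_ok", ["vdpa"]),
    "net_gone": FakeNodeDevice("net_gone", ["net"], gone=True),
    "pci_other": FakeNodeDevice("pci_other", ["pci"]),
}

vdpa_devs = devices_with_cap(devices, "vdpa")
print([dev.name() for dev in vdpa_devs])
```

The sketch catches the stand-in exception broadly. Real code may prefer to skip only the "device not found" case and let other libvirt errors propagate.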

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1972028/+subscriptions