NACK/Cmnt: [SRU][Lunar][PATCH 0/1] UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
Tarun Gupta (SW-GPU)
targupta at nvidia.com
Tue May 2 06:54:26 UTC 2023
On 4/26/2023 7:01 PM, Stefan Bader wrote:
> On 25.04.23 21:45, Tarun Gupta wrote:
>> BugLink : https://bugs.launchpad.net/bugs/1988806
>>
>> SRU Justification:
>>
>> [Impact]
>>
>> Currently, with below commit present in 5.16 upstream kernel,
>> mdev_set_iommu_device() kABI is removed.
>>
>> fda49d97f2c4 ("vfio: remove the unused mdev iommu hook")
>>
>> This results in SRIOV based Nvidia vGPU being broken with kernels that
>> have the above upstream commit present.
>> So, with Ubuntu 22.04 HWE kernel update (i.e the 6.2.x Lunar kernel),
>> SRIOV based Nvidia vGPU is broken.
>>
>> Earlier, during 5.19.x HWE kernel in Kinetic release, a similar patch
>> was accepted. Refer
>> https://lists.ubuntu.com/archives/kernel-team/2022-September/133142.html
>> But, this patch didn't get carry-forward from Kinetic to Lunar because
>> of upstream merge conflict and had to be revert.
>>
>> [Fix]
>>
>> On 6.2.x HWE kernel, we revert the above patch which removed the
>> support for mdev_set_iommu_device() kABI so that vGPU works fine.
>>
>> [Testcase]
>>
>> Run SRIOV based (Ampere+) Nvidia vGPU on 6.2.x (Lunar) kernel.
>>
>> Tarun Gupta (1):
>> UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
>>
>> drivers/vfio/mdev/mdev_driver.c | 1 +
>> drivers/vfio/mdev/mdev_private.h | 1 -
>> drivers/vfio/vfio_iommu_type1.c | 126 ++++++++++++++++++++++++++++---
>> include/linux/mdev.h | 22 ++++++
>> 4 files changed, 140 insertions(+), 10 deletions(-)
>>
>
> Rejected for the following reasons:
> - 23.04/Lunar has released now and stable release update criteria normally
> requires changes to be upstream
> - For 22.10/Kinetic this seems to have been added to allow development
> before the release. The goal always should be to work on upstream
> solutions so hacks can be dropped when moving to the next release.
> - Obviously this has not happened since 5.19, so before we accept this
> back into 6.2 I would like to see a plan moving forward as part of the
> SRU justification. So we avoid the same thing happening again on the
> next release which will become another HWE kernel in 22.04/Jammy.
Hi Stefan,

Support for the mdev_set_iommu_device() kABI was removed from the
upstream kernel in 5.16 because no in-tree driver was using it.

I understand that, without upstream support, Nvidia vGPU cannot keep
relying on the MDEV framework through custom patches in the long term.
As a result, we plan to move Nvidia vGPU to the vendor-specific vfio-pci
(vfio-pci-core) framework. (Refer
https://lore.kernel.org/linux-pci/20210826103912.128972-1-yishaih@nvidia.com/
).

However, libvirt does not support the vfio-pci-core framework yet.
Libvirt currently only supports assigning VFIO devices that are bound
to the vfio-pci.ko module. It does not support assigning VFIO devices
that are bound to vendor drivers, which is the case with the
vfio-pci-core framework.

There have been discussions on the libvirt mailing list about supporting
this, but nothing has been merged upstream. The conclusion on the list
was that support will be added once IOMMUFD is upstreamed in the kernel,
which will add a vfio-specific cdev in sysfs that libvirt will refer to.
(Refer https://www.spinics.net/linux/fedora/libvir/msg233372.html )
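
To make the driver-binding distinction above concrete, here is a minimal
sketch that reports which kernel driver a PCI function is bound to by
reading the `driver` symlink under /sys/bus/pci/devices/<BDF>. The
function names, the example BDF and the `sysfs_root` parameter are
hypothetical and only for illustration; this is not how libvirt itself
performs the check, and a vendor vfio-pci-core driver would show up
here under its own module name rather than as vfio-pci.

```python
import os
import sys

PCI_DEVICES = "/sys/bus/pci/devices"  # standard sysfs location for PCI functions


def bound_driver(bdf, sysfs_root=PCI_DEVICES):
    """Return the name of the driver bound to PCI function `bdf`, or None.

    `bdf` is a full address such as "0000:41:00.4" (hypothetical example).
    The kernel exposes the binding as a `driver` symlink in the device
    directory; the symlink is absent when no driver is bound.
    """
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None
    # The link target ends in .../drivers/<driver-name>.
    return os.path.basename(os.readlink(link))


def is_bound_to_plain_vfio_pci(bdf, sysfs_root=PCI_DEVICES):
    """True only when the function is bound to the generic vfio-pci driver."""
    return bound_driver(bdf, sysfs_root) == "vfio-pci"


if __name__ == "__main__":
    # Usage: python3 check_binding.py 0000:41:00.4
    for address in sys.argv[1:]:
        driver = bound_driver(address)
        note = "generic vfio-pci" if driver == "vfio-pci" else "not generic vfio-pci"
        print(f"{address}: driver={driver} ({note})")
```

Running it against a VF address prints the bound driver name, which is a
quick way to see whether the device falls into the vfio-pci case or the
vendor-driver case described above.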

So, using the MDEV framework with a custom patch is a temporary
arrangement; in the near future we should be able to switch to the
vfio-pci-core framework, once libvirt support is added.

Currently, Nvidia vGPU does work with the vfio-pci-core framework, but
because of the missing libvirt support it will not work out of the box
on Ubuntu: users will not be able to assign a VF to a VM using
virsh/libvirt.

So, to support existing Nvidia vGPU customers, we request this custom
patch in HWE kernels.

Thanks,
Tarun
>
> -Stefan