NACK/Cmnt: [SRU][Lunar][PATCH 0/1] UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.

Jose Ogando Justo jose.ogando at canonical.com
Tue May 2 15:03:46 UTC 2023


Hello Tarun,

An exception can be made for 6.2, Lunar Kernel, but we need to have
explicit confirmation that you guys have a plan for upstreaming code that
supports this.

This code needs to be fully upstreamed before the kernel feature freeze for
23.10. Otherwise, we will not be able to carry this patch across
the series.

Does this make sense?

If you agree with this, we need you to resubmit the patch with an explicit
statement about your upstreaming plans.

Thanks!

On Tue, May 2, 2023 at 8:54 AM Tarun Gupta (SW-GPU) <targupta at nvidia.com>
wrote:

>
>
> On 4/26/2023 7:01 PM, Stefan Bader wrote:
> > On 25.04.23 21:45, Tarun Gupta wrote:
> >> BugLink : https://bugs.launchpad.net/bugs/1988806
> >>
> >> SRU Justification:
> >>
> >> [Impact]
> >>
> >> The commit below, present in the upstream kernel since 5.16, removed the
> >> mdev_set_iommu_device() kABI.
> >>
> >>   fda49d97f2c4 ("vfio: remove the unused mdev iommu hook")
> >>
> >> This results in SRIOV based Nvidia vGPU being broken with kernels that
> >> have the above upstream commit present.
> >> So, with the Ubuntu 22.04 HWE kernel update (i.e., the 6.2.x Lunar kernel),
> >> SRIOV based Nvidia vGPU is broken.
> >>
> >> Earlier, for the 5.19.x HWE kernel in the Kinetic release, a similar patch
> >> was accepted. Refer to
> >> https://lists.ubuntu.com/archives/kernel-team/2022-September/133142.html
> >> However, this patch was not carried forward from Kinetic to Lunar because
> >> of an upstream merge conflict and had to be reverted.
> >>
> >> [Fix]
> >>
> >> On the 6.2.x HWE kernel, we revert the above commit, which removed
> >> support for the mdev_set_iommu_device() kABI, so that vGPU works again.
> >>
> >> [Testcase]
> >>
> >> Run SRIOV based (Ampere+) Nvidia vGPU on 6.2.x (Lunar) kernel.
> >>
> >> Tarun Gupta (1):
> >>    UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
> >>
> >>   drivers/vfio/mdev/mdev_driver.c  |   1 +
> >>   drivers/vfio/mdev/mdev_private.h |   1 -
> >>   drivers/vfio/vfio_iommu_type1.c  | 126 ++++++++++++++++++++++++++++---
> >>   include/linux/mdev.h             |  22 ++++++
> >>   4 files changed, 140 insertions(+), 10 deletions(-)
> >>
> >
> > Rejected for the following reasons:
> > - 23.04/Lunar has been released now and stable release update criteria
> >    normally require changes to be upstream
> > - For 22.10/Kinetic this seems to have been added to allow development
> >    before the release. The goal always should be to work on upstream
> >    solutions so hacks can be dropped when moving to the next release.
> > - Obviously this has not happened since 5.19, so before we accept this
> >    back into 6.2 I would like to see a plan moving forward as part of the
> >    SRU justification. So we avoid the same thing happening again on the
> >    next release which will become another HWE kernel in 22.04/Jammy.
>
>
> Hi Stefan,
>
> Support for the mdev_set_iommu_device() kABI was removed from the upstream
> kernel in 5.16 because no in-tree driver was making use of the
> kABI.
>
> I understand that, without upstream support, the MDEV framework cannot be
> used for Nvidia vGPU in the long term by relying on custom patches.
> As a result, we plan to use the vendor-specific vfio-pci (or vfio-pci-core)
> framework for Nvidia vGPU. (Refer to
>
> https://lore.kernel.org/linux-pci/20210826103912.128972-1-yishaih@nvidia.com/
> ).
>
> However, support for the vfio-pci-core framework is not present in libvirt.
> Libvirt currently only supports assigning VFIO devices which are bound
> to the vfio-pci.ko module. It doesn't support assigning VFIO devices which
> are bound to vendor drivers, which is the case with the vfio-pci-core framework.
>
> There have been discussions on the libvirt mailing list about supporting
> this, but the support was not merged upstream. On the libvirt mailing list, it
> was concluded that support will be added once IOMMUFD is upstreamed in the
> kernel, which will add a vfio specific cdev in sysfs that libvirt will refer to.
> (Refer to https://www.spinics.net/linux/fedora/libvir/msg233372.html )
>
> So, this arrangement of using the MDEV framework with a custom patch is
> temporary, and in the near future we should be able to switch to the
> vfio-pci-core framework when libvirt support is added.
>
> Currently, Nvidia vGPU does work with the vfio-pci-core framework, but due to
> the lack of libvirt support, it will not work out of the box on Ubuntu, as
> users will not be able to assign a VF to a VM using virsh/libvirt.
>
> So, to support existing Nvidia vGPU customers we request this custom
> patch in HWE kernels.
>
> Thanks,
> Tarun
>
> >
> > -Stefan
>