NACK/Cmnt: [SRU][Lunar][PATCH 0/1] UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
jose.ogando at canonical.com
Fri May 19 16:22:56 UTC 2023
Thanks Tarun,
We will follow up.
On Thu, 2023-05-18 at 18:10 +0530, Tarun Gupta (SW-GPU) wrote:
> Hi Jose, Stefan,
>
> I've sent a v2 of this patch that includes details about switching
> Nvidia vGPU to the vfio-pci-core framework in the future, so that this
> custom patch will no longer be required.
>
> Please take a look and let me know your feedback.
>
> Patch link:
> https://lists.ubuntu.com/archives/kernel-team/2023-May/139578.html
>
> Thanks,
> Tarun
>
> On 5/5/2023 10:13 PM, Tarun Gupta (SW-GPU) wrote:
> > Hi Jose,
> >
> > Apologies for the delayed response.
> >
> > Yes, we do want to upstream libvirt support for the vfio-pci-core
> > framework so that it works out of the box on Ubuntu. There were
> > previous discussions in the community about upstreaming it, but it
> > was decided to pursue it later, once IOMMUFD support is added to the
> > kernel and exposes a new cdev.
> >
> > The kernel patches adding the new IOMMUFD cdev are in review and
> > will likely be upstreamed in the next merge window. After that, we
> > can expect libvirt support to be upstreamed.
> >
> > So, I'm requesting an exception for the 6.2 Lunar kernel; we can
> > then work on adding libvirt support by the 23.10 feature freeze.
> >
> > Please let me know your thoughts on this; I can then send out a v2
> > of this patch with the above details.
> >
> > Thanks,
> > Tarun
> >
> >
> > On 5/2/2023 8:33 PM, Jose Ogando Justo wrote:
> > >
> > > Hello Tarun,
> > >
> > > An exception can be made for the 6.2 Lunar kernel, but we need
> > > explicit confirmation that you have a plan for upstreaming code
> > > that supports this.
> > >
> > > This code needs to be fully upstreamed before the 23.10 kernel
> > > feature freeze. Otherwise, we will not be able to carry this patch
> > > across series.
> > >
> > > Does this make sense?
> > >
> > > If you agree with this, we need you to resubmit the patch with an
> > > explicit statement about your upstreaming plans.
> > >
> > > Thanks!
> > >
> > > On Tue, May 2, 2023 at 8:54 AM Tarun Gupta (SW-GPU)
> > > <targupta at nvidia.com> wrote:
> > >
> > >
> > > On 4/26/2023 7:01 PM, Stefan Bader wrote:
> > > > On 25.04.23 21:45, Tarun Gupta wrote:
> > > > > BugLink : https://bugs.launchpad.net/bugs/1988806
> > > > >
> > > > > SRU Justification:
> > > > >
> > > > > [Impact]
> > > > >
> > > > > The following commit, present in the upstream kernel since
> > > > > 5.16, removed the mdev_set_iommu_device() kABI:
> > > > >
> > > > > fda49d97f2c4 ("vfio: remove the unused mdev iommu hook")
> > > > >
> > > > > This breaks SR-IOV based Nvidia vGPU on kernels that include
> > > > > the above upstream commit. As a result, with the Ubuntu 22.04
> > > > > HWE kernel update (i.e. the 6.2.x Lunar kernel), SR-IOV based
> > > > > Nvidia vGPU is broken.
> > > > >
> > > > > Earlier, for the 5.19.x HWE kernel in the Kinetic release, a
> > > > > similar patch was accepted. Refer
> > > > > https://lists.ubuntu.com/archives/kernel-team/2022-September/133142.html
> > > > > However, this patch was not carried forward from Kinetic to
> > > > > Lunar because of an upstream merge conflict and had to be
> > > > > reverted.
> > > > >
> > > > > [Fix]
> > > > >
> > > > > On the 6.2.x HWE kernel, we revert the above commit, which
> > > > > removed support for the mdev_set_iommu_device() kABI, so that
> > > > > vGPU works.
> > > > >
> > > > > [Testcase]
> > > > >
> > > > > Run SR-IOV based (Ampere+) Nvidia vGPU on the 6.2.x (Lunar)
> > > > > kernel.
> > > > >
> > > > > Tarun Gupta (1):
> > > > > UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
> > > > >
> > > > >  drivers/vfio/mdev/mdev_driver.c  |   1 +
> > > > >  drivers/vfio/mdev/mdev_private.h |   1 -
> > > > >  drivers/vfio/vfio_iommu_type1.c  | 126 ++++++++++++++++++++++++++++---
> > > > >  include/linux/mdev.h             |  22 ++++++
> > > > >  4 files changed, 140 insertions(+), 10 deletions(-)
> > > > >
> > > >
> > > > Rejected for the following reasons:
> > > > - 23.04/Lunar has now been released, and the stable release update
> > > >   criteria normally require changes to be upstream.
> > > > - For 22.10/Kinetic this seems to have been added to allow
> > > >   development before the release. The goal should always be to
> > > >   work on upstream solutions so that hacks can be dropped when
> > > >   moving to the next release.
> > > > - Obviously this has not happened since 5.19, so before we accept
> > > >   this back into 6.2 I would like to see a plan moving forward as
> > > >   part of the SRU justification, so that we avoid the same thing
> > > >   happening again on the next release, which will become another
> > > >   HWE kernel in 22.04/Jammy.
> > >
> > >
> > > Hi Stefan,
> > >
> > > Support for the mdev_set_iommu_device() kABI was removed from the
> > > 5.16+ upstream kernel because no in-tree driver was using it.
> > >
> > > I understand that, without upstream support, the MDEV framework
> > > cannot keep being used for Nvidia vGPU long-term by relying on
> > > custom patches. As a result, we plan to use the vendor-specific
> > > vfio-pci (or vfio-pci-core) framework for Nvidia vGPU. (Refer
> > > https://lore.kernel.org/linux-pci/20210826103912.128972-1-yishaih@nvidia.com/
> > > ).
> > >
> > > However, libvirt does not support the vfio-pci-core framework.
> > > Libvirt currently only supports assigning VFIO devices that are
> > > bound to the vfio-pci.ko module. It does not support assigning VFIO
> > > devices that are bound to vendor drivers, which is the case with
> > > the vfio-pci-core framework.
> > >
> > > There have been discussions on the libvirt mailing list about
> > > supporting this, but it was not upstreamed. On the libvirt mailing
> > > list, it was concluded that support will be added once IOMMUFD is
> > > upstreamed in the kernel, which will add a VFIO-specific cdev in
> > > sysfs that libvirt will refer to. (Refer
> > > https://www.spinics.net/linux/fedora/libvir/msg233372.html )
> > >
> > > So, this arrangement of using the MDEV framework with a custom
> > > patch is temporary; in the near future we should be able to switch
> > > to the vfio-pci-core framework once libvirt support is added.
> > >
> > > Currently, Nvidia vGPU does work with the vfio-pci-core framework,
> > > but due to the lack of libvirt support it will not work out of the
> > > box on Ubuntu, as users will not be able to assign the VF to a VM
> > > using virsh/libvirt.
> > >
> > > So, to support existing Nvidia vGPU customers, we request this
> > > custom patch in HWE kernels.
> > >
> > > Thanks,
> > > Tarun
> > >
> > > >
> > > > -Stefan
> > >