<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:PMingLiU;
        panose-1:2 2 5 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Aptos;
        panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
        {font-family:"\@PMingLiU";
        panose-1:2 1 6 1 0 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:10.0pt;
        font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Aptos",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        mso-ligatures:none;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Bartlomiej,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks! I will fix the uAPI comment and restore the vfio_pci_set_power_state() call in the hot-reset path.<o:p></o:p></span></p>
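<p class="MsoNormal"><span style="font-size:11.0pt">For the hot-reset loop flagged in the review below, here is a rough sketch of the intended shape, written as a user-space mock so it compiles on its own. struct mock_vdev, MOCK_PCI_D0, prepare_for_bus_reset() and the stubbed helper bodies are stand-ins invented for illustration; only the two call names come from the patch, and the order of the two calls is an assumption.<o:p></o:p></span></p>
<pre>
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCI_D0; the real constant lives in the kernel. */
#define MOCK_PCI_D0 0

/* Hypothetical stand-in for struct vfio_pci_core_device. */
struct mock_vdev {
	int power_state;	/* last state requested, -1 if never set */
	bool revoked;		/* true once dma-buf access has been revoked */
};

/* Stub: records the requested power state instead of touching hardware. */
static void vfio_pci_set_power_state(struct mock_vdev *vdev, int state)
{
	vdev->power_state = state;
}

/* Stub: records the revoke flag instead of notifying dma-buf importers. */
static void vfio_pci_dma_buf_move(struct mock_vdev *vdev, bool revoked)
{
	vdev->revoked = revoked;
}

/*
 * Pre-reset pass over the device set: keep the existing D0 transition
 * and add the dma-buf revoke next to it (ordering is an assumption).
 */
void prepare_for_bus_reset(struct mock_vdev *devs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		vfio_pci_set_power_state(&devs[i], MOCK_PCI_D0);
		vfio_pci_dma_buf_move(&devs[i], true);
	}
}
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">In the actual patch this corresponds to keeping the vfio_pci_set_power_state(cur, PCI_D0) line inside the braces that were added around the loop body, rather than replacing it.<o:p></o:p></span></p>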
<p class="MsoNormal"><span style="font-size:11.0pt">William<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com><br>
<b>Date: </b>Monday, September 23, 2024 at 6:15 AM<br>
<b>To: </b>William Tu <witu@nvidia.com><br>
<b>Cc: </b>kernel-team@lists.ubuntu.com <kernel-team@lists.ubuntu.com>, Vladimir Sokolovsky <vlad@nvidia.com>, Bodong Wang <bodong@nvidia.com>, Sergey Gorenko <sergeygo@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>, Ziv Waksman <zwaksman@nvidia.com><br>
<b>Subject: </b>Re: [SRU][J:linux-bluefield][PATCH v5 4/6] UBUNTU: SAUCE: vfio/pci: Allow MMIO regions to be exported through dma-buf<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">
On Mon, Sep 16, 2024 at 5:32 PM William Tu <witu@nvidia.com> wrote:<br>
><br>
> From: Jason Gunthorpe <jgg@nvidia.com><br>
><br>
> BugLink: <a href="https://bugs.launchpad.net/bugs/2077887">https://bugs.launchpad.net/bugs/2077887</a><br>
><br>
> dma-buf has become a way to safely acquire a handle to non-struct page<br>
> memory that can still have lifetime controlled by the exporter. Notably<br>
> RDMA can now import dma-buf FDs and build them into MRs which allows for<br>
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory<br>
> from PCI device BARs.<br>
><br>
> The patch design loosely follows the pattern in commit<br>
> db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this<br>
> does not support pinning.<br>
><br>
> Instead, this implements what, in the past, we've called a revocable<br>
> attachment using move. In normal situations the attachment is pinned, as a<br>
> BAR does not change physical address. However when the VFIO device is<br>
> closed, or a PCI reset is issued, access to the MMIO memory is revoked.<br>
><br>
> Revoked means that move occurs, but an attempt to immediately re-map the<br>
> memory will fail. In the reset case a future move will be triggered when<br>
> MMIO access returns. As both close and reset are under userspace control<br>
> it is expected that userspace will suspend use of the dma-buf before doing<br>
> these operations, the revoke is purely for kernel self-defense against a<br>
> hostile userspace.<br>
><br>
> Co-authored-by: William Tu <witu@nvidia.com><br>
> [witu: Add new ioctl uAPI for P2P dma buf]<br>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com><br>
> Signed-off-by: William Tu <witu@nvidia.com><br>
> ---<br>
>  drivers/vfio/pci/Makefile          |   1 +<br>
>  drivers/vfio/pci/dma_buf.c         | 262 +++++++++++++++++++++++++++++<br>
>  drivers/vfio/pci/vfio_pci_config.c |  10 +-<br>
>  drivers/vfio/pci/vfio_pci_core.c   |  23 ++-<br>
>  drivers/vfio/pci/vfio_pci_priv.h   |  21 +++<br>
>  include/linux/vfio_pci_core.h      |   1 +<br>
>  include/uapi/linux/vfio.h          |  28 +++<br>
>  7 files changed, 342 insertions(+), 4 deletions(-)<br>
>  create mode 100644 drivers/vfio/pci/dma_buf.c<br>
><br>
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile<br>
> index 74ee55f9c261..14f198c9d9ea 100644<br>
> --- a/drivers/vfio/pci/Makefile<br>
> +++ b/drivers/vfio/pci/Makefile<br>
> @@ -2,6 +2,7 @@<br>
><br>
>  vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o<br>
>  vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o<br>
> +vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += dma_buf.o<br>
>  obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o<br>
><br>
>  vfio-pci-y := vfio_pci.o<br>
> diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c<br>
> new file mode 100644<br>
> index 000000000000..14d32a580190<br>
> --- /dev/null<br>
> +++ b/drivers/vfio/pci/dma_buf.c<br>
> @@ -0,0 +1,262 @@<br>
> +// SPDX-License-Identifier: GPL-2.0-only<br>
> +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.<br>
> + */<br>
> +#include <linux/dma-buf.h><br>
> +#include <linux/pci-p2pdma.h><br>
> +#include <linux/dma-resv.h><br>
> +<br>
> +#include "vfio_pci_priv.h"<br>
> +<br>
> +MODULE_IMPORT_NS(DMA_BUF);<br>
> +<br>
> +struct vfio_pci_dma_buf {<br>
> +       struct dma_buf *dmabuf;<br>
> +       struct vfio_pci_core_device *vdev;<br>
> +       struct list_head dmabufs_elm;<br>
> +       unsigned int index;<br>
> +       unsigned int orig_nents;<br>
> +       size_t offset;<br>
> +       bool revoked;<br>
> +};<br>
> +<br>
> +static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,<br>
> +                                  struct dma_buf_attachment *attachment)<br>
> +{<br>
> +       struct vfio_pci_dma_buf *priv = dmabuf->priv;<br>
> +       int rc;<br>
> +<br>
> +       rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,<br>
> +                                     true);<br>
> +       if (rc < 0)<br>
> +               attachment->peer2peer = false;<br>
> +       return 0;<br>
> +}<br>
> +<br>
> +static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)<br>
> +{<br>
> +}<br>
> +<br>
> +static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)<br>
> +{<br>
> +       /*<br>
> +        * Uses the dynamic interface but must always allow for<br>
> +        * dma_buf_move_notify() to do revoke<br>
> +        */<br>
> +       return -EINVAL;<br>
> +}<br>
> +<br>
> +static struct sg_table *<br>
> +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,<br>
> +                    enum dma_data_direction dir)<br>
> +{<br>
> +       size_t sgl_size = dma_get_max_seg_size(attachment->dev);<br>
> +       struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;<br>
> +       struct scatterlist *sgl;<br>
> +       struct sg_table *sgt;<br>
> +       dma_addr_t dma_addr;<br>
> +       unsigned int nents;<br>
> +       size_t offset;<br>
> +       int ret;<br>
> +<br>
> +       dma_resv_assert_held(priv->dmabuf->resv);<br>
> +<br>
> +       if (!attachment->peer2peer)<br>
> +               return ERR_PTR(-EPERM);<br>
> +<br>
> +       if (priv->revoked)<br>
> +               return ERR_PTR(-ENODEV);<br>
> +<br>
> +       sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);<br>
> +       if (!sgt)<br>
> +               return ERR_PTR(-ENOMEM);<br>
> +<br>
> +       nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);<br>
> +       ret = sg_alloc_table(sgt, nents, GFP_KERNEL);<br>
> +       if (ret)<br>
> +               goto err_kfree_sgt;<br>
> +<br>
> +       /*<br>
> +        * Since the memory being mapped is a device memory it could never be in<br>
> +        * CPU caches.<br>
> +        */<br>
> +       dma_addr = dma_map_resource(<br>
> +               attachment->dev,<br>
> +               pci_resource_start(priv->vdev->pdev, priv->index) +<br>
> +                       priv->offset,<br>
> +               priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);<br>
> +       ret = dma_mapping_error(attachment->dev, dma_addr);<br>
> +       if (ret)<br>
> +               goto err_free_sgt;<br>
> +<br>
> +       /*<br>
> +        * Break the BAR's physical range up into max sized SGL's according to<br>
> +        * the device's requirement.<br>
> +        */<br>
> +       sgl = sgt->sgl;<br>
> +       for (offset = 0; offset != priv->dmabuf->size;) {<br>
> +               size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);<br>
> +<br>
> +               sg_set_page(sgl, NULL, chunk_size, 0);<br>
> +               sg_dma_address(sgl) = dma_addr + offset;<br>
> +               sg_dma_len(sgl) = chunk_size;<br>
> +               sgl = sg_next(sgl);<br>
> +               offset += chunk_size;<br>
> +       }<br>
> +<br>
> +       /*<br>
> +        * Because we are not going to include a CPU list we want to have some<br>
> +        * chance that other users will detect this by setting the orig_nents to<br>
> +        * 0 and using only nents (length of DMA list) when going over the sgl<br>
> +        */<br>
> +       priv->orig_nents = sgt->orig_nents;<br>
> +       sgt->orig_nents = 0;<br>
> +       return sgt;<br>
> +<br>
> +err_free_sgt:<br>
> +       sg_free_table(sgt);<br>
> +err_kfree_sgt:<br>
> +       kfree(sgt);<br>
> +       return ERR_PTR(ret);<br>
> +}<br>
> +<br>
> +static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,<br>
> +                                  struct sg_table *sgt,<br>
> +                                  enum dma_data_direction dir)<br>
> +{<br>
> +       struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;<br>
> +<br>
> +       sgt->orig_nents = priv->orig_nents;<br>
> +       dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),<br>
> +                          priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);<br>
> +       sg_free_table(sgt);<br>
> +       kfree(sgt);<br>
> +}<br>
> +<br>
> +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)<br>
> +{<br>
> +       struct vfio_pci_dma_buf *priv = dmabuf->priv;<br>
> +<br>
> +       /*<br>
> +        * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.<br>
> +        * The refcount prevents both.<br>
> +        */<br>
> +       if (priv->vdev) {<br>
> +               down_write(&priv->vdev->memory_lock);<br>
> +               list_del_init(&priv->dmabufs_elm);<br>
> +               up_write(&priv->vdev->memory_lock);<br>
> +               vfio_device_put(&priv->vdev->vdev);<br>
> +       }<br>
> +       kfree(priv);<br>
> +}<br>
> +<br>
> +static const struct dma_buf_ops vfio_pci_dmabuf_ops = {<br>
> +       .attach = vfio_pci_dma_buf_attach,<br>
> +       .map_dma_buf = vfio_pci_dma_buf_map,<br>
> +       .pin = vfio_pci_dma_buf_pin,<br>
> +       .unpin = vfio_pci_dma_buf_unpin,<br>
> +       .release = vfio_pci_dma_buf_release,<br>
> +       .unmap_dma_buf = vfio_pci_dma_buf_unmap,<br>
> +};<br>
> +<br>
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,<br>
> +                                 struct vfio_device_p2p_dma_buf *p2p_dma_buf)<br>
> +{<br>
> +       struct vfio_device_p2p_dma_buf get_dma_buf;<br>
> +       DEFINE_DMA_BUF_EXPORT_INFO(exp_info);<br>
> +       struct vfio_pci_dma_buf *priv;<br>
> +       int ret;<br>
> +<br>
> +       memcpy(&get_dma_buf, p2p_dma_buf, sizeof(get_dma_buf));<br>
> +<br>
> +       /* For PCI the region_index is the BAR number like everything else */<br>
> +       if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)<br>
> +               return -EINVAL;<br>
> +<br>
> +       exp_info.ops = &vfio_pci_dmabuf_ops;<br>
> +       exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);<br>
> +       if (!exp_info.size)<br>
> +               return -EINVAL;<br>
> +       if (get_dma_buf.offset || get_dma_buf.length) {<br>
> +               if (get_dma_buf.length > exp_info.size ||<br>
> +                   get_dma_buf.offset >= exp_info.size ||<br>
> +                   get_dma_buf.length > exp_info.size - get_dma_buf.offset ||<br>
> +                   get_dma_buf.offset % PAGE_SIZE ||<br>
> +                   get_dma_buf.length % PAGE_SIZE)<br>
> +                       return -EINVAL;<br>
> +               exp_info.size = get_dma_buf.length;<br>
> +       }<br>
> +       exp_info.flags = get_dma_buf.open_flags;<br>
> +<br>
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);<br>
> +       if (!priv)<br>
> +               return -ENOMEM;<br>
> +       INIT_LIST_HEAD(&priv->dmabufs_elm);<br>
> +       priv->offset = get_dma_buf.offset;<br>
> +       priv->index = get_dma_buf.region_index;<br>
> +<br>
> +       exp_info.priv = priv;<br>
> +       priv->dmabuf = dma_buf_export(&exp_info);<br>
> +       if (IS_ERR(priv->dmabuf)) {<br>
> +               ret = PTR_ERR(priv->dmabuf);<br>
> +               kfree(priv);<br>
> +               return ret;<br>
> +       }<br>
> +<br>
> +       /* dma_buf_put() now frees priv */<br>
> +<br>
> +       down_write(&vdev->memory_lock);<br>
> +       dma_resv_lock(priv->dmabuf->resv, NULL);<br>
> +       priv->revoked = !__vfio_pci_memory_enabled(vdev);<br>
> +       priv->vdev = vdev;<br>
> +       vfio_device_get(&vdev->vdev);<br>
> +       list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);<br>
> +       dma_resv_unlock(priv->dmabuf->resv);<br>
> +       up_write(&vdev->memory_lock);<br>
> +<br>
> +       /*<br>
> +        * dma_buf_fd() consumes the reference, when the file closes the dmabuf<br>
> +        * will be released.<br>
> +        */<br>
> +       return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);<br>
> +}<br>
> +<br>
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)<br>
> +{<br>
> +       struct vfio_pci_dma_buf *priv;<br>
> +       struct vfio_pci_dma_buf *tmp;<br>
> +<br>
> +       lockdep_assert_held_write(&vdev->memory_lock);<br>
> +<br>
> +       list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {<br>
> +               if (!dma_buf_try_get(priv->dmabuf))<br>
> +                       continue;<br>
> +               if (priv->revoked != revoked) {<br>
> +                       dma_resv_lock(priv->dmabuf->resv, NULL);<br>
> +                       priv->revoked = revoked;<br>
> +                       dma_buf_move_notify(priv->dmabuf);<br>
> +                       dma_resv_unlock(priv->dmabuf->resv);<br>
> +               }<br>
> +               dma_buf_put(priv->dmabuf);<br>
> +       }<br>
> +}<br>
> +<br>
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)<br>
> +{<br>
> +       struct vfio_pci_dma_buf *priv;<br>
> +       struct vfio_pci_dma_buf *tmp;<br>
> +<br>
> +       down_write(&vdev->memory_lock);<br>
> +       list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {<br>
> +               if (!dma_buf_try_get(priv->dmabuf))<br>
> +                       continue;<br>
> +               dma_resv_lock(priv->dmabuf->resv, NULL);<br>
> +               list_del_init(&priv->dmabufs_elm);<br>
> +               priv->vdev = NULL;<br>
> +               priv->revoked = true;<br>
> +               dma_buf_move_notify(priv->dmabuf);<br>
> +               dma_resv_unlock(priv->dmabuf->resv);<br>
> +               vfio_device_put(&vdev->vdev);<br>
> +               dma_buf_put(priv->dmabuf);<br>
> +       }<br>
> +       up_write(&vdev->memory_lock);<br>
> +}<br>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c<br>
> index 78c626b8907d..f14a8b81fe35 100644<br>
> --- a/drivers/vfio/pci/vfio_pci_config.c<br>
> +++ b/drivers/vfio/pci/vfio_pci_config.c<br>
> @@ -28,6 +28,8 @@<br>
><br>
>  #include <linux/vfio_pci_core.h><br>
><br>
> +#include "vfio_pci_priv.h"<br>
> +<br>
>  /* Fake capability ID for standard config space */<br>
>  #define PCI_CAP_ID_BASIC       0<br>
><br>
> @@ -581,10 +583,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,<br>
>                 virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);<br>
>                 new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);<br>
><br>
> -               if (!new_mem)<br>
> +               if (!new_mem) {<br>
>                         vfio_pci_zap_and_down_write_memory_lock(vdev);<br>
> -               else<br>
> +                       vfio_pci_dma_buf_move(vdev, true);<br>
> +               } else {<br>
>                         down_write(&vdev->memory_lock);<br>
> +               }<br>
><br>
>                 /*<br>
>                  * If the user is writing mem/io enable (new_mem/io) and we<br>
> @@ -619,6 +623,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,<br>
>                 *virt_cmd &= cpu_to_le16(~mask);<br>
>                 *virt_cmd |= cpu_to_le16(new_cmd & mask);<br>
><br>
> +               if (__vfio_pci_memory_enabled(vdev))<br>
> +                       vfio_pci_dma_buf_move(vdev, false);<br>
>                 up_write(&vdev->memory_lock);<br>
>         }<br>
><br>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c<br>
> index a45cdd3cfb84..4e8fec2315f7 100644<br>
> --- a/drivers/vfio/pci/vfio_pci_core.c<br>
> +++ b/drivers/vfio/pci/vfio_pci_core.c<br>
> @@ -465,6 +465,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)<br>
>         vfio_spapr_pci_eeh_release(vdev->pdev);<br>
>         vfio_pci_core_disable(vdev);<br>
><br>
> +       vfio_pci_dma_buf_cleanup(vdev);<br>
> +<br>
>         mutex_lock(&vdev->igate);<br>
>         if (vdev->err_trigger) {<br>
>                 eventfd_ctx_put(vdev->err_trigger);<br>
> @@ -671,7 +673,10 @@ int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)<br>
>          */<br>
>         vfio_pci_set_power_state(vdev, PCI_D0);<br>
><br>
> +       vfio_pci_dma_buf_move(vdev, true);<br>
>         ret = pci_try_reset_function(vdev->pdev);<br>
> +       if (__vfio_pci_memory_enabled(vdev))<br>
> +               vfio_pci_dma_buf_move(vdev, false);<br>
>         up_write(&vdev->memory_lock);<br>
><br>
>         return ret;<br>
> @@ -1206,6 +1211,14 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,<br>
>                 default:<br>
>                         return -ENOTTY;<br>
>                 }<br>
> +       } else if (cmd == VFIO_DEVICE_P2P_DMA_BUF) {<br>
> +               struct vfio_device_p2p_dma_buf p2p_dma_buf;<br>
> +<br>
> +               if (copy_from_user(&p2p_dma_buf, (void __user *)arg,<br>
> +                                  sizeof(p2p_dma_buf)))<br>
> +                       return -EFAULT;<br>
> +<br>
> +               return vfio_pci_core_feature_dma_buf(vdev, &p2p_dma_buf);<br>
>         }<br>
><br>
>         return -ENOTTY;<br>
> @@ -1839,6 +1852,7 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,<br>
>         INIT_LIST_HEAD(&vdev->vma_list);<br>
>         INIT_LIST_HEAD(&vdev->sriov_pfs_item);<br>
>         init_rwsem(&vdev->memory_lock);<br>
> +       INIT_LIST_HEAD(&vdev->dmabufs);<br>
>  }<br>
>  EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);<br>
><br>
> @@ -2151,11 +2165,16 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,<br>
>          * cause the PCI config space reset without restoring the original<br>
>          * state (saved locally in 'vdev->pm_save').<br>
>          */<br>
> -       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)<br>
> -               vfio_pci_set_power_state(cur, PCI_D0);<br>
> +       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {<br>
> +               vfio_pci_dma_buf_move(cur, true);<br>
> +       }<br>
<br>
The above code is still incorrect: it removes the<br>
vfio_pci_set_power_state() call.<br>
Could you please fix it?<br>
<br>
>         ret = pci_reset_bus(pdev);<br>
><br>
> +       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)<br>
> +               if (__vfio_pci_memory_enabled(cur))<br>
> +                       vfio_pci_dma_buf_move(cur, false);<br>
> +<br>
>  err_undo:<br>
>         list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {<br>
>                 if (cur == cur_mem)<br>
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h<br>
> index 4971cd95b431..d60e9cb99140 100644<br>
> --- a/drivers/vfio/pci/vfio_pci_priv.h<br>
> +++ b/drivers/vfio/pci/vfio_pci_priv.h<br>
> @@ -4,4 +4,25 @@<br>
><br>
>  int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev);<br>
><br>
> +#ifdef CONFIG_DMA_SHARED_BUFFER<br>
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,<br>
> +                                 struct vfio_device_p2p_dma_buf *arg);<br>
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);<br>
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);<br>
> +#else<br>
> +static int<br>
> +vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,<br>
> +                             struct vfio_device_p2p_dma_buf *arg)<br>
> +{<br>
> +       return -ENOTTY;<br>
> +}<br>
> +static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)<br>
> +{<br>
> +}<br>
> +static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,<br>
> +                                        bool revoked)<br>
> +{<br>
> +}<br>
> +#endif<br>
> +<br>
>  #endif<br>
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h<br>
> index f22d5a382c15..1136ed4a08f9 100644<br>
> --- a/include/linux/vfio_pci_core.h<br>
> +++ b/include/linux/vfio_pci_core.h<br>
> @@ -139,6 +139,7 @@ struct vfio_pci_core_device {<br>
>         struct mutex            vma_lock;<br>
>         struct list_head        vma_list;<br>
>         struct rw_semaphore     memory_lock;<br>
> +       struct list_head        dmabufs;<br>
>  };<br>
><br>
>  #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)<br>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h<br>
> index ef33ea002b0b..18f369426c13 100644<br>
> --- a/include/uapi/linux/vfio.h<br>
> +++ b/include/uapi/linux/vfio.h<br>
> @@ -1002,6 +1002,34 @@ struct vfio_device_feature {<br>
>   */<br>
>  #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN       (0)<br>
><br>
> +/**<br>
> + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the<br>
> + * region selected.<br>
<br>
The above comment seems incorrect, since this feature now uses its own ioctl rather than VFIO_DEVICE_FEATURE_GET.<br>
<br>
--<br>
Best regards,<br>
Bartlomiej<br>
<br>
> + *<br>
> + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,<br>
> + * etc. offset/length specify a slice of the region to create the dmabuf from.<br>
> + * If both are 0 then the whole region is used.<br>
> + *<br>
> + * Return: The fd number on success, -1 and errno is set on failure.<br>
> + */<br>
> +<br>
> +/**<br>
> + * VFIO_DEVICE_P2P_DMA_BUF - _IORW(VFIO_TYPE, VFIO_BASE + 22,<br>
> + *                                struct vfio_device_p2p_dma_buf)<br>
> + *<br>
> + * Set the region index, open flags, offset and length to create a dma_buf<br>
> + * for p2p dma.<br>
> + *<br>
> + * Return 0 on success, -errno on failure.<br>
> + */<br>
> +struct vfio_device_p2p_dma_buf {<br>
> +       __u32 region_index;<br>
> +       __u32 open_flags;<br>
> +       __u32 offset;<br>
> +       __u64 length;<br>
> +};<br>
> +#define VFIO_DEVICE_P2P_DMA_BUF        _IO(VFIO_TYPE, VFIO_BASE + 22)<br>
> +<br>
>  /* -------- API for Type1 VFIO IOMMU -------- */<br>
><br>
>  /**<o:p></o:p></span></p>
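<p class="MsoNormal"><span style="font-size:11.0pt">To make the uAPI quoted above more concrete, the following is a minimal user-space sketch of how a caller might issue the new ioctl. The struct and the ioctl number are copied locally from the quoted patch so the snippet builds without the patched header; export_bar_dmabuf() is a hypothetical helper name, not part of the patch. As posted, vfio_pci_core_feature_dma_buf() returns the result of dma_buf_fd(), so the sketch treats any non-negative return value as the new dma-buf fd.<o:p></o:p></span></p>
<pre>
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_CLOEXEC under strict compiler modes */
#endif
#include <errno.h>		/* callers inspect errno on failure */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/*
 * Local copies for illustration only. A real program would include the
 * patched <linux/vfio.h>; VFIO_TYPE and VFIO_BASE mirror that header.
 */
#define VFIO_TYPE (';')
#define VFIO_BASE 100

struct vfio_device_p2p_dma_buf {
	__u32 region_index;
	__u32 open_flags;
	__u32 offset;
	__u64 length;
};
#define VFIO_DEVICE_P2P_DMA_BUF _IO(VFIO_TYPE, VFIO_BASE + 22)

/*
 * Hypothetical helper: ask an open VFIO device fd for a dma-buf covering
 * [offset, offset + length) of the given BAR. Passing offset == 0 and
 * length == 0 selects the whole BAR. Returns the ioctl() result: a
 * non-negative value is taken to be the dma-buf fd, -1 means failure
 * with errno set.
 */
int export_bar_dmabuf(int device_fd, __u32 bar, __u32 offset, __u64 length)
{
	struct vfio_device_p2p_dma_buf req;

	memset(&req, 0, sizeof(req));	/* also clears struct padding */
	req.region_index = bar;
	req.open_flags = O_RDWR | O_CLOEXEC;
	req.offset = offset;
	req.length = length;

	return ioctl(device_fd, VFIO_DEVICE_P2P_DMA_BUF, &req);
}
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">Per the checks in the quoted vfio_pci_core_feature_dma_buf(), a non-zero offset or length must be a multiple of PAGE_SIZE and must fit inside the BAR, otherwise the call fails with EINVAL. A caller would typically pass the returned fd to a dma-buf importer and close it when done.<o:p></o:p></span></p>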
</div>
</div>
</div>
</div>
</div>
</body>
</html>