[RFC v2 0/2][Focal] Add IB peer memory interface

dann frazier dann.frazier at canonical.com
Tue Aug 31 22:39:04 UTC 2021


= What is this? =
NVIDIA's GPUDirect RDMA feature requires the nvidia_peermem module, which
is bundled with the nvidia driver in the >= 460-server branches.
nvidia_peermem itself depends on a non-upstream infiniband interface called
"IB peer memory". We ship a SAUCE patch for IB peer memory in hirsute
and impish, but not in the focal LTS kernel. This means that the
nvidia_peermem module we ship in l-r-m (linux-restricted-modules) will
not load in the focal LTS kernel.

This is a backport of the SAUCE patch for IB peer memory that we're
carrying in hirsute/impish to focal.

= Upstream status =
This feature is not expected to ever land upstream in this form. There
is work going on upstream (dma-buf/p2pdma, I believe) that is expected
to provide equivalent functionality in the future, but it's not clear
when all the pieces will be in place. Even when they are, I suspect
that work will be far too invasive to backport to our 5.4 kernel.

= Porting Assistance =
We have a commitment to assist with porting this patch set forward to new
kernel versions, including security/bug fixes we pull in from upstream stable.

= Why is it an RFC? =
In a conversation with Terry, he mentioned that he'd like to see some
amount of real-world functional testing completed before we'd consider
SRU'ing to other kernels. This has passed validation on the IB side of
things, but hasn't yet gone through "full stack" testing. That testing
will take some scheduling/engineering effort so, in order to minimize
the risk of respins/retests, I'm submitting this as an RFC. My goal here
is to fish out any aspects of this patch that are likely to get NAK'd,
even if testing were to pass, before we waste too many test cycles.

= Testing =
We have an automated functional smoke test that we plan to integrate into
SRU testing.

= New patch dependency =
Going back to 5.4 requires backporting an additional upstream patch,
which changes the API of an exported symbol (ib_umem_get). The only
out-of-tree modules I'm aware of that use this symbol are the Mellanox
OFED drivers, but they also bundle their own ib_core module that overrides
the ib_umem_get interface we provide, so they aren't directly impacted.
Of course, we can't rule out other users.
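
To make the shape of that API change concrete, here is a rough user-space
sketch. It assumes (based on the upstream patch title, "Allow calls to
ib_umem_get from kernel ULPs") that the call moves from being keyed on a
user verbs context to being keyed on the IB device. Every mock_* name
below is a hypothetical stand-in, not a kernel definition, and the real
parameter lists should be taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space mocks; these are NOT the kernel structures. */
struct mock_ib_device { int id; };
struct mock_ib_udata  { struct mock_ib_device *device; }; /* stands in for a user context */
struct mock_ib_umem   { struct mock_ib_device *device; unsigned long addr; size_t size; };

/* One static slot keeps the sketch free of allocation and cleanup. */
static struct mock_ib_umem mock_slot;

/* Assumed "new" shape: keyed on the device, so a kernel ULP that has
 * no user context can still call it. */
static struct mock_ib_umem *mock_umem_get_new(struct mock_ib_device *device,
                                              unsigned long addr, size_t size)
{
    if (!device || size == 0)
        return NULL;
    mock_slot.device = device;
    mock_slot.addr = addr;
    mock_slot.size = size;
    return &mock_slot;
}

/* Assumed "old" shape: keyed on udata, so it cannot be used when there
 * is no user context to pass in. */
static struct mock_ib_umem *mock_umem_get_old(struct mock_ib_udata *udata,
                                              unsigned long addr, size_t size)
{
    if (!udata)
        return NULL;
    return mock_umem_get_new(udata->device, addr, size);
}

/* Usage: a caller written against the old shape must be edited to pass
 * the device instead; the two shapes are not source compatible. */
static void mock_demo(void)
{
    struct mock_ib_device dev = { 1 };
    struct mock_ib_udata ud = { &dev };

    assert(mock_umem_get_old(&ud, 0x1000, 4096) != NULL);  /* user-context path */
    assert(mock_umem_get_old(NULL, 0x1000, 4096) == NULL); /* no context: old shape fails */
    assert(mock_umem_get_new(&dev, 0x1000, 4096) != NULL); /* ULP path works with new shape */
}
```

The point for any other out-of-tree user is that a module built against
the stock 5.4 prototype would need a source change, and a guard keyed only
on the kernel version number would pick the wrong prototype on a 5.4
kernel carrying this backport.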

RFC v2:
 - Add some paragraphs of context into this cover letter
 - Describe backport process for upstream patch
 - Tag non-upstream patch as SAUCE and clarify provenance and testing

Feras Daoud (1):
  UBUNTU: SAUCE: RDMA/core: Introduce peer memory interface

Moni Shoua (1):
  IB: Allow calls to ib_umem_get from kernel ULPs

 drivers/infiniband/core/Makefile              |   2 +-
 drivers/infiniband/core/ib_peer_mem.h         |  52 ++
 drivers/infiniband/core/peer_mem.c            | 484 ++++++++++++++++++
 drivers/infiniband/core/umem.c                |  69 ++-
 drivers/infiniband/core/umem_odp.c            |  33 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      |  12 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |   2 +-
 drivers/infiniband/hw/cxgb4/mem.c             |   2 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |   2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c       |   2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c       |   3 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       |   4 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c       |   2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |   5 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |   2 +-
 drivers/infiniband/hw/mlx4/cq.c               |   2 +-
 drivers/infiniband/hw/mlx4/doorbell.c         |   3 +-
 drivers/infiniband/hw/mlx4/mr.c               |   8 +-
 drivers/infiniband/hw/mlx4/qp.c               |   5 +-
 drivers/infiniband/hw/mlx4/srq.c              |   3 +-
 drivers/infiniband/hw/mlx5/cq.c               |  11 +-
 drivers/infiniband/hw/mlx5/devx.c             |   2 +-
 drivers/infiniband/hw/mlx5/doorbell.c         |   3 +-
 drivers/infiniband/hw/mlx5/mem.c              |  11 +-
 drivers/infiniband/hw/mlx5/mr.c               |  60 ++-
 drivers/infiniband/hw/mlx5/odp.c              |   2 +-
 drivers/infiniband/hw/mlx5/qp.c               |   4 +-
 drivers/infiniband/hw/mlx5/srq.c              |   2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |   2 +-
 drivers/infiniband/hw/qedr/verbs.c            |   9 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |   2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |   2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |   7 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |   2 +-
 drivers/infiniband/sw/rdmavt/mr.c             |   2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c            |   2 +-
 include/rdma/ib_umem.h                        |  33 +-
 include/rdma/ib_umem_odp.h                    |   9 +-
 include/rdma/peer_mem.h                       | 165 ++++++
 40 files changed, 908 insertions(+), 121 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_peer_mem.h
 create mode 100644 drivers/infiniband/core/peer_mem.c
 create mode 100644 include/rdma/peer_mem.h

-- 
2.33.0



