[PATCH 0/1][Mantic] Restore IB Peer Memory support for 6.5

dann frazier dann.frazier at canonical.com
Mon Feb 26 20:06:01 UTC 2024


BugLink: https://bugs.launchpad.net/bugs/2055082

This patch was dropped during 6.5 development because it no longer applied,
which regressed GPUDirect over InfiniBand support for NVIDIA GPUs.

6.5+ upstream does have an alternative interface based on dma-buf, but
it only works for users with newer generations of GPUs, and only those
running newer driver/CUDA stacks and the open GPU driver variant. We're
working on a post to let users know what their options will be once
we intentionally drop this support in 6.8. Carrying this patch in 6.5
"one last time" will give those users some time to assess those options.

Note: this is the same patch we are carrying in the -nvidia optimized
kernel tree.

Tested on a DGX A100 with nvidia_peermem from nvidia-dkms-535-server-open.

Jason Gunthorpe (1):
  UBUNTU: SAUCE: RDMA/core: Introduce peer memory interface

 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/ib_peer_mem.h |  65 ++++
 drivers/infiniband/core/peer_mem.c    | 526 ++++++++++++++++++++++++++
 drivers/infiniband/core/umem.c        |  47 ++-
 drivers/infiniband/hw/mlx5/cq.c       |  12 +-
 drivers/infiniband/hw/mlx5/devx.c     |   3 +-
 drivers/infiniband/hw/mlx5/doorbell.c |   5 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   2 +-
 drivers/infiniband/hw/mlx5/mr.c       |  61 ++-
 drivers/infiniband/hw/mlx5/qp.c       |  12 +-
 drivers/infiniband/hw/mlx5/srq.c      |   2 +-
 include/rdma/ib_umem.h                |  29 ++
 include/rdma/peer_mem.h               | 176 +++++++++
 13 files changed, 912 insertions(+), 30 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_peer_mem.h
 create mode 100644 drivers/infiniband/core/peer_mem.c
 create mode 100644 include/rdma/peer_mem.h

-- 
2.43.0



