ACK: [SRU][F][PATCH 0/1] Page fault in RDMA ODP triggers BUG_ON during MMU notifier registration

Tim Gardner tim.gardner at canonical.com
Wed Jan 3 16:44:41 UTC 2024


On 12/15/23 3:03 AM, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2046534
> 
> SRU Justification:
> 
> [Impact]
> When a page fault is handled in RDMA ODP, the handler registers an MMU notifier.
> This registration can race with the release of the mm: if the mm is released while the notifier is being registered, the kernel hits a BUG_ON.
> ==========
> Oct 14 23:38:32 bnode001 kernel: [1576115.901880] kernel BUG at mm/mmu_notifier.c:255!
> Oct 14 23:38:32 bnode001 kernel: [1576115.909129] RSP: 0000:ffffbd3def843c90 EFLAGS: 00010246
> Oct 14 23:38:32 bnode001 kernel: [1576115.912689] RAX: ffffa11635d20000 RBX: ffffa0f913ba5800 RCX: 0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.912691] RDX: ffffffffc0b666f0 RSI: ffffffffc0b601c7 RDI: ffffa0f913ba5850
> Oct 14 23:38:32 bnode001 kernel: [1576115.913564] RAX: 0000000000000000 RBX: ffffffffc0b5a060 RCX: 0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.913565] RDX: 0000000000000007 RSI: ffffa1152ed3c400 RDI: ffffa1102dcd4300
> Oct 14 23:38:32 bnode001 kernel: [1576115.914431] RBP: ffffbd3defcb7c88 R08: ffffa1163f4f50e0 R09: ffffa11638c072c0
> Oct 14 23:38:32 bnode001 kernel: [1576115.914432] R10: ffffa0fd99a00000 R11: 0000000000000000 R12: ffffa1152c923b80
> Oct 14 23:38:32 bnode001 kernel: [1576115.915263] RBP: ffffbd3def843cb0 R08: ffffa1163f7350e0 R09: ffffa11638c072c0
> Oct 14 23:38:32 bnode001 kernel: [1576115.915265] R10: ffffa1088d000000 R11: 0000000000000000 R12: ffffa1102dcd4300
> Oct 14 23:38:32 bnode001 kernel: [1576115.916079] R13: ffffa1152c923b80 R14: ffffa1152c923bf8 R15: ffffa114f8127800
> Oct 14 23:38:32 bnode001 kernel: [1576115.916080] FS: 0000000000000000(0000) GS:ffffa1163f4c0000(0000) knlGS:0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.917705] R13: ffffa1152ed3c400 R14: ffffa1152ed3c478 R15: ffffa1101cbfbc00
> Oct 14 23:38:32 bnode001 kernel: [1576115.917706] FS: 0000000000000000(0000) GS:ffffa1163f700000(0000) knlGS:0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.918506] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 14 23:38:32 bnode001 kernel: [1576115.918508] CR2: 00007f94146af5e0 CR3: 0000001722472004 CR4: 0000000000760ee0
> Oct 14 23:38:32 bnode001 kernel: [1576115.919301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 14 23:38:32 bnode001 kernel: [1576115.919302] CR2: 00007f32f0a2dc80 CR3: 0000001f9f1fc004 CR4: 0000000000760ee0
> Oct 14 23:38:32 bnode001 kernel: [1576115.920082] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.920084] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> Oct 14 23:38:32 bnode001 kernel: [1576115.920850] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Oct 14 23:38:32 bnode001 kernel: [1576115.921604] PKRU: 55555554
> Oct 14 23:38:32 bnode001 kernel: [1576115.921605] Call Trace:
> Oct 14 23:38:32 bnode001 kernel: [1576115.922354] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> Oct 14 23:38:32 bnode001 kernel: [1576115.922355] PKRU: 55555554
> Oct 14 23:38:32 bnode001 kernel: [1576115.923112] mmu_notifier_get_locked+0x5f/0xe0
> Oct 14 23:38:32 bnode001 kernel: [1576115.923867] Call Trace:
> Oct 14 23:38:32 bnode001 kernel: [1576115.923870] ? mmu_notifier_get_locked+0x79/0xe0
> Oct 14 23:38:32 bnode001 kernel: [1576115.924645] ib_umem_odp_alloc_child+0x15a/0x290 [ib_core]
> Oct 14 23:38:32 bnode001 kernel: [1576115.925409] ib_umem_odp_alloc_child+0x15a/0x290 [ib_core]
> Oct 14 23:38:32 bnode001 kernel: [1576115.926161] pagefault_mr+0x312/0x5d0 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.926906] pagefault_mr+0x312/0x5d0 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.927651] pagefault_single_data_segment.isra.0+0x284/0x490 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.928393] pagefault_single_data_segment.isra.0+0x284/0x490 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.929131] mlx5_ib_eqe_pf_action+0x7d5/0x990 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.929866] mlx5_ib_eqe_pf_action+0x7d5/0x990 [mlx5_ib]
> Oct 14 23:38:32 bnode001 kernel: [1576115.930610] process_one_work+0x1eb/0x3b0
> Oct 14 23:38:32 bnode001 kernel: [1576115.931351] process_one_work+0x1eb/0x3b0
> Oct 14 23:38:32 bnode001 kernel: [1576115.932084] worker_thread+0x4d/0x400
> Oct 14 23:38:32 bnode001 kernel: [1576115.932813] worker_thread+0x4d/0x400
> Oct 14 23:38:32 bnode001 kernel: [1576115.933543] kthread+0x104/0x140
> Oct 14 23:38:32 bnode001 kernel: [1576115.934272] kthread+0x104/0x140
> Oct 14 23:38:32 bnode001 kernel: [1576115.934986] ? process_one_work+0x3b0/0x3b0
> Oct 14 23:38:32 bnode001 kernel: [1576115.934988] ? kthread_park+0x90/0x90
> Oct 14 23:38:32 bnode001 kernel: [1576115.935687] ? process_one_work+0x3b0/0x3b0
> Oct 14 23:38:32 bnode001 kernel: [1576115.935689] ? kthread_park+0x90/0x90
> Oct 14 23:38:32 bnode001 kernel: [1576115.936387] ret_from_fork+0x1f/0x40
> Oct 14 23:38:32 bnode001 kernel: [1576115.936389] ---[ end trace 1823b59637af552f ]---
> Oct 14 23:38:32 bnode001 kernel: [1576115.937077] ret_from_fork+0x1f/0x40
> ==========
> 
> [Fix]
> There is an upstream patch that fixes this issue:
> ==========
> commit a4e63bce1414df7ab6eb82ca9feb8494ce13e554
> Author: Jason Gunthorpe <jgg at ziepe.ca>
> Date: Thu Feb 27 13:41:18 2020 +0200
> 
>      RDMA/odp: Ensure the mm is still alive before creating an implicit child
> ==========
> The patch calls mmget_not_zero() on the mm before the registration and mmput() afterwards, so the mm is held for the duration of the registration and the race is avoided.
> 
> [Test Plan]
> This is a race condition issue and may not be easy to reproduce.
> The test plan involves running on a system with InfiniBand, triggering the RDMA ODP page fault path to check if everything works as expected.
> 
> [Where problems could occur]
> The patch calls mmget_not_zero() before registering the MMU notifier and mmput() once registration is done.
> This should not change the result of a successful page fault; it only ensures that the mm cannot be released during registration.
> The risk associated with adopting this patch can be judged as low.
> 
> Jason Gunthorpe (1):
>    RDMA/odp: Ensure the mm is still alive before creating an implicit
>      child
> 
>   drivers/infiniband/core/umem_odp.c | 22 ++++++++++++++++++----
>   1 file changed, 18 insertions(+), 4 deletions(-)
> 
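For reference, the "take a reference only if the count is still nonzero" pattern described in the cover letter can be sketched in userspace C. This is an illustrative model and not the kernel code: struct fake_mm and all fake_* functions are hypothetical names, the counter only stands in for the mm's user count, and the assumption that the BUG_ON checks that count is mine rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the user count of an mm. */
struct fake_mm {
    atomic_int users;
};

/* Models mmget_not_zero(): take a reference only if the count is nonzero. */
static bool fake_mmget_not_zero(struct fake_mm *mm)
{
    int old = atomic_load(&mm->users);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
            return true;
    }
    return false;
}

/* Models mmput(): drop a reference taken earlier. */
static void fake_mmput(struct fake_mm *mm)
{
    int old = atomic_fetch_sub(&mm->users, 1);

    assert(old > 0);
    (void)old;
}

/* Hypothetical registration step: fails if nobody holds the mm
 * (assumed here to be the condition the kernel guards with BUG_ON). */
static int fake_register_notifier(struct fake_mm *mm)
{
    return atomic_load(&mm->users) > 0 ? 0 : -EINVAL;
}

/* Models the fixed path: hold the mm across registration, or abort. */
static int fake_alloc_child(struct fake_mm *mm)
{
    int ret;

    if (!fake_mmget_not_zero(mm))
        return -EFAULT; /* mm already gone: abort the fault */

    ret = fake_register_notifier(mm);
    fake_mmput(mm);
    return ret;
}
```

In this model, a call on an mm whose count has already dropped to zero returns -EFAULT without ever reaching the registration step, while a call on a live mm leaves the count unchanged once it returns.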
Acked-by: Tim Gardner <tim.gardner at canonical.com>
-- 
-----------
Tim Gardner
Canonical, Inc



