NACK: [SRU][jammy PATCH v3 0/1] NFS: fix deadlock with pNFS flexfiles IO retry error path

Juerg Haefliger juerg.haefliger at canonical.com
Fri Jan 17 13:31:14 UTC 2025


ML cleanup. Email not threaded. Patch will be pulled in with the next 5.15.y
stable update.

...Juerg


> Please extend me the professional courtesy of letting me know when to
> expect a jammy 5.15 kernel update that includes this fix.  It is now
> in upstream stable/linux-5.15.y 5.15.174 (more below).
> 
> Thanks,
> Mike
> 
> BugLink: https://bugs.launchpad.net/bugs/2089410
> 
> SRU Justification:
> Impact: In production at a mutual "hyperscaler" customer that is using
> the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
> flexfiles, an NFS client deadlock occurred due to upstream commit
> 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
> rescheduled writes"). That regression was later fixed by upstream commit
> b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
> into the writeback code") in August 2022, but unfortunately the fix
> wasn't marked for stable@ at that time. That has since been rectified,
> and Greg Kroah-Hartman has now included it in 5.15.174.
> 
> Fix:
> Apply upstream stable/linux-5.15.y commit 31545f4b7cdb6 ("NFS:
> nfs_async_write_reschedule_io must not recurse into the writeback
> code"), or rebase on 5.15.174.
> 
> Testcase:
> Cause buffered IO issued by an NFS client using pNFS flexfiles to hit
> error paths. (In the field this happens under heavy enterprise use with
> container memory limits imposed, which makes OOM within a container,
> and hence memory allocation errors, particularly likely, _and_ with
> additional reasons for NFS IO to be retransmitted, e.g. volume down/up
> bounces.) This can lead to a deadlock in NFS due to recursion with page
> locks already held, e.g.:
> [<0>] wait_on_page_bit_common+0x10c/0x3d0
> [<0>] wait_on_page_bit+0x3f/0x50
> [<0>] wait_on_page_writeback+0x26/0x80
> [<0>] write_cache_pages+0x138/0x460
> [<0>] nfs_writepages+0x10d/0x200 [nfs]
> [<0>] do_writepages+0xd4/0x200
> [<0>] filemap_fdatawrite_wbc+0x89/0xe0
> [<0>] filemap_fdatawrite_range+0x54/0x70
> [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
> [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
> [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
> [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
> [<0>] rpc_async_release+0x30/0x50 [sunrpc]
> [<0>] process_one_work+0x228/0x3d0
> [<0>] worker_thread+0x53/0x420
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x1f/0x30
> 
> Trond Myklebust (1):
>   NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
> 
>  fs/nfs/write.c | 2 --
>  1 file changed, 2 deletions(-)
> 
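
The recursion described in the testcase can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct model_page`, `model_flush_range()`, `model_reschedule_recursive()` and `model_reschedule_deferred()` are hypothetical names, and the model assumes that a page's writeback flag can only be cleared after the current completion callback (running in the workqueue, as in the trace above) returns. Under that assumption, a synchronous flush issued from that callback, like the filemap_fdatawrite_range() -> write_cache_pages() -> wait_on_page_writeback() chain in the trace, would wait on itself. Rather than sleeping forever, the model reports the page it would block on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: one page of a failed pNFS write. */
struct model_page {
    bool writeback;          /* PG_writeback-like flag: I/O still outstanding */
    bool cleared_by_caller;  /* ASSUMPTION: flag is only cleared after the
                                current completion callback returns */
};

/*
 * Models a data-integrity flush (compare filemap_fdatawrite_range ->
 * write_cache_pages -> wait_on_page_writeback in the trace). Instead of
 * sleeping, it returns the index of the first page it would wait on
 * forever, or -1 if every wait could eventually complete.
 */
static int model_flush_range(const struct model_page *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pages[i].writeback && pages[i].cleared_by_caller)
            return (int)i;   /* self-wait: the only waker is the waiter */
    }
    return -1;
}

/* Pre-fix shape: requeue the pages, then flush synchronously from the
 * completion path, i.e. recurse into the writeback code. */
static int model_reschedule_recursive(const struct model_page *pages, size_t n)
{
    return model_flush_range(pages, n);
}

/* Post-fix shape, as the patch subject describes it: requeue only and
 * leave sending the pages to writeback that runs outside this callback. */
static int model_reschedule_deferred(const struct model_page *pages, size_t n)
{
    (void)pages;
    (void)n;
    return -1;               /* nothing is waited on from this context */
}
```

With a page that is still under writeback and can only be released by the caller, model_reschedule_recursive() reports it as a self-wait, while model_reschedule_deferred() returns -1. The empty body of the deferred variant reflects only this model; the actual two-line change in fs/nfs/write.c is not reproduced here.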
