NACK: [jammy] SRU: NFS: fix deadlock with pNFS flexfiles IO retry error path

Magali Lemes magali.lemes at canonical.com
Thu Nov 28 21:30:46 UTC 2024


Hi,

This patch is missing a BugLink. Please file one here 
https://bugs.launchpad.net/ubuntu/+source/linux/+filebug and use its 
link in the cover letter and commit patches.
Also check 
https://canonical-kernel-docs.readthedocs-hosted.com/en/latest/reference/patch_acceptance_criteria/ 
for some reference and examples.
Finally, could you please submit a v2 taking the comments inline into 
consideration?

On 22/11/2024 16:16, Mike Snitzer wrote:
> SRU Justification:
> 
> Impact: In pre-production at a mutual "hyperscaler" customer that is
> using the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
> flexfiles: NFS client deadlock occurred due to upstream commit
> 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
> rescheduled writes"). Which was later fixed with upstream commit
> b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
> into the writeback code") in August 2022. But it unfortunately wasn't
> marked for stable@ at that time. That has since been rectified and
> Greg Kroah-Hartman has now picked it up for the next
> stable/linux-5.15.y kernel (but it hasn't yet appeared in the stable
> repo yet), please see:
> https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
> 
> Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
> nfs_async_write_reschedule_io must not recurse into the writeback
> code"), that commit was developed by and came from Trond Myklebust the
> upstream Linux NFS client maintainer.
> 
> Testcase: Cause buffered IO issued by NFS client using pNFS flexfiles
> to hit error paths (due to heavy enterprise use, with container limits
> being imposed, which makes OOM within container particularly prone to
> hit error memory allocation errors _and_ additional reason for NFS IO
> to be retransmitted, e.g. due to volume down/up bounces). This can
> lead to deadlock in NFS due to recursion with page locks already held,
> e.g.:
> [<0>] wait_on_page_bit_common+0x10c/0x3d0
> [<0>] wait_on_page_bit+0x3f/0x50
> [<0>] wait_on_page_writeback+0x26/0x80
> [<0>] write_cache_pages+0x138/0x460
> [<0>] nfs_writepages+0x10d/0x200 [nfs]
> [<0>] do_writepages+0xd4/0x200
> [<0>] filemap_fdatawrite_wbc+0x89/0xe0
> [<0>] filemap_fdatawrite_range+0x54/0x70
> [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
> [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
> [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
> [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
> [<0>] rpc_async_release+0x30/0x50 [sunrpc]
> [<0>] process_one_work+0x228/0x3d0
> [<0>] worker_thread+0x53/0x420
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x1f/0x30~
> 

Could you make a separate cover letter patch with the content above?

> [ Upstream commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231 ]
> 

Since we are picking this commit from mainline, we use a `cherry picked 
from` line above your Signed-off-by line instead.

> commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231
> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> Date:   Mon Aug 1 14:16:51 2022 -0400
> 
>      NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
> 
>      It is not safe to call filemap_fdatawrite_range() from
>      nfs_async_write_reschedule_io(), since we're often calling from a page
>      reclaim context. Just let fsync() redrive the writeback for us.
> 
>      Signed-off-by: Trond Myklebust <trond.myklebust at hammerspace.com>
> 
> Fixes: 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on rescheduled writes")
> Link: https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
> Link: https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/2089410
> Signed-off-by: Mike Snitzer <snitzer at hammerspace.com>
> Signed-off-by: Mike Snitzer <snitzer at kernel.org>
> ---
>   fs/nfs/write.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 2e1c45344d1fd..00e11c6602fe4 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1437,8 +1437,6 @@ static void nfs_async_write_error(struct list_head *head, int error)
>   static void nfs_async_write_reschedule_io(struct nfs_pgio_header *hdr)
>   {
>   	nfs_async_write_error(&hdr->pages, 0);
> -	filemap_fdatawrite_range(hdr->inode->i_mapping, hdr->args.offset,
> -			hdr->args.offset + hdr->args.count - 1);
>   }
>   
>   static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops = {


Thanks,
Magali



More information about the kernel-team mailing list