NACK: [jammy] SRU: NFS: fix deadlock with pNFS flexfiles IO retry error path
Magali Lemes
magali.lemes at canonical.com
Thu Nov 28 21:30:46 UTC 2024
Hi,
This patch is missing a BugLink. Please file one here
https://bugs.launchpad.net/ubuntu/+source/linux/+filebug and use its
link in the cover letter and commit patches.
Also check
https://canonical-kernel-docs.readthedocs-hosted.com/en/latest/reference/patch_acceptance_criteria/
for some reference and examples.
Finally, could you please submit a v2 taking the comments inline into
consideration?
On 22/11/2024 16:16, Mike Snitzer wrote:
> SRU Justification:
>
> Impact: In pre-production at a mutual "hyperscaler" customer that is
> using the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
> flexfiles: NFS client deadlock occurred due to upstream commit
> 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
> rescheduled writes"). Which was later fixed with upstream commit
> b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
> into the writeback code") in August 2022. But it unfortunately wasn't
> marked for stable@ at that time. That has since been rectified and
> Greg Kroah-Hartman has now picked it up for the next
> stable/linux-5.15.y kernel (but it hasn't yet appeared in the stable
> repo yet), please see:
> https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
>
> Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
> nfs_async_write_reschedule_io must not recurse into the writeback
> code"), that commit was developed by and came from Trond Myklebust the
> upstream Linux NFS client maintainer.
>
> Testcase: Cause buffered IO issued by NFS client using pNFS flexfiles
> to hit error paths (due to heavy enterprise use, with container limits
> being imposed, which makes OOM within container particularly prone to
> hit error memory allocation errors _and_ additional reason for NFS IO
> to be retransmitted, e.g. due to volume down/up bounces). This can
> lead to deadlock in NFS due to recursion with page locks already held,
> e.g.:
> [<0>] wait_on_page_bit_common+0x10c/0x3d0
> [<0>] wait_on_page_bit+0x3f/0x50
> [<0>] wait_on_page_writeback+0x26/0x80
> [<0>] write_cache_pages+0x138/0x460
> [<0>] nfs_writepages+0x10d/0x200 [nfs]
> [<0>] do_writepages+0xd4/0x200
> [<0>] filemap_fdatawrite_wbc+0x89/0xe0
> [<0>] filemap_fdatawrite_range+0x54/0x70
> [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
> [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
> [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
> [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
> [<0>] rpc_async_release+0x30/0x50 [sunrpc]
> [<0>] process_one_work+0x228/0x3d0
> [<0>] worker_thread+0x53/0x420
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x1f/0x30~
>
Could you make a separate cover letter patch with the content above?
> [ Upstream commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231 ]
>
Since we are picking this commit from mainline, we use a `cherry picked
from` line above your Signed-off-by line instead.
> commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231
> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> Date: Mon Aug 1 14:16:51 2022 -0400
>
> NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
>
> It is not safe to call filemap_fdatawrite_range() from
> nfs_async_write_reschedule_io(), since we're often calling from a page
> reclaim context. Just let fsync() redrive the writeback for us.
>
> Signed-off-by: Trond Myklebust <trond.myklebust at hammerspace.com>
>
> Fixes: 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on rescheduled writes")
> Link: https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
> Link: https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/2089410
> Signed-off-by: Mike Snitzer <snitzer at hammerspace.com>
> Signed-off-by: Mike Snitzer <snitzer at kernel.org>
> ---
> fs/nfs/write.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 2e1c45344d1fd..00e11c6602fe4 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1437,8 +1437,6 @@ static void nfs_async_write_error(struct list_head *head, int error)
> static void nfs_async_write_reschedule_io(struct nfs_pgio_header *hdr)
> {
> nfs_async_write_error(&hdr->pages, 0);
> - filemap_fdatawrite_range(hdr->inode->i_mapping, hdr->args.offset,
> - hdr->args.offset + hdr->args.count - 1);
> }
>
> static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops = {
Thanks,
Magali
More information about the kernel-team
mailing list