NACK: [jammy] SRU: NFS: fix deadlock with pNFS flexfiles IO retry error path

Mike Snitzer snitzer at kernel.org
Thu Nov 28 21:45:40 UTC 2024


On Thu, Nov 28, 2024 at 06:30:46PM -0300, Magali Lemes wrote:
> Hi,
> 
> This patch is missing a BugLink. Please file one here
> https://bugs.launchpad.net/ubuntu/+source/linux/+filebug and use its link in
> the cover letter and commit patches.

I created a bug, see 2nd "Link:" below .. but it was against the
"kernel-package".  I just switched it to the "linux" package, please
see: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089410 

> Also check https://canonical-kernel-docs.readthedocs-hosted.com/en/latest/reference/patch_acceptance_criteria/
> for some reference and examples.
> Finally, could you please submit a v2 taking the comments inline into
> consideration?
> 
> On 22/11/2024 16:16, Mike Snitzer wrote:
> > SRU Justification:
> > 
> > Impact: In pre-production at a mutual "hyperscaler" customer that is
> > using the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
> > flexfiles: NFS client deadlock occurred due to upstream commit
> > 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
> > rescheduled writes"). Which was later fixed with upstream commit
> > b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
> > into the writeback code") in August 2022. But it unfortunately wasn't
> > marked for stable@ at that time. That has since been rectified and
> > Greg Kroah-Hartman has now picked it up for the next
> > stable/linux-5.15.y kernel (but it hasn't yet appeared in the stable
> > repo yet), please see:
> > https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
> > 
> > Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
> > nfs_async_write_reschedule_io must not recurse into the writeback
> > code"), that commit was developed by and came from Trond Myklebust the
> > upstream Linux NFS client maintainer.
> > 
> > Testcase: Cause buffered IO issued by NFS client using pNFS flexfiles
> > to hit error paths (due to heavy enterprise use, with container limits
> > being imposed, which makes OOM within container particularly prone to
> > hit error memory allocation errors _and_ additional reason for NFS IO
> > to be retransmitted, e.g. due to volume down/up bounces). This can
> > lead to deadlock in NFS due to recursion with page locks already held,
> > e.g.:
> > [<0>] wait_on_page_bit_common+0x10c/0x3d0
> > [<0>] wait_on_page_bit+0x3f/0x50
> > [<0>] wait_on_page_writeback+0x26/0x80
> > [<0>] write_cache_pages+0x138/0x460
> > [<0>] nfs_writepages+0x10d/0x200 [nfs]
> > [<0>] do_writepages+0xd4/0x200
> > [<0>] filemap_fdatawrite_wbc+0x89/0xe0
> > [<0>] filemap_fdatawrite_range+0x54/0x70
> > [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
> > [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
> > [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
> > [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
> > [<0>] rpc_async_release+0x30/0x50 [sunrpc]
> > [<0>] process_one_work+0x228/0x3d0
> > [<0>] worker_thread+0x53/0x420
> > [<0>] kthread+0x127/0x150
> > [<0>] ret_from_fork+0x1f/0x30~
> > 
> 
> Could you make a separate cover letter patch with the content above?

That's fine. But it conflicts with the guidance the website provides.
I'm fine with doing things however you'd like to see them done.

> > [ Upstream commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231 ]
> > 
> 
> Since we are picking this commit from mainline, we use a `cherry picked
> from` line above your Signed-off-by line instead.

You have the ability to edit patch headers before committing them.
But sure, I'll do my best to conform with your ideals with the v2
submission (please expect to see it tomorrow).

Thanks,
Mike

> > commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231
> > Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> > Date:   Mon Aug 1 14:16:51 2022 -0400
> > 
> >      NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
> > 
> >      It is not safe to call filemap_fdatawrite_range() from
> >      nfs_async_write_reschedule_io(), since we're often calling from a page
> >      reclaim context. Just let fsync() redrive the writeback for us.
> > 
> >      Signed-off-by: Trond Myklebust <trond.myklebust at hammerspace.com>
> > 
> > Fixes: 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on rescheduled writes")
> > Link: https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
> > Link: https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/2089410
> > Signed-off-by: Mike Snitzer <snitzer at hammerspace.com>
> > Signed-off-by: Mike Snitzer <snitzer at kernel.org>
> > ---
> >   fs/nfs/write.c | 2 --
> >   1 file changed, 2 deletions(-)
> > 
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 2e1c45344d1fd..00e11c6602fe4 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1437,8 +1437,6 @@ static void nfs_async_write_error(struct list_head *head, int error)
> >   static void nfs_async_write_reschedule_io(struct nfs_pgio_header *hdr)
> >   {
> >   	nfs_async_write_error(&hdr->pages, 0);
> > -	filemap_fdatawrite_range(hdr->inode->i_mapping, hdr->args.offset,
> > -			hdr->args.offset + hdr->args.count - 1);
> >   }
> >   static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops = {
> 
> 
> Thanks,
> Magali



More information about the kernel-team mailing list