NACK: [jammy] SRU: NFS: fix deadlock with pNFS flexfiles IO retry error path

Stefan Bader stefan.bader at canonical.com
Fri Nov 29 08:18:21 UTC 2024


On 28.11.24 22:45, Mike Snitzer wrote:
> On Thu, Nov 28, 2024 at 06:30:46PM -0300, Magali Lemes wrote:
>> Hi,
>>
>> This patch is missing a BugLink. Please file one here
>> https://bugs.launchpad.net/ubuntu/+source/linux/+filebug and use its link in
>> the cover letter and commit patches.
> 
> I created a bug, see 2nd "Link:" below .. but it was against the
> "kernel-package".  I just switched it to the "linux" package, please
> see: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089410
> 
>> Also check https://canonical-kernel-docs.readthedocs-hosted.com/en/latest/reference/patch_acceptance_criteria/
>> for some reference and examples.
>> Finally, could you please submit a v2 taking the comments inline into
>> consideration?
>>
>> On 22/11/2024 16:16, Mike Snitzer wrote:
>>> SRU Justification:
>>>
>>> Impact: In pre-production at a mutual "hyperscaler" customer that is
>>> using the Ubuntu jammy kernel's NFS client with Hammerspace's pNFS
>>> flexfiles: NFS client deadlock occurred due to upstream commit
>>> 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on
>>> rescheduled writes"). Which was later fixed with upstream commit
>>> b1a28f2eb9ea7 ("NFS: nfs_async_write_reschedule_io must not recurse
>>> into the writeback code") in August 2022. But it unfortunately wasn't
>>> marked for stable@ at that time. That has since been rectified and
>>> Greg Kroah-Hartman has now picked it up for the next
>>> stable/linux-5.15.y kernel (but it hasn't yet appeared in the stable
>>> repo yet), please see:
>>> https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
>>>
>>> Fix: Apply upstream commit b1a28f2eb9ea7 ("NFS:
>>> nfs_async_write_reschedule_io must not recurse into the writeback
>>> code"), that commit was developed by and came from Trond Myklebust the
>>> upstream Linux NFS client maintainer.
>>>
>>> Testcase: Cause buffered IO issued by NFS client using pNFS flexfiles
>>> to hit error paths (due to heavy enterprise use, with container limits
>>> being imposed, which makes OOM within container particularly prone to
>>> hit error memory allocation errors _and_ additional reason for NFS IO
>>> to be retransmitted, e.g. due to volume down/up bounces). This can
>>> lead to deadlock in NFS due to recursion with page locks already held,
>>> e.g.:
>>> [<0>] wait_on_page_bit_common+0x10c/0x3d0
>>> [<0>] wait_on_page_bit+0x3f/0x50
>>> [<0>] wait_on_page_writeback+0x26/0x80
>>> [<0>] write_cache_pages+0x138/0x460
>>> [<0>] nfs_writepages+0x10d/0x200 [nfs]
>>> [<0>] do_writepages+0xd4/0x200
>>> [<0>] filemap_fdatawrite_wbc+0x89/0xe0
>>> [<0>] filemap_fdatawrite_range+0x54/0x70
>>> [<0>] nfs_async_write_reschedule_io+0x69/0x80 [nfs]
>>> [<0>] ff_layout_reset_write+0x73/0xe0 [nfs_layout_flexfiles]
>>> [<0>] ff_layout_write_release+0x7a/0x90 [nfs_layout_flexfiles]
>>> [<0>] rpc_free_task+0x3d/0x70 [sunrpc]
>>> [<0>] rpc_async_release+0x30/0x50 [sunrpc]
>>> [<0>] process_one_work+0x228/0x3d0
>>> [<0>] worker_thread+0x53/0x420
>>> [<0>] kthread+0x127/0x150
>>> [<0>] ret_from_fork+0x1f/0x30~
>>>
>>
>> Could you make a separate cover letter patch with the content above?
> 
> That's fine. But it conflicts with the guidance the website provides.
> I'm fine with doing things however you'd like to see them done.

We are in-progress of improving there. Could you provide, separately a 
hint where you found the other guidance. We started to add the new one 
(and yes just yesterday) to the mailing list info 
(https://lists.ubuntu.com/kernel-team) and to the welcome letter of that 
list. Hopefully we get all the places in sync soon (which is hard if one 
barely remembers where things were sprinkled to).

Thanks!

> 
>>> [ Upstream commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231 ]
>>>
>>
>> Since we are picking this commit from mainline, we use a `cherry picked
>> from` line above your Signed-off-by line instead.
> 
> You have the ability to edit patch headers before committing them.
> But sure, I'll do my best to conform with your ideals with the v2
> submission (please expect to see it tomorrow). >
> Thanks,
> Mike
> 
>>> commit b1a28f2eb9ea7a5a1763fe53fe699aa0feae4231
>>> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
>>> Date:   Mon Aug 1 14:16:51 2022 -0400
>>>
>>>       NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
>>>
>>>       It is not safe to call filemap_fdatawrite_range() from
>>>       nfs_async_write_reschedule_io(), since we're often calling from a page
>>>       reclaim context. Just let fsync() redrive the writeback for us.
>>>
>>>       Signed-off-by: Trond Myklebust <trond.myklebust at hammerspace.com>
>>>
>>> Fixes: 7be7b3ca16a59 ("NFS: Ensure we immediately start writeback on rescheduled writes")
>>> Link: https://lore.kernel.org/stable/2024112146-tiptoeing-available-c5fe@gregkh/T/
>>> Link: https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/2089410
>>> Signed-off-by: Mike Snitzer <snitzer at hammerspace.com>
>>> Signed-off-by: Mike Snitzer <snitzer at kernel.org>
>>> ---
>>>    fs/nfs/write.c | 2 --
>>>    1 file changed, 2 deletions(-)
>>>
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index 2e1c45344d1fd..00e11c6602fe4 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -1437,8 +1437,6 @@ static void nfs_async_write_error(struct list_head *head, int error)
>>>    static void nfs_async_write_reschedule_io(struct nfs_pgio_header *hdr)
>>>    {
>>>    	nfs_async_write_error(&hdr->pages, 0);
>>> -	filemap_fdatawrite_range(hdr->inode->i_mapping, hdr->args.offset,
>>> -			hdr->args.offset + hdr->args.count - 1);
>>>    }
>>>    static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops = {
>>
>>
>> Thanks,
>> Magali
> 

-- 
- Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0xE8675DEECBEECEA3.asc
Type: application/pgp-keys
Size: 48643 bytes
Desc: OpenPGP public key
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20241129/bab92bdd/attachment-0001.key>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20241129/bab92bdd/attachment-0001.sig>


More information about the kernel-team mailing list