APPLIED/Cmnt: [PULL][SRU Focal] Avoid overwhelming NFSv4 server with recalled delegations

Stefan Bader stefan.bader at canonical.com
Fri Feb 11 15:29:13 UTC 2022


On 31.01.22 19:47, dann frazier wrote:
> BugLink: https://launchpad.net/bugs/1957986

I fixed up the links in all patches to use bugs.launchpad.net, the form without 
works (now?) but for some reason we use the longer variant everywhere else. So I 
made it blend in.

> 
> A user reported that they were seeing terrible NFSv4 performance with
> one of their workloads, which they were able to simulate using a
> benchmark. I could reproduce the problem using that benchmark, and
> observed read performance quickly dropping from ~700MiB/s to <
> 10MiB/s. This is reproducible with usptream v5.4, but not more recent
> upstream kernels, so I used bisection to identify the following fix:
> 
> 10717f45639f NFSv4: Limit the total number of cached delegations
> 
> = OK, but what are these other patches? =
> The identified patch was patch 5/5 of this series:
>    https://www.spinics.net/lists/linux-nfs/msg76242.html
> It has hard dependencies on patches 1-4, so all 5 are included here.
> 
> Further, that set of 5 patches depends on the following 20-part series:
>    https://www.spinics.net/lists/linux-nfs/msg75367.html
> The first 2 of which already arrived in focal via stable, so it'd be
> 18 new patches there. I did investigate whether or not we need all 18
> of them. I found that we could technically omit 8 of them, and the
> rest of the patches would still apply and still prevent the performance
> issue. This pull request does include those 8 patches anyway, based on
> a recommendation from Nivedita.
> 
> Finally, I searched for any upstream patches that are marked as
> "Fixes:" for these patches, and 2 were identified. Those patches are
> also included here.
> 
> All patches cherry-picked cleanly except for one (see annotation) that
> required a trivial context adjustment in a header file.
> 
> = Exploration of other options =
> We exhausted 2 other options with the user: using the HWE kernel,
> which already includes these fixes, and forcing NFSv3 mode, which is
> not impacted:
> 
> == HWE Kernel ==
> They have a complex stack containing other 3rd party software/drivers
> that are validated only against the LTS kernel. Using the HWE kernel
> would cause support issues w/ that 3rd party.
> 
> == NFSv3 ==
> While NFSv3 is not impacted by this issue, it does have another issue
> that causes the buffer cache to grow too large after a couple of days,
> making it a poor replacement.
> 
> = Testing =
> == Validation ==
> The customer has tested this and confirmed that it resolves the
> problem for them (not just the benchmark).
> 
> == Regression ==
> I performed regression testing using the delegation tests from the
> nfstest project:
>    https://wiki.linux-nfs.org/wiki/index.php/NFStest
> I see the same passes and failures before and after applying these
> patches. I discussed the failures with the maintainer of these tests,
> and learned that those failures are expected with a Linux NFSv4 server
> because our implementation doesn't support all possible delegations. I
> did due diligence to see if I could test against a server that does -
> specifically NetApp's implementation - but that proved unsuccessful.
> 
>    -dann
> 
> The following changes since commit 206c12bf2ef2d607a3942b8e1b7c8959b4e95ec8:
> 
>    UBUNTU: Ubuntu-5.4.0-98.111 (2022-01-28 10:54:24 +0100)
> 
> are available in the Git repository at:
> 
>    git://git.launchpad.net/~dannf/ubuntu/+source/linux/+git/linux focal-nfsv4-submit
> 
> for you to fetch changes up to 93d0c71c551cbc8956726e6f52e15cfb53a69fdc:
> 
>    NFSv4: Ensure the delegation cred is pinned when we call delegreturn (2022-01-31 10:48:54 -0700)
> 
> ----------------------------------------------------------------
> Trond Myklebust (25):
>        NFSv4: Fix delegation handling in update_open_stateid()
>        NFSv4: nfs4_callback_getattr() should ignore revoked delegations
>        NFSv4: Delegation recalls should not find revoked delegations
>        NFSv4: fail nfs4_refresh_delegation_stateid() when the delegation was revoked
>        NFS: Rename nfs_inode_return_delegation_noreclaim()
>        NFSv4: Don't remove the delegation from the super_list more than once
>        NFSv4: Hold the delegation spinlock when updating the seqid
>        NFSv4: Clear the NFS_DELEGATION_REVOKED flag in nfs_update_inplace_delegation()
>        NFSv4: Update the stateid seqid in nfs_revoke_delegation()
>        NFSv4: Revoke the delegation on success in nfs4_delegreturn_done()
>        NFSv4: Ignore requests to return the delegation if it was revoked
>        NFSv4: Don't reclaim delegations that have been returned or revoked
>        NFSv4: nfs4_return_incompatible_delegation() should check delegation validity
>        NFSv4: Fix nfs4_inode_make_writeable()
>        NFS: nfs_inode_find_state_and_recover() fix stateid matching
>        NFSv4: Fix races between open and delegreturn
>        NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
>        NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
>        NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
>        NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
>        NFSv4: Try to return the delegation immediately when marked for return on close
>        NFSv4: Add accounting for the number of active delegations held
>        NFSv4: Limit the total number of cached delegations
>        NFSv4: Ensure the delegation is pinned in nfs_do_return_delegation()
>        NFSv4: Ensure the delegation cred is pinned when we call delegreturn
> 
>   fs/nfs/callback_proc.c |   4 +-
>   fs/nfs/delegation.c    | 274 +++++++++++++++++++++++++++++++++++++------------
>   fs/nfs/delegation.h    |   5 +-
>   fs/nfs/nfs4_fs.h       |   6 ++
>   fs/nfs/nfs4proc.c      |  18 ++--
>   fs/nfs/nfs4state.c     |   8 +-
>   fs/nfs/nfs4super.c     |   4 +-
>   7 files changed, 237 insertions(+), 82 deletions(-)
> 

Applied to focal:linux/master-next. Thanks.

-Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20220211/2d09aaa7/attachment.sig>


More information about the kernel-team mailing list