APPLIED: [SRU][Disco][PATCH 0/1] NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong between client and server
Khaled Elmously
khalid.elmously at canonical.com
Fri Nov 8 17:32:18 UTC 2019
On 2019-10-30 12:38:14, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1828978
>
> [Impact]
>
> There is a bug in NFS v4.1 that causes a large number of RPC calls between a
> client and server when a previous RPC call is interrupted. This uses a large
> amount of bandwidth and can saturate the network.
>
> The symptoms are as follows:
>
> * On NFS clients:
> Attempts to access mounted NFS shares associated with the affected server block
> indefinitely.
>
> * On the network:
> A storm of repeated RPCs between NFS client and server uses a lot of bandwidth.
> Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.
>
> * Other NFS clients connected to the same NFS server:
> Performance drops dramatically.
>
> This occurs during a "false retry", when a client attempts to make a new RPC
> call using a slot+sequence number that references an older, cached call. This
> happens when a user process interrupts an RPC call that is in progress.
>
> [Fix]
>
> This was fixed in 5.1 upstream with the below commit:
>
> commit 3453d5708b33efe76f40eca1c0ed60923094b971
> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> Date: Wed Jun 20 17:53:34 2018 -0400
> Subject: NFSv4.1: Avoid false retries when RPC calls are interrupted
>
> The fix is to pre-emptively increment the sequence number if an RPC call is
> interrupted. To address corner cases, we interpret the NFS4ERR_SEQ_MISORDERED
> error as a sign that we need to locate an appropriate sequence number between
> the value we sent and the last successfully acked SEQUENCE call.
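
The idea described above can be illustrated with a small user-space model. This
is only a sketch under stated assumptions, not the kernel code from the commit:
Slot, FakeServer, on_interrupted and send_call are hypothetical names, and the
toy server accepts only "last seen + 1" and does not model the reply cache.

```python
from dataclasses import dataclass

NFS4_OK = 0
NFS4ERR_SEQ_MISORDERED = 10063  # error code as listed in RFC 5661


@dataclass
class Slot:
    """Hypothetical client-side view of one session slot."""
    seq_nr: int = 1       # sequence number the next call will send
    last_acked: int = 0   # last sequence number the server acknowledged


class FakeServer:
    """Toy server-side slot: accepts only last_seen + 1 (assumption)."""

    def __init__(self, last_seen=0):
        self.last_seen = last_seen

    def sequence(self, seq_nr):
        if seq_nr == self.last_seen + 1:
            self.last_seen = seq_nr
            return NFS4_OK
        return NFS4ERR_SEQ_MISORDERED


def on_interrupted(slot):
    # The interrupted call may or may not have reached the server, so the
    # client bumps the number to avoid reusing it for a different call.
    slot.seq_nr += 1


def send_call(slot, server, max_tries=8):
    status = NFS4ERR_SEQ_MISORDERED
    for _ in range(max_tries):
        status = server.sequence(slot.seq_nr)
        if status == NFS4_OK:
            slot.last_acked = slot.seq_nr
            slot.seq_nr += 1
            return status
        # We may have bumped too far: step back towards last_acked + 1.
        if status == NFS4ERR_SEQ_MISORDERED and slot.seq_nr - slot.last_acked > 1:
            slot.seq_nr -= 1
            continue
        return status  # no room left to search; give up instead of looping
    return status
```

In this model, if call 6 is interrupted before reaching the server, the client
next sends 7, gets NFS4ERR_SEQ_MISORDERED, steps back to 6 and succeeds. The
search is bounded by last_acked, so it cannot turn into an endless retry loop.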
>
> Commit 3453d5708b33efe76f40eca1c0ed60923094b971 is a clean cherry-pick to disco.
>
> [Testcase]
>
> This is difficult to reproduce on test systems, and has instead been verified on
> a production NFS v4.1 system in a customer environment. This server is heavily
> trafficked and has a large number of different NFS clients connected to it.
>
> I have built a test kernel that contains the above patch, and also patches for
> Bug 1842037. It is available here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
>
> Note that the above kernel is for bionic HWE, and not explicitly disco.
>
> Discussion about the patch validation can be found at the bottom of Bug 1842037.
>
> On unpatched kernels, expect to see the symptoms mentioned in Impact. On
> patched systems, everything should work as intended.
>
> [Regression Potential]
>
> The changes are localised to NFS v4.1 only, and other versions of NFS are not
> affected. If a regression occurs, users can downgrade NFS versions to v4.0 or
> v3.x until a fix is made.
>
> The changes only take effect when connections are interrupted, and under
> typical blue-sky scenarios the new code paths would not be invoked.
>
> There have been no fixup commits or commits near the requested commit in newer
> kernels, which suggests that this commit fixes the issue and has been adopted
> by the community.
>
> Trond Myklebust (1):
>   NFSv4.1: Avoid false retries when RPC calls are interrupted
>
>  fs/nfs/nfs4proc.c    | 105 ++++++++++++++++++++-----------------------
>  fs/nfs/nfs4session.c |   5 ++-
>  fs/nfs/nfs4session.h |   5 ++-
>  3 files changed, 55 insertions(+), 60 deletions(-)
>
> --
> 2.20.1
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team