ACK: [SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops
Andrea Righi
andrea.righi at canonical.com
Wed Oct 30 09:19:54 UTC 2019
On Wed, Oct 30, 2019 at 04:50:04PM +1300, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1842037
>
> [Impact]
>
> There is a use after free which normally causes a null pointer dereference when
> NFS clients send invalid credentials via GSSD to a NFS server which has shares
> protected by kerberos krb5* security.
>
> The call trace is below:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> Call Trace:
> rpc_check_timeout+0x22/0xe0 [sunrpc]
> call_decode+0x12c/0x190 [sunrpc]
> __rpc_execute+0x7a/0x340 [sunrpc]
> rpc_execute+0xa3/0xb0 [sunrpc]
> rpc_run_task+0xf7/0x140 [sunrpc]
> nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4]
> nfs41_setup_state_renewal+0x3d/0x90 [nfsv4]
> ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4]
> ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4]
> nfs41_finish_session_reset+0x26/0x30 [nfsv4]
> nfs41_init_clientid+0x44/0x70 [nfsv4]
> nfs4_establish_lease+0x61/0xa0 [nfsv4]
> ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4]
> nfs4_state_manager+0x1b1/0x750 [nfsv4]
> ? kernel_sigaction+0x43/0xe0
> nfs4_run_state_manager+0x24/0x40 [nfsv4]
> kthread+0x120/0x140
> ? nfs4_state_manager+0x750/0x750 [nfsv4]
> ? __kthread_parkme+0x70/0x70
> ret_from_fork+0x35/0x40
>
> This then causes outages on a heavily trafficked NFS server and no one can
> access their shares until the server is rebooted.
>
> [Fix]
>
> The fix comes in the following two commits:
>
> commit cea57789e4081870ac3498fbefabbbd0d0fd8434
> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> Date: Sat Mar 9 16:06:47 2019 -0500
> Subject: SUNRPC: Clean up
>
> commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3
> Author: Trond Myklebust <trond.myklebust at hammerspace.com>
> Date: Wed May 29 12:49:52 2019 -0400
> Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
> credential
>
> There is a small subtlety to be addressed here.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes"
> cea57789e4081870ac3498fbefabbbd0d0fd8434, and
> cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel.
> The code path which triggers the use after free can still be reached without
> cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both
> commits resolves the problem, while still maintaining as little backporting as
> necessary.
>
> Please note, both commits have been backported.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final
> comment, as well as some minor context adjustments in the final hunk.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting.
> The upstream patch uses a switch statement based on error codes, while the disco
> kernel uses a goto and label type architecture. The commits in the middle were
> numerous and completely unrelated, so I backported the commit to use goto and
> labels. Please review this backport a little more closely than you normally do,
> but it has been tested, and I believe the code to be sound.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable,
> in 5.1.9, and was omitted from disco stable updates process, probably because of
> the patch not cleanly applying and requiring the backport and infrastructure
> provided in cea57789e4081870ac3498fbefabbbd0d0fd8434.
>
> [Testcase]
>
> This is difficult to reproduce on test systems, and has instead been verified on
> a production NFS v4.1 system in a customer environment. This server is heavily
> trafficked and has a large number of different NFS clients connected to it.
>
> I have built a test kernel that contains the above patch, and also patches for
> Bug 1828978. It is available here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
>
> Note that the above kernel is for bionic HWE, and not explicitly disco.
>
> Discussion about the patch validation can be found at the bottom of Bug 1842037.
>
> On unpatched kernels, expect to see the symptoms mentioned in Impact, and on
> patched systems, everything working as intended.
>
> [Regression Potential]
>
> The changes are limited to users of sunrpc and the change itself is limited to
> cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios,
> the code should only be triggered when misbehaving clients do not keep their
> authentication tickets up to date.
>
> In case of regression, misbehaving clients may cause outages on services which
> use sunrpc. In which case, the server administrator would have to revert to an
> earlier kernel in order to restore those services, as you cannot stop
> misbehaving clients from connecting.
>
> Since the fix was selected for upstream stable, it has been vetted and widely
> accepted by the community. The backport I performed should have no functional
> difference at all.
>
> Trond Myklebust (2):
> SUNRPC: Clean up
> SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
> credential
>
> net/sunrpc/clnt.c | 76 +++++++++++++++++++----------------------------
> 1 file changed, 30 insertions(+), 46 deletions(-)
Makes sense to me and backports look coorect.
Acked-by: Andrea Righi <andrea.righi at canonical.com>
More information about the kernel-team
mailing list