APPLIED: [SRU][J][I][F][PATCH 0/1] Null Pointer issue in nfs code running Ubuntu on IBM Z (LP: 1968096)
Kleber Souza
kleber.sacilotto.de.souza at canonical.com
Fri May 27 08:36:50 UTC 2022
On 17.05.22 07:45, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/1968096
>
> SRU Justification:
>
> [Impact]
>
> * The kernel crashed under load with a null pointer issue in nfs code:
> [556585.270959] Krnl Code:#0000000000000000: 0000 illegal
> >0000000000000002: 0000 illegal
> 0000000000000004: 0000 illegal
> 0000000000000006: 0000 illegal
> 0000000000000008: 0000 illegal
> 000000000000000a: 0000 illegal
> 000000000000000c: 0000 illegal
> 000000000000000e: 0000 illegal
> [556585.270967] Call Trace:
> [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
> [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
> [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
> [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
> [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0
> [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0
> [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
> [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8
>
> * Several dumps were generated and shared with Canonical.
>
> * Analysis (done by kernel and SEG) points to refcount leaks
> that are already fixed in the following commit:
>
> [Fix]
>
> * ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"
>
> [Test Case]
>
> * There is unfortunately no reproducer or trigger available for this issue.
>
> * It just happens now and then under higher load.
>
> * Patched kernels (focal 5.4 and bionic 5.4-hwe) were created and
> ran for more than a week in a special staging environment (at IBM)
> without further crashes.
>
> * Hence the test and verification will be done by the IBM Z team.
>
> [Where problems could occur]
>
> * Inode handling could break if the changes
> to the pointers are erroneous.
>
> * Problems with the authentication and/or the credentials could occur
> due to the modifications in put_rpccred, rpc_cred and rpc_auth.
>
> * The expiration of the cached credentials could be harmed as well,
> due to the changes in nfs_ctx_key_to_expire.
>
> * The different pointer arithmetic may cause further issues - wrong
> or null pointer references.
>
> * On the positive side, the original commit was brought upstream by NFS experts.
>
> * A patched test kernel sustained day long runs under load in a staging
> and test environment.
>
> * The author of the upstream commit/patch is well known in the NFS area.
>
> [Other]
>
> * The Salesforce Case Number 00334334 is associated with this bug.
>
> * Commit ca05cbae2a04 was accepted upstream in 5.16-rc1.
>
> * However, commit ca05cbae2a04 was unfortunately not tagged for stable,
> hence it was not picked up automatically.
>
> * Since kinetic's (22.10) target kernel is 5.18,
> it will have the patch included,
> hence no dedicated PATCH request for kinetic.
>
> Trond Myklebust (1):
> NFS: Fix up nfs_ctx_key_to_expire()
>
> fs/nfs/inode.c | 4 ++--
> fs/nfs/write.c | 41 ++++++++++++++++++++++++++++-------------
> include/linux/nfs_fs.h | 2 +-
> 3 files changed, 31 insertions(+), 16 deletions(-)
>
Applied to focal/impish/jammy:linux.
Thanks,
Kleber