[SRU][J:linux-bluefield][PATCH 0/1] genetlink: fix single op policy dump when do is present
William Tu
witu at nvidia.com
Mon Apr 1 18:40:02 UTC 2024
intro
-----
Our internal test triggers the kernel crash shown below:
[ 888.690348] Sun Mar 24 23:51:59 2024: DriVerTest - Start Test
[ 888.691834] ----------------------------------------------------------------------------------------------------
[ 888.983912] mlx5_core 0000:08:00.1 eth3: Link up
[ 888.987644] IPv6: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
[ 889.336577] mlx5_core 0000:08:00.0 eth2: Link up
[ 894.635836] Sun Mar 24 11:52:04 PM IST 2024 - DriVerTest Debug Heartbeat
[ 940.431644] general protection fault, probably for non-canonical address 0x8002001400000000: 0000 [#1] SMP NOPTI
[ 940.432866] CPU: 7 PID: 94305 Comm: ethtool Tainted: G OE 5.15.0-1039.17.g0d63875-bluefield #1
[ 940.433970] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 940.435220] RIP: 0010:netlink_policy_dump_add_policy+0x95/0x160
fix
---
The following upstream patch needs to be cherry-picked:
commit c1b05105573b2cd5845921eb0d2caa26e2144a34
Author: Jakub Kicinski <kuba at kernel.org>
Date: Wed Nov 9 10:32:54 2022 -0800
genetlink: fix single op policy dump when do is present
Jonathan reports crashes when running net-next in Meta's fleet.
Stats collection uses ethtool -I which does a per-op policy dump
to check if stats are supported. We don't initialize the dumpit
information if doit succeeds due to evaluation short-circuiting.
The crash may look like this:
BUG: kernel NULL pointer dereference, address: 0000000000000cc0
RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
ctrl_dumppolicy_start+0x19f/0x2f0
genl_start+0xe7/0x140
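
The short-circuit problem described in the commit message can be sketched in
plain C as follows. This is only an illustration under stated assumptions:
struct op_info, lookup_op(), lookup_both_buggy() and lookup_both_fixed() are
hypothetical names and are not taken from net/netlink/genetlink.c. The only
property borrowed from the description is that a lookup returns 0 on success,
so a successful "do" lookup makes the && skip the "dump" lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-op information; not the kernel's struct. */
struct op_info {
	bool initialized;	/* set only when a lookup filled the record */
	const char *policy;	/* stand-in for the op's policy pointer */
};

/* Hypothetical lookup: returns 0 and fills *out on success, -1 otherwise. */
static int lookup_op(bool supported, const char *policy, struct op_info *out)
{
	assert(out != NULL);
	if (!supported)
		return -1;
	out->initialized = true;
	out->policy = policy;
	return 0;
}

/*
 * Buggy shape: if the "do" lookup succeeds it returns 0, the && stops
 * evaluating, and the "dump" lookup never runs. *dump keeps whatever the
 * caller left in it.
 */
static int lookup_both_buggy(bool has_do, bool has_dump,
			     struct op_info *doit, struct op_info *dump)
{
	if (lookup_op(has_do, "do-policy", doit) &&
	    lookup_op(has_dump, "dump-policy", dump))
		return -1;	/* neither op exists */
	return 0;
}

/*
 * One possible fixed shape (an assumption, not necessarily what the patch
 * does): run both lookups unconditionally, then combine the results.
 */
static int lookup_both_fixed(bool has_do, bool has_dump,
			     struct op_info *doit, struct op_info *dump)
{
	int err_do = lookup_op(has_do, "do-policy", doit);
	int err_dump = lookup_op(has_dump, "dump-policy", dump);

	return (err_do && err_dump) ? -1 : 0;
}
```

With both ops supported, lookup_both_buggy() returns 0 but leaves the dump
record untouched, while lookup_both_fixed() fills both records. A caller that
then reads the dump record's policy is using data that was never set.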
Jakub Kicinski (1):
genetlink: fix single op policy dump when do is present
net/netlink/genetlink.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
--
2.37.1 (Apple Git-137.1)