[SRU][J:linux-bluefield][PATCH 0/1] Devlink backport: fix race and lock issue
William Tu
witu at nvidia.com
Thu Oct 19 15:09:55 UTC 2023
BugLink: https://bugs.launchpad.net/bugs/2032378
This patch is a follow-up to the previous devlink backport series.
We found that devlink reload hangs the system when testing against
OFED 2307:
[ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
[ 1089.760560] Tainted: G OE 5.15.0-1027-bluefield #29-Ubuntu
[ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.790829] task:devlink state:D stack: 0 pid: 8753 ppid: 5090 flags:0x00000004
[ 1089.790838] Call trace:
[ 1089.790840] __switch_to+0xf8/0x150
[ 1089.790857] __schedule+0x2b8/0x790
[ 1089.790865] schedule+0x64/0x140
[ 1089.790870] schedule_preempt_disabled+0x18/0x24
[ 1089.790874] __mutex_lock.constprop.0+0x1a0/0x680
[ 1089.790878] __mutex_lock_slowpath+0x40/0x90
[ 1089.790883] mutex_lock+0x64/0x70
[ 1089.790887] devl_lock+0x1c/0x30
[ 1089.790893] mlx5_detach_device+0x58/0x190 [mlx5_core]
[ 1089.791055] mlx5_unload_one+0x40/0xe4 [mlx5_core]
[ 1089.791177] mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
[ 1089.791318] devlink_reload+0x214/0x290
Checking the OFED source code, we found that the missing devl trap group
functions also need to be backported to avoid the deadlock:
void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
{
	...
#ifdef HAVE_DEVL_PORT_REGISTER
#ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
	devl_assert_locked(priv_to_devlink(dev));
#else
	devl_lock(devlink);
#endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
#endif /* HAVE_DEVL_PORT_REGISTER */
	mutex_lock(&mlx5_intf_mutex);
#ifdef HAVE_DEVL_PORT_REGISTER
	...
}
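
To illustrate why the #else branch hangs, here is a minimal userspace sketch
of the locking pattern, assuming (as the trace suggests) that the reload path
already holds the devlink instance lock when it calls into the driver. All
toy_* names are hypothetical stand-ins, not kernel or OFED code, and the toy
lock returns -EDEADLK where a real mutex would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct devlink; not the kernel structure. */
struct toy_devlink {
	bool locked;      /* models the devlink instance mutex */
	int trap_groups;  /* models the number of registered trap groups */
};

/* Toy lock: a real mutex would block here; we report -EDEADLK instead. */
static int toy_devl_lock(struct toy_devlink *dl)
{
	if (dl->locked)
		return -EDEADLK;
	dl->locked = true;
	return 0;
}

static void toy_devl_unlock(struct toy_devlink *dl)
{
	assert(dl->locked);
	dl->locked = false;
}

/* Unlocked variant: the caller must already hold the lock. */
static void toy_devl_trap_groups_register(struct toy_devlink *dl, int count)
{
	assert(dl->locked); /* plays the role of devl_assert_locked() */
	dl->trap_groups += count;
}

/* Locking variant: takes the lock, then calls the unlocked variant. */
static int toy_devlink_trap_groups_register(struct toy_devlink *dl, int count)
{
	int err = toy_devl_lock(dl);

	if (err)
		return err;
	toy_devl_trap_groups_register(dl, count);
	toy_devl_unlock(dl);
	return 0;
}

/* Models reload: the core holds the lock across the driver callback. */
static int toy_reload(struct toy_devlink *dl, bool driver_uses_unlocked)
{
	int err = toy_devl_lock(dl);

	if (err)
		return err;
	if (driver_uses_unlocked)
		toy_devl_trap_groups_register(dl, 1);
	else
		err = toy_devlink_trap_groups_register(dl, 1); /* relock */
	toy_devl_unlock(dl);
	return err;
}
```

In this sketch, toy_reload(&dl, false) returns -EDEADLK because the driver
side tries to take a lock its caller already holds, while toy_reload(&dl, true)
succeeds. Splitting each helper into a locking wrapper and an unlocked variant
lets callers that already hold the lock skip the second acquisition.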
I'm reusing the same BugLink because this is a related issue.
Jiri Pirko (1):
net: devlink: add unlocked variants of devling_trap*() functions
include/net/devlink.h | 20 +++++
net/core/devlink.c | 180 ++++++++++++++++++++++++++++++++++--------
2 files changed, 168 insertions(+), 32 deletions(-)
--
2.37.1 (Apple Git-137.1)