[SRU][J:linux-bluefield][PATCH 0/7] Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init
William Tu
witu at nvidia.com
Wed Nov 1 14:49:44 UTC 2023
Summary:
The machine hangs when loading the OFED 2310 mlx5 driver on BlueField.
How to reproduce:
# load the OFED driver
Reason:
The BF gets stuck, and the hung-task call trace shows "mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]" blocked in mutex_lock() under devlink_resource_register().
dmesg from minicom:
[ 726.569928] INFO: task systemd-udevd:297 blocked for more than 604 seconds.
[ 726.576895] Tainted: G OE 5.15.0-1029-bluefield #31-Ubuntu
[ 726.584101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.591913] task:systemd-udevd state:D stack: 0 pid: 297 ppid: 280 flags:0x0000000d
[ 726.600248] Call trace:
[ 726.602680] __switch_to+0xf8/0x150
[ 726.606159] __schedule+0x2b8/0x790
[ 726.609634] schedule+0x64/0x140
[ 726.612850] schedule_preempt_disabled+0x18/0x24
[ 726.617453] __mutex_lock.constprop.0+0x1a0/0x680
[ 726.622141] __mutex_lock_slowpath+0x40/0x90
[ 726.626396] mutex_lock+0x64/0x70
[ 726.629695] devlink_resource_register+0x50/0x1a0
[ 726.634386] mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]
[ 726.639882] mlx5_init_one_devl_locked+0x1c8/0x784 [mlx5_core]
[ 726.645791] probe_one+0x300/0x5f0 [mlx5_core]
[ 726.650307] local_pci_probe+0x48/0xb4
[ 726.654043] pci_device_probe+0x18c/0x200
[ 726.658039] really_probe+0xd0/0x490
[ 726.661600] __driver_probe_device+0x148/0x190
[ 726.666029] driver_probe_device+0x48/0x180
[ 726.670198] __driver_attach+0x104/0x240
[ 726.674106] bus_for_each_dev+0x78/0xdc
[ 726.677927] driver_attach+0x2c/0x40
[ 726.681486] bus_add_driver+0x154/0x270
[ 726.685307] driver_register+0x80/0x13c
[ 726.689129] __pci_register_driver+0x4c/0x60
[ 726.693386] __init_backport+0xf0/0x1000 [mlx5_core]
[ 726.698425] do_one_initcall+0x4c/0x250
[ 726.702248] do_init_module+0x50/0x260
[ 726.705983] load_module+0x9fc/0xbe0
[ 726.709543] __do_sys_finit_module+0xa8/0x114
How to fix:
This is related to
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869
We need to backport or cherry-pick more patches from the series.
The patches are listed below:
Backport: f655dacb59ac net: devlink: remove unused locked functions
Backport: 012ec02ae441 netdevsim: convert driver to use unlocked devlink API during init/fini
Cherry-pick: eb0e9fa2c635 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
SKIP: 72a4c8c94efa mlxsw: convert driver to use unlocked devlink API during init/fini
Backport: 70a2ff89369d net: devlink: add unlocked variants of devlink_dpipe*() functions
Cherry-pick: 755cfa69c4ec net: devlink: add unlocked variants of devlink_sb*() functions
Cherry-pick: c223d6a4bf6d net: devlink: add unlocked variants of devlink_resource*() functions
Cherry-pick: 852e85a704c2 net: devlink: add unlocked variants of devling_trap*() functions
Cherry-pick: e26fde2f5bef net: devlink: avoid false DEADLOCK warning reported by lockdep
Thanks!
Jiri Pirko (6):
net: devlink: add unlocked variants of devlink_resource*() functions
net: devlink: add unlocked variants of devlink_sb*() functions
net: devlink: add unlocked variants of devlink_dpipe*() functions
net: devlink: add unlocked variants of devlink_region_create/destroy()
functions
netdevsim: convert driver to use unlocked devlink API during init/fini
net: devlink: remove unused locked functions
Moshe Shemesh (1):
net: devlink: avoid false DEADLOCK warning reported by lockdep
drivers/net/netdevsim/dev.c | 92 +++----
drivers/net/netdevsim/fib.c | 62 ++---
include/net/devlink.h | 60 ++--
net/core/devlink.c | 534 ++++++++++++++++++++----------------
4 files changed, 421 insertions(+), 327 deletions(-)
--
2.37.1 (Apple Git-137.1)