APPLIED: [SRU][J:linux-bluefield][PATCH 0/7] Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init
Bartlomiej Zolnierkiewicz
bartlomiej.zolnierkiewicz at canonical.com
Thu Nov 2 11:43:11 UTC 2023
Applied to jammy:linux-bluefield/master-next. Thanks.
--
Best regards,
Bartlomiej
On Wed, Nov 1, 2023 at 3:51 PM William Tu <witu at nvidia.com> wrote:
>
> Summary:
> Machine hangs when loading the OFED 2310 mlx5 driver on BlueField
>
> How to reproduce:
> # load the OFED driver
>
> Reason:
> BF gets stuck; the observed call trace includes "mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]"
>
> dmesg from minicom:
> [ 726.569928] INFO: task systemd-udevd:297 blocked for more than 604 seconds.
> [ 726.576895] Tainted: G OE 5.15.0-1029-bluefield #31-Ubuntu
> [ 726.584101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 726.591913] task:systemd-udevd state:D stack: 0 pid: 297 ppid: 280 flags:0x0000000d
> [ 726.600248] Call trace:
> [ 726.602680] __switch_to+0xf8/0x150
> [ 726.606159] __schedule+0x2b8/0x790
> [ 726.609634] schedule+0x64/0x140
> [ 726.612850] schedule_preempt_disabled+0x18/0x24
> [ 726.617453] __mutex_lock.constprop.0+0x1a0/0x680
> [ 726.622141] __mutex_lock_slowpath+0x40/0x90
> [ 726.626396] mutex_lock+0x64/0x70
> [ 726.629695] devlink_resource_register+0x50/0x1a0
> [ 726.634386] mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]
> [ 726.639882] mlx5_init_one_devl_locked+0x1c8/0x784 [mlx5_core]
> [ 726.645791] probe_one+0x300/0x5f0 [mlx5_core]
> [ 726.650307] local_pci_probe+0x48/0xb4
> [ 726.654043] pci_device_probe+0x18c/0x200
> [ 726.658039] really_probe+0xd0/0x490
> [ 726.661600] __driver_probe_device+0x148/0x190
> [ 726.666029] driver_probe_device+0x48/0x180
> [ 726.670198] __driver_attach+0x104/0x240
> [ 726.674106] bus_for_each_dev+0x78/0xdc
> [ 726.677927] driver_attach+0x2c/0x40
> [ 726.681486] bus_add_driver+0x154/0x270
> [ 726.685307] driver_register+0x80/0x13c
> [ 726.689129] __pci_register_driver+0x4c/0x60
> [ 726.693386] __init_backport+0xf0/0x1000 [mlx5_core]
> [ 726.698425] do_one_initcall+0x4c/0x250
> [ 726.702248] do_init_module+0x50/0x260
> [ 726.705983] load_module+0x9fc/0xbe0
> [ 726.709543] __do_sys_finit_module+0xa8/0x114
>
> How to fix:
> This is related to
> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869
> and we need to backport/cherry-pick more patches from the series
>
> Patches are below
> Backport: f655dacb59ac net: devlink: remove unused locked functions
> Backport: 012ec02ae441 netdevsim: convert driver to use unlocked devlink API during init/fini
> Cherry-pick: eb0e9fa2c635 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
> SKIP: 72a4c8c94efa mlxsw: convert driver to use unlocked devlink API during init/fini
> Backport: 70a2ff89369d net: devlink: add unlocked variants of devlink_dpipe*() functions
> Cherry-pick: 755cfa69c4ec net: devlink: add unlocked variants of devlink_sb*() functions
> Cherry-pick: c223d6a4bf6d net: devlink: add unlocked variants of devlink_resource*() functions
> Cherry-pick: 852e85a704c2 net: devlink: add unlocked variants of devling_trap*() functions
> Cherry-pick: e26fde2f5bef net: devlink: avoid false DEADLOCK warning reported by lockdep
>
> Thanks!
>
> Jiri Pirko (6):
> net: devlink: add unlocked variants of devlink_resource*() functions
> net: devlink: add unlocked variants of devlink_sb*() functions
> net: devlink: add unlocked variants of devlink_dpipe*() functions
> net: devlink: add unlocked variants of devlink_region_create/destroy()
> functions
> netdevsim: convert driver to use unlocked devlink API during init/fini
> net: devlink: remove unused locked functions
>
> Moshe Shemesh (1):
> net: devlink: avoid false DEADLOCK warning reported by lockdep
>
> drivers/net/netdevsim/dev.c | 92 +++----
> drivers/net/netdevsim/fib.c | 62 ++---
> include/net/devlink.h | 60 ++--
> net/core/devlink.c | 534 ++++++++++++++++++++----------------
> 4 files changed, 421 insertions(+), 327 deletions(-)
>
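
For readers of the archive: the call trace quoted above ends in mutex_lock() called from devlink_resource_register(), underneath mlx5_init_one_devl_locked(). One plausible reading (an assumption, not stated in the cover letter) is that the init path already holds the devlink instance lock, so the locked helper tries to take the same non-recursive mutex again and never returns. The userspace sketch below mimics that pattern with a pthread error-checking mutex, which reports EDEADLK instead of hanging. All fake_* names are hypothetical stand-ins and are not kernel APIs; only the locked/unlocked split mirrors the devlink_resource*() / devl_resource*() variants that the series adds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a devlink instance; only the lock matters. */
struct fake_devlink {
	pthread_mutex_t lock;
	int resources;
};

/* Set up an error-checking mutex so a self-deadlock returns EDEADLK
 * instead of blocking forever, as a kernel mutex would. */
static int fake_devlink_init(struct fake_devlink *dl)
{
	pthread_mutexattr_t attr;
	int err;

	dl->resources = 0;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return -err;
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	err = pthread_mutex_init(&dl->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return -err;
}

/* "Unlocked" variant: the caller must already hold dl->lock. */
static int fake_devl_resource_register(struct fake_devlink *dl)
{
	dl->resources++;
	return 0;
}

/* "Locked" variant: takes dl->lock itself, then calls the unlocked one. */
static int fake_devlink_resource_register(struct fake_devlink *dl)
{
	int err = pthread_mutex_lock(&dl->lock);

	if (err)
		return -err; /* -EDEADLK if this thread already owns the lock */
	err = fake_devl_resource_register(dl);
	pthread_mutex_unlock(&dl->lock);
	return err;
}

/* Init path that runs with the instance lock held, then registers a
 * resource through either the unlocked or the locked helper. */
static int fake_init_locked(struct fake_devlink *dl, int use_unlocked)
{
	int err = pthread_mutex_lock(&dl->lock);

	assert(err == 0);
	err = use_unlocked ? fake_devl_resource_register(dl)
			   : fake_devlink_resource_register(dl);
	pthread_mutex_unlock(&dl->lock);
	return err;
}
```

After fake_devlink_init(), calling fake_init_locked(&dl, 0) takes the locked helper and fails with -EDEADLK, while fake_init_locked(&dl, 1) uses the unlocked helper and succeeds. In the kernel there is no such error return, which would be consistent with the hung-task report for systemd-udevd.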