NAK: [SRU][J:linux-bluefield][PATCH 0/1] Devlink backport: fix race and lock issue

Tim Gardner tim.gardner at canonical.com
Thu Oct 19 16:07:21 UTC 2023


On 10/19/23 10:03 AM, Tim Gardner wrote:
> On 10/19/23 9:09 AM, William Tu wrote:
>> BugLink: https://bugs.launchpad.net/bugs/2032378
>>
>> The patch is a follow-up from the previous devlink backport series.
>> We've found that devlink reload hangs the system when testing against
>> OFED 2307.
>>
>> [ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
>> [ 1089.760560]       Tainted: G           OE     5.15.0-1027-bluefield #29-Ubuntu
>> [ 1089.775086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1089.790829] task:devlink         state:D stack:    0 pid: 8753 ppid:  5090 flags:0x00000004
>> [ 1089.790838] Call trace:
>> [ 1089.790840]  __switch_to+0xf8/0x150
>> [ 1089.790857]  __schedule+0x2b8/0x790
>> [ 1089.790865]  schedule+0x64/0x140
>> [ 1089.790870]  schedule_preempt_disabled+0x18/0x24
>> [ 1089.790874]  __mutex_lock.constprop.0+0x1a0/0x680
>> [ 1089.790878]  __mutex_lock_slowpath+0x40/0x90
>> [ 1089.790883]  mutex_lock+0x64/0x70
>> [ 1089.790887]  devl_lock+0x1c/0x30
>> [ 1089.790893]  mlx5_detach_device+0x58/0x190 [mlx5_core]
>> [ 1089.791055]  mlx5_unload_one+0x40/0xe4 [mlx5_core]
>> [ 1089.791177]  mlx5_devlink_reload_down+0x184/0x270 [mlx5_core]
>> [ 1089.791318]  devlink_reload+0x214/0x290
>>
>> Checking the OFED source code, we found that the missing unlocked devl
>> trap group functions also need to be backported to avoid the deadlock.
>>
>> void mlx5_detach_device(struct mlx5_core_dev *dev, bool suspend)
>> {
>> ...
>> #ifdef HAVE_DEVL_PORT_REGISTER
>> #ifdef HAVE_DEVL_TRAP_GROUPS_REGISTER
>>          devl_assert_locked(priv_to_devlink(dev));
>> #else
>>          devl_lock(devlink);
>> #endif /* HAVE_DEVL_TRAP_GROUPS_REGISTER */
>> #endif /* HAVE_DEVL_PORT_REGISTER */
>>          mutex_lock(&mlx5_intf_mutex);
>> #ifdef HAVE_DEVL_PORT_REGISTER
>>
>> I'm re-using the same BugLink as it is a related issue.
>>
>> Jiri Pirko (1):
>>    net: devlink: add unlocked variants of devling_trap*() functions
>>
>>   include/net/devlink.h |  20 +++++
>>   net/core/devlink.c    | 180 ++++++++++++++++++++++++++++++++++--------
>>   2 files changed, 168 insertions(+), 32 deletions(-)
>>
> 
> This needs a new LP bug since 00371808 is already fix committed. Also, 
> there was no patch or PR attached to this email. What are we supposed to 
> do with it?
> 

Never mind that last part. It was in my SPAM for some reason. 
Nevertheless, you need a new LP bug.
-- 
-----------
Tim Gardner
Canonical, Inc



