[SRU][J:linux-bluefield][PATCH v1 0/9] Kernel panic in restart driver after configuring IPsec full offload

Tony Duan yifeid at nvidia.com
Mon Dec 25 06:20:22 UTC 2023


BugLink: https://bugs.launchpad.net/bugs/2044427

SRU Justification:

[Impact]

* This series ports several upstream xfrm fixes to avoid a kernel panic when the driver is restarted after IPsec full offload has been configured.

[Fix]

* cherry-pick afa8cc09c0effbc6532b4a6d89027c63a4f4dfa2 afa8cc0 net: xfrm: Fix xfrm_address_filter OOB read
  cherry-pick 027657f5b0e5786fb4a3f81f0c56807128c38e8d 027657f xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
  cherry-pick e2cfb0384b887db477b969e998c53c4745513f92 e2cfb03 xfrm: Silence warnings triggerable by bad packets
  cherry-pick 7cbe43787657bc3d6edd175ba3e486980a89afdf 7cbe437 xfrm: Remove inner/outer modes from input path
  cherry-pick 7e4e5880259f9e85d322969577a36f61d98deff4 7e4e588 net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
  cherry-pick 92ad4f000093dcb14dd131a2fd7bf7d59ae956c0 92ad4f0 net: af_key: fix sadb_x_filter validation
  cherry-pick 4c8893c6d1f25a9d04740afc27ce0166d1662609 4c8893c xfrm: Flush xfrm state synchronously on netdev close or unregister
  backport 1a18e06a37ae5c0eb83f47bdc91a3923a7c21c6f 1a18e06 xfrm: get global statistics from the offloaded device
  backport aabb407c261858f1b772eb1f4fa92bc38a203098 aabb407 xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
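
The list above maps naturally onto a sequence of git commands. What follows is a minimal sketch, not the submitter's actual workflow. The helper names `parse_fix_list` and `to_git_commands` are hypothetical, and the sketch assumes the upstream commits are already reachable in the local repository. Entries marked `backport` presumably needed adjustment relative to upstream, so they are only flagged for manual handling.

```python
import re

# Two entries copied from the [Fix] list; the remaining lines have the same shape.
FIX_LIST = """\
* cherry-pick afa8cc09c0effbc6532b4a6d89027c63a4f4dfa2 afa8cc0 net: xfrm: Fix xfrm_address_filter OOB read
  backport 1a18e06a37ae5c0eb83f47bdc91a3923a7c21c6f 1a18e06 xfrm: get global statistics from the offloaded device
"""

# action, full 40-character hash, 7-character abbreviation, subject
LINE_RE = re.compile(
    r"^\W*(cherry-pick|backport)\s+([0-9a-f]{40})\s+([0-9a-f]{7})\s+(.+)$"
)


def parse_fix_list(text):
    """Return (action, full_sha, subject) tuples for each recognised line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        action, full_sha, short_sha, subject = match.groups()
        # Sanity check: the abbreviation must be a prefix of the full hash.
        if not full_sha.startswith(short_sha):
            raise ValueError(f"hash mismatch for: {subject}")
        entries.append((action, full_sha, subject))
    return entries


def to_git_commands(entries):
    """Build one `git cherry-pick` command line per entry (hypothetical helper)."""
    commands = []
    for action, full_sha, _subject in entries:
        # -x records the upstream commit id, -s adds a Signed-off-by line.
        command = f"git cherry-pick -x -s {full_sha}"
        if action == "backport":
            command += "  # expect conflicts; resolve by hand"
        commands.append(command)
    return commands


if __name__ == "__main__":
    for command in to_git_commands(parse_fix_list(FIX_LIST)):
        print(command)
```

The prefix check is a cheap way to catch a copy-and-paste slip between the full and abbreviated hashes before anything is applied.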

[Test Plan]

* Restarting the driver with an IPsec full offload transparent mode configuration causes a kernel panic.
  Kernel version: linux-bluefield 5.15.

Test steps:
1) configure xfrm rules
2) configure VF
3) configure FW steering mode
4) restart driver
5) check dmesg
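
Step 5 can be automated by scanning the kernel log for warning blocks. The sketch below is an illustration under stated assumptions: `find_warnings` is a hypothetical helper, it relies only on the `cut here`, `WARNING:` and `end trace` markers, and it takes the log as a string rather than invoking `dmesg` itself. The sample text is an excerpt of the trace in this report with the build path shortened for readability.

```python
import re

CUT_HERE = "------------[ cut here ]------------"
END_TRACE_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
# Captures the source location and the symbol+offset of a WARNING line.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def find_warnings(log_text):
    """Return (source_location, symbol_offset) for each WARNING block found."""
    hits = []
    in_block = False
    for line in log_text.splitlines():
        if CUT_HERE in line:
            in_block = True
        elif in_block:
            match = WARNING_RE.search(line)
            if match:
                hits.append((match.group(1), match.group(2)))
            if END_TRACE_RE.search(line):
                in_block = False
    return hits


# Excerpt of the reported trace; the long build path is shortened here.
SAMPLE = """\
[ 937.989359] ------------[ cut here ]------------
[ 937.989786] WARNING: CPU: 11 PID: 60463 at drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1828 mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
[ 938.031388] ---[ end trace 0000000000000000 ]---
"""

if __name__ == "__main__":
    for location, symbol in find_warnings(SAMPLE):
        print(f"warning at {location} in {symbol}")
```

On a test machine the function could be fed the captured output of `dmesg`, which may require root privileges. An empty result after the driver restart would suggest the warning no longer triggers.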

Test result:
 [ 937.989359] ------------[ cut here ]------------
 [ 937.989786] WARNING: CPU: 11 PID: 60463 at /tmp/23.10-0.1.8/6.5.0-rc6_mlnx/fedora_32/mlnx-ofa_kernel/BUILD/mlnx-ofa_kernel-23.10/obj/default/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1828 mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
 [ 937.991698] fuse virtio_net net_failover failover [last unloaded: vdpa]
 [ 937.999155] CPU: 11 PID: 60463 Comm: modprobe Tainted: G OE 6.5.0-rc6_mlnx #1
 [ 937.999891] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 938.000823] RIP: 0010:mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
 [ 938.001459] Code: f6 45 31 c0 48 89 ea 31 ff e8 d4 d5 df ff 59 e9 8c fe ff ff c3 0f 0b e9 3b fe ff ff 0f 0b e9 e8 fd ff ff 0f 0b e9 07 fe ff ff <0f> 0b e9 65 fe ff ff 0f 0b e9 82 fe ff ff 66 2e 0f 1f 84 00 00 00
 [ 938.002949] RSP: 0018:ffffc90001183c08 EFLAGS: 00010202
 [ 938.003418] RAX: 0000000000000000 RBX: ffff8882f3869c00 RCX: 0000000000000001
 [ 938.004024] RDX: ffffffff82a305c0 RSI: 0000000000000002 RDI: ffff888103aa2b30
 [ 938.004624] RBP: ffff888103aa2d80 R08: 0000000000000001 R09: ffff888100042800
 [ 938.005238] R10: 0000000000000002 R11: ffffc90001183ba8 R12: ffff8881312e6800
 [ 938.005836] R13: ffff8881127401a0 R14: ffff8881312e6800 R15: ffff888148bbd160
 [ 938.006444] FS: 00007fd22b82c740(0000) GS:ffff88885fac0000(0000) knlGS:0000000000000000
 [ 938.009456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 938.009970] CR2: 00007f26ca697000 CR3: 000000012e73f003 CR4: 0000000000770ee0
 [ 938.010568] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [ 938.011173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [ 938.011772] PKRU: 55555554
 [ 938.012065] Call Trace:
 [ 938.012333]
 [ 938.012583] ? __warn+0x7d/0x120
 [ 938.012921] ? mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
 [ 938.013494] ? report_bug+0xf1/0x1c0
 [ 938.013850] ? handle_bug+0x44/0x70
 [ 938.014201] ? exc_invalid_op+0x13/0x60
 [ 938.014568] ? asm_exc_invalid_op+0x16/0x20
 [ 938.014970] ? mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
 [ 938.015532] ? mlx5e_accel_ipsec_fs_cleanup+0xf2/0x2b0 [mlx5_core]
 [ 938.016093] mlx5e_ipsec_cleanup+0x1e/0x100 [mlx5_core]
 [ 938.016594] mlx5e_detach_netdev+0x46/0x80 [mlx5_core]
 [ 938.017098] mlx5e_vport_rep_unload+0x147/0x1a0 [mlx5_core]
 [ 938.017623] mlx5_eswitch_unregister_vport_reps+0x13e/0x190 [mlx5_core]
 [ 938.018221] auxiliary_bus_remove+0x18/0x30
 [ 938.018616] device_release_driver_internal+0xaa/0x130
 [ 938.019076] bus_remove_device+0xc3/0x130
 [ 938.019451] device_del+0x157/0x380
 [ 938.019792] ? kobject_put+0xb3/0x200
 [ 938.020153] delete_drivers+0x72/0xa0 [mlx5_core]
 [ 938.020608] mlx5_unregister_device+0x34/0x70 [mlx5_core]
 [ 938.021113] mlx5_uninit_one+0x25/0x130 [mlx5_core]
 [ 938.021572] remove_one+0x72/0xc0 [mlx5_core]
 [ 938.022002] pci_device_remove+0x31/0xb0
 [ 938.022376] device_release_driver_internal+0xaa/0x130
 [ 938.022827] driver_detach+0x3f/0x80
 [ 938.023181] bus_remove_driver+0x69/0xe0
 [ 938.023553] pci_unregister_driver+0x22/0x90
 [ 938.023957] mlx5_cleanup+0xc/0x4c [mlx5_core]
 [ 938.024384] __x64_sys_delete_module+0x157/0x280
 [ 938.024806] do_syscall_64+0x34/0x80
 [ 938.025163] entry_SYSCALL_64_after_hwframe+0x46/0xb0
 [ 938.025616] RIP: 0033:0x7fd22b93812b
 [ 938.025969] Code: 73 01 c3 48 8b 0d 6d 0d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 0d 0c 00 f7 d8 64 89 01 48
 [ 938.027458] RSP: 002b:00007ffce1ea2658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 [ 938.028129] RAX: ffffffffffffffda RBX: 000055b5a4efb3b0 RCX: 00007fd22b93812b
 [ 938.028719] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b5a4efb418
 [ 938.029327] RBP: 000055b5a4efb3b0 R08: 1999999999999999 R09: 0000000000000000
 [ 938.029932] R10: 00007fd22b9acac0 R11: 0000000000000206 R12: 0000000000000000
 [ 938.030529] R13: 000055b5a4efb418 R14: 000055b5a4efe350 R15: 000055b5a4efb150
 [ 938.031134]
 [ 938.031388] ---[ end trace 0000000000000000 ]---

[Where problems could occur]

* Without these patches, kernel panic information appears in dmesg when the driver is restarted.

[Other Info]

* None

Herbert Xu (2):
  xfrm: Remove inner/outer modes from input path
  xfrm: Silence warnings triggerable by bad packets

Jianbo Liu (1):
  xfrm: Flush xfrm state synchronously on netdev close or unregister

Leon Romanovsky (2):
  xfrm: generalize xdo_dev_state_update_curlft to allow statistics
    update
  xfrm: get global statistics from the offloaded device

Lin Ma (4):
  net: af_key: fix sadb_x_filter validation
  net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
  xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
  net: xfrm: Fix xfrm_address_filter OOB read

 Documentation/networking/xfrm_device.rst |  4 +-
 include/linux/netdevice.h                |  2 +-
 include/net/xfrm.h                       | 14 +++---
 net/key/af_key.c                         |  4 +-
 net/xfrm/xfrm_compat.c                   |  2 +-
 net/xfrm/xfrm_input.c                    | 78 +++++++++++---------------------
 net/xfrm/xfrm_proc.c                     |  1 +
 net/xfrm/xfrm_state.c                    | 19 ++++++--
 net/xfrm/xfrm_user.c                     | 14 +++++-
 9 files changed, 69 insertions(+), 69 deletions(-)

-- 
1.8.3.1



