APPLIED: [SRU][Bionic][PATCH 0/1] Initialize update_stat_work for ipoib devices
Kleber Souza
kleber.souza at canonical.com
Thu Nov 26 15:09:37 UTC 2020
On 25.11.20 19:07, Ian May wrote:
> BugLink: https://launchpad.net/bugs/1904848
>
> SRU Justification:
>
> [Impact]
> Unloading ib_ipoib causes a call trace to be logged in the kernel log buffer.
>
> Bisecting the bionic kernel shows that this issue was exposed by commit
> 616e695435e3 ("workqueue: Try to catch flush_work() without INIT_WORK()")
> in version 4.15.0-59.66.
>
> [Test Case]
>
> # modprobe ib_ipoib
> # modprobe ib_ipoib -r
> # dmesg
> [ 306.277717] ------------[ cut here ]------------
> [ 306.277738] WARNING: CPU: 10 PID: 2148 at /build/linux-RJNBJC/linux-4.15.0/kernel/workqueue.c:2906 __flush_work+0x1f8/0x210
> [ 306.277739] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc binfmt_misc intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp rpcrdma rdma_ucm ib_umad ib_uverbs coretemp ib_iser rdma_cm kvm_intel kvm iw_cm irqbypass ib_ipoib(-) libiscsi scsi_transport_iscsi ib_cm joydev input_leds crct10dif_pclmul crc32_pclmul mgag200 ttm drm_kms_helper drm hpilo ghash_clmulni_intel pcbc i2c_algo_bit ipmi_ssif fb_sys_fops syscopyarea sysfillrect sysimgblt aesni_intel aes_x86_64 crypto_simd ioatdma glue_helper shpchp cryptd dca intel_cstate intel_rapl_perf
> [ 306.277790] serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core scsi_transport_sas ptp pps_core devlink
> [ 306.277817] CPU: 10 PID: 2148 Comm: modprobe Not tainted 4.15.0-124-generic #127-Ubuntu
> [ 306.277818] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
> [ 306.277823] RIP: 0010:__flush_work+0x1f8/0x210
> [ 306.277825] RSP: 0018:ffffbdeb47ecfcd8 EFLAGS: 00010286
> [ 306.277827] RAX: 0000000000000024 RBX: ffff993a5c3d8ec8 RCX: 0000000000000006
> [ 306.277829] RDX: 0000000000000000 RSI: ffff99429ef16498 RDI: ffff99429ef16490
> [ 306.277830] RBP: ffffbdeb47ecfd48 R08: 000000000000050d R09: 0000000000000004
> [ 306.277832] R10: ffffe263a058c1c0 R11: 0000000000000001 R12: ffff993a5c3d8ec8
> [ 306.277833] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: ffffffffb00a9800
> [ 306.277835] FS: 00007fa1124a9540(0000) GS:ffff99429ef00000(0000) knlGS:0000000000000000
> [ 306.277837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 306.277839] CR2: 000055b1c5007bb0 CR3: 0000000fcf36c002 CR4: 00000000001606e0
> [ 306.277840] Call Trace:
> [ 306.277850] __cancel_work_timer+0x136/0x1b0
> [ 306.277881] ? mlx5_core_destroy_qp+0x99/0xd0 [mlx5_core]
> [ 306.277886] cancel_delayed_work_sync+0x13/0x20
> [ 306.277909] mlx5e_detach_netdev+0x83/0x90 [mlx5_core]
> [ 306.277931] mlx5_rdma_netdev_free+0x30/0x80 [mlx5_core]
> [ 306.277941] mlx5_ib_free_rdma_netdev+0xe/0x10 [mlx5_ib]
> [ 306.277948] ipoib_remove_one+0xe4/0x180 [ib_ipoib]
> [ 306.277965] ib_unregister_client+0x171/0x1e0 [ib_core]
> [ 306.277972] ipoib_cleanup_module+0x15/0x2f [ib_ipoib]
> [ 306.277978] SyS_delete_module+0x1ab/0x2d0
> [ 306.277983] do_syscall_64+0x73/0x130
> [ 306.277989] entry_SYSCALL_64_after_hwframe+0x41/0xa6
> [ 306.277992] RIP: 0033:0x7fa111fc1047
> [ 306.277993] RSP: 002b:00007ffc0db32298 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [ 306.277996] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 00007fa111fc1047
> [ 306.277997] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005614be46cd08
> [ 306.277999] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 0000000000000000
> [ 306.278000] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 00005614be46cd08
> [ 306.278002] R13: 0000000000000001 R14: 00005614be46cd08 R15: 00007ffc0db33680
> [ 306.278004] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
> [ 306.278035] ---[ end trace 652f7759937172a2 ]---
> [ 306.646061] ------------[ cut here ]------------
> [ 306.646077] WARNING: CPU: 6 PID: 2148 at /build/linux-RJNBJC/linux-4.15.0/kernel/workqueue.c:2906 __flush_work+0x1f8/0x210
> [ 306.646078] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc binfmt_misc intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp rpcrdma rdma_ucm ib_umad ib_uverbs coretemp ib_iser rdma_cm kvm_intel kvm iw_cm irqbypass ib_ipoib(-) libiscsi scsi_transport_iscsi ib_cm joydev input_leds crct10dif_pclmul crc32_pclmul mgag200 ttm drm_kms_helper drm hpilo ghash_clmulni_intel pcbc i2c_algo_bit ipmi_ssif fb_sys_fops syscopyarea sysfillrect sysimgblt aesni_intel aes_x86_64 crypto_simd ioatdma glue_helper shpchp cryptd dca intel_cstate intel_rapl_perf
> [ 306.646123] serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core scsi_transport_sas ptp pps_core devlink
> [ 306.646146] CPU: 6 PID: 2148 Comm: modprobe Tainted: G W 4.15.0-124-generic #127-Ubuntu
> [ 306.646148] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
> [ 306.646152] RIP: 0010:__flush_work+0x1f8/0x210
> [ 306.646154] RSP: 0018:ffffbdeb47ecfcd8 EFLAGS: 00010286
> [ 306.646156] RAX: 0000000000000024 RBX: ffff9942970b8ec8 RCX: 0000000000000006
> [ 306.646158] RDX: 0000000000000000 RSI: ffff99429ee16498 RDI: ffff99429ee16490
> [ 306.646159] RBP: ffffbdeb47ecfd48 R08: 0000000000000533 R09: 0000000000000004
> [ 306.646161] R10: ffffe2639fa66740 R11: 0000000000000001 R12: ffff9942970b8ec8
> [ 306.646162] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: ffffffffb00a9800
> [ 306.646164] FS: 00007fa1124a9540(0000) GS:ffff99429ee00000(0000) knlGS:0000000000000000
> [ 306.646166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 306.646167] CR2: 000055dd889e4a30 CR3: 0000000fcf36c006 CR4: 00000000001606e0
> [ 306.646169] Call Trace:
> [ 306.646177] __cancel_work_timer+0x136/0x1b0
> [ 306.646205] ? mlx5_core_destroy_qp+0x99/0xd0 [mlx5_core]
> [ 306.646210] cancel_delayed_work_sync+0x13/0x20
> [ 306.646233] mlx5e_detach_netdev+0x83/0x90 [mlx5_core]
> [ 306.646255] mlx5_rdma_netdev_free+0x30/0x80 [mlx5_core]
> [ 306.646264] mlx5_ib_free_rdma_netdev+0xe/0x10 [mlx5_ib]
> [ 306.646271] ipoib_remove_one+0xe4/0x180 [ib_ipoib]
> [ 306.646287] ib_unregister_client+0x171/0x1e0 [ib_core]
> [ 306.646295] ipoib_cleanup_module+0x15/0x2f [ib_ipoib]
> [ 306.646300] SyS_delete_module+0x1ab/0x2d0
> [ 306.646305] do_syscall_64+0x73/0x130
> [ 306.646310] entry_SYSCALL_64_after_hwframe+0x41/0xa6
> [ 306.646313] RIP: 0033:0x7fa111fc1047
> [ 306.646314] RSP: 002b:00007ffc0db32298 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [ 306.646317] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 00007fa111fc1047
> [ 306.646318] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005614be46cd08
> [ 306.646319] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 0000000000000000
> [ 306.646321] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 00005614be46cd08
> [ 306.646322] R13: 0000000000000001 R14: 00005614be46cd08 R15: 00007ffc0db33680
> [ 306.646325] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
> [ 306.646355] ---[ end trace 652f7759937172a3 ]---
>
> [Fix]
> The root cause of this error is cancelling an uninitialized delayed work item that belongs to ipoib net devices; the solution is to always initialize it.
> This solution is implemented in the very small (one-line) patch attached.
> Please note that this patch is not upstream; it is based on the following upstream commits, which introduced similar functionality in upstream v4.20-rc1.
>
> 303211b44ce3 net/mlx5e: Always initialize update stats delayed work
> 182570b26223 net/mlx5e: Gather common netdev init/cleanup functionality in one place
>
> Applying these two commits cleanly on the bionic tree requires more patches, which might introduce a large change, so I think it is better (if possible) to use the attached patch.
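
For readers unfamiliar with the failure mode described above, the following is a minimal userspace sketch of it, not kernel code and not the attached patch. It assumes a simplified model in which a work item counts as uninitialized when its callback pointer is NULL, and in which cancelling always flushes. All names (model_flush_work, model_priv, update_stats_work, and so on) are hypothetical stand-ins chosen to echo the kernel identifiers in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel work item types. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

struct delayed_work {
    struct work_struct work;
};

/* Hypothetical stand-in for the driver's per-netdev private data. */
struct model_priv {
    struct delayed_work update_stats_work;
};

static int warn_count; /* number of simulated warnings */

/* Stand-in for INIT_DELAYED_WORK(): records the callback. */
static void init_delayed_work(struct delayed_work *dw,
                              void (*fn)(struct work_struct *))
{
    assert(fn != NULL);
    dw->work.func = fn;
}

/* Models the sanity check: flushing a work item whose callback
 * was never set is reported and refused. */
static bool model_flush_work(struct work_struct *work)
{
    if (!work->func) {
        warn_count++;
        fprintf(stderr, "WARNING: flush of uninitialized work\n");
        return false;
    }
    return true;
}

/* Stand-in for cancel_delayed_work_sync(); in this model it
 * always ends up flushing the work item. */
static bool model_cancel_delayed_work_sync(struct delayed_work *dw)
{
    return model_flush_work(&dw->work);
}

static void update_stats(struct work_struct *work)
{
    (void)work; /* nothing to do in the model */
}

/* Simulates one load/unload cycle and returns the number of
 * warnings raised during teardown. */
static int model_load_unload(bool init_work)
{
    struct model_priv priv;

    memset(&priv, 0, sizeof(priv)); /* zeroed private data */
    warn_count = 0;
    if (init_work)
        init_delayed_work(&priv.update_stats_work, update_stats);
    model_cancel_delayed_work_sync(&priv.update_stats_work);
    return warn_count;
}
```

In this model, calling model_load_unload(false) raises one warning on teardown, while model_load_unload(true) raises none, which is the effect the one-line initialization is meant to have.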
>
> [Regression Potential]
> Regression risk is low since it introduces a small fix; similar functionality was also accepted upstream in v4.20.
>
> Amir Tzin (1):
> UBUNTU: SAUCE: net/mlx5e: IPoIB, initialize update_stat_work for ipoib
> devices
>
> drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 1 +
> 1 file changed, 1 insertion(+)
>
Applied to bionic/linux.
Thanks,
Kleber