APPLIED: [SRU][F/gke][F/gkeop][PATCH v3 0/1] Avoid holding spinlock while blocking (LP #1921825)

Khaled Elmously khalid.elmously at canonical.com
Tue Apr 6 21:17:32 UTC 2021


I sent v3 to respond to Stefan's comments - but I see v2 already had 2 ACKs so I will apply the change based on those ACKs.

(v2 and v3 are the same code -- v3 just has an updated description)

Thanks for the reviews!



On 2021-04-06 17:13:45, Khalid Elmously wrote:
> BugLink: https://bugs.launchpad.net/bugs/1921825
> 
> [Impact]
> Kernel panic during high iscsi activity
> 
> The following stack trace is observed:
> 
> [ 223.386958] BUG: scheduling while atomic: iscsiadm/18136/0x00000200
> [ 223.393390] Modules linked in: tcp_diag inet_diag xt_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs sch_htb ebt_ip ebtable_filter ebtables veth xt_mark br_netfilter iptable_mangle xt_MASQUERADE xt_comment xt_addrtype iptable_nat binfmt_misc iptable_filter bpfilter xt_conntrack nf_nat bridge stp llc xfrm_user xfrm_algo aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper input_leds serio_raw sch_fq_codel sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_rng ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear psmouse virtio_net net_failover failover
> [ 223.393429] CPU: 6 PID: 18136 Comm: iscsiadm Kdump: loaded Not tainted 5.4.0-1033-gke #35~18.04.1-Ubuntu
> [ 223.393430] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 223.393430] Call Trace:
> [ 223.393439] dump_stack+0x6d/0x95
> [ 223.393464] __schedule_bug+0x55/0x70
> [ 223.393467] __schedule+0x61b/0x710
> [ 223.393469] schedule+0x33/0xa0
> [ 223.393472] __lock_sock+0x7d/0xc0
> [ 223.393475] ? wait_woken+0x80/0x80
> [ 223.393477] lock_sock_nested+0x64/0x70
> [ 223.393479] inet_getname+0xaa/0xe0
> [ 223.393482] kernel_getpeername+0x1b/0x20
> [ 223.393485] iscsi_sw_tcp_conn_get_param+0xa6/0x110 [iscsi_tcp]
> [ 223.393494] show_conn_ep_param_ISCSI_PARAM_CONN_ADDRESS+0x7e/0xa0 [scsi_transport_iscsi]
> [ 223.393496] dev_attr_show+0x1d/0x50
> [ 223.393499] sysfs_kf_seq_show+0xa1/0x110
> [ 223.393502] kernfs_seq_show+0x27/0x30
> [ 223.393504] seq_read+0xda/0x420
> [ 223.393506] kernfs_fop_read+0x141/0x1a0
> [ 223.393510] __vfs_read+0x1b/0x40
> [ 223.393512] vfs_read+0x8e/0x130
> [ 223.393513] ksys_read+0xa7/0xe0
> [ 223.393515] __x64_sys_read+0x1a/0x20
> [ 223.393518] do_syscall_64+0x57/0x190
> [ 223.393521] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 223.393523] RIP: 0033:0x7f45793ce910
> [ 223.393525] Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 de 9b 01 00 48 89 04 24
> [ 223.393526] RSP: 002b:00007ffd9fa13688 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 223.393527] RAX: ffffffffffffffda RBX: 00007ffd9fa13820 RCX: 00007f45793ce910
> [ 223.393528] RDX: 0000000000000100 RSI: 00007ffd9fa13720 RDI: 0000000000000003
> [ 223.393528] RBP: 00007ffd9fa13720 R08: 0000000000000000 R09: 0000000000000000
> [ 223.393529] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000003
> [ 223.393530] R13: 00007ffd9fa13c60 R14: 0000555b0d613708 R15: 0000555b0d613300
> [ 223.393581] sd 1:0:0:0: [sdb] Write Protect is off
> [ 223.393583] sd 1:0:0:0: [sdb] Mode Sense: 43 00 10 08
> [ 223.393660] iscsiadm[18136]: segfault at 7ffd9fa12e58 ip 0000555b0ccd95af sp 00007ffd9fa12e60 error 6 in iscsiadm[555b0ccb4000+58000]
> [ 223.393666] Code: ba 00 02 00 00 48 81 ec 10 04 00 00 48 89 e7 48 8d 9c 24 00 02 00 00 64 48 8b 04 25 28 00 00 00 48 89 84 24 08 04 00 00 31 c0 <e8> 3c ed 00 00 ba 00 02 00 00 4c 89 ee 48 89 e7 e8 6c ed 00 00 ba
> [ 223.394992] sd 1:0:0:0: alua: transition timeout set to 60 seconds
> [ 223.394997] sd 1:0:0:0: alua: port group 02 state N non-preferred supports TOlUSNA
> [ 223.395018] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [ 223.395435] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [ 223.396802] sd 1:0:0:0: [sdb] Optimal transfer size 262144 bytes
> [ 223.402387] CPU: 6 PID: 18136 Comm: iscsiadm Kdump: loaded Tainted: G W 5.4.0-1033-gke #35~18.04.1-Ubuntu
> [ 223.402388] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 223.402389] Call Trace:
> [ 223.402395] dump_stack+0x6d/0x95
> [ 223.402398] panic+0xfe/0x2e4
> [ 223.402400] do_exit+0x899/0xb90
> [ 223.402402] do_group_exit+0x43/0xa0
> [ 223.402406] get_signal+0x14f/0x860
> [ 223.402409] do_signal+0x34/0x6d0
> [ 223.402414] ? __bad_area_nosemaphore+0x149/0x1f0
> [ 223.457597] exit_to_usermode_loop+0x8e/0x100
> [ 223.462090] prepare_exit_to_usermode+0x91/0xa0
> [ 223.466782] retint_user+0x8/0x8
> [ 223.470131] RIP: 0033:0x555b0ccd95af
> [ 223.473827] Code: ba 00 02 00 00 48 81 ec 10 04 00 00 48 89 e7 48 8d 9c 24 00 02 00 00 64 48 8b 04 25 28 00 00 00 48 89 84 24 08 04 00 00 31 c0 <e8> 3c ed 00 00 ba 00 02 00 00 4c 89 ee 48 89 e7 e8 6c ed 00 00 ba
> [ 223.492991] RSP: 002b:00007ffd9fa12e60 EFLAGS: 00010246
> [ 223.498333] RAX: 0000000000000000 RBX: 00007ffd9fa13060 RCX: 00007f45793ce335
> [ 223.505579] RDX: 0000000000000200 RSI: 0000555b0cf146a0 RDI: 00007ffd9fa12e60
> [ 223.513025] RBP: 00007ffd9fa13ecc R08: 0000000000000000 R09: 0000000080808000
> [ 223.520268] R10: 0000000000000075 R11: 0000000000000246 R12: 00007ffd9fa13540
> [ 223.527521] R13: 00007ffd9fa13540 R14: 0000000000000200 R15: 0000555b0d613300
> 
> This happens during high iscsi activity.
> 
> This issue has also been reported against linux-5.8, for example at https://lkml.org/lkml/2020/7/28/1085 . It affects the gke-5.4 kernel specifically because gke-5.4 has backported '1b66d253610c7 ("bpf: Add get{peer, sock}name attach types for sock_addr")', which introduces the issue. (The 5.4 gcp kernel did not get the updated ebpf code, so it is not affected by the bug.)
> 
> [Fix]
> 
> The fix is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcf3a2953d36bbfb9bd44ccb3db0897d935cc485 from 5.9
> 
> [Test]
> Affected customer has reported that they can no longer reproduce the problem with this fix applied. They were readily reproducing the crash without it.
> 
> [Regression potential]
> I'm not aware of any. The change seems safe and reasonable. It is accepted in mainline and backported to the stable kernels too. It is present in groovy 5.8 as of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898853
> 
> 
> 
> v2: 
>  Updated thread subject
> 
> v3: 
>  Updated description to remove (wrong) references to gcp-5.4. This bug only affects gke-5.4.
> 
> 
> Mark Mielke (1):
>   scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling
>     getpeername()
> 
>  drivers/scsi/iscsi_tcp.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> -- 
> 2.17.1
> 
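
For readers skimming the archive, the locking pattern named in the patch title can be sketched in user space as below. This is only an illustration under stated assumptions, not the code from the patch: `demo_sock`, `demo_conn`, `demo_lock`, `demo_unlock`, `slow_getpeername` and `conn_get_peer` are all hypothetical names, the spinlock is a C11 `atomic_flag`, and a plain integer stands in for the socket reference count. The idea is to pin the object while holding the lock, drop the lock, and only then make the call that may sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a reference-counted socket. */
struct demo_sock {
    int refcount;            /* protected by demo_conn.lock in this sketch */
    char peer[32];           /* pretend peer address */
};

/* Hypothetical stand-in for a connection that owns a socket pointer. */
struct demo_conn {
    atomic_flag lock;        /* toy spinlock */
    struct demo_sock *sock;  /* NULL once the connection is torn down */
};

static void demo_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void demo_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for a call that may sleep; must not run under the spinlock. */
static int slow_getpeername(struct demo_sock *sk, char *buf, size_t len)
{
    snprintf(buf, len, "%s", sk->peer);
    return (int)strlen(buf);
}

/* Returns the length of the peer string, or -1 if there is no socket. */
static int conn_get_peer(struct demo_conn *conn, char *buf, size_t len)
{
    struct demo_sock *sk;
    int rc;

    assert(len > 0);

    demo_lock(&conn->lock);
    sk = conn->sock;
    if (!sk) {
        demo_unlock(&conn->lock);
        return -1;
    }
    sk->refcount++;          /* pin the socket so it outlives the unlock */
    demo_unlock(&conn->lock);

    /* The potentially blocking call happens with no spinlock held. */
    rc = slow_getpeername(sk, buf, len);

    demo_lock(&conn->lock);
    sk->refcount--;          /* drop the temporary reference */
    demo_unlock(&conn->lock);

    return rc;
}
```

Making the `slow_getpeername()` call between `demo_lock()` and `demo_unlock()` instead would be the user-space analogue of the "scheduling while atomic" report in the trace above, where `lock_sock_nested()` tries to sleep.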


