NAK: [PATCH 0/1 v2][SRU][jammy:linux-azure, kinetic:linux-azure]
Tim Gardner
tim.gardner at canonical.com
Mon Dec 5 17:49:09 UTC 2022
For crying out loud....
v3 on the way
On 12/5/22 10:46 AM, Tim Gardner wrote:
> v2 - use correct subject.
>
> BugLink: https://bugs.launchpad.net/bugs/1998838
>
> SRU Justification
>
> [Impact]
>
> Hello Canonical Team,
>
> This issue was found while doing the validation on CPC's Jammy CVM image.
>
> The fio command below fails to exit after its 2-minute runtime (--runtime=120). Watching `top` while the command hung, I saw kworkers getting blocked.
>
> sudo fio --ioengine=libaio --bs=4K --filename=/dev/sdc1:/dev/sdd1:/dev/sde1:/dev/sdf1:/dev/sdg1:/dev/sdh1:/dev/sdi1:/dev/sdj1:/dev/sdk1:/dev/sdl1:/dev/sdm1:/dev/sdn1:/dev/sdo1:/dev/sdp1:/dev/sdq1:/dev/sdr1 --readwrite=randwrite --runtime=120 --iodepth=1 --numjob=96 --name=iteration9 --direct=1 --size=8192M --group_reporting --overwrite=1
>
> Example system logs:
> ---------------------------------------------------------------------------------------------------------------
> [ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
> [ 1096.302785] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
> [ 1096.306312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1096.310489] INFO: task jbd2/sda1-8:1113 blocked for more than 120 seconds.
> [ 1096.313900] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
> [ 1096.317481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1096.324117] INFO: task systemd-journal:1191 blocked for more than 120 seconds.
> [ 1096.331219] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
> [ 1096.335332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ---------------------------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------------------------
> [ 3241.013230] systemd-udevd[1221]: sdl1: Worker [6686] processing SEQNUM=13323 killed
> [ 3261.492691] systemd-udevd[1221]: sdl1: Worker [6686] failed
> ---------------------------------------------------------------------------------------------------------------
>
> TOP report:
> ---------------------------------------------------------------------------------------------------------------
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 417 root 20 0 0 0 0 R 66.2 0.0 0:34.61 ksoftirqd/59
> 435 root 20 0 0 0 0 I 24.5 0.0 0:09.03 kworker/59:1-mm_percpu_wq
> 416 root rt 0 0 0 0 S 23.5 0.0 0:01.86 migration/59
> 366 root 0 -20 0 0 0 I 19.2 0.0 0:16.64 kworker/49:1H-kblockd
> 378 root 0 -20 0 0 0 I 17.9 0.0 0:15.71 kworker/51:1H-kblockd
> 455 root 0 -20 0 0 0 I 17.9 0.0 0:14.76 kworker/62:1H-kblockd
> 135 root 0 -20 0 0 0 I 17.5 0.0 0:13.08 kworker/17:1H-kblockd
> 420 root 0 -20 0 0 0 I 16.9 0.0 0:14.63 kworker/58:1H-kblockd
> ...
> ---------------------------------------------------------------------------------------------------------------
>
> LISAv3 Testcase: perf_premium_datadisks_4k
> Image : "canonical-test 0001-com-ubuntu-confidential-vm-jammy-preview 22_04-lts-cvm latest"
> VMSize : "Standard_DC96ads_v5"
>
> Regarding reproducibility: I see this every time I run the storage perf tests, and it always seems to happen on iteration
> 9 or 10. When running manually, I had to run the command three or four times to reproduce the issue.
>
> [Test Case]
>
> Tested by Microsoft. Reproducing the issue requires many cores (96) and disks (16).
>
> [Where things could go wrong]
>
> swiotlb buffers could be double-freed.
>
> [Other Info]
>
> SF: #00349781
>
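For anyone triaging similar reports, the hung-task lines in the example system logs quoted above can be tallied with a small script. This is only a sketch: the regular expression is inferred from the three "blocked for more than" lines in this report, the function name parse_hung_tasks is hypothetical, and other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the quoted report, e.g.:
# [ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
# The task name may itself contain ':' so the greedy match leaves only the
# final ':<digits>' as the pid.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def parse_hung_tasks(log_text):
    """Return (timestamp, task name, pid, timeout) for each hung-task line."""
    found = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            found.append(
                (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return found


# Sample lines copied from the report; non-matching lines are ignored.
sample = """\
[ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
[ 1096.302785] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.310489] INFO: task jbd2/sda1-8:1113 blocked for more than 120 seconds.
[ 1096.324117] INFO: task systemd-journal:1191 blocked for more than 120 seconds.
"""

hung = parse_hung_tasks(sample)
for ts, comm, pid, secs in hung:
    print(f"{ts:10.3f}s  {comm} (pid {pid}) blocked > {secs}s")
```

The same function should work on saved `dmesg` output, since it searches each line rather than requiring the message to start at column zero.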
--
-----------
Tim Gardner
Canonical, Inc