[SRU][K][PATCH 0/8] sbitmap: fix IO hung due to waiters not

Gerald Yang gerald.yang at canonical.com
Tue Jun 6 01:25:23 UTC 2023


Thanks, Andrei, for the review and the details. I will adjust the patches and
resubmit the SRU.

Regards,
Gerald

On Mon, Jun 5, 2023 at 5:41 PM Andrei Gherzan <andrei.gherzan at canonical.com>
wrote:

> On 23/06/05 01:45PM, Gerald Yang wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2022318
> >
> > SRU Justification:
> >
> > [ Impact ]
> >
> > When running fio on an NVMe device on an AWS test instance with the 5.19
> kernel, IOs get stuck and fio never finishes
> >
> > fio command:
> > sudo fio --name=read_iops_test --filename=/dev/nvme1n1 --filesize=50G
> --time_based --ramp_time=2s --runtime=1m --ioengine=libaio --direct=1
> --verify=0 --randrepeat=0 --bs=16K --iodepth=256 --rw=randread
> > read_iops_test: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W)
> 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=256
> > fio-3.28
> > Starting 1 process
> > Jobs: 1 (f=0): [/(1)][-.-%][eta 01m:02s]
> >
> > IOs get completely stuck, and after a while the kernel log shows:
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.230970] INFO: task
> fio:2545 blocked for more than 120 seconds.
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.232878] Not tainted
> 5.19.0-43-generic #44~22.04.1-Ubuntu
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.234738] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237053] task:fio state:D
> stack: 0 pid: 2545 ppid: 2495 flags:0x00000002
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237057] Call Trace:
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237058] <TASK>
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237061]
> __schedule+0x257/0x5d0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237066] schedule+0x68/0x110
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237068]
> io_schedule+0x46/0x80
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237069]
> blk_mq_get_tag+0x117/0x300
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237072] ?
> destroy_sched_domains_rcu+0x40/0x40
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237076]
> __blk_mq_alloc_requests+0xc4/0x1e0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237079]
> blk_mq_get_new_requests+0xf6/0x1a0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237080]
> blk_mq_submit_bio+0x1eb/0x440
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237082]
> __submit_bio+0x109/0x1a0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237085]
> submit_bio_noacct_nocheck+0xc2/0x120
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237087]
> submit_bio_noacct+0x209/0x590
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237088]
> submit_bio+0x40/0xf0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237090]
> __blkdev_direct_IO_async+0x146/0x1f0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237092]
> blkdev_direct_IO.part.0+0x40/0xa0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237093]
> blkdev_read_iter+0x9f/0x1a0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237094] aio_read+0xec/0x1d0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237097] ?
> __io_submit_one.constprop.0+0x113/0x200
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237099]
> __io_submit_one.constprop.0+0x113/0x200
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237100] ?
> __io_submit_one.constprop.0+0x113/0x200
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237101]
> io_submit_one+0xe8/0x3d0
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237103]
> __x64_sys_io_submit+0x84/0x190
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237104] ?
> do_syscall_64+0x69/0x90
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237106] ?
> do_syscall_64+0x69/0x90
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237107]
> do_syscall_64+0x59/0x90
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237108] ?
> syscall_exit_to_user_mode+0x2a/0x50
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237110] ?
> do_syscall_64+0x69/0x90
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237111]
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237113] RIP:
> 0033:0x7f44f351ea3d
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237116] RSP:
> 002b:00007fff1dcfe558 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237117] RAX:
> ffffffffffffffda RBX: 00007f44f2272b68 RCX: 00007f44f351ea3d
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237118] RDX:
> 000056315d9ad828 RSI: 0000000000000001 RDI: 00007f44f224f000
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237119] RBP:
> 00007f44f224f000 R08: 00007f44e9430000 R09: 00000000000002d8
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R10:
> 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237120] R13:
> 0000000000000000 R14: 000056315d9ad828 R15: 000056315d9e1830
> > Jun 1 03:57:52 ip-172-31-39-141 kernel: [ 370.237122] </TASK>
> >
> > This issue cannot be reproduced on the 5.15 and 6.2 kernels
> >
> > From the call trace, the task was stuck for more than 120 seconds waiting
> for previous IOs to complete and free some tags, so that new IO requests
> could obtain tags
> > But in fact, not all previous IOs were stuck: at least some of them
> should have completed, yet the waiters were never woken up
> >
> > This issue is fixed by the upstream commit below, which was merged in
> kernel 6.1:
> > commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
> > Author: Keith Busch <kbusch at kernel.org>
> > Date: Fri Sep 9 11:40:22 2022 -0700
> >
> >     sbitmap: fix batched wait_cnt accounting
> >
> >     Batched completions can clear multiple bits, but we're only
> decrementing
> >     the wait_cnt by one each time. This can cause waiters to never be
> woken,
> >     stalling IO. Use the batched count instead.
> >
> >     Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
> >     Signed-off-by: Keith Busch <kbusch at kernel.org>
> >     Link:
> https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
> >     Signed-off-by: Jens Axboe <axboe at kernel.dk>
> >
> > To SRU the commit to the 5.19 kernel, we also need to SRU all of its
> dependencies and one further fix:
> >
>
> All of the commits, as you mention above, have landed upstream (6.1). As
> such, I would have expected the relevant footers, including your
> Signed-off-by:
>
> "(cherry picked from" ... or "(backported from ..."
> Signed-off-by: ...
>
> > 30514bd2dd4e sbitmap: fix lockup while swapping
>
> This was a clean cherry-pick from
> 30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0.
>
> > 4acb83417cad sbitmap: fix batched wait_cnt accounting
>
> Clean cherry-pick:
> 4acb83417cadfdcbe64215f9d0ddcf3132af808e.
>
> > c35227d4e8cb sbitmap: Use atomic_long_try_cmpxchg in
> __sbitmap_queue_get_batch
>
> Clean cherry-pick:
> c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa.
>
> > 48c033314f37 sbitmap: Avoid leaving waitqueue in invalid state in
> __sbq_wake_up()
>
> Clean cherry-pick:
> 48c033314f372478548203c583529f53080fd078.
>
> > bce1b56c7382 Revert "sbitmap: fix batched wait_cnt accounting"
>
> I don't understand the meaning of this revert. You are adding the
> commit and reverting it later. That revert is also upstream
> (bce1b56c73826fec8caf6187f0c922ede397a5a8) but is
> this "dance" needed for this submission?
>
> > 16ede66973c8 sbitmap: fix batched wait_cnt accounting
>
> 4acb83417cadfdcbe64215f9d0ddcf3132af808e doesn't apply cleanly. Your
> submission does, so this should have been mentioned as a backport with
> some details on its implications.
>
> > ddbfc34fcf5d sbitmap: remove unnecessary code in
> __sbitmap_queue_get_batch
>
> Clean cherry-pick:
> ddbfc34fcf5d0bc33b006b90c580c56edeb31068.
>
> > 040b83fcecfb sbitmap: fix possible io hung due to lost wakeup
>
> Clean cherry-pick:
> 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4.
>
> >
> > [ Test Plan ]
> >
> > This can be reproduced simply by launching an instance on AWS EC2
> > and running the fio command above on an NVMe device for a few hours to
> make sure IOs don’t get stuck
> >
> > I’ve built a test kernel with the above commits on the 5.19.0-43 generic
> kernel here:
> > https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/361041-generic
> >
> > With this test kernel, fio has been running for a few hours without any
> issues
> >
> > [ Where problems could occur ]
> >
> > sbitmap is mainly used by blk-mq in the block layer, by SCSI drivers,
> and by the Fungible ethernet driver
> > If any issue occurs in sbitmap, the symptom would be hung IO, or packets
> getting stuck in the Fungible driver
> >
> > Hugh Dickins (1):
> >   sbitmap: fix lockup while swapping
> >
> > Jan Kara (1):
> >   sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
> >
> > Jens Axboe (1):
> >   Revert "sbitmap: fix batched wait_cnt accounting"
> >
> > Keith Busch (2):
> >   sbitmap: fix batched wait_cnt accounting
> >   sbitmap: fix batched wait_cnt accounting
> >
> > Liu Song (1):
> >   sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
> >
> > Uros Bizjak (1):
> >   sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
> >
> > Yu Kuai (1):
> >   sbitmap: fix possible io hung due to lost wakeup
> >
> >  block/blk-mq-tag.c      |   2 +-
> >  include/linux/sbitmap.h |   3 +-
> >  lib/sbitmap.c           | 109 ++++++++++++++++++++++++++--------------
> >  3 files changed, 73 insertions(+), 41 deletions(-)
>
> I'll avoid NACK-ing for now as we could fix all these when we apply but
> I would resubmit.
>
> --
> Andrei Gherzan
>


More information about the kernel-team mailing list