[SRU][F][PATCH 1/1] io_uring: ensure IOPOLL locks around deferred work

Chengen Du chengen.du at canonical.com
Thu Nov 21 07:21:56 UTC 2024


Hi Guoqing,

Thank you for your response.

Before submitting this patch, I analyzed the level of effort required
to make our patch align more closely with SUSE’s implementation.
I understand that narrowing the lock range would be beneficial, but
there are several considerations to take into account.

Regarding commit 810e401b34c4 (io_uring: ensure IOPOLL locks around
deferred work), it uses the io_commit_needs_flush function to
determine whether to invoke __io_commit_cqring_flush.
In the Focal version, we lack the attributes used in the
io_commit_needs_flush logic.
Additionally, the __io_commit_cqring function in Focal includes tasks
beyond simply updating rings->cq.tail.
This raises uncertainty about the tangible benefits of separating the
logic in our context.

Moreover, the commits you've referenced cannot be backported unchanged.
We would need to investigate them in detail to prevent regressions.
The execution order within the io_commit_cqring function differs as
well—__io_commit_cqring is invoked in the middle of the flow instead
of at the head or tail.
To narrow the lock range as SUSE has done, we would need to backport
other commits unrelated to the CVE to adjust the flow accordingly.

Another factor to consider is that the liburing package, which
provides user-space helpers for kernel io_uring functionality, is not
available in Focal.
The CVE in question has also existed for some time in Focal.
Therefore, we need to weigh the necessity of backporting multiple
unrelated commits to address a CVE that may not significantly impact
many users, especially given the risk of introducing regressions to
io_uring functionality.

These are my initial thoughts, though I recognize there may be aspects
I have overlooked.
If you believe it is worthwhile to proceed with separating the logic,
I am happy to focus on that approach.

Best regards,
Chengen Du

On Tue, Nov 19, 2024 at 4:31 PM Guoqing Jiang
<guoqing.jiang at canonical.com> wrote:
>
> Hi,
>
> On 11/8/24 10:49, Chengen Du wrote:
> > From: Jens Axboe <axboe at kernel.dk>
> >
> > CVE-2023-21400
> >
> > BugLink: https://bugs.launchpad.net/bugs/2078659
> >
> > No direct upstream commit exists for this issue. It was fixed in
> > 5.18 as part of a larger rework of the completion side.
> >
> > io_commit_cqring() writes the CQ ring tail to make it visible, but it
> > also kicks off any deferred work we have. A ring setup with IOPOLL
> > does not need any locking around the CQ ring updates, as we're always
> > under the ctx uring_lock. But if we have deferred work that needs
> > processing, then io_queue_deferred() assumes that the completion_lock
> > is held, as it is for !IOPOLL.
> >
> > Add a lockdep assertion to check and document this fact, and have
> > io_iopoll_complete() check if we have deferred work and run that
> > separately with the appropriate lock grabbed.
>
> IIUC, the change in this patch doesn't align with the above description.
>
> > Cc: stable at vger.kernel.org # 5.10, 5.15
> > Reported-by: dghost david <daviduniverse18 at gmail.com>
> > Signed-off-by: Jens Axboe <axboe at kernel.dk>
> > Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> > (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> > [chengen - apply the locking logic to the corresponding functions /
> > use spin_lock_irq because the same lock is used in the timer callback
> > function.]
> > Signed-off-by: Chengen Du <chengen.du at canonical.com>
> > ---
> >   fs/io_uring.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 875dd8e0f766..aa75c32f2d16 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
> >   {
> >       struct io_rings *rings = ctx->rings;
> >
> > +     lockdep_assert_held(&ctx->completion_lock);
> > +
> >       if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
> >               /* order cqe stores with ring update */
> >               smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> > @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
> >               }
> >       }
> >
> > +     spin_lock_irq(&ctx->completion_lock);
> >       io_commit_cqring(ctx);
> > +     spin_unlock_irq(&ctx->completion_lock);
> >       io_free_req_many(ctx, reqs, &to_free);
> >   }
>
> For focal, io_commit_cqring was implemented between L577 [1] and L595
> [2]. I think the part
> from L581 to L582 have been changed to io_flush_timeouts() per commit
> 360428f8c0cd
> ("io_uring: move timeouts flushing to a helper").  And
> io_queue_deferred() was based on the
> code between L586 and L594 per these commits:
>
> 045189452210 io_uring: separate DRAIN flushing into a cold path
> 87987898a1db io_uring: remove REQ_F_IO_DRAINED
> 1b4a51b6d03d io_uring: drain next sqe instead of shadowing
>
> Personally, I'd more prefer similar change as in SUSE's fix instead of
> lock the whole io_commit_cqring function.
> [1].
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n577
> [2].
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n595
>
> Thanks,
> Guoqing



More information about the kernel-team mailing list