[SRU][F][PATCH 1/1] io_uring: ensure IOPOLL locks around deferred work

Fri Nov 8 02:53:33 UTC 2024

Hi,

I’d like to provide additional explanations for this patch, as it
differs significantly from both the upstream version and SUSE’s fix
for 5.3 [1].

The CVE affected kernel 5.4 in a unique way.
When io_uring is set up with the IORING_SETUP_IOPOLL flag and operates
on a filesystem implementing the iopoll function hook, the
io_uring_enter system call will use io_iopoll_check to retrieve
completed events.
If completed events are present, the io_iopoll_complete function is
called, which in turn calls io_commit_cqring without holding
completion_lock.

The io_commit_cqring function extracts entries from defer_list for processing.
The defer_list is appended to when a request is submitted with the
IOSQE_IO_DRAIN flag.
However, removing entries from defer_list without a lock risks
breaking the list structure.

To address this issue, completion_lock should be held before calling
io_commit_cqring.
It’s recommended to hold completion_lock using spin_lock_irq, as it’s
also used in the timer callback function (i.e., io_timeout_fn).
If I understand correctly, spin_lock_irqsave isn’t necessary here,
since io_iopoll_complete function isn’t used in scenarios where IRQs
may be masked.

io_iopoll_complete <- io_do_iopoll <- io_iopoll_getevents
    <- io_sq_thread
    <- io_iopoll_check <- io_uring_enter syscall
    <- io_iopoll_reap_events
        <- io_ring_ctx_free <- io_ring_ctx_wait_and_kill
        <- io_ring_ctx_wait_and_kill
            <- io_uring_create <- io_uring_setup <- io_uring_setup syscall
            <- io_uring_release <- io_uring_fops release function hook

Please let me know if there's anything I may have overlooked.

Best regards,
Chengen Du

[1] https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7

On Fri, Nov 8, 2024 at 10:50 AM Chengen Du <chengen.du at canonical.com> wrote:
>
> From: Jens Axboe <axboe at kernel.dk>
>
> CVE-2023-21400
>
> BugLink: https://bugs.launchpad.net/bugs/2078659
>
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
>
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
>
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
>
> Cc: stable at vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18 at gmail.com>
> Signed-off-by: Jens Axboe <axboe at kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du at canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>         struct io_rings *rings = ctx->rings;
>
> +       lockdep_assert_held(&ctx->completion_lock);
> +
>         if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>                 /* order cqe stores with ring update */
>                 smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>                 }
>         }
>
> +       spin_lock_irq(&ctx->completion_lock);
>         io_commit_cqring(ctx);
> +       spin_unlock_irq(&ctx->completion_lock);
>         io_free_req_many(ctx, reqs, &to_free);
>  }
>
> --
> 2.43.0
>