[SRU OEM-5.17] io_uring: ensure IOPOLL locks around deferred work

Thu Aug 3 21:37:11 UTC 2023

From: Jens Axboe <axboe at kernel.dk>

No direct upstream commit exists for this issue. It was fixed in
5.18 as part of a larger rework of the completion side.

io_commit_cqring() writes the CQ ring tail to make it visible, but it
also kicks off any deferred work we have. A ring setup with IOPOLL
does not need any locking around the CQ ring updates, as we're always
under the ctx uring_lock. But if we have deferred work that needs
processing, then io_queue_deferred() assumes that the completion_lock
is held, as it is for !IOPOLL.

Add a lockdep assertion to check and document this fact, and have
io_iopoll_complete() check if we have deferred work and run that
separately with the appropriate lock grabbed.

Cc: stable at vger.kernel.org # 5.10, 5.15
Reported-by: dghost david <daviduniverse18 at gmail.com>
Signed-off-by: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
(backported from commit fb348857e7b67eefe365052f1423427b66dedbf3 linux-5.15.y)
[cascardo: moved hunk from io_iopoll_completed do io_do_iopoll]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo at canonical.com>
---
 fs/io_uring.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 13d7c625513f..635fcf79dcab 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1604,6 +1604,8 @@ static void io_kill_timeout(struct io_kiocb *req, int status)
 
 static __cold void io_queue_deferred(struct io_ring_ctx *ctx)
 {
+	lockdep_assert_held(&ctx->completion_lock);
+
 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
@@ -1655,14 +1657,24 @@ static __cold void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
 		io_queue_deferred(ctx);
 }
 
-static inline void io_commit_cqring(struct io_ring_ctx *ctx)
+static inline bool io_commit_needs_flush(struct io_ring_ctx *ctx)
+{
+	return ctx->off_timeout_used || ctx->drain_active;
+}
+
+static inline void __io_commit_cqring(struct io_ring_ctx *ctx)
 {
-	if (unlikely(ctx->off_timeout_used || ctx->drain_active))
-		__io_commit_cqring_flush(ctx);
 	/* order cqe stores with ring update */
 	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
 }
 
+static inline void io_commit_cqring(struct io_ring_ctx *ctx)
+{
+	if (unlikely(io_commit_needs_flush(ctx)))
+		__io_commit_cqring_flush(ctx);
+	__io_commit_cqring(ctx);
+}
+
 static inline bool io_sqring_full(struct io_ring_ctx *ctx)
 {
 	struct io_rings *r = ctx->rings;
@@ -2627,7 +2639,12 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 	if (unlikely(!nr_events))
 		return 0;
 
-	io_commit_cqring(ctx);
+	if (io_commit_needs_flush(ctx)) {
+		spin_lock(&ctx->completion_lock);
+		__io_commit_cqring_flush(ctx);
+		spin_unlock(&ctx->completion_lock);
+	}
+	__io_commit_cqring(ctx);
 	io_cqring_ev_posted_iopoll(ctx);
 	pos = start ? start->next : ctx->iopoll_list.first;
 	wq_list_cut(&ctx->iopoll_list, prev, start);
-- 
2.34.1