[SRU][Focal][PATCH] Revert "epoll: autoremove wakers even more aggressively"
Khalid Elmously
khalid.elmously at canonical.com
Tue Nov 22 19:57:24 UTC 2022
BugLink: https://bugs.launchpad.net/bugs/1996678
This reverts commit bcf91619e32fe584ecfafa49a3db3d1db4ff70b2.
That commit is suspected of causing regressions such as the one reported in the BugLink, as well as
containerd/runc regressions reported on Azure and elsewhere, e.g.:
https://canonical.lightning.force.com/lightning/r/Case/5004K00000OnSZDQA3/view
https://github.com/opencontainers/runc/issues/3641
https://www.spinics.net/lists/kernel/msg4565924.html
Investigation is ongoing, but there is high confidence that
bcf91619e32fe584ecfafa49a3db3d1db4ff70b2 is indeed the cause.
Signed-off-by: Khalid Elmously <khalid.elmously at canonical.com>
---
fs/eventpoll.c | 22 ----------------------
1 file changed, 22 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7e11135bc915c..339453ac834cc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1803,21 +1803,6 @@ static inline struct timespec64 ep_set_mstimeout(long ms)
return timespec64_add_safe(now, ts);
}
-/*
- * autoremove_wake_function, but remove even on failure to wake up, because we
- * know that default_wake_function/ttwu will only fail if the thread is already
- * woken, and in that case the ep_poll loop will remove the entry anyways, not
- * try to reuse it.
- */
-static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
- unsigned int mode, int sync, void *key)
-{
- int ret = default_wake_function(wq_entry, mode, sync, key);
-
- list_del_init(&wq_entry->entry);
- return ret;
-}
-
/**
* ep_poll - Retrieves ready events, and delivers them to the caller supplied
* event buffer.
@@ -1895,15 +1880,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
* normal wakeup path no need to call __remove_wait_queue()
* explicitly, thus ep->lock is not taken, which halts the
* event delivery.
- *
- * In fact, we now use an even more aggressive function that
- * unconditionally removes, because we don't reuse the wait
- * entry between loop iterations. This lets us also avoid the
- * performance issue if a process is killed, causing all of its
- * threads to wake up without being removed normally.
*/
init_wait(&wait);
- wait.func = ep_autoremove_wake_function;
write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
write_unlock_irq(&ep->lock);
--
2.34.1