ACK: [SRU Focal 0/3] LP: #1996678 - containerd regressions

Khaled Elmously khalid.elmously at canonical.com
Wed Nov 23 19:29:47 UTC 2022


On 2022-11-23 14:48:33 , Thadeu Lima de Souza Cascardo wrote:
> [Impact]
> Multiple users and partners have reported containerd/k8s/runc failures.
> These manifest as containers being in the unknown state and failures
> to communicate with containerd (response timeouts).
> 
> [Fixes]
> Some reports attribute the failure to epoll, in particular commit a16ceb139610.
> Others have attributed containerd failures to splice commit 97ef77c52b78.
> 
> The epoll commit was applied upstream after many other fixes were applied to
> epoll. Those other fixes prevent race conditions between epoll_pwait timing out
> and wakeups being missed. a16ceb139610 upstream would make such races produce
> extra events, not miss events. But on the current 5.4 kernel, especially
> without commit 289caf5d8f6c, it would make events more likely to be missed.
> 
> Some tests have shown that when applying 65759097d804 and 289caf5d8f6c and not
> reverting a16ceb139610, the number of response timeouts goes down compared to
> both Ubuntu-5.4.0-131 (without a16ceb139610) and 5.4.0-132 (with
> a16ceb139610).
> 
> Also, 5.15 carries some epoll kselftests, and at least the epoll61 test passes
> once those fixes are applied, with no other regressions found in that test
> suite. That particular test demonstrates the race condition described in
> commit 289caf5d8f6c.
> 
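
For readers following along, the userspace shape of the race described above
can be sketched roughly as below. This is only an illustration under stated
assumptions: it is not the epoll61 kselftest and not a reproducer for the
reported containerd failures. The helper name wait_for_event and the 10 ms /
11 ms timings are hypothetical, chosen so that the write lands close to the
moment the timed wait expires. The pipe is registered level-triggered, so a
later poll should still see the byte; the sketch shows where the window is,
not the missed wakeup itself.

```python
import os
import select
import threading
import time


def wait_for_event(timeout_s=0.011, write_delay_s=0.010, deadline_s=2.0):
    """Poll a pipe with a short timeout while another thread writes to it.

    Returns (seen, timeouts): whether the event was eventually reported,
    and how many polls timed out before that happened.
    Timings are illustrative assumptions, not values from the SRU.
    """
    rfd, wfd = os.pipe()
    ep = select.epoll()
    ep.register(rfd, select.EPOLLIN)  # level-triggered registration

    def writer():
        # Make the fd readable close to the moment the waiter's timeout fires.
        time.sleep(write_delay_s)
        os.write(wfd, b"x")

    t = threading.Thread(target=writer)
    t.start()

    seen = False
    timeouts = 0
    end = time.monotonic() + deadline_s
    try:
        while time.monotonic() < end:
            events = ep.poll(timeout_s)  # timeout is given in seconds
            if events:
                seen = True
                break
            # The wait expired with nothing reported; try again.
            timeouts += 1
    finally:
        t.join()
        ep.close()
        os.close(rfd)
        os.close(wfd)
    return seen, timeouts


if __name__ == "__main__":
    seen, timeouts = wait_for_event()
    print("event seen:", seen, "timed-out polls before that:", timeouts)
```

A caller that only does a single blocking or edge-triggered wait has no such
retry loop to fall back on, which is why a wakeup lost in that window shows up
as a timeout rather than a short delay.
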
> 97ef77c52b78, on the other hand, has also been reverted on upstream 5.4.y as it
> depended on a lot of other changes that have not been backported or targeted to
> 5.4. In particular, the reason given upstream was NFS regressions.
> 
> In light of all this, even though these fixes may not be necessary to fix the
> reported containerd bugs, they should go into all 5.4 kernels anyway as they
> fix other real bugs.
> 
> [Urgency]
> Applying these fixes as soon as possible is in line with a strategy of getting
> fixes out so they can be tested and verified, particularly for those users who
> require official Ubuntu builds. In case these cause more harm or do not fix the
> regressions, they can be reverted and better fixes applied instead.
> 
> 
> Roman Penyaev (1):
>   epoll: call final ep_events_available() check under the lock
> 
> Sasha Levin (1):
>   Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
> 
> Soheil Hassas Yeganeh (1):
>   epoll: check for events when removing a timed out thread from the wait
>     queue
> 
>  fs/eventpoll.c | 69 ++++++++++++++++++++++++++++++--------------------
>  fs/splice.c    | 10 +++++---
>  2 files changed, 48 insertions(+), 31 deletions(-)
> 
> -- 
> 2.34.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team


