ACK: [SRU Focal 0/3] LP: #1996678 - containerd regressions

Kamal Mostafa kamal at canonical.com
Wed Nov 23 19:41:44 UTC 2022


Acked-by: Kamal Mostafa <kamal at canonical.com>

LGTM.

 -Kamal

On Wed, Nov 23, 2022 at 9:50 AM Thadeu Lima de Souza Cascardo <
cascardo at canonical.com> wrote:

> [Impact]
> Multiple users and partners have reported containerd/k8s/runc failures.
> These manifest as containers being in the unknown state and failures
> to communicate with containerd (response timeouts).
>
> [Fixes]
> Some reports attribute the failure to epoll, in particular commit
> a16ceb139610.
> Others have attribute containerd failures to splice commit 97ef77c52b78.
>
> The epoll commit was applied upstream after many other fixes were applied
> to
> epoll. Those other fixes prevent race conditions between epoll_pwait
> timing out
> and wakeups being missed. a16ceb139610 upstream would make such races
> produce
> extra events, not miss events. But on current 5.4 kernel, specially without
> commit 289caf5d8f6c, this would result in events more likely being missed.
>
> Some tests have shown that when applying 65759097d804 and 289caf5d8f6c and
> not
> reverting a16ceb139610, the number of responses timeouts compared to either
> Ubuntu-5.4.0-131 (without a16ceb139610) and 5.4.0-132 (with a16ceb139610)
> go
> down.
>
> Also, 5.15 carry some epoll kselftests and at least epoll61 test passes
> once
> those fixes are applied, and no other regressions in that test suite are
> found.
> That particular test is demonstrating the described race condition from
> commit
> 289caf5d8f6c.
>
> 97ef77c52b78, on the other hand, has also been reverted on upstream 5.4.y
> as it
> depended on a lot of other changes that have not been backported or
> targeted to
> 5.4. In particular, the reasoning upstream was NFS regressions.
>
> In light of all this, even though these fixes may not be necessary to fix
> the
> reported containerd bugs, they should go into all 5.4 kernels anyway as
> they
> fix other real bugs.
>
> [URGENCY]
> Applying these fixes as soon as possible is in line with a strategy of
> getting
> fixes out so they can be tested and verified, particularly for those users
> who
> require offical Ubuntu builds. In case these cause more harm or do not fix
> the
> regressions, they can be reverted and better fixes applied instead.
>
>
> Roman Penyaev (1):
>   epoll: call final ep_events_available() check under the lock
>
> Sasha Levin (1):
>   Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
>
> Soheil Hassas Yeganeh (1):
>   epoll: check for events when removing a timed out thread from the wait
>     queue
>
>  fs/eventpoll.c | 69 ++++++++++++++++++++++++++++++--------------------
>  fs/splice.c    | 10 +++++---
>  2 files changed, 48 insertions(+), 31 deletions(-)
>
> --
> 2.34.1
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20221123/02a2d978/attachment-0001.html>


More information about the kernel-team mailing list