ACK: [SRU Focal 0/3] LP: #1996678 - containerd regressions
Khaled Elmously
khalid.elmously at canonical.com
Wed Nov 23 19:31:27 UTC 2022
Adding my signoff properly:
Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
On 2022-11-23 14:29:49 , Khaled Elmously wrote:
> On 2022-11-23 14:48:33 , Thadeu Lima de Souza Cascardo wrote:
> > [Impact]
> > Multiple users and partners have reported containerd/k8s/runc failures.
> > These manifest as containers being in the unknown state and failures
> > to communicate with containerd (response timeouts).
> >
> > [Fixes]
> > Some reports attribute the failure to epoll, in particular commit a16ceb139610.
> > Others have attribute containerd failures to splice commit 97ef77c52b78.
> >
> > The epoll commit was applied upstream after many other fixes were applied to
> > epoll. Those other fixes prevent race conditions between epoll_pwait timing out
> > and wakeups being missed. a16ceb139610 upstream would make such races produce
> > extra events, not miss events. But on current 5.4 kernel, specially without
> > commit 289caf5d8f6c, this would result in events more likely being missed.
> >
> > Some tests have shown that when applying 65759097d804 and 289caf5d8f6c and not
> > reverting a16ceb139610, the number of responses timeouts compared to either
> > Ubuntu-5.4.0-131 (without a16ceb139610) and 5.4.0-132 (with a16ceb139610) go
> > down.
> >
> > Also, 5.15 carry some epoll kselftests and at least epoll61 test passes once
> > those fixes are applied, and no other regressions in that test suite are found.
> > That particular test is demonstrating the described race condition from commit
> > 289caf5d8f6c.
> >
> > 97ef77c52b78, on the other hand, has also been reverted on upstream 5.4.y as it
> > depended on a lot of other changes that have not been backported or targeted to
> > 5.4. In particular, the reasoning upstream was NFS regressions.
> >
> > In light of all this, even though these fixes may not be necessary to fix the
> > reported containerd bugs, they should go into all 5.4 kernels anyway as they
> > fix other real bugs.
> >
> > [URGENCY]
> > Applying these fixes as soon as possible is in line with a strategy of getting
> > fixes out so they can be tested and verified, particularly for those users who
> > require offical Ubuntu builds. In case these cause more harm or do not fix the
> > regressions, they can be reverted and better fixes applied instead.
> >
> >
> > Roman Penyaev (1):
> > epoll: call final ep_events_available() check under the lock
> >
> > Sasha Levin (1):
> > Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
> >
> > Soheil Hassas Yeganeh (1):
> > epoll: check for events when removing a timed out thread from the wait
> > queue
> >
> > fs/eventpoll.c | 69 ++++++++++++++++++++++++++++++--------------------
> > fs/splice.c | 10 +++++---
> > 2 files changed, 48 insertions(+), 31 deletions(-)
> >
> > --
> > 2.34.1
> >
> >
> > --
> > kernel-team mailing list
> > kernel-team at lists.ubuntu.com
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team
More information about the kernel-team
mailing list