Cmnt: [SRU][F/aws][PATCH v2 0/6] aws: proper fix for c5.18xlarge hibernation issues

Andrea Righi andrea.righi at canonical.com
Wed May 19 14:44:29 UTC 2021


On Tue, May 18, 2021 at 03:41:03PM -0300, Guilherme Piccoli wrote:
> On Tue, May 18, 2021 at 12:26 PM Andrea Righi
> <andrea.righi at canonical.com> wrote:
> >
> > BugLink: https://bugs.launchpad.net/bugs/1920944
> >
> > [Impact]
> >
> > In LP: #1918694 we applied a fix and a workaround to solve the
> > hibernation issues on c5.18xlarge. The workaround was in the form of a
> > SAUCE patch:
> >
> >   "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> >
> > It looks like we can replace this workaround with a proper fix, by
> > applying this patch:
> >
> > http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/
> >
> > [Test plan]
> >
> > Create a c5.18xlarge instance, run the memory stress test script (the
> > same test script that we are using to stress test hibernation), trigger
> > the hibernate event, trigger the resume event. Repeat a couple of times
> > and the problem is very likely to happen.
> >
> > [Fix]
> >
> > Replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> > with:
> >
> > http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/
> >
> > The fix has been tested extensively in the AWS infrastructure with
> > positive results.
> >
> > [Where problems could occur]
> >
> > The new code introduced by the fix also runs when a CPU is taken
> > offline, so we may see regressions in KVM CPU hotplugging.
> >
> > ----------------------------------------------------------------
> > Changelog (v1 -> v2):
> >  - new patch set from Red Hat
> >
> > NOTE: backport activity was minimal, it only required some context
> > adjustments to properly apply the changes.
> >
> > Andrea Righi (1):
> >       Revert "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> >
> > Vitaly Kuznetsov (5):
> >       x86/kvm: Fix pr_info() for async PF setup/teardown
> >       x86/kvm: Teardown PV features on boot CPU as well
> >       x86/kvm: Disable kvmclock on all CPUs on shutdown
> >       x86/kvm: Disable all PV features on crash
> >       x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
> >
> >  arch/x86/include/asm/kvm_para.h |   9 ++----
> >  arch/x86/kernel/kvm.c           | 113 ++++++++++++++++++++++++++++++++++++++++++++----------------------
> >  arch/x86/kernel/kvmclock.c      |  28 ++---------------
> >  3 files changed, 79 insertions(+), 71 deletions(-)
> >
> >
> 
> Thanks Andrea, very good patchset to have in our kernels!
> I'm ready to ACK, but I'd like to clarify the following before:

Thanks for the review Guilherme!

> 
> (a) Should it be in 5.8/5.11 as well?

I would say yes, but we haven't tested the patches on 5.8 and 5.11 yet,
which is why I sent the patch set for F/aws only for now. The other
kernels will probably receive the patch set during the regular SRU
process.

> 
> (b) Should it be sent to main kernel and get pulled by all
> derivatives, or really only for -aws?

I would say only for aws for now, because they are hitting a specific
bug that this patch set fixes.

As with (a), the other kernels can pick it up through the regular SRU
process.

> 
> (c) Also, patches are upstream[0], so should we have the IDs in the commits?

Absolutely! Thanks for noticing it and my bad for not checking if they
landed upstream.

They should contain the proper "cherry-picked / backported" line. I'll
fix this and send a new patch set.
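
As a rough illustration of the kind of check that would have caught this,
here is a minimal Python sketch that looks for a provenance line in a
commit message. It is not part of the patch set. The function names and
the regular expression are assumptions made for the example; the pattern
assumes lines of the form "(cherry picked from commit <sha>)" (what
`git cherry-pick -x` appends) or "(backported from commit <sha>)",
optionally followed by a tree name before the closing parenthesis.

```python
import re

# Assumed provenance-line format (illustrative, not an official spec):
#   (cherry picked from commit <sha>)
#   (backported from commit <sha> <optional tree name>)
PROVENANCE_RE = re.compile(
    r"^\((cherry picked|backported) from commit ([0-9a-f]{12,40})(?: [^)]*)?\)$",
    re.MULTILINE,
)


def find_provenance(message):
    """Return a list of (kind, sha) pairs for each provenance line found."""
    return [(m.group(1), m.group(2)) for m in PROVENANCE_RE.finditer(message)]


def has_provenance(message):
    """True if the commit message carries at least one provenance line."""
    return bool(find_provenance(message))


if __name__ == "__main__":
    # Placeholder sha, not a real upstream commit ID.
    example = (
        "x86/kvm: example subject\n\n"
        "Body text.\n\n"
        "(cherry picked from commit 0123456789abcdef0123456789abcdef01234567)\n"
        "Signed-off-by: Someone <someone@example.com>\n"
    )
    print(find_provenance(example))
```

Feeding it the output of `git log -1 --format=%B` for each commit in the
series would flag any patch that is missing the line.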

-Andrea


