[SRU][F/aws][PATCH 0/2] aws: proper fix for c5.18xlarge hibernation issues
Andrea Righi
andrea.righi at canonical.com
Tue Mar 23 17:02:19 UTC 2021
On Tue, Mar 23, 2021 at 04:46:25PM +0000, Colin Ian King wrote:
> On 23/03/2021 16:15, Andrea Righi wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1920944
> >
> > [Impact]
> >
> > In LP: #1918694 we applied a fix and a workaround to solve the
> > hibernation issues on c5.18xlarge. The workaround was in the form of a
> > SAUCE patch:
> >
> > "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> >
> > It looks like we can replace this workaround with a proper fix, by
> > applying this patch:
> > https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#m7533e1d1e551bff425da029fd401bd87935edc33
> >
> > [Test plan]
> >
> > Create a c5.18xlarge instance, run the memory stress test script (the
> > same test script that we are using to stress test hibernation), trigger
> > the hibernate event, trigger the resume event. Repeat a couple of times
> > and the problem is very likely to happen.
> >
> > [Fix]
> >
> > Replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> > with:
> >
> > https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#m7533e1d1e551bff425da029fd401bd87935edc33
>
> There has been a follow-up comment on this fix:
>
> https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#e7533e1d1e551bff425da029fd401bd87935edc33
>
> should we wait for a V2 of this fix?
I can ping the author of the patch to check whether he's planning to
send a v2 soon. The v1 has already been tested in AWS with positive
results, but I don't think there's any reason to rush and apply it ASAP:
the kvm clock workaround is already applied and it seems to be enough to
prevent the problem from happening.

If we need to respin the kernel for any reason, it may make sense to
apply this patch anyway, since it is still better than the SAUCE
workaround (in the end, the follow-up comments are not addressing
anything critical; the only relevant comment is probably the last one,
about a failure path). Otherwise, it's probably a good idea to wait for
a v2.
Thanks,
-Andrea