[SRU][F/aws][PATCH 0/2] aws: proper fix for c5.18xlarge hibernation issues

Andrea Righi andrea.righi at canonical.com
Thu Apr 8 07:54:25 UTC 2021


On Thu, Apr 08, 2021 at 08:09:39AM +0200, Stefan Bader wrote:
> On 23.03.21 18:02, Andrea Righi wrote:
> > On Tue, Mar 23, 2021 at 04:46:25PM +0000, Colin Ian King wrote:
> > > On 23/03/2021 16:15, Andrea Righi wrote:
> > > > BugLink: https://bugs.launchpad.net/bugs/1920944
> > > > 
> > > > [Impact]
> > > > 
> > > > In LP: #1918694 we applied a fix and a workaround to solve the
> > > > hibernation issues on c5.18xlarge. The workaround was in the form of a
> > > > SAUCE patch:
> > > > 
> > > >    "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> > > > 
> > > > It looks like we can replace this workaround with a proper fix, by
> > > > applying this patch:
> > > > https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#m7533e1d1e551bff425da029fd401bd87935edc33
> > > > 
> > > > [Test plan]
> > > > 
> > > > Create a c5.18xlarge instance, run the memory stress test script (the
> > > > same test script that we are using to stress test hibernation), trigger
> > > > the hibernate event, trigger the resume event. Repeat a couple of times
> > > > and the problem is very likely to happen.
> > > > 
> > > > [Fix]
> > > > 
> > > > Replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> > > > with:
> > > > 
> > > > https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#m7533e1d1e551bff425da029fd401bd87935edc33
> > > 
> > > There has been a follow-up comment on this fix:
> > > 
> > > https://lore.kernel.org/kvm/87sg4t7vqy.fsf@vitty.brq.redhat.com/T/#e7533e1d1e551bff425da029fd401bd87935edc33
> > > 
> > > should we wait for a V2 of this fix?
> > 
> > I can try to ping the author of the patch to check if he's planning to
> > send a v2 soon. The v1 has already been tested in AWS with positive
> > results; however, I think there's no reason to rush and apply this ASAP,
> > because we already have the kvm clock workaround applied and it seems to
> > be enough to prevent the problem from happening.
> > 
> > If we need to respin the kernel for any reason, maybe it would make
> > sense to apply this patch, which is still better than the SAUCE
> > workaround (in the end, the follow-up comments are not addressing
> > anything critical; the only relevant comment is probably the last one,
> > about a failure path). Otherwise, it's probably a good idea to wait for
> > a v2.
> > 
> > Thanks,
> > -Andrea
> > 
> Was there any update on this?

The author mentioned that he is going to post a new version of this patch
(v3) soon, but I haven't seen it yet. I'll keep watching LKML for his
patches.

-Andrea


