ACK: [SRU][jammy/linux-aws][kinetic/linux-aws][PATCH 00/20] UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types

Marcelo Henrique Cerri marcelo.cerri at canonical.com
Wed Aug 17 14:00:27 UTC 2022


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


Acked-by: Marcelo Henrique Cerri <marcelo.cerri at canonical.com>

On Wed, Aug 17 2022, Gerald Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/1968062
>
> SRU Justification:
>
> [Impact]
>
> Hibernation currently fails for all AWS Xen instance types
> (c3/c4/i3/m3/m4/r3/r4/t2) with Jammy 5.15 and Kinetic 5.19 linux-aws kernels.
>
> When attempting to hibernate, the system gets stuck in sync_inodes_one_sb() when
> processing the rootfs, fails to hibernate, and shuts down. When you start the
> instance, it starts fresh, and does not resume from the incomplete hibernation
> image. Networking is also broken, and you cannot ssh in.
>
> Upon review of the jammy/linux-aws git log, it appears that the kernel is
> missing AWS hibernation enablement patches entirely. These need to be included
> to get hibernation working.
>
> [Fix]
>
> Hibernation currently works on the Amazon Linux 2 5.15 Kernel:
> https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline
>
> After careful review of the amazon-5.15.y/mainline branch, we have found the
> below set of patches, authored by the Amazon AWS Hibernation team, to be
> minimally sufficient to get hibernation working on both Jammy 5.15 and
> Kinetic 5.19.
>
> xen: Restore xen-pirqs on resume from hibernation
> xen-netfront: call netif_device_attach on resume
> xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
> xen: restore pirqs on resume from hibernation.
> block: xen-blkfront: consider new dom0 features on restore
> x86: tsc: avoid system instability in hibernation
> xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
> Revert "xen: dont fiddle with event channel masking in suspend/resume"
> PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> x86/xen: close event channels for PIRQs in system core suspend callback
> xen/events: add xen_shutdown_pirqs helper function
> x86/xen: save and restore steal clock
> xen/time: introduce xen_{save,restore}_steal_clock
> xen-netfront: add callbacks for PM suspend and hibernation support
> xen-blkfront: add callbacks for PM suspend and hibernation
> x86/xen: add system core suspend and resume callbacks
> x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
> xenbus: add freeze/thaw/restore callbacks support
> xen/manage: introduce helper function to know the on-going suspend mode
> xen/manage: keep track of the on-going suspend mode
>
> These patches will be carried as SAUCE patches, and their subjects marked with
> "UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, with the
> repo being the Amazon Linux 2 kernel repo.
>
> [Testcase]
>
> 1. Log into Amazon EC2.
> 2. Select Launch Instance.
> 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium.
> 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
> 5. Select your SSH keypair.
> 6. In storage, select 20 GB. Go to the advanced tab, and set Encrypted: Yes.
> 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
> 8. Create the Instance. SSH in.
> 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub.
> 10. Start a screen session. Echo some text and then detach with ctrl-a d.
> 11. Log out from instance.
> 12. In EC2, select "Instance State" > "Hibernate".
> 13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
> 14. Start the instance again.
> 15. SSH in.
> 16. Attempt to resume screen session with "screen -r".
>
> If you are not able to ssh into the instance, hibernation has failed. If ssh
> works and the screen session is still running, hibernation was successful.
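
As a complement to the screen check in steps 10 and 16, a minimal sketch of
a scripted check follows. It assumes that the kernel's boot ID in
/proc/sys/kernel/random/boot_id survives a successful resume (the memory
image is restored) and changes on a fresh boot. The helper names and the
marker file location are hypothetical and are not part of the patch set or
of hibinit-agent.

```python
from pathlib import Path

# Kernel-provided identifier, regenerated on every fresh boot.
BOOT_ID = Path("/proc/sys/kernel/random/boot_id")


def record_boot_id(marker, boot_id_path=BOOT_ID):
    """Save the current boot ID to a marker file on persistent storage.

    Run this before hibernating (around step 10). The marker path is a
    hypothetical choice; any file on the rootfs works.
    """
    current = Path(boot_id_path).read_text().strip()
    Path(marker).write_text(current + "\n")


def resumed_from_hibernation(marker, boot_id_path=BOOT_ID):
    """Return True if the boot ID still matches the recorded one.

    Run this after starting the instance again (around step 16). A
    mismatch suggests the instance booted fresh instead of resuming.
    """
    saved = Path(marker).read_text().strip()
    current = Path(boot_id_path).read_text().strip()
    return saved == current
```

For example, call record_boot_id("/var/tmp/boot_id.before") before step 11
and resumed_from_hibernation("/var/tmp/boot_id.before") after step 15. The
screen session check remains the test described in the cover letter; this
only adds a second signal.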
>
> Alternatively, the CPC team can run their Hibernation testsuite over Jammy and
> Kinetic.
>
> We have built test kernels for Jammy and Kinetic with the patches, and they are
> available in the below ppa:
>
> https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test
>
> If you try and hibernate and resume with the test kernels, hibernation is
> successful.
>
> [Where problems could occur]
>
> We are adding a significant amount of code to the Xen subsystem, spread across
> many commits. This code has not been mainlined, and is instead maintained out
> of tree by the Amazon AWS Hibernation team.
>
> The changes target hibernation, block devices, and clock devices, specific to
> those used on AWS Xen instances. Most of these patches have been applied to
> Xenial, Bionic, Focal and other series for a long time, but some patches are
> new for 5.15 onward.
>
> The changes will only target linux-aws to try and limit regression risk to
> AWS users, and any regressions will be limited to users of Xen based instance
> types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen 4.11.
>
> If a regression were to occur, the instance would likely fail to hibernate, and
> at worst, write an incomplete hibernation image to the swapfile. The kernel will
> see this on start, and instead of resuming from the hibernation image, will
> start fresh. It is unlikely to cause any filesystem corruption on the rootfs,
> but any in progress computations at the time of hibernation could be lost. The
> current broken behaviour breaks networking, and users would have to power cycle
> the instance a few times before they can ssh in again.
>
> Aleksei Besogonov (1):
>   PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
>
> Anchal Agarwal (4):
>   x86/xen: Introduce new function to map HYPERVISOR_shared_info on
>     Resume
>   Revert "xen: dont fiddle with event channel masking in suspend/resume"
>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>   xen: Restore xen-pirqs on resume from hibernation
>
> Eduardo Valentin (2):
>   x86: tsc: avoid system instability in hibernation
>   block: xen-blkfront: consider new dom0 features on restore
>
> Frank van der Linden (3):
>   xen: restore pirqs on resume from hibernation.
>   xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
>   xen-netfront: call netif_device_attach on resume
>
> Munehisa Kamata (10):
>   xen/manage: keep track of the on-going suspend mode
>   xen/manage: introduce helper function to know the on-going suspend
>     mode
>   xenbus: add freeze/thaw/restore callbacks support
>   x86/xen: add system core suspend and resume callbacks
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-netfront: add callbacks for PM suspend and hibernation support
>   xen/time: introduce xen_{save,restore}_steal_clock
>   x86/xen: save and restore steal clock
>   xen/events: add xen_shutdown_pirqs helper function
>   x86/xen: close event channels for PIRQs in system core suspend
>     callback
>
>  arch/x86/kernel/tsc.c             |  29 ++++++
>  arch/x86/xen/enlighten_hvm.c      |   8 ++
>  arch/x86/xen/suspend.c            |  67 +++++++++++++
>  arch/x86/xen/time.c               |   3 +
>  arch/x86/xen/xen-ops.h            |   2 +
>  drivers/block/xen-blkfront.c      | 161 ++++++++++++++++++++++++++++--
>  drivers/net/xen-netfront.c        | 104 ++++++++++++++++++-
>  drivers/xen/events/events_base.c  |  30 +++++-
>  drivers/xen/manage.c              |  73 ++++++++++++++
>  drivers/xen/time.c                |  29 +++++-
>  drivers/xen/xenbus/xenbus_probe.c |  99 +++++++++++++++---
>  include/linux/irq.h               |   2 +
>  include/linux/sched/clock.h       |   5 +
>  include/xen/events.h              |   2 +
>  include/xen/xen-ops.h             |   8 ++
>  include/xen/xenbus.h              |   3 +
>  kernel/irq/chip.c                 |   4 +-
>  kernel/power/user.c               |   4 +
>  kernel/sched/clock.c              |   4 +-
>  19 files changed, 604 insertions(+), 33 deletions(-)
>
> --
> 2.34.1


- --
Regards,
Marcelo
-----BEGIN PGP SIGNATURE-----

iQHQBAEBCgA6FiEExJjLjAfVL0XbfEr56e82LoessAkFAmL89IQcHG1hcmNlbG8u
Y2VycmlAY2Fub25pY2FsLmNvbQAKCRDp7zYuh6ywCbQEDACN0I8VXYFqkifpMJRR
tRha8MFWjSxE7DtnLPHlet1UjYNI5R14NfiLMvqnpZCYFPM5250cwwCKQbDuohSQ
sEQGzjrnU383BBHnBNT9UrCtVWzharWmHT2UfV8tfy0KQ2omv7XqgALF50uKSWfu
OjgSiqOZgrWu6zUXc5jt1hGitGZApe3QV+7VbK+5AUleB3ysPi345H0fidgfDpR6
0LIH8ZvHvFEy8kwDxZHtGUKmSSKbvGjSr6DvdA/t6jzgD2Qi/dgbTLdD+q7Atr3j
kck1/zncGHy86eKYhsyVj9RK8/+7wnBE6BtvxPTdQAR13zAbN4LZBzrVhM1QpTnj
YLBUKsBGwYrWnfnTWKDCPI/1tiKQ0j+zfFGoJQGYMjFsnQ7/S7Ap/Mg3EDTcNjfz
CoiuEEontgVLOyaZCDZdmuiOf9bcrDb1+nSEzR9FyJe2yJV95U9Y443WSW03eZoh
yobElb8TjprosWPf3uBTa1I63z3O4/ERnkBI4KpyRLETIU0=
=3A9z
-----END PGP SIGNATURE-----


