[SRU][jammy/linux-aws][kinetic/linux-aws][PATCH 00/20] UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types
Gerald Yang
gerald.yang at canonical.com
Wed Aug 17 08:51:26 UTC 2022
BugLink: https://bugs.launchpad.net/bugs/1968062
SRU Justification:
[Impact]
Hibernation currently fails for all AWS Xen instance types
(c3/c4/i3/m3/m4/r3/r4/t2) with Jammy 5.15 and Kinetic 5.19 linux-aws kernels.
When attempting to hibernate, the system gets stuck in sync_inodes_one_sb() when
processing the rootfs, fails to hibernate, and shuts down. When you start the
instance, it starts fresh, and does not resume from the incomplete hibernation
image. Networking is also broken, and you cannot ssh in.
Upon review of the jammy/linux-aws git log, it appears that the kernel is
missing AWS hibernation enablement patches entirely. These need to be included
to get hibernation working.
[Fix]
Hibernation currently works on the Amazon Linux 2 5.15 Kernel:
https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline
After careful review of the amazon-5.15.y/mainline branch, we have found the
below set of patches, authored by the Amazon AWS Hibernation team, to be
minimally sufficient to get hibernation working on both Jammy 5.15 and Kinetic 5.19.
xen: Restore xen-pirqs on resume from hibernation
xen-netfront: call netif_device_attach on resume
xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
xen: restore pirqs on resume from hibernation.
block: xen-blkfront: consider new dom0 features on restore
x86: tsc: avoid system instability in hibernation
xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
Revert "xen: dont fiddle with event channel masking in suspend/resume"
PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
x86/xen: close event channels for PIRQs in system core suspend callback
xen/events: add xen_shutdown_pirqs helper function
x86/xen: save and restore steal clock
xen/time: introduce xen_{save,restore}_steal_clock
xen-netfront: add callbacks for PM suspend and hibernation support
xen-blkfront: add callbacks for PM suspend and hibernation
x86/xen: add system core suspend and resume callbacks
x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
xenbus: add freeze/thaw/restore callbacks support
xen/manage: introduce helper function to know the on-going suspend mode
xen/manage: keep track of the on-going suspend mode
These patches will be carried as SAUCE patches, and their subjects marked with
"UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, with the
repo being the Amazon Linux 2 kernel repo.
[Testcase]
1. Log into Amazon EC2.
2. Select Launch Instance.
3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium.
4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
5. Select your SSH keypair.
6. In storage, select 20 GB. Go to the advanced tab, and set Encrypted: Yes.
7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
8. Create the Instance. SSH in.
9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub.
10. Start a screen session. Echo some text and then detach with ctrl-a d.
11. Log out from instance.
12. In EC2, select "Instance State" > "Hibernate".
13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
14. Start the instance again.
15. SSH in.
16. Attempt to resume screen session with "screen -r".
If you are not able to ssh into the instance, hibernation has failed. If ssh
works and the screen session is still running, hibernation was successful.
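
Before triggering hibernation in step 12, it can help to confirm that the wait
in step 9 was long enough. The following is a minimal Python sketch, not part
of the SRU itself. It assumes that hibinit-agent activates the /swap-hibinit
swapfile once it has created it, so that the file shows up in /proc/swaps. The
function names are hypothetical and exist only for illustration.

```python
from pathlib import Path


def swapfile_active(proc_swaps_text: str, swapfile: str = "/swap-hibinit") -> bool:
    """Return True if `swapfile` is listed as an active swap area.

    `proc_swaps_text` is the content of /proc/swaps: one header row,
    then one row per swap area with the path in the first column.
    """
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == swapfile:
            return True
    return False


def host_ready_for_hibernate(proc_swaps: str = "/proc/swaps") -> bool:
    """Check the running system; assumes the swapfile is swapped on when ready."""
    path = Path(proc_swaps)
    if not path.exists():
        return False
    return swapfile_active(path.read_text())
```

Running host_ready_for_hibernate() on the instance and getting False suggests
that hibinit-agent has not finished yet, or that the assumption above does not
hold for the agent version in use.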
Alternatively, the CPC team can run their Hibernation testsuite over Jammy and
Kinetic.
We have built test kernels for Jammy and Kinetic with the patches, and they are
available in the below ppa:
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test
Hibernating and resuming with the test kernels succeeds.
[Where problems could occur]
We are adding a significant amount of code to the Xen subsystem, spread across
many commits. This code has not been mainlined, and is instead maintained out
of tree by the Amazon AWS Hibernation team.
The changes target hibernation, block devices, and clock devices, specific to
those used on AWS Xen instances. Most of these patches have been applied to
Xenial, Bionic, Focal and other series for a long time, but some patches are
new for 5.15 onward.
The changes will only target linux-aws to try and limit regression risk to
AWS users, and any regressions will be limited to users of Xen based instance
types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen 4.11.
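
To decide whether a given instance falls inside that regression scope, the
family prefix of its instance type can be compared against the list above. A
minimal Python sketch, assuming the eight families named in this cover letter
are the ones of interest (it is not a complete list of Xen based instance
types, and the names below are hypothetical):

```python
# Families taken from this cover letter; not an exhaustive list of Xen types.
XEN_FAMILIES = frozenset({"c3", "c4", "i3", "m3", "m4", "r3", "r4", "t2"})


def is_affected_instance_type(instance_type: str) -> bool:
    """Return True if e.g. "t2.medium" belongs to one of the listed families."""
    # The family is everything before the first dot: "t2.medium" -> "t2".
    family = instance_type.strip().lower().partition(".")[0]
    return family in XEN_FAMILIES
```

The comparison is on the whole family string, so a type such as "i3en.large"
does not match "i3".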
If a regression were to occur, the instance would likely fail to hibernate, and
at worst, write an incomplete hibernation image to the swapfile. The kernel will
see this on start, and instead of resuming from the hibernation image, will
start fresh. It is unlikely to cause any filesystem corruption on the rootfs,
but any in-progress computations at the time of hibernation could be lost. The
current broken behaviour breaks networking, and users would have to power cycle
the instance a few times before they can ssh in again.
Aleksei Besogonov (1):
PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
Anchal Agarwal (4):
x86/xen: Introduce new function to map HYPERVISOR_shared_info on
Resume
Revert "xen: dont fiddle with event channel masking in suspend/resume"
xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
xen: Restore xen-pirqs on resume from hibernation
Eduardo Valentin (2):
x86: tsc: avoid system instability in hibernation
block: xen-blkfront: consider new dom0 features on restore
Frank van der Linden (3):
xen: restore pirqs on resume from hibernation.
xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
xen-netfront: call netif_device_attach on resume
Munehisa Kamata (10):
xen/manage: keep track of the on-going suspend mode
xen/manage: introduce helper function to know the on-going suspend
mode
xenbus: add freeze/thaw/restore callbacks support
x86/xen: add system core suspend and resume callbacks
xen-blkfront: add callbacks for PM suspend and hibernation
xen-netfront: add callbacks for PM suspend and hibernation support
xen/time: introduce xen_{save,restore}_steal_clock
x86/xen: save and restore steal clock
xen/events: add xen_shutdown_pirqs helper function
x86/xen: close event channels for PIRQs in system core suspend
callback
arch/x86/kernel/tsc.c | 29 ++++++
arch/x86/xen/enlighten_hvm.c | 8 ++
arch/x86/xen/suspend.c | 67 +++++++++++++
arch/x86/xen/time.c | 3 +
arch/x86/xen/xen-ops.h | 2 +
drivers/block/xen-blkfront.c | 161 ++++++++++++++++++++++++++++--
drivers/net/xen-netfront.c | 104 ++++++++++++++++++-
drivers/xen/events/events_base.c | 30 +++++-
drivers/xen/manage.c | 73 ++++++++++++++
drivers/xen/time.c | 29 +++++-
drivers/xen/xenbus/xenbus_probe.c | 99 +++++++++++++++---
include/linux/irq.h | 2 +
include/linux/sched/clock.h | 5 +
include/xen/events.h | 2 +
include/xen/xen-ops.h | 8 ++
include/xen/xenbus.h | 3 +
kernel/irq/chip.c | 4 +-
kernel/power/user.c | 4 +
kernel/sched/clock.c | 4 +-
19 files changed, 604 insertions(+), 33 deletions(-)
--
2.34.1