APPLIED: [SRU][jammy/linux-aws][kinetic/linux-aws][PATCH 00/20] UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types

Tim Gardner tim.gardner at canonical.com
Wed Aug 17 14:59:34 UTC 2022


On 8/17/22 02:51, Gerald Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/1968062
> 
> SRU Justification:
> 
> [Impact]
> 
> Hibernation currently fails for all AWS Xen instance types
> (c3/c4/i3/m3/m4/r3/r4/t2) with Jammy 5.15 and Kinetic 5.19 linux-aws kernels.
> 
> When attempting to hibernate, the system gets stuck in sync_inodes_one_sb() when
> processing the rootfs, fails to hibernate, and shuts down. When you start the
> instance, it starts fresh, and does not resume from the incomplete hibernation
> image. Networking is also broken, and you cannot ssh in.
> 
> Upon review of the jammy/linux-aws git log, it appears that the kernel is
> missing AWS hibernation enablement patches entirely. These need to be included
> to get hibernation working.
> 
> [Fix]
> 
> Hibernation currently works on the Amazon Linux 2 5.15 Kernel:
> https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline
> 
> After careful review of the amazon-5.15.y/mainline branch, we have found the
> below set of patches authored by Amazon AWS Hibernation team to be minimally
> sufficient to get hibernation working on both Jammy 5.15 and Kinetic 5.19.
> 
> xen: Restore xen-pirqs on resume from hibernation
> xen-netfront: call netif_device_attach on resume
> xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
> xen: restore pirqs on resume from hibernation.
> block: xen-blkfront: consider new dom0 features on restore
> x86: tsc: avoid system instability in hibernation
> xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
> Revert "xen: dont fiddle with event channel masking in suspend/resume"
> PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> x86/xen: close event channels for PIRQs in system core suspend callback
> xen/events: add xen_shutdown_pirqs helper function
> x86/xen: save and restore steal clock
> xen/time: introduce xen_{save,restore}_steal_clock
> xen-netfront: add callbacks for PM suspend and hibernation support
> xen-blkfront: add callbacks for PM suspend and hibernation
> x86/xen: add system core suspend and resume callbacks
> x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
> xenbus: add freeze/thaw/restore callbacks support
> xen/manage: introduce helper function to know the on-going suspend mode
> xen/manage: keep track of the on-going suspend mode
> 
> These patches will be carried as SAUCE patches, and their subjects marked with
> "UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, with the
> repo being the Amazon Linux 2 kernel repo.
> 
> [Testcase]
> 
> 1. Log into Amazon EC2.
> 2. Select Launch Instance.
> 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium.
> 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
> 5. Select your SSH keypair.
> 6. In storage, select 20 GB. Go to the advanced tab, and set Encrypted: Yes.
> 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
> 8. Create the Instance. SSH in.
> 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub.
> 10. Start a screen session. Echo some text and then detach with ctrl-a d.
> 11. Log out from instance.
> 12. In EC2, select "Instance State" > "Hibernate".
> 13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
> 14. Start the instance again.
> 15. SSH in.
> 16. Attempt to resume screen session with "screen -r".
> 
> If you are not able to ssh into the instance, hibernation has failed. If ssh
> works and the screen session is still running, hibernation was successful.
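
Before triggering "Hibernate" in step 12, it may help to confirm that step 9
has actually completed. The following is a minimal sketch, not part of the
patch set or of hibinit-agent. It assumes that the agent's grub change shows up
as a "resume=" parameter on the kernel command line and that the swapfile is
listed in /proc/swaps; the function name hibernation_ready is hypothetical.

```python
def hibernation_ready(swaps_text, cmdline_text, swapfile="/swap-hibinit"):
    """Return True if the swapfile is active and a resume= parameter is set.

    swaps_text   -- contents of /proc/swaps
    cmdline_text -- contents of /proc/cmdline
    """
    # /proc/swaps starts with a header line; the first column of every
    # following row is the path of an active swap area.
    active = {
        line.split()[0]
        for line in swaps_text.splitlines()[1:]
        if line.strip()
    }
    # Assumption: hibinit-agent's grub configuration adds resume=... to the
    # kernel command line. Adjust if your image configures resume differently.
    has_resume = any(p.startswith("resume=") for p in cmdline_text.split())
    return swapfile in active and has_resume
```

On the instance, pass in the contents of /proc/swaps and /proc/cmdline. A
False result after the five minute wait suggests hibinit-agent has not finished
and the hibernate attempt is likely to fail for reasons unrelated to the kernel.
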
> 
> Alternatively, the CPC team can run their Hibernation testsuite over Jammy and
> Kinetic.
> 
> We have built test kernels for Jammy and Kinetic with the patches, and they are
> available in the below ppa:
> 
> https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test
> 
> If you try and hibernate and resume with the test kernels, hibernation is
> successful.
> 
> [Where problems could occur]
> 
> We are adding a significant amount of code to the Xen subsystem, spread across
> many commits. This code has not been mainlined, and is instead maintained out
> of tree by the Amazon AWS Hibernation team.
> 
> The changes target hibernation, block devices, and clock devices, specific to
> those used on AWS Xen instances. Most of these patches have been applied to
> Xenial, Bionic, Focal and other series for a long time, but some patches are
> new for 5.15 onward.
> 
> The changes will only target linux-aws to try and limit regression risk to
> AWS users, and any regressions will be limited to users of Xen based instance
> types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen 4.11.
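
To decide whether a given instance falls inside this regression scope, the
family list quoted above can be checked mechanically. This is only a sketch:
the helper name is hypothetical, and the set contains exactly the families
named in this cover letter, not an authoritative list of Xen based types.

```python
# Families named in the cover letter as Xen based; not an exhaustive list.
XEN_FAMILIES = {"c3", "c4", "i3", "m3", "m4", "r3", "r4", "t2"}


def is_affected_instance_type(instance_type):
    """Return True if instance_type (e.g. "t2.medium") is in a listed family."""
    # The family is the part before the first dot, e.g. "t2" in "t2.medium".
    family = instance_type.strip().lower().split(".", 1)[0]
    return family in XEN_FAMILIES
```

For example, is_affected_instance_type("t2.medium") is True, while a type from
a family not in the list, such as "c5.large", gives False.
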
> 
> If a regression were to occur, the instance would likely fail to hibernate, and
> at worst, write an incomplete hibernation image to the swapfile. The kernel will
> see this on start, and instead of resuming from the hibernation image, will
> start fresh. It is unlikely to cause any filesystem corruption on the rootfs,
> but any in progress computations at the time of hibernation could be lost. The
> current broken behaviour breaks networking, and users would have to power cycle
> the instance a few times before they can ssh in again.
> 
> Aleksei Besogonov (1):
>    PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> 
> Anchal Agarwal (4):
>    x86/xen: Introduce new function to map HYPERVISOR_shared_info on
>      Resume
>    Revert "xen: dont fiddle with event channel masking in suspend/resume"
>    xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>    xen: Restore xen-pirqs on resume from hibernation
> 
> Eduardo Valentin (2):
>    x86: tsc: avoid system instability in hibernation
>    block: xen-blkfront: consider new dom0 features on restore
> 
> Frank van der Linden (3):
>    xen: restore pirqs on resume from hibernation.
>    xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
>    xen-netfront: call netif_device_attach on resume
> 
> Munehisa Kamata (10):
>    xen/manage: keep track of the on-going suspend mode
>    xen/manage: introduce helper function to know the on-going suspend
>      mode
>    xenbus: add freeze/thaw/restore callbacks support
>    x86/xen: add system core suspend and resume callbacks
>    xen-blkfront: add callbacks for PM suspend and hibernation
>    xen-netfront: add callbacks for PM suspend and hibernation support
>    xen/time: introduce xen_{save,restore}_steal_clock
>    x86/xen: save and restore steal clock
>    xen/events: add xen_shutdown_pirqs helper function
>    x86/xen: close event channels for PIRQs in system core suspend
>      callback
> 
>   arch/x86/kernel/tsc.c             |  29 ++++++
>   arch/x86/xen/enlighten_hvm.c      |   8 ++
>   arch/x86/xen/suspend.c            |  67 +++++++++++++
>   arch/x86/xen/time.c               |   3 +
>   arch/x86/xen/xen-ops.h            |   2 +
>   drivers/block/xen-blkfront.c      | 161 ++++++++++++++++++++++++++++--
>   drivers/net/xen-netfront.c        | 104 ++++++++++++++++++-
>   drivers/xen/events/events_base.c  |  30 +++++-
>   drivers/xen/manage.c              |  73 ++++++++++++++
>   drivers/xen/time.c                |  29 +++++-
>   drivers/xen/xenbus/xenbus_probe.c |  99 +++++++++++++++---
>   include/linux/irq.h               |   2 +
>   include/linux/sched/clock.h       |   5 +
>   include/xen/events.h              |   2 +
>   include/xen/xen-ops.h             |   8 ++
>   include/xen/xenbus.h              |   3 +
>   kernel/irq/chip.c                 |   4 +-
>   kernel/power/user.c               |   4 +
>   kernel/sched/clock.c              |   4 +-
>   19 files changed, 604 insertions(+), 33 deletions(-)
> 
Applied to jammy:linux-aws/master-next, kinetic:linux-aws/master-next. 
Thanks.

-rtg

-- 
-----------
Tim Gardner
Canonical, Inc


