ACK/Cmnt: [SRU][jammy/linux-aws][kinetic/linux-aws][PATCH 00/20] UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types

Tim Gardner tim.gardner at canonical.com
Wed Aug 17 14:04:21 UTC 2022


On 8/17/22 07:24, Tim Gardner wrote:
> On 8/17/22 02:51, Gerald Yang wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1968062
>>
>> SRU Justification:
>>
>> [Impact]
>>
>> Hibernation currently fails for all AWS Xen instance types
>> (c3/c4/i3/m3/m4/r3/r4/t2) with Jammy 5.15 and Kinetic 5.19 linux-aws
>> kernels.
>>
>> When attempting to hibernate, the system gets stuck in
>> sync_inodes_one_sb() when processing the rootfs, fails to hibernate,
>> and shuts down. When you start the instance, it starts fresh and does
>> not resume from the incomplete hibernation image. Networking is also
>> broken, and you cannot ssh in.
>>
>> Upon review of the jammy/linux-aws git log, it appears that the kernel
>> is missing AWS hibernation enablement patches entirely. These need to
>> be included to get hibernation working.
>>
>> [Fix]
>>
>> Hibernation currently works on the Amazon Linux 2 5.15 kernel:
>> https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline
>>
>> After careful review of the amazon-5.15.y/mainline branch, we have
>> found the below set of patches, authored by the Amazon AWS Hibernation
>> team, to be minimally sufficient to get hibernation working on both
>> Jammy 5.15 and Kinetic 5.19.
>>
>> xen: Restore xen-pirqs on resume from hibernation
>> xen-netfront: call netif_device_attach on resume
>> xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
>> xen: restore pirqs on resume from hibernation.
>> block: xen-blkfront: consider new dom0 features on restore
>> x86: tsc: avoid system instability in hibernation
>> xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>> Revert "xen: dont fiddle with event channel masking in suspend/resume"
>> PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
>> x86/xen: close event channels for PIRQs in system core suspend callback
>> xen/events: add xen_shutdown_pirqs helper function
>> x86/xen: save and restore steal clock
>> xen/time: introduce xen_{save,restore}_steal_clock
>> xen-netfront: add callbacks for PM suspend and hibernation support
>> xen-blkfront: add callbacks for PM suspend and hibernation
>> x86/xen: add system core suspend and resume callbacks
>> x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
>> xenbus: add freeze/thaw/restore callbacks support
>> xen/manage: introduce helper function to know the on-going suspend mode
>> xen/manage: keep track of the on-going suspend mode
>>
>> These patches will be carried as SAUCE patches, and their subjects
>> marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon
>> Hibernation team, with the repo being the Amazon Linux 2 kernel repo.
>>
>> [Testcase]
>>
>> 1. Log into Amazon EC2.
>> 2. Select Launch Instance.
>> 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I
>>    suggest t2.medium.
>> 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch
>>    pane.
>> 5. Select your SSH keypair.
>> 6. In storage, select 20 GB. Go to the advanced tab, and set
>>    Encrypted: Yes.
>> 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to
>>    Enable.
>> 8. Create the Instance. SSH in.
>> 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile
>>    and configure grub.
>> 10. Start a screen session. Echo some text and then detach with ctrl-a d.
>> 11. Log out from instance.
>> 12. In EC2, select "Instance State" > "Hibernate".
>> 13. Wait 30 seconds to one minute. The state will go from "Stopping"
>>     to "Stopped".
>> 14. Start the instance again.
>> 15. SSH in.
>> 16. Attempt to resume screen session with "screen -r".
>>
>> If you are not able to ssh into the instance, hibernation has failed.
>> If ssh works and the screen session is still running, hibernation was
>> successful.
>>
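
Before step 12 of the testcase, it can help to confirm the instance is actually ready to hibernate. The following is a minimal Python sketch, not part of the submitted series. It assumes the /swap-hibinit swapfile from step 9 shows up in /proc/swaps once hibinit-agent has finished, and relies on /sys/power/state listing "disk" when the kernel supports hibernation. The function names are hypothetical.

```python
from pathlib import Path


def swapfile_active(swaps_text: str, path: str = "/swap-hibinit") -> bool:
    """Return True if `path` is listed as an active swap area.

    `swaps_text` is the content of /proc/swaps: one header line followed
    by one row per swap area, with the file name in the first column.
    """
    rows = swaps_text.splitlines()[1:]  # skip the header line
    return any(row.split()[0] == path for row in rows if row.strip())


def hibernation_supported(state_text: str) -> bool:
    """Return True if 'disk' appears in the content of /sys/power/state."""
    return "disk" in state_text.split()


if __name__ == "__main__":
    # Run on the instance itself; both files are provided by the kernel.
    print("swapfile active:",
          swapfile_active(Path("/proc/swaps").read_text()))
    print("hibernation supported:",
          hibernation_supported(Path("/sys/power/state").read_text()))
```

If either check prints False, hibinit-agent has probably not finished yet, so a failed hibernate at that point would not say anything about the patches.
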
>> Alternatively, the CPC team can run their Hibernation testsuite over
>> Jammy and Kinetic.
>>
>> We have built test kernels for Jammy and Kinetic with the patches, and
>> they are available in the below ppa:
>>
>> https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test
>>
>> If you try and hibernate and resume with the test kernels, hibernation is
>> successful.
>>
>> [Where problems could occur]
>>
>> We are adding a significant amount of code to the Xen subsystem, spread
>> across many commits. This code has not been mainlined, and is instead
>> maintained out of tree by the Amazon AWS Hibernation team.
>>
>> The changes target hibernation, block devices, and clock devices,
>> specific to those used on AWS Xen instances. Most of these patches have
>> been applied to Xenial, Bionic, Focal and other series for a long time,
>> but some patches are new for 5.15 onward.
>>
>> The changes will only target linux-aws to try and limit regression risk
>> to AWS users, and any regressions will be limited to users of Xen based
>> instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and
>> Xen 4.11.
>>
>> If a regression were to occur, the instance would likely fail to
>> hibernate, and at worst, write an incomplete hibernation image to the
>> swapfile. The kernel will see this on start, and instead of resuming
>> from the hibernation image, will start fresh. It is unlikely to cause
>> any filesystem corruption on the rootfs, but any in progress
>> computations at the time of hibernation could be lost. The current
>> broken behaviour breaks networking, and users would have to power cycle
>> the instance a few times before they can ssh in again.
>>
>> Aleksei Besogonov (1):
>>    PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
>>
>> Anchal Agarwal (4):
>>    x86/xen: Introduce new function to map HYPERVISOR_shared_info on
>>      Resume
>>    Revert "xen: dont fiddle with event channel masking in suspend/resume"
>>    xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>>    xen: Restore xen-pirqs on resume from hibernation
>>
>> Eduardo Valentin (2):
>>    x86: tsc: avoid system instability in hibernation
>>    block: xen-blkfront: consider new dom0 features on restore
>>
>> Frank van der Linden (3):
>>    xen: restore pirqs on resume from hibernation.
>>    xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
>>    xen-netfront: call netif_device_attach on resume
>>
>> Munehisa Kamata (10):
>>    xen/manage: keep track of the on-going suspend mode
>>    xen/manage: introduce helper function to know the on-going suspend
>>      mode
>>    xenbus: add freeze/thaw/restore callbacks support
>>    x86/xen: add system core suspend and resume callbacks
>>    xen-blkfront: add callbacks for PM suspend and hibernation
>>    xen-netfront: add callbacks for PM suspend and hibernation support
>>    xen/time: introduce xen_{save,restore}_steal_clock
>>    x86/xen: save and restore steal clock
>>    xen/events: add xen_shutdown_pirqs helper function
>>    x86/xen: close event channels for PIRQs in system core suspend
>>      callback
>>
>>   arch/x86/kernel/tsc.c             |  29 ++++++
>>   arch/x86/xen/enlighten_hvm.c      |   8 ++
>>   arch/x86/xen/suspend.c            |  67 +++++++++++++
>>   arch/x86/xen/time.c               |   3 +
>>   arch/x86/xen/xen-ops.h            |   2 +
>>   drivers/block/xen-blkfront.c      | 161 ++++++++++++++++++++++++++++--
>>   drivers/net/xen-netfront.c        | 104 ++++++++++++++++++-
>>   drivers/xen/events/events_base.c  |  30 +++++-
>>   drivers/xen/manage.c              |  73 ++++++++++++++
>>   drivers/xen/time.c                |  29 +++++-
>>   drivers/xen/xenbus/xenbus_probe.c |  99 +++++++++++++++---
>>   include/linux/irq.h               |   2 +
>>   include/linux/sched/clock.h       |   5 +
>>   include/xen/events.h              |   2 +
>>   include/xen/xen-ops.h             |   8 ++
>>   include/xen/xenbus.h              |   3 +
>>   kernel/irq/chip.c                 |   4 +-
>>   kernel/power/user.c               |   4 +
>>   kernel/sched/clock.c              |   4 +-
>>   19 files changed, 604 insertions(+), 33 deletions(-)
>>
> Acked-by: Tim Gardner <tim.gardner at canonical.com>
> 
> Nice work. Since I'm likely the one that will apply these patches, I'm 
> going to make 2 changes.
> 
> 1) Add hibernation to the commit subject so that the intent of the patch 
> is clear.
> 2) Add the URL of the Amazon git repository to the commit message.
> 
> 6 months from now those 2 bits of info will be a big help in remembering 
> what these patches are for, especially for those of us with goldfish 
> memories.
> 
> rtg
> 

P.S. In the future, for patch sets this large, please submit the patch
set as a pull request, especially for interleaved sets like this.

-- 
-----------
Tim Gardner
Canonical, Inc


