[Bug 1864045] Re: [SRU] Hibernation events sometimes missed on repeated attempts
Andrea Righi
andrea.righi at canonical.com
Wed Mar 4 12:53:45 UTC 2020
@rbalint unfortunately bisecting the kernel is not a trivial task...
there are many changes between the stock 4.15 and the 5.0 kernels and
the process is probably going to take a long time. I'll check if it's
possible to identify only a subset of potential commits that might have
caused this problem.
The steps that I've used to verify that the problem was fixed (or at
least I thought it was fixed) were pretty easy: I got acpid from
https://salsa.debian.org/debian/acpid.git (version 2.0.32-1), recompiled
it, moved it to /usr/sbin/acpid (replacing the stock 2.0.28) and then
tested multiple hibernate/resume cycles via the AWS APIs.
With this "custom" acpid I wasn't able to trigger any failure. It's also
worth mentioning that I was using 5.0.0-1019-aws. I'll repeat my tests
with the latest bionic aws 5.0 kernel and check whether I can
reproduce the failures.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to acpid in Ubuntu.
https://bugs.launchpad.net/bugs/1864045
Title:
[SRU] Hibernation events sometimes missed on repeated attempts
Status in acpid package in Ubuntu:
Confirmed
Status in linux package in Ubuntu:
Incomplete
Status in acpid source package in Bionic:
Incomplete
Status in linux source package in Bionic:
Incomplete
Status in acpid source package in Eoan:
Confirmed
Status in linux source package in Eoan:
Incomplete
Bug description:
When testing hibernation / resume on AWS with 5.0 or 5.3 kernels on
bionic (using acpid 1:2.0.28-1ubuntu1), we sometimes see failures on
repeated attempts. The first hibernation attempt is always triggered, but
the next attempt may not be. As a result, the agent never triggers the
hibernation process and the instance is forced to shut down after
a timeout period.
Two workarounds have been identified. The first is to restart acpid
during the resume handler. The second is to use the latest upstream
acpid (as of Feb 1, 2020). This second workaround indicates there may
be a patch missing in the acpid in bionic (1:2.0.28-1ubuntu1) to work
with the 5.0+ kernels.
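As a rough illustration of the first workaround, the sketch below restarts acpid after a resume from hibernation. The report does not say which resume handler was used, so this assumes a systemd system-sleep hook (an executable under /lib/systemd/system-sleep/ that systemd calls with "pre" or "post" and the sleep type as arguments). The function names and the use of --no-block are illustrative choices, not part of the original workaround.

```python
#!/usr/bin/env python3
# Hypothetical system-sleep hook: restart acpid after resuming from hibernation.
# Assumption: the resume handler is a systemd system-sleep hook; the bug
# report does not specify the mechanism that was actually used.
import subprocess
import sys


def should_restart(phase, action):
    """Return True only for the post-resume phase of a hibernate cycle."""
    # systemd passes "pre" or "post" first, then the sleep type
    # ("suspend", "hibernate", "hybrid-sleep", ...).
    return phase == "post" and action == "hibernate"


def main(argv):
    if len(argv) >= 3 and should_restart(argv[1], argv[2]):
        # Queue the restart without waiting for it to finish, so the hook
        # returns promptly.
        subprocess.run(
            ["systemctl", "--no-block", "restart", "acpid.service"],
            check=False,
        )
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

If this approach were used, the file would need to be executable and placed in the system-sleep directory; whether that fits the AWS hibernation agent's own resume path would have to be checked on the instance.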
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance type with a 5.0 or 5.3 kernel on a bionic image with on-demand hibernation support enabled.
2) Hibernate and resume the instance, ensuring the system is fully resumed afterward and the swap file has been removed.
3) Hibernate and resume another time. The hibernate should be triggered immediately and the instance should become unresponsive as it saves state to disk.
4) Resume the instance; it should come back with the same processes running.
5) Repeat 3) - 4) as necessary.
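The hibernate/resume loop in steps 2) to 5) could be driven from a workstation with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the helper names and the instance ID in the usage note are placeholders, not values from this report. It only issues the hibernate and start requests and does not detect the failure itself, so checking that the same processes are still running after each resume remains a manual step.

```python
import subprocess


def hibernate_cycle_commands(instance_id, cycles):
    """Build the AWS CLI invocations for `cycles` hibernate/resume rounds."""
    commands = []
    for _ in range(cycles):
        # Ask EC2 to hibernate the instance instead of performing a plain stop.
        commands.append(["aws", "ec2", "stop-instances",
                         "--instance-ids", instance_id, "--hibernate"])
        # Block until the instance reports the "stopped" state.
        commands.append(["aws", "ec2", "wait", "instance-stopped",
                         "--instance-ids", instance_id])
        # Resume the instance.
        commands.append(["aws", "ec2", "start-instances",
                         "--instance-ids", instance_id])
        # Block until the instance reports the "running" state.
        commands.append(["aws", "ec2", "wait", "instance-running",
                         "--instance-ids", instance_id])
    return commands


def run_cycles(instance_id, cycles, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in hibernate_cycle_commands(instance_id, cycles):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_cycles("i-EXAMPLE", 3) prints the twelve commands without contacting AWS; passing dry_run=False runs them against a real instance ID.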
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: acpid 1:2.0.28-1ubuntu1
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/acpid/+bug/1864045/+subscriptions