[Bug 1896638] Re: Path to swapfile doesn't use a static device path

Alberto Contreras 1896638 at bugs.launchpad.net
Tue Apr 25 17:12:27 UTC 2023


That's a great idea, thanks!

I have found a way to send the hibernation signal using AWS FIS and
reflected the user story test in the [Test Case].

Thanks for pointing this out.

** Description changed:

  [Impact]
  
  * Using the device name on the kernel cmdline in the resume= option
  leads to failure to resume from hibernation when the device name is not
  stable, which can be the case for NVMe drives.
  
  [Test Case]
  
- * Set up an EC2 instance to allow hibernation
- * Wait for hibinit-agent.service fully started
- * /etc/default/grub.d/99-set-swap.cfg should refer to the resume= partition by PARTUUID
+ * ec2-hibinit-agent
+ 
+   * Set up an EC2 instance to allow hibernation
+   * Wait for hibinit-agent.service to be fully started
+   * /etc/default/grub.d/99-set-swap.cfg should refer to the resume= partition by PARTUUID
+ 
+ * hibagent
+ 
+   * Spin up an EC2 spot instance with `hibernate` as `Interruption behavior` [1].
+   * Install the latest hibagent: `sudo apt-get install hibagent`
+   * Enable hibernation: `sudo /usr/bin/enable-ec2-spot-hibernation`
+   * Create an AWS FIS experiment template that sends a spot-instance-interruption signal [2], point it at the created instance, and launch it.
+     Note: This step is optional; one can instead wait for AWS EC2 to send the interruption signal, but that could take a long time.
+   * After some minutes, EC2 will send a signal to resume the interrupted instance.
+   * Verify that the instance has resumed correctly from hibernation.
+ 
+ [1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/interruption-behavior.html#specifying-spot-interruption-behavior
+ [2] https://catalog.us-east-1.prod.workshops.aws/workshops/5fc0039f-9f15-47f8-aff0-09dc7b1779ee/en-US/030-basic-content/078-ec2-spot/020-spot-ec2-interrup
  
  [Regression Potential]
  
  * Failure to discover PARTUUID makes the system unable to resume. A
  potential crash would leave the system unable to set up hibernation or
  unable to resume. (On Focal, PARTUUID is already in use, even without
  this fix.)
  
  [Original Bug Text]
  
  When the agent inserts the resume device path and offset into the kernel
  cmdline, it uses device names such as the following:
  
  `resume_offset=223232 resume=/dev/nvme1n1p1`
  
  The issue is that `/dev/nvme1n1p1` is not static. On reboot, the
  block device may appear at `/dev/nvme0n1p1`, resulting in failure to find
  the swapfile used to suspend.
  
  The solution should be to use a persistent block device naming scheme.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to hibagent in Ubuntu.
https://bugs.launchpad.net/bugs/1896638

Title:
  Path to swapfile doesn't use a static device path

Status in ec2-hibinit-agent package in Ubuntu:
  Fix Released
Status in hibagent package in Ubuntu:
  Fix Released
Status in ec2-hibinit-agent source package in Xenial:
  Fix Released
Status in hibagent source package in Xenial:
  New
Status in ec2-hibinit-agent source package in Bionic:
  Fix Released
Status in hibagent source package in Bionic:
  New
Status in ec2-hibinit-agent source package in Focal:
  Fix Released
Status in hibagent source package in Focal:
  New
Status in ec2-hibinit-agent source package in Groovy:
  Fix Released
Status in hibagent source package in Groovy:
  Won't Fix
Status in hibagent source package in Jammy:
  New
Status in hibagent source package in Kinetic:
  New

Bug description:
  [Impact]

  * Using the device name on the kernel cmdline in the resume= option
  leads to failure to resume from hibernation when the device name is
  not stable, which can be the case for NVMe drives.

  [Test Case]

  * ec2-hibinit-agent

    * Set up an EC2 instance to allow hibernation
    * Wait for hibinit-agent.service to be fully started
    * /etc/default/grub.d/99-set-swap.cfg should refer to the resume= partition by PARTUUID

  * hibagent

    * Spin up an EC2 spot instance with `hibernate` as `Interruption behavior` [1].
    * Install the latest hibagent: `sudo apt-get install hibagent`
    * Enable hibernation: `sudo /usr/bin/enable-ec2-spot-hibernation`
    * Create an AWS FIS experiment template that sends a spot-instance-interruption signal [2], point it at the created instance, and launch it.
      Note: This step is optional; one can instead wait for AWS EC2 to send the interruption signal, but that could take a long time.
    * After some minutes, EC2 will send a signal to resume the interrupted instance.
    * Verify that the instance has resumed correctly from hibernation.

  [1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/interruption-behavior.html#specifying-spot-interruption-behavior
  [2] https://catalog.us-east-1.prod.workshops.aws/workshops/5fc0039f-9f15-47f8-aff0-09dc7b1779ee/en-US/030-basic-content/078-ec2-spot/020-spot-ec2-interrup
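
The ec2-hibinit-agent check above (the resume= partition must be referenced by PARTUUID) could be automated along the following lines. This is only a sketch: the helper names `resume_targets` and `uses_partuuid` are hypothetical, and it assumes the config file carries the kernel arguments in plain `resume=...` form, which this report does not spell out.

```python
import re

# Matches a standalone resume= argument; resume_offset= and noresume= are not matched.
RESUME_RE = re.compile(r"(?<!\w)resume=([^\s\"']+)")


def resume_targets(cfg_text):
    """Return every resume= value found on non-comment lines of cfg_text."""
    targets = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        targets.extend(RESUME_RE.findall(stripped))
    return targets


def uses_partuuid(cfg_text):
    """True if at least one resume= argument exists and all of them use PARTUUID=."""
    targets = resume_targets(cfg_text)
    return bool(targets) and all(t.startswith("PARTUUID=") for t in targets)
```

To use it on a test instance, read /etc/default/grub.d/99-set-swap.cfg into a string and pass it to `uses_partuuid`; a `False` result means the file either has no resume= argument or still refers to a device name such as `/dev/nvme1n1p1`.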

  [Regression Potential]

  * Failure to discover PARTUUID makes the system unable to resume. A
  potential crash would leave the system unable to set up hibernation or
  unable to resume. (On Focal, PARTUUID is already in use, even without
  this fix.)

  [Original Bug Text]

  When the agent inserts the resume device path and offset into the
  kernel cmdline, it uses device names such as the following:

  `resume_offset=223232 resume=/dev/nvme1n1p1`

  The issue is that `/dev/nvme1n1p1` is not static. On reboot, the
  block device may appear at `/dev/nvme0n1p1`, resulting in failure to
  find the swapfile used to suspend.

  The solution should be to use a persistent block device naming scheme.
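
As an illustration of a persistent naming scheme, the sketch below maps a device name to its PARTUUID through the udev-maintained symlinks in /dev/disk/by-partuuid and builds the kernel arguments from it. It is not the agents' actual implementation: `partuuid_for` and `resume_args` are hypothetical names, and the code assumes udev has populated that directory.

```python
import os


def partuuid_for(device, by_partuuid_dir="/dev/disk/by-partuuid"):
    """Return the PARTUUID whose symlink resolves to `device`, or None."""
    target = os.path.realpath(device)
    for name in sorted(os.listdir(by_partuuid_dir)):
        link = os.path.join(by_partuuid_dir, name)
        if os.path.realpath(link) == target:
            return name
    return None


def resume_args(device, offset, by_partuuid_dir="/dev/disk/by-partuuid"):
    """Build resume arguments that stay valid if the device is renamed."""
    partuuid = partuuid_for(device, by_partuuid_dir)
    if partuuid is None:
        # Mirrors the regression risk noted above: without a PARTUUID
        # there is no stable name to hand to the kernel.
        raise LookupError(f"no PARTUUID symlink found for {device}")
    return f"resume_offset={offset} resume=PARTUUID={partuuid}"
```

With this approach, `resume_args("/dev/nvme1n1p1", 223232)` would produce a `resume=PARTUUID=...` argument that still identifies the same partition if it shows up as `/dev/nvme0n1p1` on the next boot.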

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1896638/+subscriptions



