[Bug 1986781] Re: [Ubuntu 22.04]cloud-init failed to complete after 10 minutes of waiting was shown during Installation via iDRAC Virtual Console

Mauricio Faria de Oliveira 1986781 at bugs.launchpad.net
Mon Jan 23 20:59:45 UTC 2023


Reproducer/Verification for SRU to Jammy,
based on strace delay injection on read().

Uploading to jammy.
Attaching debdiff for reference.

...

Launch a VM with the Jammy daily live server:

	$ wget https://cdimage.ubuntu.com/ubuntu-server/jammy/daily-live/current/jammy-live-server-amd64.iso

	$ ISO=jammy-live-server-amd64.iso
	$ VM=casper-jammy
	$ virt-install  --name $VM --cdrom $ISO --vcpus 2 --memory 2048 --disk none --osinfo ubuntu-stable-latest 

At the GRUB menu, press e to edit the boot entry, append
"break=init console=ttyS0" to the linux line, press ctrl-x to boot,
then close the window.

Open the serial console and chroot:

	$ virsh console $VM
	...
	(initramfs) chroot /root /bin/bash

Test strace delay injection:

	# time strace --trace read cat /dev/null
	read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3206\2\0\0\0\0\0"..., 832) = 832
	read(3, "", 131072)                     = 0
	+++ exited with 0 +++

	real	0m0.041s
	user	0m0.009s
	sys	0m0.030s

	# time strace --trace read --inject read:delay_enter=5s cat /dev/null
	read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3206\2\0\0\0\0\0"..., 832) = 832 (DELAYED)
	read(3, "", 131072)                     = 0 (DELAYED)
	+++ exited with 0 +++

	real	0m10.041s
	user	0m0.010s
	sys	0m0.033s
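
The 10 seconds of wall-clock time match the two read() calls in the
trace, each delayed by 5 seconds on entry. A minimal sketch of that
arithmetic, to estimate how long a traced command would be stalled.
The function name `expected_delay` is hypothetical (not part of
strace), the first sample line is abridged, and it assumes every
read() gets the same delay and nothing else dominates the runtime:

```shell
#!/bin/sh
# expected_delay SECONDS: read strace output on stdin, count the
# read() lines, and print reads * SECONDS.
expected_delay() {
	reads=$(grep -c '^read(')
	echo $(( reads * $1 ))
}

# Sample input based on the trace above (two read() calls).
total=$(printf '%s\n' \
	'read(3, "\177ELF"..., 832) = 832 (DELAYED)' \
	'read(3, "", 131072)                     = 0 (DELAYED)' \
	'+++ exited with 0 +++' | expected_delay 5)
echo "expected delay: ${total}s"
```

Since strace writes its trace to stderr, a live run would look like
`strace --trace read cat /dev/null 2>&1 | expected_delay 5`.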

Modify the casper-md5check service:

	# sed -i '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,' /usr/lib/systemd/system/casper-md5check.service

	# cat /usr/lib/systemd/system/casper-md5check.service
	[Unit]
	Description=casper-md5check Verify Live ISO checksums

	[Service]
	Type=oneshot
	ExecStart=/usr/bin/strace --inject read:delay_enter=60s /usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt
	RemainAfterExit=yes

	[Install]
	WantedBy=multi-user.target
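
The sed expression can be previewed against a scratch copy before
touching the real unit. A minimal sketch, assuming a temporary file
from mktemp; the unit contents are the output above with the strace
prefix removed, and the variable names are only for illustration:

```shell
#!/bin/sh
# Work on a scratch copy, not the real unit file.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
Description=casper-md5check Verify Live ISO checksums

[Service]
Type=oneshot
ExecStart=/usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

# Dry run (no -i): on the ExecStart= line only, replace the first '='
# with '=/usr/bin/strace --inject read:delay_enter=60s ', which
# prefixes the original command with strace. -n ... p prints only the
# line that was changed.
preview=$(sed -n '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,p' "$unit")
echo "$preview"

rm -f "$unit"
```

The comma is used as the s/// delimiter so the slashes in the strace
path need no escaping.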

Exit and wait; the issue happens:

	# exit
	(initramfs) exit

        ...

        Ubuntu 22.04.1 ubuntu-server ttyS0


	connecting...
	waiting for cloud-init... /

        <10 minutes later>

	================================================================================
	  Serial                                                              [ Help ]
	================================================================================

	  As the installer is running on a serial console, it has started in basic
	  mode, using only the ASCII character set and black and white colours.

	   ┌────────────────────────────────────────────────────────────────────────┐
	   │                                                                        │
	   │  cloud-init failed to complete after 10 minutes of waiting. This       │
	   │  suggests a bug, which we would appreciate help understanding.  If     │
	   │  you could file a bug at                                               │
	   │  https://bugs.launchpad.net/subiquity/+filebug and attach the          │
	   │  contents of /var/log, it would be most appreciated.                   │
	   │                                                                        │
	   │                         [ Switch to a shell ]                          │
	   │                         [ Close             ]                          │
	   │                                                                        │
	   └────────────────────────────────────────────────────────────────────────┘


		                  [ Continue in rich mode  > ]
		                  [ Continue in basic mode > ]


...

Similarly, repeat, this time with test packages from ppa:mfo/lp1986781:

...

Open the serial console and chroot:


	$ virsh console $VM
	...
	(initramfs) chroot /root /bin/bash

Install the test package:

	# dhclient enp1s0
	# wget https://launchpad.net/~mfo/+archive/ubuntu/lp1986781/+build/25512466/+files/casper_1.470.2_amd64.deb
	# dpkg -i casper_*.deb

Modify the casper-md5check service:

	# sed -i '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,' /usr/lib/systemd/system/casper-md5check.service

	# cat /usr/lib/systemd/system/casper-md5check.service
	[Unit]
	Description=casper-md5check Verify Live ISO checksums
	After=multi-user.target

	[Service]
	Type=oneshot
	ExecStart=/usr/bin/strace --inject read:delay_enter=60s /usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt
	RemainAfterExit=yes

	[Install]
	WantedBy=multi-user.target
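
The relevant difference from the unpatched unit is the
`After=multi-user.target` line. A minimal sketch to confirm that the
installed unit file carries it; the helper name
`runs_after_multi_user` is hypothetical, and a plain grep on the file
is a simplification (it ignores drop-ins and multi-line values):

```shell
#!/bin/sh
# runs_after_multi_user UNIT_FILE
# Succeeds if the file has an After= line naming multi-user.target.
runs_after_multi_user() {
	grep -q '^After=.*multi-user\.target' "$1"
}

# Example on the live system:
#   runs_after_multi_user /usr/lib/systemd/system/casper-md5check.service \
#       && echo "ordered after multi-user.target"
```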

Exit and wait; the issue does _not_ happen anymore:

	# exit
	(initramfs) exit

...

        Ubuntu 22.04.1 ubuntu-server ttyS0


	connecting...
	waiting for cloud-init... /

        <some seconds later>


	================================================================================
	  Serial                                                              [ Help ]
	================================================================================

	  As the installer is running on a serial console, it has started in basic
	  mode, using only the ASCII character set and black and white colours.

	  If you are connecting from a terminal emulator such as gnome-terminal that
	  supports unicode and rich colours you can switch to "rich mode" which uses
	  unicode, colours and supports many languages.

	  You can also connect to the installer over the network via SSH, which will
	  allow use of rich mode.


		                  [ Continue in rich mode  > ]
		                  [ Continue in basic mode > ]
		                  [ View SSH instructions    ]


Check that the casper-md5check service is still running:

        Help > Enter shell.

	# systemctl status casper-md5check.service --no-pager | grep 'Active:'
	     Active: activating (start) since Mon 2023-01-23 18:56:50 UTC; 3min 26s ago

This should not be a problem, as its start timeout is unlimited:

	# systemctl show casper-md5check.service | grep -i timeout
	TimeoutStartUSec=infinity
	TimeoutStopUSec=1min 30s
	TimeoutAbortUSec=1min 30s
	TimeoutStartFailureMode=terminate
	TimeoutStopFailureMode=terminate
	TimeoutCleanUSec=infinity
	JobTimeoutUSec=infinity
	JobRunningTimeoutUSec=infinity
	JobTimeoutAction=none
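
Only `TimeoutStartUSec` matters for a unit that is still in
"activating (start)". A minimal sketch that checks just that property
from `systemctl show` style Key=Value lines on stdin, so it can be
tried without a running systemd; the helper name
`start_timeout_unlimited` is hypothetical:

```shell
#!/bin/sh
# start_timeout_unlimited: read Key=Value lines on stdin and succeed
# only if one line is exactly 'TimeoutStartUSec=infinity'.
start_timeout_unlimited() {
	grep -qx 'TimeoutStartUSec=infinity'
}

# Example on the live system:
#   systemctl show casper-md5check.service | start_timeout_unlimited \
#       && echo "start timeout is unlimited"
```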

Quit and clean up the VM:

        Press ctrl-]

	$ virsh destroy $VM
	$ virsh undefine $VM


** Description changed:

+ [Impact]
+ 
+  * Users that install Ubuntu Server through slow
+    media (eg, virtual optical drive over network,
+    which may be common on enterprise deployments)
+    might hit the following subiquity startup error:
+ 
+   'cloud-init failed to complete after 10 minutes of waiting'
+ 
+  * (That in addition to 10 minutes of waiting themselves.)
+ 
+  * This happens because casper-md5check.service is
+    (slowly) verifying the integrity of install media,
+    which blocks `multi-user.target`,
+    which blocks `cloud-final.service`,
+    which blocks `cloud-init status --wait`
+    which is used in subiquity / `waiting on cloud-init`).
+ 
+ [Fix]
+ 
+  * The adopted solution (merged on lunar) is simply
+    not to block `multi-user.target`, but rather run
+    _after_ it.
+    
+ [Test Steps]
+ 
+  For a synthetic reproducer of slowness of casper-md5check:
+  
+  * boot with `break=init` to break into initramfs-tools
+    before exec() systemd.
+  * chroot /root /bin/bash
+  * edit /usr/lib/systemd/system/casper-md5check.service
+  * prepend `strace --inject read:delay_enter=5s` to the
+    command in `ExecStart`, to introduce a 5 secs delay
+    to every read() syscall performed by casper-md5check.
+  * exit twice (chroot, initramfs shell) to resume boot.
+   
+   See comment 37 for examples.
+ 
+ [Other Info]
+ 
+  * There's a small glitch in the proposed solution:
+    the systemd line when casper-md5check finishes
+    shows up on top of subiquity's menu (screenshot):
+ 
+    "[ OK ] Finished casper-md5check Verify Live ISO checksums."
+ 
+    Dan Bungert mentioned this is known and should be
+    addressed in a future change to subiquity, and is
+    not supposed to block the SRU for Jammy / 22.04.2.
+ 
+ 
+ [Original Description]
+ 
  Description:
  
  On Dell EMC PowerEdge system when Install Ubuntu 22.04 via iDRAC Virtual
  Console, cloud-init failed to complete after 10 minutes of waiting.
  
  Steps to Reproduce:
  
  1. Login to iDRAC and Launch Virtual Console.
  2. Connect to Virtual Media and Map ubuntu 22.04 iso file using Map CD/DVD option.
  3. Try Installing Ubuntu server.
  4. "cloud-init" failed to complete after 10 minutes of waiting was shown during Installation.
  
  Expected Results :-
  
  Installation should be successful.

** Description changed:

  [Impact]
  
-  * Users that install Ubuntu Server through slow
-    media (eg, virtual optical drive over network,
-    which may be common on enterprise deployments)
-    might hit the following subiquity startup error:
+  * Users that install Ubuntu Server through slow
+    media (eg, virtual optical drive over network,
+    which may be common on enterprise deployments)
+    might hit the following subiquity startup error:
  
-   'cloud-init failed to complete after 10 minutes of waiting'
+   'cloud-init failed to complete after 10 minutes of waiting'
  
-  * (That in addition to 10 minutes of waiting themselves.)
+  * (That in addition to 10 minutes of waiting themselves.)
  
-  * This happens because casper-md5check.service is
-    (slowly) verifying the integrity of install media,
-    which blocks `multi-user.target`,
-    which blocks `cloud-final.service`,
-    which blocks `cloud-init status --wait`
-    which is used in subiquity / `waiting on cloud-init`).
+  * This happens because casper-md5check.service is
+    (slowly) verifying the integrity of install media,
+    which blocks `multi-user.target`,
+    which blocks `cloud-final.service`,
+    which blocks `cloud-init status --wait`
+    which is used in subiquity / `waiting on cloud-init`).
  
  [Fix]
  
-  * The adopted solution (merged on lunar) is simply
-    not to block `multi-user.target`, but rather run
-    _after_ it.
-    
+  * The adopted solution (merged on lunar) is simply
+    not to block `multi-user.target`, but rather run
+    _after_ it.
+ 
  [Test Steps]
  
-  For a synthetic reproducer of slowness of casper-md5check:
-  
-  * boot with `break=init` to break into initramfs-tools
-    before exec() systemd.
-  * chroot /root /bin/bash
-  * edit /usr/lib/systemd/system/casper-md5check.service
-  * prepend `strace --inject read:delay_enter=5s` to the
-    command in `ExecStart`, to introduce a 5 secs delay
-    to every read() syscall performed by casper-md5check.
-  * exit twice (chroot, initramfs shell) to resume boot.
-   
-   See comment 37 for examples.
+  For a synthetic reproducer of slowness of casper-md5check:
  
- [Other Info]
+  * boot with `break=init` to break into initramfs-tools
+    before exec() systemd.
+  * chroot /root /bin/bash
+  * edit /usr/lib/systemd/system/casper-md5check.service
+  * prepend `strace --inject read:delay_enter=5s` to the
+    command in `ExecStart`, to introduce a 5 secs delay
+    to every read() syscall performed by casper-md5check.
+  * exit twice (chroot, initramfs shell) to resume boot.
  
-  * There's a small glitch in the proposed solution:
-    the systemd line when casper-md5check finishes
-    shows up on top of subiquity's menu (screenshot):
+   See comment 37 for examples.
  
-    "[ OK ] Finished casper-md5check Verify Live ISO checksums."
+ [Regression Potential]
  
-    Dan Bungert mentioned this is known and should be
-    addressed in a future change to subiquity, and is
-    not supposed to block the SRU for Jammy / 22.04.2.
+  * Functionality related to install media integrity check.
  
+  * Users with corrupted install media might not realize
+    this until later on; but this is rarely the case and
+    even w/out the fix, there's a lot that runs _before_
+    we even get to casper-md5check, so they may (still)
+    see errors early anyway.
+ 
+  * There's a cosmetic glitch in the proposed solution:
+    the systemd line when casper-md5check finishes
+    shows up on top of subiquity's menu (screenshot):
+ 
+    "[ OK ] Finished casper-md5check Verify Live ISO checksums."
+ 
+    Dan Bungert mentioned this is known and should be
+    addressed in a future change to subiquity, and is
+    not supposed to block the SRU for Jammy / 22.04.2.
  
  [Original Description]
  
  Description:
  
  On Dell EMC PowerEdge system when Install Ubuntu 22.04 via iDRAC Virtual
  Console, cloud-init failed to complete after 10 minutes of waiting.
  
  Steps to Reproduce:
  
  1. Login to iDRAC and Launch Virtual Console.
  2. Connect to Virtual Media and Map ubuntu 22.04 iso file using Map CD/DVD option.
  3. Try Installing Ubuntu server.
  4. "cloud-init" failed to complete after 10 minutes of waiting was shown during Installation.
  
  Expected Results :-
  
  Installation should be successful.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to casper in Ubuntu.
https://bugs.launchpad.net/bugs/1986781

Title:
  [Ubuntu 22.04]cloud-init failed to complete after 10 minutes of
  waiting was shown during Installation via iDRAC Virtual Console

Status in cloud-init:
  Invalid
Status in subiquity:
  Invalid
Status in casper package in Ubuntu:
  Fix Released
Status in casper source package in Focal:
  Triaged
Status in casper source package in Jammy:
  In Progress
Status in casper source package in Kinetic:
  Won't Fix

Bug description:
  [Impact]

   * Users that install Ubuntu Server through slow
     media (eg, virtual optical drive over network,
     which may be common on enterprise deployments)
     might hit the following subiquity startup error:

    'cloud-init failed to complete after 10 minutes of waiting'

   * (That is in addition to the 10 minutes of waiting itself.)

   * This happens because casper-md5check.service is
     (slowly) verifying the integrity of install media,
     which blocks `multi-user.target`,
     which blocks `cloud-final.service`,
     which blocks `cloud-init status --wait`
     (which is used in subiquity / `waiting on cloud-init`).

  [Fix]

   * The adopted solution (merged on lunar) is simply
     not to block `multi-user.target`, but rather run
     _after_ it.

  [Test Steps]

   For a synthetic reproducer of slowness of casper-md5check:

   * boot with `break=init` to break into initramfs-tools
     before exec() systemd.
   * chroot /root /bin/bash
   * edit /usr/lib/systemd/system/casper-md5check.service
   * prepend `strace --inject read:delay_enter=5s` to the
     command in `ExecStart`, to introduce a 5 secs delay
     to every read() syscall performed by casper-md5check.
   * exit twice (chroot, initramfs shell) to resume boot.

    See comment 37 for examples.

  [Regression Potential]

   * Functionality related to install media integrity check.

   * Users with corrupted install media might not realize
     this until later on; but this is rarely the case and
     even w/out the fix, there's a lot that runs _before_
     we even get to casper-md5check, so they may (still)
     see errors early anyway.

   * There's a cosmetic glitch in the proposed solution:
     the systemd line when casper-md5check finishes
     shows up on top of subiquity's menu (screenshot):

     "[ OK ] Finished casper-md5check Verify Live ISO checksums."

     Dan Bungert mentioned this is known and should be
     addressed in a future change to subiquity, and is
     not supposed to block the SRU for Jammy / 22.04.2.

  [Original Description]

  Description:

  On Dell EMC PowerEdge system when Install Ubuntu 22.04 via iDRAC
  Virtual Console, cloud-init failed to complete after 10 minutes of
  waiting.

  Steps to Reproduce:

  1. Login to iDRAC and Launch Virtual Console.
  2. Connect to Virtual Media and Map ubuntu 22.04 iso file using Map CD/DVD option.
  3. Try Installing Ubuntu server.
  4. "cloud-init" failed to complete after 10 minutes of waiting was shown during Installation.

  Expected Results :-

  Installation should be successful.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1986781/+subscriptions



