[Bug 2081124] Re: systemd service dependency loop between cloud-init, NetworkManager and dbus

Andreas Hasenack 2081124 at bugs.launchpad.net
Thu Sep 26 21:34:12 UTC 2024


Hello Yao, or anyone else affected,

Accepted cloud-init into noble-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/cloud-init/24.3.1-0ubuntu0~24.04.2
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug
mentioning the version of the package you tested and what testing has
been performed on it, and change the tag from verification-needed-noble
to verification-done-noble. If it does not fix the bug for you, please
add a comment stating that, and change the tag to verification-failed-noble.
In either case, without details of your testing we will not be able to
proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: cloud-init (Ubuntu Noble)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-noble

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to livecd-rootfs in Ubuntu.
https://bugs.launchpad.net/bugs/2081124

Title:
  systemd service dependency loop between cloud-init, NetworkManager and
  dbus

Status in OEM Priority Project:
  New
Status in cloud-init package in Ubuntu:
  Fix Released
Status in livecd-rootfs package in Ubuntu:
  Invalid
Status in cloud-init source package in Noble:
  Fix Committed
Status in livecd-rootfs source package in Noble:
  Invalid

Bug description:
  [ Impact ]

  cloud-init 24.2 moved the systemd configuration of
  cloud-init-hotplugd.socket earlier in boot, before sysinit.target, but
  still retained systemd's DefaultDependencies for that unit. This led to
  a systemd ordering cycle that affects only the Ubuntu Live Desktop
  images for 24.04 (Noble) and 24.10 (Oracular), because livecd-rootfs
  provides a custom systemd drop-in for cloud-init.service that orders
  cloud-init.service After=NetworkManager.service
  NetworkManager-wait-online.service.
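  On an affected Live Desktop image, the ordering involved and the livecd-rootfs drop-in can be inspected with standard systemctl subcommands. A minimal sketch, assuming a bash shell on the booted image; the helper name show_ordering is hypothetical and the selection of properties is just one reasonable choice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ordering-related properties of a unit,
# then the unit file and every drop-in systemd has loaded for it.
show_ordering() {
  local unit=$1
  # DefaultDependencies, After and Before are standard unit properties;
  # -p limits "systemctl show" output to the named ones.
  systemctl show -p DefaultDependencies -p After -p Before "$unit"
  # "systemctl cat" prints the unit file followed by any drop-ins
  # (such as the one livecd-rootfs adds for cloud-init.service).
  systemctl cat "$unit"
}
```

  For example, run show_ordering cloud-init-hotplugd.socket and show_ordering cloud-init.service and compare their After= and Before= lists.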

  The resulting systemd ordering cycle messages are visible in the output
  of journalctl -b 0 during either the Desktop ephemeral boot or the
  first boot after installation.

  The cycle may cause systemd to delete cloud-init-hotplugd.service,
  NetworkManager.service or dbus.service from the boot goals, resulting
  in an unresponsive system at first boot.

  Without this changeset, Ubuntu Live Desktop ephemeral boots (or first boots after installation) can log "ordering cycle" messages in journalctl -b 0, and systemd breaks the cycle by kicking out any of the following potentially conflicting services:
  - cloud-init-hotplugd.service
  - NetworkManager.service
  - dbus.service
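  Which services were kicked out on a given boot can be read back from the journal. A minimal sketch, assuming the message wording shown in the Original Description ("Job <unit>/start deleted to break ordering cycle ..."); the function name deleted_jobs is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read journal text on stdin and print, one per line,
# each unit whose job systemd deleted to break an ordering cycle.
deleted_jobs() {
  # Match e.g. "... Job dbus.service/start deleted to break ordering cycle ..."
  # and keep only the unit name before the "/<job type>" suffix.
  # "|" is used as the sed delimiter so "/" can appear in the pattern.
  sed -n 's|.* Job \([^ /]*\)/[^ ]* deleted to break ordering cycle.*|\1|p' |
    LC_ALL=C sort -u  # C locale gives a stable, byte-wise ordering
}
```

  Usage on the booted image: journalctl -b 0 | deleted_jobs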

  [ Test Plan ]
  Validate that both desktop and server images do not expose systemd ordering cycle issues related to hotplug.

  == Test case 1 (desktop) ==
  Download the daily Noble desktop live image from https://cdimage.ubuntu.com/daily-live/20240421/

  1. Launch the image in virt-manager or qemu-kvm.
  2. During the ephemeral boot, before responding to any configuration prompts, open a GNOME Terminal with Ctrl-Alt-T.
  3. Confirm the ordering cycle issues are present: journalctl -b 0 | grep "ordering cycle"
  4. Shut down the failing daily image.
  5. Follow https://help.ubuntu.com/community/LiveCDCustomization#Amending_the_LiveCD_Squash_Files_System to update cloud-init from -proposed in this daily Live Desktop ISO, creating a new desktop-noble-cloud-init-proposed.iso
  6. Launch the new ISO in virt-manager or qemu-kvm.
  7. Confirm the ordering cycle is resolved: journalctl -b 0 | grep "ordering cycle"
  8. Confirm all affected services are healthy:
  for service_name in NetworkManager.service dbus.service cloud-init-hotplugd.socket cloud-init-hotplugd.service; do
   systemctl status --no-pager "$service_name"
  done
  9. Complete the live installer prompts and reboot into "first boot".
  10. Log in and confirm there are no ordering cycles on first boot: open a terminal with Ctrl-Alt-T and run journalctl -b 0 | grep "ordering cycle"
  11. Assert the previously affected services are healthy:
  for service_name in NetworkManager.service dbus.service cloud-init-hotplugd.socket cloud-init-hotplugd.service; do
   systemctl status --no-pager "$service_name"
  done
  12. Assert cloud-init is healthy: cloud-init status --format=yaml
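  Steps 7, 8, 10 and 11 can be scripted so they fail loudly instead of relying on reading systemctl status output by eye. A minimal sketch, assuming bash on the booted image; the function names are hypothetical, and treating cloud-init-hotplugd.service as healthy when it is merely inactive (not failed) is an assumption, since it is presumably started on demand by cloud-init-hotplugd.socket:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the journal text on stdin
# contains no systemd "ordering cycle" messages.
no_ordering_cycle() {
  ! grep -q "ordering cycle"
}

# Hypothetical helper: return non-zero if any previously affected unit
# is unhealthy, printing the offending unit names to stderr.
check_affected_units() {
  local rc=0 unit
  # A job deleted from the boot transaction leaves the unit inactive
  # rather than failed, so these units must actually be active.
  for unit in NetworkManager.service dbus.service cloud-init-hotplugd.socket; do
    if ! systemctl is-active --quiet "$unit"; then
      echo "NOT ACTIVE: $unit" >&2
      rc=1
    fi
  done
  # Assumed socket-activated: "inactive" is acceptable, "failed" is not.
  if systemctl is-failed --quiet cloud-init-hotplugd.service; then
    echo "FAILED: cloud-init-hotplugd.service" >&2
    rc=1
  fi
  return "$rc"
}
```

  Usage: journalctl -b 0 | no_ordering_cycle && check_affected_units && cloud-init status --format=yaml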

  == Test case 2 (server): broad integration test coverage ==
  1. Run the full suite of cloud-init integration tests using the ppa:cloud-init--proposed PPA against lxd_container and lxd_vm:
  CLOUD_INIT_PLATFORM=lxd_vm CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests

  CLOUD_INIT_PLATFORM=lxd_container CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests

  2. Run the hotplug-specific integration tests against ec2 and azure:
  CLOUD_INIT_PLATFORM=ec2 CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests -- tests/integration_tests/modules/test_hotplug.py

  CLOUD_INIT_PLATFORM=azure CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED CLOUD_INIT_OS_IMAGE=noble tox -e integration-tests -- tests/integration_tests/modules/test_hotplug.py
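  The commands above differ only in CLOUD_INIT_PLATFORM and in whether a test path is passed through tox after --. A minimal sketch of a wrapper, assuming it is run from a cloud-init source checkout where tox -e integration-tests is available; the function name run_proposed_tests is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run cloud-init integration tests against the Noble
# -proposed package on the given platform. Any extra arguments (for example
# "--" followed by a test path) are passed through to tox unchanged.
run_proposed_tests() {
  local platform=$1
  shift
  CLOUD_INIT_PLATFORM="$platform" \
  CLOUD_INIT_CLOUD_INIT_SOURCE=PROPOSED \
  CLOUD_INIT_OS_IMAGE=noble \
    tox -e integration-tests "$@"
}
```

  For example, for p in lxd_container lxd_vm; do run_proposed_tests "$p"; done covers step 1, and run_proposed_tests ec2 -- tests/integration_tests/modules/test_hotplug.py (then the same with azure) covers step 2.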

  3. Validate that there is no negative impact on boot speed.
  Use https://github.com/canonical/server-test-scripts/pull/201 to collect qemu-kvm boot-time samples before and after this changeset, to ensure boot speed is not negatively impacted.

  [ Where problems can occur ]

   * This upload directly resolves the kind of problem described here.
  If new systemd units or services introduce ordering cycles, systemd
  may punt conflicting services out of the system's boot goals. If
  critical services are deleted from the boot goals, the system and the
  affected services will not be brought up and configured as expected.
  This leads to misconfigured, unconfigured or inaccessible systems. The
  good news is that systemd ordering cycles are easy to detect: systemd
  reports them early in boot and leaves logs in journalctl about any
  affected services when this occurs.

  [ Other Info ]

  This systemd ordering bug was not originally seen in Oracular Live
  images because of a separate bug:
  https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2081325
  where Desktop image overrides were not being applied to
  cloud-init-network.service (Oracular only), so Oracular did not
  surface this systemd ordering cycle issue. The livecd-rootfs fix for
  that bug was accepted into Oracular on Sept 23rd, so that release would
  also have exhibited this broken behavior had the resulting cloud-init
  fix not also been accepted into Oracular on Sept 23rd.

  [ Original Description ]
  We saw errors where some services, such as snapd and NetworkManager, were not started when running cloud-init or the desktop. Excerpt from the journal:

  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found ordering cycle on NetworkManager-wait-online.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on basic.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on sockets.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-init-hotplugd.socket/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-config.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Found dependency on cloud-init.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: cloud-init.service: Job NetworkManager-wait-online.service/start deleted to break ordering cycle starting with cloud-init.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found ordering cycle on dbus.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on basic.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on sockets.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-init-hotplugd.socket/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-config.target/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on cloud-init.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Found dependency on NetworkManager.service/start
  Sep 13 12:37:41 localhost.localdomain systemd[1]: NetworkManager.service: Job dbus.service/start deleted to break ordering cycle starting with NetworkManager.service/start
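  The excerpt above can be reduced to the cycle itself by following the "Found ordering cycle on" and "Found dependency on" lines systemd logs for a given unit. A minimal sketch, assuming the default journalctl output format shown in the excerpt and a single boot (journalctl -b 0); the function name cycle_chain is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read journal text on stdin and print the ordering
# cycle systemd reported for unit $1 as "a -> b -> ..." on one line.
cycle_chain() {
  local unit=$1
  awk -v prefix=" $unit: " '
    # Keep only the lines systemd logged on behalf of the requested unit.
    index($0, prefix) == 0 { next }
    /Found ordering cycle on / || /Found dependency on / {
      # The last field looks like "name/start"; drop the job type.
      name = $NF
      sub(/\/.*/, "", name)
      chain = (chain == "") ? name : (chain " -> " name)
    }
    END { if (chain != "") print chain }
  '
}
```

  Applied to the excerpt, cycle_chain cloud-init.service traces NetworkManager-wait-online.service through basic.target, sockets.target, cloud-init-hotplugd.socket and cloud-config.target back to cloud-init.service.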

  Related logs and service files are attached in the sosreport.

  Internal reference: NANTOU-473
