[Bug 2002445] Re: udev NIC renaming race with mlx5_core driver
Nick Rosbrook
2002445 at bugs.launchpad.net
Mon Mar 20 13:36:00 UTC 2023
The fix is incomplete for jammy and newer, and we will need additional
patches from upstream.
** Tags removed: verification-needed-jammy verification-needed-kinetic
** Tags added: verification-failed-jammy verification-failed-kinetic
** Changed in: systemd (Ubuntu Lunar)
Status: Fix Released => In Progress
** Changed in: systemd (Ubuntu Jammy)
Status: Fix Committed => In Progress
** Changed in: systemd (Ubuntu Kinetic)
Status: Fix Committed => In Progress
** Changed in: systemd (Ubuntu Jammy)
Assignee: (unassigned) => Nick Rosbrook (enr0n)
** Changed in: systemd (Ubuntu Kinetic)
Assignee: (unassigned) => Nick Rosbrook (enr0n)
** Changed in: systemd (Ubuntu Lunar)
Assignee: (unassigned) => Nick Rosbrook (enr0n)
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445
Title:
udev NIC renaming race with mlx5_core driver
Status in systemd package in Ubuntu:
In Progress
Status in systemd source package in Focal:
Triaged
Status in systemd source package in Jammy:
In Progress
Status in systemd source package in Kinetic:
In Progress
Status in systemd source package in Lunar:
In Progress
Bug description:
[Impact]
On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core driver's own configuration of subordinate interfaces. When the kernel wins this race, the device cannot be renamed as udev has attempted, and this causes systemd-network-online.target to timeout waiting for links to be configured. This ultimately results in boot being delayed by about 2 minutes.
[Test Plan]
Repeated launches of Standard_D8ds_v5 instance types will generally hit this race around 1 in 10 runs. Create a vm snapshot with updated systemd from ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated systemd. Assert not failure in cloud-init status and no 2 minute delay in network-online.target.
To check for failure symptom:
- Assert that network-online.target isn't the longest pole from systemd-analyze blame.
To assert success condition during net rename busy race:
- assert when "eth1" is still the primary device name, that two altnames are listed (preserving the altname due to the primary NIC rename being hit.
Sample script uses pycloudlib to create modified base image for test
and launches 100 VMs of type Standard_D8ds_v5, counting both successes
and any failures seen.
#!/usr/bin/env python3
# This file is part of pycloudlib. See LICENSE file for license information.
"""Basic examples of various lifecycle with an Azure instance."""
import logging
import json
import pycloudlib
LOG = logging.getLogger()
base_cfg = """#cloud-config
ssh-import-id: [chad.smith, enr0n]
"""
apt_cfg = """
# Add developer PPA
apt:
sources:
systemd-testing:
source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
# upgrade systemd after cloud-init is nearly done
runcmd:
- apt install systemd udev -y --allow-unauthenticated
"""
debug_systemd_cfg = """
# Create systemd-udev debug override.conf in base image
write_files:
- path: /etc/systemd/system/systemd-networkd.service.d/override.conf
owner: root:root
defer: {defer}
content: |
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
- path: /etc/systemd/system/systemd-udevd.service.d/override.conf
owner: root:root
defer: {defer}
content: |
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
LogRateLimitIntervalSec=0
"""
cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
cloud_config2 = base_cfg + debug_systemd_cfg
def debug_systemd_image_launch_overlake_v5_with_snapshot():
"""Test overlake v5 timeouts
test procedure:
- Launch base focal image
- enable ppa:enr0n/systemd-245 and systemd/udev debugging
- cloud-init clean --logs && deconfigure waalinux agent before shutdown
- snapshot a base image
- launch v5 system from snapshot
- check systemd-analyze for expected timeout
"""
client = pycloudlib.Azure(tag="azure")
image_id = client.daily_image(release="focal")
pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
priv_path = "/home/ubuntu/.ssh/id_rsa"
client.use_key(pub_path, priv_path)
base_instance = client.launch(
image_id=image_id,
instance_type="Standard_DS1_v2",
user_data=cloud_config.format(defer="true"),
)
LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
base_instance.wait()
LOG.info(base_instance.execute("apt cache policy systemd"))
snapshotted_image_id = client.snapshot(base_instance)
reproducer = False
tries = 0
success_count_with_race = 0
success_count_no_race = 0
failure_count_network_delay = 0
failure_count_no_altnames = 0
TEST_SUMMARY_TMPL = """
----- Test run complete: {tries} attempted -----
Successes without rename race: {success_count_no_race}
Successes with rename race and preserved altname: {success_count_with_race}
Failures due to network delay: {failure_count_network_delay}
Failures due to no altnames persisted: {failure_count_no_altnames}
===================================
"""
while tries < 100 and not reproducer:
tries += 1
new_instance = client.launch(
image_id=snapshotted_image_id,
instance_type="Standard_D8ds_v5",
user_data=cloud_config.format(defer="false"),
)
# breadcrumb for us pycloudlib/Azure will reuse available IPs
new_instance.wait()
blame = new_instance.execute("systemd-analyze blame").splitlines()
LOG.info(f"--- Attempt {tries} ssh ubuntu@{new_instance.ip} Blame: {blame[0]}")
ip_addr = json.loads(new_instance.execute("ip -j addr").stdout)
rename_race_present = False # set true when we see eth1 not renamed
for d in ip_addr:
if d['ifname'] == "eth1":
rename_race_present = True
if len(d.get('altnames', [])) > 1:
LOG.info(
f"--- SUCCESS persisting altnames {d['altnames']} due to rename race on resource busy on {d['ifname']}"
)
else:
failure_count_no_altnames += 1
LOG.error(f"FAILURE: to preserve altnames for {d['ifname']}. Only preserved {d.get('altnames', [])}")
LOG.info(new_instance.execute("journalctl -u systemd-udevd.service -b 0 --no-pager"))
LOG.info('\n'.join([f'{d["ifname"]}: {d.get("altnames")}' for d in ip_addr]))
if "systemd-networkd-wait-online.service" not in blame[0]:
if rename_race_present:
success_count_with_race += 1
else:
success_count_no_race += 1
LOG.info(f"Destroying instance, normal boot seen: {blame[0]}")
new_instance.delete()
else:
failure_count_network_delay += 1
LOG.info(f"--- Attempt {tries} found delayed instance boot: {blame[0]}")
r = new_instance.execute("journalctl -u systemd-udevd.service -b 0 --no-pager")
LOG.info(r)
if "Failure to rename" in str(r):
LOG.info(f"Found rename refusal!: {r[0]}")
new_instance.delete()
LOG.info(
TEST_SUMMARY_TMPL.format(
success_count_with_race=success_count_with_race,
success_count_no_race=success_count_no_race,
failure_count_network_delay=failure_count_network_delay,
failure_count_no_altnames=failure_count_no_altnames, tries=tries
)
)
base_instance.delete(wait=False)
if __name__ == "__main__":
# Avoid polluting the log with azure info
logging.getLogger("paramiko").setLevel(logging.WARNING)
logging.getLogger("pycloudlib").setLevel(logging.WARNING)
logging.getLogger("adal-python").setLevel(logging.WARNING)
logging.getLogger("cli.azure.cli.core").setLevel(logging.WARNING)
logging.basicConfig(filename="sru-systemd.log", level=logging.INFO)
debug_systemd_image_launch_overlake_v5_with_snapshot()
[Where problems could occur]
The patches effectively make it so that if a interface cannot be renamed from udev, then the new name is left as an alternative name as a fallback. If problems occur, it would be related to device renaming, and particularly related to the devices alternative names.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2002445/+subscriptions
More information about the foundations-bugs
mailing list