[Bug 1818340] Re: systemd-networkd core dumps in bionic-proposed
Daniel Axtens
dja at axtens.net
Tue Mar 5 13:04:00 UTC 2019
OK, so with the magic of debug symbols and gdb on Cosmic:
(gdb) run
...
ens8: Gained IPv6LL
Assertion 'link->state == LINK_STATE_SETTING_ADDRESSES' failed at ../src/network/networkd-link.c:803, function link_enter_set_routes(). Aborting.
...
(gdb) up
#3 0x000055555566b194 in link_enter_set_routes (link=0x55555571d050) at ../src/network/networkd-link.c:803
803 ../src/network/networkd-link.c: No such file or directory.
(gdb) p link->state
$3 = LINK_STATE_PENDING
Looking at the code, it seems we are hitting link_enter_set_routes()
before link_enter_set_addresses() which is where the state is set. We're
hitting link_enter_set_routes() because link_check_ready() now calls it
straight off the bat.
I think the backport just needs to add a check so that we do not flow
through to setting the routes until after we've gone through the process
of setting the addresses; we can do that with the attached patch. (It
applies to the Cosmic version; I haven't tested it against Bionic.)
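To make the ordering problem concrete, here is a toy model in C. It is
not the attached patch and not the real networkd code: the types and
function bodies are simplified stand-ins, the void/bool signatures are
made up for illustration, and LINK_STATE_SETTING_ROUTES is assumed to be
the state entered after the assertion. Only the assertion itself and the
two states seen in gdb come from the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the networkd types (hypothetical layout).
 * PENDING and SETTING_ADDRESSES appear in the gdb session above;
 * SETTING_ROUTES is an assumption about the follow-on state. */
typedef enum {
    LINK_STATE_PENDING,
    LINK_STATE_SETTING_ADDRESSES,
    LINK_STATE_SETTING_ROUTES,
} LinkState;

typedef struct {
    LinkState state;
} Link;

/* Model of the step that is supposed to run first: it sets the state
 * that link_enter_set_routes() later asserts on. */
static void link_enter_set_addresses(Link *link) {
    link->state = LINK_STATE_SETTING_ADDRESSES;
}

/* Model of the function that aborted: same assertion as in the trace. */
static void link_enter_set_routes(Link *link) {
    assert(link->state == LINK_STATE_SETTING_ADDRESSES);
    link->state = LINK_STATE_SETTING_ROUTES;
}

/* Model of the guarded readiness check: do not flow through to the
 * routes step unless the addresses step has already been entered.
 * Returns true if the routes step was entered, false otherwise. */
static bool link_check_ready(Link *link) {
    if (link->state != LINK_STATE_SETTING_ADDRESSES)
        return false;

    link_enter_set_routes(link);
    return true;
}
```

In this model, calling link_check_ready() on a link that is still in
LINK_STATE_PENDING simply returns false instead of tripping the
assertion; once link_enter_set_addresses() has run, the same call moves
on to the routes step.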
Having said that, Dan, you've obviously had a closer look at the code,
and more recently. What patches did you think were needed? It looks like
perhaps you could solve this by backporting c42ff3a1a7bf ("networkd:
Track address configuration") and 289e6774d0da ("networkd: Use only a
generic CONFIGURING state"). Is that what you had in mind?
** Patch added: "0001-Do-not-call-link_enter_set_routes-until-LINK_STATE_S.patch"
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1818340/+attachment/5243670/+files/0001-Do-not-call-link_enter_set_routes-until-LINK_STATE_S.patch
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1818340
Title:
systemd-networkd core dumps in bionic-proposed
Status in systemd package in Ubuntu:
Fix Released
Status in systemd source package in Bionic:
In Progress
Status in systemd source package in Cosmic:
In Progress
Status in systemd source package in Disco:
Fix Released
Bug description:
[Impact]
During restart, systemd-networkd fails an assertion and aborts,
leaving the system networking partially (if at all) configured.
Further restarts continue to fail.
[Test Case]
Install a bionic system (cosmic affected also) with only
systemd-networkd networking (i.e. uninstall or do not configure netplan).
Ensure no networkd conf files are in /run/systemd/network. Stop
networkd (sudo systemctl stop systemd-networkd). The interface to
test with networkd (e.g. ens3) should have no address assigned and
should be down.
Create a file similar to below, adjusting for interface name:
$ cat /etc/systemd/network/10-netplan-ens3.network
[Match]
Name=ens3
[Network]
Address=192.168.122.68/24
Start networkd:
ubuntu@lp1818340-b:~$ sudo systemctl start systemd-networkd
ubuntu@lp1818340-b:~$ ip a show ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:6e:8c:9f brd ff:ff:ff:ff:ff:ff
inet 192.168.122.68/24 brd 192.168.122.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe6e:8c9f/64 scope link
valid_lft forever preferred_lft forever
Stop networkd; ens3 should retain its address:
ubuntu@lp1818340-b:~$ sudo systemctl stop systemd-networkd
ubuntu@lp1818340-b:~$ ip a show ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:6e:8c:9f brd ff:ff:ff:ff:ff:ff
inet 192.168.122.68/24 brd 192.168.122.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe6e:8c9f/64 scope link
valid_lft forever preferred_lft forever
Start networkd again; the bug is triggered:
ubuntu@lp1818340-b:~$ sudo systemctl start systemd-networkd
Job for systemd-networkd.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status systemd-networkd.service" and "journalctl -xe" for details.
Alternatively, instead of separately stopping and then starting
networkd, the failure can be reproduced with just a restart.
Note the failure only happens with statically-assigned addresses;
interfaces configured with DHCP do not trigger this bug.
[Regression Potential]
TBD
[Other Info]
Both the new behavior of networkd (not removing managed
addresses/routes from managed interfaces) and the assertion failure
were introduced by the SRU for bug 1812760. This does not fail in
disco; I believe additional commit(s) from upstream need to be
backported.
Original description:
---
I run a number of servers with -proposed enabled and have seen a bunch
of this today:
Mar 02 16:20:58 4-ridge-fw1 systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.
Mar 02 16:20:58 4-ridge-fw1 systemd[1]: Failed to start Network Service.
These machines have numerous bonds, so I suspect that's a factor.
So far I have only observed the issue on machines with -proposed
enabled, so I suspect it is a problem with systemd 237-3ubuntu10.14.
Example netplan.yaml attached.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1818340/+subscriptions