[Bug 2053288] Re: systemd-networkd IPv6 default routes dropped under load, don't recover
Malcolm Scott
2053288 at bugs.launchpad.net
Tue Jun 11 19:33:42 UTC 2024
I see this behaviour too, quite often on multiple machines, always
seemingly happening when the machine is under high load. When it
happens, systemd-networkd typically logs something like:
systemd-networkd[3512]: enp193s0f0np0: Could not set route: Connection timed out
systemd-networkd[3512]: enp193s0f0np0: Failed
As I see it, there are three problems here:
1. networkd is deleting and re-adding a route when it probably doesn't
need to (I guess when it receives an ICMPv6 router advertisement); the
route hasn't changed and an identical one already exists in the routing
table
2. The error is handled poorly; perhaps it could retry
3. After the error, the default route stays missing _permanently_ (until
systemd-networkd is prodded with e.g. "netplan apply"); at the very
least it ought to try to re-add the route next time it sees an RA packet
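Until this is fixed, a stopgap for problem 3 could be a small watchdog run from a timer. The following is only a sketch, not a fix and not anything networkd does itself: the function name `ensure_ipv6_default_route` and the injectable `run` parameter are my own, and it assumes that `ip -6 route show default` prints nothing when the default route is missing and that `netplan apply` is an acceptable way to prod networkd on the affected machines.

```python
import subprocess


def ensure_ipv6_default_route(run=subprocess.run) -> bool:
    """Re-apply the netplan config if no IPv6 default route is present.

    Returns True if a recovery was attempted, False if a route was found.
    `run` is injectable so the logic can be exercised without root.
    """
    result = run(
        ["ip", "-6", "route", "show", "default"],
        capture_output=True,
        text=True,
        check=True,
    )
    if result.stdout.strip():
        return False  # a default route exists; nothing to do

    # Assumption: the same manual prod described above is good enough here.
    run(["netplan", "apply"], check=True)
    return True
```

This needs to run as root, and it only masks the symptom: the route is still missing between the failure and the next watchdog run.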
** Bug watch added: github.com/systemd/systemd/issues #25441
https://github.com/systemd/systemd/issues/25441
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2053288
Title:
systemd-networkd IPv6 default routes dropped under load, don't recover
Status in systemd package in Ubuntu:
New
Bug description:
Ubuntu 22.04.3 LTS
systemd 249.11-0ubuntu3.12
The systemd issue tracker says this version is too old to report upstream
and that I should report it to the downstream bug tracker.
IPv6 default routes are getting lost and not renewed.
We're using IPv6 RA to find default routes for our servers and
desktops. The RAs come from HP/Aruba routers and have a short lifetime
of about 46s. Occasionally, we will see the default routes get
dropped. Despite receiving RAs, the default routes don't get
recreated.
The most recent machine to be affected had a user running an
excessively large job (load average 157). This is the state of the
network when the machine is working:
```sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff permaddr 2c:ea:7f:56:9a:67
    altname enp4s0f1
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
    inet xxx.xxx.202.112/24 brd 129.215.202.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 xxxx:xxx:xxx:202:2eea:7fff:fe56:9a66/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591994sec preferred_lft 604794sec
    inet6 fe80::2eea:7fff:fe56:9a66/64 scope link
       valid_lft forever preferred_lft forever
# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
xxxx:xxx:xxx:202::/64 dev bond0 proto ra metric 1024 expires 2591998sec pref medium
fe80::/64 dev bond0 proto kernel metric 256 pref medium
default proto ra metric 1024 expires 28sec pref medium
        nexthop via fe80::609:73ff:fe48:c000 dev bond0 weight 1
        nexthop via fe80::609:73ff:fe48:6500 dev bond0 weight 1
```
```
When the problem arises, the last three lines disappear. `tcpdump
icmp6` shows RAs being received but networkd doesn't create the routes
in the kernel. The machine keeps its IPv6 addresses, but without a
default route it can't make any IPv6 connections or answer incoming
IPv6 connections.
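To spot the broken state automatically, the `ip -6 r` text can be checked for the default route and its nexthops. This is a rough sketch only: the function name `default_route_nexthops` is made up for illustration, and it assumes the text layout shown in the listing above (a `default` line, optionally followed by `nexthop via ... dev ...` lines) or the single-line `default via ... dev ...` form.

```python
import re


def default_route_nexthops(ip6_route_output: str) -> list[tuple[str, str]]:
    """Return (gateway, device) pairs for the IPv6 default route.

    An empty list means no default route with a gateway was found,
    which matches the broken state described above.
    """
    nexthops = []
    in_default = False
    for line in ip6_route_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("default"):
            in_default = True  # start of a default route entry
        elif not (in_default and stripped.startswith("nexthop")):
            in_default = False  # some other route; ignore it
            continue
        # Both "default via X dev Y" and "nexthop via X dev Y" carry a gateway.
        match = re.search(r"\bvia (\S+) dev (\S+)", stripped)
        if match:
            nexthops.append((match.group(1), match.group(2)))
    return nexthops
```

Fed the working `ip -6 r` output above, this returns the two `fe80::609:73ff:fe48:...` gateways on `bond0`; with the last three lines gone it returns an empty list.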
Sorry, reproduction method is unclear. Here's a best guess:
1. Configure networkd using netplan:
```yaml
---
network:
  bonds:
    bond0:
      addresses:
      - xxx.xxx.202.112/24
      dhcp4: false
      interfaces:
      - eth0
      - eth1
      macaddress: 2C:EA:7F:56:9A:66
      parameters:
        mii-monitor-interval: 1
        mode: active-backup
  ethernets:
    eth0:
      dhcp4: false
      match:
        macaddress: 2C:EA:7F:56:9A:66
    eth1:
      dhcp4: false
      match:
        macaddress: 2C:EA:7F:56:9A:67
  renderer: networkd
  version: 2
```
2. Load the machine, or just wait. Possibly this is related to packets being dropped, but I would expect the system to recover once the load is removed.
3. Note the lack of IPv6 connectivity, inability to log in with ssh, etc.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2053288/+subscriptions