[Bug 2053288] Re: systemd-networkd IPv6 default routes dropped under load, don't recover

Malcolm Scott 2053288 at bugs.launchpad.net
Tue Jun 11 19:33:42 UTC 2024


I see this behaviour too, quite often and on multiple machines,
seemingly always when the machine is under high load.  When it
happens, systemd-networkd typically logs something like:

systemd-networkd[3512]: enp193s0f0np0: Could not set route: Connection timed out
systemd-networkd[3512]: enp193s0f0np0: Failed
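
Since the failure leaves a recognisable message in the journal, affected
machines can be spotted by scanning for it. What follows is only a rough
sketch: the pattern is based on the two lines quoted above and may not
match the wording used by other systemd versions, and the names
ROUTE_FAILURE and find_route_failures are made up for illustration.

```python
import re

# Pattern derived from the log lines quoted above (an assumption; other
# systemd versions may word the message differently).
ROUTE_FAILURE = re.compile(
    r"systemd-networkd\[\d+\]: (?P<iface>[^:\s]+): "
    r"Could not set route: (?P<reason>.+)"
)


def find_route_failures(log_lines):
    """Return (interface, reason) pairs for each 'Could not set route' line."""
    failures = []
    for line in log_lines:
        match = ROUTE_FAILURE.search(line)
        if match:
            failures.append(
                (match.group("iface"), match.group("reason").strip())
            )
    return failures
```

The function takes any iterable of log lines, for example the output of
"journalctl -u systemd-networkd" split into lines.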

In my eyes there are two or three problems here:

1. networkd is deleting and re-adding a route when it probably doesn't
need to (I guess when it receives an ICMPv6 router advertisement); the
route hasn't changed and an identical one already exists in the routing
table

2. The error is handled poorly; perhaps it could retry

3. After the error, the default route stays missing _permanently_ (until
systemd-networkd is prodded with e.g. "netplan apply"); at the very
least it ought to try to re-add the route next time it sees an RA packet
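
Until point 3 is fixed, the missing route can be detected from outside
networkd and the same prod applied automatically. The following is a
stopgap sketch, not a fix. It assumes that "ip" and "netplan" are on the
PATH and that it runs as root; the function names are illustrative.

```python
import subprocess


def default_route_missing(route_output: str) -> bool:
    """Return True if the given `ip -6 route show default` output
    contains no default route entry."""
    return not any(
        line.startswith("default") for line in route_output.splitlines()
    )


def check_and_recover() -> bool:
    """Re-apply the netplan config if the IPv6 default route has vanished.

    Returns True if a recovery was attempted, False if the route was present.
    """
    result = subprocess.run(
        ["ip", "-6", "route", "show", "default"],
        capture_output=True,
        text=True,
        check=True,
    )
    if not default_route_missing(result.stdout):
        return False
    # "netplan apply" is the manual workaround mentioned in point 3.
    subprocess.run(["netplan", "apply"], check=True)
    return True
```

check_and_recover() could be called periodically, for example from a
cron job or a systemd timer. Note that "netplan apply" re-applies the
whole network configuration, which is heavier than re-adding one route.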

** Bug watch added: github.com/systemd/systemd/issues #25441
   https://github.com/systemd/systemd/issues/25441

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2053288

Title:
  systemd-networkd IPv6 default routes dropped under load, don't recover

Status in systemd package in Ubuntu:
  New

Bug description:
  Ubuntu 22.04.3 LTS
  systemd 249.11-0ubuntu3.12

  The systemd issue tracker says this version is too old to report
  upstream and that I should report it to the downstream bug tracker.

  IPv6 default routes are getting lost and not renewed.

  We're using IPv6 RA to find default routes for our servers and
  desktops. The RAs come from HP/Aruba routers and have a short lifetime
  of about 46s. Occasionally, we will see the default routes get
  dropped. Despite receiving RAs, the default routes don't get
  recreated.

  The most recent machine to be affected had a user running an
  excessively large job (load average 157). This is the state of the
  network when the machine is working:

  ```sh
  # ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host 
         valid_lft forever preferred_lft forever
  2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
      link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
      altname enp4s0f0
  3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
      link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff permaddr 2c:ea:7f:56:9a:67
      altname enp4s0f1
  4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 2c:ea:7f:56:9a:66 brd ff:ff:ff:ff:ff:ff
      inet xxx.xxx.202.112/24 brd 129.215.202.255 scope global bond0
         valid_lft forever preferred_lft forever
      inet6 xxxx:xxx:xxx:202:2eea:7fff:fe56:9a66/64 scope global dynamic mngtmpaddr noprefixroute 
         valid_lft 2591994sec preferred_lft 604794sec
      inet6 fe80::2eea:7fff:fe56:9a66/64 scope link 
         valid_lft forever preferred_lft forever
  # ip -6 r
  ::1 dev lo proto kernel metric 256 pref medium
  xxxx:xxx:xxx:202::/64 dev bond0 proto ra metric 1024 expires 2591998sec pref medium
  fe80::/64 dev bond0 proto kernel metric 256 pref medium
  default proto ra metric 1024 expires 28sec pref medium
  	nexthop via fe80::609:73ff:fe48:c000 dev bond0 weight 1 
  	nexthop via fe80::609:73ff:fe48:6500 dev bond0 weight 1 
  ```

  When the problem arises, the last three lines of the `ip -6 r` output
  (the default route and its two next hops) disappear. `tcpdump icmp6`
  shows RAs being received, but networkd doesn't create the routes in
  the kernel. The machine keeps its IPv6 addresses, but without a
  default route it can't make any IPv6 connections or answer incoming
  IPv6 connections.

  Sorry, the reproduction method is unclear. Here's a best guess:

  1. Configure networkd using netplan:

  ```yaml
  ---
  network:
    bonds:
      bond0:
        addresses:
        - xxx.xxx.202.112/24
        dhcp4: false
        interfaces:
        - eth0
        - eth1
        macaddress: 2C:EA:7F:56:9A:66
        parameters:
          mii-monitor-interval: 1
          mode: active-backup
    ethernets:
      eth0:
        dhcp4: false
        match:
          macaddress: 2C:EA:7F:56:9A:66
      eth1:
        dhcp4: false
        match:
          macaddress: 2C:EA:7F:56:9A:67
    renderer: networkd
    version: 2
  ```

  2. Load the machine, or just wait. Possibly this is related to packets being dropped, but I would expect the system to recover once the load is removed.
  3. Note the lack of IPv6 connectivity, inability to log in with ssh, etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2053288/+subscriptions



