[Bug 1833671] Re: bond interfaces stop working after restart of systemd-networkd

Tue Jul 23 21:37:57 UTC 2019

** Description changed:

+ [impact]
+ 
+ restarting systemd-networkd drops carrier on all bond slaves,
+ temporarily interrupting networking over the bond.
+ 
+ [test case]
+ 
+ on a bionic system with 2 interfaces that can be put into a bond, create
+ config files such as:
+ 
+ root@lp1833671:~# cat /etc/systemd/network/10-bond0.netdev
+ [NetDev]
+ Name=bond0
+ Kind=bond
+ 
+ root@lp1833671:~# cat /etc/systemd/network/20-ens8.network
+ [Match]
+ Name=ens8
+ 
+ [Network]
+ Bond=bond0
+ 
+ root@lp1833671:~# cat /etc/systemd/network/20-ens9.network
+ [Match]
+ Name=ens9
+ 
+ [Network]
+ Bond=bond0
+ 
+ root@lp1833671:~# cat /etc/systemd/network/30-bond0.network
+ [Match]
+ Name=bond0
+ 
+ [Network]
+ Address=1.2.3.4/32
+ 
+ 
+ restart networkd, or reboot, and verify the bond is up:
+ 
+ root@lp1833671:~# ip a
+ 3: ens8: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
+     link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
+ 4: ens9: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
+     link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
+ 5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
+     link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
+     inet 1.2.3.4/32 scope global bond0
+        valid_lft forever preferred_lft forever
+     inet6 fe80::4030:62ff:fecc:362b/64 scope link 
+        valid_lft forever preferred_lft forever
+ 
+ 
+ restart networkd and check /var/log/syslog:
+ 
+ root@lp1833671:~# systemctl restart systemd-networkd
+ root@lp1833671:~# cat /var/log/syslog
+ ...
+ Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Lost carrier
+ Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Lost carrier
+ Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Gained carrier
+ Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Gained carrier
+ 
+ [regression potential]
+ 
+ this changes how bond slaves are managed, so regressions could affect
+ any configurations using bonding.
+ 
+ [other info]
+ 
+ the patch is already included in Disco (d), and ifupdown manages
+ networking in Xenial (x), so this is needed only for Bionic (b).
+ 
+ [original description]
+ 
  Running systemd-networkd from systemd 237-3ubuntu10.23 on Ubuntu 18.04.2,
  I have one machine where, every time systemd-networkd restarts (i.e. every
  time there is an update to systemd), the bond0 interface stops working.

  I see both physical interfaces go soft down and then come back again:

  Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SEC
  Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: Detected architecture x86-64.
  Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for backup interface eno2, disabling it in 200 ms
  Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for active interface eno1, disabling it in 200 ms
  Jun 21 07:28:24 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno2
  Jun 21 07:28:25 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno1
  Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 200 ms for interface eno2
  Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 100 ms for interface eno1

  and after that nothing until I stop systemd-networkd, delete the bond
  interface, and then start systemd-networkd again.

  On most machines the cycle seems to take a bit longer and the interfaces
  reach a hard down state before coming back, and in that case there seems
  to be no problem.

  I think this is likely an instance of this upstream bug:

  https://github.com/systemd/systemd/issues/10118

  which has a fix here:

  https://github.com/systemd/systemd/pull/10465

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1833671

Title:
  bond interfaces stop working after restart of systemd-networkd

Status in systemd:
  Unknown
Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  [impact]

  restarting systemd-networkd drops carrier on all bond slaves,
  temporarily interrupting networking over the bond.

  [test case]

  on a bionic system with 2 interfaces that can be put into a bond,
  create config files such as:

  root@lp1833671:~# cat /etc/systemd/network/10-bond0.netdev
  [NetDev]
  Name=bond0
  Kind=bond

  root@lp1833671:~# cat /etc/systemd/network/20-ens8.network
  [Match]
  Name=ens8

  [Network]
  Bond=bond0

  root@lp1833671:~# cat /etc/systemd/network/20-ens9.network
  [Match]
  Name=ens9

  [Network]
  Bond=bond0

  root@lp1833671:~# cat /etc/systemd/network/30-bond0.network
  [Match]
  Name=bond0

  [Network]
  Address=1.2.3.4/32

  restart networkd, or reboot, and verify the bond is up:

  root@lp1833671:~# ip a
  3: ens8: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
      link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
  4: ens9: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
      link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
  5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
      inet 1.2.3.4/32 scope global bond0
         valid_lft forever preferred_lft forever
      inet6 fe80::4030:62ff:fecc:362b/64 scope link 
         valid_lft forever preferred_lft forever
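
  The check above can be scripted. The following is a minimal sketch,
  not part of the original report: the function name
  list_carrier_up_slaves is hypothetical, and it assumes the link lines
  have the same layout as the "ip a" output shown above. It prints the
  name of every interface that is enslaved to the given bond and has
  LOWER_UP set.

  ```shell
  # Hypothetical helper: read "ip a" / "ip -o link" style output on stdin
  # and print each interface enslaved to bond $1 whose flags include LOWER_UP.
  list_carrier_up_slaves() {
      awk -v bond="$1" '
          {
              # Link lines look like: "3: ens8: <FLAGS> ... master bond0 ..."
              for (i = 4; i < NF; i++) {
                  if ($i == "master" && $(i + 1) == bond && $3 ~ /LOWER_UP/) {
                      name = $2
                      sub(/:$/, "", name)   # strip the trailing colon
                      print name
                  }
              }
          }'
  }
  ```

  For example, "ip a | list_carrier_up_slaves bond0" should list ens8
  and ens9 on the test system described here.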

  restart networkd and check /var/log/syslog:

  root@lp1833671:~# systemctl restart systemd-networkd
  root@lp1833671:~# cat /var/log/syslog
  ...
  Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Lost carrier
  Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Lost carrier
  Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Gained carrier
  Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Gained carrier
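
  Because /var/log/syslog accumulates, it can be easier to compare the
  number of carrier-loss messages before and after the restart than to
  read the log by eye. A minimal sketch, assuming the log line format
  shown above; the function name count_lost_carrier is hypothetical and
  not part of the original report:

  ```shell
  # Hypothetical helper: count "Lost carrier" messages logged by
  # systemd-networkd in the log file given as $1 (e.g. /var/log/syslog).
  count_lost_carrier() {
      # grep -c prints the count; "|| true" keeps the exit status zero
      # when there are no matches (grep itself exits 1 in that case).
      grep -c 'systemd-networkd\[[0-9]*]: .*: Lost carrier' "$1" || true
  }
  ```

  Run "count_lost_carrier /var/log/syslog" before and after
  "systemctl restart systemd-networkd". On an affected system the count
  grows by one per bond slave; with the fix applied it would presumably
  stay the same.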

  [regression potential]

  this changes how bond slaves are managed, so regressions could affect
  any configurations using bonding.

  [other info]

  the patch is already included in Disco (d), and ifupdown manages
  networking in Xenial (x), so this is needed only for Bionic (b).

  [original description]

  Running systemd-networkd from systemd 237-3ubuntu10.23 on Ubuntu
  18.04.2, I have one machine where, every time systemd-networkd restarts
  (i.e. every time there is an update to systemd), the bond0 interface
  stops working.

  I see both physical interfaces go soft down and then come back again:

  Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SEC
  Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: Detected architecture x86-64.
  Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for backup interface eno2, disabling it in 200 ms
  Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for active interface eno1, disabling it in 200 ms
  Jun 21 07:28:24 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno2
  Jun 21 07:28:25 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno1
  Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 200 ms for interface eno2
  Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 100 ms for interface eno1

  and after that nothing until I stop systemd-networkd, delete the bond
  interface, and then start systemd-networkd again.

  On most machines the cycle seems to take a bit longer and the
  interfaces reach a hard down state before coming back, and in that case
  there seems to be no problem.

  I think this is likely an instance of this upstream bug:

  https://github.com/systemd/systemd/issues/10118

  which has a fix here:

  https://github.com/systemd/systemd/pull/10465
