[Bug 1752411] Re: bind9-host, avahi-daemon-check-dns.sh hang forever causes network connections to get stuck

Trent Lloyd trent.lloyd at canonical.com
Tue Aug 21 14:49:11 UTC 2018


Requesting sponsorship of this upload for cosmic, followed by an SRU to bionic:
 - New debdiff uploaded for both bionic and cosmic
 - Fixed the SRU version for bionic
 - Added a comment about the workaround to the script
 - Updated bug description with SRU template

I tested a package built from this diff on bionic, on my machine that
consistently exhibits the issue, and the patch works (albeit with a
5-second delay when the network interface comes up; hopefully after this
we can move on to fixing the actual issue with host).

The key thing I notice on the machine where I can reproduce this (a
Linux bridge over an Intel I219-LM) is that both the interface route and
the default route are in the 'linkdown' state when the host command
fires, for about 0.7 seconds in total. When I looked at a different
machine, that stage never happened, or at least lasted much less time
(I'd have to check ip monitor again).
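To measure that 'linkdown' window, one option is to timestamp every link and route event reported by `ip monitor` while cycling the bridge. The following is a minimal sketch, assuming iproute2 and GNU `date` (for `%N`); the helper name `stamp_lines` is hypothetical, and br0 is the bridge from this report.

```shell
#!/bin/sh
# Sketch: timestamp netlink events so the duration of the 'linkdown'
# state can be read off directly. Assumes GNU date (for %N nanoseconds).

# Hypothetical helper: prefix each line read on stdin with the current
# Unix time, e.g. "1534862951.123456789 default via ... linkdown".
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s.%N)" "$line"
    done
}

# Usage (as root, from two terminals):
#   terminal 1: ip monitor link route | stamp_lines
#   terminal 2: ifdown br0; ifup br0
# Comparing the timestamps of events that carry 'linkdown' with the
# events that follow gives a rough measure of the window.
```

Forking `date` once per line adds a few milliseconds of skew, which is small relative to a window of roughly 0.7 seconds.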

I don't expect anyone else to reproduce this for testing; I'm happy to
test the -proposed packages on an affected machine.

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1752411

Title:
  bind9-host, avahi-daemon-check-dns.sh hang forever causes network
  connections to get stuck

Status in avahi package in Ubuntu:
  Confirmed
Status in bind9 package in Ubuntu:
  Confirmed
Status in openconnect package in Ubuntu:
  Invalid
Status in strongswan package in Ubuntu:
  Invalid
Status in avahi package in Debian:
  New

Bug description:
  [Impact]

   * Network connections fail for some users (in some cases on a direct
  interface, in others when connecting a VPN) because the 'host'
  command, which /usr/lib/avahi/avahi-daemon-check-dns.sh calls to check
  for .local in DNS, never times out as it should. This leaves the
  script hanging indefinitely, blocking interface up and start-up. This
  appears to be a bug in host that occurs only in some circumstances;
  because the issue with 'host' has not been easily identified, we
  implement a workaround that calls it under 'timeout', which in any
  case also acts as a fall-back.
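  The shape of the workaround can be sketched as follows. This is a
  minimal sketch, not the shipped avahi-daemon-check-dns.sh: the
  5-second limit is assumed to correspond to the delay mentioned in this
  bug, `LOOKUP_TIMEOUT` and `HOST_CMD` are hypothetical variables added
  so the behaviour can be exercised without a real DNS server, and the
  output filtering is a simplified assumption.

```shell
#!/bin/sh
# Sketch only: general shape of the 'timeout' workaround, not the
# actual script. Assumption: a 5-second limit.
LOOKUP_TIMEOUT=${LOOKUP_TIMEOUT:-5}
# Hypothetical override so a fake 'host' can be substituted in tests.
HOST_CMD=${HOST_CMD:-host}

dns_has_local() {
    # coreutils `timeout` sends SIGTERM after LOOKUP_TIMEOUT seconds and
    # then exits with status 124, so a hung lookup can no longer block.
    OUT=$(LC_ALL=C timeout "$LOOKUP_TIMEOUT" $HOST_CMD -t soa local. 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ] && ! echo "$OUT" | grep -Eq 'has no|not found'; then
        return 0   # DNS answered with an SOA record for local.
    fi
    # Any failure, including 124 on timeout: assume no unicast .local.
    return 1
}
```

  Treating a timeout exactly like a failed lookup keeps the caller's
  logic unchanged; the only visible effect on an affected machine is the
  bounded delay.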

  [Test Case]

   * Multiple people have been unable to create a reproducer on a
  generic machine (for example, it does not occur in a VM). I have one
  specific machine where I can reproduce it (a Skull Canyon NUC with an
  Intel I219-LM) simply by running "ifdown br0; ifup br0", and there are
  clearly tens of other users affected in varying circumstances, all
  with the same symptoms, but no clear test case exists. The best I can
  suggest is that I test the patch on my system to ensure it works as
  expected; the change is only one line, which is fairly easy to audit
  and understand.

  [Regression Potential]

   * The change is a single-line change to the shell script to call host under "timeout". When tested on both working and non-working systems, it appears to function as expected, so I believe the regression potential is low.
   * In an attempt to anticipate possible issues, I checked that the timeout command is in the same path (/usr/bin) as the host command, which is already called without a path, and that the coreutils package (which contains timeout) is an Essential package. I also checked that timeout is not a bash built-in, in case anyone has changed /bin/sh to bash.
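   The checks listed above can be repeated on a Debian or Ubuntu system roughly as follows. This is a sketch assuming bash and dpkg are installed; the exact paths printed depend on the system (for example, whether /usr is merged).

```shell
#!/bin/sh
# Sketch of the regression-potential checks; assumes Debian/Ubuntu.

# 1. Both commands should resolve to /usr/bin on the default PATH.
command -v timeout
command -v host || echo "host: not installed"

# 2. In bash, `type -t` prints "builtin" for a shell builtin and "file"
#    for an external program, so this confirms timeout is not a builtin.
bash -c 'type -t timeout'

# 3. timeout should belong to coreutils, which is marked Essential.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg -S "$(command -v timeout)"
    dpkg-query -W -f='${Package} Essential: ${Essential}\n' coreutils
fi
```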

  [Other Info]
   
   * N/A

  [Original Bug Description]

  On 18.04 Openconnect connects successfully to any of multiple VPN
  concentrators but network traffic does not flow across the VPN tunnel
  connection. When testing on 16.04 this works flawlessly. This also
  worked on this system when it was on 17.10.

  I have tried reducing the MTU of the tun0 network device, but I am
  still unable to ping the IP address.

  Example showing a ping attempt to the IP address of the DNS server:

  ~$ cat /etc/resolv.conf
  # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
  #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
  # 127.0.0.53 is the systemd-resolved stub resolver.
  # run "systemd-resolve --status" to see details about the actual nameservers.

  nameserver 172.29.88.11
  nameserver 127.0.0.53

  liam@liam-lat:~$ netstat -nr
  Kernel IP routing table
  Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
  0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 wlp2s0
  105.27.198.106  192.168.1.1     255.255.255.255 UGH       0 0          0 wlp2s0
  169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 docker0
  172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
  172.29.0.0      0.0.0.0         255.255.0.0     U         0 0          0 tun0
  172.29.88.11    0.0.0.0         255.255.255.255 UH        0 0          0 tun0
  192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 wlp2s0
  liam@liam-lat:~$ ping 172.29.88.11
  PING 172.29.88.11 (172.29.88.11) 56(84) bytes of data.
  ^C
  --- 172.29.88.11 ping statistics ---
  4 packets transmitted, 0 received, 100% packet loss, time 3054ms

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: openconnect 7.08-3
  ProcVersionSignature: Ubuntu 4.15.0-10.11-generic 4.15.3
  Uname: Linux 4.15.0-10-generic x86_64
  ApportVersion: 2.20.8-0ubuntu10
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Wed Feb 28 22:11:33 2018
  InstallationDate: Installed on 2017-06-15 (258 days ago)
  InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
  SourcePackage: openconnect
  UpgradeStatus: Upgraded to bionic on 2018-02-22 (6 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1752411/+subscriptions


