[Bug 1752411] Re: bind9-host, avahi-daemon-check-dns.sh hang forever causes network connections to get stuck

 Christian Ehrhardt  1752411 at bugs.launchpad.net
Wed Aug 29 09:29:43 UTC 2018


@Erich - infinite hangs are usually caused by something in the kernel.
Suggesting dig was a good idea as something to try, but I wonder whether
we need to find out what "host" actually hangs on, to be sure that "dig"
will not some day block on just the same thing.

Could one of you who is affected check, while the "host" command hangs, whether it is spinning in userspace or blocked in the kernel (visible as a kernel wchan)?
$ cat /proc/<pid of host>/wchan
and
$ perf top -p <pid of host>
$ strace -rtf -p <pid of host>
should help to get an idea what it is blocking on.
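
To make the first check quicker to repeat, here is a minimal sketch that prints the process state and wchan in one line. It assumes a Linux /proc; describe_hang is a hypothetical helper name, not something shipped in any package.

```shell
#!/bin/sh
# Sketch only: summarise where a process is blocked, using /proc (Linux).
# describe_hang is a hypothetical helper name.
describe_hang() {
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # State letter from /proc/<pid>/status:
    #   R = running (possibly spinning in userspace)
    #   S = interruptible sleep, D = uninterruptible sleep
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
    # wchan names the kernel function the task sleeps in; "0" means not waiting.
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    echo "pid=$pid state=$state wchan=${wchan:-unknown}"
}

# Example (run while host is hanging):
#   describe_hang "$(pgrep -x host)"
```

A state of R with a wchan of 0 would point at userspace spinning, while S or D with a named wchan would point at a wait inside the kernel.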

@Trent - you said you had already started on strace; maybe you can provide the full logs here?
Also, was it spinning in strace (on the same things) or just waiting?

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1752411

Title:
  bind9-host, avahi-daemon-check-dns.sh hang forever causes network
  connections to get stuck

Status in avahi package in Ubuntu:
  Fix Released
Status in bind9 package in Ubuntu:
  Confirmed
Status in openconnect package in Ubuntu:
  Invalid
Status in strongswan package in Ubuntu:
  Invalid
Status in avahi source package in Bionic:
  Triaged
Status in bind9 source package in Bionic:
  Confirmed
Status in avahi source package in Cosmic:
  Fix Released
Status in bind9 source package in Cosmic:
  Confirmed
Status in avahi package in Debian:
  New

Bug description:
  [Impact]

   * Network connections fail for some users (in some cases on a direct
  interface, in others when connecting to a VPN) because the 'host'
  command, which /usr/lib/avahi/avahi-daemon-check-dns.sh calls to check
  for .local in DNS, never times out as it should. This leaves the
  script hanging indefinitely, blocking interface up and start-up. It
  appears to be a bug in 'host' that is triggered only in some
  circumstances. Since the underlying issue in 'host' has not been easy
  to identify, we implement a workaround that calls it under 'timeout',
  which in any case also acts as a fall-back.
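
As an illustration of the kind of workaround described above (not the actual patch), a minimal sketch of bounding a possibly-hanging command with coreutils timeout could look like this. The function name, the 5-second limit and the example query are assumptions.

```shell
#!/bin/sh
# Sketch only: run a command under coreutils "timeout" and report a timeout.
# lookup_with_timeout is a hypothetical helper name.
lookup_with_timeout() {
    limit=$1
    shift
    rc=0
    # timeout exits with status 124 when it had to kill the command.
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Example (query is an assumption, for illustration):
#   lookup_with_timeout 5 host -t soa local.
```

The caller sees the command's own exit status when it finishes in time, and 124 when it does not, so a script can treat a hang as a failed lookup and carry on.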

  [Test Case]

   * Multiple people have been unable to create a reproducer on a
  generic machine (e.g. it does not occur in a VM). I have a specific
  machine I can reproduce it on (a Skull Canyon NUC with Intel I219-LM)
  by simply running "ifdown br0; ifup br0", and there are clearly tens
  of other users affected in varying circumstances that all involve the
  same symptoms, but no clear test case exists. The best I can suggest
  is that I test the patch on my system to ensure it works as expected.
  The change is only 1 line, which is fairly easily auditable and
  understandable.

  [Regression Potential]

   * The change is a single-line change to the shell script to call host with "timeout". When tested on working and non-working systems this appears to function as expected. I believe the regression potential is consequently low.
   * In an attempt to anticipate possible issues, I checked that the timeout command is in the same path (/usr/bin) as the host command, which is already called without a path, and that the coreutils package (which contains timeout) is an Essential package. I also checked that timeout is not a built-in in bash, for those who have changed /bin/sh to bash (just in case).
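
Checks of this kind could be repeated roughly as follows. This is a sketch, not the commands the reporter actually ran; is_external is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check whether the current shell resolves a name to a file on
# PATH rather than to a builtin, function or alias.
# is_external is a hypothetical helper name.
is_external() {
    # "command -v" prints an absolute path for external commands and the
    # bare name (or an alias definition) for everything else.
    case "$(command -v "$1")" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example:
#   is_external timeout && command -v timeout   # prints where timeout lives
#   command -v host                             # compare with host's location
# On Debian/Ubuntu, the Essential flag of a package can be queried with:
#   dpkg-query -W -f='${Essential}\n' coreutils
```

Running the function under both sh and bash covers the case where /bin/sh has been pointed at bash.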

  [Other Info]
   
   * N/A

  [Original Bug Description]

  On 18.04 Openconnect connects successfully to any of multiple VPN
  concentrators but network traffic does not flow across the VPN tunnel
  connection. When testing on 16.04 this works flawlessly. This also
  worked on this system when it was on 17.10.

  I have tried reducing the mtu of the tun0 network device but this has
  not resulted in me being able to successfully ping the IP address.

  Example showing ping attempt to the IP of DNS server:

  ~$ cat /etc/resolv.conf
  # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
  #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
  # 127.0.0.53 is the systemd-resolved stub resolver.
  # run "systemd-resolve --status" to see details about the actual nameservers.

  nameserver 172.29.88.11
  nameserver 127.0.0.53

  liam@liam-lat:~$ netstat -nr
  Kernel IP routing table
  Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
  0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 wlp2s0
  105.27.198.106  192.168.1.1     255.255.255.255 UGH       0 0          0 wlp2s0
  169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 docker0
  172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
  172.29.0.0      0.0.0.0         255.255.0.0     U         0 0          0 tun0
  172.29.88.11    0.0.0.0         255.255.255.255 UH        0 0          0 tun0
  192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 wlp2s0
  liam@liam-lat:~$ ping 172.29.88.11
  PING 172.29.88.11 (172.29.88.11) 56(84) bytes of data.
  ^C
  --- 172.29.88.11 ping statistics ---
  4 packets transmitted, 0 received, 100% packet loss, time 3054ms

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: openconnect 7.08-3
  ProcVersionSignature: Ubuntu 4.15.0-10.11-generic 4.15.3
  Uname: Linux 4.15.0-10-generic x86_64
  ApportVersion: 2.20.8-0ubuntu10
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Wed Feb 28 22:11:33 2018
  InstallationDate: Installed on 2017-06-15 (258 days ago)
  InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
  SourcePackage: openconnect
  UpgradeStatus: Upgraded to bionic on 2018-02-22 (6 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1752411/+subscriptions


