[Bug 1752411] Re: bind9-host, avahi-daemon-check-dns.sh hang forever causes network connections to get stuck
Trent Lloyd
trent.lloyd at canonical.com
Tue Aug 21 14:49:11 UTC 2018
Request sponsorship of this upload for cosmic and then SRU to bionic
- New debdiff uploaded for both bionic and cosmic
- Fixed the SRU version for bionic
- Added a comment about the workaround to the script
- Updated bug description with SRU template
Tested the patch on bionic, using a package built from this diff, on my
machine which consistently exhibits the issue. It works, albeit with a
5 second delay on network interface up. Hopefully after this we can
switch to fixing the actual issue with host.
The key thing I see on the machine I can reproduce this on (a Linux
bridge over an Intel I219-LM) is that both the interface route and the
default route are in the 'linkdown' state when the host command fires,
for about 0.7 seconds total. When I looked at a different machine, that
stage never happened, or at least lasted a much shorter time (I'd have
to check ip monitor again).
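For anyone wanting to look for the same 'linkdown' window, a minimal
sketch follows. The helper name linkdown_routes is made up for
illustration; it only filters text for the 'linkdown' flag that
iproute2 prints on affected routes, and the ip invocations are left as
comments because they depend on the local machine.

```shell
#!/bin/sh
# Hypothetical helper: pass through only the routes flagged 'linkdown'.
# Reads route lines on stdin, as printed by "ip route show" or
# "ip monitor route". Exits non-zero when no such route is seen.
linkdown_routes() {
    grep -w linkdown
}

# One-shot check of the current routing table:
#   ip route show | linkdown_routes
# Watching route changes with timestamps while the interface cycles:
#   ip -timestamp monitor route | linkdown_routes
```

Running the monitor form in one terminal while cycling the interface in
another should show roughly how long the routes stay in that state.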
I don't expect anyone to reproduce this for testing; I'm happy to test
the -proposed packages on an affected machine.
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1752411
Title:
bind9-host, avahi-daemon-check-dns.sh hang forever causes network
connections to get stuck
Status in avahi package in Ubuntu:
Confirmed
Status in bind9 package in Ubuntu:
Confirmed
Status in openconnect package in Ubuntu:
Invalid
Status in strongswan package in Ubuntu:
Invalid
Status in avahi package in Debian:
New
Bug description:
[Impact]
* Network connections for some users fail (in some cases a direct
interface, in others when connecting a VPN) because the 'host' command
called by /usr/lib/avahi/avahi-daemon-check-dns.sh to check for .local
in DNS never times out as it should, leaving the script hanging
indefinitely and blocking interface up and start-up. This appears to
be a bug in host that occurs in some circumstances. However, we
implement a workaround that calls it under 'timeout', as the issue
with 'host' has not easily been identified and the timeout acts as a
fall-back in any case.
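As an illustration of the workaround described above (not the actual
debdiff), the following sketch runs a lookup command under coreutils
'timeout'. The names LOOKUP_TIMEOUT and lookup_with_timeout are
hypothetical, and the 5 second default is an assumption taken from the
delay mentioned in the comment above.

```shell
#!/bin/sh
# Hypothetical sketch, not the patch itself: run a lookup command with a
# hard time limit so a hung 'host' cannot block interface bring-up.
LOOKUP_TIMEOUT="${LOOKUP_TIMEOUT:-5}"   # seconds; assumed value

lookup_with_timeout() {
    status=0
    # timeout(1) kills the command after LOOKUP_TIMEOUT seconds and then
    # exits with status 124; otherwise it passes the command's own
    # exit status through.
    OUT=$(LC_ALL=C timeout "$LOOKUP_TIMEOUT" "$@" 2>&1) || status=$?
    if [ "$status" -eq 124 ]; then
        echo "lookup timed out after ${LOOKUP_TIMEOUT}s" >&2
    fi
    return "$status"
}

# Intended use (needs bind9-host and a configured resolver):
#   lookup_with_timeout host -t soa local.
```

A caller would have to decide how to treat status 124; that policy is
not specified here.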
[Test Case]
* Multiple people have been unable to create a reproducer on a
generic machine (e.g. it does not occur in a VM). I have a specific
machine I can reproduce it on (a Skull Canyon NUC with Intel I219-LM)
by simply running "ifdown br0; ifup br0", and there are clearly tens of
other users affected in varying circumstances that all involve the same
symptoms, but no clear test case exists. The best I can suggest is that
I test the patch on my system to ensure it works as expected. The
change is only 1 line, which is fairly easily auditable and
understandable.
[Regression Potential]
* The change is a single-line change to the shell script to call host with "timeout". When tested on working and non-working systems this appears to function as expected. I believe the regression potential is consequently low.
* In an attempt to anticipate possible issues, I checked that the timeout command is in the same path (/usr/bin) as the host command, which is already called without a path, and that the coreutils package (which contains timeout) is an Essential package. I also checked that timeout is not a built-in in bash, for those who have changed /bin/sh to bash (just in case).
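The checks described above can be approximated with a short POSIX shell
sketch. The function names is_external and same_dir are hypothetical
and are not part of the avahi script; the sketch relies only on
'command -v', which prints a full path for commands found on PATH and
the bare name for shell built-ins.

```shell
#!/bin/sh
# is_external NAME: succeed if NAME resolves to a file on PATH rather
# than a shell built-in, function, or alias.
is_external() {
    resolved=$(command -v "$1") || return 1
    case "$resolved" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# same_dir A B: succeed if both commands resolve to the same directory.
same_dir() {
    first=$(command -v "$1") || return 1
    second=$(command -v "$2") || return 1
    [ "$(dirname "$first")" = "$(dirname "$second")" ]
}

# Intended use on a machine with bind9-host installed:
#   is_external timeout && same_dir timeout host && echo "timeout looks usable"
```

Running it under both dash and bash would cover the case where /bin/sh
has been changed to bash.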
[Other Info]
* N/A
[Original Bug Description]
On 18.04 Openconnect connects successfully to any of multiple VPN
concentrators but network traffic does not flow across the VPN tunnel
connection. When testing on 16.04 this works flawlessly. This also
worked on this system when it was on 17.10.
I have tried reducing the mtu of the tun0 network device but this has
not resulted in me being able to successfully ping the IP address.
Example showing ping attempt to the IP of DNS server:
~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 172.29.88.11
nameserver 127.0.0.53
liam@liam-lat:~$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 wlp2s0
105.27.198.106  192.168.1.1     255.255.255.255 UGH       0 0          0 wlp2s0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.29.0.0      0.0.0.0         255.255.0.0     U         0 0          0 tun0
172.29.88.11    0.0.0.0         255.255.255.255 UH        0 0          0 tun0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 wlp2s0
liam@liam-lat:~$ ping 172.29.88.11
PING 172.29.88.11 (172.29.88.11) 56(84) bytes of data.
^C
--- 172.29.88.11 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3054ms
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: openconnect 7.08-3
ProcVersionSignature: Ubuntu 4.15.0-10.11-generic 4.15.3
Uname: Linux 4.15.0-10-generic x86_64
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Feb 28 22:11:33 2018
InstallationDate: Installed on 2017-06-15 (258 days ago)
InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
SourcePackage: openconnect
UpgradeStatus: Upgraded to bionic on 2018-02-22 (6 days ago)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1752411/+subscriptions