[Bug 1630578] Re: broken kernel causes eternal test retry loop
Martin Pitt
martin.pitt at ubuntu.com
Mon Oct 10 10:17:25 UTC 2016
This is harder to work around/catch, as in the new case the test does
*not* time out; it just kills sshd (or something in the kernel that
breaks ssh/networking). In general these are cases that we do want to
treat as "tmpfail" and auto-restart; I don't want to treat an auxverb
failure as a test failure in general.
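A minimal sketch of that distinction, assuming the auxverb is plain ssh. Per ssh(1), ssh exits with the remote command's exit status, or with 255 if ssh itself hit an error. The "copyup source failed, status 255" in the log below is plausibly such a transport error. Outcome and classify_auxverb_result are hypothetical names, not autopkgtest API.

```python
import enum
from typing import Optional


class Outcome(enum.Enum):
    OK = "ok"            # remote command succeeded
    FAIL = "fail"        # remote command ran and reported a failure
    TMPFAIL = "tmpfail"  # transport (ssh) problem: retry on a fresh testbed


# ssh(1): ssh exits with the exit status of the remote command, or with 255
# if an error occurred. So 255 means the transport failed, not the test.
SSH_TRANSPORT_ERROR = 255


def classify_auxverb_result(returncode: Optional[int],
                            timed_out: bool = False) -> Outcome:
    """Map the result of an ssh-wrapped command to an outcome (hypothetical helper)."""
    if timed_out or returncode is None:
        return Outcome.TMPFAIL
    if returncode == SSH_TRANSPORT_ERROR:
        return Outcome.TMPFAIL
    if returncode == 0:
        return Outcome.OK
    return Outcome.FAIL
```

The heuristic has a blind spot. A remote command that itself exits with 255 is indistinguishable from an ssh failure, so it would be misread as a tmpfail.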
Perhaps we need to introduce some kind of retry counter, but it would
need to span at least half a day: three tmpfails in a row on the same
worker are usually a sign of a broken cloud or a broken testbed image,
not a test failure. So perhaps we need some logic that checks whether
other tests also tmpfail on the same worker/cloud, and if they don't,
calls that test a failure.
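A minimal sketch of that heuristic. It assumes tmpfail events are recorded with test name, worker and time, which does not happen today. TmpfailEvent, should_call_failure, the 12 hour window and the threshold of 3 are all hypothetical. The sketch counts tmpfails within the window and does not require them to be strictly consecutive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class TmpfailEvent:
    """One tmpfail observation (hypothetical record; no such type exists today)."""
    test: str       # e.g. "linux-meta"
    worker: str     # worker/cloud the run was executed on
    when: datetime


WINDOW = timedelta(hours=12)  # "span at least half a day"
MAX_TMPFAILS = 3              # three tmpfails before we look closer


def should_call_failure(test: str, worker: str, now: datetime,
                        events: Iterable[TmpfailEvent]) -> bool:
    """Return True if repeated tmpfails of `test` look like a real test failure.

    The test must have tmpfailed at least MAX_TMPFAILS times on `worker`
    within WINDOW. No *other* test may have tmpfailed on that worker in the
    same window; if one had, that would point at a broken cloud or testbed
    image instead.
    """
    recent = [e for e in events
              if e.worker == worker and now - e.when <= WINDOW]
    own = sum(1 for e in recent if e.test == test)
    others = any(e.test != test for e in recent)
    return own >= MAX_TMPFAILS and not others
```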
This would all require state keeping, which we don't currently do (the
only state is the AMQP queue contents).
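Today the AMQP queue is the only state. One possible option for the missing state keeping is a small SQLite table that workers append tmpfail events to. The schema and helper names are hypothetical; sqlite3 is the Python standard-library module. A deployment with several worker hosts would need a store they can all reach, which a local SQLite file is not.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: one row per observed tmpfail, ts is a POSIX timestamp.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tmpfails (
    test   TEXT NOT NULL,
    worker TEXT NOT NULL,
    ts     REAL NOT NULL
)
"""


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical tmpfail state database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_tmpfail(conn: sqlite3.Connection, test: str, worker: str,
                   when: datetime) -> None:
    """Append one tmpfail event; the `with` block commits the insert."""
    with conn:
        conn.execute("INSERT INTO tmpfails (test, worker, ts) VALUES (?, ?, ?)",
                     (test, worker, when.timestamp()))


def recent_tmpfails(conn: sqlite3.Connection, worker: str, now: datetime,
                    window: timedelta = timedelta(hours=12)) -> dict:
    """Return {test: count} of tmpfails on `worker` within `window` before `now`."""
    rows = conn.execute(
        "SELECT test, COUNT(*) FROM tmpfails "
        "WHERE worker = ? AND ts >= ? GROUP BY test",
        (worker, (now - window).timestamp()))
    return dict(rows.fetchall())
```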
** Changed in: autopkgtest (Ubuntu)
Status: In Progress => Triaged
** Changed in: autopkgtest (Ubuntu)
Importance: High => Medium
** Package changed: autopkgtest (Ubuntu) => auto-package-testing
** Changed in: auto-package-testing
Milestone: ubuntu-16.10 => None
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to autopkgtest in Ubuntu.
https://bugs.launchpad.net/bugs/1630578
Title:
broken kernel causes eternal test retry loop
Status in Auto Package Testing:
Triaged
Bug description:
The infra is currently testing linux-meta/4.4.0.41.43, which has a
bug that freezes the testbed (nothing actually runs any more and
ssh hangs). The actual test timeout is spotted correctly, but
then it takes another two hours until it finally trips over some
cleanup error/timeout; this then gets counted as a tmpfail and the
whole test runs again:
⟫ tail -f /tmp/autopkgtest-work.s6p8ukov/out/log
06:30:45 DEBUG| [stdout] shm-sysv PASSED
06:30:55 DEBUG| [stdout] sigfd PASSED
06:31:05 DEBUG| [stdout] sigfpe PASSED
06:31:15 DEBUG| [stdout] sigpending PASSED
06:31:25 DEBUG| [stdout] sigq PASSED
06:31:35 DEBUG| [stdout] sigsegv PASSED
06:31:45 DEBUG| [stdout] sigsuspend PASSED
06:31:55 DEBUG| [stdout] sleep PASSED
06:32:05 DEBUG| [stdout] sock PASSED
06:32:16 DEBUG| [stdout] sockfd PASSED
autopkgtest [10:43:18]: ERROR: timed out on command "su -s /bin/bash ubuntu -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true; . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0"; mkdir -p -m 1777 -- "/tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest.3eTi1L/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest.3eTi1L/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=4; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export 'ADT_TEST_TRIGGERS=linux-meta/4.4.0.41.43'; chmod +x /tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0/debian/tests/ubuntu-regression-suite; touch /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stdout /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stderr; /tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0/debian/tests/ubuntu-regression-suite 2> >(tee -a /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stderr >&2) > >(tee -a /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stdout); " (kind: test)
autopkgtest [10:43:18]: test ubuntu-regression-suite: -----------------------]
Unexpected cleanup error:
Traceback (most recent call last):
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 708, in mainloop
    command()
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 646, in command
    r = f(c, ce)
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 584, in cmd_copyup
    copyupdown(c, ce, True)
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 469, in copyupdown
    copyupdown_internal(ce[0], c[1:], upp)
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 570, in copyupdown_internal
    (wh, ['source', 'destination'][sdn], status))
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 84, in bomb
    raise Quit(12, progname + ": failure: %s" % m)
VirtSubproc.Quit: (12, '<VirtSubproc>: failure: copyup source failed, status 255')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 680, in error_cleanup
    cleanup()
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 671, in cleanup
    caller.hook_cleanup()
  File "/home/ubuntu/autopkgtest/virt/autopkgtest-virt-ssh", line 464, in hook_cleanup
    VirtSubproc.downtmp_remove()
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 291, in downtmp_remove
    auxverb + ['rm', '-rf', '--', downtmp])
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 144, in execute_timeout
    (out, err) = sp.communicate(instr)
  File "/usr/lib/python3.5/subprocess.py", line 1064, in communicate
    self.wait()
  File "/usr/lib/python3.5/subprocess.py", line 1658, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib/python3.5/subprocess.py", line 1608, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
  File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 64, in alarm_handler
    raise Timeout()
VirtSubproc.Timeout
while cleaning up because of another error:
<VirtSubproc>: failure: copyup source failed, status 255
autopkgtest [12:43:19]: ERROR: testbed failure: unexpected eof from the testbed
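The traceback shows the cleanup's rm -rf over the auxverb blocking until an alarm-based Timeout fired. That happened two hours after the test timeout at 10:43:18. A minimal sketch of an alternative gives cleanup its own short deadline. It assumes cleanup is best-effort once the testbed is already known to be broken. best_effort_cleanup and the 60 second value are hypothetical. subprocess.run(..., timeout=...) is standard Python 3; on timeout it kills the child before raising TimeoutExpired.

```python
import subprocess
import sys
from typing import Sequence

CLEANUP_TIMEOUT = 60.0  # seconds; hypothetical, far shorter than the ~2 h in the log


def best_effort_cleanup(cmd: Sequence[str],
                        timeout: float = CLEANUP_TIMEOUT) -> bool:
    """Run a cleanup command, e.g. auxverb + ['rm', '-rf', '--', downtmp].

    Returns True if it exited with status 0, False if it failed or hung.
    Output is discarded so no pipe can keep us waiting, and on timeout
    subprocess.run() kills the child, so a frozen testbed blocks the
    worker for at most `timeout` seconds.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


if __name__ == "__main__":
    # Local stand-ins for a healthy and a hung testbed command.
    print(best_effort_cleanup([sys.executable, "-c", "pass"]))  # True
    print(best_effort_cleanup([sys.executable, "-c", "import time; time.sleep(5)"],
                              timeout=0.5))  # False
```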