[Bug 1630578] Re: broken kernel causes eternal test retry loop

Martin Pitt martin.pitt at ubuntu.com
Mon Oct 10 10:17:25 UTC 2016


This is harder to work around/catch, as in the new case the test does
*not* time out; it just kills sshd (or something in the kernel that
breaks ssh/networking). In general these are cases that we do want to
treat as "tmpfail" and auto-restart; I don't want to treat an auxverb
failure as a failure in general.

Perhaps we need to introduce some kind of retry counter, but this would
need to span at least half a day -- three tmpfails on the same worker in
a row are usually a sign of a broken cloud or a broken testbed image,
not a test failure. So perhaps we need some logic that checks whether
other tests tmpfail on the same worker/cloud, and if not, calls that
test a failure.

This would all require state keeping, which we don't currently do (the
only state is the AMQP queue contents).
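The retry-counter idea above can be sketched roughly as follows. This is a minimal, hypothetical in-memory illustration, not autopkgtest or auto-package-testing code: the class name `TmpfailTracker`, the limit of three consecutive tmpfails and the 12-hour window are assumptions loosely taken from the comment, and only the worker is tracked (a cloud key could be added the same way).

```python
import time
from collections import defaultdict

# Hypothetical thresholds, loosely based on the comment above:
# three tmpfails in a row, tracked over roughly half a day.
MAX_CONSECUTIVE_TMPFAILS = 3
WINDOW_SECONDS = 12 * 60 * 60


class TmpfailTracker:
    """Hypothetical bookkeeping that decides whether a repeated tmpfail
    should be retried or reported as a real test failure."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_CONSECUTIVE_TMPFAILS):
        self.window = window
        self.limit = limit
        # (worker, test) -> timestamps of consecutive tmpfails of that test
        self.streaks = defaultdict(list)
        # worker -> (timestamp, test) for every tmpfail seen on that worker
        self.worker_tmpfails = defaultdict(list)

    def record_success(self, worker, test):
        # Any non-tmpfail result breaks the "in a row" streak.
        self.streaks.pop((worker, test), None)

    def record_tmpfail(self, worker, test, now=None):
        """Record a tmpfail and return "retry" or "fail"."""
        now = time.time() if now is None else now
        cutoff = now - self.window

        # Keep only tmpfails inside the window, then add this one.
        streak = [t for t in self.streaks[(worker, test)] if t >= cutoff]
        streak.append(now)
        self.streaks[(worker, test)] = streak

        events = [(t, name) for (t, name) in self.worker_tmpfails[worker]
                  if t >= cutoff]
        events.append((now, test))
        self.worker_tmpfails[worker] = events

        if len(streak) < self.limit:
            return "retry"

        # Other tests also tmpfailing on this worker suggests a broken
        # cloud or testbed image, so keep retrying.
        if any(name != test for (_, name) in events):
            return "retry"

        # Only this test keeps tmpfailing: call it a failure and reset.
        self.streaks.pop((worker, test), None)
        return "fail"
```

Because this keeps its state in process memory, a real deployment would have to persist it somewhere that survives worker restarts, since the AMQP queue contents are currently the only state.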

** Changed in: autopkgtest (Ubuntu)
       Status: In Progress => Triaged

** Changed in: autopkgtest (Ubuntu)
   Importance: High => Medium

** Package changed: autopkgtest (Ubuntu) => auto-package-testing

** Changed in: auto-package-testing
    Milestone: ubuntu-16.10 => None

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to autopkgtest in Ubuntu.
https://bugs.launchpad.net/bugs/1630578

Title:
  broken kernel causes eternal test retry loop

Status in Auto Package Testing:
  Triaged

Bug description:
  The infra is currently testing linux-meta/4.4.0.41.43, which has a
  bug that freezes the testbed (nothing actually runs any more and ssh
  hangs). The actual test timeout is spotted correctly, but then it
  takes another two hours until it finally trips over some cleanup
  error/timeout; this then gets counted as a tmpfail and the whole test
  runs again:

  ⟫ tail -f /tmp/autopkgtest-work.s6p8ukov/out/log
  06:30:45 DEBUG| [stdout] shm-sysv PASSED
  06:30:55 DEBUG| [stdout] sigfd PASSED
  06:31:05 DEBUG| [stdout] sigfpe PASSED
  06:31:15 DEBUG| [stdout] sigpending PASSED
  06:31:25 DEBUG| [stdout] sigq PASSED
  06:31:35 DEBUG| [stdout] sigsegv PASSED
  06:31:45 DEBUG| [stdout] sigsuspend PASSED
  06:31:55 DEBUG| [stdout] sleep PASSED
  06:32:05 DEBUG| [stdout] sock PASSED
  06:32:16 DEBUG| [stdout] sockfd PASSED

  
  autopkgtest [10:43:18]: ERROR: timed out on command "su -s /bin/bash ubuntu -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0"; mkdir -p -m 1777 -- "/tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest.3eTi1L/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest.3eTi1L/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=4; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export 'ADT_TEST_TRIGGERS=linux-meta/4.4.0.41.43'; chmod +x /tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0/debian/tests/ubuntu-regression-suite; touch /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stdout /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stderr; /tmp/autopkgtest.3eTi1L/build.zNX/linux-4.4.0/debian/tests/ubuntu-regression-suite 2> >(tee -a /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stderr >&2) > >(tee -a /tmp/autopkgtest.3eTi1L/ubuntu-regression-suite-stdout); " (kind: test)
  autopkgtest [10:43:18]: test ubuntu-regression-suite: -----------------------]
  Unexpected cleanup error:
  Traceback (most recent call last):
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 708, in mainloop
      command()
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 646, in command
      r = f(c, ce)
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 584, in cmd_copyup
      copyupdown(c, ce, True)
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 469, in copyupdown
      copyupdown_internal(ce[0], c[1:], upp)
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 570, in copyupdown_internal
      (wh, ['source', 'destination'][sdn], status))
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 84, in bomb
      raise Quit(12, progname + ": failure: %s" % m)
  VirtSubproc.Quit: (12, '<VirtSubproc>: failure: copyup source failed, status 255')

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 680, in error_cleanup
      cleanup()
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 671, in cleanup
      caller.hook_cleanup()
    File "/home/ubuntu/autopkgtest/virt/autopkgtest-virt-ssh", line 464, in hook_cleanup
      VirtSubproc.downtmp_remove()
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 291, in downtmp_remove
      auxverb + ['rm', '-rf', '--', downtmp])
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 144, in execute_timeout
      (out, err) = sp.communicate(instr)
    File "/usr/lib/python3.5/subprocess.py", line 1064, in communicate
      self.wait()
    File "/usr/lib/python3.5/subprocess.py", line 1658, in wait
      (pid, sts) = self._try_wait(0)
    File "/usr/lib/python3.5/subprocess.py", line 1608, in _try_wait
      (pid, sts) = os.waitpid(self.pid, wait_flags)
    File "/home/ubuntu/autopkgtest/lib/VirtSubproc.py", line 64, in alarm_handler
      raise Timeout()
  VirtSubproc.Timeout

  while cleaning up because of another error:
  <VirtSubproc>: failure: copyup source failed, status 255
  autopkgtest [12:43:19]: ERROR: testbed failure: unexpected eof from the testbed

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/1630578/+subscriptions