[Bug 1825997] Re: boot-smoke fails due to running jobs
Dan Streetman
dan.streetman at canonical.com
Tue Apr 23 15:22:31 UTC 2019
** Description changed:
[impact]
boot-smoke test reboots 5 times and verifies systemd is fully started up
- after each boot, but only gives 35 seconds for each boot. On loaded
- systems this is too short.
+ after each boot, including checking if there are any running jobs (with
+ list-jobs). However, this test makes the assumption that no further
+ jobs will be started after systemd reaches 'running' (or 'degraded')
+ state, which is a false assumption.
[test case]
see various boot-smoke failures in autopkgtest.ubuntu.com
[regression potential]
- longer autopkgtest times.
+ possible false-positive or false-negative autopkgtest results.
[other info]
- i can't reproduce this failure locally, but it seems to happen
- intermittently on the adt setup. Therefore, I don't know for sure that
- the short timeout is actually the cause of the problem, but it certainly
- seems likely - 35 seconds really isn't very long for a full reboot and
- for systemd to finish starting all services, especially on the highly
- loaded autopkgtest.ubuntu.com systems.
+ The problem appears to be that systemd reaches 'running' (or 'degraded')
+ state, and then other systemd services are started. This confuses the
+ boot-smoke test, because it sees that 'is-system-running' is done, but
+ then it sees running jobs, which fails the test.
- There should be no harm, other than delaying an actual failure, from
- extending the timeout. The test case checks each second if all services
- have finished starting, so on success case it won't wait any longer than
- it currently does.
+ What is starting jobs after systemd reaches running state appears to be
+ X inside the test system. There are various services started by gnome-
+ session and dbus-daemon. Additionally, from the artifacts of one
+ example:
+
+ https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/i386/s/systemd/20190416_171327_478f6@/artifacts.tar.gz
+
+ the artifacts/journal.txt shows that after the boot-smoke test causes
+ the reboot and then re-ssh into the system after the reboot, it only
+ gives the test system 9 seconds before deciding it has failed, and only
+ 4 seconds after ssh'ing into the rebooted test system.
+
+ While increasing the timeout isn't guaranteed to stop the boot-smoke
+ failures due to still-running jobs, the logs show it certainly should
+ help.
+
+ If we continue to get failures for still-running jobs, it probably
+ should just be made a non-failing check.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1825997
Title:
boot-smoke fails due to running jobs
Status in systemd package in Ubuntu:
In Progress
Status in systemd source package in Bionic:
In Progress
Status in systemd source package in Cosmic:
In Progress
Status in systemd source package in Disco:
In Progress
Status in systemd source package in Eoan:
In Progress
Bug description:
[impact]
boot-smoke test reboots 5 times and verifies systemd is fully started
up after each boot, including checking if there are any running jobs
(with list-jobs). However, this test makes the assumption that no
further jobs will be started after systemd reaches 'running' (or
'degraded') state, which is a false assumption.
[test case]
see various boot-smoke failures in autopkgtest.ubuntu.com
[regression potential]
possible false-positive or false-negative autopkgtest results.
[other info]
The problem appears to be that systemd reaches 'running' (or
'degraded') state, and then other systemd services are started. This
confuses the boot-smoke test, because it sees that 'is-system-running'
is done, but then it sees running jobs, which fails the test.
The jobs started after systemd reaches the running state appear to
come from X inside the test system: various services are started by
gnome-session and dbus-daemon. Additionally, from the artifacts of
one example:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/i386/s/systemd/20190416_171327_478f6@/artifacts.tar.gz
the artifacts/journal.txt shows that, after the boot-smoke test
triggers the reboot and reconnects over ssh, it gives the test system
only 9 seconds before deciding it has failed, and only 4 seconds
after ssh'ing into the rebooted test system.
The timeout waiting for is-system-running is actually probably fine;
what is needed is a second timeout while checking list-jobs, after we
know that the system is running. That second timeout should give any
jobs started after the system reached running time to complete.
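The second timeout described above could look roughly like the following. This is only a minimal sketch in Python, not the actual boot-smoke change: the function names, the 60 second default and the 1 second poll interval are assumptions made for illustration, and it assumes that 'systemctl list-jobs --no-legend' prints nothing when no jobs are queued.

```python
import subprocess
import time


def list_jobs():
    """Return the non-empty lines printed by `systemctl list-jobs --no-legend`.

    Assumption: with --no-legend, systemctl prints no output at all when
    there are no queued jobs.
    """
    result = subprocess.run(
        ["systemctl", "list-jobs", "--no-legend"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_for_no_jobs(timeout=60.0, interval=1.0, get_jobs=list_jobs,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until no jobs remain or `timeout` seconds have passed.

    Returns an empty list on success, or the jobs still queued when the
    timeout expired. The timeout and interval values are hypothetical.
    """
    deadline = clock() + timeout
    while True:
        jobs = get_jobs()
        if not jobs:
            return []
        if clock() >= deadline:
            return jobs
        sleep(interval)
```

A test would call wait_for_no_jobs() only after is-system-running has reported 'running' or 'degraded', and fail only if the returned list is non-empty. Because it polls, the success case returns as soon as the job queue drains and does not wait for the full timeout.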
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1825997/+subscriptions