[Bug 1795658] Re: xenial systemd reports 'inactive' instead of 'failed' for service units that repeatedly failed to restart / failed permanently
Mauricio Faria de Oliveira
mfo at canonical.com
Tue Oct 2 14:08:44 UTC 2018
More details on the verification of the test package from the Launchpad PPA:
---
Test-case)
$ cat <<EOF | sudo tee /etc/systemd/system/fail-on-restart.service
[Service]
ExecStart=/bin/false
Restart=always
EOF
Before) "Active: inactive (dead)"
$ dpkg -s systemd | grep Version
Version: 229-4ubuntu21.4
$ sudo systemctl daemon-reload
$ sudo systemctl start fail-on-restart
$ systemctl status -n0 fail-on-restart
● fail-on-restart.service
Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
Active: inactive (dead)
$ journalctl --no-pager -u fail-on-restart
<...>
Sep 29 10:59:00 havers systemd[1]: Started fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 10:59:00 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: Started fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 10:59:00 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: Started fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 10:59:00 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: Started fail-on-restart.service.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 10:59:00 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 10:59:00 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 10:59:01 havers systemd[1]: Started fail-on-restart.service.
Sep 29 10:59:01 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 10:59:01 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 10:59:01 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 10:59:01 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 10:59:01 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 10:59:01 havers systemd[1]: fail-on-restart.service: Start request repeated too quickly.
Sep 29 10:59:01 havers systemd[1]: Failed to start fail-on-restart.service.
Package from PPA)
$ sudo add-apt-repository ppa:mfo/sf199312
$ sudo apt-get update
$ sudo apt-get install systemd
After) "Active: failed (Result: start-limit-hit)"
$ dpkg -s systemd | grep Version
Version: 229-4ubuntu21.4+1.sf199312.20180928
$ sudo systemctl daemon-reload
$ sudo systemctl start fail-on-restart
$ systemctl status -n0 fail-on-restart
● fail-on-restart.service
Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Sat 2018-09-29 11:01:34 UTC; 4s ago
Process: 7066 ExecStart=/bin/false (code=exited, status=1/FAILURE)
Main PID: 7066 (code=exited, status=1/FAILURE)
$ journalctl --no-pager -u fail-on-restart
<...>
Sep 29 11:01:33 havers systemd[1]: Started fail-on-restart.service.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 11:01:33 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 11:01:33 havers systemd[1]: Started fail-on-restart.service.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 11:01:33 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 11:01:33 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: Started fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 11:01:34 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: Started fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 11:01:34 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: Started fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Failed with result 'exit-code'.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Service hold-off time over, scheduling restart.
Sep 29 11:01:34 havers systemd[1]: Stopped fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Start request repeated too quickly.
Sep 29 11:01:34 havers systemd[1]: Failed to start fail-on-restart.service.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Unit entered failed state.
Sep 29 11:01:34 havers systemd[1]: fail-on-restart.service: Failed with result 'start-limit-hit'.
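The difference between the two journals can also be checked mechanically. The following is only a sketch, not part of the original verification: summarize_journal is a hypothetical helper that reads journal text on stdin, counts the "Started" lines, and reports whether the start limit message and the 'start-limit-hit' result appear.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): summarize a unit's
# journal, read from stdin, into one line of counters.
summarize_journal() {
    awk '
        /: Started /                           { starts++ }
        /Start request repeated too quickly/   { limit = 1 }
        /Failed with result .start-limit-hit./ { recorded = 1 }
        END {
            printf "starts=%d limit_hit=%d result_recorded=%d\n",
                   starts, limit, recorded
        }
    '
}

# Assumed usage on a system with the test unit installed:
#   journalctl --no-pager -u fail-on-restart | summarize_journal
```

Applied only to the journal lines shown above (the elided "<...>" part is not counted), this would give starts=5 limit_hit=1 result_recorded=0 for the "Before" log and starts=5 limit_hit=1 result_recorded=1 for the "After" log.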
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1795658
Title:
xenial systemd reports 'inactive' instead of 'failed' for service
units that repeatedly failed to restart / failed permanently
Status in systemd package in Ubuntu:
In Progress
Bug description:
[Impact]
* In case a service unit has repeatedly failed to restart, it should be
reported as 'failed' permanently, but currently it's instead reported
as 'inactive'.
* System monitoring tools that evaluate the status of systemd service units
and act upon it (for example: restart service, report permanent failure)
are currently misled by information in 'systemctl status <unit>.service'.
* System management tools based on such information may take wrong and/or
sub-optimal actions in the managed systems regarding such service units.
* This systemd patch [1] directly addresses this issue (see systemd GitHub
PR #3166 [2]), and its code is still effective in upstream systemd today,
without further fixes/changes (the only changes were in doc text and the
busname files that were removed, but still without further fixes to this).
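To illustrate why the reported state matters to monitoring tools, here is a minimal sketch of the kind of check such a tool might run. It is an assumption for illustration, not the logic of any particular tool: classify_unit and check_unit are hypothetical names, and the sketch relies only on the ActiveState and Result properties printed by 'systemctl show'.

```shell
#!/bin/sh
# Hypothetical helper: turn a unit's ActiveState and Result values
# into a one-line verdict a monitoring tool could act upon.
classify_unit() {
    active_state=$1
    result=$2
    case "$active_state" in
        failed)   echo "failed ($result)" ;;
        inactive) echo "inactive" ;;
        *)        echo "other ($active_state)" ;;
    esac
}

# Hypothetical wrapper: query systemd for the two properties.
# "systemctl show -p NAME" prints "NAME=value"; cut keeps the value.
check_unit() {
    unit=$1
    active_state=$(systemctl show -p ActiveState "$unit" | cut -d= -f2)
    result=$(systemctl show -p Result "$unit" | cut -d= -f2)
    classify_unit "$active_state" "$result"
}
```

With the current package, a unit that hit the start limit is in the "inactive" state, so a check like this cannot tell it apart from a unit that was simply stopped. With the patch, the state is "failed" and the result is "start-limit-hit".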
[Test Case]
* This is copied from systemd PR #3166 [2].
* This has also been tested by a customer with their system monitoring
and management solution, for interoperability verification.
$ cat <<EOF | sudo tee /etc/systemd/system/fail-on-restart.service
[Service]
ExecStart=/bin/false
Restart=always
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl start fail-on-restart
Before) "Active: inactive (dead)"
$ systemctl status -n0 fail-on-restart
fail-on-restart.service
Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
Active: inactive (dead)
After) "Active: failed (Result: start-limit-hit)"
$ systemctl status -n0 fail-on-restart
fail-on-restart.service
Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Sat 2018-09-29 11:01:34 UTC; 4s ago
Process: 7066 ExecStart=/bin/false (code=exited, status=1/FAILURE)
Main PID: 7066 (code=exited, status=1/FAILURE)
[Regression Potential]
* This code changes the point at which the check for the number of (re)start
attempts is made, so regressions in (re)starting units are theoretically
possible.
* However, this code actually reverts a change that caused a regression,
so it goes back to the code that was known to work correctly before.
* It is also still in this form in upstream systemd nowadays,
without further fixes/changes (see comment in the Impact section).
[Other Info]
* The test package was built on a Launchpad PPA [3] for all architectures,
with dependencies from Proposed enabled (more up-to-date for SRU).
* The testsuite (run at package build time; a failure blocks the build)
has identical results to that in the buildlog of current xenial-updates.
============================================================================
Testsuite summary for systemd 229
============================================================================
# TOTAL: 128
# PASS: 109
# SKIP: 19
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
[Links]
[1] https://github.com/systemd/systemd/commit/072993504e3e4206ae1019f5461a0372f7d82ddf
[2] https://github.com/systemd/systemd/issues/3166
[3] https://launchpad.net/~mfo/+archive/ubuntu/sf199312
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1795658/+subscriptions