[Bug 1711255] [NEW] systemd RestartSec does not seem to work consistently
Leo Davis
leo.davis at dishdigital.com
Thu Aug 17 01:04:37 UTC 2017
Public bug reported:
The unit file for my service xxxx-33005.service:
[Unit]
Description=product xxxx 33005 service
Wants=some.mount
After=some.mount
PartOf=product.target
PartOf=someother.service
[Service]
Type=simple
ExecStart=/opt/product/bin/xxxx -conf=/etc/product.conf
Restart=on-failure
User=product
Group=product
LimitCORE=infinity
LimitNOFILE=10000
Slice=xxxx.slice
WatchdogSec=120
RestartSec=10
StartLimitInterval=120
StartLimitBurst=10
[Install]
WantedBy=product.target
specifies RestartSec=10, which is also what systemd sees:
$ systemctl show xxxx-33005.service
Type=simple
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
...
However, today I saw in the journalctl log:
$ journalctl -u xxxx-33005.service
...
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Watchdog timeout (limit 2min)!
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:28:39 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:28:39 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:28:39 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Stopping product xxxx 33005 service...
Aug 15 23:28:45 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:33:51 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:33:51 a-hostname xxxx[3629]: Get http://127.0.0.1:8081/: dial tcp 127.0.0.1:8081: getsockopt: connection refused
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:34:01 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:34:02 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: xxxx-33005.service: Start request repeated too quickly.
Aug 15 23:34:02 a-hostname systemd[1]: Failed to start product xxxx 33005 service.
Everything looks to me that systemd is behaving as expected until Aug 15
23:34:02 when it attempts to start xxxx-33005.service not at the rate I
specified in RestartSec, but possibly at the default 100ms rate. This
excerpt shows both the expected 10s restart rate as well as the
unexpected restart rate.
$ lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04
$ uname -a
Linux a-hostname 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ apt-cache policy systemd
systemd:
Installed: 229-4ubuntu16
Candidate: 229-4ubuntu19
Version table:
229-4ubuntu19 500
500 http://internal-repo/ubuntu xenial-updates/main amd64 Packages
*** 229-4ubuntu16 100
100 /var/lib/dpkg/status
229-4ubuntu10 500
500 http://internal-repo/ubuntu xenial-security/main amd64 Packages
229-4ubuntu4 500
500 http://internal-repo/ubuntu xenial/main amd64 Packages
** Affects: systemd (Ubuntu)
Importance: Undecided
Status: New
** Description changed:
- The unit file for my service xxxx:
+ The unit file for my service xxxx-33005.service:
[Unit]
Description=product xxxx 33005 service
Wants=some.mount
After=some.mount
PartOf=product.target
PartOf=someother.service
[Service]
Type=simple
ExecStart=/opt/product/bin/xxxx -conf=/etc/product.conf
Restart=on-failure
User=product
Group=product
LimitCORE=infinity
LimitNOFILE=10000
Slice=xxxx.slice
WatchdogSec=120
RestartSec=10
StartLimitInterval=120
StartLimitBurst=10
[Install]
WantedBy=product.target
specifies RestartSec=10, which is also what systemd sees:
$ systemctl show xxxx-33005.service
Type=simple
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
...
However, today I saw in the journalctl log:
$ journalctl -u xxxx-33005.service
...
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Watchdog timeout (limit 2min)!
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:28:39 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:28:39 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:28:39 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Stopping product xxxx 33005 service...
Aug 15 23:28:45 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:33:51 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:33:51 a-hostname xxxx[3629]: Get http://127.0.0.1:8081/: dial tcp 127.0.0.1:8081: getsockopt: connection refused
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:34:01 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:34:02 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: xxxx-33005.service: Start request repeated too quickly.
Aug 15 23:34:02 a-hostname systemd[1]: Failed to start product xxxx 33005 service.
Everything looks to me that systemd is behaving as expected until Aug 15
23:34:02 when it attempts to start xxxx-33005.service not at the rate I
specified in RestartSec, but possibly at the default 100ms rate. This
excerpt shows both the expected 10s restart rate as well as the
unexpected restart rate.
$ lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04
$ uname -a
Linux a-hostname 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ apt-cache policy systemd
systemd:
- Installed: 229-4ubuntu16
- Candidate: 229-4ubuntu19
- Version table:
- 229-4ubuntu19 500
- 500 http://internal-repo/ubuntu xenial-updates/main amd64 Packages
- *** 229-4ubuntu16 100
- 100 /var/lib/dpkg/status
- 229-4ubuntu10 500
- 500 http://internal-repo/ubuntu xenial-security/main amd64 Packages
- 229-4ubuntu4 500
- 500 http://internal-repo/ubuntu xenial/main amd64 Packages
+ Installed: 229-4ubuntu16
+ Candidate: 229-4ubuntu19
+ Version table:
+ 229-4ubuntu19 500
+ 500 http://internal-repo/ubuntu xenial-updates/main amd64 Packages
+ *** 229-4ubuntu16 100
+ 100 /var/lib/dpkg/status
+ 229-4ubuntu10 500
+ 500 http://internal-repo/ubuntu xenial-security/main amd64 Packages
+ 229-4ubuntu4 500
+ 500 http://internal-repo/ubuntu xenial/main amd64 Packages
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1711255
Title:
systemd RestartSec does not seem to work consistently
Status in systemd package in Ubuntu:
New
Bug description:
The unit file for my service xxxx-33005.service:
[Unit]
Description=product xxxx 33005 service
Wants=some.mount
After=some.mount
PartOf=product.target
PartOf=someother.service
[Service]
Type=simple
ExecStart=/opt/product/bin/xxxx -conf=/etc/product.conf
Restart=on-failure
User=product
Group=product
LimitCORE=infinity
LimitNOFILE=10000
Slice=xxxx.slice
WatchdogSec=120
RestartSec=10
StartLimitInterval=120
StartLimitBurst=10
[Install]
WantedBy=product.target
specifies RestartSec=10, which is also what systemd sees:
$ systemctl show xxxx-33005.service
Type=simple
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
...
However, today I saw in the journalctl log:
$ journalctl -u xxxx-33005.service
...
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Watchdog timeout (limit 2min)!
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:28:39 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:28:39 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:28:39 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Stopping product xxxx 33005 service...
Aug 15 23:28:45 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:33:51 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:33:51 a-hostname xxxx[3629]: Get http://127.0.0.1:8081/: dial tcp 127.0.0.1:8081: getsockopt: connection refused
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:34:01 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:34:02 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: xxxx-33005.service: Start request repeated too quickly.
Aug 15 23:34:02 a-hostname systemd[1]: Failed to start product xxxx 33005 service.
Everything looks to me that systemd is behaving as expected until Aug
15 23:34:02 when it attempts to start xxxx-33005.service not at the
rate I specified in RestartSec, but possibly at the default 100ms
rate. This excerpt shows both the expected 10s restart rate as well
as the unexpected restart rate.
$ lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04
$ uname -a
Linux a-hostname 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ apt-cache policy systemd
systemd:
Installed: 229-4ubuntu16
Candidate: 229-4ubuntu19
Version table:
229-4ubuntu19 500
500 http://internal-repo/ubuntu xenial-updates/main amd64 Packages
*** 229-4ubuntu16 100
100 /var/lib/dpkg/status
229-4ubuntu10 500
500 http://internal-repo/ubuntu xenial-security/main amd64 Packages
229-4ubuntu4 500
500 http://internal-repo/ubuntu xenial/main amd64 Packages
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1711255/+subscriptions
More information about the foundations-bugs
mailing list