[Bug 1125726] Re: boot-time race between /etc/network/if-up.d/ntpdate and "/etc/init.d/ntp start"
Cam Cope
mail at camcope.me
Mon Dec 7 19:23:13 UTC 2015
** Description changed:
- We're seeing a race between if-up.d/ntpdate and the ntp startup script.
+ [Impact]
+ * Hardware clocks are not stepped at boot, which can prevent NTP from ever
+ syncing the clock.
+ Incorrect clocks can cause serious issues in distributed systems.
- 1) if-up.d/ntpdate starts.
- 2) if-up.d/ntpdate acquires the lock "/var/lock/ntpdate-ifup".
- 3) if-up.d/ntpdate stops the ntp service [which isn't running anyway].
- 4) if-up.d/ntpdate starts running ntpdate, which binds UDP *.ntp.
- 5) /etc/init.d/rc 2 executes "/etc/rc2.d/S20ntp start"
- 6) /etc/init.d/ntp acquires the lock "/var/lock/ntpdate".
- 7) /etc/init.d/ntp starts the ntp daemon.
- 8) The ntp daemon logs an error, complaining that it cannot bind UDP *.ntp.
- 9) if-up.d/ntpdate now starts the ntp service.
+ * Upstream originally added a lock file to eliminate a race between the ntp
+ service (which keeps the clock synchronized during normal operation) and
+ ntpdate (which is used to step the clock by large intervals at boot time).
+ That change had a flaw which introduced a deadlock. An Ubuntu patch was
+ applied which broke the locking mechanism entirely, reintroducing the race
+ condition.
- The result is a weird churn, though ntpd does end up running at the end.
+ * This change undoes the Ubuntu patch and fixes the deadlock by unlocking
+ before attempting to start the ntp service.
- Should these not be using the same lock file?
+ [Test Case]
+
+ * There are two bugs: the race and the deadlock. To reproduce the race more
+ consistently:
+ - add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
+ '/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
+ 'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
+ reproduce the case where the ntp service starts between the stop command
+ and the ntpdate command.
+ The result will be that the ntpdate command fails. There will be a
+ message in syslog like:
+ 'ntpdate[17660]: the NTP socket is in use, exiting'
+ - Reintroducing the lock brings back the deadlock issue. Both the ntpdate
+ if-up.d script and the ntp init script check the lock file, but the
+ ntpdate script attempted to start the ntp init script before unlocking
+ the lock. Moving the unlock before the init script invocation fixes
+ the deadlock. The original deadlock behavior is described here:
+ https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
+
+ [Regression Potential]
+
+ * Low. Out-of-sync clocks could be changed by a large amount at boot time, but
+ only for machines with static IPs. The clock is only likely to be in this
+ state if the clock was very skewed at boot time, which is also unlikely
+ since NTP usually keeps the software clock in sync during operation and
+ the hardware clock is updated at shutdown.
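
The lock ordering described under [Impact] and [Test Case] can be sketched
roughly as follows. This is a minimal illustration, not the packaged scripts:
it assumes flock(1) from util-linux is available, and LOCK, LOG, step_clock
and start_daemon are hypothetical stand-ins for the shared lock file, syslog,
the if-up.d/ntpdate hook and "/etc/init.d/ntp start".

```shell
#!/bin/sh
# Sketch only: every name below is a hypothetical stand-in, not taken from
# the ntp or ntpdate packages.
LOCK=$(mktemp)   # plays the role of the single shared lock file
LOG=$(mktemp)    # plays the role of syslog

# Stand-in for "/etc/init.d/ntp start": wait for the shared lock, then start.
start_daemon() {
    (
        flock -w 5 9 || { echo "daemon: lock timeout" >>"$LOG"; exit 1; }
        echo "daemon: started" >>"$LOG"
    ) 9>"$LOCK"
}

# Stand-in for if-up.d/ntpdate: step the clock while holding the lock.
step_clock() {
    (
        flock -x 9
        echo "ntpdate: clock stepped" >>"$LOG"
        flock -u 9        # release the lock BEFORE starting the daemon
        start_daemon
    ) 9>"$LOCK"
}

step_clock
cat "$LOG"
```

In this sketch, moving the start_daemon call above "flock -u 9" makes the
daemon side wait on a lock its own caller still holds, so it gives up after
the 5-second timeout; that mirrors the deadlock described above. Using one
lock file for both sides is what closes the race.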
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1125726
Title:
boot-time race between /etc/network/if-up.d/ntpdate and
"/etc/init.d/ntp start"
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1125726/+subscriptions