[Bug 1020285] Re: Addition of leap second causes spuriously high CPU usage and futex lockups

Luis Henriques luis.henriques at canonical.com
Mon Jul 30 10:30:51 UTC 2012


I've executed the test case on a Lucid system.  Here's the output for the 2.6.32-41.94 kernel:
# uname -a
Linux lucid 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 16:51:39 UTC 2012 i686 GNU/Linux
# ./leap_seconds 
now: 1343779191:18314044  diff: 0:275081 rem: 0:0
now: 1343779191:518753118  diff: 0:439074 rem: 0:0
now: 1343779192:18885544  diff: 0:132426 rem: 0:0
now: 1343779192:519186044  diff: 0:300500 rem: 0:0
now: 1343779193:19389186  diff: 0:203142 rem: 0:0
now: 1343779193:520615448  diff: 0:1226262 rem: 0:0
now: 1343779194:20769790  diff: 0:154342 rem: 0:0
now: 1343779194:520932743  diff: 0:162953 rem: 0:0
now: 1343779195:21301107  diff: 0:368364 rem: 0:0
now: 1343779195:521610712  diff: 0:309605 rem: 0:0
now: 1343779196:21818618  diff: 0:207906 rem: 0:0
now: 1343779196:522070412  diff: 0:251794 rem: 0:0
now: 1343779197:22276862  diff: 0:206450 rem: 0:0
now: 1343779197:522477856  diff: 0:200994 rem: 0:0
now: 1343779198:22686111  diff: 0:208255 rem: 0:0
now: 1343779198:522924932  diff: 0:238821 rem: 0:0
now: 1343779199:24238584  diff: 0:1313652 rem: 0:0
now: 1343779199:524540590  diff: 0:302006 rem: 0:0
now: 1343779199:24829471  diff: 0:-999711119 rem: 0:0
now: 1343779199:24938078  diff: 0:-499891393 rem: 0:0
now: 1343779199:24982361  diff: 0:-499955717 rem: 0:0
now: 1343779199:25022793  diff: 0:-499959568 rem: 0:0
now: 1343779199:25063446  diff: 0:-499959347 rem: 0:0
now: 1343779199:25104451  diff: 0:-499958995 rem: 0:0
now: 1343779199:25147178  diff: 0:-499957273 rem: 0:0
now: 1343779199:25193846  diff: 0:-499953332 rem: 0:0
now: 1343779199:25243894  diff: 0:-499949952 rem: 0:0
now: 1343779199:25286937  diff: 0:-499956957 rem: 0:0
now: 1343779199:25332647  diff: 0:-499954290 rem: 0:0
now: 1343779199:25374189  diff: 0:-499958458 rem: 0:0
now: 1343779199:25413411  diff: 0:-499960778 rem: 0:0
... 
# dmesg
[  161.756872] Clock: inserting leap second 23:59:60 UTC

With the new 2.6.32-42.95 kernel, here's the output:
# uname -a
Linux lucid 2.6.32-42-generic #95-Ubuntu SMP Wed Jul 25 15:57:54 UTC 2012 i686 GNU/Linux
root at lucid:/home/ubuntu/Downloads# ./leap_seconds 
now: 1343692791:233925009  diff: 0:178887 rem: 0:0
now: 1343692791:734525073  diff: 0:600064 rem: 0:0
now: 1343692792:234689367  diff: 0:164294 rem: 0:0
now: 1343692792:734955710  diff: 0:266343 rem: 0:0
now: 1343692793:235121427  diff: 0:165717 rem: 0:0
now: 1343692793:735387972  diff: 0:266545 rem: 0:0
now: 1343692794:235636709  diff: 0:248737 rem: 0:0
now: 1343692794:735885332  diff: 0:248623 rem: 0:0
now: 1343692795:236064344  diff: 0:179012 rem: 0:0
now: 1343692795:736370441  diff: 0:306097 rem: 0:0
now: 1343692796:236552965  diff: 0:182524 rem: 0:0
now: 1343692796:736751874  diff: 0:198909 rem: 0:0
now: 1343692797:236931281  diff: 0:179407 rem: 0:0
now: 1343692797:737179866  diff: 0:248585 rem: 0:0
now: 1343692798:237285693  diff: 0:105827 rem: 0:0
now: 1343692798:737387092  diff: 0:101399 rem: 0:0
now: 1343692799:237472566  diff: 0:85474 rem: 0:0
now: 1343692799:737590980  diff: 0:118414 rem: 0:0
now: 1343692800:237755920  diff: 0:164940 rem: 0:0
now: 1343692800:737882098  diff: 0:126178 rem: 0:0
now: 1343692801:238105101  diff: 0:223003 rem: 0:0
now: 1343692801:738282994  diff: 0:177893 rem: 0:0
now: 1343692802:238484975  diff: 0:201981 rem: 0:0
now: 1343692802:739666912  diff: 0:1181937 rem: 0:0
now: 1343692803:239852875  diff: 0:185963 rem: 0:0
now: 1343692803:740122243  diff: 0:269368 rem: 0:0
now: 1343692804:240333133  diff: 0:210890 rem: 0:0
now: 1343692804:740595533  diff: 0:262400 rem: 0:0
now: 1343692805:240814256  diff: 0:218723 rem: 0:0
...
# dmesg
[   53.648003] Clock: inserting leap second 23:59:60 UTC

I'm tagging this bug as verified for Lucid.

** Tags removed: verification-needed-lucid
** Tags added: verification-done-lucid

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to base-files in Ubuntu.
https://bugs.launchpad.net/bugs/1020285

Title:
  Addition of leap second causes spuriously high CPU usage and futex
  lockups

Status in “base-files” package in Ubuntu:
  Fix Released
Status in “linux” package in Ubuntu:
  In Progress
Status in “base-files” source package in Lucid:
  Won't Fix
Status in “linux” source package in Lucid:
  Fix Committed
Status in “base-files” source package in Natty:
  Won't Fix
Status in “linux” source package in Natty:
  Triaged
Status in “base-files” source package in Oneiric:
  Won't Fix
Status in “linux” source package in Oneiric:
  Triaged
Status in “base-files” source package in Precise:
  Won't Fix
Status in “linux” source package in Precise:
  Triaged
Status in “base-files” source package in Quantal:
  Won't Fix
Status in “linux” source package in Quantal:
  In Progress

Bug description:
  [Impact]
  Software that relies on fine-grained pthread timeouts will spin indefinitely and drive up system load following a leap second, when the kernel's idea of time has become desynced and sub-1s timeouts are all hit immediately.  Mysql and Java are in particular reported to be affected by this.  This is a transient issue, in that it will go away the first time the system is rebooted after the leap second and is expected to be fixed before the next leap second occurs; nevertheless admins have been caught off-guard by this misbehavior and in some cases may not have noticed the problem or know what to do about it, so we should help them along by resetting the kernel clock with a minimal-risk base-files update.

  [Test Case]
  1. Find a system that has been online, with mysqld or a java-based process running since before 2012-06-30.
  2. Verify that one or more processes on the system are spinning in futex and driving up the system load.
  3. Upgrade to the base-files package from -proposed.
  4. Verify that the system load comes back down immediately.
  5. A stress-test for leap-second handling has been provided at https://lkml.org/lkml/2012/7/3/37

  [Regression potential]
  No analysis has been done on the effect of resetting the date on applications that require a high-accuracy clock.  While this fixes the problem with the pthreads interfaces, it may cause other problems for other software.  Since the proposed fix is to reset the kernel's date to the current date, which is not atomic, there will be a slight skew of the clock backwards in time.  ntp *should* fix this shortly thereafter for machines that have it enabled.
  Also, because there's a single version check for each copy of the SRU, users whose applications are negatively affected by the running of this date command will also be negatively affected on each subsequent upgrade of the system, up to and including the quantal devel release.

  As widely reported, the addition of the leap second on 2012-06-30 has
  caused high CPU usage and futex lockups in a lot of applications
  including JVMs, Mysql as well as desktop apps like Firefox and
  Thunderbird.

  https://lkml.org/lkml/2012/6/30/122
  http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second
  https://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

  We've seen this ourselves on the Canonical infrastructure on both
  current Lucid and Precise kernels, i.e.

  ii  linux-image-2.6.32-41-server         2.6.32-41.90                         Linux kernel image for version 2.6.32 on x86_64
  ii  linux-image-3.2.0-24-generic         3.2.0-24.39                          Linux kernel image for version 3.2.0 on 64 bit x86 SMP

  We can also confirm the 'date -s $(date)' workaround fixes the problem
  without requiring a reboot.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/base-files/+bug/1020285/+subscriptions




More information about the foundations-bugs mailing list