[Bug 1834072] Re: Puppet agent using 100% CPU, in sched_yield() loop. Looks like an issue with ruby2.3 which has been fixed but not yet made it into Ubuntu.

Andreas Hasenack andreas at canonical.com
Tue Jul 2 13:38:48 UTC 2019

** Description changed:

-  * An explanation of the effects of the bug on users and
-  * justification for backporting the fix to the stable release.
-  * In addition, it is helpful, but not required, to include an
-    explanation of how the upload fixes this bug.
+ Ruby processes can sometimes get stuck in a loop consuming 100% CPU, as
+ described upstream and in the Debian bug report. It has most commonly
+ been seen in the puppet agent.
  [Test Case]
+ It's not easy to reproduce. It has been suggested that this script eventually reproduces the problem:
-  * detailed instructions how to reproduce the bug
+ while nice -n19 /opt/puppetlabs/puppet/bin/ruby sched_yield_loop.rb; do
+ :; done
-  * these should allow someone who is not familiar with the affected
-    package to reproduce the bug and verify that the updated package fixes
-    the problem.
+ Where sched_yield_loop.rb comes from
+ "https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=876377;filename=sched_yield_loop.rb;msg=22"
+ I personally haven't seen it happen with the script, but maybe it could
+ take days.
  [Regression Potential]
-  * discussion of how regressions are most likely to manifest as a result
- of this change.
-  * It is assumed that any SRU candidate patch is well-tested before
-    upload and has a low overall risk of regression, but it's important
-    to make the effort to think about what ''could'' happen in the
-    event of a regression.
-  * This both shows the SRU team that the risks have been considered,
-    and provides guidance to testers in regression-testing the SRU.
+ Races with threads can be hard to reproduce, and so can regressions.
+ The patch has been applied upstream and in Debian for more than a year now.
  [Other Info]
-  * Anything else you think is useful to include
-  * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
-  * and address these questions in advance
+ Not at this time.
  [Original Description]
  Ubuntu 16.04
  ruby 2.3.1-2~16.04.12
  kernel 4.4.0-148-generic
  We've noticed an issue across multiple servers where puppet agent will
  seem to get stuck and consume 100% CPU for days or weeks on end until
  manually killed.
  root@ps-prod-jenkins-qa-ui02:~# ps auxwwww|grep -i puppe[t]
  root       1412  0.0  0.2 143716 38680 ?        Ssl  Jun11   0:39 /usr/bin/ruby /usr/bin/puppet agent
  root      34884 74.4  0.3 286848 53724 ?        Rs   Jun23 1141:44 puppet agent: applying configuration
  root     111481 94.1  0.3 288572 54996 ?        Rs   Jun18 8642:32 puppet agent: applying configuration
  root     128479 54.8  0.3 286744 53596 ?        Rs   10:30 250:17 puppet agent: applying configuration
  Strace shows it in a sched_yield() loop:
  root@ps-prod-jenkins-qa-ui02:~# strace -p 34884 -c
  strace: Process 34884 attached
  ^Cstrace: Process 34884 detached
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.002130           0    123189           sched_yield
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.002130                123189           total
  Some googling shows this is a common issue which was supposedly
  fixed/backported to ruby 2.3:
  The following open Ubuntu bugs look to be having the same issue and
  suggest that this fix made it into Debian but never made it into Ubuntu:

