[Bug 1245808] [NEW] Ubuntu server should not suppress conosle outputs

Munehisa Kamata kamatam at amazon.com
Tue Oct 29 08:25:20 UTC 2013


Public bug reported:

Hello,

Ubuntu server(12.04 LTS, 12.10, 13.04 and 13.10) currently uses the
following /proc/sys/kernel/printk configuration by default.

ubuntu at ip-10-120-14-14:~$ cat /proc/sys/kernel/printk
4       4       1       7

This configuration suppresses register dump and stacktrace outputs from
lockup detector and khungtaskd on a console.

For example:

lockup detector:
[41812904.107136] BUG: soft lockup - CPU#1 stuck for 22s! [insmod:1003]
[41812904.107213] Stack:
[41812904.107230] Call Trace:
[41812904.107269] Code: 90 41 89 fe 65 44 8b 2c 25 10 da 00 00 66 66 90 0f ae e8 e8 f9 58 d0 ff 66 90 41 89 c4 eb 11 66 90 f3 90 65 8b 1c 25 10 da 00 00 <41> 39 dd 75 20 66 66 90 0f ae e8 e8 d6 58 d0 ff 66 90 89 c2 44
[41812908.227029] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)

khungtaskd:
[41818362.469083] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818362.469096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[41818482.409149] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818482.409161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

These lack the most important information for debugging lockup/hung-up
issue and make debugging such kernel issue difficult.

On the other hand, other distros basically use the different
configuration by default as shown below. I believe that Ubuntu server
should use this instead of the currnet one.

[ec2-user at ip-10-132-147-86 ~]$ cat /proc/sys/kernel/printk
7       4       1       7

This will not suppress such important console outputs and we can
understand where a problem happens at a glance.

For example:

[41814106.447145] BUG: soft lockup - CPU#1 stuck for 23s! [insmod:1066]
[41814106.447160] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447174] CPU 1
[41814106.447178] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447191]
[41814106.447198] Pid: 1066, comm: insmod Tainted: G O 3.2.0-52-virtual #78-Ubuntu
[41814106.447209] RIP: e030:[<ffffffff8101c216>] [<ffffffff8101c216>] native_read_tsc+0x6/0x20
[41814106.447225] RSP: e02b:ffff8801d0b4dea8 EFLAGS: 00000246
[41814106.447230] RAX: 000000004c5a957c RBX: 0000000000000001 RCX: 000000004c5a954c
...snip...
[41814106.447296] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[41814106.447304] Process insmod (pid: 1066, threadinfo ffff8801d0b4c000, task ffff8801d09b96e0)
[41814106.447311] Stack:
[41814106.447315] ffff8801d0b4ded8 ffffffff8131693a 0000000000001bbd 0000000000000000
[41814106.447326] ffffffffa000c000 0000000001351010 ffff8801d0b4dee8 ffffffff8131686c
[41814106.447336] ffff8801d0b4df08 ffffffffa000702a 0000000000000000 ffffffffa0009020
[41814106.447347] Call Trace:
[41814106.447357] [<ffffffff8131693a>] delay_tsc+0x4a/0x80
[41814106.447365] [<ffffffffa000c000>] ? 0xffffffffa000bfff
[41814106.447371] [<ffffffff8131686c>] __const_udelay+0x2c/0x30
[41814106.447380] [<ffffffffa000702a>] stall_timeout+0x2a/0x38 [stallmod]
[41814106.447387] [<ffffffffa000c02e>] init_module+0x2e/0x1000 [stallmod]
[41814106.447397] [<ffffffff81002040>] do_one_initcall+0x40/0x180
...snip...
[41814106.447541] [<ffffffff81661b42>] system_call_fastpath+0x16/0x1b
[41814112.951787] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
[41814112.951801] sending NMI to all CPUs:
[41814123.412289] sched: RT throttling activated


On Ubuntu server, the default configuration is in /etc/sysctl.d/10-console-messages.conf owned by procps package.

ubuntu at ip-10-120-14-14:~$ cat /etc/sysctl.d/10-console-messages.conf

# the following stops low-level messages on console
kernel.printk = 4 4 1 7
ubuntu at ip-10-120-14-14:~$ 

Here is a possible fix.

--- /etc/sysctl.d/10-console-messages.conf.orig 2013-09-12 07:36:09.101819575 +0000
+++ /etc/sysctl.d/10-console-messages.conf 2013-09-12 07:40:16.517819543 +0000
@@ -1,3 +1,6 @@

-# the following stops low-level messages on console
-kernel.printk = 4 4 1 7
+# To stop low-level messages on console, use the following configuration instead.
+#
+# kernel.printk = 4 4 1 7
+#
+kernel.printk = 7 4 1 7

As far as I can see, this has been originally brought from Debian[1] but I don't think that the point of the original report is still true today. I believe that suppressing register dump and stacktrace on console by default is not good idea, especially on server-class OS.
Of course, this can be easily configured by end-user themselves, but most of them will never touch such configuration unless someone asks after a problem has happened.

Do you have any concerns about this? Any comments would be appreciated.


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=292834

** Affects: procps (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to procps in Ubuntu.
https://bugs.launchpad.net/bugs/1245808

Title:
  Ubuntu server should not suppress conosle outputs

Status in “procps” package in Ubuntu:
  New

Bug description:
  Hello,

  Ubuntu server(12.04 LTS, 12.10, 13.04 and 13.10) currently uses the
  following /proc/sys/kernel/printk configuration by default.

  ubuntu at ip-10-120-14-14:~$ cat /proc/sys/kernel/printk
  4       4       1       7

  This configuration suppresses register dump and stacktrace outputs
  from lockup detector and khungtaskd on a console.

  For example:

  lockup detector:
  [41812904.107136] BUG: soft lockup - CPU#1 stuck for 22s! [insmod:1003]
  [41812904.107213] Stack:
  [41812904.107230] Call Trace:
  [41812904.107269] Code: 90 41 89 fe 65 44 8b 2c 25 10 da 00 00 66 66 90 0f ae e8 e8 f9 58 d0 ff 66 90 41 89 c4 eb 11 66 90 f3 90 65 8b 1c 25 10 da 00 00 <41> 39 dd 75 20 66 66 90 0f ae e8 e8 d6 58 d0 ff 66 90 89 c2 44
  [41812908.227029] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)

  khungtaskd:
  [41818362.469083] INFO: task swapon:1032 blocked for more than 120 seconds.
  [41818362.469096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [41818482.409149] INFO: task swapon:1032 blocked for more than 120 seconds.
  [41818482.409161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  These lack the most important information for debugging lockup/hung-up
  issue and make debugging such kernel issue difficult.

  On the other hand, other distros basically use the different
  configuration by default as shown below. I believe that Ubuntu server
  should use this instead of the currnet one.

  [ec2-user at ip-10-132-147-86 ~]$ cat /proc/sys/kernel/printk
  7       4       1       7

  This will not suppress such important console outputs and we can
  understand where a problem happens at a glance.

  For example:

  [41814106.447145] BUG: soft lockup - CPU#1 stuck for 23s! [insmod:1066]
  [41814106.447160] Modules linked in: stallmod(O+) isofs acpiphp
  [41814106.447174] CPU 1
  [41814106.447178] Modules linked in: stallmod(O+) isofs acpiphp
  [41814106.447191]
  [41814106.447198] Pid: 1066, comm: insmod Tainted: G O 3.2.0-52-virtual #78-Ubuntu
  [41814106.447209] RIP: e030:[<ffffffff8101c216>] [<ffffffff8101c216>] native_read_tsc+0x6/0x20
  [41814106.447225] RSP: e02b:ffff8801d0b4dea8 EFLAGS: 00000246
  [41814106.447230] RAX: 000000004c5a957c RBX: 0000000000000001 RCX: 000000004c5a954c
  ...snip...
  [41814106.447296] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  [41814106.447304] Process insmod (pid: 1066, threadinfo ffff8801d0b4c000, task ffff8801d09b96e0)
  [41814106.447311] Stack:
  [41814106.447315] ffff8801d0b4ded8 ffffffff8131693a 0000000000001bbd 0000000000000000
  [41814106.447326] ffffffffa000c000 0000000001351010 ffff8801d0b4dee8 ffffffff8131686c
  [41814106.447336] ffff8801d0b4df08 ffffffffa000702a 0000000000000000 ffffffffa0009020
  [41814106.447347] Call Trace:
  [41814106.447357] [<ffffffff8131693a>] delay_tsc+0x4a/0x80
  [41814106.447365] [<ffffffffa000c000>] ? 0xffffffffa000bfff
  [41814106.447371] [<ffffffff8131686c>] __const_udelay+0x2c/0x30
  [41814106.447380] [<ffffffffa000702a>] stall_timeout+0x2a/0x38 [stallmod]
  [41814106.447387] [<ffffffffa000c02e>] init_module+0x2e/0x1000 [stallmod]
  [41814106.447397] [<ffffffff81002040>] do_one_initcall+0x40/0x180
  ...snip...
  [41814106.447541] [<ffffffff81661b42>] system_call_fastpath+0x16/0x1b
  [41814112.951787] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
  [41814112.951801] sending NMI to all CPUs:
  [41814123.412289] sched: RT throttling activated

  
  On Ubuntu server, the default configuration is in /etc/sysctl.d/10-console-messages.conf owned by procps package.

  ubuntu at ip-10-120-14-14:~$ cat /etc/sysctl.d/10-console-messages.conf

  # the following stops low-level messages on console
  kernel.printk = 4 4 1 7
  ubuntu at ip-10-120-14-14:~$ 

  Here is a possible fix.

  --- /etc/sysctl.d/10-console-messages.conf.orig 2013-09-12 07:36:09.101819575 +0000
  +++ /etc/sysctl.d/10-console-messages.conf 2013-09-12 07:40:16.517819543 +0000
  @@ -1,3 +1,6 @@

  -# the following stops low-level messages on console
  -kernel.printk = 4 4 1 7
  +# To stop low-level messages on console, use the following configuration instead.
  +#
  +# kernel.printk = 4 4 1 7
  +#
  +kernel.printk = 7 4 1 7

  As far as I can see, this has been originally brought from Debian[1] but I don't think that the point of the original report is still true today. I believe that suppressing register dump and stacktrace on console by default is not good idea, especially on server-class OS.
  Of course, this can be easily configured by end-user themselves, but most of them will never touch such configuration unless someone asks after a problem has happened.

  Do you have any concerns about this? Any comments would be
  appreciated.

  
  [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=292834

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1245808/+subscriptions



More information about the foundations-bugs mailing list