[ 3.8.y.z extended stable ] Patch "powerpc: Fix stack overflow crash in resume_kernel when ftracing" has been added to staging queue

Kamal Mostafa kamal at canonical.com
Tue Jun 25 22:20:04 UTC 2013


This is a note to let you know that I have just added a patch titled

    powerpc: Fix stack overflow crash in resume_kernel when ftracing

to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree 
which can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue

This patch is scheduled to be released in version 3.8.13.4.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.8.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

------

>From 2b1f1bde1fb0d6857ea1f6b5d6980f0e06058c28 Mon Sep 17 00:00:00 2001
From: Michael Ellerman <michael at ellerman.id.au>
Date: Thu, 13 Jun 2013 21:04:56 +1000
Subject: powerpc: Fix stack overflow crash in resume_kernel when ftracing

commit 0e37739b1c96d65e6433998454985de994383019 upstream.

It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
  "powerpc: Rework runlatch code" by myself --BenH
]

Signed-off-by: Michael Ellerman <michael at ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 arch/powerpc/include/asm/exception-64s.h | 2 +-
 arch/powerpc/kernel/process.c            | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index ad708dd..88e0825 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -413,7 +413,7 @@ label##_common:							\
  */
 #define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr)		  \
 	EXCEPTION_COMMON(trap, label, hdlr, ret_from_except_lite, \
-			 FINISH_NAP;RUNLATCH_ON;DISABLE_INTS)
+			 FINISH_NAP;DISABLE_INTS;RUNLATCH_ON)

 /*
  * When the idle code in power4_idle puts the CPU into NAP mode,
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8143067..80ea4b5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1169,7 +1169,7 @@ EXPORT_SYMBOL(dump_stack);

 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
-void __ppc64_runlatch_on(void)
+void notrace __ppc64_runlatch_on(void)
 {
 	struct thread_info *ti = current_thread_info();
 	unsigned long ctrl;
@@ -1182,7 +1182,7 @@ void __ppc64_runlatch_on(void)
 }

 /* Called with hard IRQs off */
-void __ppc64_runlatch_off(void)
+void notrace __ppc64_runlatch_off(void)
 {
 	struct thread_info *ti = current_thread_info();
 	unsigned long ctrl;
--
1.8.1.2





More information about the kernel-team mailing list