[3.19.y-ckt stable] Patch "sched/cputime: Fix steal time accounting vs. CPU hotplug" has been added to the 3.19.y-ckt tree

Kamal Mostafa kamal at canonical.com
Sat Apr 2 00:49:57 UTC 2016


This is a note to let you know that I have just added a patch titled

    sched/cputime: Fix steal time accounting vs. CPU hotplug

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

    http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt18.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

---8<------------------------------------------------------------

>From 54fd2004297bd7e2d1daf8ec27e22d4405ef1d7a Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx at linutronix.de>
Date: Fri, 4 Mar 2016 15:59:42 +0100
Subject: sched/cputime: Fix steal time accounting vs. CPU hotplug

commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
Acked-by: Rik van Riel <riel at redhat.com>
Cc: Frederic Weisbecker <fweisbec at gmail.com>
Cc: Glauber Costa <glommer at parallels.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Peter Zijlstra <peterz at infradead.org>
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo at kernel.org>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 kernel/sched/core.c  |  1 +
 kernel/sched/sched.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6b5909b..e5963b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5314,6 +5314,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)

 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;

 	case CPU_ONLINE:
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6183e4b..0efd0fe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1613,3 +1613,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}
--
2.7.4





More information about the kernel-team mailing list