ACK: [SRU] [Xenial] [PATCH 1/3] x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
Colin Ian King
colin.king at canonical.com
Tue Apr 25 11:02:35 UTC 2017
On 25/04/17 11:20, Kai-Heng Feng wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki at intel.com>
>
> BugLink: https://bugs.launchpad.net/bugs/1686061
>
> On Intel hardware, native_play_dead() uses mwait_play_dead() by
> default and only falls back to the other methods if that fails.
> That also happens during resume from hibernation, when the restore
> (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
> except for the boot one offline.
>
> However, that is problematic, because the address passed to
> __monitor() in mwait_play_dead() is likely to be written to in the
> last phase of hibernate image restoration and that causes the "dead"
> CPU to start executing instructions again. Unfortunately, the page
> containing the address in that CPU's instruction pointer may not be
> valid any more at that point.
>
> First, that page may have been overwritten with image kernel memory
> contents already, so the instructions the CPU attempts to execute may
> simply be invalid. Second, the page tables previously used by that
> CPU may have been overwritten by image kernel memory contents, so the
> address in its instruction pointer is impossible to resolve then.
>
> A report from Varun Koyyalagunta and investigation carried out by
> Chen Yu show that the latter sometimes happens in practice.
>
> To prevent it from happening, temporarily change the smp_ops.play_dead
> pointer during resume from hibernation so that it points to a special
> "play dead" routine which uses hlt_play_dead() and avoids the
> inadvertent "revivals" of "dead" CPUs this way.
>
> A slightly unpleasant consequence of this change is that if the
> system is hibernated with one or more CPUs offline, it will generally
> draw more power after resume than it did before hibernation, because
> the physical state entered by CPUs via hlt_play_dead() is higher-power
> than the mwait_play_dead() one in the majority of cases. It is
> possible to work around this, but it is unclear how much of a problem
> that's going to be in practice, so the workaround will be implemented
> later if it turns out to be necessary.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
> Reported-by: Varun Koyyalagunta <cpudebug at centtech.com>
> Original-by: Chen Yu <yu.c.chen at intel.com>
> Tested-by: Chen Yu <yu.c.chen at intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki at intel.com>
> Acked-by: Ingo Molnar <mingo at kernel.org>
> (cherry picked from commit 406f992e4a372dafbe3c2cff7efbb2002a5c8ebd)
> Signed-off-by: Kai-Heng Feng <kai.heng.feng at canonical.com>
> ---
> arch/x86/include/asm/smp.h | 1 +
> arch/x86/kernel/smpboot.c | 2 +-
> arch/x86/power/cpu.c | 30 ++++++++++++++++++++++++++++++
> kernel/power/hibernate.c | 7 ++++++-
> kernel/power/power.h | 2 ++
> 5 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index f12ceef70ce2..55560e637dfd 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -147,6 +147,7 @@ int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
> int native_cpu_disable(void);
> int common_cpu_die(unsigned int cpu);
> void native_cpu_die(unsigned int cpu);
> +void hlt_play_dead(void);
> void native_play_dead(void);
> void play_dead_common(void);
> void wbinvd_on_cpu(int cpu);
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index afa110370045..54ad7c1d87ad 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1636,7 +1636,7 @@ static inline void mwait_play_dead(void)
> }
> }
>
> -static inline void hlt_play_dead(void)
> +void hlt_play_dead(void)
> {
> if (__this_cpu_read(cpu_info.x86) >= 4)
> wbinvd();
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 9ab52791fed5..6f86dae52011 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -12,6 +12,7 @@
> #include <linux/export.h>
> #include <linux/smp.h>
> #include <linux/perf_event.h>
> +#include <linux/tboot.h>
>
> #include <asm/pgtable.h>
> #include <asm/proto.h>
> @@ -240,6 +241,35 @@ void notrace restore_processor_state(void)
> EXPORT_SYMBOL(restore_processor_state);
> #endif
>
> +#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HOTPLUG_CPU)
> +static void resume_play_dead(void)
> +{
> + play_dead_common();
> + tboot_shutdown(TB_SHUTDOWN_WFS);
> + hlt_play_dead();
> +}
> +
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> + void (*play_dead)(void) = smp_ops.play_dead;
> + int ret;
> +
> + /*
> + * Ensure that MONITOR/MWAIT will not be used in the "play dead" loop
> + * during hibernate image restoration, because it is likely that the
> + * monitored address will be actually written to at that time and then
> + * the "dead" CPU will attempt to execute instructions again, but the
> + * address in its instruction pointer may not be possible to resolve
> + * any more at that point (the page tables used by it previously may
> + * have been overwritten by hibernate image data).
> + */
> + smp_ops.play_dead = resume_play_dead;
> + ret = disable_nonboot_cpus();
> + smp_ops.play_dead = play_dead;
> + return ret;
> +}
> +#endif
> +
> /*
> * When bsp_check() is called in hibernate and suspend, cpu hotplug
> * is disabled already. So it's unnessary to handle race condition between
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 922617f4c854..7ee9011eee78 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -410,6 +410,11 @@ int hibernation_snapshot(int platform_mode)
> goto Close;
> }
>
> +int __weak hibernate_resume_nonboot_cpu_disable(void)
> +{
> + return disable_nonboot_cpus();
> +}
> +
> /**
> * resume_target_kernel - Restore system state from a hibernation image.
> * @platform_mode: Whether or not to use the platform driver.
> @@ -434,7 +439,7 @@ static int resume_target_kernel(bool platform_mode)
> if (error)
> goto Cleanup;
>
> - error = disable_nonboot_cpus();
> + error = hibernate_resume_nonboot_cpu_disable();
> if (error)
> goto Enable_cpus;
>
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index caadb566e82b..8025a2baaac6 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -38,6 +38,8 @@ static inline char *check_image_kernel(struct swsusp_info *info)
> }
> #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
>
> +extern int hibernate_resume_nonboot_cpu_disable(void);
> +
> /*
> * Keep some memory free so that I/O operations can succeed without paging
> * [Might this be more than 4 MB?]
>
This is a clean upstream cherry pick of fix that has existed for a while
and we're using in Yakkey+, and addresses the issue and has been also
tested before it landed upstream.
Acked-by: Colin Ian King <colin.king at canonical.com>
More information about the kernel-team
mailing list