[X][PATCH 0/4] LP#1821259 Fix for deadlock in cpu_stopper
Mauricio Faria de Oliveira
mfo at canonical.com
Thu Mar 21 23:44:08 UTC 2019
BugLink: https://bugs.launchpad.net/bugs/1821259
[Impact]
* This problem hard-locks 2 CPUs in a deadlock, which in
turn soft-locks other CPUs; the system becomes
unusable.
* This is relatively rare / difficult to hit because it's a
corner case in scheduling/load balancing that needs precise
timing with the CPU stopper code, and it needs an SMP plus
_NUMA_ system. (It can, however, be hit with the synthetic
test case attached to the LP bug.)
* Since SMP plus NUMA usually means _servers_, it is worth
preventing this bug and the resulting hard lockups / reboots.
* The fix resolves the potential deadlock by moving one of
the calls required for the deadlock out of the locked section.
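
The idea of taking a call out from under a lock can be sketched in user
space. This is only a toy model, not the kernel code: it assumes the call
being moved is the thread wakeup (suggested by the patch titles, but not
stated in this cover letter), and every name in it (toy_stopper,
deferred_wakes, queue_work_deferred, flush_wakes) is hypothetical. Work is
queued while the lock is held, the wakeup is only recorded, and the
recorded wakeups are issued after the lock has been released.

```c
/* User-space toy model; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 4

struct toy_stopper {
    bool lock_held;      /* stands in for the per-stopper lock */
    int  queued;         /* work items queued so far */
    int  wakes_unlocked; /* wakeups issued with the lock released */
    int  wakes_locked;   /* wakeups issued under the lock (the hazard) */
};

/* List of stoppers whose wakeup was postponed. */
struct deferred_wakes {
    struct toy_stopper *target[MAX_DEFERRED];
    size_t count;
};

static void toy_lock(struct toy_stopper *s)
{
    assert(!s->lock_held);
    s->lock_held = true;
}

static void toy_unlock(struct toy_stopper *s)
{
    assert(s->lock_held);
    s->lock_held = false;
}

/* Record whether the wakeup happened with or without the lock held. */
static void toy_wake(struct toy_stopper *s)
{
    if (s->lock_held)
        s->wakes_locked++;
    else
        s->wakes_unlocked++;
}

/* Queue work under the lock, but only remember the wakeup. */
static void queue_work_deferred(struct toy_stopper *s,
                                struct deferred_wakes *dw)
{
    toy_lock(s);
    s->queued++;
    assert(dw->count < MAX_DEFERRED);
    dw->target[dw->count++] = s;
    toy_unlock(s);
}

/* Issue all remembered wakeups; no lock is held at this point. */
static void flush_wakes(struct deferred_wakes *dw)
{
    for (size_t i = 0; i < dw->count; i++)
        toy_wake(dw->target[i]);
    dw->count = 0;
}

/* Returns the number of wakeups that were issued under a lock. */
static int demo_deferred_wakeups(void)
{
    struct toy_stopper a = {0}, b = {0};
    struct deferred_wakes dw = {0};

    queue_work_deferred(&a, &dw);
    queue_work_deferred(&b, &dw);
    flush_wakes(&dw);

    assert(a.queued == 1 && b.queued == 1);
    assert(a.wakes_unlocked == 1 && b.wakes_unlocked == 1);
    return a.wakes_locked + b.wakes_locked;
}
```

Calling demo_deferred_wakeups() from a main() function exercises the
pattern. Because the wakeup path never runs with the lock held, it cannot
end up waiting on a lock that the caller itself is still holding.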
[Test Case]
* There's a synthetic test case to reproduce this problem
(although without the stack traces - just a system hang)
attached to this LP bug.
* It uses kprobes/mdelay/cpu stopper calls to force the code
to execute and force the timing/locking condition to occur.
* $ sudo insmod kmod-stopper.ko
Some dmesg logging occurs, and the system either hangs or not.
See examples in comments.
[Regression Potential]
* These are patches to the cpu stop_machine.c code, and they
change a bit how it works; however, no further upstream
fixes have been made to these patches and they are still at the
top of the 'git log --oneline -- kernel/stop_machine.c' output.
* These patches have been verified with the synthetic test case
and 'stress-ng --class scheduler --sequential 0' (no regressions)
on a guest with 2 CPUs and on one physical system with 24 CPUs.
[Other Info]
* The patches are required on Xenial and later.
* There are 4 patches for Xenial, and 2 patches pending for Bionic.
* All patches are applied from Cosmic onwards.
Isaac J. Manjarres (2):
stop_machine: Disable preemption when waking two stopper threads
stop_machine: Disable preemption after queueing stopper threads
Peter Zijlstra (1):
stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
Prasad Sodagudi (1):
stop_machine: Atomically queue and wake stopper threads
kernel/stop_machine.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
--
2.17.1