[3.13.y.z extended stable] Patch "memcg: do not hang on OOM when killed by userspace OOM access to memory reserves" has been added to staging queue
Kamal Mostafa
kamal at canonical.com
Tue Jul 15 21:29:31 UTC 2014
This is a note to let you know that I have just added a patch titled
memcg: do not hang on OOM when killed by userspace OOM access to memory reserves
to the linux-3.13.y-queue branch of the 3.13.y.z extended stable tree
which can be found at:
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue
This patch is scheduled to be released in version 3.13.11.5.
If you, or anyone else, feels it should not be added to this tree, please
reply to this email.
For more information about the 3.13.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
Thanks.
-Kamal
------
>From f936d94fb64a6866a5b304f5c3beb53fbacc0321 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko at suse.cz>
Date: Wed, 4 Jun 2014 16:07:36 -0700
Subject: memcg: do not hang on OOM when killed by userspace OOM access to
memory reserves
commit d8dc595ce3909fbc131bdf5ab8c9808fe624b18d upstream.
Eric has reported that he can see task(s) stuck in memcg OOM handler
regularly. The only way out is to
echo 0 > $GROUP/memory.oom_control
His usecase is:
- Setup a hierarchy with memory and the freezer (disable kernel oom and
have a process watch for oom).
- In that memory cgroup add a process with one thread per cpu.
- In one thread slowly allocate once per second I think it is 16M of ram
and mlock and dirty it (just to force the pages into ram and stay
there).
- When oom is achieved loop:
* attempt to freeze all of the tasks.
* if frozen send every task SIGKILL, unfreeze, remove the directory in
cgroupfs.
Eric has then pinpointed the issue to be memcg specific.
All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
Those that have received fatal signal will bypass the charge and should
continue on their way out. The tricky part is that the exit path might
trigger a page fault (e.g. exit_robust_list), thus the memcg charge,
while its memcg is still under OOM because nobody has released any charges
yet.
Unlike with the in-kernel OOM handler the exiting task doesn't get
TIF_MEMDIE set so it doesn't shortcut further charges of the killed task
and falls to the memcg OOM again without any way out of it as there are no
fatal signals pending anymore.
This patch fixes the issue by checking PF_EXITING early in
mem_cgroup_try_charge and bypass the charge same as if it had fatal
signal pending or TIF_MEMDIE set.
Normally exiting tasks (aka not killed) will bypass the charge now but
this should be OK as the task is leaving and will release memory and
increasing the memory pressure just to release it in a moment seems
dubious wasting of cycles. Besides that charges after exit_signals should
be rare.
I am bringing this patch again (rebased on the current mmotm tree). I
hope we can move forward finally. If there is still an opposition then
I would really appreciate a concurrent approach so that we can discuss
alternatives.
http://comments.gmane.org/gmane.linux.kernel.stable/77650 is a reference
to the followup discussion when the patch has been dropped from the mmotm
last time.
Reported-by: Eric W. Biederman <ebiederm at xmission.com>
Signed-off-by: Michal Hocko <mhocko at suse.cz>
Acked-by: David Rientjes <rientjes at google.com>
Acked-by: Johannes Weiner <hannes at cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
[ kamal: backport to 3.13: whitespace ]
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f506001..44b0eec 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2706,8 +2706,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
* in system level. So, allow to go ahead dying process in addition to
* MEMDIE process.
*/
- if (unlikely(test_thread_flag(TIF_MEMDIE)
- || fatal_signal_pending(current)))
+ if (unlikely(test_thread_flag(TIF_MEMDIE) ||
+ fatal_signal_pending(current) ||
+ current->flags & PF_EXITING))
goto bypass;
if (unlikely(task_in_memcg_oom(current)))
--
1.9.1
More information about the kernel-team
mailing list