LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/2] oom and TIF_MEMDIE setting to mm-less tasks fixes
@ 2014-12-29 12:32 Michal Hocko
  2014-12-29 12:32 ` [PATCH 1/2] oom: Don't count on mm-less current process Michal Hocko
  2014-12-29 12:32 ` [PATCH 2/2] oom: Make sure that TIF_MEMDIE is set under task_lock Michal Hocko
  0 siblings, 2 replies; 3+ messages in thread
From: Michal Hocko @ 2014-12-29 12:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Rientjes, Tetsuo Handa, Oleg Nesterov, linux-mm, LKML

Hi Andrew,
could you pick up these two fixes which came out of a longer discussion
started here: http://marc.info/?l=linux-mm&m=141839249819519. The thread
became quite confusing as multiple issues were discussed there so I think
reposting them in a new thread will make more sense.

Thanks


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] oom: Don't count on mm-less current process.
  2014-12-29 12:32 [PATCH 0/2] oom and TIF_MEMDIE setting to mm-less tasks fixes Michal Hocko
@ 2014-12-29 12:32 ` Michal Hocko
  2014-12-29 12:32 ` [PATCH 2/2] oom: Make sure that TIF_MEMDIE is set under task_lock Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Hocko @ 2014-12-29 12:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Rientjes, Tetsuo Handa, Oleg Nesterov, linux-mm, LKML

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

out_of_memory() doesn't trigger the OOM killer if the current task is already
exiting or it has fatal signals pending, and gives the task access to memory
reserves instead. However, doing so is wrong if out_of_memory() is called by
an allocation (e.g. from exit_task_work()) after the current task has already
released its memory and cleared TIF_MEMDIE at exit_mm(). If we again set
TIF_MEMDIE to post-exit_mm() current task, the OOM killer will be blocked by
the task sitting in the final schedule() waiting for its parent to reap it.
It will trigger an OOM livelock if its parent is unable to reap it due to
doing an allocation and waiting for the OOM killer to kill it.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
 mm/oom_kill.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d503e9ce1c7b..f82dd13cca68 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -643,8 +643,12 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it.  The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
+	 *
+	 * But don't select if current has already released its mm and cleared
+	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
 	 */
-	if (fatal_signal_pending(current) || task_will_free_mem(current)) {
+	if (current->mm &&
+	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
 		set_thread_flag(TIF_MEMDIE);
 		return;
 	}
-- 
2.1.4


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 2/2] oom: Make sure that TIF_MEMDIE is set under task_lock
  2014-12-29 12:32 [PATCH 0/2] oom and TIF_MEMDIE setting to mm-less tasks fixes Michal Hocko
  2014-12-29 12:32 ` [PATCH 1/2] oom: Don't count on mm-less current process Michal Hocko
@ 2014-12-29 12:32 ` Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Hocko @ 2014-12-29 12:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Rientjes, Tetsuo Handa, Oleg Nesterov, linux-mm, LKML

OOM killer tries to exclude tasks which do not have mm_struct associated
because killing such a task wouldn't help much. The OOM victim gets
TIF_MEMDIE set to disable OOM killer while the current victim releases
the memory and then enables the OOM killer again by dropping the flag.

oom_kill_process is currently prone to a race condition when the OOM
victim is already exiting and TIF_MEMDIE is set after the task releases
its address space. This might theoretically lead to OOM livelock if
the OOM victim blocks on an allocation later during exiting because
it wouldn't kill any other process and the exiting one won't be able
to exit. The situation is highly unlikely because the OOM victim is
expected to release some memory which should help to sort out OOM
situation.

Fix this by checking task->mm and setting TIF_MEMDIE flag under task_lock
which will serialize the OOM killer with exit_mm which sets task->mm to
NULL.
Setting the flag for current is not necessary because check and set is
not racy.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/oom_kill.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f82dd13cca68..294493a7ae4b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -438,11 +438,14 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * If the task is already exiting, don't alarm the sysadmin or kill
 	 * its children or threads, just set TIF_MEMDIE so it can die quickly
 	 */
-	if (task_will_free_mem(p)) {
+	task_lock(p);
+	if (p->mm && task_will_free_mem(p)) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
+		task_unlock(p);
 		put_task_struct(p);
 		return;
 	}
+	task_unlock(p);
 
 	if (__ratelimit(&oom_rs))
 		dump_header(p, gfp_mask, order, memcg, nodemask);
@@ -492,6 +495,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
+	set_tsk_thread_flag(victim, TIF_MEMDIE);
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
@@ -522,7 +526,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		}
 	rcu_read_unlock();
 
-	set_tsk_thread_flag(victim, TIF_MEMDIE);
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }
-- 
2.1.4


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-12-29 12:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-29 12:32 [PATCH 0/2] oom and TIF_MEMDIE setting to mm-less tasks fixes Michal Hocko
2014-12-29 12:32 ` [PATCH 1/2] oom: Don't count on mm-less current process Michal Hocko
2014-12-29 12:32 ` [PATCH 2/2] oom: Make sure that TIF_MEMDIE is set under task_lock Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).