LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Michal Hocko <mhocko@suse.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 32/62] mm, oom: fix concurrent munlock and oom reaper unmap, v3
Date: Mon, 14 May 2018 08:48:48 +0200	[thread overview]
Message-ID: <20180514064818.131463728@linuxfoundation.org> (raw)
In-Reply-To: <20180514064816.436958006@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit 27ae357fa82be5ab73b2ef8d39dcb8ca2563483a upstream.

Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/oom.h |    2 +
 mm/mmap.c           |   44 ++++++++++++++++++------------
 mm/oom_kill.c       |   74 ++++++++++++++++++++++++++++------------------------
 3 files changed, 68 insertions(+), 52 deletions(-)

--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
 	return 0;
 }
 
+void __oom_reap_task_mm(struct mm_struct *mm);
+
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2982,6 +2982,32 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
+	if (unlikely(mm_is_oom_victim(mm))) {
+		/*
+		 * Manually reap the mm to free as much memory as possible.
+		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+		 * this mm from further consideration.  Taking mm->mmap_sem for
+		 * write after setting MMF_OOM_SKIP will guarantee that the oom
+		 * reaper will not run on this mm again after mmap_sem is
+		 * dropped.
+		 *
+		 * Nothing can be holding mm->mmap_sem here and the above call
+		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+		 * __oom_reap_task_mm() will not block.
+		 *
+		 * This needs to be done before calling munlock_vma_pages_all(),
+		 * which clears VM_LOCKED, otherwise the oom reaper cannot
+		 * reliably test it.
+		 */
+		mutex_lock(&oom_lock);
+		__oom_reap_task_mm(mm);
+		mutex_unlock(&oom_lock);
+
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+		down_write(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
+	}
+
 	if (mm->locked_vm) {
 		vma = mm->mmap;
 		while (vma) {
@@ -3003,24 +3029,6 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
-
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 *
-		 * mm_is_oom_victim() cannot be set from under us
-		 * either because victim->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if victim->mm
-		 * is found not NULL while holding the task_lock.
-		 */
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -456,7 +456,6 @@ bool process_shares_mm(struct task_struc
 	return false;
 }
 
-
 #ifdef CONFIG_MMU
 /*
  * OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -467,16 +466,51 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
 {
-	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
+
+	/*
+	 * Tell all users of get_user/copy_from_user etc... that the content
+	 * is no longer stable. No barriers really needed because unmapping
+	 * should imply barriers already and the reader would hit a page fault
+	 * if it stumbled over a reaped memory.
+	 */
+	set_bit(MMF_UNSTABLE, &mm->flags);
+
+	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+		if (!can_madv_dontneed_vma(vma))
+			continue;
+
+		/*
+		 * Only anonymous pages have a good chance to be dropped
+		 * without additional steps which we cannot afford as we
+		 * are OOM already.
+		 *
+		 * We do not even care about fs backed pages because all
+		 * which are reclaimable have already been reclaimed and
+		 * we do not want to block exit_mmap by keeping mm ref
+		 * count elevated without a good reason.
+		 */
+		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+			struct mmu_gather tlb;
+
+			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
+			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
+					 NULL);
+			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+		}
+	}
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
 	bool ret = true;
 
 	/*
 	 * We have to make sure to not race with the victim exit path
 	 * and cause premature new oom victim selection:
-	 * __oom_reap_task_mm		exit_mm
+	 * oom_reap_task_mm		exit_mm
 	 *   mmget_not_zero
 	 *				  mmput
 	 *				    atomic_dec_and_test
@@ -524,35 +558,8 @@ static bool __oom_reap_task_mm(struct ta
 
 	trace_start_task_reaping(tsk->pid);
 
-	/*
-	 * Tell all users of get_user/copy_from_user etc... that the content
-	 * is no longer stable. No barriers really needed because unmapping
-	 * should imply barriers already and the reader would hit a page fault
-	 * if it stumbled over a reaped memory.
-	 */
-	set_bit(MMF_UNSTABLE, &mm->flags);
-
-	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
-			continue;
+	__oom_reap_task_mm(mm);
 
-		/*
-		 * Only anonymous pages have a good chance to be dropped
-		 * without additional steps which we cannot afford as we
-		 * are OOM already.
-		 *
-		 * We do not even care about fs backed pages because all
-		 * which are reclaimable have already been reclaimed and
-		 * we do not want to block exit_mmap by keeping mm ref
-		 * count elevated without a good reason.
-		 */
-		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
-			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
-			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
-					 NULL);
-			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
-		}
-	}
 	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 			task_pid_nr(tsk), tsk->comm,
 			K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -573,13 +580,12 @@ static void oom_reap_task(struct task_st
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
 		schedule_timeout_idle(HZ/10);
 
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;
 
-
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
 		task_pid_nr(tsk), tsk->comm);
 	debug_show_all_locks();

  parent reply	other threads:[~2018-05-14  6:57 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-14  6:48 [PATCH 4.14 00/62] 4.14.41-stable review Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 01/62] ipvs: fix rtnl_lock lockups caused by start_sync_thread Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 02/62] netfilter: ebtables: dont attempt to allocate 0-sized compat array Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 03/62] kcm: Call strp_stop before strp_done in kcm_attach Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 04/62] crypto: af_alg - fix possible uninit-value in alg_bind() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 05/62] netlink: fix uninit-value in netlink_sendmsg Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 06/62] net: fix rtnh_ok() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 07/62] net: initialize skb->peeked when cloning Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 08/62] net: fix uninit-value in __hw_addr_add_ex() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 09/62] dccp: initialize ireq->ir_mark Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 10/62] ipv4: fix uninit-value in ip_route_output_key_hash_rcu() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 11/62] soreuseport: initialise timewait reuseport field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 12/62] inetpeer: fix uninit-value in inet_getpeer Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 13/62] memcg: fix per_node_info cleanup Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 14/62] perf: Remove superfluous allocation error check Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 15/62] tcp: fix TCP_REPAIR_QUEUE bound checking Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 16/62] bdi: wake up concurrent wb_shutdown() callers Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 17/62] bdi: Fix oops in wb_workfn() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 18/62] KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 19/62] KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 20/62] KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 21/62] arm64: Add work around for Arm Cortex-A55 Erratum 1024718 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 22/62] compat: fix 4-byte infoleak via uninitialized struct field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 23/62] gpioib: do not free unrequested descriptors Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 24/62] gpio: fix aspeed_gpio unmask irq Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 25/62] gpio: fix error path in lineevent_create Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 26/62] rfkill: gpio: fix memory leak in probe error path Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 27/62] libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 28/62] dm integrity: use kvfree for kvmallocd memory Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 29/62] tracing: Fix regex_match_front() to not over compare the test string Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 30/62] z3fold: fix reclaim lock-ups Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 31/62] mm: sections are not offlined during memory hotremove Greg Kroah-Hartman
2018-05-14  6:48 ` Greg Kroah-Hartman [this message]
2018-05-14  6:48 ` [PATCH 4.14 33/62] ceph: fix rsize/wsize capping in ceph_direct_read_write() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 34/62] can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 35/62] can: hi311x: Acquire SPI lock on ->do_get_berr_counter Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 36/62] can: hi311x: Work around TX complete interrupt erratum Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 37/62] drm/vc4: Fix scaling of uni-planar formats Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 38/62] drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 39/62] drm/nouveau: Fix deadlock in nv50_mstm_register_connector() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 40/62] drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 41/62] drm/atomic: Clean private obj " Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 42/62] net: atm: Fix potential Spectre v1 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 43/62] atm: zatm: " Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 44/62] PCI / PM: Always check PME wakeup capability for runtime wakeup support Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 45/62] PCI / PM: Check device_may_wakeup() in pci_enable_wake() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 46/62] cpufreq: schedutil: Avoid using invalid next_freq Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 47/62] Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174" Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 48/62] Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 49/62] Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets Greg Kroah-Hartman
2018-05-15 17:07   ` Brian Norris
2018-05-15 19:42     ` Guenter Roeck
2018-05-16  7:57       ` gregkh
2018-05-14  6:49 ` [PATCH 4.14 50/62] thermal: exynos: Reading temperature makes sense only when TMU is turned on Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 51/62] thermal: exynos: Propagate error value from tmu_read() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 52/62] nvme: add quirk to force medium priority for SQ creation Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 53/62] smb3: directory sync should not return an error Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 54/62] sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 55/62] tracing/uprobe_event: Fix strncpy corner case Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 56/62] perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 57/62] perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 58/62] perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 59/62] perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 60/62] perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 61/62] KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 62/62] KVM: x86: remove APIC Timer periodic/oneshot spikes Greg Kroah-Hartman
2018-05-14  8:28 ` [PATCH 4.14 00/62] 4.14.41-stable review Nathan Chancellor
2018-05-14 12:45 ` kernelci.org bot
2018-05-14 16:21 ` Guenter Roeck
2018-05-14 22:02 ` Shuah Khan
2018-05-15  5:36 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180514064818.131463728@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [PATCH 4.14 32/62] mm, oom: fix concurrent munlock and oom reaper unmap, v3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).