LKML Archive on lore.kernel.org
From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	mhocko@suse.com, jannh@google.com, vbabka@suse.cz,
	minchan@kernel.org, dancol@google.com, joel@joelfernandes.org,
	akpm@linux-foundation.org
Subject: [PATCH 2/2] mm/madvise: skip MADV_PAGEOUT on shared swap cache pages
Date: Mon, 23 Mar 2020 16:41:51 -0700
Message-ID: <20200323234151.10AF5617@viggo.jf.intel.com>
In-Reply-To: <20200323234147.558EBA81@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

MADV_PAGEOUT might interfere with other processes if it is
allowed to reclaim pages shared with other processes.  A
previous patch tried to avoid this for anonymous pages
which were shared by a fork().  It did this by checking
page_mapcount().

That works great for mapped pages.  But, it cannot detect
unmapped swap cache pages.  This was not a problem until
the previous patch in this series, which added the ability
for MADV_PAGEOUT to *find* swap cache pages.

A process doing MADV_PAGEOUT which finds an unmapped swap
cache page and evicts it might interfere with another process
which has the same page mapped.  But, such a page would have
a page_mapcount() of 1 since the page is only actually mapped
in the *other* process.  The page_mapcount() test alone would
therefore fail to detect that the page is shared.

Thankfully, there is a reference count for swap entries.
To fix this, simply consult both page_mapcount() and the swap
reference count via page_swapcount().
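
To make the accounting concrete, here is a minimal userspace
sketch of the old and new checks.  It is only a model: struct
page_model and both helper names are made up for illustration
and are not kernel interfaces.  The three fields stand in for
page_mapcount(), page_swapcount() and PageSwapCache().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page state consulted here. */
struct page_model {
	int mapcount;		/* models page_mapcount(): PTEs mapping the page */
	int swapcount;		/* models page_swapcount(): swap entry references */
	bool in_swap_cache;	/* models PageSwapCache() */
};

/* Old test: only the number of mappings is considered. */
static bool old_check_allows_reclaim(const struct page_model *page)
{
	return page->mapcount == 1;
}

/* New test: mappings plus swap references; only a sole user may reclaim. */
static bool new_check_allows_reclaim(const struct page_model *page)
{
	int nr_page_references = 0;

	assert(page->mapcount >= 0 && page->swapcount >= 0);

	/* Swap entries may be "upgraded" to mappings at any time. */
	if (page->in_swap_cache)
		nr_page_references += page->swapcount;

	nr_page_references += page->mapcount;

	return nr_page_references <= 1;
}
```

With mapcount = 1, swapcount = 1 and in_swap_cache = true (the
page is mapped in the other process and the caller only holds
a swap entry), the old check allows reclaim while the new one
refuses it.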

I rigged up a little test program to try to create these
situations.  Basically, if the parent "reader" RSS changes
in response to MADV_PAGEOUT actions in the child, there is
a problem.

	https://www.sr71.net/~dave/intel/madv-pageout.c
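
The idea behind that kind of test can be sketched roughly as
follows.  This is not the program at the URL above; it is a
simplified probe under stated assumptions: rss_pages() and
shared_pageout_probe() are hypothetical names, RSS is read
from /proc/self/statm, and only the plain fork()-shared case
is exercised, not the swap cache case.  A nonzero RSS delta
in the parent would hint that the child's MADV_PAGEOUT touched
shared pages, although unrelated activity can also move RSS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value; needs a 5.4+ kernel */
#endif

/* Resident set size of the calling process in pages, or -1 on error. */
static long rss_pages(void)
{
	long size, resident;
	FILE *f = fopen("/proc/self/statm", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld %ld", &size, &resident) != 2)
		resident = -1;
	fclose(f);
	return resident;
}

/*
 * Hypothetical probe: the parent faults in a buffer, the child
 * inherits it via fork() and asks for it to be paged out.  The
 * parent's RSS change (in pages) is stored in *rss_delta.
 * Returns 0 on success, -1 if the setup failed.
 */
static int shared_pageout_probe(size_t len, long *rss_delta)
{
	char *buf;
	long before;
	pid_t pid;

	assert(len > 0 && rss_delta);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 1, len);	/* fault every page in */

	before = rss_pages();
	pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: may fail on kernels without MADV_PAGEOUT; ignored. */
		madvise(buf, len, MADV_PAGEOUT);
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	*rss_delta = rss_pages() - before;
	munmap(buf, len);
	return 0;
}
```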

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

 b/mm/madvise.c |   37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff -puN mm/madvise.c~madv-pageout-ignore-shared-swap-cache mm/madvise.c
--- a/mm/madvise.c~madv-pageout-ignore-shared-swap-cache	2020-03-23 16:30:52.022385888 -0700
+++ b/mm/madvise.c	2020-03-23 16:41:15.448384333 -0700
@@ -261,6 +261,7 @@ static struct page *pte_get_reclaim_page
 {
 	swp_entry_t entry;
 	struct page *page;
+	int nr_page_references = 0;
 
 	/* Totally empty PTE: */
 	if (pte_none(ptent))
@@ -271,7 +272,7 @@ static struct page *pte_get_reclaim_page
 		page = vm_normal_page(vma, addr, ptent);
 		if (page)
 			get_page(page);
-		return page;
+		goto got_page;
 	}
 
 	/*
@@ -292,7 +293,33 @@ static struct page *pte_get_reclaim_page
 	 * The PTE was a true swap entry.  The page may be in
 	 * the swap cache.
 	 */
-	return lookup_swap_cache(entry, vma, addr);
+	page = lookup_swap_cache(entry, vma, addr);
+got_page:
+	if (!page)
+		return NULL;
+	/*
+	 * Account for references to the swap entry.  These
+	 * might be "upgraded" to a normal mapping at any
+	 * time.
+	 */
+	if (PageSwapCache(page))
+		nr_page_references += page_swapcount(page);
+
+	/*
+	 * Account for all mappings of the page, including
+	 * when it is in the swap cache.  This ensures that
+	 * MADV_PAGEOUT does not interfere with anything shared
+	 * with another process.
+	 */
+	nr_page_references += page_mapcount(page);
+
+	/* Any extra references?  Do not reclaim it. */
+	if (nr_page_references > 1) {
+		put_page(page);
+		return NULL;
+	}
+
+	return page;
 }
 
 /*
@@ -477,12 +504,6 @@ regular_page:
 			continue;
 		}
 
-		/* Do not interfere with other mappings of this page */
-		if (page_mapcount(page) != 1) {
-			put_page(page);
-			continue;
-		}
-
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
 
 		if (!is_swap_pte(ptent) && pte_young(ptent)) {
_


Thread overview: 7+ messages
2020-03-23 23:41 [PATCH 0/2] mm/madvise: teach MADV_PAGEOUT about swap cache Dave Hansen
2020-03-23 23:41 ` [PATCH 1/2] mm/madvise: help MADV_PAGEOUT to find swap cache pages Dave Hansen
2020-03-26  6:24   ` Minchan Kim
2020-03-23 23:41 ` Dave Hansen [this message]
2020-03-26  6:28   ` [PATCH 2/2] mm/madvise: skip MADV_PAGEOUT on shared " Minchan Kim
2020-03-26 23:00     ` Dave Hansen
2020-03-27  6:42       ` Minchan Kim
