LKML Archive on lore.kernel.org
* [REPOST][PATCH 0/3] Unmapped page cache control (v3)
@ 2011-01-20 12:36 Balbir Singh
  2011-01-20 12:36 ` [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3) Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Balbir Singh @ 2011-01-20 12:36 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu


The following series implements page cache control. It is a
split-out version of patch 1 of version 3 of the page cache
optimization patches posted earlier. The previous posting is at
http://lwn.net/Articles/419564/

The previous few revisions received a lot of comments; I've tried to
address as many of those as possible in this revision. The
last series was reviewed by Christoph Lameter.

There were comments on overlap with Nick's changes. I don't feel
these changes impact Nick's work, and integration can/will be
considered as the patches evolve, if need be.

Detailed Description
====================
This patch implements unmapped page cache control via preferred
page cache reclaim. The current patch hooks into kswapd and reclaims
page cache if the user has requested unmapped page control.
This is useful in the following scenario:
- In a virtualized environment with cache=writethrough, we see
  double caching (once in the host and once in the guest). As
  we try to scale guests, cache usage across the system grows.
  The goal of this patch is to reclaim page cache when Linux is running
  as a guest and get the host to hold the page cache and manage it.
  There might be temporary duplication, but in the long run, memory
  in the guests would be used for mapped pages.
- The option is controlled via a boot option, and the administrator
  can selectively turn it on, on a need-to-use basis.

A lot of the code is borrowed from zone_reclaim_mode logic for
__zone_reclaim(). One might argue that with ballooning and
KSM this feature is not very useful, but even with ballooning,
we need extra logic to balloon multiple VMs, and it is hard
to figure out the correct amount of memory to balloon. With these
patches applied, each guest has a sufficient amount of free memory
available that can be easily seen and reclaimed by the balloon driver.
The additional memory in the guest can be reused for additional
applications or used to start additional guests/balance memory in
the host.

KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.

There are some magic numbers in use in the code: UNMAPPED_PAGE_RATIO
and the number of pages to reclaim when the unmapped_page_control
argument is supplied. These numbers were chosen to avoid reaping page
cache too aggressively or too frequently, while still providing control.

The sysctl for min_unmapped_ratio provides further control from
within the guest on the amount of unmapped pages to reclaim.
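
To make the interaction between the sysctl and the magic numbers
concrete, here is a minimal user-space sketch of the thresholds used in
patch 3/3. It is not kernel code: struct zone_model and the model_*
helpers are hypothetical names introduced only for illustration, and the
sketch assumes the unmapped_page_control boot option has been supplied.

```c
#include <assert.h>
#include <stdbool.h>

/* Multiplier from patch 3/3: wake kswapd early above 16x the floor. */
#define UNMAPPED_PAGE_RATIO 16

/* Hypothetical user-space stand-in for the struct zone fields used here. */
struct zone_model {
	unsigned long present_pages;       /* pages in the zone */
	unsigned long unmapped_file_pages; /* unmapped page cache pages */
	unsigned long min_unmapped_pages;  /* floor derived from the sysctl */
};

/* Mirrors the computation in free_area_init_core(): ratio is a percentage. */
unsigned long model_min_unmapped_pages(unsigned long present_pages,
				       unsigned int ratio)
{
	assert(ratio <= 100);
	return (present_pages * ratio) / 100;
}

/* Mirrors the comparison in should_reclaim_unmapped_pages(). */
bool model_should_wake_kswapd(const struct zone_model *z)
{
	return z->unmapped_file_pages >
		UNMAPPED_PAGE_RATIO * z->min_unmapped_pages;
}

/* Mirrors the target in reclaim_unmapped_pages(): one eighth of the excess. */
unsigned long model_reclaim_target(const struct zone_model *z)
{
	if (z->unmapped_file_pages <= z->min_unmapped_pages)
		return 0;
	return (z->unmapped_file_pages - z->min_unmapped_pages) >> 3;
}
```

As an illustration, for a zone of 262144 pages and a ratio of 1 (an
example value), the floor is 2621 pages and kswapd is woken early only
once unmapped page cache exceeds 41936 pages. With 50000 unmapped pages,
a single pass aims to reclaim 5922 pages.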

Data from the previous patchsets can be found at
https://lkml.org/lkml/2010/11/30/79

Size measurement

CONFIG_UNMAPPED_PAGECACHE_CONTROL and CONFIG_NUMA enabled
# size mm/built-in.o 
   text    data     bss     dec     hex filename
 419431 1883047  140888 2443366  254866 mm/built-in.o

CONFIG_UNMAPPED_PAGECACHE_CONTROL disabled, CONFIG_NUMA enabled
# size mm/built-in.o 
   text    data     bss     dec     hex filename
 418908 1883023  140888 2442819  254643 mm/built-in.o
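
The cost of enabling the option can be read off the two size(1) outputs
above. The following sketch only does the subtraction; the struct and
function names are hypothetical helpers, and the figures are copied from
the measurements in this mail.

```c
#include <assert.h>

/* Section sizes in bytes, as printed by size(1). */
struct section_sizes {
	long text;
	long data;
	long bss;
};

/* Figures quoted above for mm/built-in.o, CONFIG_NUMA enabled in both. */
const struct section_sizes with_control    = { 419431, 1883047, 140888 };
const struct section_sizes without_control = { 418908, 1883023, 140888 };

/* The "dec" column is the sum of the three sections. */
long total_size(struct section_sizes s)
{
	assert(s.text >= 0 && s.data >= 0 && s.bss >= 0);
	return s.text + s.data + s.bss;
}

/* Per-section growth from enabling the option. */
struct section_sizes size_delta(struct section_sizes on,
				struct section_sizes off)
{
	struct section_sizes d = {
		on.text - off.text,
		on.data - off.data,
		on.bss - off.bss,
	};
	return d;
}
```

With these numbers, enabling CONFIG_UNMAPPED_PAGECACHE_CONTROL adds
523 bytes of text and 24 bytes of data, 547 bytes in total, with no
change in bss.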


---

Balbir Singh (3):
      Move zone_reclaim() outside of CONFIG_NUMA
      Refactor zone_reclaim code
      Provide control over unmapped pages


 Documentation/kernel-parameters.txt |    8 ++
 include/linux/mmzone.h              |    4 +
 include/linux/swap.h                |   21 +++++-
 init/Kconfig                        |   12 +++
 kernel/sysctl.c                     |   20 +++--
 mm/page_alloc.c                     |    9 ++
 mm/vmscan.c                         |  132 +++++++++++++++++++++++++++++++----
 7 files changed, 175 insertions(+), 31 deletions(-)

-- 
Balbir Singh


* [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3)
  2011-01-20 12:36 [REPOST][PATCH 0/3] Unmapped page cache control (v3) Balbir Singh
@ 2011-01-20 12:36 ` Balbir Singh
  2011-01-20 14:49   ` Christoph Lameter
  2011-01-20 12:36 ` [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3) Balbir Singh
  2011-01-20 12:36 ` [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3) Balbir Singh
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2011-01-20 12:36 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu

This patch moves zone_reclaim and associated helpers
outside CONFIG_NUMA. This infrastructure is reused
in the patches for page cache control that follow.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/mmzone.h |    4 ++--
 include/linux/swap.h   |    4 ++--
 kernel/sysctl.c        |   18 +++++++++---------
 mm/vmscan.c            |    2 --
 4 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..aeede91 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -302,12 +302,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
-#ifdef CONFIG_NUMA
-	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
 	unsigned long		min_unmapped_pages;
+#ifdef CONFIG_NUMA
+	int node;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 84375e4..ac5c06e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -253,11 +253,11 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
+extern int sysctl_min_unmapped_ratio;
+extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
-extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
-extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #else
 #define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a00fdef..e40040e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1211,15 +1211,6 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 	},
 #endif
-#ifdef CONFIG_NUMA
-	{
-		.procname	= "zone_reclaim_mode",
-		.data		= &zone_reclaim_mode,
-		.maxlen		= sizeof(zone_reclaim_mode),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-		.extra1		= &zero,
-	},
 	{
 		.procname	= "min_unmapped_ratio",
 		.data		= &sysctl_min_unmapped_ratio,
@@ -1229,6 +1220,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+#ifdef CONFIG_NUMA
+	{
+		.procname	= "zone_reclaim_mode",
+		.data		= &zone_reclaim_mode,
+		.maxlen		= sizeof(zone_reclaim_mode),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+	},
 	{
 		.procname	= "min_slab_ratio",
 		.data		= &sysctl_min_slab_ratio,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..e841cae 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2740,7 +2740,6 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
-#ifdef CONFIG_NUMA
 /*
  * Zone reclaim mode
  *
@@ -2950,7 +2949,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable



* [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3)
  2011-01-20 12:36 [REPOST][PATCH 0/3] Unmapped page cache control (v3) Balbir Singh
  2011-01-20 12:36 ` [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3) Balbir Singh
@ 2011-01-20 12:36 ` Balbir Singh
  2011-01-20 14:50   ` Christoph Lameter
  2011-01-20 12:36 ` [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3) Balbir Singh
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2011-01-20 12:36 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu

Changelog v3
1. Renamed zone_reclaim_unmapped_pages to zone_reclaim_pages

Refactor zone_reclaim, moving reusable functionality out
of zone_reclaim. Make zone_reclaim_pages modular.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/vmscan.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e841cae..3b25423 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2815,6 +2815,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
 }
 
 /*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_pages(struct zone *zone, struct scan_control *sc,
+				unsigned long nr_pages)
+{
+	int priority;
+	/*
+	 * Free memory by calling shrink zone with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	priority = ZONE_RECLAIM_PRIORITY;
+	do {
+		shrink_zone(priority, zone, sc);
+		priority--;
+	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -2823,7 +2844,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2847,17 +2867,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink zone with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		priority = ZONE_RECLAIM_PRIORITY;
-		do {
-			shrink_zone(priority, zone, &sc);
-			priority--;
-		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
-	}
+	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages)
+		zone_reclaim_pages(zone, &sc, nr_pages);
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (nr_slab_pages0 > zone->min_slab_pages) {



* [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
  2011-01-20 12:36 [REPOST][PATCH 0/3] Unmapped page cache control (v3) Balbir Singh
  2011-01-20 12:36 ` [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3) Balbir Singh
  2011-01-20 12:36 ` [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3) Balbir Singh
@ 2011-01-20 12:36 ` Balbir Singh
  2011-01-20 15:00   ` Christoph Lameter
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2011-01-20 12:36 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu

Changelog v2
1. Use a config option to enable the code (Andrew Morton)
2. Explain the magic tunables in the code or at least attempt
   to explain them (General comment)
3. Hint uses of the boot parameter with unlikely (Andrew Morton)
4. Use better names (balanced is not a good naming convention)

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt |    8 +++
 include/linux/swap.h                |   21 ++++++--
 init/Kconfig                        |   12 ++++
 kernel/sysctl.c                     |    2 +
 mm/page_alloc.c                     |    9 +++
 mm/vmscan.c                         |   97 +++++++++++++++++++++++++++++++++++
 6 files changed, 142 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index dd8fe2b..f52b0bd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2515,6 +2515,14 @@ and is between 256 and 4096 characters. It is defined in the file
 			[X86]
 			Set unknown_nmi_panic=1 early on boot.
 
+	unmapped_page_control
+			[KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
+			is enabled. It controls the amount of unmapped memory
+			that is present in the system. This boot option plus
+			vm.min_unmapped_ratio (sysctl) provide granular control
+			over how much unmapped page cache can exist in the system
+			before kswapd starts reclaiming unmapped page cache pages.
+
 	usbcore.autosuspend=
 			[USB] The autosuspend time delay (in seconds) used
 			for newly-detected USB devices (default 2).  This
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ac5c06e..773d7e5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -253,19 +253,32 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 extern int sysctl_min_unmapped_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
-extern int sysctl_min_slab_ratio;
 #else
-#define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
 {
 	return 0;
 }
 #endif
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+extern bool should_reclaim_unmapped_pages(struct zone *zone);
+#else
+static inline bool should_reclaim_unmapped_pages(struct zone *zone)
+{
+	return false;
+}
+#endif
+
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
+extern int sysctl_min_slab_ratio;
+#else
+#define zone_reclaim_mode 0
+#endif
+
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
 extern void scan_mapping_unevictable_pages(struct address_space *);
 
diff --git a/init/Kconfig b/init/Kconfig
index 3eb22ad..78c9169 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -782,6 +782,18 @@ endif # NAMESPACES
 config MM_OWNER
 	bool
 
+config UNMAPPED_PAGECACHE_CONTROL
+	bool "Provide control over unmapped page cache"
+	default n
+	help
+	  This option adds support for controlling unmapped page cache
+	  via a boot parameter (unmapped_page_control). The boot parameter
+	  with sysctl (vm.min_unmapped_ratio) control the total number
+	  of unmapped pages in the system. This feature is useful if
+	  you want to limit the amount of unmapped page cache or want
+	  to reduce page cache duplication in a virtualized environment.
+	  If unsure say 'N'
+
 config SYSFS_DEPRECATED
 	bool "enable deprecated sysfs features to support old userspace tools"
 	depends on SYSFS
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e40040e..ab2c60a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1211,6 +1211,7 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 	},
 #endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 	{
 		.procname	= "min_unmapped_ratio",
 		.data		= &sysctl_min_unmapped_ratio,
@@ -1220,6 +1221,7 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+#endif
 #ifdef CONFIG_NUMA
 	{
 		.procname	= "zone_reclaim_mode",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1845a97..1c9fbab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
 			unsigned long mark;
 			int ret;
 
+			if (should_reclaim_unmapped_pages(zone))
+				wakeup_kswapd(zone, order);
+
 			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 			if (zone_watermark_ok(zone, order, mark,
 				    classzone_idx, alloc_flags))
@@ -4154,10 +4157,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#endif
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3b25423..9a6682c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -158,6 +158,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
 #define scanning_global_lru(sc)	(1)
 #endif
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+static unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc);
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+	unmapped_page_control = 1;
+	/*
+	 * XXX: Should we tweak swappiness here?
+	 */
+	return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
+#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
+static inline unsigned long reclaim_unmapped_pages(int priority,
+				struct zone *zone, struct scan_control *sc)
+{
+	return 0;
+}
+#endif
+
 static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 						  struct scan_control *sc)
 {
@@ -2297,6 +2320,12 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
+			/*
+			 * We do unmapped page reclaim once here and once
+			 * below, so that we don't lose out
+			 */
+			reclaim_unmapped_pages(priority, zone, &sc);
+
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
@@ -2332,6 +2361,11 @@ loop_again:
 				continue;
 
 			sc.nr_scanned = 0;
+			/*
+			 * Reclaim unmapped pages upfront, this should be
+			 * really cheap
+			 */
+			reclaim_unmapped_pages(priority, zone, &sc);
 
 			/*
 			 * Call soft limit reclaim before calling shrink_zone.
@@ -2587,7 +2621,8 @@ void wakeup_kswapd(struct zone *zone, int order)
 		pgdat->kswapd_max_order = order;
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
-	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
+		!should_reclaim_unmapped_pages(zone))
 		return;
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
@@ -2740,6 +2775,7 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 /*
  * Zone reclaim mode
  *
@@ -2960,6 +2996,65 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	return ret;
 }
+#endif
+
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+/*
+ * Routine to reclaim unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called reclaim cached pages
+ */
+unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc)
+{
+	if (unlikely(unmapped_page_control) &&
+		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+		struct scan_control nsc;
+		unsigned long nr_pages;
+
+		nsc = *sc;
+
+		nsc.swappiness = 0;
+		nsc.may_writepage = 0;
+		nsc.may_unmap = 0;
+		nsc.nr_reclaimed = 0;
+
+		nr_pages = zone_unmapped_file_pages(zone) -
+				zone->min_unmapped_pages;
+		/*
+		 * We don't want to be too aggressive with our
+		 * reclaim, it is our best effort to control
+		 * unmapped pages
+		 */
+		nr_pages >>= 3;
+
+		zone_reclaim_pages(zone, &nsc, nr_pages);
+		return nsc.nr_reclaimed;
+	}
+	return 0;
+}
+
+/*
+ * 16 is a magic number that was pulled out of a magician's
+ * hat. This number automatically provided the best performance
+ * to memory usage (unmapped pages). Lower than this and we spend
+ * a lot of time in frequent reclaims, higher and our control is
+ * weakened.
+ */
+#define UNMAPPED_PAGE_RATIO 16
+
+bool should_reclaim_unmapped_pages(struct zone *zone)
+{
+	if (unlikely(unmapped_page_control) &&
+		(zone_unmapped_file_pages(zone) >
+			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+		return true;
+	return false;
+}
+#endif
+
 
 /*
  * page_evictable - test whether a page is evictable



* Re: [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3)
  2011-01-20 12:36 ` [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3) Balbir Singh
@ 2011-01-20 14:49   ` Christoph Lameter
  2011-01-21  7:19     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2011-01-20 14:49 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

On Thu, 20 Jan 2011, Balbir Singh wrote:

> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -253,11 +253,11 @@ extern int vm_swappiness;
>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>  extern long vm_total_pages;
>
> +extern int sysctl_min_unmapped_ratio;
> +extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
>  #ifdef CONFIG_NUMA
>  extern int zone_reclaim_mode;
> -extern int sysctl_min_unmapped_ratio;
>  extern int sysctl_min_slab_ratio;
> -extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
>  #else
>  #define zone_reclaim_mode 0

So the end result of this patch is that zone reclaim is compiled
into vmscan.o even on !NUMA configurations but since zone_reclaim_mode ==
0 no one can ever call that code?



* Re: [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3)
  2011-01-20 12:36 ` [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3) Balbir Singh
@ 2011-01-20 14:50   ` Christoph Lameter
  2011-01-21  7:19     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2011-01-20 14:50 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu


Reviewed-by: Christoph Lameter <cl@linux.com>



* Re: [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
  2011-01-20 12:36 ` [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3) Balbir Singh
@ 2011-01-20 15:00   ` Christoph Lameter
  2011-01-21  7:23     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2011-01-20 15:00 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

On Thu, 20 Jan 2011, Balbir Singh wrote:

> +	unmapped_page_control
> +			[KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
> +			is enabled. It controls the amount of unmapped memory
> +			that is present in the system. This boot option plus
> +			vm.min_unmapped_ratio (sysctl) provide granular control

min_unmapped_ratio is there to guarantee that zone reclaim does not
reclaim all unmapped pages.

What you want here is a max_unmapped_ratio.


>  {
> @@ -2297,6 +2320,12 @@ loop_again:
>  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
>  							&sc, priority, 0);
>
> +			/*
> +			 * We do unmapped page reclaim once here and once
> +			 * below, so that we don't lose out
> +			 */
> +			reclaim_unmapped_pages(priority, zone, &sc);
> +
>  			if (!zone_watermark_ok_safe(zone, order,

Hmmmm. Okay that means background reclaim does it. If so then we also want
zone reclaim to be able to work in the background I think.
max_unmapped_ratio could also be useful to the zone reclaim logic.



* Re: [REPOST] [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v3)
  2011-01-20 14:49   ` Christoph Lameter
@ 2011-01-21  7:19     ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2011-01-21  7:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

* Christoph Lameter <cl@linux.com> [2011-01-20 08:49:27]:

> On Thu, 20 Jan 2011, Balbir Singh wrote:
> 
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -253,11 +253,11 @@ extern int vm_swappiness;
> >  extern int remove_mapping(struct address_space *mapping, struct page *page);
> >  extern long vm_total_pages;
> >
> > +extern int sysctl_min_unmapped_ratio;
> > +extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> >  #ifdef CONFIG_NUMA
> >  extern int zone_reclaim_mode;
> > -extern int sysctl_min_unmapped_ratio;
> >  extern int sysctl_min_slab_ratio;
> > -extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> >  #else
> >  #define zone_reclaim_mode 0
> 
> So the end result of this patch is that zone reclaim is compiled
> into vmscan.o even on !NUMA configurations but since zone_reclaim_mode ==
> 0 no one can ever call that code?
>

The third patch fixes this with the introduction of a config
option (copy-pasted below). If someone were to bisect to this point,
what you say is correct.

+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 extern int sysctl_min_unmapped_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
-extern int sysctl_min_slab_ratio;
 #else
-#define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
 {
        return 0;
 }
 #endif

Thanks for the review! 

-- 
	Three Cheers,
	Balbir


* Re: [REPOST] [PATCH 2/3] Refactor zone_reclaim code (v3)
  2011-01-20 14:50   ` Christoph Lameter
@ 2011-01-21  7:19     ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2011-01-21  7:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

* Christoph Lameter <cl@linux.com> [2011-01-20 08:50:40]:

> 
> Reviewed-by: Christoph Lameter <cl@linux.com>
>

Thanks for the review! 

-- 
	Three Cheers,
	Balbir


* Re: [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
  2011-01-20 15:00   ` Christoph Lameter
@ 2011-01-21  7:23     ` Balbir Singh
  2011-01-21 15:55       ` Christoph Lameter
  0 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2011-01-21  7:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

* Christoph Lameter <cl@linux.com> [2011-01-20 09:00:09]:

> On Thu, 20 Jan 2011, Balbir Singh wrote:
> 
> > +	unmapped_page_control
> > +			[KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
> > +			is enabled. It controls the amount of unmapped memory
> > +			that is present in the system. This boot option plus
> > +			vm.min_unmapped_ratio (sysctl) provide granular control
> 
> min_unmapped_ratio is there to guarantee that zone reclaim does not
> reclaim all unmapped pages.
> 
> What you want here is a max_unmapped_ratio.
>

I thought about that; the logic for reusing min_unmapped_ratio was to
keep a limit beyond which unmapped page cache shrinking should stop.
I think you are suggesting max_unmapped_ratio as the point at which
shrinking should begin, right?
 
> 
> >  {
> > @@ -2297,6 +2320,12 @@ loop_again:
> >  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
> >  							&sc, priority, 0);
> >
> > +			/*
> > +			 * We do unmapped page reclaim once here and once
> > +			 * below, so that we don't lose out
> > +			 */
> > +			reclaim_unmapped_pages(priority, zone, &sc);
> > +
> >  			if (!zone_watermark_ok_safe(zone, order,
> 
> Hmmmm. Okay that means background reclaim does it. If so then we also want
> zone reclaim to be able to work in the background I think.

Did you have anything specific in mind? It works for me in testing,
but is there anything that stands out to you as needing to be
done?

Thanks for the review!
 

-- 
	Three Cheers,
	Balbir


* Re: [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
  2011-01-21  7:23     ` Balbir Singh
@ 2011-01-21 15:55       ` Christoph Lameter
  2011-01-24  6:37         ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Lameter @ 2011-01-21 15:55 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

On Fri, 21 Jan 2011, Balbir Singh wrote:

> * Christoph Lameter <cl@linux.com> [2011-01-20 09:00:09]:
>
> > On Thu, 20 Jan 2011, Balbir Singh wrote:
> >
> > > +	unmapped_page_control
> > > +			[KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
> > > +			is enabled. It controls the amount of unmapped memory
> > > +			that is present in the system. This boot option plus
> > > +			vm.min_unmapped_ratio (sysctl) provide granular control
> >
> > min_unmapped_ratio is there to guarantee that zone reclaim does not
> > reclaim all unmapped pages.
> >
> > What you want here is a max_unmapped_ratio.
> >
>
> I thought about that, the logic for reusing min_unmapped_ratio was to
> keep a limit beyond which unmapped page cache shrinking should stop.

Right. That is the role of it. It's a minimum to leave. You want a maximum
size of the page cache.

> I think you are suggesting max_unmapped_ratio as the point at which
> shrinking should begin, right?

The role of min_unmapped_ratio is to never reclaim more pagecache if we
reach that ratio even if we have to go off node for an allocation.

AFAICT what you propose is a maximum size of the page cache. If the number
of page cache pages goes beyond that then you trim the page cache in
background reclaim.

> > > +			reclaim_unmapped_pages(priority, zone, &sc);
> > > +
> > >  			if (!zone_watermark_ok_safe(zone, order,
> >
> > Hmmmm. Okay that means background reclaim does it. If so then we also want
> > zone reclaim to be able to work in the background I think.
>
> Anything specific you had in mind, works for me in testing, but is
> there anything specific that stands out in your mind that needs to be
> done?

Hmmm. So this would also work in a NUMA configuration, right? Limiting the
size of the page cache would avoid zone reclaim through these limits. Page
cache size would be limited by the max_unmapped_ratio.

zone_reclaim only would come into play if other allocations make the
memory on the node so tight that we would have to evict more page
cache pages in direct reclaim.
Then zone_reclaim could go down to shrink the page cache size to
min_unmapped_ratio.
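
The split described above can be sketched as two separate limits. This
is only an illustrative user-space model: max_unmapped_ratio is a
proposal from this discussion and does not exist in the posted patches,
and all function names below are hypothetical.

```c
#include <assert.h>

/* Convert a percentage of the zone into pages, as min_unmapped_ratio is. */
unsigned long ratio_to_pages(unsigned long present_pages, unsigned int ratio)
{
	assert(ratio <= 100);
	return (present_pages * ratio) / 100;
}

/*
 * Background reclaim (kswapd): trim only the excess above the ceiling.
 * max_pages would derive from the hypothetical max_unmapped_ratio.
 */
unsigned long background_trim_target(unsigned long unmapped,
				     unsigned long max_pages)
{
	return unmapped > max_pages ? unmapped - max_pages : 0;
}

/*
 * Zone reclaim under allocation pressure: may go further, but never
 * below the floor derived from min_unmapped_ratio.
 */
unsigned long zone_reclaim_budget(unsigned long unmapped,
				  unsigned long min_pages)
{
	return unmapped > min_pages ? unmapped - min_pages : 0;
}
```

In this model the ceiling governs when background trimming starts, and
the floor governs where zone reclaim has to stop, so the two ratios no
longer share a single meaning.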





* Re: [REPOST] [PATCH 3/3] Provide control over unmapped pages (v3)
  2011-01-21 15:55       ` Christoph Lameter
@ 2011-01-24  6:37         ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2011-01-24  6:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

* Christoph Lameter <cl@linux.com> [2011-01-21 09:55:17]:

> On Fri, 21 Jan 2011, Balbir Singh wrote:
> 
> > * Christoph Lameter <cl@linux.com> [2011-01-20 09:00:09]:
> >
> > > On Thu, 20 Jan 2011, Balbir Singh wrote:
> > >
> > > > +	unmapped_page_control
> > > > +			[KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
> > > > +			is enabled. It controls the amount of unmapped memory
> > > > +			that is present in the system. This boot option plus
> > > > +			vm.min_unmapped_ratio (sysctl) provide granular control
> > >
> > > min_unmapped_ratio is there to guarantee that zone reclaim does not
> > > reclaim all unmapped pages.
> > >
> > > What you want here is a max_unmapped_ratio.
> > >
> >
> > I thought about that, the logic for reusing min_unmapped_ratio was to
> > keep a limit beyond which unmapped page cache shrinking should stop.
> 
> Right. That is the role of it. It's a minimum to leave. You want a maximum
> size of the page cache.

In this case we want the maximum to be as small as the minimum, but
from a general design perspective a maximum does make sense.

> 
> > I think you are suggesting max_unmapped_ratio as the point at which
> > shrinking should begin, right?
> 
> The role of min_unmapped_ratio is to never reclaim more pagecache if we
> reach that ratio even if we have to go off node for an allocation.
> 
> AFAICT What you propose is a maximum size of the page cache. If the number
> of page cache pages goes beyond that then you trim the page cache in
> background reclaim.
> 
> > > > +			reclaim_unmapped_pages(priority, zone, &sc);
> > > > +
> > > >  			if (!zone_watermark_ok_safe(zone, order,
> > >
> > > Hmmmm. Okay that means background reclaim does it. If so then we also want
> > > zone reclaim to be able to work in the background I think.
> >
> > Anything specific you had in mind, works for me in testing, but is
> > there anything specific that stands out in your mind that needs to be
> > done?
> 
> Hmmm. So this would also work in a NUMA configuration, right. Limiting the
> sizes of the page cache would avoid zone reclaim through these limit. Page
> cache size would be limited by the max_unmapped_ratio.
> 
> zone_reclaim only would come into play if other allocations make the
> memory on the node so tight that we would have to evict more page
> cache pages in direct reclaim.
> Then zone_reclaim could go down to shrink the page cache size to
> min_unmapped_ratio.
>

I'll repost with the max_unmapped_ratio changes.

Thanks for the review! 

-- 
	Three Cheers,
	Balbir
