LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH V5] Allow compaction of unevictable pages
@ 2015-03-13 17:26 Eric B Munson
  2015-03-13 18:56 ` Rik van Riel
  2015-03-13 20:02 ` Michal Hocko
  0 siblings, 2 replies; 12+ messages in thread
From: Eric B Munson @ 2015-03-13 17:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, Michal Hocko, linux-mm, linux-kernel

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of is is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today, does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavoir with respect to the unevictable lru.  Users that demand no page
faults after a page is present can set compact_unevictable to 0 and
users who need the large contiguous areas can enable compaction on
locked memory by setting it to 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable sysctl is 0, I cannot allocate hugepages after
fragmenting memory.  When the value is set to 1, allocations succeed.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
Changes from V3:
* Updated changelog
* Restrict valid input to 0 or 1 in sysctl
* Add documentation to sysctl/vm.txt

 Documentation/sysctl/vm.txt |   11 +++++++++++
 include/linux/compaction.h  |    1 +
 kernel/sysctl.c             |    9 +++++++++
 mm/compaction.c             |    3 +++
 4 files changed, 24 insertions(+)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 902b457..812f0d4 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm:
 - admin_reserve_kbytes
 - block_dump
 - compact_memory
+- compact_unevictable
 - dirty_background_bytes
 - dirty_background_ratio
 - dirty_bytes
@@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required.
 
 ==============================================================
 
+compact_unevictable
+
+Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
+allowed to examine the unevictable lru (mlocked pages) for pages to compact.
+This should be used on systems where stalls for minor page faults are an
+acceptable trade for large contiguous free memory.  Set to 0 to prevent
+compaction from moving pages that are unevictable.
+
+==============================================================
+
 dirty_background_bytes
 
 Contains the amount of dirty memory at which the background kernel
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a014559..9dd7e7c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 extern int sysctl_extfrag_threshold;
 extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_compact_unevictable;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 88ea2d6..9272568 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1313,6 +1313,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
+	{
+		.procname	= "compact_unevictable",
+		.data		= &sysctl_compact_unevictable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 
 #endif /* CONFIG_COMPACTION */
 	{
diff --git a/mm/compaction.c b/mm/compaction.c
index 8c0d945..342b221 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1046,6 +1046,8 @@ typedef enum {
 	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
 } isolate_migrate_t;
 
+int sysctl_compact_unevictable;
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1057,6 +1059,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long low_pfn, end_pfn;
 	struct page *page;
 	const isolate_mode_t isolate_mode =
+		(sysctl_compact_unevictable ? ISOLATE_UNEVICTABLE : 0) |
 		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 17:26 [PATCH V5] Allow compaction of unevictable pages Eric B Munson
@ 2015-03-13 18:56 ` Rik van Riel
  2015-03-13 19:09   ` Eric B Munson
  2015-03-13 20:02 ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2015-03-13 18:56 UTC (permalink / raw)
  To: Eric B Munson, Andrew Morton
  Cc: Vlastimil Babka, Thomas Gleixner, Christoph Lameter,
	Peter Zijlstra, Mel Gorman, David Rientjes, Michal Hocko,
	linux-mm, linux-kernel

On 03/13/2015 01:26 PM, Eric B Munson wrote:

> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1046,6 +1046,8 @@ typedef enum {
>  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
>  } isolate_migrate_t;
>  
> +int sysctl_compact_unevictable;
> +
>  /*
>   * Isolate all pages that can be migrated from the first suitable block,
>   * starting at the block pointed to by the migrate scanner pfn within

I suspect that the use cases where users absolutely do not want
unevictable pages migrated are special cases, and it may make
sense to enable sysctl_compact_unevictable by default.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 18:56 ` Rik van Riel
@ 2015-03-13 19:09   ` Eric B Munson
  2015-03-13 20:19     ` Michal Hocko
  2015-03-13 23:18     ` David Rientjes
  0 siblings, 2 replies; 12+ messages in thread
From: Eric B Munson @ 2015-03-13 19:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Michal Hocko, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1054 bytes --]

On Fri, 13 Mar 2015, Rik van Riel wrote:

> On 03/13/2015 01:26 PM, Eric B Munson wrote:
> 
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1046,6 +1046,8 @@ typedef enum {
> >  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> >  } isolate_migrate_t;
> >  
> > +int sysctl_compact_unevictable;
> > +
> >  /*
> >   * Isolate all pages that can be migrated from the first suitable block,
> >   * starting at the block pointed to by the migrate scanner pfn within
> 
> I suspect that the use cases where users absolutely do not want
> unevictable pages migrated are special cases, and it may make
> sense to enable sysctl_compact_unevictable by default.

Given that sysctl_compact_unevictable=0 is the way the kernel behaves
now and the push back against always enabling compaction on unevictable
pages, I left the default to be the behavior as it is today.  I agree
that this is likely the minority case, but I'd really like Peter Z or
someone else from real time to say that they are okay with the default
changing.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 17:26 [PATCH V5] Allow compaction of unevictable pages Eric B Munson
  2015-03-13 18:56 ` Rik van Riel
@ 2015-03-13 20:02 ` Michal Hocko
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2015-03-13 20:02 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Fri 13-03-15 13:26:37, Eric B Munson wrote:
[...]
> +compact_unevictable
> +
> +Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
> +allowed to examine the unevictable lru (mlocked pages) for pages to compact.
> +This should be used on systems where stalls for minor page faults are an
> +acceptable trade for large contiguous free memory.  Set to 0 to prevent
> +compaction from moving pages that are unevictable.

It is not clear which behavior is default from this text.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 19:09   ` Eric B Munson
@ 2015-03-13 20:19     ` Michal Hocko
  2015-03-16 10:14       ` Vlastimil Babka
  2015-03-13 23:18     ` David Rientjes
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2015-03-13 20:19 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Rik van Riel, Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	linux-mm, linux-kernel

On Fri 13-03-15 15:09:15, Eric B Munson wrote:
> On Fri, 13 Mar 2015, Rik van Riel wrote:
> 
> > On 03/13/2015 01:26 PM, Eric B Munson wrote:
> > 
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -1046,6 +1046,8 @@ typedef enum {
> > >  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> > >  } isolate_migrate_t;
> > >  
> > > +int sysctl_compact_unevictable;
> > > +
> > >  /*
> > >   * Isolate all pages that can be migrated from the first suitable block,
> > >   * starting at the block pointed to by the migrate scanner pfn within
> > 
> > I suspect that the use cases where users absolutely do not want
> > unevictable pages migrated are special cases, and it may make
> > sense to enable sysctl_compact_unevictable by default.
> 
> Given that sysctl_compact_unevictable=0 is the way the kernel behaves
> now and the push back against always enabling compaction on unevictable
> pages, I left the default to be the behavior as it is today. 

The question is _why_ we have this behavior now. Is it intentional?

e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that
direction. Vlastimil has then changed that by edc2ca612496 (mm,
compaction: move pageblock checks up from isolate_migratepages_range()).
There is no mention about mlock pages so I guess it was more an
unintentional side effect of the patch. At least that is my current
understanding. I might be wrong here.

The thing about RT is that it is not usable with the upstream kernel
without the RT patchset AFAIU. So the default should be reflect what is
better for the standard kernel. RT loads have to tune the system anyway
so it is not so surprising they would disable this option as well. We
should help those guys and do not require them to touch the code but the
knob is reasonable IMHO.

Especially when your changelog suggests that having this enabled by
default is beneficial for the standard kernel.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 19:09   ` Eric B Munson
  2015-03-13 20:19     ` Michal Hocko
@ 2015-03-13 23:18     ` David Rientjes
  2015-03-13 23:43       ` Rik van Riel
  2015-03-16 15:39       ` Christoph Lameter
  1 sibling, 2 replies; 12+ messages in thread
From: David Rientjes @ 2015-03-13 23:18 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Rik van Riel, Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, Michal Hocko,
	linux-mm, linux-kernel

On Fri, 13 Mar 2015, Eric B Munson wrote:

> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -1046,6 +1046,8 @@ typedef enum {
> > >  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> > >  } isolate_migrate_t;
> > >  
> > > +int sysctl_compact_unevictable;
> > > +
> > >  /*
> > >   * Isolate all pages that can be migrated from the first suitable block,
> > >   * starting at the block pointed to by the migrate scanner pfn within
> > 
> > I suspect that the use cases where users absolutely do not want
> > unevictable pages migrated are special cases, and it may make
> > sense to enable sysctl_compact_unevictable by default.
> 
> Given that sysctl_compact_unevictable=0 is the way the kernel behaves
> now and the push back against always enabling compaction on unevictable
> pages, I left the default to be the behavior as it is today.  I agree
> that this is likely the minority case, but I'd really like Peter Z or
> someone else from real time to say that they are okay with the default
> changing.
> 

It would be really disappointing to not enable this by default for !rt 
kernels.  We haven't migrated mlocked pages in the past by way of memory 
compaction because it can theoretically result in consistent minor page 
faults, but I haven't yet heard a !rt objection to enabling this.

If the rt patchset is going to carry a patch to disable this, then the 
question arises: why not just carry an ISOLATE_UNEVICTABLE patch instead?  
I think you've done the due diligence required to allow this to be 
disabled at any time in a very easy way from userspace by the new tunable.  
I think it should be enabled and I'd be very surprised to hear any other 
objection about it other than it's different from the status quo.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 23:18     ` David Rientjes
@ 2015-03-13 23:43       ` Rik van Riel
  2015-03-16 15:39       ` Christoph Lameter
  1 sibling, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2015-03-13 23:43 UTC (permalink / raw)
  To: David Rientjes, Eric B Munson
  Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, Michal Hocko,
	linux-mm, linux-kernel

On 03/13/2015 07:18 PM, David Rientjes wrote:
> On Fri, 13 Mar 2015, Eric B Munson wrote:
> 
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -1046,6 +1046,8 @@ typedef enum {
>>>>  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
>>>>  } isolate_migrate_t;
>>>>  
>>>> +int sysctl_compact_unevictable;
>>>> +
>>>>  /*
>>>>   * Isolate all pages that can be migrated from the first suitable block,
>>>>   * starting at the block pointed to by the migrate scanner pfn within
>>>
>>> I suspect that the use cases where users absolutely do not want
>>> unevictable pages migrated are special cases, and it may make
>>> sense to enable sysctl_compact_unevictable by default.
>>
>> Given that sysctl_compact_unevictable=0 is the way the kernel behaves
>> now and the push back against always enabling compaction on unevictable
>> pages, I left the default to be the behavior as it is today.  I agree
>> that this is likely the minority case, but I'd really like Peter Z or
>> someone else from real time to say that they are okay with the default
>> changing.
>>
> 
> It would be really disappointing to not enable this by default for !rt 
> kernels.  We haven't migrated mlocked pages in the past by way of memory 
> compaction because it can theoretically result in consistent minor page 
> faults, but I haven't yet heard a !rt objection to enabling this.
> 
> If the rt patchset is going to carry a patch to disable this

It does not have to carry a patch to disable something that can be
disabled at run time.

The smaller the realtime patchset has to be, the better.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 20:19     ` Michal Hocko
@ 2015-03-16 10:14       ` Vlastimil Babka
  2015-03-16 13:49         ` Eric B Munson
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2015-03-16 10:14 UTC (permalink / raw)
  To: Michal Hocko, Eric B Munson
  Cc: Rik van Riel, Andrew Morton, Thomas Gleixner, Christoph Lameter,
	Peter Zijlstra, Mel Gorman, David Rientjes, linux-mm,
	linux-kernel, Linux API

[CC += linux-api@]

Since this is a kernel-user-space API change, please CC linux-api@.
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html


On 03/13/2015 09:19 PM, Michal Hocko wrote:
> On Fri 13-03-15 15:09:15, Eric B Munson wrote:
>> On Fri, 13 Mar 2015, Rik van Riel wrote:
>>
>>> On 03/13/2015 01:26 PM, Eric B Munson wrote:
>>>
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -1046,6 +1046,8 @@ typedef enum {
>>>>   	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
>>>>   } isolate_migrate_t;
>>>>
>>>> +int sysctl_compact_unevictable;

A comment here would be useful I think, as well as explicit default 
value. Maybe also __read_mostly although I don't know how much that matters.

I also wonder if it might be confusing that "compact_memory" is a 
write-only trigger that doesn't even show under "sysctl -a", while 
"compact_unevictable" is a read/write setting. But I don't have a better 
suggestion right now.

>>>> +
>>>>   /*
>>>>    * Isolate all pages that can be migrated from the first suitable block,
>>>>    * starting at the block pointed to by the migrate scanner pfn within
>>>
>>> I suspect that the use cases where users absolutely do not want
>>> unevictable pages migrated are special cases, and it may make
>>> sense to enable sysctl_compact_unevictable by default.
>>
>> Given that sysctl_compact_unevictable=0 is the way the kernel behaves
>> now and the push back against always enabling compaction on unevictable
>> pages, I left the default to be the behavior as it is today.
>
> The question is _why_ we have this behavior now. Is it intentional?

It's there since 748446bb6 ("mm: compaction: memory compaction core"). 
Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim") changes the 
comment in __isolate_lru_page() handling of unevictable pages to mention 
compaction explicitly. It could have been accidental in 748446bb6 
though, maybe it just reused __isolate_lru_page() for compaction - it 
seems that the skipping of unevictable was initially meant to optimize 
lumpy reclaim.

> e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that

Well, CMA and realtime kernels are probably mutually exclusive enough.

> direction. Vlastimil has then changed that by edc2ca612496 (mm,
> compaction: move pageblock checks up from isolate_migratepages_range()).
> There is no mention about mlock pages so I guess it was more an
> unintentional side effect of the patch. At least that is my current
> understanding. I might be wrong here.

Although that commit did change unintentionally more details that I 
would have liked (unfortunately), I think you are wrong on this one. 
ISOLATE_UNEVICTABLE is still passed from isolate_migratepages_range() 
which is used by CMA, while the compaction variant 
isolate_migratepages() does not pass it. So it's kept CMA-specific as 
before.

> The thing about RT is that it is not usable with the upstream kernel
> without the RT patchset AFAIU. So the default should be reflect what is
> better for the standard kernel. RT loads have to tune the system anyway
> so it is not so surprising they would disable this option as well. We
> should help those guys and do not require them to touch the code but the
> knob is reasonable IMHO.
>
> Especially when your changelog suggests that having this enabled by
> default is beneficial for the standard kernel.

I agree, but if there's a danger of becoming too of a bikeshed topic, 
I'm fine with keeping the default same as current behavior and changing 
it later. Or maybe we should ask some -rt mailing list instead of just 
Peter and Thomas?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-16 10:14       ` Vlastimil Babka
@ 2015-03-16 13:49         ` Eric B Munson
  2015-03-18 14:40           ` Vlastimil Babka
  2015-03-18 15:40           ` Steven Rostedt
  0 siblings, 2 replies; 12+ messages in thread
From: Eric B Munson @ 2015-03-16 13:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, Rik van Riel, Andrew Morton, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	linux-mm, linux-kernel, Linux API

[-- Attachment #1: Type: text/plain, Size: 4911 bytes --]

On Mon, 16 Mar 2015, Vlastimil Babka wrote:

> [CC += linux-api@]
> 
> Since this is a kernel-user-space API change, please CC linux-api@.
> The kernel source file Documentation/SubmitChecklist notes that all
> Linux kernel patches that change userspace interfaces should be CCed
> to linux-api@vger.kernel.org, so that the various parties who are
> interested in API changes are informed. For further information, see
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e=

Added to the Cc list, thanks.

> 
> 
> On 03/13/2015 09:19 PM, Michal Hocko wrote:
> >On Fri 13-03-15 15:09:15, Eric B Munson wrote:
> >>On Fri, 13 Mar 2015, Rik van Riel wrote:
> >>
> >>>On 03/13/2015 01:26 PM, Eric B Munson wrote:
> >>>
> >>>>--- a/mm/compaction.c
> >>>>+++ b/mm/compaction.c
> >>>>@@ -1046,6 +1046,8 @@ typedef enum {
> >>>>  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> >>>>  } isolate_migrate_t;
> >>>>
> >>>>+int sysctl_compact_unevictable;
> 
> A comment here would be useful I think, as well as explicit default
> value. Maybe also __read_mostly although I don't know how much that
> matters.

I am going to sit on V6 for a couple of days incase anyone from rt wants
to chime in.  But these will be in V6.

> 
> I also wonder if it might be confusing that "compact_memory" is a
> write-only trigger that doesn't even show under "sysctl -a", while
> "compact_unevictable" is a read/write setting. But I don't have a
> better suggestion right now.

Does allow_unevictable_compaction sound better?  It feels too much like
variable naming conventions from other languages which seems to
encourage verbosity to me, but does indicate a difference from
compact_memory.

> 
> >>>>+
> >>>>  /*
> >>>>   * Isolate all pages that can be migrated from the first suitable block,
> >>>>   * starting at the block pointed to by the migrate scanner pfn within
> >>>
> >>>I suspect that the use cases where users absolutely do not want
> >>>unevictable pages migrated are special cases, and it may make
> >>>sense to enable sysctl_compact_unevictable by default.
> >>
> >>Given that sysctl_compact_unevictable=0 is the way the kernel behaves
> >>now and the push back against always enabling compaction on unevictable
> >>pages, I left the default to be the behavior as it is today.
> >
> >The question is _why_ we have this behavior now. Is it intentional?
> 
> It's there since 748446bb6 ("mm: compaction: memory compaction
> core"). Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim")
> changes the comment in __isolate_lru_page() handling of unevictable
> pages to mention compaction explicitly. It could have been
> accidental in 748446bb6 though, maybe it just reused
> __isolate_lru_page() for compaction - it seems that the skipping of
> unevictable was initially meant to optimize lumpy reclaim.
> 
> >e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that
> 
> Well, CMA and realtime kernels are probably mutually exclusive enough.
> 
> >direction. Vlastimil has then changed that by edc2ca612496 (mm,
> >compaction: move pageblock checks up from isolate_migratepages_range()).
> >There is no mention about mlock pages so I guess it was more an
> >unintentional side effect of the patch. At least that is my current
> >understanding. I might be wrong here.
> 
> Although that commit did change unintentionally more details that I
> would have liked (unfortunately), I think you are wrong on this one.
> ISOLATE_UNEVICTABLE is still passed from
> isolate_migratepages_range() which is used by CMA, while the
> compaction variant isolate_migratepages() does not pass it. So it's
> kept CMA-specific as before.
> 
> >The thing about RT is that it is not usable with the upstream kernel
> >without the RT patchset AFAIU. So the default should be reflect what is
> >better for the standard kernel. RT loads have to tune the system anyway
> >so it is not so surprising they would disable this option as well. We
> >should help those guys and do not require them to touch the code but the
> >knob is reasonable IMHO.
> >
> >Especially when your changelog suggests that having this enabled by
> >default is beneficial for the standard kernel.
> 
> I agree, but if there's a danger of becoming too of a bikeshed
> topic, I'm fine with keeping the default same as current behavior
> and changing it later. Or maybe we should ask some -rt mailing list
> instead of just Peter and Thomas?

According to the rt wiki, there is no -rt development list so lkml is
it.  I will change the default to 1 for V6 if I don't hear otherwise by
the time I get back around to spinning V6.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-13 23:18     ` David Rientjes
  2015-03-13 23:43       ` Rik van Riel
@ 2015-03-16 15:39       ` Christoph Lameter
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2015-03-16 15:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Eric B Munson, Rik van Riel, Andrew Morton, Vlastimil Babka,
	Thomas Gleixner, Peter Zijlstra, Mel Gorman, Michal Hocko,
	linux-mm, linux-kernel

On Fri, 13 Mar 2015, David Rientjes wrote:

> It would be really disappointing to not enable this by default for !rt
> kernels.  We haven't migrated mlocked pages in the past by way of memory
> compaction because it can theoretically result in consistent minor page
> faults, but I haven't yet heard a !rt objection to enabling this.
>
> If the rt patchset is going to carry a patch to disable this, then the
> question arises: why not just carry an ISOLATE_UNEVICTABLE patch instead?
> I think you've done the due diligence required to allow this to be
> disabled at any time in a very easy way from userspace by the new tunable.
> I think it should be enabled and I'd be very surprised to hear any other
> objection about it other than it's different from the status quo.

Compaction can alrady be disabled and thus you can also disable migration
of mlocked pages. In general low latency requires that no expensive kernel
processing is being done. Thus the rest of compaction processing also
needs to be disabled. That means that allowing compaction handling
mlocked pages would be ok. RT loads and low latency configurations (like
my environment) will selective disable compaction to avoid creating
additional latencies. This could be done only for specific nodes and
processors if necessary.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-16 13:49         ` Eric B Munson
@ 2015-03-18 14:40           ` Vlastimil Babka
  2015-03-18 15:40           ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2015-03-18 14:40 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Michal Hocko, Rik van Riel, Andrew Morton, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	linux-mm, linux-kernel, Linux API

On 03/16/2015 02:49 PM, Eric B Munson wrote:
> On Mon, 16 Mar 2015, Vlastimil Babka wrote:
>
>> [CC += linux-api@]
>>
>> Since this is a kernel-user-space API change, please CC linux-api@.
>> The kernel source file Documentation/SubmitChecklist notes that all
>> Linux kernel patches that change userspace interfaces should be CCed
>> to linux-api@vger.kernel.org, so that the various parties who are
>> interested in API changes are informed. For further information, see
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e=
>
> Added to the Cc list, thanks.
>
>>
>>
>> On 03/13/2015 09:19 PM, Michal Hocko wrote:
>>> On Fri 13-03-15 15:09:15, Eric B Munson wrote:
>>>> On Fri, 13 Mar 2015, Rik van Riel wrote:
>>>>
>>>>> On 03/13/2015 01:26 PM, Eric B Munson wrote:
>>>>>
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -1046,6 +1046,8 @@ typedef enum {
>>>>>>   	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
>>>>>>   } isolate_migrate_t;
>>>>>>
>>>>>> +int sysctl_compact_unevictable;
>>
>> A comment here would be useful I think, as well as explicit default
>> value. Maybe also __read_mostly although I don't know how much that
>> matters.
>
> I am going to sit on V6 for a couple of days incase anyone from rt wants
> to chime in.  But these will be in V6.
>
>>
>> I also wonder if it might be confusing that "compact_memory" is a
>> write-only trigger that doesn't even show under "sysctl -a", while
>> "compact_unevictable" is a read/write setting. But I don't have a
>> better suggestion right now.
>
> Does allow_unevictable_compaction sound better?  It feels too much like

For sorting purposes, maybe compact_unevictable_allowed?

> variable naming conventions from other languages which seems to
> encourage verbosity to me, but does indicate a difference from
> compact_memory.

If it sounds too awkward/long and nobody else has better suggestion, 
then just keep it as "compact_unevictable".


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5] Allow compaction of unevictable pages
  2015-03-16 13:49         ` Eric B Munson
  2015-03-18 14:40           ` Vlastimil Babka
@ 2015-03-18 15:40           ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2015-03-18 15:40 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Vlastimil Babka, Michal Hocko, Rik van Riel, Andrew Morton,
	Thomas Gleixner, Christoph Lameter, Peter Zijlstra, Mel Gorman,
	David Rientjes, linux-mm, linux-kernel, Linux API,
	linux-rt-users

On Mon, Mar 16, 2015 at 09:49:56AM -0400, Eric B Munson wrote:
> On Mon, 16 Mar 2015, Vlastimil Babka wrote:
> 
> > [CC += linux-api@]
> > 
> > Since this is a kernel-user-space API change, please CC linux-api@.
> > The kernel source file Documentation/SubmitChecklist notes that all
> > Linux kernel patches that change userspace interfaces should be CCed
> > to linux-api@vger.kernel.org, so that the various parties who are
> > interested in API changes are informed. For further information, see
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e=
> 
> Added to the Cc list, thanks.
> 
> > 
> > 
> > On 03/13/2015 09:19 PM, Michal Hocko wrote:
> > >On Fri 13-03-15 15:09:15, Eric B Munson wrote:
> > >>On Fri, 13 Mar 2015, Rik van Riel wrote:
> > >>
> > >>>On 03/13/2015 01:26 PM, Eric B Munson wrote:
> > >>>
> > >>>>--- a/mm/compaction.c
> > >>>>+++ b/mm/compaction.c
> > >>>>@@ -1046,6 +1046,8 @@ typedef enum {
> > >>>>  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
> > >>>>  } isolate_migrate_t;
> > >>>>
> > >>>>+int sysctl_compact_unevictable;
> > 
> > A comment here would be useful I think, as well as explicit default
> > value. Maybe also __read_mostly although I don't know how much that
> > matters.
> 
> I am going to sit on V6 for a couple of days incase anyone from rt wants
> to chime in.  But these will be in V6.
> 
> > 
> > I also wonder if it might be confusing that "compact_memory" is a
> > write-only trigger that doesn't even show under "sysctl -a", while
> > "compact_unevictable" is a read/write setting. But I don't have a
> > better suggestion right now.
> 
> Does allow_unevictable_compaction sound better?  It feels too much like
> variable naming conventions from other languages which seems to
> encourage verbosity to me, but does indicate a difference from
> compact_memory.
> 
> > 
> > >>>>+
> > >>>>  /*
> > >>>>   * Isolate all pages that can be migrated from the first suitable block,
> > >>>>   * starting at the block pointed to by the migrate scanner pfn within
> > >>>
> > >>>I suspect that the use cases where users absolutely do not want
> > >>>unevictable pages migrated are special cases, and it may make
> > >>>sense to enable sysctl_compact_unevictable by default.
> > >>
> > >>Given that sysctl_compact_unevictable=0 is the way the kernel behaves
> > >>now and the push back against always enabling compaction on unevictable
> > >>pages, I left the default to be the behavior as it is today.
> > >
> > >The question is _why_ we have this behavior now. Is it intentional?
> > 
> > It's there since 748446bb6 ("mm: compaction: memory compaction
> > core"). Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim")
> > changes the comment in __isolate_lru_page() handling of unevictable
> > pages to mention compaction explicitly. It could have been
> > accidental in 748446bb6 though, maybe it just reused
> > __isolate_lru_page() for compaction - it seems that the skipping of
> > unevictable was initially meant to optimize lumpy reclaim.
> > 
> > >e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that
> > 
> > Well, CMA and realtime kernels are probably mutually exclusive enough.
> > 
> > >direction. Vlastimil has then changed that by edc2ca612496 (mm,
> > >compaction: move pageblock checks up from isolate_migratepages_range()).
> > >There is no mention about mlock pages so I guess it was more an
> > >unintentional side effect of the patch. At least that is my current
> > >understanding. I might be wrong here.
> > 
> > Although that commit did change unintentionally more details that I
> > would have liked (unfortunately), I think you are wrong on this one.
> > ISOLATE_UNEVICTABLE is still passed from
> > isolate_migratepages_range() which is used by CMA, while the
> > compaction variant isolate_migratepages() does not pass it. So it's
> > kept CMA-specific as before.
> > 
> > >The thing about RT is that it is not usable with the upstream kernel
> > >without the RT patchset AFAIU. So the default should be reflect what is
> > >better for the standard kernel. RT loads have to tune the system anyway
> > >so it is not so surprising they would disable this option as well. We
> > >should help those guys and do not require them to touch the code but the
> > >knob is reasonable IMHO.
> > >
> > >Especially when your changelog suggests that having this enabled by
> > >default is beneficial for the standard kernel.
> > 
> > I agree, but if there's a danger of becoming too of a bikeshed
> > topic, I'm fine with keeping the default same as current behavior
> > and changing it later. Or maybe we should ask some -rt mailing list
> > instead of just Peter and Thomas?
> 
> According to the rt wiki, there is no -rt development list so lkml is
> it.  I will change the default to 1 for V6 if I don't hear otherwise by
> the time I get back around to spinning V6.
> 

For kernel development, yes. But this change affects users. Cc'ing the
linux-rt-users mailing list (which I did) is appropriate in this case.

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-03-18 15:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-13 17:26 [PATCH V5] Allow compaction of unevictable pages Eric B Munson
2015-03-13 18:56 ` Rik van Riel
2015-03-13 19:09   ` Eric B Munson
2015-03-13 20:19     ` Michal Hocko
2015-03-16 10:14       ` Vlastimil Babka
2015-03-16 13:49         ` Eric B Munson
2015-03-18 14:40           ` Vlastimil Babka
2015-03-18 15:40           ` Steven Rostedt
2015-03-13 23:18     ` David Rientjes
2015-03-13 23:43       ` Rik van Riel
2015-03-16 15:39       ` Christoph Lameter
2015-03-13 20:02 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).