LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <clameter@sgi.com>, Paul Jackson <pj@sgi.com>,
	David Rientjes <rientjes@google.com>, Mel Gorman <mel@csn.ul.ie>,
	linux-mm <linux-mm@kvack.org>,
	torvalds@linux-foundation.org, Eric Whitney <eric.whitney@hp.com>
Subject: [PATCH 2.6.24-mm1]  Mempolicy:  silently restrict nodemask to allowed nodes V3
Date: Fri, 08 Feb 2008 14:45:13 -0500	[thread overview]
Message-ID: <1202499913.5346.60.camel@localhost> (raw)
In-Reply-To: <20080205163406.270B.KOSAKI.MOTOHIRO@jp.fujitsu.com>

Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't
works on memoryless node."

[Aside:  I noticed there were two slightly different distributions for
this topic.  I've unified the distribution lists w/o dropping anyone, I
think.  Apologies if you'd rather have been dropped...]

Here's V3 of the patch, accomodating Kosaki Motohiro's suggestion for
folding contextualize_policy() into mpol_check_policy() [because my
"was_empty" argument "was ugly" ;-)].  It does seem to clean up the
code.

I'm still deferring David Rientjes' suggestion to fold
mpol_check_policy() into mpol_new().  We need to sort out whether
mempolicies specified for tmpfs and hugetlbfs mounts always need the
same "contextualization" as user/application installed policies.  I
don't want to hold up this bug fix for that discussion.  This is
something Paul J will need to address with his cpuset/mempolicy rework,
so we can sort it out in that context.

Again, tested with "numactl --interleave=all" and memtoy on ia64 using
mem= command line argument to simulate memoryless node.


Lee

============================
[PATCH] 2.6.24-mm1 - mempolicy:  silently restrict nodemask to allowed nodes

V2 -> V3:
+ As suggested by Kosaki Motohito, fold the "contextualization"
  of policy nodemask into mpol_check_policy().  Looks a little
  cleaner. 

V1 -> V2:
+ Communicate whether or not incoming node mask was empty to
  mpol_check_policy() for better error checking.
+ As suggested by David Rientjes, remove the now unused
   cpuset_nodes_subset_current_mems_allowed() from cpuset.h

Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes.  This patch attempts to fix that problem.

Some background:  

numactl --interleave=all calls set_mempolicy(2) with a fully
populated [out to MAXNUMNODES] nodemask.  set_mempolicy()
[in do_set_mempolicy()] calls contextualize_policy() which
requires that the nodemask be a subset of the current task's
mems_allowed; else EINVAL will be returned.  A task's
mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
i.e., nodes with memory.  So, a fully populated nodemask will
be declared invalid if it includes memoryless nodes.

  NOTE:  the same thing will occur when running in a cpuset
         with restricted mem_allowed--for the same reason:
         node mask contains dis-allowed nodes.

mbind(2), on the other hand, just masks off any nodes in the 
nodemask that are not included in the caller's mems_allowed.

In each case [mbind() and set_mempolicy()], mpol_check_policy()
will complain [again, resulting in EINVAL] if the nodemask contains 
any memoryless nodes.  This is somewhat redundant as mpol_new() 
will remove memoryless nodes for interleave policy, as will 
bind_zonelist()--called by mpol_new() for BIND policy.

Proposed fix:

1) modify contextualize_policy logic to:
   a) remember whether the incoming node mask is empty.
   b) if not, restrict the nodemask to allowed nodes, as is
      currently done in-line for mbind().  This guarantees
      that the resulting mask includes only nodes with memory.

      NOTE:  this is a [benign, IMO] change in behavior for
             set_mempolicy().  Dis-allowed nodes will be
             silently ignored, rather than returning an error.

   c) fold this code into mpol_check_policy(), replace 2 calls to
      contextualize_policy() to call mpol_check_policy() directly
      and remove contextualize_policy().

2) In existing mpol_check_policy() logic, after "contextualization":
   a) MPOL_DEFAULT:  require that in coming mask "was_empty"
   b) MPOL_{BIND|INTERLEAVE}:  require that contextualized nodemask
      contains at least one node.
   c) add a case for MPOL_PREFERRED:  if in coming was not empty
      and resulting mask IS empty, user specified invalid nodes.
      Return EINVAL.
   c) remove the now redundant check for memoryless nodes

3) remove the now redundant masking of policy nodes for interleave
   policy from mpol_new().

4) Now that mpol_check_policy() contextualizes the nodemask, remove
   the in-line nodes_and() from sys_mbind().  I believe that this
   restores mbind() to the behavior before the memoryless-nodes
   patch series.  E.g., we'll no longer treat an invalid nodemask
   with MPOL_PREFERRED as local allocation.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/cpuset.h |    3 --
 mm/mempolicy.c         |   61 ++++++++++++++++++++++++++++---------------------
 2 files changed, 36 insertions(+), 28 deletions(-)

Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c	2008-02-08 11:11:34.000000000 -0500
+++ Linux/mm/mempolicy.c	2008-02-08 13:40:40.000000000 -0500
@@ -116,22 +116,51 @@ static void mpol_rebind_policy(struct me
 /* Do sanity checking on a policy */
 static int mpol_check_policy(int mode, nodemask_t *nodes)
 {
-	int empty = nodes_empty(*nodes);
+	int was_empty, is_empty;
+
+	if (!nodes)
+		return 0;
+
+	/*
+	 * "Contextualize" the in-coming nodemast for cpusets:
+	 * Remember whether in-coming nodemask was empty,  If not,
+	 * restrict the nodes to the allowed nodes in the cpuset.
+	 * This is guaranteed to be a subset of nodes with memory.
+	 */
+	cpuset_update_task_memory_state();
+	is_empty = was_empty = nodes_empty(*nodes);
+	if (!was_empty) {
+		nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+		is_empty = nodes_empty(*nodes);	/* after "contextualization" */
+	}
 
 	switch (mode) {
 	case MPOL_DEFAULT:
-		if (!empty)
+		/*
+		 * require caller to specify an empty nodemask
+		 * before "contextualization"
+		 */
+		if (!was_empty)
 			return -EINVAL;
 		break;
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
-		/* Preferred will only use the first bit, but allow
-		   more for now. */
-		if (empty)
+		/*
+		 * require at least 1 valid node after "contextualization"
+		 */
+		if (is_empty)
+			return -EINVAL;
+		break;
+	case MPOL_PREFERRED:
+		/*
+		 * Did caller specify invalid nodes?
+		 * Don't silently accept this as "local allocation".
+		 */
+		if (!was_empty && is_empty)
 			return -EINVAL;
 		break;
 	}
- 	return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
+ 	return 0;
 }
 
 /* Generate a custom zonelist for the BIND policy. */
@@ -188,8 +217,6 @@ static struct mempolicy *mpol_new(int mo
 	switch (mode) {
 	case MPOL_INTERLEAVE:
 		policy->v.nodes = *nodes;
-		nodes_and(policy->v.nodes, policy->v.nodes,
-					node_states[N_HIGH_MEMORY]);
 		if (nodes_weight(policy->v.nodes) == 0) {
 			kmem_cache_free(policy_cache, policy);
 			return ERR_PTR(-EINVAL);
@@ -421,18 +448,6 @@ static int mbind_range(struct vm_area_st
 	return err;
 }
 
-static int contextualize_policy(int mode, nodemask_t *nodes)
-{
-	if (!nodes)
-		return 0;
-
-	cpuset_update_task_memory_state();
-	if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
-		return -EINVAL;
-	return mpol_check_policy(mode, nodes);
-}
-
-
 /*
  * Update task->flags PF_MEMPOLICY bit: set iff non-default
  * mempolicy.  Allows more rapid checking of this (combined perhaps
@@ -468,7 +483,7 @@ static long do_set_mempolicy(int mode, n
 {
 	struct mempolicy *new;
 
-	if (contextualize_policy(mode, nodes))
+	if (mpol_check_policy(mode, nodes))
 		return -EINVAL;
 	new = mpol_new(mode, nodes);
 	if (IS_ERR(new))
@@ -915,10 +930,6 @@ asmlinkage long sys_mbind(unsigned long 
 	err = get_nodes(&nodes, nmask, maxnode);
 	if (err)
 		return err;
-#ifdef CONFIG_CPUSETS
-	/* Restrict the nodes to the allowed nodes in the cpuset */
-	nodes_and(nodes, nodes, current->mems_allowed);
-#endif
 	return do_mbind(start, len, mode, &nodes, flags);
 }
 
Index: Linux/include/linux/cpuset.h
===================================================================
--- Linux.orig/include/linux/cpuset.h	2008-02-08 11:11:34.000000000 -0500
+++ Linux/include/linux/cpuset.h	2008-02-08 11:12:43.000000000 -0500
@@ -26,8 +26,6 @@ extern nodemask_t cpuset_mems_allowed(st
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
 void cpuset_update_task_memory_state(void);
-#define cpuset_nodes_subset_current_mems_allowed(nodes) \
-		nodes_subset((nodes), current->mems_allowed)
 int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
 
 extern int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask);
@@ -103,7 +101,6 @@ static inline nodemask_t cpuset_mems_all
 #define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
 static inline void cpuset_init_current_mems_allowed(void) {}
 static inline void cpuset_update_task_memory_state(void) {}
-#define cpuset_nodes_subset_current_mems_allowed(nodes) (1)
 
 static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
 {



  parent reply	other threads:[~2008-02-08 19:45 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-02  8:12 [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node KOSAKI Motohiro
2008-02-02  9:09 ` Andi Kleen
2008-02-02  9:37   ` KOSAKI Motohiro
2008-02-02 11:30     ` Andi Kleen
2008-02-04 19:03       ` Christoph Lameter
2008-02-04 18:20     ` Lee Schermerhorn
2008-02-05  9:26       ` [2.6.24 regression][BUGFIX] " KOSAKI Motohiro
2008-02-05 21:57         ` Lee Schermerhorn
2008-02-05 22:12           ` Christoph Lameter
2008-02-06 16:00             ` Lee Schermerhorn
2008-02-05 22:15           ` Paul Jackson
2008-02-06  2:17           ` David Rientjes
2008-02-06 16:11             ` Lee Schermerhorn
2008-02-06  6:49           ` KOSAKI Motohiro
2008-02-06 17:38         ` Lee Schermerhorn
2008-02-07  8:31           ` KOSAKI Motohiro
2008-02-08 19:45         ` Lee Schermerhorn [this message]
2008-02-09 18:11           ` [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3 KOSAKI Motohiro
2008-02-10  5:29           ` KOSAKI Motohiro
2008-02-10  5:49             ` Greg KH
2008-02-10  7:42               ` Linus Torvalds
2008-02-10 10:31                 ` Andrew Morton
2008-02-11 16:47                 ` Lee Schermerhorn
2008-02-12  4:30                   ` [PATCH for 2.6.24][regression fix] " KOSAKI Motohiro
2008-02-12  5:06                     ` David Rientjes
2008-02-12  5:07                     ` Andrew Morton
2008-02-12 13:18                       ` KOSAKI Motohiro
2008-02-05 10:17       ` [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node Paul Jackson
2008-02-05 11:14         ` KOSAKI Motohiro
2008-02-05 19:56         ` David Rientjes
2008-02-05 20:51           ` Paul Jackson
2008-02-05 21:03             ` David Rientjes
2008-02-05 21:33               ` Paul Jackson
2008-02-05 22:04                 ` Lee Schermerhorn
2008-02-05 22:44                   ` David Rientjes
2008-02-05 22:50                   ` Paul Jackson
2008-02-05 14:31       ` Mel Gorman
2008-02-05 15:23         ` Lee Schermerhorn
2008-02-05 18:12           ` Christoph Lameter
2008-02-05 18:27             ` Lee Schermerhorn
2008-02-05 19:04               ` Christoph Lameter
2008-02-05 19:15                 ` Paul Jackson
2008-02-05 20:06                   ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1202499913.5346.60.camel@localhost \
    --to=lee.schermerhorn@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=pj@sgi.com \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [PATCH 2.6.24-mm1]  Mempolicy:  silently restrict nodemask to allowed nodes V3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).