From: David Rientjes <rientjes@google.com>
To: Paul Jackson <pj@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	kosaki.motohiro@jp.fujitsu.com, andi@firstfloor.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, clameter@sgi.com, mel@csn.ul.ie
Subject: Re: [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node.
Date: Tue, 5 Feb 2008 11:56:57 -0800 (PST)
Message-ID: <alpine.DEB.0.9999.0802051146300.5854@chino.kir.corp.google.com>
In-Reply-To: <20080205041755.3411b5cc.pj@sgi.com>

On Tue, 5 Feb 2008, Paul Jackson wrote:

> But that discussion touched on some other long standing deficiencies
> in the way that I had originally glued cpusets and memory policies
> together.  The current mechanism doesn't handle changing cpusets very
> well, especially if the number of nodes in the cpuset increases.
> 

That's because of the nodemask remaps that are done for the various 
mempolicy cases when rebinding the policy.  I agree we cannot change that 
implementation now even though it is undocumented.

The more alarming result of these remaps is in the MPOL_BIND case, as 
we've talked about before.  The language in set_mempolicy(2):

	The MPOL_BIND policy is a strict policy that restricts memory
	allocation to the nodes specified in nodemask. There won't be
	allocations on other nodes.

makes it pretty clear that, as long as the task is not swapped out, no 
allocations will be done on nodes outside the set_mempolicy() nodemask.

But the current implementation allows exactly that if the task is either 
moved to a different cpuset or its cpuset's mems change.  For example, 
consider a task that is allowed nodes 1-3 by its cpuset and asks for an 
MPOL_BIND mempolicy of node 2.  If that cpuset's mems change to 4-6, the 
mempolicy is now effectively a bind on node 5: node 2 was the second node 
of the old mems, so it is remapped to the second node of the new mems.

> The next two steps I need to take are:
>  1) propose this patch, with careful explanation (it's easy to lose
>     one's bearings in the mappings and remappings of node numberings)
>     to a wider audience, such as linux-mm or linux-kernel, and

Thanks.

>  2) carefully test this, especially on each code path I touched in
>     mm/mempolicy.c, where the changes were delicate, to ensure I
>     didn't break any existing code.
> 
> There were also some other, smaller patches proposed, by myself and
> others.  I was preferring to address a wider set of the long standing
> issues in this area, but the others above mostly preferred the smaller
> patches.  This needs to be discussed in a wider forum, and a consensus
> reached.
> 

I think if these MPOL_* flags that you're proposing are made as generic as 
possible for all possible mempolicies (current and future), it would be 
the optimal change.  It would prevent us from having to add new flags for 
corner-cases in the future and would allow us to keep the flag set as 
small as possible.  My suggestion of MPOL_F_STATIC_NODEMASK goes a long 
way toward solving these issues both for MPOL_INTERLEAVE (in conjunction 
with storing the set_mempolicy() intent) and the MPOL_BIND discrepancy I 
mentioned above.

		David
