From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: David Rientjes <rientjes@google.com>
Cc: Paul Jackson <pj@sgi.com>, Christoph Lameter <clameter@sgi.com>,
	Andi Kleen <ak@suse.de>,
	linux-kernel@vger.kernel.org,
	Michael Kerrisk <mtk-manpages@gmx.net>
Subject: Re: [patch 5/6] mempolicy: add MPOL_F_RELATIVE_NODES flag
Date: Wed, 27 Feb 2008 10:37:45 -0500
Message-ID: <1204126666.5029.26.camel@localhost>
In-Reply-To: <alpine.DEB.1.00.0802261704310.25919@chino.kir.corp.google.com>

On Tue, 2008-02-26 at 17:17 -0800, David Rientjes wrote:
> On Mon, 25 Feb 2008, David Rientjes wrote:
> 
> > Adds another optional mode flag, MPOL_F_RELATIVE_NODES, that specifies
> > nodemasks passed via set_mempolicy() or mbind() should be considered
> > relative to the current task's mems_allowed.
> > 
> 
> Here's some examples of the functional changes between the default 
> actions of the various mempolicy modes and the new behavior with 
> MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.

Nice work.  Would you consider adding this [with the corrections you
note below] to the memory policy doc under the "interaction with
cpusets" section?

> 
> To read this, the logical order follows from the left-most column to the 
> right-most:
> 
>  - "mems" is the task's mems_allowed as constrained by its attached
>    cpuset,
> 
>  - "nodemask" is the mask passed with the set_mempolicy() or mbind() call 
>    for that particular policy,
> 
>  - the first "result" is the nodemask that the policy is effected over,
> 
>  - "rebind" is the nodemask of a subsequent change to the cpuset's mems,
>    and
> 
>  - the second "result" is the nodemask that the policy is now effected 
>    over.
> 
> 			MPOL_INTERLEAVE
> 			---------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-2[*]		4-6		4-5
> 	1-3	1-2		1-2		0-2		0-1
> 	1-3	1-3		1-3		4-7		4-6
> 	1-3	2-4		2-3		0-2		1-2
> 	1-3	2-6		2-3		4-7		5-6
> 	1-3	4-7		EINVAL
> 	1-3	0-7		1-3		4-7		4-6
> 
> 			MPOL_PREFERRED
> 			--------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0		EINVAL
> 	1-3	2		2		4-7		5
> 	1-3	5		EINVAL
> 
> 			MPOL_BIND
> 			---------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-2		0-2		0-1
> 	1-3	1-2		1-2		2-7		2-3
> 	1-3	1-3		1-3		0-1		0-1
> 	1-3	2-4		2-3		3-6		4-5
> 	1-3	2-6		2-3		5		5
> 	1-3	4-7		EINVAL
> 	1-3	0-7		1-3		1-3		1-3

Just a note here:  If you had used the same set of "rebind targets" for
_BIND as you did for _INTERLEAVE, I would expect the same results,
because we're just remapping bit masks in both cases.  Do you agree?
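
To make the remapping concrete, here is a minimal Python sketch of what
I understand the default rebind to do, modelled on bitmap_remap()-style
semantics.  The name remap() and the set-of-ints representation are
mine, for illustration only; this is not the kernel code.  Note that it
takes no mode argument, so the same rebind target gives the same result
for _BIND and _INTERLEAVE.

```python
def remap(policy_nodes, old_mems, new_mems):
    """Illustrative model of a bitmap_remap()-style rebind.

    A node that is the n-th member of old_mems moves to the
    (n mod weight(new_mems))-th member of new_mems.  Nodes that are
    not in old_mems map to themselves.
    """
    old = sorted(old_mems)
    new = sorted(new_mems)
    result = set()
    for node in policy_nodes:
        if node in old and new:
            ordinal = old.index(node)
            result.add(new[ordinal % len(new)])
        else:
            result.add(node)
    return result


mems = {1, 2, 3}
effective = {2, 3}  # e.g. nodemask 2-4 intersected with mems 1-3
for rebind in ({0, 1, 2}, {4, 5, 6, 7}, {3, 4, 5, 6}, {5}):
    print(sorted(rebind), "->", sorted(remap(effective, mems, rebind)))
```

Checked by hand, this reproduces the second "result" column of both the
_INTERLEAVE and _BIND tables above.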
 
> 
>  [*] Notice how the resulting nodemask for all of these examples when
>      creating the mempolicy is intersected with mems_allowed.  This is
>      the current behavior, with contextualize_policy(), and is identical
>      to the initial result of the MPOL_F_STATIC_NODES case.
> 
>      Perhaps it would make more sense to remap the nodemask when it is
>      created, as well, in the ~MPOL_F_STATIC_NODES case.  For example, in
>      this case, the "result" would be 1-3 instead.
> 
>      That is a departure from what is currently implemented in HEAD (and,
>      thus, can be used as ample justification for the above behavior) but
>      makes more sense.  Thoughts?

Thoughts:

1) this IS a change in behavior, right?  My first inclination is to shy
away from this.  However, ...

2) the current interaction of mempolicies with cpusets is not well
documented--until Paul's cpuset.4 man page hits the streets, anyway.
That doc does say that a mempolicy is not allowed to use a node outside
the cpuset.  It does NOT say how this is enforced--reject vs masking vs
remap.  The set_mempolicy(2) and mbind(2) man pages [in at least the
2.70 man pages] say that you get EINVAL if you specify a node outside
the current cpuset constraints.  This was relaxed by the recent patch to
"silently restrict" the nodes to mems_allowed.

Since we're updating the man pages anyway, we COULD change them to say
that we remap policy to allowed nodes.  However, the application may
have chosen the nodes based on some knowledge of hardware topology, such
as IO attachment, interrupt handling cpus, ...  In that case, remapping
doesn't make much sense to me.
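
For reference, the two enforcement styles mentioned above (reject, as
the man pages describe, versus the newer silent masking) can be sketched
in a few lines of Python.  install_nodemask() and its "strict" switch
are hypothetical names of mine, and the empty-mask EINVAL is inferred
from the EINVAL rows in your tables, not read out of the code.

```python
import errno


def install_nodemask(user_nodemask, mems_allowed, strict=False):
    """Illustrative model of nodemask handling at policy creation.

    strict=True  : reject any node outside mems_allowed (man page text).
    strict=False : silently mask to mems_allowed, failing only when
                   nothing is left (assumption based on the tables).
    """
    requested = set(user_nodemask)
    allowed = set(mems_allowed)
    if strict and not requested <= allowed:
        raise OSError(errno.EINVAL, "nodemask has nodes outside the cpuset")
    effective = requested & allowed
    if not effective:
        raise OSError(errno.EINVAL, "nodemask has no allowed nodes")
    return effective


print(sorted(install_nodemask({0, 1, 2}, {1, 2, 3})))
try:
    install_nodemask({0, 1, 2}, {1, 2, 3}, strict=True)
except OSError as err:
    print("strict mode rejected the mask:", err.strerror)
```

Neither variant remaps, so a node the application picked for topology
reasons is either kept as-is or dropped, never replaced by another one.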

If you need/want a mode that remaps policy to mems_allowed on
installation--e.g., to provide the maximum number of interleave
nodes--how about yet another flag, such as '_REMAP', to effect this
behavior?

Just a thought...

> 
> 			MPOL_INTERLEAVE | MPOL_F_STATIC_NODES
> 			-------------------------------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-2		4-6		nil
> 	1-3	1-2		1-2		0-2		1-2
> 	1-3	1-3		1-3		4-7		nil
> 	1-3	2-4		2-3		0-2		2
> 	1-3	2-6		2-3		4-7		4-6
> 	1-3	4-7		EINVAL
> 	1-3	0-7		1-3		4-7		4-7

'nil' falls back to local allocation, right?
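
As I read the table, the _STATIC_NODES rebind is just the user's
original nodemask intersected with the new mems, with 'nil' being the
empty intersection.  A minimal Python sketch of that reading follows;
static_rebind() is my name for illustration, and what the allocator does
with an empty result is exactly the question above, so the sketch does
not model it.

```python
def static_rebind(user_nodemask, new_mems):
    """Illustrative model of an MPOL_F_STATIC_NODES rebind.

    The nodemask the user passed is remembered unchanged and is
    intersected with whatever the cpuset allows after the change.
    An empty set corresponds to 'nil' in the table.
    """
    return set(user_nodemask) & set(new_mems)


user_nodemask = {2, 3, 4}  # the 2-4 row
for new_mems in ({0, 1, 2}, {3, 4, 5, 6}, {7}):
    effective = static_rebind(user_nodemask, new_mems)
    print(sorted(new_mems), "->", sorted(effective) or "nil")
```

Because the original nodemask is kept, a later rebind that brings a
requested node back into the cpuset restores it to the policy.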

> 
> 			MPOL_PREFERRED | MPOL_F_STATIC_NODES
> 			------------------------------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0		EINVAL
> 	1-3	2		2		4-7		-1[**]
> 	1-3	5		EINVAL
> 
>  [**] Upon further rebind with a nodemask of 2, the preferred node would
>       again be 2.

Here, '-1' means 'local allocation'.  [Note for documentation...]

> 
> 			MPOL_BIND | MPOL_F_STATIC_NODES
> 			-------------------------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-2		0-2		0-2
> 	1-3	1-2		1-2		2-7		2
> 	1-3	1-3		1-3		0-1		1
> 	1-3	2-4		2-3		3-6		3-4
> 	1-3	2-6		2-3		5		5
> 	1-3	4-7		EINVAL
> 	1-3	0-7		1-3		1-3		1-3
> 
> 			MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES
> 			---------------------------------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-3		4-6		4-6
> 	1-3	1-2		2-3		0-2		1-2
> 	1-3	1-3		1-3		4-7		5-7
> 	1-3	2-4		1-3		0-2		0-2
> 	1-3	2-6		1-3		4-7		4-7
> 	1-3	4-7		1-3		0-1,5		0-1,5
> 	1-3	0-7		1-3		4-7		4-7
> 
> 			MPOL_PREFERRED | MPOL_F_RELATIVE_NODES
> 			--------------------------------------
> 	mems	nodemask	result		rebind		result[***]
> 	1-3	0		1		0		1
> 	1-3	2		3		4-7		3
> 	1-3	5		3		0-7		3
> 
>  [***] All of these results are wrong and will be corrected in the next
>        posting of the patchset.  They change the preferred node in some
>        cases to be a node that is expressly excluded from being accessed
>        by the cpuset mems change.
> 
> 			MPOL_BIND | MPOL_F_RELATIVE_NODES
> 			---------------------------------
> 	mems	nodemask	result		rebind		result
> 	1-3	0-2		1-3		0-2		0-2
> 	1-3	1-2		2-3		2-7		3-4
> 	1-3	1-3		1-3		0-1		0-1
> 	1-3	2-4		1-3		3-6		3,5-6
> 	1-3	2-6		1-3		5		5
> 	1-3	4-7		1-3		0-3,6		0-2,6
> 	1-3	0-7		1-3		1-3		1-3

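For the documentation, it might help to spell out how the
_RELATIVE_NODES results are derived.  Below is a minimal Python sketch,
assuming the flag folds the user's nodemask modulo the weight of
mems_allowed and then maps the resulting ordinals onto mems_allowed, in
the manner of the bitmap_fold() and bitmap_onto() operations from patch
4/6.  The Python function names are mine, for illustration only.

```python
def fold(nodes, size):
    """Fold node ids into the range [0, size) by taking each modulo size."""
    return {n % size for n in nodes}


def onto(ordinals, relmap):
    """Map ordinal n to the n-th set member of relmap, dropping overflow."""
    positions = sorted(relmap)
    return {positions[n] for n in ordinals if n < len(positions)}


def relative_nodes(user_nodemask, mems_allowed):
    """Illustrative model of an MPOL_F_RELATIVE_NODES nodemask."""
    if not mems_allowed:
        return set()
    folded = fold(user_nodemask, len(mems_allowed))
    return onto(folded, mems_allowed)


user_nodemask = {1, 2}  # the 1-2 row
for mems in ({1, 2, 3}, {0, 1, 2}, {2, 3, 4, 5, 6, 7}):
    print(sorted(mems), "->", sorted(relative_nodes(user_nodemask, mems)))
```

Checked by hand, this matches the _INTERLEAVE and _BIND rows of the
_RELATIVE_NODES tables, both at creation and after rebind.  It says
nothing about the _PREFERRED rows, which you have already marked as
wrong.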
