LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
@ 2019-05-25  7:07 zhong jiang
  2019-05-25 18:28 ` Andrew Morton
  2019-06-27  9:59 ` Vlastimil Babka
  0 siblings, 2 replies; 7+ messages in thread
From: zhong jiang @ 2019-05-25  7:07 UTC (permalink / raw)
  To: akpm, osalvador, khandual, mhocko, mgorman, aarcange
  Cc: rcampbell, linux-mm, linux-kernel

We bind an different node to different vma, Unluckily,
it will bind different vma to same node by checking the /proc/pid/numa_maps.   
Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
has introduced the issue.  when we change memory policy by seting cpuset.mems,
A process will rebind the specified policy more than one times. 
if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
Maybe result in the out of memory which allocating memory from same node.

Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") 
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e3ab1d9..a60a3be 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
 	else {
 		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
 								*nodes);
-		pol->w.cpuset_mems_allowed = tmp;
+		pol->w.cpuset_mems_allowed = *nodes;
 	}
 
 	if (nodes_empty(tmp))
-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-05-25  7:07 [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask zhong jiang
@ 2019-05-25 18:28 ` Andrew Morton
  2019-05-27 12:23   ` Vlastimil Babka
  2019-06-27  9:59 ` Vlastimil Babka
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2019-05-25 18:28 UTC (permalink / raw)
  To: zhong jiang
  Cc: osalvador, khandual, mhocko, mgorman, aarcange, rcampbell,
	linux-mm, linux-kernel, Vlastimil Babka

(Cc Vlastimil)

On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@huawei.com> wrote:

> We bind an different node to different vma, Unluckily,
> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
> has introduced the issue.  when we change memory policy by seting cpuset.mems,
> A process will rebind the specified policy more than one times. 
> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
> Maybe result in the out of memory which allocating memory from same node.
> 
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>  	else {
>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>  								*nodes);
> -		pol->w.cpuset_mems_allowed = tmp;
> +		pol->w.cpuset_mems_allowed = *nodes;
>  	}
>  
>  	if (nodes_empty(tmp))

hm, I'm not surprised the code broke.  What the heck is going on in
there?  It used to have a perfunctory comment, but Vlastimil deleted
it.

Could someone please propose a comment for the above code block
explaining why we're doing what we do?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-05-25 18:28 ` Andrew Morton
@ 2019-05-27 12:23   ` Vlastimil Babka
  2019-05-27 13:58     ` zhong jiang
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2019-05-27 12:23 UTC (permalink / raw)
  To: Andrew Morton, zhong jiang
  Cc: osalvador, khandual, mhocko, mgorman, aarcange, rcampbell,
	linux-mm, linux-kernel

On 5/25/19 8:28 PM, Andrew Morton wrote:
> (Cc Vlastimil)

Oh dear, 2 years and I forgot all the details about how this works.

> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@huawei.com> wrote:
> 
>> We bind an different node to different vma, Unluckily,
>> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
>> has introduced the issue.  when we change memory policy by seting cpuset.mems,
>> A process will rebind the specified policy more than one times. 
>> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
>> Maybe result in the out of memory which allocating memory from same node.

I have a hard time understanding what the problem is. Could you please
write it as a (pseudo) reproducer? I.e. an example of the process/admin
mempolicy/cpuset actions that have some wrong observed results vs the
correct expected result.

>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>>  	else {
>>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>>  								*nodes);
>> -		pol->w.cpuset_mems_allowed = tmp;
>> +		pol->w.cpuset_mems_allowed = *nodes;

Looks like a mechanical error on my side when removing the code for
step1+step2 rebinding. Before my commit there was

pol->w.cpuset_mems_allowed = step ? tmp : *nodes;

Since 'step' was removed and thus 0, I should have used *nodes indeed.
Thanks for catching that.

>>  	}
>>  
>>  	if (nodes_empty(tmp))
> 
> hm, I'm not surprised the code broke.  What the heck is going on in
> there?  It used to have a perfunctory comment, but Vlastimil deleted
> it.

Yeah the comment was specific for the case that was being removed.

> Could someone please propose a comment for the above code block
> explaining why we're doing what we do?

I'll have to relearn this first...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-05-27 12:23   ` Vlastimil Babka
@ 2019-05-27 13:58     ` zhong jiang
  2019-06-27  3:57       ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: zhong jiang @ 2019-05-27 13:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, osalvador, khandual, mhocko, mgorman, aarcange,
	rcampbell, linux-mm, linux-kernel

On 2019/5/27 20:23, Vlastimil Babka wrote:
> On 5/25/19 8:28 PM, Andrew Morton wrote:
>> (Cc Vlastimil)
> Oh dear, 2 years and I forgot all the details about how this works.
>
>> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@huawei.com> wrote:
>>
>>> We bind an different node to different vma, Unluckily,
>>> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
>>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
>>> has introduced the issue.  when we change memory policy by seting cpuset.mems,
>>> A process will rebind the specified policy more than one times. 
>>> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
>>> Maybe result in the out of memory which allocating memory from same node.
> I have a hard time understanding what the problem is. Could you please
> write it as a (pseudo) reproducer? I.e. an example of the process/admin
> mempolicy/cpuset actions that have some wrong observed results vs the
> correct expected result.
Sorry, I havn't an testcase to reproduce the issue. At first, It was disappeared by
my colleague to configure the xml to start an vm.  To his suprise, The bind mempolicy
doesn't work.

Thanks,
zhong jiang
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>>>  	else {
>>>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>>>  								*nodes);
>>> -		pol->w.cpuset_mems_allowed = tmp;
>>> +		pol->w.cpuset_mems_allowed = *nodes;
> Looks like a mechanical error on my side when removing the code for
> step1+step2 rebinding. Before my commit there was
>
> pol->w.cpuset_mems_allowed = step ? tmp : *nodes;
>
> Since 'step' was removed and thus 0, I should have used *nodes indeed.
> Thanks for catching that.
>
>>>  	}
>>>  
>>>  	if (nodes_empty(tmp))
>> hm, I'm not surprised the code broke.  What the heck is going on in
>> there?  It used to have a perfunctory comment, but Vlastimil deleted
>> it.
> Yeah the comment was specific for the case that was being removed.
>
>> Could someone please propose a comment for the above code block
>> explaining why we're doing what we do?
> I'll have to relearn this first...
>
>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-05-27 13:58     ` zhong jiang
@ 2019-06-27  3:57       ` Andrew Morton
  2019-06-27  7:47         ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2019-06-27  3:57 UTC (permalink / raw)
  To: zhong jiang
  Cc: Vlastimil Babka, osalvador, khandual, mhocko, mgorman, aarcange,
	rcampbell, linux-mm, linux-kernel

On Mon, 27 May 2019 21:58:17 +0800 zhong jiang <zhongjiang@huawei.com> wrote:

> On 2019/5/27 20:23, Vlastimil Babka wrote:
> > On 5/25/19 8:28 PM, Andrew Morton wrote:
> >> (Cc Vlastimil)
> > Oh dear, 2 years and I forgot all the details about how this works.
> >
> >> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@huawei.com> wrote:
> >>
> >>> We bind an different node to different vma, Unluckily,
> >>> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
> >>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
> >>> has introduced the issue.  when we change memory policy by seting cpuset.mems,
> >>> A process will rebind the specified policy more than one times. 
> >>> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
> >>> Maybe result in the out of memory which allocating memory from same node.
> > I have a hard time understanding what the problem is. Could you please
> > write it as a (pseudo) reproducer? I.e. an example of the process/admin
> > mempolicy/cpuset actions that have some wrong observed results vs the
> > correct expected result.
> Sorry, I havn't an testcase to reproduce the issue. At first, It was disappeared by
> my colleague to configure the xml to start an vm.  To his suprise, The bind mempolicy
> doesn't work.

So... what do we do with this patch?

> Thanks,
> zhong jiang
> >>> --- a/mm/mempolicy.c
> >>> +++ b/mm/mempolicy.c
> >>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
> >>>  	else {
> >>>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
> >>>  								*nodes);
> >>> -		pol->w.cpuset_mems_allowed = tmp;
> >>> +		pol->w.cpuset_mems_allowed = *nodes;
> > Looks like a mechanical error on my side when removing the code for
> > step1+step2 rebinding. Before my commit there was
> >
> > pol->w.cpuset_mems_allowed = step ? tmp : *nodes;
> >
> > Since 'step' was removed and thus 0, I should have used *nodes indeed.
> > Thanks for catching that.

Was that an ack?

> >>>  	}
> >>>  
> >>>  	if (nodes_empty(tmp))
> >> hm, I'm not surprised the code broke.  What the heck is going on in
> >> there?  It used to have a perfunctory comment, but Vlastimil deleted
> >> it.
> > Yeah the comment was specific for the case that was being removed.
> >
> >> Could someone please propose a comment for the above code block
> >> explaining why we're doing what we do?
> > I'll have to relearn this first...
> >
> >
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-06-27  3:57       ` Andrew Morton
@ 2019-06-27  7:47         ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2019-06-27  7:47 UTC (permalink / raw)
  To: Andrew Morton, zhong jiang
  Cc: osalvador, khandual, mhocko, mgorman, aarcange, rcampbell,
	linux-mm, linux-kernel

On 6/27/19 5:57 AM, Andrew Morton wrote:
> On Mon, 27 May 2019 21:58:17 +0800 zhong jiang <zhongjiang@huawei.com> wrote:
> 
>> On 2019/5/27 20:23, Vlastimil Babka wrote:
>>> On 5/25/19 8:28 PM, Andrew Morton wrote:
>>>> (Cc Vlastimil)
>>> Oh dear, 2 years and I forgot all the details about how this works.
>>>
>>>> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@huawei.com> wrote:
>>>>
>>>>> We bind an different node to different vma, Unluckily,
>>>>> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
>>>>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
>>>>> has introduced the issue.  when we change memory policy by seting cpuset.mems,
>>>>> A process will rebind the specified policy more than one times. 
>>>>> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
>>>>> Maybe result in the out of memory which allocating memory from same node.
>>> I have a hard time understanding what the problem is. Could you please
>>> write it as a (pseudo) reproducer? I.e. an example of the process/admin
>>> mempolicy/cpuset actions that have some wrong observed results vs the
>>> correct expected result.
>> Sorry, I havn't an testcase to reproduce the issue. At first, It was disappeared by
>> my colleague to configure the xml to start an vm.  To his suprise, The bind mempolicy
>> doesn't work.
> 
> So... what do we do with this patch?
> 
>> Thanks,
>> zhong jiang
>>>>> --- a/mm/mempolicy.c
>>>>> +++ b/mm/mempolicy.c
>>>>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>>>>>  	else {
>>>>>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>>>>>  								*nodes);
>>>>> -		pol->w.cpuset_mems_allowed = tmp;
>>>>> +		pol->w.cpuset_mems_allowed = *nodes;
>>> Looks like a mechanical error on my side when removing the code for
>>> step1+step2 rebinding. Before my commit there was
>>>
>>> pol->w.cpuset_mems_allowed = step ? tmp : *nodes;
>>>
>>> Since 'step' was removed and thus 0, I should have used *nodes indeed.
>>> Thanks for catching that.
> 
> Was that an ack?

The fix should be fine, but the description is vague. I'll try to
improve it.

>>>>>  	}
>>>>>  
>>>>>  	if (nodes_empty(tmp))
>>>> hm, I'm not surprised the code broke.  What the heck is going on in
>>>> there?  It used to have a perfunctory comment, but Vlastimil deleted
>>>> it.
>>> Yeah the comment was specific for the case that was being removed.
>>>
>>>> Could someone please propose a comment for the above code block
>>>> explaining why we're doing what we do?
>>> I'll have to relearn this first...
>>>
>>>
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask
  2019-05-25  7:07 [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask zhong jiang
  2019-05-25 18:28 ` Andrew Morton
@ 2019-06-27  9:59 ` Vlastimil Babka
  1 sibling, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2019-06-27  9:59 UTC (permalink / raw)
  To: zhong jiang, akpm, osalvador, khandual, mhocko, mgorman, aarcange
  Cc: rcampbell, linux-mm, linux-kernel

On 5/25/19 9:07 AM, zhong jiang wrote:
> We bind an different node to different vma, Unluckily,
> it will bind different vma to same node by checking the /proc/pid/numa_maps.   
> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
> has introduced the issue.  when we change memory policy by seting cpuset.mems,
> A process will rebind the specified policy more than one times. 
> if the cpuset_mems_allowed is not equal to user specified nodes. hence the issue will trigger.
> Maybe result in the out of memory which allocating memory from same node.

OK, how about this instead?

mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE
mempoclicies when the tasks's cpuset's mems_allowed changes. For
policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES,
it works by remapping the policy's allowed nodes (stored in v.nodes)
using the previous value of mems_allowed (stored in
w.cpuset_mems_allowed) as the domain of map and the new mems_allowed
(passed as nodes) as the range of the map (see the comment of
bitmap_remap() for details).

The result of remapping is stored back as policy's nodemask in v.nodes,
and the new value of mems_allowed should be stored in
w.cpuset_mems_allowed to facilitate the next rebind, if it happens.

However, commit 213980c0f23b ("mm, mempolicy: simplify rebinding
mempolicies when updating cpusets") introduced a bug where the result of
remapping is stored in w.cpuset_mems_allowed instead. Thus, a
mempolicy's allowed nodes can evolve in an unexpected way after a series
of rebinding due to cpuset mems_allowed changes, possibly binding to a
wrong node or a smaller number of nodes which may e.g. overload them.
This patch fixes the bug so rebinding again works as intended.

> Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>

(an example of what exactly was the sequence of set_mempolicy and cpuset
mems changes with expected wrt actual results would be nice, but I think
the above should be fine by itself)

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/mempolicy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index e3ab1d9..a60a3be 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>  	else {
>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>  								*nodes);
> -		pol->w.cpuset_mems_allowed = tmp;
> +		pol->w.cpuset_mems_allowed = *nodes;
>  	}
>  
>  	if (nodes_empty(tmp))
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-06-27 10:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-25  7:07 [PATCH] mm/mempolicy: Fix an incorrect rebind node in mpol_rebind_nodemask zhong jiang
2019-05-25 18:28 ` Andrew Morton
2019-05-27 12:23   ` Vlastimil Babka
2019-05-27 13:58     ` zhong jiang
2019-06-27  3:57       ` Andrew Morton
2019-06-27  7:47         ` Vlastimil Babka
2019-06-27  9:59 ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).