LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
@ 2007-02-08  2:06 KAMEZAWA Hiroyuki
  2007-02-08  7:49 ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-08  2:06 UTC (permalink / raw)
  To: LKML; +Cc: clameter, Andrew Morton, Andi Kleen, GOTO

Hi, This is much easier fix than previous one...
--
following is back trace of NULL pointer access in slab_node().
This patch fix this.
== backtrace from crash (linux-2.6.20) ==
 #0 [BSP:e000000121f412d8] schedule at a00000010061ccc0
 #1 [BSP:e000000121f41280] rwsem_down_failed_common at a000000100290490
 #2 [BSP:e000000121f41260] rwsem_down_read_failed at a000000100620d30
 #3 [BSP:e000000121f41240] down_read at a0000001000b01a0
 #4 [BSP:e000000121f411e8] ia64_do_page_fault at a000000100625710
 #5 [BSP:e000000121f411e8] ia64_leave_kernel at a00000010000c660
  EFRAME: e000000121f47100
      B0: a00000010013cc40      CR_IIP: a00000010012aa30
 CR_IPSR: 0000101008022018      CR_IFS: 8000000000000205
  AR_PFS: 0000000000000309      AR_RSC: 0000000000000003
 AR_UNAT: 0000000000000000     AR_RNAT: 0000000000000000
  AR_CCV: 0000000000000000     AR_FPSR: 0009804c8a70033f
  LOADRS: 0000000000000000 AR_BSPSTORE: 0000000000000000
      B6: a00000010003f040          B7: a00000010000ccd0
      PR: 000000000055a9a5          R1: a000000100d5a5b0
      R2: e00000010c50df7c          R3: 0000000000000030
      R8: 0000000000000000          R9: e00000011dc52930
     R10: e00000011dc52928         R11: e00000010c50df80
     R12: e000000121f472c0         R13: e000000121f40000
     R14: 0000000000000002         R15: 000000003fffff00
     R16: 0000000010400000         R17: e000000121f40000
     R18: a000000100b5a9d0         R19: e000000121f40018
     R20: e000000121f40c84         R21: 0000000000000000
     R22: e000000121f47330         R23: e000000121f47334
     R24: e000000121f40b88         R25: e000000121f47340
     R26: e000000121f47334         R27: 0000000000000000
     R28: 0000000000000000         R29: e000000121f47338
     R30: 000000007fffffff         R31: a000000100b5b5e0
      F6: 1003eccd55056199632ec     F7: 1003e9e3779b97f4a7c16
      F8: 1003e0a00000010001422     F9: 1003e000000000fa00000
     F10: 1003e000000003b9aca00    F11: 1003e431bde82d7b634db
 #6 [BSP:e000000121f411c0] slab_node at a00000010012aa30
 #7 [BSP:e000000121f41190] alternate_node_alloc at a00000010013cc40
 #8 [BSP:e000000121f41160] kmem_cache_alloc at a00000010013dc40
 #9 [BSP:e000000121f41100] desc_prologue at a00000010003ee00
#10 [BSP:e000000121f410c0] unw_decode_r2 at a00000010003f0c0
#11 [BSP:e000000121f41068] find_save_locs at a00000010003fbf0
#12 [BSP:e000000121f41038] unw_init_frame_info at a000000100040900
#13 [BSP:e000000121f41010] unw_init_running at a00000010000ccf0
==
This panic(hang) was found by a numa test-set on a system with 3 nodes, where
node(2) was memory-less-node.

This patch fixes zero-length zonelist problem in MPOL_MBIND.
If the length of zonelist is zero, just returns -EINVAL.

Changelog: v1 -> v2
- avoid extra pgdat scanning....it is not necessary.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


Index: linux-2.6.20/mm/mempolicy.c
===================================================================
--- linux-2.6.20.orig/mm/mempolicy.c	2007-02-08 09:50:45.000000000 +0900
+++ linux-2.6.20/mm/mempolicy.c	2007-02-08 10:35:53.000000000 +0900
@@ -144,7 +144,7 @@
 	max++;			/* space for zlcache_ptr (see mmzone.h) */
 	zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
 	if (!zl)
-		return NULL;
+		return PTR_ERR(-ENOMEM);
 	zl->zlcache_ptr = NULL;
 	num = 0;
 	/* First put in the highest zones from all nodes, then all the next 
@@ -162,6 +162,10 @@
 			break;
 		k--;
 	}
+	if (!num) {
+		kfree(zl);
+		return PTR_ERR(-EINVAL);
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }
@@ -170,6 +174,7 @@
 static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
 {
 	struct mempolicy *policy;
+	void  *val;
 
 	PDprintk("setting mode %d nodes[0] %lx\n", mode, nodes_addr(*nodes)[0]);
 	if (mode == MPOL_DEFAULT)
@@ -192,11 +197,12 @@
 			policy->v.preferred_node = -1;
 		break;
 	case MPOL_BIND:
-		policy->v.zonelist = bind_zonelist(nodes);
-		if (policy->v.zonelist == NULL) {
+		val = bind_zonelist(nodes);
+		if (IS_ERR(val)) {
 			kmem_cache_free(policy_cache, policy);
-			return ERR_PTR(-ENOMEM);
+			return val;
 		}
+		policy->v.zonelist = val;
 		break;
 	}
 	policy->policy = mode;
@@ -1662,12 +1668,12 @@
 
 		zonelist = bind_zonelist(&nodes);
 
-		/* If no mem, then zonelist is NULL and we keep old zonelist.
+		/* If no mem, then zonelist is ERR_PTR and we keep old zonelist.
 		 * If that old zonelist has no remaining mems_allowed nodes,
 		 * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
 		 */
 
-		if (zonelist) {
+		if (!IS_ERR(zonelist)) {
 			/* Good - got mem - substitute new zonelist */
 			kfree(pol->v.zonelist);
 			pol->v.zonelist = zonelist;


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  2:06 [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2 KAMEZAWA Hiroyuki
@ 2007-02-08  7:49 ` Andi Kleen
  2007-02-08  8:00   ` Andrew Morton
  2007-02-08  8:16   ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2007-02-08  7:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: LKML, clameter, Andrew Morton, GOTO


> This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> node(2) was memory-less-node.

I still think it's the wrong fix -- just get rid of the memory less node.
I expect you'll likely run into more problems with that setup anyways.

>  static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
>  {
>  	struct mempolicy *policy;
> +	void  *val;

Using void * here is nasty when it's a zonelist pointer.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  7:49 ` Andi Kleen
@ 2007-02-08  8:00   ` Andrew Morton
  2007-02-08  8:03     ` Andi Kleen
  2007-02-08  8:16   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-02-08  8:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO

On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:

> 
> > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > node(2) was memory-less-node.
> 
> I still think it's the wrong fix -- just get rid of the memory less node.

"Let's break it even more"?

> I expect you'll likely run into more problems with that setup anyways.

What happens if he doesn't run into more problems?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  8:00   ` Andrew Morton
@ 2007-02-08  8:03     ` Andi Kleen
  2007-02-08  8:08       ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-02-08  8:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO

On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
> 
> > 
> > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > node(2) was memory-less-node.
> > 
> > I still think it's the wrong fix -- just get rid of the memory less node.
> 
> "Let's break it even more"?

I still don't get what you believe what would be broken then.

> > I expect you'll likely run into more problems with that setup anyways.
> 
> What happens if he doesn't run into more problems?

Then he's lucky. I ran into problems at least when I still had the empty
nodes some time ago on x86-64.  Christoph said SN2 is doing the same.

iirc slab blew up at least, but that  might be fixed by now. But it's a little risky 
because there is more code now that is node aware.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  8:03     ` Andi Kleen
@ 2007-02-08  8:08       ` Andrew Morton
  2007-02-08  8:19         ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-02-08  8:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO

On Thu, 8 Feb 2007 09:03:46 +0100 Andi Kleen <ak@suse.de> wrote:

> On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> > On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
> > 
> > > 
> > > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > > node(2) was memory-less-node.
> > > 
> > > I still think it's the wrong fix -- just get rid of the memory less node.
> > 
> > "Let's break it even more"?
> 
> I still don't get what you believe what would be broken then.

A node with no memory is physical reality.  The kernel should do its best
handle and report it accurately.  Pretending that the CPUs on that node are
local to a different node's memory (as I understand your proposal) goes
against that.

> > > I expect you'll likely run into more problems with that setup anyways.
> > 
> > What happens if he doesn't run into more problems?
> 
> Then he's lucky. I ran into problems at least when I still had the empty
> nodes some time ago on x86-64.  Christoph said SN2 is doing the same.
> 
> iirc slab blew up at least, but that  might be fixed by now. But it's a little risky 
> because there is more code now that is node aware.
> 

Well...  I'd suggest that we try to struggle on, get it working.  Is there
a downside to doing that?  

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  7:49 ` Andi Kleen
  2007-02-08  8:00   ` Andrew Morton
@ 2007-02-08  8:16   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-08  8:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, clameter, akpm, y-goto

On Thu, 8 Feb 2007 08:49:41 +0100
Andi Kleen <ak@suse.de> wrote:

> 
> > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > node(2) was memory-less-node.
> 
> I still think it's the wrong fix -- just get rid of the memory less node.
> I expect you'll likely run into more problems with that setup anyways.
> 
memory-less-node problem is a bit big issue for this small bug.

For removing memory-less-node, we'll have to make consensus on 
"a node represents a group of memory, not cpu, not devices"

I read old e-mails on node-hotplug again.
In the early stage of node-hotplug, I assumes "a node has memory".
After discussion between several groups, I knew there are people
who consider groups of something as node.
(and we had to change our plan......)
like this (old) homepage's FAQ 5. http://lse.sourceforge.net/numa/faq/


> >  static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
> >  {
> >  	struct mempolicy *policy;
> > +	void  *val;
> 
> Using void * here is nasty when it's a zonelist pointer.
> 
okay. changes this to be more neat one..take3 will come.

-Kame





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  8:08       ` Andrew Morton
@ 2007-02-08  8:19         ` Andi Kleen
  2007-02-08  8:24           ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-02-08  8:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO

On Thursday 08 February 2007 09:08, Andrew Morton wrote:
> On Thu, 8 Feb 2007 09:03:46 +0100 Andi Kleen <ak@suse.de> wrote:
> 
> > On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> > > On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
> > > 
> > > > 
> > > > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > > > node(2) was memory-less-node.
> > > > 
> > > > I still think it's the wrong fix -- just get rid of the memory less node.
> > > 
> > > "Let's break it even more"?
> > 
> > I still don't get what you believe what would be broken then.
> 
> A node with no memory is physical reality.  The kernel should do its best
> handle and report it accurately. 

What would be the advantage of that? 

The reason we present nodes to user space is that we can tell the user
where the memory is. You seem to try to promote it to some abstract entity
beyond that, but that doesn't seem particularly fruitful to me. I think
I prefer "down to earth" memory nodes.

> Pretending that the CPUs on that node are 
> local to a different node's memory (as I understand your proposal) goes
> against that.

Well it is then most local to that node's memory.

> 
> > > > I expect you'll likely run into more problems with that setup anyways.
> > > 
> > > What happens if he doesn't run into more problems?
> > 
> > Then he's lucky. I ran into problems at least when I still had the empty
> > nodes some time ago on x86-64.  Christoph said SN2 is doing the same.
> > 
> > iirc slab blew up at least, but that  might be fixed by now. But it's a little risky 
> > because there is more code now that is node aware.
> > 
> 
> Well...  I'd suggest that we try to struggle on, get it working.  Is there
> a downside to doing that?  

The main problem I used to have was regressions. e.g. having memory
less nodes in different combinations would break in new releases.
(on x86-64 this depends on how the DIMMs are distributed so sometimes
breakage wasn't noticed for a long time. We ended up trying all combinations
on a simulator in the end). Assigning to nearby nodes avoided that special
case.

Ok there is still one other case -- nodes that have memory, but not enough
to do anything useful. This can happen too, but is still somewhat different.
Basically all kmalloc_node() need to fail gratiously.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
  2007-02-08  8:19         ` Andi Kleen
@ 2007-02-08  8:24           ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08  8:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO

On Thu, 8 Feb 2007 09:19:16 +0100 Andi Kleen <ak@suse.de> wrote:

> The reason we present nodes to user space is that we can tell the user
> where the memory is. You seem to try to promote it to some abstract entity
> beyond that, but that doesn't seem particularly fruitful to me. I think
> I prefer "down to earth" memory nodes.

Who said a node is all about memory?

A node is (often) a circuit board, with an edge connector, containing some,
all or even none of a) CPUs, b) memory and c) IO devices.

That is how a kernel should model and treat it, surely?  That's reality.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-02-08  8:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-08  2:06 [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2 KAMEZAWA Hiroyuki
2007-02-08  7:49 ` Andi Kleen
2007-02-08  8:00   ` Andrew Morton
2007-02-08  8:03     ` Andi Kleen
2007-02-08  8:08       ` Andrew Morton
2007-02-08  8:19         ` Andi Kleen
2007-02-08  8:24           ` Andrew Morton
2007-02-08  8:16   ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).