LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/2] bootmem: Fix node-setup agnostic free_bootmem()
@ 2008-04-12 22:33 Johannes Weiner
  2008-04-12 22:33 ` [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core" Johannes Weiner
  2008-04-12 22:33 ` [patch 2/2] bootmem: Node-setup agnostic free_bootmem() Johannes Weiner
  0 siblings, 2 replies; 21+ messages in thread
From: Johannes Weiner @ 2008-04-12 22:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

[the first send didn't make it to lkml, so here it is again]

Hi,

as I was doing some clean-ups in the bootmem allocator, I noticed the
patch

    5a982cbc7b3fe6cf72266f319286f29963c71b9e
    mm: fix boundary checking in free_bootmem_core

which seems to implement the opposite of what the subject promises.

It makes input arguments acceptable if they are `a bit wrong' and
silently aborts if they are completely off the whack.

Please have a look at the two patches I propose: one to revert the
original and the other one to reimplement what the whole thing was
really about: having free_bootmem() itself look up the node holding
the specified address range.

Note also that the reverted patch changed a helper function and
therefore also affected free_bootmem_node(), making the latter paper
over bugs as well.

	Hannes

Original version:

 mm/bootmem.c |   25 +++++++++++++++++++------
 1 files changed, 19 insertions(+), 6 deletions(-)

New version:

 bootmem.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core"
  2008-04-12 22:33 [patch 0/2] bootmem: Fix node-setup agnostic free_bootmem() Johannes Weiner
@ 2008-04-12 22:33 ` Johannes Weiner
  2008-04-13  1:55   ` Yinghai Lu
  2008-04-12 22:33 ` [patch 2/2] bootmem: Node-setup agnostic free_bootmem() Johannes Weiner
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2008-04-12 22:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

[-- Attachment #1: 0001-Revert-mm-fix-boundary-checking-in-free_bootmem_co.patch --]
[-- Type: text/plain, Size: 3287 bytes --]

This reverts commit 5a982cbc7b3fe6cf72266f319286f29963c71b9e.

The intention behind this patch was to make the free_bootmem()
interface more robust with regards to the specified range and to let
it operate on multiple node setups as well.

However, it made free_bootmem_core()

  1. handle bogus node/memory-range combination input by just
     returning early without informing the callsite or screaming BUG()
     as it did before
  2. round slightly out of node-range values to the node boundaries
     instead of treating them as the invalid parameters they are

This was partially done to abuse free_bootmem_core() for node
iteration in free_bootmem (just feeding it every node on the box and
let it figure out what it wants to do with it) instead of looking up
the proper node before the call to free_bootmem_core().

It also affects free_bootmem_node() which relies on
free_bootmem_core() and on its sanity checks now removed.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
CC: Yinghai Lu <yhlu.kernel@gmail.com>
CC: Andi Kleen <ak@suse.de>
CC: Yasunori Goto <y-goto@jp.fujitsu.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Christoph Lameter <clameter@sgi.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 2ccea70..f6ff433 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -125,7 +125,6 @@ static int __init reserve_bootmem_core(bootmem_data_t *bdata,
 	BUG_ON(!size);
 	BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
 	BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
-	BUG_ON(addr < bdata->node_boot_start);
 
 	sidx = PFN_DOWN(addr - bdata->node_boot_start);
 	eidx = PFN_UP(addr + size - bdata->node_boot_start);
@@ -157,31 +156,21 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
-
-	/* out range */
-	if (addr + size < bdata->node_boot_start ||
-		PFN_DOWN(addr) > bdata->node_low_pfn)
-		return;
 	/*
 	 * round down end of usable mem, partially free pages are
 	 * considered reserved.
 	 */
+	BUG_ON(!size);
+	BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
 
-	if (addr >= bdata->node_boot_start && addr < bdata->last_success)
+	if (addr < bdata->last_success)
 		bdata->last_success = addr;
 
 	/*
-	 * Round up to index to the range.
+	 * Round up the beginning of the address.
 	 */
-	if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
-		sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
-	else
-		sidx = 0;
-
+	sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
 	eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
-	if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
-		eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
 
 	for (i = sidx; i < eidx; i++) {
 		if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
@@ -432,9 +421,7 @@ int __init reserve_bootmem(unsigned long addr, unsigned long size,
 
 void __init free_bootmem(unsigned long addr, unsigned long size)
 {
-	bootmem_data_t *bdata;
-	list_for_each_entry(bdata, &bdata_list, list)
-		free_bootmem_core(bdata, addr, size);
+	free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
 }
 
 unsigned long __init free_all_bootmem(void)

-- 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-12 22:33 [patch 0/2] bootmem: Fix node-setup agnostic free_bootmem() Johannes Weiner
  2008-04-12 22:33 ` [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core" Johannes Weiner
@ 2008-04-12 22:33 ` Johannes Weiner
  2008-04-13  1:59   ` Yinghai Lu
  2008-04-13 16:56   ` Andi Kleen
  1 sibling, 2 replies; 21+ messages in thread
From: Johannes Weiner @ 2008-04-12 22:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

[-- Attachment #1: node-agnostic-free_bootmem.patch --]
[-- Type: text/plain, Size: 1112 bytes --]

Make free_bootmem() look up the node holding the specified address
range which lets it work transparently on single-node and multi-node
configurations.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
CC: Yinghai Lu <yhlu.kernel@gmail.com>
CC: Andi Kleen <ak@suse.de>
CC: Yasunori Goto <y-goto@jp.fujitsu.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Christoph Lameter <clameter@sgi.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---

Patch tested on x86_32 uma.

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -421,7 +421,15 @@ int __init reserve_bootmem(unsigned long
 
 void __init free_bootmem(unsigned long addr, unsigned long size)
 {
-	free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
+	bootmem_data_t *bdata;
+
+	list_for_each_entry (bdata, &bdata_list, list) {
+		if (addr < bdata->node_boot_start)
+			continue;
+		free_bootmem_core(bdata, addr, size);
+		return;
+	}
+	BUG();
 }
 
 unsigned long __init free_all_bootmem(void)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core"
  2008-04-12 22:33 ` [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core" Johannes Weiner
@ 2008-04-13  1:55   ` Yinghai Lu
  0 siblings, 0 replies; 21+ messages in thread
From: Yinghai Lu @ 2008-04-13  1:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, Andi Kleen, Ingo Molnar, Andrew Morton,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
> This reverts commit 5a982cbc7b3fe6cf72266f319286f29963c71b9e.
>
>  The intention behind this patch was to make the free_bootmem()
>  interface more robust with regards to the specified range and to let
>  it operate on multiple node setups as well.
>
>  However, it made free_bootmem_core()
>
>   1. handle bogus node/memory-range combination input by just
>      returning early without informing the callsite or screaming BUG()
>      as it did before
>   2. round slightly out of node-range values to the node boundaries
>      instead of treating them as the invalid parameters they are
>
>  This was partially done to abuse free_bootmem_core() for node
>  iteration in free_bootmem (just feeding it every node on the box and
>  let it figure out what it wants to do with it) instead of looking up
>  the proper node before the call to free_bootmem_core().

it seems intel having one box the [0, 4G), [8, 12G] on node 0, and
[4G, 8G) and [12G, 16) on node 1.

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-12 22:33 ` [patch 2/2] bootmem: Node-setup agnostic free_bootmem() Johannes Weiner
@ 2008-04-13  1:59   ` Yinghai Lu
  2008-04-13 10:57     ` Johannes Weiner
  2008-04-13 16:56   ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-13  1:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, Andi Kleen, Ingo Molnar, Andrew Morton,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
> Make free_bootmem() look up the node holding the specified address
>  range which lets it work transparently on single-node and multi-node
>  configurations.
>
>  Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
>  CC: Yinghai Lu <yhlu.kernel@gmail.com>
>  CC: Andi Kleen <ak@suse.de>
>  CC: Yasunori Goto <y-goto@jp.fujitsu.com>
>  CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>  CC: Ingo Molnar <mingo@elte.hu>
>  CC: Christoph Lameter <clameter@sgi.com>
>  CC: Andrew Morton <akpm@linux-foundation.org>
>  ---
>
>  Patch tested on x86_32 uma.
>
>  Index: linux-2.6/mm/bootmem.c
>  ===================================================================
>  --- linux-2.6.orig/mm/bootmem.c
>  +++ linux-2.6/mm/bootmem.c
>  @@ -421,7 +421,15 @@ int __init reserve_bootmem(unsigned long
>
>   void __init free_bootmem(unsigned long addr, unsigned long size)
>   {
>  -       free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
>  +       bootmem_data_t *bdata;
>  +
>  +       list_for_each_entry (bdata, &bdata_list, list) {
>  +               if (addr < bdata->node_boot_start)
>  +                       continue;

could have chance that bootmem with reserved_early that is crossing the nodes.

YH


YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-13  1:59   ` Yinghai Lu
@ 2008-04-13 10:57     ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2008-04-13 10:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, Andi Kleen, Ingo Molnar, Andrew Morton,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>> Make free_bootmem() look up the node holding the specified address
>>  range which lets it work transparently on single-node and multi-node
>>  configurations.
>>
>>  Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
>>  CC: Yinghai Lu <yhlu.kernel@gmail.com>
>>  CC: Andi Kleen <ak@suse.de>
>>  CC: Yasunori Goto <y-goto@jp.fujitsu.com>
>>  CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>  CC: Ingo Molnar <mingo@elte.hu>
>>  CC: Christoph Lameter <clameter@sgi.com>
>>  CC: Andrew Morton <akpm@linux-foundation.org>
>>  ---
>>
>>  Patch tested on x86_32 uma.
>>
>>  Index: linux-2.6/mm/bootmem.c
>>  ===================================================================
>>  --- linux-2.6.orig/mm/bootmem.c
>>  +++ linux-2.6/mm/bootmem.c
>>  @@ -421,7 +421,15 @@ int __init reserve_bootmem(unsigned long
>>
>>   void __init free_bootmem(unsigned long addr, unsigned long size)
>>   {
>>  -       free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
>>  +       bootmem_data_t *bdata;
>>  +
>>  +       list_for_each_entry (bdata, &bdata_list, list) {
>>  +               if (addr < bdata->node_boot_start)
>>  +                       continue;
>
> could have chance that bootmem with reserved_early that is crossing
> the nodes.

Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
nodes, so I don't see where this chance could come from.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-12 22:33 ` [patch 2/2] bootmem: Node-setup agnostic free_bootmem() Johannes Weiner
  2008-04-13  1:59   ` Yinghai Lu
@ 2008-04-13 16:56   ` Andi Kleen
  2008-04-15  6:23     ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2008-04-13 16:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Johannes Weiner <hannes@saeurebad.de> writes:

> Make free_bootmem() look up the node holding the specified address
> range which lets it work transparently on single-node and multi-node
> configurations.

Acked-by: Andi Kleen <andi@firstfloor.org>

This is far better than the original change it replaces and which 
I also objected to in review.

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-13 16:56   ` Andi Kleen
@ 2008-04-15  6:23     ` Andrew Morton
  2008-04-15  7:04       ` Yinghai Lu
  2008-04-15  7:46       ` Andi Kleen
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2008-04-15  6:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Johannes Weiner, linux-kernel, Ingo Molnar, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:

> Johannes Weiner <hannes@saeurebad.de> writes:
> 
> > Make free_bootmem() look up the node holding the specified address
> > range which lets it work transparently on single-node and multi-node
> > configurations.
> 
> Acked-by: Andi Kleen <andi@firstfloor.org>
> 
> This is far better than the original change it replaces and which 
> I also objected to in review.
> 

So...  do we think these two patches are sufficiently safe and important for
2.6.25?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  6:23     ` Andrew Morton
@ 2008-04-15  7:04       ` Yinghai Lu
  2008-04-15  7:15         ` Andrew Morton
  2008-04-15  7:46       ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-15  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Johannes Weiner, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>
>  > Johannes Weiner <hannes@saeurebad.de> writes:
>  >
>  > > Make free_bootmem() look up the node holding the specified address
>  > > range which lets it work transparently on single-node and multi-node
>  > > configurations.
>  >
>  > Acked-by: Andi Kleen <andi@firstfloor.org>
>  >
>  > This is far better than the original change it replaces and which
>  > I also objected to in review.
>  >
>
>  So...  do we think these two patches are sufficiently safe and important for
>  2.6.25?

the patch is wrong

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  7:04       ` Yinghai Lu
@ 2008-04-15  7:15         ` Andrew Morton
  2008-04-15  7:28           ` Yinghai Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2008-04-15  7:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Johannes Weiner, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >
> >  > Johannes Weiner <hannes@saeurebad.de> writes:
> >  >
> >  > > Make free_bootmem() look up the node holding the specified address
> >  > > range which lets it work transparently on single-node and multi-node
> >  > > configurations.
> >  >
> >  > Acked-by: Andi Kleen <andi@firstfloor.org>
> >  >
> >  > This is far better than the original change it replaces and which
> >  > I also objected to in review.
> >  >
> >
> >  So...  do we think these two patches are sufficiently safe and important for
> >  2.6.25?
> 
> the patch is wrong
> 

The last I saw was this:

On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:

> Hi,
> 
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> 
> > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
> > ...
> >
> > could have chance that bootmem with reserved_early that is crossing
> > the nodes.
> 
> Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
> nodes, so I don't see where this chance could come from.

Is that what you're referring to?

Was Johannes observation incorrect?  If so, why?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  7:15         ` Andrew Morton
@ 2008-04-15  7:28           ` Yinghai Lu
  2008-04-15  7:36             ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-15  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Johannes Weiner, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
>  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>  > <akpm@linux-foundation.org> wrote:
>  > >
>  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>  > >
>  > >  > Johannes Weiner <hannes@saeurebad.de> writes:
>  > >  >
>  > >  > > Make free_bootmem() look up the node holding the specified address
>  > >  > > range which lets it work transparently on single-node and multi-node
>  > >  > > configurations.
>  > >  >
>  > >  > Acked-by: Andi Kleen <andi@firstfloor.org>
>  > >  >
>  > >  > This is far better than the original change it replaces and which
>  > >  > I also objected to in review.
>  > >  >
>  > >
>  > >  So...  do we think these two patches are sufficiently safe and important for
>  > >  2.6.25?
>  >
>  > the patch is wrong
>  >
>
>  The last I saw was this:
>
>
>  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:
>
>  > Hi,
>  >
>  > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>  >
>  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>  > > ...
>
> > >
>  > > could have chance that bootmem with reserved_early that is crossing
>  > > the nodes.
>  >
>  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>  > nodes, so I don't see where this chance could come from.
>
>  Is that what you're referring to?
>
>  Was Johannes observation incorrect?  If so, why?

my patch with free_bootmem will make sure free_bootmem_core only free
bootmem in the bdata scope.
so free_bootmem can handle the cross_node bootmem that is done by
reserve_early ( done in another patch, is dropped by you because took
Jonannes).

in setup_arch for x86_64 there is one free_bootmem that is used when
ramdisk is falled out of ram map. that could be crossed by bootloader
and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
memory.

anyway that is extrem case, but my patch could handle that.

I wonder if any regression caused by my previous patch? maybe on other platform?

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  7:28           ` Yinghai Lu
@ 2008-04-15  7:36             ` Andrew Morton
  2008-04-15 11:51               ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2008-04-15  7:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Johannes Weiner, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> >
> >  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
> >  > <akpm@linux-foundation.org> wrote:
> >  > >
> >  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >  > >
> >  > >  > Johannes Weiner <hannes@saeurebad.de> writes:
> >  > >  >
> >  > >  > > Make free_bootmem() look up the node holding the specified address
> >  > >  > > range which lets it work transparently on single-node and multi-node
> >  > >  > > configurations.
> >  > >  >
> >  > >  > Acked-by: Andi Kleen <andi@firstfloor.org>
> >  > >  >
> >  > >  > This is far better than the original change it replaces and which
> >  > >  > I also objected to in review.
> >  > >  >
> >  > >
> >  > >  So...  do we think these two patches are sufficiently safe and important for
> >  > >  2.6.25?
> >  >
> >  > the patch is wrong
> >  >
> >
> >  The last I saw was this:
> >
> >
> >  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:
> >
> >  > Hi,
> >  >
> >  > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> >  >
> >  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
> >  > > ...
> >
> > > >
> >  > > could have chance that bootmem with reserved_early that is crossing
> >  > > the nodes.
> >  >
> >  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
> >  > nodes, so I don't see where this chance could come from.
> >
> >  Is that what you're referring to?
> >
> >  Was Johannes observation incorrect?  If so, why?
> 
> my patch with free_bootmem will make sure free_bootmem_core only free
> bootmem in the bdata scope.
> so free_bootmem can handle the cross_node bootmem that is done by
> reserve_early ( done in another patch, is dropped by you because took
> Jonannes).
> 
> in setup_arch for x86_64 there is one free_bootmem that is used when
> ramdisk is falled out of ram map. that could be crossed by bootloader
> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
> memory.
> 
> anyway that is extrem case, but my patch could handle that.
> 
> I wonder if any regression caused by my previous patch? maybe on other platform?
> 

Not that I'm aware of.

I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch.  Johannes,
can you please check 2.6.28-rc8-mm2, see if it looks OK?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  6:23     ` Andrew Morton
  2008-04-15  7:04       ` Yinghai Lu
@ 2008-04-15  7:46       ` Andi Kleen
  2008-04-15 11:53         ` Johannes Weiner
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2008-04-15  7:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, linux-kernel, Ingo Molnar, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Andrew Morton wrote:
> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> 
>> Johannes Weiner <hannes@saeurebad.de> writes:
>>
>>> Make free_bootmem() look up the node holding the specified address
>>> range which lets it work transparently on single-node and multi-node
>>> configurations.
>> Acked-by: Andi Kleen <andi@firstfloor.org>
>>
>> This is far better than the original change it replaces and which 
>> I also objected to in review.
>>
> 
> So...  do we think these two patches are sufficiently safe and important for
> 2.6.25?

It's only strictly needed for .26 I think for some (also slightly
dubious) changes queued in git-x86.

-Andi



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  7:36             ` Andrew Morton
@ 2008-04-15 11:51               ` Johannes Weiner
  2008-04-15 18:52                 ` Yinghai Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2008-04-15 11:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yinghai Lu, Andi Kleen, linux-kernel, Ingo Molnar, Yasunori Goto,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
>> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> >
>> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>> >
>> >  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>> >  > <akpm@linux-foundation.org> wrote:
>> >  > >
>> >  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>> >  > >
>> >  > >  > Johannes Weiner <hannes@saeurebad.de> writes:
>> >  > >  >
>> >  > >  > > Make free_bootmem() look up the node holding the specified address
>> >  > >  > > range which lets it work transparently on single-node and multi-node
>> >  > >  > > configurations.
>> >  > >  >
>> >  > >  > Acked-by: Andi Kleen <andi@firstfloor.org>
>> >  > >  >
>> >  > >  > This is far better than the original change it replaces and which
>> >  > >  > I also objected to in review.
>> >  > >  >
>> >  > >
>> >  > >  So...  do we think these two patches are sufficiently safe and important for
>> >  > >  2.6.25?
>> >  >
>> >  > the patch is wrong
>> >  >
>> >
>> >  The last I saw was this:
>> >
>> >
>> >  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:
>> >
>> >  > Hi,
>> >  >
>> >  > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>> >  >
>> >  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>> >  > > ...
>> >
>> > > >
>> >  > > could have chance that bootmem with reserved_early that is crossing
>> >  > > the nodes.
>> >  >
>> >  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>> >  > nodes, so I don't see where this chance could come from.
>> >
>> >  Is that what you're referring to?
>> >
>> >  Was Johannes observation incorrect?  If so, why?
>> 
>> my patch with free_bootmem will make sure free_bootmem_core only free
>> bootmem in the bdata scope.
>> so free_bootmem can handle the cross_node bootmem that is done by
>> reserve_early ( done in another patch, is dropped by you because took
>> Jonannes).
>> 
>> in setup_arch for x86_64 there is one free_bootmem that is used when
>> ramdisk is falled out of ram map. that could be crossed by bootloader
>> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
>> memory.
>> 
>> anyway that is extrem case, but my patch could handle that.

Has this case ever occured?  If this could become real, I have no
objections to implement a way to handle it (why would I?), but until now
you just said that in some time in the future, this could be useful.

>> 
>> I wonder if any regression caused by my previous patch? maybe on other platform?
>> 
>
> Not that I'm aware of.

It papers over buggy usage of free_bootmem().  If its arguments are
bogus, it will just return again where it BUG()ed out before.  The pages
might be never marked free and therefor never reach the buddy allocator.

> I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch.  Johannes,
> can you please check 2.6.28-rc8-mm2, see if it looks OK?

I object to the way it is implemented.  If it is really needed, that
should be done properly:

	- remove the double loop over the area on the likely succeeding
          path and unroll the reserving on the unlikely path as it was
          done before.  Better to punish exceptional branches than
	  the working paths.
	- make reserve_bootmem_core be strict with its arguments.  If
	  you want to iterate over the bdata list, you should not just
	  throw every item at the _core functions and let them work it
	  out for themselves.  The correct parameters should be
          calculated in advance and then passed to a strict
          _bootmem_core() function that BUG()s on failure.

But still, Yinghai, you never brought in practical reasons for this
whole thing.  You talked about extreme and theoretical cases and I don't
think that this justifies breaking API or pessimizing code at all.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15  7:46       ` Andi Kleen
@ 2008-04-15 11:53         ` Johannes Weiner
  2008-04-15 18:44           ` Yinghai Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2008-04-15 11:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, linux-kernel, Ingo Molnar, Yinghai Lu,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Andi Kleen <andi@firstfloor.org> writes:

> Andrew Morton wrote:
>> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>> 
>>> Johannes Weiner <hannes@saeurebad.de> writes:
>>>
>>>> Make free_bootmem() look up the node holding the specified address
>>>> range which lets it work transparently on single-node and multi-node
>>>> configurations.
>>> Acked-by: Andi Kleen <andi@firstfloor.org>
>>>
>>> This is far better than the original change it replaces and which 
>>> I also objected to in review.
>>>
>> 
>> So...  do we think these two patches are sufficiently safe and important for
>> 2.6.25?
>
> It's only strictly needed for .26 I think for some (also slightly
> dubious) changes queued in git-x86.

Does anything yet rely on this new free_bootmem() behaviour?  If not,
the safest thing would be to just revert the original patch in mainline
and drop the second patch completely.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 11:53         ` Johannes Weiner
@ 2008-04-15 18:44           ` Yinghai Lu
  2008-04-15 19:51             ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-15 18:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, Apr 15, 2008 at 4:53 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>
> Andi Kleen <andi@firstfloor.org> writes:
>
>  > Andrew Morton wrote:
>  >> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>  >>
>  >>> Johannes Weiner <hannes@saeurebad.de> writes:
>  >>>
>  >>>> Make free_bootmem() look up the node holding the specified address
>  >>>> range which lets it work transparently on single-node and multi-node
>  >>>> configurations.
>  >>> Acked-by: Andi Kleen <andi@firstfloor.org>
>  >>>
>  >>> This is far better than the original change it replaces and which
>  >>> I also objected to in review.
>  >>>
>  >>
>  >> So...  do we think these two patches are sufficiently safe and important for
>  >> 2.6.25?
>  >
>  > It's only strictly needed for .26 I think for some (also slightly
>  > dubious) changes queued in git-x86.
>
>  Does anything yet rely on this new free_bootmem() behaviour?  If not,
>  the safest thing would be to just revert the original patch in mainline
>  and drop the second patch completely.

1. free_bootmem(ramdisk_image, ramdisk_size) in setup_arch of x86_64 need that
2. another patch in x86.git need that.

YH

commit f62f1fc9ef94f74fda2b456d935ba2da69fa0a40
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date:   Fri Mar 7 15:02:50 2008 -0800

    x86: reserve dma32 early for gart

    a system with 256 GB of RAM, when NUMA is disabled crashes the
    following way:

    Your BIOS doesn't leave a aperture memory hole
    Please enable the IOMMU option in the BIOS setup
    This costs you 64 MB of RAM
    Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
    Kernel panic - not syncing: Not enough memory for aperture
    Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33

    Call Trace:
     [<ffffffff84037c62>] panic+0xb2/0x190
     [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
     [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
     [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
     [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
     [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
     [<ffffffff84506a2f>] ? _etext+0x0/0x1
     [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
     [<ffffffff847ac795>] mem_init+0x45/0x1a0
     [<ffffffff8479ff35>] start_kernel+0x295/0x380
     [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230

    the root cause is : memmap PMD is too big,
    [ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
    almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.

    solution will be:
    1. make memmap allocation get memory above 4G...
    2. reserve some dma32 range early before we try to set up memmap for all.
    and release that before pci_iommu_alloc, so gart or swiotlb could get some
    range under 4g limit for sure.

    the patch is using method 2.
    because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP

    will get
    Your BIOS doesn't leave a aperture memory hole
    Please enable the IOMMU option in the BIOS setup
    This costs you 64 MB of RAM
    Mapping aperture over 65536 KB of RAM @ 4000000
    Memory: 264245736k/268959744k available (8484k kernel code,
4187464k reserved, 4004k data, 724k init)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 11:51               ` Johannes Weiner
@ 2008-04-15 18:52                 ` Yinghai Lu
  2008-04-15 19:43                   ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-15 18:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Andi Kleen, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, Apr 15, 2008 at 4:51 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
> Hi,
>
>
>
>  Andrew Morton <akpm@linux-foundation.org> writes:
>
>  > On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>  >
>  >> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
>  >> <akpm@linux-foundation.org> wrote:
>  >> >
>  >> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>  >> >
>  >> >  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>  >> >  > <akpm@linux-foundation.org> wrote:
>  >> >  > >
>  >> >  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>  >> >  > >
>  >> >  > >  > Johannes Weiner <hannes@saeurebad.de> writes:
>  >> >  > >  >
>  >> >  > >  > > Make free_bootmem() look up the node holding the specified address
>  >> >  > >  > > range which lets it work transparently on single-node and multi-node
>  >> >  > >  > > configurations.
>  >> >  > >  >
>  >> >  > >  > Acked-by: Andi Kleen <andi@firstfloor.org>
>  >> >  > >  >
>  >> >  > >  > This is far better than the original change it replaces and which
>  >> >  > >  > I also objected to in review.
>  >> >  > >  >
>  >> >  > >
>  >> >  > >  So...  do we think these two patches are sufficiently safe and important for
>  >> >  > >  2.6.25?
>  >> >  >
>  >> >  > the patch is wrong
>  >> >  >
>  >> >
>  >> >  The last I saw was this:
>  >> >
>  >> >
>  >> >  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:
>  >> >
>  >> >  > Hi,
>  >> >  >
>  >> >  > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>  >> >  >
>  >> >  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>  >> >  > > ...
>  >> >
>  >> > > >
>  >> >  > > could have chance that bootmem with reserved_early that is crossing
>  >> >  > > the nodes.
>  >> >  >
>  >> >  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>  >> >  > nodes, so I don't see where this chance could come from.
>  >> >
>  >> >  Is that what you're referring to?
>  >> >
>  >> >  Was Johannes observation incorrect?  If so, why?
>  >>
>  >> my patch with free_bootmem will make sure free_bootmem_core only free
>  >> bootmem in the bdata scope.
>  >> so free_bootmem can handle the cross_node bootmem that is done by
>  >> reserve_early ( done in another patch, is dropped by you because took
>  >> Jonannes).
>  >>
>  >> in setup_arch for x86_64 there is one free_bootmem that is used when
>  >> ramdisk is falled out of ram map. that could be crossed by bootloader
>  >> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
>  >> memory.
>  >>
>  >> anyway that is extrem case, but my patch could handle that.
>
>  Has this case ever occured?  If this could become real, I have no
>  objections to implement a way to handle it (why would I?), but until now
>  you just said that in some time in the future, this could be useful.
>
>
>  >>
>  >> I wonder if any regression caused by my previous patch? maybe on other platform?
>  >>
>  >
>  > Not that I'm aware of.
>
>  It papers over buggy usage of free_bootmem().  If its arguments are
>  bogus, it will just return again where it BUG()ed out before.  The pages
>  might be never marked free and therefor never reach the buddy allocator.
>
>
>  > I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch.  Johannes,
>  > can you please check 2.6.28-rc8-mm2, see if it looks OK?
>
>  I object to the way it is implemented.  If it is really needed, that
>  should be done properly:
>
>         - remove the double loop over the area on the likely succeeding
>           path and unroll the reserving on the unlikely path as it was
>           done before.  Better to punish exceptional branches than
>           the working paths.
>         - make reserve_bootmem_core be strict with its arguments.  If
>           you want to iterate over the bdata list, you should not just
>           throw every item at the _core functions and let them work it
>           out for themselves.  The correct parameters should be
>           calculated in advance and then passed to a strict
>           _bootmem_core() function that BUG()s on failure.
>
>  But still, Yinghai, you never brought in practical reasons for this
>  whole thing.  You talked about extreme and theoretical cases and I don't
>  think that this justifies breaking API or pessimizing code at all.

free_bootmem(ramdisk_image, ramdisk_size) is sitting in setup_arch of
x86_64. or make that panic directly.

what i needed is: free_bootmem can free bootmem cross the nodes.

on numa
alloc_bootmem always return blocks on same nodes. but some via
reserve_early and then to bootmem via early_res_to_bootmem could be
crossing nodes.

BTW, can you look at patches in -mm about make reserve_bootmem cross the nodes?

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 18:52                 ` Yinghai Lu
@ 2008-04-15 19:43                   ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2008-04-15 19:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Andi Kleen, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Tue, Apr 15, 2008 at 4:51 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>> Hi,
>>
>>
>>
>>  Andrew Morton <akpm@linux-foundation.org> writes:
>>
>>  > On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>>  >
>>  >> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
>>  >> <akpm@linux-foundation.org> wrote:
>>  >> >
>>  >> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>>  >> >
>>  >> >  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>>  >> >  > <akpm@linux-foundation.org> wrote:
>>  >> >  > >
>>  >> >  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>>  >> >  > >
>>  >> >  > >  > Johannes Weiner <hannes@saeurebad.de> writes:
>>  >> >  > >  >
>>  >> >  > >  > > Make free_bootmem() look up the node holding the specified address
>>  >> >  > >  > > range which lets it work transparently on single-node and multi-node
>>  >> >  > >  > > configurations.
>>  >> >  > >  >
>>  >> >  > >  > Acked-by: Andi Kleen <andi@firstfloor.org>
>>  >> >  > >  >
>>  >> >  > >  > This is far better than the original change it replaces and which
>>  >> >  > >  > I also objected to in review.
>>  >> >  > >  >
>>  >> >  > >
>>  >> >  > >  So...  do we think these two patches are sufficiently safe and important for
>>  >> >  > >  2.6.25?
>>  >> >  >
>>  >> >  > the patch is wrong
>>  >> >  >
>>  >> >
>>  >> >  The last I saw was this:
>>  >> >
>>  >> >
>>  >> >  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:
>>  >> >
>>  >> >  > Hi,
>>  >> >  >
>>  >> >  > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>>  >> >  >
>>  >> >  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>>  >> >  > > ...
>>  >> >
>>  >> > > >
>>  >> >  > > could have chance that bootmem with reserved_early that is crossing
>>  >> >  > > the nodes.
>>  >> >  >
>>  >> >  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>>  >> >  > nodes, so I don't see where this chance could come from.
>>  >> >
>>  >> >  Is that what you're referring to?
>>  >> >
>>  >> >  Was Johannes observation incorrect?  If so, why?
>>  >>
>>  >> my patch with free_bootmem will make sure free_bootmem_core only free
>>  >> bootmem in the bdata scope.
>>  >> so free_bootmem can handle the cross_node bootmem that is done by
>>  >> reserve_early ( done in another patch, is dropped by you because took
>>  >> Jonannes).
>>  >>
>>  >> in setup_arch for x86_64 there is one free_bootmem that is used when
>>  >> ramdisk is falled out of ram map. that could be crossed by bootloader
>>  >> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
>>  >> memory.
>>  >>
>>  >> anyway that is extrem case, but my patch could handle that.
>>
>>  Has this case ever occured?  If this could become real, I have no
>>  objections to implement a way to handle it (why would I?), but until now
>>  you just said that in some time in the future, this could be useful.
>>
>>
>>  >>
>>  >> I wonder if any regression caused by my previous patch? maybe on other platform?
>>  >>
>>  >
>>  > Not that I'm aware of.
>>
>>  It papers over buggy usage of free_bootmem().  If its arguments are
>>  bogus, it will just return again where it BUG()ed out before.  The pages
>>  might be never marked free and therefor never reach the buddy allocator.
>>
>>
>>  > I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch.  Johannes,
>>  > can you please check 2.6.28-rc8-mm2, see if it looks OK?
>>
>>  I object to the way it is implemented.  If it is really needed, that
>>  should be done properly:
>>
>>         - remove the double loop over the area on the likely succeeding
>>           path and unroll the reserving on the unlikely path as it was
>>           done before.  Better to punish exceptional branches than
>>           the working paths.
>>         - make reserve_bootmem_core be strict with its arguments.  If
>>           you want to iterate over the bdata list, you should not just
>>           throw every item at the _core functions and let them work it
>>           out for themselves.  The correct parameters should be
>>           calculated in advance and then passed to a strict
>>           _bootmem_core() function that BUG()s on failure.
>>
>>  But still, Yinghai, you never brought in practical reasons for this
>>  whole thing.  You talked about extreme and theoretical cases and I don't
>>  think that this justifies breaking API or pessimizing code at all.
>
> free_bootmem(ramdisk_image, ramdisk_size) is sitting in setup_arch of
> x86_64. or make that panic directly.
>
> what i needed is: free_bootmem can free bootmem cross the nodes.
>
> on numa
> alloc_bootmem always return blocks on same nodes. but some via
> reserve_early and then to bootmem via early_res_to_bootmem could be
> crossing nodes.
>
> BTW, can you look at patches in -mm about make reserve_bootmem cross
> the nodes?

Yep, already looked at them.  My patches were initially against Linus'
tree which does not allow bootmem to act across node boundaries yet.

Regarding node-crossing, what do you think about my idea in
http://lkml.org/lkml/2008/4/15/139?  That way we could preserve the core
functions and keep them clean.  The design could of course be applied to
the other node-crossing functions too.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 18:44           ` Yinghai Lu
@ 2008-04-15 19:51             ` Johannes Weiner
  2008-04-15 19:57               ` Yinghai Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2008-04-15 19:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Tue, Apr 15, 2008 at 4:53 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>>
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>>  > Andrew Morton wrote:
>>  >> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>>  >>
>>  >>> Johannes Weiner <hannes@saeurebad.de> writes:
>>  >>>
>>  >>>> Make free_bootmem() look up the node holding the specified address
>>  >>>> range which lets it work transparently on single-node and multi-node
>>  >>>> configurations.
>>  >>> Acked-by: Andi Kleen <andi@firstfloor.org>
>>  >>>
>>  >>> This is far better than the original change it replaces and which
>>  >>> I also objected to in review.
>>  >>>
>>  >>
>>  >> So...  do we think these two patches are sufficiently safe and important for
>>  >> 2.6.25?
>>  >
>>  > It's only strictly needed for .26 I think for some (also slightly
>>  > dubious) changes queued in git-x86.
>>
>>  Does anything yet rely on this new free_bootmem() behaviour?  If not,
>>  the safest thing would be to just revert the original patch in mainline
>>  and drop the second patch completely.
>
> 1. free_bootmem(ramdisk_image, ramdisk_size) in setup_arch of x86_64
> need that
> 2. another patch in x86.git need that.

Ok, to avoid confusion: we are talking about free_bootmem() iterating
over nodes and looking up an area WITHIN a node or free_bootmem()
freeing an area ACROSS nodes?

The first is what my patch does _only_.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 19:51             ` Johannes Weiner
@ 2008-04-15 19:57               ` Yinghai Lu
  2008-04-15 20:05                 ` Johannes Weiner
  0 siblings, 1 reply; 21+ messages in thread
From: Yinghai Lu @ 2008-04-15 19:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, Apr 15, 2008 at 12:51 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
> Hi,
>
>
>  "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
>  > On Tue, Apr 15, 2008 at 4:53 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>  >>
>  >> Andi Kleen <andi@firstfloor.org> writes:
>  >>
>  >>  > Andrew Morton wrote:
>  >>  >> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>  >>  >>
>  >>  >>> Johannes Weiner <hannes@saeurebad.de> writes:
>  >>  >>>
>  >>  >>>> Make free_bootmem() look up the node holding the specified address
>  >>  >>>> range which lets it work transparently on single-node and multi-node
>  >>  >>>> configurations.
>  >>  >>> Acked-by: Andi Kleen <andi@firstfloor.org>
>  >>  >>>
>  >>  >>> This is far better than the original change it replaces and which
>  >>  >>> I also objected to in review.
>  >>  >>>
>  >>  >>
>  >>  >> So...  do we think these two patches are sufficiently safe and important for
>  >>  >> 2.6.25?
>  >>  >
>  >>  > It's only strictly needed for .26 I think for some (also slightly
>  >>  > dubious) changes queued in git-x86.
>  >>
>  >>  Does anything yet rely on this new free_bootmem() behaviour?  If not,
>  >>  the safest thing would be to just revert the original patch in mainline
>  >>  and drop the second patch completely.
>  >
>  > 1. free_bootmem(ramdisk_image, ramdisk_size) in setup_arch of x86_64
>  > need that
>  > 2. another patch in x86.git need that.
>
>  Ok, to avoid confusion: we are talking about free_bootmem() iterating
>  over nodes and looking up an area WITHIN a node or free_bootmem()
>  freeing an area ACROSS nodes?
>
>  The first is what my patch does _only_.

Yes, your patch for free_bootmem only can free blocks in the same node.

but the free_bootmem(ramdisk_image,...) in setup_arch could cross
node... , and some other via reserve_early...

for example two nodes, every node have 2G, and in case use
memmap=NN$SS to execlude some memory on node1. the ramdisk could sit
cross the boundary.

YH

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
  2008-04-15 19:57               ` Yinghai Lu
@ 2008-04-15 20:05                 ` Johannes Weiner
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2008-04-15 20:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Ingo Molnar,
	Yasunori Goto, KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Tue, Apr 15, 2008 at 12:51 PM, Johannes Weiner <hannes@saeurebad.de> wrote:
>> Hi,
>>
>>
>>  "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>>
>>  > On Tue, Apr 15, 2008 at 4:53 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>>  >>
>>  >> Andi Kleen <andi@firstfloor.org> writes:
>>  >>
>>  >>  > Andrew Morton wrote:
>>  >>  >> On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>>  >>  >>
>>  >>  >>> Johannes Weiner <hannes@saeurebad.de> writes:
>>  >>  >>>
>>  >>  >>>> Make free_bootmem() look up the node holding the specified address
>>  >>  >>>> range which lets it work transparently on single-node and multi-node
>>  >>  >>>> configurations.
>>  >>  >>> Acked-by: Andi Kleen <andi@firstfloor.org>
>>  >>  >>>
>>  >>  >>> This is far better than the original change it replaces and which
>>  >>  >>> I also objected to in review.
>>  >>  >>>
>>  >>  >>
>>  >>  >> So...  do we think these two patches are sufficiently safe and important for
>>  >>  >> 2.6.25?
>>  >>  >
>>  >>  > It's only strictly needed for .26 I think for some (also slightly
>>  >>  > dubious) changes queued in git-x86.
>>  >>
>>  >>  Does anything yet rely on this new free_bootmem() behaviour?  If not,
>>  >>  the safest thing would be to just revert the original patch in mainline
>>  >>  and drop the second patch completely.
>>  >
>>  > 1. free_bootmem(ramdisk_image, ramdisk_size) in setup_arch of x86_64
>>  > need that
>>  > 2. another patch in x86.git need that.
>>
>>  Ok, to avoid confusion: we are talking about free_bootmem() iterating
>>  over nodes and looking up an area WITHIN a node or free_bootmem()
>>  freeing an area ACROSS nodes?
>>
>>  The first is what my patch does _only_.
>
> Yes, your patch for free_bootmem only can free blocks in the same
> node.

Yep.

> but the free_bootmem(ramdisk_image,...) in setup_arch could cross
> node... , and some other via reserve_early...
>
> for example two nodes, every node have 2G, and in case use
> memmap=NN$SS to execlude some memory on node1. the ramdisk could sit
> cross the boundary.

Now it gets clear.  Alright, then my patches should be dropped and I'll
whip something up for the 2.6.26 merge window.

	Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-04-15 20:06 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-12 22:33 [patch 0/2] bootmem: Fix node-setup agnostic free_bootmem() Johannes Weiner
2008-04-12 22:33 ` [patch 1/2] bootmem: Revert "mm: fix boundary checking in free_bootmem_core" Johannes Weiner
2008-04-13  1:55   ` Yinghai Lu
2008-04-12 22:33 ` [patch 2/2] bootmem: Node-setup agnostic free_bootmem() Johannes Weiner
2008-04-13  1:59   ` Yinghai Lu
2008-04-13 10:57     ` Johannes Weiner
2008-04-13 16:56   ` Andi Kleen
2008-04-15  6:23     ` Andrew Morton
2008-04-15  7:04       ` Yinghai Lu
2008-04-15  7:15         ` Andrew Morton
2008-04-15  7:28           ` Yinghai Lu
2008-04-15  7:36             ` Andrew Morton
2008-04-15 11:51               ` Johannes Weiner
2008-04-15 18:52                 ` Yinghai Lu
2008-04-15 19:43                   ` Johannes Weiner
2008-04-15  7:46       ` Andi Kleen
2008-04-15 11:53         ` Johannes Weiner
2008-04-15 18:44           ` Yinghai Lu
2008-04-15 19:51             ` Johannes Weiner
2008-04-15 19:57               ` Yinghai Lu
2008-04-15 20:05                 ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).