LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] mm: fix boundary checking in free_bootmem_core
@ 2008-03-12 1:01 Yinghai Lu
2008-03-12 23:21 ` Yinghai Lu
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-12 1:01 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar, Christoph Lameter; +Cc: kernel list
[-- Attachment #1: Type: text/plain, Size: 252 bytes --]
[PATCH] mm: fix boundary checking in free_bootmem_core
so call it when numa is enabled, we don't know which node have that range.
and make it more robust.
try to trim it to get valid sidx, and eidx.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: check_boundary_free_bootmem_x.patch --]
[-- Type: text/x-patch; name=check_boundary_free_bootmem_x.patch, Size: 1618 bytes --]
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
BUG_ON(!size);
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
+ BUG_ON(addr < bdata->node_boot_start);
sidx = PFN_DOWN(addr - bdata->node_boot_start);
eidx = PFN_UP(addr + size - bdata->node_boot_start);
@@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
unsigned long sidx, eidx;
unsigned long i;
+ BUG_ON(!size);
+
+ /* out range */
+ if (addr + size < bdata->node_boot_start ||
+ PFN_DOWN(addr) > bdata->node_low_pfn)
+ return;
/*
* round down end of usable mem, partially free pages are
* considered reserved.
*/
- BUG_ON(!size);
- BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
- if (addr < bdata->last_success)
+ if (addr >= bdata->node_boot_start && addr < bdata->last_success)
bdata->last_success = addr;
/*
- * Round up the beginning of the address.
+ * Round up to index to the range.
*/
- sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
+ sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ else
+ sidx = 0;
+
eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
+ if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
+ eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
for (i = sidx; i < eidx; i++) {
if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-12 1:01 [PATCH] mm: fix boundary checking in free_bootmem_core Yinghai Lu
@ 2008-03-12 23:21 ` Yinghai Lu
2008-03-12 23:33 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-12 23:21 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar, Christoph Lameter; +Cc: kernel list
On Tue, Mar 11, 2008 at 6:01 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> [PATCH] mm: fix boundary checking in free_bootmem_core
>
> so call it when numa is enabled, we don't know which node have that range.
> and make it more robust.
>
> try to trim it to get valid sidx, and eidx.
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
Ingo,
this one need to be applied before
[PATCH] x86_64: reserve dma32 early for gart
YH
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-12 23:21 ` Yinghai Lu
@ 2008-03-12 23:33 ` Andrew Morton
2008-03-13 1:11 ` Yinghai Lu
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2008-03-12 23:33 UTC (permalink / raw)
To: Yinghai Lu; +Cc: mingo, clameter, linux-kernel
On Wed, 12 Mar 2008 16:21:22 -0700
"Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> On Tue, Mar 11, 2008 at 6:01 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> > [PATCH] mm: fix boundary checking in free_bootmem_core
> >
> > so call it when numa is enabled, we don't know which node have that range.
> > and make it more robust.
> >
> > try to trim it to get valid sidx, and eidx.
> >
> > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
> >
>
> Ingo,
>
> this one need to be applied before
> [PATCH] x86_64: reserve dma32 early for gart
>
So perhaps we should merge
mm-fix-boundary-checking-in-free_bootmem_core.patch into 2.6.25?
<looks at it>
Sorry, but I find the changelog very hard to amke sense of. I presently
have:
So call it when numa is enabled, we don't know which node have that
range. and make it more robust.
Try to trim it to get valid sidx, and eidx.
Could you please expand on this?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-12 23:33 ` Andrew Morton
@ 2008-03-13 1:11 ` Yinghai Lu
2008-03-13 1:22 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-13 1:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, clameter, linux-kernel
On Wed, Mar 12, 2008 at 4:33 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 12 Mar 2008 16:21:22 -0700
>
>
> "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
> > On Tue, Mar 11, 2008 at 6:01 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> > > [PATCH] mm: fix boundary checking in free_bootmem_core
> > >
> > > so call it when numa is enabled, we don't know which node have that range.
> > > and make it more robust.
> > >
> > > try to trim it to get valid sidx, and eidx.
> > >
> > > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
> > >
> >
> > Ingo,
> >
> > this one need to be applied before
> > [PATCH] x86_64: reserve dma32 early for gart
> >
>
> So perhaps we should merge
> mm-fix-boundary-checking-in-free_bootmem_core.patch into 2.6.25?
also we can remove some NODE_DATA(0) assumpation on free_bootmem path.
>
> <looks at it>
>
> Sorry, but I find the changelog very hard to amke sense of. I presently
> have:
>
>
> So call it when numa is enabled, we don't know which node have that
> range. and make it more robust.
>
> Try to trim it to get valid sidx, and eidx.
>
> Could you please expand on this?
please check following...
---
with numa enabled, some caller could have range on one node but try to
free that on other node.
that could cause some pages is freed wrongly.
For example: when we try to allocate 128g boot ram early for
gart/swiotlb, and free that range later so gart/swiotlb could get some
range afterwards. then with this patch, we don't need to care which
node is the range, just loop to call free_bootmem_node for all online
nodes.
this patch make free_bootmem_core more robust by trim the sidx and
eidx according the ram range that
node have.
----
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-13 1:11 ` Yinghai Lu
@ 2008-03-13 1:22 ` Andrew Morton
2008-03-13 21:59 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2008-03-13 1:22 UTC (permalink / raw)
To: Yinghai Lu
Cc: mingo, clameter, linux-kernel, Andi Kleen, Yasunori Goto,
KAMEZAWA Hiroyuki
On Wed, 12 Mar 2008 18:11:41 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> > <looks at it>
> >
> > Sorry, but I find the changelog very hard to amke sense of. I presently
> > have:
> >
> >
> > So call it when numa is enabled, we don't know which node have that
> > range. and make it more robust.
> >
> > Try to trim it to get valid sidx, and eidx.
> >
> > Could you please expand on this?
>
> please check following...
>
Heaps better, thanks ;) Below is what I now have.
(cc's people)
Guys, could you please review this? Maybe test it a bit?
Thanks.
From: "Yinghai Lu" <yhlu.kernel@gmail.com>
With numa enabled, some callers could have a range o fmemory on one node but
try to free that on other node. This can cause some pages to be freed
wrongly.
For example: when we try to allocate 128g boot ram early for gart/swiotlb, and
free that range later so gart/swiotlb can get some range afterwards.
With this patch, we don't need to care which node holds the range, just loop
to call free_bootmem_node for all online nodes.
This patch make free_bootmem_core() more robust by trimming the sidx and eidx
according the ram range that the node has.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/bootmem.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff -puN mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core mm/bootmem.c
--- a/mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core
+++ a/mm/bootmem.c
@@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
BUG_ON(!size);
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
+ BUG_ON(addr < bdata->node_boot_start);
sidx = PFN_DOWN(addr - bdata->node_boot_start);
eidx = PFN_UP(addr + size - bdata->node_boot_start);
@@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
unsigned long sidx, eidx;
unsigned long i;
+ BUG_ON(!size);
+
+ /* out range */
+ if (addr + size < bdata->node_boot_start ||
+ PFN_DOWN(addr) > bdata->node_low_pfn)
+ return;
/*
* round down end of usable mem, partially free pages are
* considered reserved.
*/
- BUG_ON(!size);
- BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
- if (addr < bdata->last_success)
+ if (addr >= bdata->node_boot_start && addr < bdata->last_success)
bdata->last_success = addr;
/*
- * Round up the beginning of the address.
+ * Round up to index to the range.
*/
- sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
+ sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ else
+ sidx = 0;
+
eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
+ if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
+ eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
for (i = sidx; i < eidx; i++) {
if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
_
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-13 1:22 ` Andrew Morton
@ 2008-03-13 21:59 ` Andi Kleen
2008-03-13 22:22 ` Yinghai Lu
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-03-13 21:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Yinghai Lu, mingo, clameter, linux-kernel, Yasunori Goto,
KAMEZAWA Hiroyuki
On Thursday 13 March 2008 02:22:40 Andrew Morton wrote:
> On Wed, 12 Mar 2008 18:11:41 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
>
> > > <looks at it>
> > >
> > > Sorry, but I find the changelog very hard to amke sense of. I presently
> > > have:
> > >
> > >
> > > So call it when numa is enabled, we don't know which node have that
> > > range. and make it more robust.
> > >
> > > Try to trim it to get valid sidx, and eidx.
> > >
> > > Could you please expand on this?
> >
> > please check following...
> >
>
> Heaps better, thanks ;) Below is what I now have.
>
> (cc's people)
>
> Guys, could you please review this? Maybe test it a bit?
>
> Thanks.
>
>
> From: "Yinghai Lu" <yhlu.kernel@gmail.com>
>
> With numa enabled, some callers could have a range o fmemory on one node but
> try to free that on other node. This can cause some pages to be freed
> wrongly.
Concrete examples?
If that happens it's really just a problem that the bootmem API
is wrong. I was always annoyed by the hardcoded NODE_DATA(0)s in
free_bootmem.
I would suggest if that happens you just fix free_bootmem to search
for the correct node instead of hardcoding 0 and then eliminate
free_bootmem_node() everywhere and replace it with free_bootmem()
>
> For example: when we try to allocate 128g boot ram early for gart/swiotlb, and
> free that range later so gart/swiotlb can get some range afterwards.
I'm confused by the example. AFAIK there is no memory freeing in either
gart nor swiotlb. At least there wasn't until very recently.
>
> With this patch, we don't need to care which node holds the range, just loop
> to call free_bootmem_node for all online nodes.
>
> This patch make free_bootmem_core() more robust by trimming the sidx and eidx
> according the ram range that the node has.
I think you should just kill free_bootmem_node() and replace it everywhere
with your improved free_bootmem()
> @@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
> BUG_ON(!size);
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
> BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
> + BUG_ON(addr < bdata->node_boot_start);
That seems unrelated?
>
> sidx = PFN_DOWN(addr - bdata->node_boot_start);
> eidx = PFN_UP(addr + size - bdata->node_boot_start);
> @@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
> unsigned long sidx, eidx;
> unsigned long i;
>
> + BUG_ON(!size);
> +
> + /* out range */
> + if (addr + size < bdata->node_boot_start ||
> + PFN_DOWN(addr) > bdata->node_low_pfn)
> + return;
I don't really like this silent return without error value.
There should be a BUG() or something for someone passing addresses
outside any node. This check should be probably in the caller.
> /*
> * round down end of usable mem, partially free pages are
> * considered reserved.
> */
> - BUG_ON(!size);
> - BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
>
> - if (addr < bdata->last_success)
> + if (addr >= bdata->node_boot_start && addr < bdata->last_success)
> bdata->last_success = addr;
>
> /*
> - * Round up the beginning of the address.
> + * Round up to index to the range.
> */
> - sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
> + if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
> + sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
> + else
> + sidx = 0;
> +
> eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
> + if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
> + eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
I'm not sure for what these other changes are needed? Just adding the
initial range check should be enough.
If you want to fix something else unrelated please do separate patches.
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-13 21:59 ` Andi Kleen
@ 2008-03-13 22:22 ` Yinghai Lu
2008-03-14 11:58 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-13 22:22 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrew Morton, mingo, clameter, linux-kernel, Yasunori Goto,
KAMEZAWA Hiroyuki
On Thu, Mar 13, 2008 at 2:59 PM, Andi Kleen <ak@suse.de> wrote:
> On Thursday 13 March 2008 02:22:40 Andrew Morton wrote:
> > On Wed, 12 Mar 2008 18:11:41 -0700 "Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> >
> > > > <looks at it>
> > > >
> > > > Sorry, but I find the changelog very hard to amke sense of. I presently
> > > > have:
> > > >
> > > >
> > > > So call it when numa is enabled, we don't know which node have that
> > > > range. and make it more robust.
> > > >
> > > > Try to trim it to get valid sidx, and eidx.
> > > >
> > > > Could you please expand on this?
> > >
> > > please check following...
> > >
> >
> > Heaps better, thanks ;) Below is what I now have.
> >
> > (cc's people)
> >
> > Guys, could you please review this? Maybe test it a bit?
> >
> > Thanks.
> >
> >
> > From: "Yinghai Lu" <yhlu.kernel@gmail.com>
> >
> > With numa enabled, some callers could have a range o fmemory on one node but
> > try to free that on other node. This can cause some pages to be freed
> > wrongly.
>
> Concrete examples?
>
> If that happens it's really just a problem that the bootmem API
> is wrong. I was always annoyed by the hardcoded NODE_DATA(0)s in
> free_bootmem.
>
> I would suggest if that happens you just fix free_bootmem to search
> for the correct node instead of hardcoding 0 and then eliminate
> free_bootmem_node() everywhere and replace it with free_bootmem()
>
> >
> > For example: when we try to allocate 128g boot ram early for gart/swiotlb, and
> > free that range later so gart/swiotlb can get some range afterwards.
>
> I'm confused by the example. AFAIK there is no memory freeing in either
> gart nor swiotlb. At least there wasn't until very recently.
For big system when numa=off or disabled, vmemmap will use 3.6g ram
when you have 256g. if you don't allocate the PMD continuous.
then i tried to reserve 64M or 128M RAM before that, and free that
before gart/switotble try to allloc_bootmem under 4g.
that patch will make the system without ram on node0 not happy.
because of free_bootmem is hardcoded to use node0.
>
>
> >
> > With this patch, we don't need to care which node holds the range, just loop
> > to call free_bootmem_node for all online nodes.
> >
> > This patch make free_bootmem_core() more robust by trimming the sidx and eidx
> > according the ram range that the node has.
>
> I think you should just kill free_bootmem_node() and replace it everywhere
> with your improved free_bootmem()
using phys_to_nid()? it seems we only have that on x86_64.
also there is assumpation that reserve_bootmem_node, reserver_bootmem
can not cross the nodes.
I want to remove that constrient too.
YH
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-13 22:22 ` Yinghai Lu
@ 2008-03-14 11:58 ` Andi Kleen
2008-03-14 16:44 ` Yinghai Lu
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-03-14 11:58 UTC (permalink / raw)
To: Yinghai Lu
Cc: Andi Kleen, Andrew Morton, mingo, clameter, linux-kernel,
Yasunori Goto, KAMEZAWA Hiroyuki
"Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>
> then i tried to reserve 64M or 128M RAM before that, and free that
> before gart/switotble try to allloc_bootmem under 4g.
Sounds like an incredible hack. There are far better ways to do that
for bootmem allocations. e.g. you can just specify a high enough "goal"
That is how swiotlb solves a similar problem (at least before my
mask allocator rewrite)
> > with your improved free_bootmem()
>
> using phys_to_nid()? it seems we only have that on x86_64.
pfn/page_to_nid() is generic afaik.
> also there is assumpation that reserve_bootmem_node, reserver_bootmem
> can not cross the nodes.
> I want to remove that constrient too.
Makes sense, but that will be much more work and should be all separated.
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-14 11:58 ` Andi Kleen
@ 2008-03-14 16:44 ` Yinghai Lu
2008-03-14 16:53 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-14 16:44 UTC (permalink / raw)
To: Andi Kleen
Cc: Andi Kleen, Andrew Morton, mingo, clameter, linux-kernel,
Yasunori Goto, KAMEZAWA Hiroyuki
On 14 Mar 2008 12:58:44 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> >
> > then i tried to reserve 64M or 128M RAM before that, and free that
> > before gart/switotble try to allloc_bootmem under 4g.
>
> Sounds like an incredible hack. There are far better ways to do that
> for bootmem allocations. e.g. you can just specify a high enough "goal"
> That is how swiotlb solves a similar problem (at least before my
> mask allocator rewrite)
I don't think so.
anyway, otherway to workaround it is
change
return __earlyonly_bootmem_alloc(node, size, size,
__pa(MAX_DMA_ADDRESS));
in vmemmap_alloc_block to
return __earlyonly_bootmem_alloc(node, size, size,
__pa(MAX_DMA_ADDRESS + (1<<27)));
to make room for gart. but that is global change. and may affect other
platform. and don't make sure gart will get it.
also i assume swiotlb need that range is less than 4g.
>
>
> > > with your improved free_bootmem()
> >
> > using phys_to_nid()? it seems we only have that on x86_64.
>
> pfn/page_to_nid() is generic afaik.
still in bootmem stage? page->flags is ready at that time?
YH
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-14 16:44 ` Yinghai Lu
@ 2008-03-14 16:53 ` Andi Kleen
2008-03-14 17:36 ` Yinghai Lu
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-03-14 16:53 UTC (permalink / raw)
To: Yinghai Lu
Cc: Andi Kleen, Andi Kleen, Andrew Morton, mingo, clameter,
linux-kernel, Yasunori Goto, KAMEZAWA Hiroyuki
On Fri, Mar 14, 2008 at 09:44:50AM -0700, Yinghai Lu wrote:
> On 14 Mar 2008 12:58:44 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> > >
> > > then i tried to reserve 64M or 128M RAM before that, and free that
> > > before gart/switotble try to allloc_bootmem under 4g.
> >
> > Sounds like an incredible hack. There are far better ways to do that
> > for bootmem allocations. e.g. you can just specify a high enough "goal"
> > That is how swiotlb solves a similar problem (at least before my
> > mask allocator rewrite)
>
> I don't think so.
>
> anyway, otherway to workaround it is
> change
> return __earlyonly_bootmem_alloc(node, size, size,
> __pa(MAX_DMA_ADDRESS));
> in vmemmap_alloc_block to
> return __earlyonly_bootmem_alloc(node, size, size,
> __pa(MAX_DMA_ADDRESS + (1<<27)));
> to make room for gart. but that is global change. and may affect other
> platform.
You can just make it an optional architecture defined macro
> and don't make sure gart will get it.
Has nothing to do with the gart?
>
> also i assume swiotlb need that range is less than 4g.
The normal rule is that anybody who needs big bootmem allocations
need to make sure they're high enough to not fill up first 4GB.
For small allocations like most of bootmem it doesn't matter because
they're, um, small.
If vmemmap doesn't do that vmemmap needs to be fixed.
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-14 16:53 ` Andi Kleen
@ 2008-03-14 17:36 ` Yinghai Lu
2008-03-21 19:44 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Yinghai Lu @ 2008-03-14 17:36 UTC (permalink / raw)
To: Andi Kleen
Cc: Andi Kleen, Andrew Morton, mingo, clameter, linux-kernel,
Yasunori Goto, KAMEZAWA Hiroyuki
On Fri, Mar 14, 2008 at 9:53 AM, Andi Kleen <andi@firstfloor.org> wrote:
> On Fri, Mar 14, 2008 at 09:44:50AM -0700, Yinghai Lu wrote:
> > On 14 Mar 2008 12:58:44 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> > > >
> > > > then i tried to reserve 64M or 128M RAM before that, and free that
> > > > before gart/switotble try to allloc_bootmem under 4g.
> > >
> > > Sounds like an incredible hack. There are far better ways to do that
> > > for bootmem allocations. e.g. you can just specify a high enough "goal"
> > > That is how swiotlb solves a similar problem (at least before my
> > > mask allocator rewrite)
> >
> > I don't think so.
> >
> > anyway, otherway to workaround it is
> > change
> > return __earlyonly_bootmem_alloc(node, size, size,
> > __pa(MAX_DMA_ADDRESS));
> > in vmemmap_alloc_block to
> > return __earlyonly_bootmem_alloc(node, size, size,
> > __pa(MAX_DMA_ADDRESS + (1<<27)));
> > to make room for gart. but that is global change. and may affect other
> > platform.
>
> You can just make it an optional architecture defined macro
it is hard to use MACRO, if someone have allsysconfig, that will make
kernel code use a lot.
>
>
> > and don't make sure gart will get it.
>
> Has nothing to do with the gart?
>
>
>
> >
> > also i assume swiotlb need that range is less than 4g.
>
> The normal rule is that anybody who needs big bootmem allocations
> need to make sure they're high enough to not fill up first 4GB.
> For small allocations like most of bootmem it doesn't matter because
> they're, um, small.
>
> If vmemmap doesn't do that vmemmap needs to be fixed.
how to define big?
it has hundreds of 2M block. when numa is on, they span on all nodes,
and if numa is off, they are sitting on first_online_node.
YH
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-14 17:36 ` Yinghai Lu
@ 2008-03-21 19:44 ` Andrew Morton
2008-03-21 20:00 ` Ingo Molnar
2008-03-21 21:54 ` Thomas Gleixner
0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2008-03-21 19:44 UTC (permalink / raw)
To: Yinghai Lu
Cc: andi, ak, mingo, clameter, linux-kernel, y-goto, kamezawa.hiroyu
On Fri, 14 Mar 2008 10:36:52 -0700
"Yinghai Lu" <yhlu.kernel@gmail.com> wrote:
> On Fri, Mar 14, 2008 at 9:53 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > On Fri, Mar 14, 2008 at 09:44:50AM -0700, Yinghai Lu wrote:
> > > On 14 Mar 2008 12:58:44 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > > > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> > > > >
> > > > > then i tried to reserve 64M or 128M RAM before that, and free that
> > > > > before gart/switotble try to allloc_bootmem under 4g.
> > > >
> > > > Sounds like an incredible hack. There are far better ways to do that
> > > > for bootmem allocations. e.g. you can just specify a high enough "goal"
> > > > That is how swiotlb solves a similar problem (at least before my
> > > > mask allocator rewrite)
> > >
> > > I don't think so.
> > >
> > > anyway, otherway to workaround it is
> > > change
> > > return __earlyonly_bootmem_alloc(node, size, size,
> > > __pa(MAX_DMA_ADDRESS));
> > > in vmemmap_alloc_block to
> > > return __earlyonly_bootmem_alloc(node, size, size,
> > > __pa(MAX_DMA_ADDRESS + (1<<27)));
> > > to make room for gart. but that is global change. and may affect other
> > > platform.
> >
> > You can just make it an optional architecture defined macro
> it is hard to use MACRO, if someone have allsysconfig, that will make
> kernel code use a lot.
> >
> >
> > > and don't make sure gart will get it.
> >
> > Has nothing to do with the gart?
> >
> >
> >
> > >
> > > also i assume swiotlb need that range is less than 4g.
> >
> > The normal rule is that anybody who needs big bootmem allocations
> > need to make sure they're high enough to not fill up first 4GB.
> > For small allocations like most of bootmem it doesn't matter because
> > they're, um, small.
> >
> > If vmemmap doesn't do that vmemmap needs to be fixed.
>
> how to define big?
> it has hundreds of 2M block. when numa is on, they span on all nodes,
> and if numa is off, they are sitting on first_online_node.
>
So Ingo has now merged some x86 patches which apparently had a dependency
upon this patch: "otherwise free_bootmem_node in dma32_free could do sth
bad.".
I had this patch on hold awaiting conclusive feedback from Andi. It looks
like it needs to be merged asap and any remaining problems should be
addressed separately.
Here's what I have:
From: "Yinghai Lu" <yhlu.kernel@gmail.com>
With numa enabled, some callers could have a range o fmemory on one node but
try to free that on other node. This can cause some pages to be freed
wrongly.
For example: when we try to allocate 128g boot ram early for gart/swiotlb, and
free that range later so gart/swiotlb can get some range afterwards.
With this patch, we don't need to care which node holds the range, just loop
to call free_bootmem_node for all online nodes.
This patch makes free_bootmem_core() more robust by trimming the sidx and eidx
according the ram range that the node has.
And make the free_bootmem_core handle this out of range case. We could use
bdata_list to make sure the range can be freed for sure. So next time, we
don't need to loop online nodes and could use free_bootmem directly.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/bootmem.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff -puN mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core mm/bootmem.c
--- a/mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core
+++ a/mm/bootmem.c
@@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
BUG_ON(!size);
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
+ BUG_ON(addr < bdata->node_boot_start);
sidx = PFN_DOWN(addr - bdata->node_boot_start);
eidx = PFN_UP(addr + size - bdata->node_boot_start);
@@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
unsigned long sidx, eidx;
unsigned long i;
+ BUG_ON(!size);
+
+ /* out range */
+ if (addr + size < bdata->node_boot_start ||
+ PFN_DOWN(addr) > bdata->node_low_pfn)
+ return;
/*
* round down end of usable mem, partially free pages are
* considered reserved.
*/
- BUG_ON(!size);
- BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
- if (addr < bdata->last_success)
+ if (addr >= bdata->node_boot_start && addr < bdata->last_success)
bdata->last_success = addr;
/*
- * Round up the beginning of the address.
+ * Round up to index to the range.
*/
- sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
+ sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+ else
+ sidx = 0;
+
eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
+ if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
+ eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
for (i = sidx; i < eidx; i++) {
if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
@@ -421,7 +432,9 @@ int __init reserve_bootmem(unsigned long
void __init free_bootmem(unsigned long addr, unsigned long size)
{
- free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
+ bootmem_data_t *bdata;
+ list_for_each_entry(bdata, &bdata_list, list)
+ free_bootmem_core(bdata, addr, size);
}
unsigned long __init free_all_bootmem(void)
_
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-21 19:44 ` Andrew Morton
@ 2008-03-21 20:00 ` Ingo Molnar
2008-03-21 21:54 ` Thomas Gleixner
1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2008-03-21 20:00 UTC (permalink / raw)
To: Andrew Morton
Cc: Yinghai Lu, andi, ak, clameter, linux-kernel, y-goto, kamezawa.hiroyu
* Andrew Morton <akpm@linux-foundation.org> wrote:
> diff -puN mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core mm/bootmem.c
> --- a/mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core
> +++ a/mm/bootmem.c
> @@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
> BUG_ON(!size);
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
> BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
> + BUG_ON(addr < bdata->node_boot_start);
>
> sidx = PFN_DOWN(addr - bdata->node_boot_start);
> eidx = PFN_UP(addr + size - bdata->node_boot_start);
> @@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
> unsigned long sidx, eidx;
> unsigned long i;
>
> + BUG_ON(!size);
> +
> + /* out range */
> + if (addr + size < bdata->node_boot_start ||
> + PFN_DOWN(addr) > bdata->node_low_pfn)
> + return;
> /*
> * round down end of usable mem, partially free pages are
> * considered reserved.
> */
> - BUG_ON(!size);
> - BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
>
> - if (addr < bdata->last_success)
> + if (addr >= bdata->node_boot_start && addr < bdata->last_success)
> bdata->last_success = addr;
>
> /*
> - * Round up the beginning of the address.
> + * Round up to index to the range.
> */
> - sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
> + if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
> + sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
> + else
> + sidx = 0;
> +
> eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
> + if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
> + eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
>
> for (i = sidx; i < eidx; i++) {
> if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
> @@ -421,7 +432,9 @@ int __init reserve_bootmem(unsigned long
>
> void __init free_bootmem(unsigned long addr, unsigned long size)
> {
> - free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
> + bootmem_data_t *bdata;
> + list_for_each_entry(bdata, &bdata_list, list)
> + free_bootmem_core(bdata, addr, size);
> }
>
> unsigned long __init free_all_bootmem(void)
note, this combination is quite well tested now, on various x86 systems,
small and large alike, and about a 100 randconfigs booted up.
Acked-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix boundary checking in free_bootmem_core
2008-03-21 19:44 ` Andrew Morton
2008-03-21 20:00 ` Ingo Molnar
@ 2008-03-21 21:54 ` Thomas Gleixner
1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2008-03-21 21:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Yinghai Lu, andi, ak, mingo, clameter, linux-kernel, y-goto,
kamezawa.hiroyu
On Fri, 21 Mar 2008, Andrew Morton wrote:
>
> So Ingo has now merged some x86 patches which apparently had a dependency
> upon this patch: "otherwise free_bootmem_node in dma32_free could do sth
> bad.".
>
> I had this patch on hold awaiting conclusive feedback from Andi. It looks
> like it needs to be merged asap and any remaining problems should be
> addressed separately.
I'm too tired to decode all the dependencies and interactions right
now. I will look into that tomorrow morning when my brain works again.
Thanks,
tglx
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-03-21 21:55 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-12 1:01 [PATCH] mm: fix boundary checking in free_bootmem_core Yinghai Lu
2008-03-12 23:21 ` Yinghai Lu
2008-03-12 23:33 ` Andrew Morton
2008-03-13 1:11 ` Yinghai Lu
2008-03-13 1:22 ` Andrew Morton
2008-03-13 21:59 ` Andi Kleen
2008-03-13 22:22 ` Yinghai Lu
2008-03-14 11:58 ` Andi Kleen
2008-03-14 16:44 ` Yinghai Lu
2008-03-14 16:53 ` Andi Kleen
2008-03-14 17:36 ` Yinghai Lu
2008-03-21 19:44 ` Andrew Morton
2008-03-21 20:00 ` Ingo Molnar
2008-03-21 21:54 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).