LKML Archive on lore.kernel.org
* [PATCH] mm: make mem_map allocation continuous.
@ 2008-03-11  6:22 Yinghai Lu
  2008-03-11  8:14 ` Ingo Molnar
  2008-03-11  8:18 ` Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: Yinghai Lu @ 2008-03-11  6:22 UTC
  To: Andrew Morton, Ingo Molnar, Christoph Lameter; +Cc: kernel list

[-- Attachment #1: Type: text/plain, Size: 1494 bytes --]

[PATCH] mm: make mem_map allocation continuous.

vmemmap allocation currently produces this layout:
 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
...

there is a 2M hole between them.

the root cause is that a usemap (24 bytes) is allocated after every 2M
mem_map, which pushes the next 2M vmemmap block to the next 2M-aligned
address.
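A toy model makes the arithmetic concrete. The sketch below is not kernel code: it assumes a simple bump allocator in place of bootmem, 2M-aligned 2M mem_map blocks, and a 24-byte usemap with an assumed 128-byte alignment; bump_alloc() and alloc_interleaved() are hypothetical names. Allocating the two in turn reproduces the 4M stride seen in the log: each usemap lands just past a 2M block, so the next 2M-aligned block starts one more 2M further on.

```c
#include <assert.h>
#include <stdint.h>

/* Toy-model sizes (hypothetical names): 2M mem_map blocks on 2M
 * boundaries, and a 24-byte usemap with an assumed 128-byte alignment. */
#define MEM_MAP_SIZE  (2ULL << 20)
#define MEM_MAP_ALIGN (2ULL << 20)
#define USEMAP_SIZE   24ULL
#define USEMAP_ALIGN  128ULL

/* Toy bump allocator standing in for bootmem: returns the first address
 * at or after *cursor that is a multiple of align (a power of two). */
static uint64_t bump_alloc(uint64_t *cursor, uint64_t size, uint64_t align)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	addr = (*cursor + align - 1) & ~(align - 1);
	*cursor = addr + size;
	return addr;
}

/* Old order: a mem_map block and its usemap are allocated in turn. */
static void alloc_interleaved(uint64_t start, int nr, uint64_t *maps)
{
	uint64_t cursor = start;

	for (int i = 0; i < nr; i++) {
		maps[i] = bump_alloc(&cursor, MEM_MAP_SIZE, MEM_MAP_ALIGN);
		(void)bump_alloc(&cursor, USEMAP_SIZE, USEMAP_ALIGN);
	}
}
```

Starting at 0x1400000, the blocks come out at 0x1400000, 0x1800000, 0x1c00000, and so on, i.e. 4M apart as in the log, with half of every 4M stride left unused.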

solution:
allocate all the mem_maps contiguously first, then the usemaps.

after the patch, we get:
 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
...
the usemaps now also share a page, because they are allocated contiguously too:
sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
...

so the bootmem allocation becomes more compact and uses less memory for the usemaps.
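The same toy model, with the patch's ordering: every mem_map first, then every usemap. As before, the bump allocator, the assumed 128-byte alignment for small allocations (chosen because it matches the 0x80 spacing in the log), and the names bump_alloc() and alloc_batched() are hypothetical. In this model the usemaps land directly after the last mem_map, whereas the real log shows them elsewhere, so only the spacing is meaningful.

```c
#include <assert.h>
#include <stdint.h>

/* Toy-model sizes (hypothetical names), as in the interleaved model. */
#define MEM_MAP_SIZE  (2ULL << 20)
#define MEM_MAP_ALIGN (2ULL << 20)
#define USEMAP_SIZE   24ULL
#define USEMAP_ALIGN  128ULL   /* assumption, matches the 0x80 spacing */

/* Toy bump allocator standing in for bootmem. */
static uint64_t bump_alloc(uint64_t *cursor, uint64_t size, uint64_t align)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	addr = (*cursor + align - 1) & ~(align - 1);
	*cursor = addr + size;
	return addr;
}

/* New order: pass 1 allocates all mem_maps, pass 2 all usemaps. */
static void alloc_batched(uint64_t start, int nr, uint64_t *maps,
			  uint64_t *usemaps)
{
	uint64_t cursor = start;

	for (int i = 0; i < nr; i++)
		maps[i] = bump_alloc(&cursor, MEM_MAP_SIZE, MEM_MAP_ALIGN);
	for (int i = 0; i < nr; i++)
		usemaps[i] = bump_alloc(&cursor, USEMAP_SIZE, USEMAP_ALIGN);
}
```

Starting at 0x1400000, the mem_maps come out 2M apart (0x1400000, 0x1600000, ...) and consecutive usemaps 0x80 bytes apart, so several usemaps share one page.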

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

[-- Attachment #2: vmemmap_1.patch --]
[-- Type: text/x-patch; name=vmemmap_1.patch, Size: 1630 bytes --]


Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -244,6 +244,7 @@ static unsigned long *__init sparse_earl
 	int nid = sparse_early_nid(ms);
 
 	usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
+	printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
 	if (usemap)
 		return usemap;
 
@@ -285,6 +286,8 @@ struct page __init *sparse_early_mem_map
 	return NULL;
 }
 
+/* section_map pointer array is 64k */
+static __initdata struct page *section_map[NR_MEM_SECTIONS];
 /*
  * Allocate the accumulated non-linear sections, allocate a mem_map
  * for each and record the physical to section mapping.
@@ -295,14 +298,29 @@ void __init sparse_init(void)
 	struct page *map;
 	unsigned long *usemap;
 
+	/*
+	 * The mem_map is mapped with large pages (2M on x86_64), while
+	 * a usemap is much smaller than a page (24 bytes). Allocating a
+	 * 2M block (2M aligned) and then 24 bytes, in turn, pushes the
+	 * next 2M block out by one more 2M, so on a big system memory
+	 * ends up with a lot of holes. Instead, allocate all the 2M
+	 * mem_map blocks contiguously first.
+	 */
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
+		section_map[pnum] = sparse_early_mem_map_alloc(pnum);
+	}
 
-		map = sparse_early_mem_map_alloc(pnum);
-		if (!map)
+
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		if (!present_section_nr(pnum))
 			continue;
 
+		map = section_map[pnum];
+		if (!map)
+			continue;
+
 		usemap = sparse_early_usemap_alloc(pnum);
 		if (!usemap)
 			continue;


* Re: [PATCH] mm: make mem_map allocation continuous.
  2008-03-11  6:22 [PATCH] mm: make mem_map allocation continuous Yinghai Lu
@ 2008-03-11  8:14 ` Ingo Molnar
  2008-03-11 16:48   ` Yinghai Lu
  2008-03-12 11:39   ` Mel Gorman
  2008-03-11  8:18 ` Ingo Molnar
  1 sibling, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2008-03-11  8:14 UTC
  To: Yinghai Lu
  Cc: Andrew Morton, Christoph Lameter, kernel list, Andy Whitcroft,
	Mel Gorman


* Yinghai Lu <yhlu.kernel@gmail.com> wrote:

> [PATCH] mm: make mem_map allocation continuous.
> [...]

very nice fix!

i suspect this patch should go via -mm.

>  	usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
> +	printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());

this should be in a separate patch.

	Ingo


* Re: [PATCH] mm: make mem_map allocation continuous.
  2008-03-11  6:22 [PATCH] mm: make mem_map allocation continuous Yinghai Lu
  2008-03-11  8:14 ` Ingo Molnar
@ 2008-03-11  8:18 ` Ingo Molnar
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2008-03-11  8:18 UTC
  To: Yinghai Lu; +Cc: Andrew Morton, Christoph Lameter, kernel list


find below the patch for -mm inclusion.

	Ingo

------------>
Subject: mm: make mem_map allocation continuous.
From: "Yinghai Lu" <yhlu.kernel@gmail.com>
Date: Mon, 10 Mar 2008 23:22:47 -0700

vmemmap allocation currently has this layout:
 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
...

there is a 2M hole between them.

the root cause is that a usemap (24 bytes) is allocated after every
2M mem_map, which pushes the next 2M vmemmap block to the next
2M-aligned address.

solution:

allocate all the mem_maps contiguously first, then the usemaps.

after the patch, we get:
 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
...
the usemaps now also share a page, because they are allocated contiguously too:
sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
...

so the bootmem allocation becomes more compact and uses less memory for the usemaps.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 mm/sparse.c |   22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Index: linux-x86.q/mm/sparse.c
===================================================================
--- linux-x86.q.orig/mm/sparse.c
+++ linux-x86.q/mm/sparse.c
@@ -244,6 +244,7 @@ static unsigned long *__init sparse_earl
 	int nid = sparse_early_nid(ms);
 
 	usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
+	printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
 	if (usemap)
 		return usemap;
 
@@ -285,6 +286,8 @@ struct page __init *sparse_early_mem_map
 	return NULL;
 }
 
+/* section_map pointer array is 64k */
+static __initdata struct page *section_map[NR_MEM_SECTIONS];
 /*
  * Allocate the accumulated non-linear sections, allocate a mem_map
  * for each and record the physical to section mapping.
@@ -295,14 +298,29 @@ void __init sparse_init(void)
 	struct page *map;
 	unsigned long *usemap;
 
+	/*
+	 * The mem_map is mapped with large pages (2M on x86_64), while
+	 * a usemap is much smaller than a page (24 bytes). Allocating a
+	 * 2M block (2M aligned) and then 24 bytes, in turn, pushes the
+	 * next 2M block out by one more 2M, so on a big system memory
+	 * ends up with a lot of holes. Instead, allocate all the 2M
+	 * mem_map blocks contiguously first.
+	 */
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
+		section_map[pnum] = sparse_early_mem_map_alloc(pnum);
+	}
 
-		map = sparse_early_mem_map_alloc(pnum);
-		if (!map)
+
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		if (!present_section_nr(pnum))
 			continue;
 
+		map = section_map[pnum];
+		if (!map)
+			continue;
+
 		usemap = sparse_early_usemap_alloc(pnum);
 		if (!usemap)
 			continue;


* Re: [PATCH] mm: make mem_map allocation continuous.
  2008-03-11  8:14 ` Ingo Molnar
@ 2008-03-11 16:48   ` Yinghai Lu
  2008-03-12 11:39   ` Mel Gorman
  1 sibling, 0 replies; 6+ messages in thread
From: Yinghai Lu @ 2008-03-11 16:48 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, Christoph Lameter, kernel list, Andy Whitcroft,
	Mel Gorman

On Tue, Mar 11, 2008 at 1:14 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
>
>  * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
>  > [PATCH] mm: make mem_map allocation continuous.
>
>  >       usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
>  > +     printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
>
>  this should be in a separate patch.

that should be removed or changed to KERN_DEBUG.

YH


* Re: [PATCH] mm: make mem_map allocation continuous.
  2008-03-11  8:14 ` Ingo Molnar
  2008-03-11 16:48   ` Yinghai Lu
@ 2008-03-12 11:39   ` Mel Gorman
  2008-03-12 16:40     ` Yinghai Lu
  1 sibling, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2008-03-12 11:39 UTC
  To: Ingo Molnar
  Cc: Yinghai Lu, Andrew Morton, Christoph Lameter, kernel list,
	Andy Whitcroft

On (11/03/08 09:14), Ingo Molnar didst pronounce:
> 
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> 
> > [PATCH] mm: make mem_map allocation continuous.
> > [...]
> 
> very nice fix!
> 

Agreed, good work.

> i suspect this patch should go via -mm.
> 
> >  	usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
> > +	printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
> 
> this should be in a separate patch.
> 

Should this be KERN_DEBUG instead of KERN_INFO?

I don't have the original mail because I got unsubscribed from the lists
a few days ago and didn't notice (I have been having mail issues), so
pardon the awkward cut & pastes.

> +/* section_map pointer array is 64k */
> +static __initdata struct page *section_map[NR_MEM_SECTIONS];

The size of this varies depending on architecture so the comment may be
misleading.  Maybe a comment like the following would be better?

/*
 * The portions of the mem_map used by SPARSEMEM are allocated in
 * batch and temporarily stored in this array. When sparse_init()
 * completes, the array is discarded.
 */

I can see why you file-scoped it because it's too large for the stack, but
would it be better to allocate it from bootmem instead? It is
available by the time you need to use the array.
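A userspace sketch of that suggestion, under loudly stated assumptions: calloc() and free() stand in for alloc_bootmem() and freeing back to bootmem, and struct page (reduced to a dummy), toy_mem_map_alloc(), toy_sparse_init() and the present[] array are all hypothetical. It keeps the two-pass structure of the patch but takes the temporary pointer array from the allocator and returns it once both passes are done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page { int dummy; };

static struct page toy_pages[64];

/* Hypothetical mem_map allocator: one object per section number. */
static struct page *toy_mem_map_alloc(unsigned long pnum)
{
	return &toy_pages[pnum % 64];
}

/*
 * Two-pass init with the temporary section_map taken from the allocator
 * instead of a file-scope static. Returns how many sections ended up
 * with a mem_map, or -1 if the temporary array could not be allocated.
 */
static int toy_sparse_init(const bool *present, unsigned long nr_sections)
{
	struct page **section_map;
	int done = 0;

	assert(present != NULL);
	section_map = calloc(nr_sections, sizeof(*section_map));
	if (!section_map)
		return -1;

	/* Pass 1: allocate every mem_map back to back. */
	for (unsigned long pnum = 0; pnum < nr_sections; pnum++)
		if (present[pnum])
			section_map[pnum] = toy_mem_map_alloc(pnum);

	/* Pass 2: walk the saved maps; usemaps would be allocated here. */
	for (unsigned long pnum = 0; pnum < nr_sections; pnum++) {
		if (!present[pnum] || !section_map[pnum])
			continue;
		done++;
	}

	/* The array is only needed during init, so hand it back. */
	free(section_map);
	return done;
}
```

The trade-off is one extra allocate/free pair at boot in exchange for not reserving a fixed NR_MEM_SECTIONS-sized array in the image.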

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] mm: make mem_map allocation continuous.
  2008-03-12 11:39   ` Mel Gorman
@ 2008-03-12 16:40     ` Yinghai Lu
  0 siblings, 0 replies; 6+ messages in thread
From: Yinghai Lu @ 2008-03-12 16:40 UTC
  To: Mel Gorman
  Cc: Ingo Molnar, Andrew Morton, Christoph Lameter, kernel list,
	Andy Whitcroft

On Wed, Mar 12, 2008 at 4:39 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> On (11/03/08 09:14), Ingo Molnar didst pronounce:
>
>
> >
>  > * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>  >
>  > > [PATCH] mm: make mem_map allocation continuous.
>  > > [...]
>  >
>  > very nice fix!
>  >
>
>  Agreed, good work.
>
>
>  > i suspect this patch should go via -mm.
>  >
>  > >     usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
>  > > +   printk(KERN_INFO "sparse_early_usemap_alloc: usemap = %p size = %ld\n", usemap, usemap_size());
>  >
>  > this should be in a separate patch.
>  >
>
>  Should this be KERN_DEBUG instead of KERN_INFO?
yes, change it to KERN_DEBUG or remove it.
>
>  I don't have the original mail because I got unsubscribed from the lists
>  a few days ago and didn't notice (have been having mail issues) so
>  pardon awkward cut & pastes
>
>
>  > +/* section_map pointer array is 64k */
>  > +static __initdata struct page *section_map[NR_MEM_SECTIONS];
>
>  The size of this varies depending on architecture so the comment may be
>  misleading.  Maybe a comment like the following would be better?
>
>  /*
>   * The portions of the mem_map used by SPARSEMEM are allocated in
>   * batch and temporarily stored in this array. When sparse_init()
>   * completes, the array is discarded
>   */
Yes. On x86_64 it is (1<<13)*8 bytes at most; other architectures should need much less.
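The arithmetic behind that bound, as a small sketch: the array holds one pointer per memory section, i.e. 1 << SECTIONS_SHIFT entries. SECTIONS_SHIFT = 13 is taken from the (1<<13) figure above and an 8-byte pointer is assumed; the helper name section_map_bytes() is hypothetical, and other architectures use different values.

```c
#include <assert.h>
#include <stddef.h>

/* Taken from the (1<<13) figure above for x86_64; other arches differ. */
#define SECTIONS_SHIFT 13

/* Bytes needed for one pointer per memory section. */
static size_t section_map_bytes(unsigned int sections_shift, size_t ptr_size)
{
	assert(sections_shift < 8 * sizeof(size_t));
	return ((size_t)1 << sections_shift) * ptr_size;
}
```

With 8-byte pointers this gives 8192 * 8 = 65536 bytes, the 64k mentioned in the patch comment.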
>
>  I can see why you file-scoped it because its too large for the stack but
>  would it be better to allocate it from bootmem instead? It is
>  available by the time you need to use the array.
Yes.
That needs to go after another patch I sent out yesterday, which makes
free_bootmem_core() handle out-of-range inputs.

Then we could use alloc_bootmem() and
for_each_online_node(node)
	free_bootmem_node(NODE_DATA(node), __pa(section_map), size);
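A toy model of why that loop needs a free routine that tolerates out-of-range input: the same range is offered to every online node, and each node must free only the part that overlaps its own memory and silently ignore the rest. struct toy_node, toy_free_core() and toy_free_all_nodes() are hypothetical stand-ins, not the bootmem implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node physical range [start, end); not a kernel struct. */
struct toy_node {
	uint64_t start;
	uint64_t end;
	uint64_t freed;   /* bytes this node has accepted back */
};

/* Frees only the part of [addr, addr + size) that overlaps the node. */
static void toy_free_core(struct toy_node *node, uint64_t addr, uint64_t size)
{
	uint64_t lo, hi;

	assert(addr + size >= addr);   /* no wraparound */
	lo = addr > node->start ? addr : node->start;
	hi = addr + size < node->end ? addr + size : node->end;
	if (lo >= hi)
		return;                /* out of range for this node: ignore */
	node->freed += hi - lo;
}

/* Offers one range to every node, like the for_each_online_node() loop. */
static void toy_free_all_nodes(struct toy_node *nodes, int nr_nodes,
			       uint64_t addr, uint64_t size)
{
	for (int i = 0; i < nr_nodes; i++)
		toy_free_core(&nodes[i], addr, size);
}
```

A node that does not overlap the range ignores it, and a range straddling a node boundary is split so each byte is freed exactly once.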

will send delta to Andrew.

YH


end of thread

Thread overview: 6+ messages
2008-03-11  6:22 [PATCH] mm: make mem_map allocation continuous Yinghai Lu
2008-03-11  8:14 ` Ingo Molnar
2008-03-11 16:48   ` Yinghai Lu
2008-03-12 11:39   ` Mel Gorman
2008-03-12 16:40     ` Yinghai Lu
2008-03-11  8:18 ` Ingo Molnar
