LKML Archive on lore.kernel.org
* [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
@ 2008-11-06  0:16 KOSAKI Motohiro
  2008-11-06 14:56 ` Christoph Lameter
  2008-11-06 16:46 ` Mel Gorman
  0 siblings, 2 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-11-06  0:16 UTC (permalink / raw)
  To: LKML, Christoph Lameter, Mel Gorman, linux-mm; +Cc: kosaki.motohiro

MIGRATE_RESERVE means that the page is reserved for emergencies,
so it shouldn't be cached in the pcp.

Otherwise the system runs an unnecessary risk of memory starvation,
because other CPUs can't use these emergency pages.



Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Christoph Lameter <cl@linux-foundation.org>

---
 mm/page_alloc.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2008-11-06 06:01:15.000000000 +0900
+++ b/mm/page_alloc.c	2008-11-06 06:27:41.000000000 +0900
@@ -1002,6 +1002,7 @@ static void free_hot_cold_page(struct pa
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
+	int migratetype = get_pageblock_migratetype(page);
 
 	if (PageAnon(page))
 		page->mapping = NULL;
@@ -1018,16 +1019,25 @@ static void free_hot_cold_page(struct pa
 	pcp = &zone_pcp(zone, get_cpu())->pcp;
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
+
+	set_page_private(page, migratetype);
+
+	/* the page for emergency shouldn't be cached */
+	if (migratetype == MIGRATE_RESERVE) {
+		free_one_page(zone, page, 0);
+		goto out;
+	}
 	if (cold)
 		list_add_tail(&page->lru, &pcp->list);
 	else
 		list_add(&page->lru, &pcp->list);
-	set_page_private(page, get_pageblock_migratetype(page));
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
 		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
 		pcp->count -= pcp->batch;
 	}
+
+out:
 	local_irq_restore(flags);
 	put_cpu();
 }



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-06  0:16 [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp KOSAKI Motohiro
@ 2008-11-06 14:56 ` Christoph Lameter
  2008-11-06 16:46 ` Mel Gorman
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Lameter @ 2008-11-06 14:56 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Mel Gorman, linux-mm

And the fastpath gets even more complex. Sigh.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-06  0:16 [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp KOSAKI Motohiro
  2008-11-06 14:56 ` Christoph Lameter
@ 2008-11-06 16:46 ` Mel Gorman
  2008-11-07  1:42   ` KAMEZAWA Hiroyuki
  2008-11-07  4:37   ` KOSAKI Motohiro
  1 sibling, 2 replies; 17+ messages in thread
From: Mel Gorman @ 2008-11-06 16:46 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Christoph Lameter, linux-mm

On Thu, Nov 06, 2008 at 09:16:58AM +0900, KOSAKI Motohiro wrote:
> MIGRATE_RESERVE mean that the page is for emergency.
> So it shouldn't be cached in pcp.
> 

It doesn't necessarily mean it's for emergencies. MIGRATE_RESERVE is one
or more pageblocks at the beginning of the zone. While it's possible
that the minimum page reserve for GFP_ATOMIC is located here, it's not
mandatory.
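
(For reference, the marking is done by setup_zone_migrate_reserve() in
mm/page_alloc.c, driven by the min watermark. A much-simplified sketch of
the idea, not the exact code:

	/*
	 * Sketch only: reserve roughly pages_min worth of pageblocks at
	 * the start of the zone; surplus reserve blocks are handed back
	 * to MIGRATE_MOVABLE.
	 */
	reserve = roundup(zone->pages_min, pageblock_nr_pages) >> pageblock_order;
	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		page = pfn_to_page(pfn);
		if (reserve > 0 &&
		    get_pageblock_migratetype(page) == MIGRATE_MOVABLE) {
			set_pageblock_migratetype(page, MIGRATE_RESERVE);
			move_freepages_block(zone, page, MIGRATE_RESERVE);
			reserve--;
		} else if (reserve == 0 &&
			   get_pageblock_migratetype(page) == MIGRATE_RESERVE) {
			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
			move_freepages_block(zone, page, MIGRATE_MOVABLE);
		}
	}
)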

What MIGRATE_RESERVE can help is high-order atomic allocations used by
some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
they are high-order allocations, they would be returned to the buddy
allocator anyway.
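
(As a reminder, only order-0 frees go through the pcp at all; roughly, from
__free_pages():

	if (order == 0)
		free_hot_page(page);		/* goes to the per-cpu list */
	else
		__free_pages_ok(page, order);	/* straight back to the buddy lists */
)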

What your patch may help is the situation where the system is under intense
memory pressure, is dipping routinely into the lowmem reserves and mixing
with high-order atomic allocations. This seems a bit extreme.

> otherwise, the system have unnecessary memory starvation risk
> because other cpu can't use this emergency pages.
> 
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Christoph Lameter <cl@linux-foundation.org>
> 

This patch seems functionally sound but as Christoph points out, this
adds another branch to the fast path. Now, I ran some tests and those that
completed didn't show any problems but adding branches in the fast path can
eventually lead to hard-to-detect performance problems.

Do you have a situation in mind that this patch fixes up?

Thanks

> ---
>  mm/page_alloc.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c	2008-11-06 06:01:15.000000000 +0900
> +++ b/mm/page_alloc.c	2008-11-06 06:27:41.000000000 +0900
> @@ -1002,6 +1002,7 @@ static void free_hot_cold_page(struct pa
>  	struct zone *zone = page_zone(page);
>  	struct per_cpu_pages *pcp;
>  	unsigned long flags;
> +	int migratetype = get_pageblock_migratetype(page);
>  
>  	if (PageAnon(page))
>  		page->mapping = NULL;
> @@ -1018,16 +1019,25 @@ static void free_hot_cold_page(struct pa
>  	pcp = &zone_pcp(zone, get_cpu())->pcp;
>  	local_irq_save(flags);
>  	__count_vm_event(PGFREE);
> +
> +	set_page_private(page, migratetype);
> +
> +	/* the page for emergency shouldn't be cached */
> +	if (migratetype == MIGRATE_RESERVE) {
> +		free_one_page(zone, page, 0);
> +		goto out;
> +	}
>  	if (cold)
>  		list_add_tail(&page->lru, &pcp->list);
>  	else
>  		list_add(&page->lru, &pcp->list);
> -	set_page_private(page, get_pageblock_migratetype(page));
>  	pcp->count++;
>  	if (pcp->count >= pcp->high) {
>  		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
>  		pcp->count -= pcp->batch;
>  	}
> +
> +out:
>  	local_irq_restore(flags);
>  	put_cpu();
>  }
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-06 16:46 ` Mel Gorman
@ 2008-11-07  1:42   ` KAMEZAWA Hiroyuki
  2008-11-07 10:42     ` Mel Gorman
  2008-11-07  4:37   ` KOSAKI Motohiro
  1 sibling, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-07  1:42 UTC (permalink / raw)
  To: Mel Gorman; +Cc: KOSAKI Motohiro, LKML, Christoph Lameter, linux-mm

On Thu, 6 Nov 2008 16:46:45 +0000
Mel Gorman <mel@csn.ul.ie> wrote:
> > otherwise, the system have unnecessary memory starvation risk
> > because other cpu can't use this emergency pages.
> > 
> > 
> > 
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > CC: Mel Gorman <mel@csn.ul.ie>
> > CC: Christoph Lameter <cl@linux-foundation.org>
> > 
> 
> This patch seems functionally sound but as Christoph points out, this
> adds another branch to the fast path. Now, I ran some tests and those that
> completed didn't show any problems but adding branches in the fast path can
> eventually lead to hard-to-detect performance problems.
> 
Is dividing the pcp-list into MIGRATE_TYPES lists bad?
If divided, we can get rid of the scan.

Thanks,
-Kame



> Do you have a situation in mind that this patch fixes up?
> 
> Thanks
> 
> > ---
> >  mm/page_alloc.c |   12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > Index: b/mm/page_alloc.c
> > ===================================================================
> > --- a/mm/page_alloc.c	2008-11-06 06:01:15.000000000 +0900
> > +++ b/mm/page_alloc.c	2008-11-06 06:27:41.000000000 +0900
> > @@ -1002,6 +1002,7 @@ static void free_hot_cold_page(struct pa
> >  	struct zone *zone = page_zone(page);
> >  	struct per_cpu_pages *pcp;
> >  	unsigned long flags;
> > +	int migratetype = get_pageblock_migratetype(page);
> >  
> >  	if (PageAnon(page))
> >  		page->mapping = NULL;
> > @@ -1018,16 +1019,25 @@ static void free_hot_cold_page(struct pa
> >  	pcp = &zone_pcp(zone, get_cpu())->pcp;
> >  	local_irq_save(flags);
> >  	__count_vm_event(PGFREE);
> > +
> > +	set_page_private(page, migratetype);
> > +
> > +	/* the page for emergency shouldn't be cached */
> > +	if (migratetype == MIGRATE_RESERVE) {
> > +		free_one_page(zone, page, 0);
> > +		goto out;
> > +	}
> >  	if (cold)
> >  		list_add_tail(&page->lru, &pcp->list);
> >  	else
> >  		list_add(&page->lru, &pcp->list);
> > -	set_page_private(page, get_pageblock_migratetype(page));
> >  	pcp->count++;
> >  	if (pcp->count >= pcp->high) {
> >  		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
> >  		pcp->count -= pcp->batch;
> >  	}
> > +
> > +out:
> >  	local_irq_restore(flags);
> >  	put_cpu();
> >  }
> > 
> > 
> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-06 16:46 ` Mel Gorman
  2008-11-07  1:42   ` KAMEZAWA Hiroyuki
@ 2008-11-07  4:37   ` KOSAKI Motohiro
  2008-11-07 10:47     ` Mel Gorman
  1 sibling, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-11-07  4:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: kosaki.motohiro, LKML, Christoph Lameter, linux-mm

Hi Mel, Christoph,

Thank you for the interesting comments!


> > MIGRATE_RESERVE mean that the page is for emergency.
> > So it shouldn't be cached in pcp.
> 
> It doesn't necessarily mean it's for emergencys. MIGRATE_RESERVE is one
> or more pageblocks at the beginning of the zone. While it's possible
> that the minimum page reserve for GFP_ATOMIC is located here, it's not
> mandatory.
> 
> What MIGRATE_RESERVE can help is high-order atomic allocations used by
> some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
> they are high-order allocations, they would be returned to the buddy
> allocator anyway.

Yup.
My patch is meaningless for high-order allocations because high-order allocations
don't use the pcp.


> What your patch may help is the situation where the system is under intense
> memory pressure, is dipping routinely into the lowmem reserves and mixing
> with high-order atomic allocations. This seems a bit extreme.

Not so extreme.

Linux page reclaim can't run in interrupt context.
So the network subsystem and drivers often use MIGRATE_RESERVE memory even though
the system has plenty of reclaimable memory.

At that time, any task in process context can use high-order allocations.


> > otherwise, the system have unnecessary memory starvation risk
> > because other cpu can't use this emergency pages.
> > 
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > CC: Mel Gorman <mel@csn.ul.ie>
> > CC: Christoph Lameter <cl@linux-foundation.org>
> > 
> 
> This patch seems functionally sound but as Christoph points out, this
> adds another branch to the fast path. Now, I ran some tests and those that
> completed didn't show any problems but adding branches in the fast path can
> eventually lead to hard-to-detect performance problems.
> 
> Do you have a situation in mind that this patch fixes up?

Ah, sorry, my description was too poor.
This isn't a real-workload issue; it is just an RFC.

Actually, I plan to rework the pcp because the following pcp list search
in the fast path is NOT fast.

In general, list searching often causes L1 cache misses, so it shouldn't be
used in a fast path.


static struct page *buffered_rmqueue(struct zone *preferred_zone,
                        struct zone *zone, int order, gfp_t gfp_flags)
{
(snip)
                /* Find a page of the appropriate migrate type */
                if (cold) {
                        list_for_each_entry_reverse(page, &pcp->list, lru)
                                if (page_private(page) == migratetype)
                                        break;
                } else {
                        list_for_each_entry(page, &pcp->list, lru)
                                if (page_private(page) == migratetype)
                                        break;
                }


Therefore, I'd like to make per-migratetype pcp lists (a rough sketch is below).
However, a MIGRATE_RESERVE list isn't useful because callers never request the
reserve type; it is only an internal attribute.

So I thought a "don't queue reserve-type pages in the pcp" patch would be useful
even on its own. Then I posted it by itself to hear other developers' opinions.
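
A minimal sketch of the data-structure side of that idea (the field and
constant names here are only illustrative, not from an actual patch):

	struct per_cpu_pages {
		int count;	/* total pages across all the lists */
		int high;	/* high watermark, emptying needed */
		int batch;	/* chunk size for buddy add/remove */
		/* one list per migrate type that callers actually request */
		struct list_head lists[MIGRATE_PCPTYPES];
	};

With that, buffered_rmqueue() could take the head of lists[migratetype]
directly instead of walking a mixed list.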

Actually, the current pcp is NOT fast, so the discussion about the
number of branches isn't meaningful.
Counting branches only matters when the fast path can run in time on the
order of N branches, but the current pcp is slower than that.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07  1:42   ` KAMEZAWA Hiroyuki
@ 2008-11-07 10:42     ` Mel Gorman
  2008-11-07 11:02       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2008-11-07 10:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, LKML, Christoph Lameter, linux-mm

On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 6 Nov 2008 16:46:45 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> > > otherwise, the system have unnecessary memory starvation risk
> > > because other cpu can't use this emergency pages.
> > > 
> > > 
> > > 
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > CC: Mel Gorman <mel@csn.ul.ie>
> > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > 
> > 
> > This patch seems functionally sound but as Christoph points out, this
> > adds another branch to the fast path. Now, I ran some tests and those that
> > completed didn't show any problems but adding branches in the fast path can
> > eventually lead to hard-to-detect performance problems.
> > 
> dividing pcp-list into MIGRATE_TYPES is bad ?

I do not understand what your question is.

> If divided, we can get rid of scan. 
> 

I don't know what you are saying here either. I think you are saying
that we should avoid scanning the PCP lists for migrate types at all.
That hurts anti-fragmentation as pages can be badly placed if any pcp
page is used.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07  4:37   ` KOSAKI Motohiro
@ 2008-11-07 10:47     ` Mel Gorman
  2008-11-11 13:39       ` KOSAKI Motohiro
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2008-11-07 10:47 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Christoph Lameter, linux-mm

On Fri, Nov 07, 2008 at 01:37:14PM +0900, KOSAKI Motohiro wrote:
> Hi Mel, Cristoph,
> 
> Thank you for interesting comment!
> 
> 
> > > MIGRATE_RESERVE mean that the page is for emergency.
> > > So it shouldn't be cached in pcp.
> > 
> > It doesn't necessarily mean it's for emergencys. MIGRATE_RESERVE is one
> > or more pageblocks at the beginning of the zone. While it's possible
> > that the minimum page reserve for GFP_ATOMIC is located here, it's not
> > mandatory.
> > 
> > What MIGRATE_RESERVE can help is high-order atomic allocations used by
> > some network drivers (a wireless one is what led to MIGRATE_RESERVE). As
> > they are high-order allocations, they would be returned to the buddy
> > allocator anyway.
> 
> yup.
> my patch is meaningless for high order allocation because high order allocation
> don't use pcp.
> 
> > What your patch may help is the situation where the system is under intense
> > memory pressure, is dipping routinely into the lowmem reserves and mixing
> > with high-order atomic allocations. This seems a bit extreme.
> 
> not so extreame.
> 
> The linux page reclaim can't process in interrupt context.
> Sl network subsystem and driver often use MIGRATE_RESERVE memory although
> system have many reclaimable memory.
> 

Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
to be happening, it implies that either memory is under intense pressure and
free pages are often below watermarks due to interrupt contexts or they are
frequently allocating high-order pages in interrupt context. Normal order-0
allocations should be getting satisfied from elsewhere as, if the free page
counts are low, they would be direct reclaiming and that will likely be
outside of the MIGRATE_RESERVE areas.

> At that time, any task in process context can use high order allocation.
> 
> > > otherwise, the system have unnecessary memory starvation risk
> > > because other cpu can't use this emergency pages.
> > > 
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > CC: Mel Gorman <mel@csn.ul.ie>
> > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > 
> > 
> > This patch seems functionally sound but as Christoph points out, this
> > adds another branch to the fast path. Now, I ran some tests and those that
> > completed didn't show any problems but adding branches in the fast path can
> > eventually lead to hard-to-detect performance problems.
> > 
> > Do you have a situation in mind that this patch fixes up?
> 
> Ah, sorry for my description is too poor.
> This isn't real workload issue, it is jsut 
> 
> Actually, I plan to rework to pcp because following pcp list searching 
> in fast path is NOT fast.
> 
> In general, list searching often cause L1 cache miss, therefore it shouldn't be
> used in fast path.
> 
> 
> static struct page *buffered_rmqueue(struct zone *preferred_zone,
>                         struct zone *zone, int order, gfp_t gfp_flags)
> {
> (snip)
>                 /* Find a page of the appropriate migrate type */
>                 if (cold) {
>                         list_for_each_entry_reverse(page, &pcp->list, lru)
>                                 if (page_private(page) == migratetype)
>                                         break;
>                 } else {
>                         list_for_each_entry(page, &pcp->list, lru)
>                                 if (page_private(page) == migratetype)
>                                         break;
>                 }
> 
> Therefore, I'd like to make per migratetype pcp list.

That was actually how it was originally implemented and later moved to a list
search. It got shot down on the grounds that it increased the size of a per-cpu
structure.
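
(For a sense of scale: going from one list to three adds two extra struct
list_head per per_cpu_pages, i.e. 2 * 2 * sizeof(void *) = 32 bytes per pcp
on 64-bit, multiplied by every zone and, with the old statically-sized
arrays, by NR_CPUS rather than by the number of CPUs actually present.
Rough numbers only.)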

> However, MIGRATETYPE_RESEVE list isn't useful because caller never need reserve type.
> it is only internal attribute.
> 
> So I thought "dropping reserve type page in pcp" patch is useful although it is sololy used.
> Then, I posted it sololy for hear other developer opinion.
> 
> Actually, current pcp is NOT fast, therefore the discussion of the 
> number of branches isn't meaningful.
> the discussion of the number of branches is only meaningful when the fast path can
> process at N*branches level time, but current pcp is more slow.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 10:42     ` Mel Gorman
@ 2008-11-07 11:02       ` KAMEZAWA Hiroyuki
  2008-11-07 11:27         ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-07 11:02 UTC (permalink / raw)
  To: Mel Gorman; +Cc: KOSAKI Motohiro, LKML, Christoph Lameter, linux-mm

On Fri, 7 Nov 2008 10:42:42 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 6 Nov 2008 16:46:45 +0000
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > > > otherwise, the system have unnecessary memory starvation risk
> > > > because other cpu can't use this emergency pages.
> > > > 
> > > > 
> > > > 
> > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > CC: Mel Gorman <mel@csn.ul.ie>
> > > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > > 
> > > 
> > > This patch seems functionally sound but as Christoph points out, this
> > > adds another branch to the fast path. Now, I ran some tests and those that
> > > completed didn't show any problems but adding branches in the fast path can
> > > eventually lead to hard-to-detect performance problems.
> > > 
> > dividing pcp-list into MIGRATE_TYPES is bad ?
> 
> I do not understand what your question is.
> 
Hmm. like this.

	 pcp = &zone_pcp(zone, get_cpu())->pcp[migrate_type];

	
Thanks,
-Kame


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 11:02       ` KAMEZAWA Hiroyuki
@ 2008-11-07 11:27         ` Mel Gorman
  2008-11-07 18:45           ` Christoph Lameter
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2008-11-07 11:27 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, LKML, Christoph Lameter, linux-mm

On Fri, Nov 07, 2008 at 08:02:51PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 7 Nov 2008 10:42:42 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Fri, Nov 07, 2008 at 10:42:24AM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 6 Nov 2008 16:46:45 +0000
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > > > otherwise, the system have unnecessary memory starvation risk
> > > > > because other cpu can't use this emergency pages.
> > > > > 
> > > > > 
> > > > > 
> > > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > CC: Mel Gorman <mel@csn.ul.ie>
> > > > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > > > 
> > > > 
> > > > This patch seems functionally sound but as Christoph points out, this
> > > > adds another branch to the fast path. Now, I ran some tests and those that
> > > > completed didn't show any problems but adding branches in the fast path can
> > > > eventually lead to hard-to-detect performance problems.
> > > > 
> > > dividing pcp-list into MIGRATE_TYPES is bad ?
> > 
> > I do not understand what your question is.
> > 
> Hmm. like this.
> 
> 	 pcp = &zone_pcp(zone, get_cpu())->pcp[migrate_type];
> 

Oh, do you mean splitting the list instead of searching? This is how it was
originally implemented, and it was shot down on the grounds that it increased
the size of a per-cpu structure.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 11:27         ` Mel Gorman
@ 2008-11-07 18:45           ` Christoph Lameter
  2008-11-11  8:13             ` KOSAKI Motohiro
  2008-12-05 15:40             ` Mel Gorman
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Lameter @ 2008-11-07 18:45 UTC (permalink / raw)
  To: Mel Gorman; +Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, LKML, linux-mm

On Fri, 7 Nov 2008, Mel Gorman wrote:

> Oh, do you mean splitting the list instead of searching? This is how it was
> originally implement and shot down on the grounds it increased the size of
> a per-cpu structure.

The situation may be better with the cpu_alloc stuff. The big pcp array in
struct zone for all possible processors will be gone and thus the memory
requirements will be less.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 18:45           ` Christoph Lameter
@ 2008-11-11  8:13             ` KOSAKI Motohiro
  2008-12-05 15:40             ` Mel Gorman
  1 sibling, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-11-11  8:13 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, Mel Gorman, KAMEZAWA Hiroyuki, LKML, linux-mm

> > Oh, do you mean splitting the list instead of searching? This is how it was
> > originally implement and shot down on the grounds it increased the size of
> > a per-cpu structure.
> 
> The situation may be better with the cpu_alloc stuff. The big pcp array in
> struct zone for all possible processors will be gone and thus the memory
> requirements will be less.

Yup, those are very nice patches!




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 10:47     ` Mel Gorman
@ 2008-11-11 13:39       ` KOSAKI Motohiro
  2008-11-11 14:42         ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-11-11 13:39 UTC (permalink / raw)
  To: Mel Gorman; +Cc: kosaki.motohiro, LKML, Christoph Lameter, linux-mm

> > > What your patch may help is the situation where the system is under intense
> > > memory pressure, is dipping routinely into the lowmem reserves and mixing
> > > with high-order atomic allocations. This seems a bit extreme.
> > 
> > not so extreame.
> > 
> > The linux page reclaim can't process in interrupt context.
> > Sl network subsystem and driver often use MIGRATE_RESERVE memory although
> > system have many reclaimable memory.
> > 
> 
> Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
> to be happening, it implies that either memory is under intense pressure and
> free pages are often below watermarks due to interrupt contexts or they are
> frequently allocating high-order pages in interrupt context. Normal order-0
> allocations should be getting satisified from elsewhere as if the free page
> counts are low, they would be direct reclaiming and that will likely be
> outside of the MIGRATE_RESERVE areas.

If I insert a printk() on MIGRATE_RESERVE allocations, I can easily observe
MIGRATE_RESERVE pages being allocated even though no heavy workload is running.
But that isn't my point.
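
(The instrumentation is nothing clever, just a one-off printk where the
allocator falls back to the reserve pageblocks. The exact spot depends on
the kernel version, but roughly, at the end of __rmqueue_fallback():

	/* illustrative debug hack, not part of the patch */
	printk(KERN_DEBUG "allocating from MIGRATE_RESERVE: order=%d comm=%s\n",
			order, current->comm);
	return __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
)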

OK, I guess my patch description was too poor (and a bit pointless),
so let me retry it.

(1) As a general principle, the system should prefer avoiding OOM over
    performance when a memory shortage happens.
    MIGRATE_RESERVE directly indicates that a memory shortage has happened,
    and pcp caching can prevent other CPUs from allocating those pages.
(2) MIGRATE_RESERVE is never searched for by buffered_rmqueue() because
    allocflags_to_migratetype() never returns MIGRATE_RESERVE.
    So it doesn't work as a cache;
    IOW, it doesn't help to increase performance.
(3) If the system passes MIGRATE_RESERVE pages to free_hot_cold_page() continuously,
    pcp queueing can reduce the number of times zone->lock is grabbed.
    However, that is rare, because MIGRATE_RESERVE is emergency memory
    and is often used for interrupt-context processing;
    continuous emergency memory allocation in interrupt context isn't very sane.

Then, not queueing MIGRATE_RESERVE pages doesn't cause a performance regression,
and it can (a bit) increase reliability, so I think the merit far outweighs the demerit.




> > static struct page *buffered_rmqueue(struct zone *preferred_zone,
> >                         struct zone *zone, int order, gfp_t gfp_flags)
> > {
> > (snip)
> >                 /* Find a page of the appropriate migrate type */
> >                 if (cold) {
> >                         list_for_each_entry_reverse(page, &pcp->list, lru)
> >                                 if (page_private(page) == migratetype)
> >                                         break;
> >                 } else {
> >                         list_for_each_entry(page, &pcp->list, lru)
> >                                 if (page_private(page) == migratetype)
> >                                         break;
> >                 }
> > 
> > Therefore, I'd like to make per migratetype pcp list.
> 
> That was actually how it was originally implemented and later moved to a list
> search. It got shot down on the grounds a per-cpu structure increased in size.

Yup, I believe your decision was right at that time.
However, I think the conditions have changed (or can be changed).

 (1) The legacy pcp implementation is deeply tied to the size of struct zone,
     and blowing up the struct zone size causes performance regressions
     because cache misses increase.
     However, that is solved by Christoph's cpu_alloc patches.

 (2) The legacy pcp doesn't restrict the total number of pages,
     so adding lists directly increases the number of pages in the pcp.
     That can cause OOM problems on large NUMA environments.
     However, I think we can implement a restriction on the total number of
     pages (a rough sketch follows below).
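
     A rough sketch of what I mean (all names here are illustrative, not
     from a real patch): keep one count and high watermark shared by all
     the per-type lists, so splitting the list doesn't multiply the number
     of cached pages:

	list_add(&page->lru, &pcp->lists[migratetype]);
	pcp->count++;				/* one count shared by all lists */
	if (pcp->count >= pcp->high) {		/* total pcp pages stay bounded */
		drain_pcp_batch(zone, pcp);	/* hypothetical bulk-free helper */
		pcp->count -= pcp->batch;
	}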




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-11 13:39       ` KOSAKI Motohiro
@ 2008-11-11 14:42         ` Mel Gorman
  2008-11-14  4:31           ` KOSAKI Motohiro
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2008-11-11 14:42 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: LKML, Christoph Lameter, linux-mm

On Tue, Nov 11, 2008 at 10:39:40PM +0900, KOSAKI Motohiro wrote:
> > > > What your patch may help is the situation where the system is under intense
> > > > memory pressure, is dipping routinely into the lowmem reserves and mixing
> > > > with high-order atomic allocations. This seems a bit extreme.
> > > 
> > > not so extreame.
> > > 
> > > The linux page reclaim can't process in interrupt context.
> > > Sl network subsystem and driver often use MIGRATE_RESERVE memory although
> > > system have many reclaimable memory.
> > > 
> > 
> > Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
> > to be happening, it implies that either memory is under intense pressure and
> > free pages are often below watermarks due to interrupt contexts or they are
> > frequently allocating high-order pages in interrupt context. Normal order-0
> > allocations should be getting satisified from elsewhere as if the free page
> > counts are low, they would be direct reclaiming and that will likely be
> > outside of the MIGRATE_RESERVE areas.
> 
> if inserting printk() in MIGRATE_RESERVE, I can observe MIGRATE_RESERVE
> page alloc easily although heavy workload don't run.
> but, there aren't my point.
> 

That's interesting. What is the size of a pageblock on your system and
is min_free_kbytes aligned to that value? If it's not aligned, it would
explain why MIGRATE_RESERVE pages are being used before the watermarks
are hit.

> ok, I guess my patch description was too poor (and a bit pointless).
> So, I retry it.
> 
> (1) in general principal, the system should effort to avoid oom rather than
>     performance if memory shortage happend.
>     MIGRATE_RESERVE directly indicate memory shortage happend.
>     and pcp caching can prevent another cpu allocation.

MIGRATE_RESERVE does not directly indicate a memory shortage has
occurred. Bear in mind that a number of pageblocks are marked
MIGRATE_RESERVE based on the value of the watermarks. In general, the
minimum number of pages kept free will be in the MIGRATE_RESERVE blocks
but it is not mandatory.

> (2) MIGRATE_RESERVE is never searched by buffered_rmqueue() because 
>     allocflags_to_migratetype() never return MIGRATE_RESERVE.
>     it doesn't work as cache.
>     IOW, it don't help to increase performance.

This is true. If MIGRATE_RESERVE pages are routinely being used and placed
on the pcp lists, the lists are not being used to their full potential
and your patch would make sense.

> (3) if the system pass MIGRATE_RESERVE to free_hot_cold_page() continously,
>     pcp queueing can reduce the number of grabing zone->lock.
>     However, it is rate. because MIGRATE_RESERVE is emergency memory,

Again, MIGRATE_RESERVE is not emergency memory.

>     and it is often used interupt context processing.
>     continuous emergency memory allocation in interrupt context isn't so sane.
> 
> Then, unqueueing MIGRATE_RESERVE page doesn't cause performance degression
> and, it can (a bit) increase realibility and I think merit is much over demerit.
> 

I'm now inclined to agree if you have shown that MIGRATE_RESERVE pages are
routinely ending up on the PCP lists.

> 
> 
> 
> > > static struct page *buffered_rmqueue(struct zone *preferred_zone,
> > >                         struct zone *zone, int order, gfp_t gfp_flags)
> > > {
> > > (snip)
> > >                 /* Find a page of the appropriate migrate type */
> > >                 if (cold) {
> > >                         list_for_each_entry_reverse(page, &pcp->list, lru)
> > >                                 if (page_private(page) == migratetype)
> > >                                         break;
> > >                 } else {
> > >                         list_for_each_entry(page, &pcp->list, lru)
> > >                                 if (page_private(page) == migratetype)
> > >                                         break;
> > >                 }
> > > 
> > > Therefore, I'd like to make per migratetype pcp list.
> > 
> > That was actually how it was originally implemented and later moved to a list
> > search. It got shot down on the grounds a per-cpu structure increased in size.
> 
> Yup, I believe at that time your decision is right.
> However, I think the condision was changed (or to be able to change).
> 
>  (1) legacy pcp implementation deeply relate to struct zone size.
>      and, to blow up struct zone size cause performance degression
>      because cache miss increasing.
>      However, it solved cristoph's cpu-alloc patch
> 

Indeed.

>  (2) legacy pcp doesn't have total number of pages restriction.
>      So, increasing lists directly cause number of pages in pcp.
>      it can cause oom problem on large numa environment.
>      However, I think we can implement total number of pages restriction.
> 

Yes, although knowing the right size for each of the lists, so that the
overall PCP lists are not huge, is a tricky one.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-11 14:42         ` Mel Gorman
@ 2008-11-14  4:31           ` KOSAKI Motohiro
  0 siblings, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-11-14  4:31 UTC (permalink / raw)
  To: Mel Gorman; +Cc: kosaki.motohiro, LKML, Christoph Lameter, linux-mm

Hi

Sorry for the late response.
Honestly, I have many business trips this month ;-)


> > > > > What your patch may help is the situation where the system is under intense
> > > > > memory pressure, is dipping routinely into the lowmem reserves and mixing
> > > > > with high-order atomic allocations. This seems a bit extreme.
> > > > 
> > > > not so extreame.
> > > > 
> > > > The linux page reclaim can't process in interrupt context.
> > > > Sl network subsystem and driver often use MIGRATE_RESERVE memory although
> > > > system have many reclaimable memory.
> > > > 
> > > 
> > > Why are they often using MIGRATE_RESERVE, have you confirmed that? For that
> > > to be happening, it implies that either memory is under intense pressure and
> > > free pages are often below watermarks due to interrupt contexts or they are
> > > frequently allocating high-order pages in interrupt context. Normal order-0
> > > allocations should be getting satisified from elsewhere as if the free page
> > > counts are low, they would be direct reclaiming and that will likely be
> > > outside of the MIGRATE_RESERVE areas.
> > 
> > if inserting printk() in MIGRATE_RESERVE, I can observe MIGRATE_RESERVE
> > page alloc easily although heavy workload don't run.
> > but, there aren't my point.
> > 
> 
> That's interesting. What is the size of a pageblock on your system and
> is min_free_kbytes aligned to that value? If it's not aligned, it would
> explain why MIGRATE_RESERVE pages are being used before the watermarks
> are hit.

Hmm, I don't have that data yet.
OK, I should investigate more.


> > ok, I guess my patch description was too poor (and a bit pointless).
> > So, I retry it.
> > 
> > (1) in general principal, the system should effort to avoid oom rather than
> >     performance if memory shortage happend.
> >     MIGRATE_RESERVE directly indicate memory shortage happend.
> >     and pcp caching can prevent another cpu allocation.
> 
> MIGRATE_RESERVE does not directly indicate a memory shortage has
> occured. Bear in mind that a number of pageblocks are marked
> MIGRATE_RESERVE based on the value of the watermarks. In general, the
> minimum number of pages kept free will be in the MIGRATE_RESERVE blocks
> but it is not mandatory.
> 
> > (2) MIGRATE_RESERVE is never searched by buffered_rmqueue() because 
> >     allocflags_to_migratetype() never return MIGRATE_RESERVE.
> >     it doesn't work as cache.
> >     IOW, it don't help to increase performance.
> 
> This is true. If MIGRATE_RESERVE pages are routinely being used and placed
> on the pcp lists, the lists are not being used to their full potential
> and your patch would make sense.
> 
> > (3) if the system pass MIGRATE_RESERVE to free_hot_cold_page() continously,
> >     pcp queueing can reduce the number of grabing zone->lock.
> >     However, it is rate. because MIGRATE_RESERVE is emergency memory,
> 
> Again, MIGRATE_RESERVE is not emergency memory.
> 
> >     and it is often used interupt context processing.
> >     continuous emergency memory allocation in interrupt context isn't so sane.
> > 
> > Then, unqueueing MIGRATE_RESERVE page doesn't cause performance degression
> > and, it can (a bit) increase realibility and I think merit is much over demerit.
> > 
> 
> I'm now inclined to agree if you have shown that MIGRATE_RESERVE pages are
> routinely ending up on the PCP lists.

Thanks!

So, now I have two todo items:
  - I should measure performance.
  - I should investigate why MIGRATE_RESERVE is used on my machine.

I expect to finish by the end of next week.

> > Yup, I believe at that time your decision is right.
> > However, I think the condision was changed (or to be able to change).
> > 
> >  (1) legacy pcp implementation deeply relate to struct zone size.
> >      and, to blow up struct zone size cause performance degression
> >      because cache miss increasing.
> >      However, it solved cristoph's cpu-alloc patch
> 
> Indeed.
> 
> >  (2) legacy pcp doesn't have total number of pages restriction.
> >      So, increasing lists directly cause number of pages in pcp.
> >      it can cause oom problem on large numa environment.
> >      However, I think we can implement total number of pages restriction.
> > 
> 
> Yes although knowing what the right size for each of the lists should be
> so that the overall PCP lists are not huge is a tricky one.

Thank you for the good advice.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-11-07 18:45           ` Christoph Lameter
  2008-11-11  8:13             ` KOSAKI Motohiro
@ 2008-12-05 15:40             ` Mel Gorman
  2008-12-07  8:22               ` KOSAKI Motohiro
  1 sibling, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2008-12-05 15:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, LKML, linux-mm

On Fri, Nov 07, 2008 at 12:45:20PM -0600, Christoph Lameter wrote:
> On Fri, 7 Nov 2008, Mel Gorman wrote:
> 
> > Oh, do you mean splitting the list instead of searching? This is how it was
> > originally implement and shot down on the grounds it increased the size of
> > a per-cpu structure.
> 
> The situation may be better with the cpu_alloc stuff. The big pcp array in
> struct zone for all possible processors will be gone and thus the memory
> requirements will be less.
> 

I implemented this ages ago, ran some tests and then got distracted by
something shiny and forgot to post. The patch below replaces the per-cpu
list search with one list per migrate type of interest. It reduces the size of
a vmlinux built on a stripped-down config for x86 by 268 bytes which is
nothing to write home about but better than a poke in the eye.

The tests I used were kernbench, aim9 (page_test and brk_test mainly) and
tbench figuring these would exercise the page allocator path a bit. The
tests were on 2.6.28-rc5 but the patch applies to latest git as well.
Test results follow, patch is at the bottom.

vmlinux size comparison
    text	   data	    bss	    dec	    hex	filename
 3871764	1542612	3465216	8879592	 877de8	vmlinux-vanilla
 3871496	1542612	3465216	8879324	 877cdc	vmlinux-splitpcp
 
x86_64 BladeCenter LS20 4xCPUS 1GB RAM
	kernbench	+1.10% outside noise
	page_test	-5.50%
	brk_test	+4.10%
	tbench-1	+3.44% within noise
	tbench-cpus	-0.75% well within noise
	tbench-2xcpus	-0.37% well within noise
	highalloc	fine

ppc64 PPC970 4xCPUS 7GB RAM
	kernbench	-0.23% within noise
	page_test	+0.85%
	brk_test	-1.16%
	tbench-1	-1.35% just outside deviation
	tbench-cpus	+3.97% just outside noise
	tbench-2xcpus	-0.79% within noise
	highalloc	fine

x86 8xCPUs 16GB RAM
	kernbench	+0.07% well within noise
	page_test	+1.44%
	brk_test	+6.32%
	tbench-1	+1.59% within noise
	tbench-cpus	+6.29% well outside noise
	tbench-2xcpus	-0.18% well within noise

x86_64 eServer 336 4xCPUS 7GB RAM
	kernbench	+0.29% just outside noise
	page_test	+0.48%
	brk_test	-0.87%
	tbench-1	+0.88% just outside noise
	tbench-cpus	+1.61% well outside noise
	tbench-2xcpus	+0.87% within noise

x86_64 System X 3950 32xCPUs 48GB RAM
	kernbench	+0.40% within noise
	page_test	+3.36%
	brk_test	-0.63%
	tbench-1	-0.69% well within noise
	tbench-cpus	+2.02% just outside noise
	tbench-2xcpus	+1.93% just outside noise

ppc64 System p5 570 4xCPUs 4GB RAM
	kernbench	-0.09% within noise
	page_test	-1.34%
	brk_test	-1.03%
	tbench-1	-1.09% within noise
	tbench-cpus	-0.60% just outside noise
	tbench-2xcpus	-0.99% just outside noise

ppc64 System p5 575 128xCPUs 48GB RAM
	kernbench	+4.10% outside noise
	page_test	-0.11%
	brk_test	-4.22%
	tbench-1	+3.95% within noise
	tbench-cpus	+6.77% far outside noise
	tbench-2xcpus	+1.37% outside noise

====== CUT HERE ======
From: Mel Gorman <mel@csn.ul.ie>
Subject: [RFC] Split per-cpu list into one-list-per-migrate-type

Currently the per-cpu page allocator searches the PCP list for pages of the
correct migrate-type to reduce the possibility of pages being inappropriately
placed from a fragmentation perspective. This search is potentially expensive
in a fast-path and undesirable. Splitting the per-cpu list into multiple lists
increases the size of a per-cpu structure and this was potentially a major
problem at the time the search was introduced. This problem has been
mitigated as now only the necessary number of structures is allocated for the
running system.

This patch replaces a list search in the per-cpu allocator with one list
per migrate type that should be in use by the per-cpu allocator - namely
unmovable, reclaimable and movable.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 include/linux/mmzone.h |    7 +++-
 mm/page_alloc.c        |   79 ++++++++++++++++++++++++++++-------------------
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 35a7b5e..19661e4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -40,6 +40,7 @@
 #define MIGRATE_MOVABLE       2
 #define MIGRATE_RESERVE       3
 #define MIGRATE_ISOLATE       4 /* can't allocate from here */
+#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
 #define MIGRATE_TYPES         5
 
 #define for_each_migratetype_order(order, type) \
@@ -170,7 +171,11 @@ struct per_cpu_pages {
 	int count;		/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
-	struct list_head list;	/* the list of pages */
+	/*
+	 * the lists of pages, one per migrate type stored on the pcp-lists
+	 * which are unmovable, reclaimable and movable
+	 */
+	struct list_head lists[MIGRATE_PCPTYPES];
 };
 
 struct per_cpu_pageset {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8ac014..4433b7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -482,7 +482,7 @@ static inline int free_pages_check(struct page *page)
 }
 
 /*
- * Frees a list of pages. 
+ * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
  * count is the number of pages to free.
  *
@@ -492,20 +492,28 @@ static inline int free_pages_check(struct page *page)
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
  */
-static void free_pages_bulk(struct zone *zone, int count,
-					struct list_head *list, int order)
+static void free_pcppages_bulk(struct zone *zone, int count,
+					struct per_cpu_pages *pcp)
 {
+	int migratetype = 0;
+
 	spin_lock(&zone->lock);
 	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
 	zone->pages_scanned = 0;
 	while (count--) {
 		struct page *page;
+		struct list_head *list;
+
+		/* Remove pages from lists in a round-robin fashion */
+		do {
+			migratetype = (migratetype + 1) % MIGRATE_PCPTYPES;
+			list = &pcp->lists[migratetype];
+		} while (list_empty(list));
 
-		VM_BUG_ON(list_empty(list));
 		page = list_entry(list->prev, struct page, lru);
 		/* have to delete it as __free_one_page list manipulates */
 		list_del(&page->lru);
-		__free_one_page(page, zone, order);
+		__free_one_page(page, zone, 0);
 	}
 	spin_unlock(&zone->lock);
 }
@@ -892,7 +900,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 		to_drain = pcp->batch;
 	else
 		to_drain = pcp->count;
-	free_pages_bulk(zone, to_drain, &pcp->list, 0);
+	free_pcppages_bulk(zone, to_drain, pcp);
 	pcp->count -= to_drain;
 	local_irq_restore(flags);
 }
@@ -921,7 +929,7 @@ static void drain_pages(unsigned int cpu)
 
 		pcp = &pset->pcp;
 		local_irq_save(flags);
-		free_pages_bulk(zone, pcp->count, &pcp->list, 0);
+		free_pcppages_bulk(zone, pcp->count, pcp);
 		pcp->count = 0;
 		local_irq_restore(flags);
 	}
@@ -987,6 +995,7 @@ static void free_hot_cold_page(struct page *page, int cold)
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
+	int migratetype;
 
 	if (PageAnon(page))
 		page->mapping = NULL;
@@ -1003,16 +1012,32 @@ static void free_hot_cold_page(struct page *page, int cold)
 	pcp = &zone_pcp(zone, get_cpu())->pcp;
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
+
+	/*
+	 * Only store unmovable, reclaimable and movable on pcp lists.
+	 * The one concern is that if the minimum number of free pages is not
+	 * aligned to a pageblock-boundary that allocations/frees from the
+	 * MIGRATE_RESERVE pageblocks may call free_one_page() excessively
+	 */
+	migratetype = get_pageblock_migratetype(page);
+	if (migratetype >= MIGRATE_PCPTYPES) {
+		free_one_page(zone, page, 0);
+		goto out;
+	}
+
 	if (cold)
-		list_add_tail(&page->lru, &pcp->list);
+		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
-		list_add(&page->lru, &pcp->list);
-	set_page_private(page, get_pageblock_migratetype(page));
+		list_add(&page->lru, &pcp->lists[migratetype]);
+	set_page_private(page, migratetype);
 	pcp->count++;
+
 	if (pcp->count >= pcp->high) {
-		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
+		free_pcppages_bulk(zone, pcp->batch, pcp);
 		pcp->count -= pcp->batch;
 	}
+
+out:
 	local_irq_restore(flags);
 	put_cpu();
 }
@@ -1066,31 +1091,21 @@ again:
 
 		pcp = &zone_pcp(zone, cpu)->pcp;
 		local_irq_save(flags);
-		if (!pcp->count) {
-			pcp->count = rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list, migratetype);
-			if (unlikely(!pcp->count))
+		if (list_empty(&pcp->lists[migratetype])) {
+			pcp->count += rmqueue_bulk(zone, 0, pcp->batch,
+				&pcp->lists[migratetype], migratetype);
+			if (unlikely(list_empty(&pcp->lists[migratetype])))
 				goto failed;
 		}
 
-		/* Find a page of the appropriate migrate type */
+
 		if (cold) {
-			list_for_each_entry_reverse(page, &pcp->list, lru)
-				if (page_private(page) == migratetype)
-					break;
+			page = list_entry(pcp->lists[migratetype].prev,
+							struct page, lru);
 		} else {
-			list_for_each_entry(page, &pcp->list, lru)
-				if (page_private(page) == migratetype)
-					break;
-		}
-
-		/* Allocate more to the pcp list if necessary */
-		if (unlikely(&page->lru == &pcp->list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list, migratetype);
-			page = list_entry(pcp->list.next, struct page, lru);
+			page = list_entry(pcp->lists[migratetype].next,
+							struct page, lru);
 		}
-
 		list_del(&page->lru);
 		pcp->count--;
 	} else {
@@ -2705,6 +2720,7 @@ static int zone_batchsize(struct zone *zone)
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
+	int migratetype;
 
 	memset(p, 0, sizeof(*p));
 
@@ -2712,7 +2728,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 	pcp->count = 0;
 	pcp->high = 6 * batch;
 	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
+	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
+		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
 /*

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-12-05 15:40             ` Mel Gorman
@ 2008-12-07  8:22               ` KOSAKI Motohiro
  2008-12-11 15:39                 ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2008-12-07  8:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Christoph Lameter, KAMEZAWA Hiroyuki, LKML, linux-mm

> 
> ====== CUT HERE ======
> From: Mel Gorman <mel@csn.ul.ie>
> Subject: [RFC] Split per-cpu list into one-list-per-migrate-type
> 
> Currently the per-cpu page allocator searches the PCP list for pages of the
> correct migrate-type to reduce the possibility of pages being inappropriate
> placed from a fragmentation perspective. This search is potentially expensive
> in a fast-path and undesirable. Splitting the per-cpu list into multiple lists
> increases the size of a per-cpu structure and this was potentially a major
> problem at the time the search was introduced. These problem has been
> mitigated as now only the necessary number of structures is allocated for the
> running system.
> 
> This patch replaces a list search in the per-cpu allocator with one list
> per migrate type that should be in use by the per-cpu allocator - namely
> unmovable, reclaimable and movable.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Great.

This patch works well on my box too,
and my review didn't find any bugs.

Thanks very much.

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp
  2008-12-07  8:22               ` KOSAKI Motohiro
@ 2008-12-11 15:39                 ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2008-12-11 15:39 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Lameter, KAMEZAWA Hiroyuki, LKML, linux-mm

On Sun, Dec 07, 2008 at 05:22:48PM +0900, KOSAKI Motohiro wrote:
> > 
> > ====== CUT HERE ======
> > From: Mel Gorman <mel@csn.ul.ie>
> > Subject: [RFC] Split per-cpu list into one-list-per-migrate-type
> > 
> > Currently the per-cpu page allocator searches the PCP list for pages of the
> > correct migrate-type to reduce the possibility of pages being inappropriate
> > placed from a fragmentation perspective. This search is potentially expensive
> > in a fast-path and undesirable. Splitting the per-cpu list into multiple lists
> > increases the size of a per-cpu structure and this was potentially a major
> > problem at the time the search was introduced. These problem has been
> > mitigated as now only the necessary number of structures is allocated for the
> > running system.
> > 
> > This patch replaces a list search in the per-cpu allocator with one list
> > per migrate type that should be in use by the per-cpu allocator - namely
> > unmovable, reclaimable and movable.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> 
> Great.
> 
> this patch works well on my box too.
> and my review didn't find any bug.
> 
> very thanks.
> 
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 

Thanks very much for the review. I guess in principle I should stick
with this patch then and go with another revision so. Are there any
concerns about the increase in size of a per-cpu structure or are we
confident that it'll be kept to a minimum with the cpu_alloc stuff?

Similarly, are there suggestions on how to prove more conclusively that
this is a performance win? I saw between 1% and 2% on tbench on some
machines but not all, so preferably we'd be more confident. Intuitively,
it makes sense to me that the patch would be faster, but I'd like to prove
it for sure if possible.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-12-11 15:40 UTC | newest]

Thread overview: 17+ messages
2008-11-06  0:16 [RFC][PATCH] mm: the page of MIGRATE_RESERVE don't insert into pcp KOSAKI Motohiro
2008-11-06 14:56 ` Christoph Lameter
2008-11-06 16:46 ` Mel Gorman
2008-11-07  1:42   ` KAMEZAWA Hiroyuki
2008-11-07 10:42     ` Mel Gorman
2008-11-07 11:02       ` KAMEZAWA Hiroyuki
2008-11-07 11:27         ` Mel Gorman
2008-11-07 18:45           ` Christoph Lameter
2008-11-11  8:13             ` KOSAKI Motohiro
2008-12-05 15:40             ` Mel Gorman
2008-12-07  8:22               ` KOSAKI Motohiro
2008-12-11 15:39                 ` Mel Gorman
2008-11-07  4:37   ` KOSAKI Motohiro
2008-11-07 10:47     ` Mel Gorman
2008-11-11 13:39       ` KOSAKI Motohiro
2008-11-11 14:42         ` Mel Gorman
2008-11-14  4:31           ` KOSAKI Motohiro
