LKML Archive on lore.kernel.org
* [PATCH 0/4] Lumpy Reclaim V3
@ 2006-12-06 16:59 Andy Whitcroft
2006-12-06 16:59 ` [PATCH 1/4] lumpy reclaim v2 Andy Whitcroft
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-06 16:59 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
This is a repost of the lumpy reclaim patch set. This is
basically unchanged from the last post, other than being rebased
to 2.6.19-rc2-mm2. This has passed basic stress testing on a range
of machines here.
[Sorry for the delay reposting, I had a test failure and needed
to confirm it was not due to lumpy before posting.]
As before, I have left the changes broken out for the time being
so as to make it clear what is the original and what is my fault.
I would expect to roll this up before acceptance.
-apw
=== 8< ===
Lumpy Reclaim (V3)
When we are out of memory of a suitable size we enter reclaim.
The current reclaim algorithm targets pages in LRU order, which
is great for fairness but highly unsuitable if you desire pages at
higher orders. To get pages of higher order we must shoot down a
very high proportion of memory; >95% in a lot of cases.
This patch set adds a lumpy reclaim algorithm to the allocator.
It targets groups of pages at the specified order anchored at the
end of the active and inactive lists. This encourages groups of
pages at the requested orders to move from active to inactive,
and active to free lists. This behaviour is only triggered out of
direct reclaim when higher order pages have been requested.
This patch set is particularly effective when utilised with
an anti-fragmentation scheme which groups pages of similar
reclaimability together.
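As a rough sketch of the idea (condensed and illustrative only; tag_page
and order here stand for the page pulled off the LRU tail and the
requested allocation order, and the real code is in the patches below):

	/*
	 * Condensed sketch, not the patch itself.  For each tag page taken
	 * from the tail of an LRU list, also try to isolate the other pages
	 * in the order-aligned block of 1 << order pages surrounding it, so
	 * that whole aligned areas tend to move towards the free lists
	 * together.
	 */
	unsigned long tag_pfn = page_to_pfn(tag_page);
	unsigned long pfn     = tag_pfn & ~((1UL << order) - 1);
	unsigned long end_pfn = pfn + (1UL << order);

	for (; pfn < end_pfn; pfn++) {
		if (pfn == tag_pfn)
			continue;	/* the tag page is already isolated */
		if (!pfn_valid(pfn))
			break;		/* no memmap behind this pfn */
		/* isolate the page at pfn if its LRU state matches tag_page */
	}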
This patch set (against 2.6.19-rc6-mm2) is based on Peter Zijlstra's
lumpy reclaim V2 patch which forms the foundation. It comprises
the following patches:
lumpy-reclaim-v2 -- Peter Zijlstra's lumpy reclaim prototype,
lumpy-cleanup-a-missplaced-comment-and-simplify-some-code --
cleanups to move a comment back to where it came from, to make
the area edge selection more comprehensible, and to clean up
the switch coding style to match the consensus in mm/*.c,
lumpy-ensure-we-respect-zone-boundaries -- bug fix to ensure we do
not attempt to take pages from adjacent zones, and
lumpy-take-the-other-active-inactive-pages-in-the-area -- patch to
increase aggression over the targeted order.
Testing of this patch set under high fragmentation, high allocation
load conditions shows significantly improved high order reclaim
rates over a standard kernel. The stack here is now within 5%
of the best case linear-reclaim figures.
It would be interesting to see if this setup is also successful in
reducing the order-2 allocation failures you have been seeing with
jumbo frames.
Please consider for -mm.
-apw
(lumpy-v3r7)
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] lumpy reclaim v2
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
@ 2006-12-06 16:59 ` Andy Whitcroft
2006-12-15 4:57 ` Andrew Morton
2006-12-06 17:00 ` [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code Andy Whitcroft
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-06 16:59 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
lumpy reclaim v2
When trying to reclaim pages for a higher order allocation, make reclaim
try to move lumps of pages (fitting the requested order) about, instead
of single pages. This should significantly reduce the number of
reclaimed pages for higher order allocations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/fs/buffer.c b/fs/buffer.c
index c953c15..2f8b073 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -374,7 +374,7 @@ static void free_more_memory(void)
for_each_online_pgdat(pgdat) {
zones = pgdat->node_zonelists[gfp_zone(GFP_NOFS)].zones;
if (*zones)
- try_to_free_pages(zones, GFP_NOFS);
+ try_to_free_pages(zones, 0, GFP_NOFS);
}
}
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 439f9a8..5c26736 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -187,7 +187,7 @@ extern int rotate_reclaimable_page(struct page *page);
extern void swap_setup(void);
/* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zone **, gfp_t);
+extern unsigned long try_to_free_pages(struct zone **, int, gfp_t);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7938e46..78801c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1371,7 +1371,7 @@ nofail_alloc:
reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;
- did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);
+ did_some_progress = try_to_free_pages(zonelist->zones, order, gfp_mask);
p->reclaim_state = NULL;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 80f4444..0f2d961 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -67,6 +67,8 @@ struct scan_control {
int swappiness;
int all_unreclaimable;
+
+ int order;
};
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -620,35 +622,86 @@ keep:
*
* returns how many pages were moved onto *@dst.
*/
+int __isolate_lru_page(struct page *page, int active)
+{
+ int ret = -EINVAL;
+
+ if (PageLRU(page) && (PageActive(page) == active)) {
+ ret = -EBUSY;
+ if (likely(get_page_unless_zero(page))) {
+ /*
+ * Be careful not to clear PageLRU until after we're
+ * sure the page is not being freed elsewhere -- the
+ * page release code relies on it.
+ */
+ ClearPageLRU(page);
+ ret = 0;
+ }
+ }
+
+ return ret;
+}
+
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
- unsigned long *scanned)
+ unsigned long *scanned, int order)
{
unsigned long nr_taken = 0;
- struct page *page;
- unsigned long scan;
+ struct page *page, *tmp;
+ unsigned long scan, pfn, end_pfn, page_pfn;
+ int active;
for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
- struct list_head *target;
page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);
VM_BUG_ON(!PageLRU(page));
- list_del(&page->lru);
- target = src;
- if (likely(get_page_unless_zero(page))) {
- /*
- * Be careful not to clear PageLRU until after we're
- * sure the page is not being freed elsewhere -- the
- * page release code relies on it.
- */
- ClearPageLRU(page);
- target = dst;
- nr_taken++;
- } /* else it is being freed elsewhere */
+ active = PageActive(page);
+ switch (__isolate_lru_page(page, active)) {
+ case 0:
+ list_move(&page->lru, dst);
+ nr_taken++;
+ break;
+
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&page->lru, src);
+ continue;
+
+ default:
+ BUG();
+ }
- list_add(&page->lru, target);
+ if (!order)
+ continue;
+
+ page_pfn = pfn = __page_to_pfn(page);
+ end_pfn = pfn &= ~((1 << order) - 1);
+ end_pfn += 1 << order;
+ for (; pfn < end_pfn; pfn++) {
+ if (unlikely(pfn == page_pfn))
+ continue;
+ if (unlikely(!pfn_valid(pfn)))
+ break;
+
+ scan++;
+ tmp = __pfn_to_page(pfn);
+ switch (__isolate_lru_page(tmp, active)) {
+ case 0:
+ list_move(&tmp->lru, dst);
+ nr_taken++;
+ continue;
+
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&tmp->lru, src);
+ default:
+ break;
+
+ }
+ break;
+ }
}
*scanned = scan;
@@ -679,7 +732,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
nr_taken = isolate_lru_pages(sc->swap_cluster_max,
&zone->inactive_list,
- &page_list, &nr_scan);
+ &page_list, &nr_scan, sc->order);
zone->nr_inactive -= nr_taken;
zone->pages_scanned += nr_scan;
zone->total_scanned += nr_scan;
@@ -825,7 +878,7 @@ force_reclaim_mapped:
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
- &l_hold, &pgscanned);
+ &l_hold, &pgscanned, sc->order);
zone->pages_scanned += pgscanned;
zone->nr_active -= pgmoved;
spin_unlock_irq(&zone->lru_lock);
@@ -1014,7 +1067,7 @@ static unsigned long shrink_zones(int priority, struct zone **zones,
* holds filesystem locks which prevent writeout this might not work, and the
* allocation attempt will fail.
*/
-unsigned long try_to_free_pages(struct zone **zones, gfp_t gfp_mask)
+unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
{
int priority;
int ret = 0;
@@ -1029,6 +1082,7 @@ unsigned long try_to_free_pages(struct zone **zones, gfp_t gfp_mask)
.swap_cluster_max = SWAP_CLUSTER_MAX,
.may_swap = 1,
.swappiness = vm_swappiness,
+ .order = order,
};
delay_swap_prefetch();
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
2006-12-06 16:59 ` [PATCH 1/4] lumpy reclaim v2 Andy Whitcroft
@ 2006-12-06 17:00 ` Andy Whitcroft
2006-12-06 17:00 ` [PATCH 3/4] lumpy ensure we respect zone boundaries Andy Whitcroft
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-06 17:00 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
lumpy: cleanup a missplaced comment and simplify some code
Move the comment for isolate_lru_pages() back to its function
and comment the new function. Add some running commentary on the
area scan. Clean up the indentation on switch to match the majority
view in mm/*. Finally, clarify the boundary pfn calculations.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0f2d961..0effa3e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -606,21 +606,14 @@ keep:
}
/*
- * zone->lru_lock is heavily contended. Some of the functions that
- * shrink the lists perform better by taking out a batch of pages
- * and working on them outside the LRU lock.
+ * Attempt to remove the specified page from its LRU. Only take this
+ * page if it is of the appropriate PageActive status. Pages which
+ * are being freed elsewhere are also ignored.
*
- * For pagecache intensive workloads, this function is the hottest
- * spot in the kernel (apart from copy_*_user functions).
- *
- * Appropriate locks must be held before calling this function.
+ * @page: page to consider
+ * @active: active/inactive flag only take pages of this type
*
- * @nr_to_scan: The number of pages to look through on the list.
- * @src: The LRU list to pull pages off.
- * @dst: The temp list to put pages on to.
- * @scanned: The number of pages that were scanned.
- *
- * returns how many pages were moved onto *@dst.
+ * returns 0 on success, -ve errno on failure.
*/
int __isolate_lru_page(struct page *page, int active)
{
@@ -642,6 +635,23 @@ int __isolate_lru_page(struct page *page, int active)
return ret;
}
+/*
+ * zone->lru_lock is heavily contended. Some of the functions that
+ * shrink the lists perform better by taking out a batch of pages
+ * and working on them outside the LRU lock.
+ *
+ * For pagecache intensive workloads, this function is the hottest
+ * spot in the kernel (apart from copy_*_user functions).
+ *
+ * Appropriate locks must be held before calling this function.
+ *
+ * @nr_to_scan: The number of pages to look through on the list.
+ * @src: The LRU list to pull pages off.
+ * @dst: The temp list to put pages on to.
+ * @scanned: The number of pages that were scanned.
+ *
+ * returns how many pages were moved onto *@dst.
+ */
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
unsigned long *scanned, int order)
@@ -659,26 +669,31 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
active = PageActive(page);
switch (__isolate_lru_page(page, active)) {
- case 0:
- list_move(&page->lru, dst);
- nr_taken++;
- break;
+ case 0:
+ list_move(&page->lru, dst);
+ nr_taken++;
+ break;
- case -EBUSY:
- /* else it is being freed elsewhere */
- list_move(&page->lru, src);
- continue;
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&page->lru, src);
+ continue;
- default:
- BUG();
+ default:
+ BUG();
}
if (!order)
continue;
- page_pfn = pfn = __page_to_pfn(page);
- end_pfn = pfn &= ~((1 << order) - 1);
- end_pfn += 1 << order;
+ /*
+ * Attempt to take all pages in the order aligned region
+ * surrounding the tag page. Only take those pages of
+ * the same active state as that tag page.
+ */
+ page_pfn = __page_to_pfn(page);
+ pfn = page_pfn & ~((1 << order) - 1);
+ end_pfn = pfn + (1 << order);
for (; pfn < end_pfn; pfn++) {
if (unlikely(pfn == page_pfn))
continue;
@@ -688,17 +703,16 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
scan++;
tmp = __pfn_to_page(pfn);
switch (__isolate_lru_page(tmp, active)) {
- case 0:
- list_move(&tmp->lru, dst);
- nr_taken++;
- continue;
-
- case -EBUSY:
- /* else it is being freed elsewhere */
- list_move(&tmp->lru, src);
- default:
- break;
+ case 0:
+ list_move(&tmp->lru, dst);
+ nr_taken++;
+ continue;
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&tmp->lru, src);
+ default:
+ break;
}
break;
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/4] lumpy ensure we respect zone boundaries
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
2006-12-06 16:59 ` [PATCH 1/4] lumpy reclaim v2 Andy Whitcroft
2006-12-06 17:00 ` [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code Andy Whitcroft
@ 2006-12-06 17:00 ` Andy Whitcroft
2006-12-06 17:01 ` [PATCH 4/4] lumpy take the other active inactive pages in the area Andy Whitcroft
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-06 17:00 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
lumpy: ensure we respect zone boundaries
When scanning an aligned order N area ensure we only pull out pages
in the same zone as our tag page, else we will manipulate those
pages' LRU under the wrong zone lru_lock. Bad.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0effa3e..85f626b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -660,6 +660,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct page *page, *tmp;
unsigned long scan, pfn, end_pfn, page_pfn;
int active;
+ int zone_id;
for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
page = lru_to_page(src);
@@ -691,6 +692,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
* surrounding the tag page. Only take those pages of
* the same active state as that tag page.
*/
+ zone_id = page_zone_id(page);
page_pfn = __page_to_pfn(page);
pfn = page_pfn & ~((1 << order) - 1);
end_pfn = pfn + (1 << order);
@@ -700,8 +702,10 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
if (unlikely(!pfn_valid(pfn)))
break;
- scan++;
tmp = __pfn_to_page(pfn);
+ if (unlikely(page_zone_id(tmp) != zone_id))
+ continue;
+ scan++;
switch (__isolate_lru_page(tmp, active)) {
case 0:
list_move(&tmp->lru, dst);
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/4] lumpy take the other active inactive pages in the area
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
` (2 preceding siblings ...)
2006-12-06 17:00 ` [PATCH 3/4] lumpy ensure we respect zone boundaries Andy Whitcroft
@ 2006-12-06 17:01 ` Andy Whitcroft
2006-12-11 23:29 ` [PATCH 0/4] Lumpy Reclaim V3 Andrew Morton
2006-12-12 11:13 ` Andrew Morton
5 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-06 17:01 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
lumpy: take the other active/inactive pages in the area
When we scan an order N aligned area around our tag page take any
other pages with a matching active state to that of the tag page.
This will tend to demote areas of the order we are interested from
the active list to the inactive list and from the end of the inactive
list, increasing the chances of such areas coming free together.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 85f626b..fc23d87 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -710,7 +710,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
case 0:
list_move(&tmp->lru, dst);
nr_taken++;
- continue;
+ break;
case -EBUSY:
/* else it is being freed elsewhere */
@@ -718,7 +718,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
default:
break;
}
- break;
}
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/4] Lumpy Reclaim V3
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
` (3 preceding siblings ...)
2006-12-06 17:01 ` [PATCH 4/4] lumpy take the other active inactive pages in the area Andy Whitcroft
@ 2006-12-11 23:29 ` Andrew Morton
2007-01-29 12:24 ` Andy Whitcroft
2006-12-12 11:13 ` Andrew Morton
5 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2006-12-11 23:29 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
On Wed, 6 Dec 2006 16:59:04 +0000
Andy Whitcroft <apw@shadowen.org> wrote:
> This is a repost of the lumpy reclaim patch set. This is
> basically unchanged from the last post, other than being rebased
> to 2.6.19-rc2-mm2.
The patch sequencing appeared to be designed to make the code hard to
review, so I clumped them all into a single diff:
>
> /*
> + * Attempt to remove the specified page from its LRU. Only take this
> + * page if it is of the appropriate PageActive status. Pages which
> + * are being freed elsewhere are also ignored.
> + *
> + * @page: page to consider
> + * @active: active/inactive flag only take pages of this type
I dunno who started adding these @'s into non-kernel-doc comments. I'll
un-add them.
> + * returns 0 on success, -ve errno on failure.
> + */
> +int __isolate_lru_page(struct page *page, int active)
> +{
> + int ret = -EINVAL;
> +
> + if (PageLRU(page) && (PageActive(page) == active)) {
We hope that all architectures remember that test_bit returns 0 or
1. We got that wrong a few years back. What we do now is rather
un-C-like. And potentially inefficient. Hopefully the compiler usually
sorts it out though.
> + ret = -EBUSY;
> + if (likely(get_page_unless_zero(page))) {
> + /*
> + * Be careful not to clear PageLRU until after we're
> + * sure the page is not being freed elsewhere -- the
> + * page release code relies on it.
> + */
> + ClearPageLRU(page);
> + ret = 0;
> + }
> + }
> +
> + return ret;
> +}
> +
> +/*
> * zone->lru_lock is heavily contended. Some of the functions that
> * shrink the lists perform better by taking out a batch of pages
> * and working on them outside the LRU lock.
> @@ -621,33 +653,71 @@ keep:
> */
> static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> struct list_head *src, struct list_head *dst,
> - unsigned long *scanned)
> + unsigned long *scanned, int order)
> {
> unsigned long nr_taken = 0;
> - struct page *page;
> - unsigned long scan;
> + struct page *page, *tmp;
"tmp" isn't a very good identifier.
> + unsigned long scan, pfn, end_pfn, page_pfn;
One declaration per line is preferred. This gives you room for a brief
comment, where appropriate.
> + /*
> + * Attempt to take all pages in the order aligned region
> + * surrounding the tag page. Only take those pages of
> + * the same active state as that tag page.
> + */
> + zone_id = page_zone_id(page);
> + page_pfn = __page_to_pfn(page);
> + pfn = page_pfn & ~((1 << order) - 1);
Is this always true? It assumes that the absolute value of the starting
pfn of each zone is a multiple of MAX_ORDER (doesn't it?) I don't see any
reason per-se why that has to be true (although it might be).
hm, I guess it has to be true, else hugetlb pages wouldn't work too well.
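A standalone worked example of the arithmetic in question (the numbers
are invented purely for illustration):

	/* Suppose order = 4 and a zone whose first pfn is not 16-aligned. */
	unsigned long order = 4;
	unsigned long zone_start_pfn = 0x1005;	/* hypothetical, unaligned  */
	unsigned long page_pfn = 0x1007;	/* tag page near zone start */
	unsigned long pfn = page_pfn & ~((1UL << order) - 1);	/* 0x1000 */
	unsigned long end_pfn = pfn + (1UL << order);		/* 0x1010 */
	/*
	 * The area scan covers pfns 0x1000..0x100f, so pfns 0x1000..0x1004
	 * sit below zone_start_pfn and belong to the neighbouring zone;
	 * this is the case the page_zone_id() check in patch 3/4 catches
	 * when the zone boundary is not order aligned.
	 */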
> + end_pfn = pfn + (1 << order);
> + for (; pfn < end_pfn; pfn++) {
> + if (unlikely(pfn == page_pfn))
> + continue;
> + if (unlikely(!pfn_valid(pfn)))
> + break;
> +
> + tmp = __pfn_to_page(pfn);
> + if (unlikely(page_zone_id(tmp) != zone_id))
> + continue;
> + scan++;
> + switch (__isolate_lru_page(tmp, active)) {
> + case 0:
> + list_move(&tmp->lru, dst);
> + nr_taken++;
> + break;
> +
> + case -EBUSY:
> + /* else it is being freed elsewhere */
> + list_move(&tmp->lru, src);
> + default:
> + break;
> + }
> + }
I think each of those
if (expr)
continue;
statements would benefit from a nice comment explaining why.
This physical-scan part of the function will skip pages which happen to be
on *src. I guess that won't matter much, once the system has been up for a
while and the LRU is nicely scrambled.
If this function is passed a list of 32 pages, and order=4, I think it will
go and give us as many as 512 pages on *dst? A check of nr_taken might be
needed.
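One possible shape for such a check (a hypothetical sketch, not something
in the posted patches) would be to bound the outer loop on pages taken as
well as pages scanned:

	/*
	 * Hypothetical sketch only: cap the total number of pages isolated
	 * so an order-N area scan cannot hand back far more pages than the
	 * caller asked for.
	 */
	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
		if (nr_taken >= nr_to_scan)
			break;
		/* ... tag page isolation and area scan as in the patch ... */
	}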
The patch is pretty simple, isn't it?
I guess a shortcoming is that it doesn't address the situation where
GFP_ATOMIC network rx is trying to allocate order-2 pages for large skbs,
but kswapd doesn't know that. AFACIT nobody will actually run the nice new
code in this quite common scenario.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/4] Lumpy Reclaim V3
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
` (4 preceding siblings ...)
2006-12-11 23:29 ` [PATCH 0/4] Lumpy Reclaim V3 Andrew Morton
@ 2006-12-12 11:13 ` Andrew Morton
2006-12-12 13:14 ` Andy Whitcroft
2007-01-29 12:25 ` Andy Whitcroft
5 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2006-12-12 11:13 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
On Wed, 6 Dec 2006 16:59:04 +0000
Andy Whitcroft <apw@shadowen.org> wrote:
> This is a repost of the lumpy reclaim patch set.
more...
One concern is that when the code goes to reclaim a lump and fails, we end
up reclaiming a number of pages which we didn't really want to reclaim.
Regardless of the LRU status of those pages.
I think what we should do here is to add the appropriate vmstat counters
for us to be able to assess the frequency of this occurring, then throw a
spread of workloads at it. If that work indicates that there's a problem
then we should look at being a bit smarter about whether all the pages look
to be reclaimable and if not, restore them all and give up.
Also, I suspect it would be cleaner and faster to pass the `active' flag
into isolate_lru_pages(), rather than calculating it on the fly. And I
don't think we need to calculate it on every pass through the loop?
We really do need those vmstat counters to let us see how effective this
thing is being. Basic success/fail stuff. Per-zone, I guess.
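A minimal sketch of what such counters could look like, assuming the
count_vm_event() interface available in kernels of this era; the event
names and the nr_area_taken variable are hypothetical:

	/*
	 * Hypothetical sketch only.  LUMPY_RECLAIM_SUCCESS/FAIL are made-up
	 * names; adding them for real means extending enum vm_event_item
	 * and vmstat_text[] (or using a per-zone counter, as suggested).
	 */
	if (nr_area_taken == (1UL << order))
		count_vm_event(LUMPY_RECLAIM_SUCCESS);
	else
		count_vm_event(LUMPY_RECLAIM_FAIL);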
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/4] Lumpy Reclaim V3
2006-12-12 11:13 ` Andrew Morton
@ 2006-12-12 13:14 ` Andy Whitcroft
2007-01-29 12:25 ` Andy Whitcroft
1 sibling, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-12-12 13:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
Andrew Morton wrote:
> On Wed, 6 Dec 2006 16:59:04 +0000
> Andy Whitcroft <apw@shadowen.org> wrote:
>
>> This is a repost of the lumpy reclaim patch set.
>
> more...
>
> One concern is that when the code goes to reclaim a lump and fails, we end
> up reclaiming a number of pages which we didn't really want to reclaim.
> Regardless of the LRU status of those pages.
>
> I think what we should do here is to add the appropriate vmstat counters
> for us to be able to assess the frequency of this occurring, then throw a
> spread of workloads at it. If that work indicates that there's a problem
> then we should look at being a bit smarter about whether all the pages look
> to be reclaimable and if not, restore them all and give up.
>
> Also, I suspect it would be cleaner and faster to pass the `active' flag
> into isolate_lru_pages(), rather than calculating it on the fly. And I
> don't think we need to calculate it on every pass through the loop?
>
>
> We really do need those vmstat counters to let us see how effective this
> thing is being. Basic success/fail stuff. Per-zone, I guess.
Sounds like a cue ... I'll go do that.
-apw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/4] lumpy reclaim v2
2006-12-06 16:59 ` [PATCH 1/4] lumpy reclaim v2 Andy Whitcroft
@ 2006-12-15 4:57 ` Andrew Morton
2007-01-26 11:00 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2006-12-15 4:57 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
On Wed, 6 Dec 2006 16:59:35 +0000
Andy Whitcroft <apw@shadowen.org> wrote:
> + tmp = __pfn_to_page(pfn);
ia64 doesn't implement __page_to_pfn. Why did you not use page_to_pfn()?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/4] lumpy reclaim v2
2006-12-15 4:57 ` Andrew Morton
@ 2007-01-26 11:00 ` Andrew Morton
0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-01-26 11:00 UTC (permalink / raw)
To: Andy Whitcroft, linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
On Thu, 14 Dec 2006 20:57:34 -0800
Andrew Morton <akpm@osdl.org> wrote:
> On Wed, 6 Dec 2006 16:59:35 +0000
> Andy Whitcroft <apw@shadowen.org> wrote:
>
> > + tmp = __pfn_to_page(pfn);
>
> ia64 doesn't implement __page_to_pfn. Why did you not use page_to_pfn()?
Poke. I'm still a no-compile on ia64.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/4] Lumpy Reclaim V3
2006-12-11 23:29 ` [PATCH 0/4] Lumpy Reclaim V3 Andrew Morton
@ 2007-01-29 12:24 ` Andy Whitcroft
0 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2007-01-29 12:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
Andrew Morton wrote:
> On Wed, 6 Dec 2006 16:59:04 +0000
> Andy Whitcroft <apw@shadowen.org> wrote:
>
>> This is a repost of the lumpy reclaim patch set. This is
>> basically unchanged from the last post, other than being rebased
>> to 2.6.19-rc2-mm2.
>
> The patch sequencing appeared to be designed to make the code hard to
> review, so I clumped them all into a single diff:
>
>>
>> /*
>> + * Attempt to remove the specified page from its LRU. Only take this
>> + * page if it is of the appropriate PageActive status. Pages which
>> + * are being freed elsewhere are also ignored.
>> + *
>> + * @page: page to consider
>> + * @active: active/inactive flag only take pages of this type
>
> I dunno who started adding these @'s into non-kernel-doc comments. I'll
> un-add them.
>
>> + * returns 0 on success, -ve errno on failure.
>> + */
>> +int __isolate_lru_page(struct page *page, int active)
>> +{
>> + int ret = -EINVAL;
>> +
>> + if (PageLRU(page) && (PageActive(page) == active)) {
>
> We hope that all architectures remember that test_bit returns 0 or
> 1. We got that wrong a few years back. What we do now is rather
> un-C-like. And potentially inefficient. Hopefully the compiler usually
> sorts it out though.
With the code as it is in this patch this is safe, as there is an
uncommented assumption that the active parameter is also the return
from a call to PageActive and therefore should be comparable
regardless of value. However, as you also point out elsewhere, we are
in fact looking that active value up every time we spin the search
loop: firstly we do it far more often than required, and secondly,
when unlucky, we could pick up the wrong value from a page in
transition and do bad things. Thus we will be moving to passing it
explicitly at the isolate_lru_pages level, at which point there will
be a risk here; we will need to coax each of these to booleans before
comparison.
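Concretely, the usual idiom for that would be something along these lines
(a sketch of the direction described above, not the final code):

	/*
	 * Sketch: normalise both sides to 0/1 before comparing, so the test
	 * stays correct even if a flag test ever returns a non-normalised
	 * non-zero value, or if active is passed in from elsewhere.
	 */
	if (PageLRU(page) && (!!PageActive(page) == !!active)) {
		/* page is a candidate for isolation, as before */
	}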
>
>
>> + ret = -EBUSY;
>> + if (likely(get_page_unless_zero(page))) {
>> + /*
>> + * Be careful not to clear PageLRU until after we're
>> + * sure the page is not being freed elsewhere -- the
>> + * page release code relies on it.
>> + */
>> + ClearPageLRU(page);
>> + ret = 0;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> * zone->lru_lock is heavily contended. Some of the functions that
>> * shrink the lists perform better by taking out a batch of pages
>> * and working on them outside the LRU lock.
>> @@ -621,33 +653,71 @@ keep:
>> */
>> static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>> struct list_head *src, struct list_head *dst,
>> - unsigned long *scanned)
>> + unsigned long *scanned, int order)
>> {
>> unsigned long nr_taken = 0;
>> - struct page *page;
>> - unsigned long scan;
>> + struct page *page, *tmp;
>
> "tmp" isn't a very good identifier.
>
>> + unsigned long scan, pfn, end_pfn, page_pfn;
>
> One declaration per line is preferred. This gives you room for a brief
> comment, where appropriate.
All true, I've folded your cleanups into the base code, they are clearly
better.
>
>
>> + /*
>> + * Attempt to take all pages in the order aligned region
>> + * surrounding the tag page. Only take those pages of
>> + * the same active state as that tag page.
>> + */
>> + zone_id = page_zone_id(page);
>> + page_pfn = __page_to_pfn(page);
>> + pfn = page_pfn & ~((1 << order) - 1);
>
> Is this always true? It assumes that the absolute value of the starting
> pfn of each zone is a multiple of MAX_ORDER (doesn't it?) I don't see any
> reason per-se why that has to be true (although it might be).
Yes, this is always true: the buddy guarantees that the struct pages are
valid and present out to MAX_ORDER around any known valid page. The
zone boundary _may_ (rarely) not be MAX_ORDER aligned, but that's ok, as
we will detect that below when we check that the page_zone_id matches
the cursor page's zone_id, and bail should it not.
I've added some commentary about the assumptions on which we are relying
here.
> hm, I guess it has to be true, else hugetlb pages wouldn't work too well.
Well, it's mostly true, but if you remember the zone rounding patches, we
decided the check was so cheap because we can simply compare the zone ids
on the pages, which are in the page flags we are touching anyhow.
The code below (.+9) maintains this check for this situation. I've
added commentary to the code to say what it's for.
>
>> + end_pfn = pfn + (1 << order);
>> + for (; pfn < end_pfn; pfn++) {
>> + if (unlikely(pfn == page_pfn))
>> + continue;
>> + if (unlikely(!pfn_valid(pfn)))
>> + break;
>> +
>> + tmp = __pfn_to_page(pfn);
>> + if (unlikely(page_zone_id(tmp) != zone_id))
>> + continue;
>> + scan++;
>> + switch (__isolate_lru_page(tmp, active)) {
>> + case 0:
>> + list_move(&tmp->lru, dst);
>> + nr_taken++;
>> + break;
>> +
>> + case -EBUSY:
>> + /* else it is being freed elsewhere */
>> + list_move(&tmp->lru, src);
>> + default:
>> + break;
>> + }
>> + }
>
> I think each of those
>
> if (expr)
> continue;
>
> statements would benefit from a nice comment explaining why.
Yes, I've added commentary for each of these, they seem obvious the day
you write them, and not a month later.
>
>
> This physical-scan part of the function will skip pages which happen to be
> on *src. I guess that won't matter much, once the system has been up for a
> while and the LRU is nicely scrambled.
>
>
> If this function is passed a list of 32 pages, and order=4, I think it will
> go and give us as many as 512 pages on *dst? A check of nr_taken might be
> needed.
If we ask to scan 32 pages we may scan more pages should 1<<order be
greater than 32. However, if we are not going to scan at least 1 block
at the order we requested then there is no point even in starting. We
would not be in direct reclaim (which is the only way order may be
non-0), so we really do want any scan to be at least 1 block, else it's a
no-op and should be stopped early -- at least that seems sane?
> The patch is pretty simple, isn't it?
>
> I guess a shortcoming is that it doesn't address the situation where
> GFP_ATOMIC network rx is trying to allocate order-2 pages for large skbs,
> but kswapd doesn't know that. AFACIT nobody will actually run the nice new
> code in this quite common scenario.
>
That is an issue. Will see what we can do about that.
-apw
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/4] Lumpy Reclaim V3
2006-12-12 11:13 ` Andrew Morton
2006-12-12 13:14 ` Andy Whitcroft
@ 2007-01-29 12:25 ` Andy Whitcroft
1 sibling, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2007-01-29 12:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Peter Zijlstra, Mel Gorman, linux-kernel
Andrew Morton wrote:
> On Wed, 6 Dec 2006 16:59:04 +0000
> Andy Whitcroft <apw@shadowen.org> wrote:
>
>> This is a repost of the lumpy reclaim patch set.
>
> more...
>
> One concern is that when the code goes to reclaim a lump and fails, we end
> up reclaiming a number of pages which we didn't really want to reclaim.
> Regardless of the LRU status of those pages.
Yes, the "ineffective reclaim" is more of an issue with this than
linear, and the cost metric we are working on should help us show that;
and then help us evaluate the utility of pushing the pages back without
releasing them.
> I think what we should do here is to add the appropriate vmstat counters
> for us to be able to assess the frequency of this occurring, then throw a
> spread of workloads at it. If that work indicates that there's a problem
> then we should look at being a bit smarter about whether all the pages look
> to be reclaimable and if not, restore them all and give up.
Yes, what was obvious from comparing linear against lumpy was that the only
valid comparison was on the 'effectiveness' metric (which was basically
the same) and code complexity (where lumpy clearly wins). But we have
no feel for 'cost'. We are working on a cost metric for reclaim which
we can use to compare performance.
>
> Also, I suspect it would be cleaner and faster to pass the `active' flag
> into isolate_lru_pages(), rather than calculating it on the fly. And I
> don't think we need to calculate it on every pass through the loop?
Yes, IMO it is actually wrong not to do this, and I've updated it.
>
> We really do need those vmstat counters to let us see how effective this
> thing is being. Basic success/fail stuff. Per-zone, I guess.
Yes, though the ones I have seem sane, the output isn't stable and we're
investigating that right now.
-apw
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code
2006-11-23 16:48 Andy Whitcroft
@ 2006-11-23 16:49 ` Andy Whitcroft
0 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-11-23 16:49 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Peter Zijlstra, Mel Gorman, Andy Whitcroft, linux-kernel
lumpy: cleanup a missplaced comment and simplify some code
Move the comment for isolate_lru_pages() back to its function
and comment the new function. Add some running commentary on the
area scan. Clean up the indentation on switch to match the majority
view in mm/*. Finally, clarify the boundary pfn calculations.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4645a3f..3b6ef79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -609,21 +609,14 @@ keep:
}
/*
- * zone->lru_lock is heavily contended. Some of the functions that
- * shrink the lists perform better by taking out a batch of pages
- * and working on them outside the LRU lock.
+ * Attempt to remove the specified page from its LRU. Only take this
+ * page if it is of the appropriate PageActive status. Pages which
+ * are being freed elsewhere are also ignored.
*
- * For pagecache intensive workloads, this function is the hottest
- * spot in the kernel (apart from copy_*_user functions).
- *
- * Appropriate locks must be held before calling this function.
+ * @page: page to consider
+ * @active: active/inactive flag only take pages of this type
*
- * @nr_to_scan: The number of pages to look through on the list.
- * @src: The LRU list to pull pages off.
- * @dst: The temp list to put pages on to.
- * @scanned: The number of pages that were scanned.
- *
- * returns how many pages were moved onto *@dst.
+ * returns 0 on success, -ve errno on failure.
*/
int __isolate_lru_page(struct page *page, int active)
{
@@ -645,6 +638,23 @@ int __isolate_lru_page(struct page *page
return ret;
}
+/*
+ * zone->lru_lock is heavily contended. Some of the functions that
+ * shrink the lists perform better by taking out a batch of pages
+ * and working on them outside the LRU lock.
+ *
+ * For pagecache intensive workloads, this function is the hottest
+ * spot in the kernel (apart from copy_*_user functions).
+ *
+ * Appropriate locks must be held before calling this function.
+ *
+ * @nr_to_scan: The number of pages to look through on the list.
+ * @src: The LRU list to pull pages off.
+ * @dst: The temp list to put pages on to.
+ * @scanned: The number of pages that were scanned.
+ *
+ * returns how many pages were moved onto *@dst.
+ */
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
unsigned long *scanned, int order)
@@ -662,26 +672,31 @@ static unsigned long isolate_lru_pages(u
active = PageActive(page);
switch (__isolate_lru_page(page, active)) {
- case 0:
- list_move(&page->lru, dst);
- nr_taken++;
- break;
+ case 0:
+ list_move(&page->lru, dst);
+ nr_taken++;
+ break;
- case -EBUSY:
- /* else it is being freed elsewhere */
- list_move(&page->lru, src);
- continue;
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&page->lru, src);
+ continue;
- default:
- BUG();
+ default:
+ BUG();
}
if (!order)
continue;
- page_pfn = pfn = __page_to_pfn(page);
- end_pfn = pfn &= ~((1 << order) - 1);
- end_pfn += 1 << order;
+ /*
+ * Attempt to take all pages in the order aligned region
+ * surrounding the tag page. Only take those pages of
+ * the same active state as that tag page.
+ */
+ page_pfn = __page_to_pfn(page);
+ pfn = page_pfn & ~((1 << order) - 1);
+ end_pfn = pfn + (1 << order);
for (; pfn < end_pfn; pfn++) {
if (unlikely(pfn == page_pfn))
continue;
@@ -691,17 +706,16 @@ static unsigned long isolate_lru_pages(u
scan++;
tmp = __pfn_to_page(pfn);
switch (__isolate_lru_page(tmp, active)) {
- case 0:
- list_move(&tmp->lru, dst);
- nr_taken++;
- continue;
-
- case -EBUSY:
- /* else it is being freed elsewhere */
- list_move(&tmp->lru, src);
- default:
- break;
+ case 0:
+ list_move(&tmp->lru, dst);
+ nr_taken++;
+ continue;
+ case -EBUSY:
+ /* else it is being freed elsewhere */
+ list_move(&tmp->lru, src);
+ default:
+ break;
}
break;
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-01-29 12:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-06 16:59 [PATCH 0/4] Lumpy Reclaim V3 Andy Whitcroft
2006-12-06 16:59 ` [PATCH 1/4] lumpy reclaim v2 Andy Whitcroft
2006-12-15 4:57 ` Andrew Morton
2007-01-26 11:00 ` Andrew Morton
2006-12-06 17:00 ` [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code Andy Whitcroft
2006-12-06 17:00 ` [PATCH 3/4] lumpy ensure we respect zone boundaries Andy Whitcroft
2006-12-06 17:01 ` [PATCH 4/4] lumpy take the other active inactive pages in the area Andy Whitcroft
2006-12-11 23:29 ` [PATCH 0/4] Lumpy Reclaim V3 Andrew Morton
2007-01-29 12:24 ` Andy Whitcroft
2006-12-12 11:13 ` Andrew Morton
2006-12-12 13:14 ` Andy Whitcroft
2007-01-29 12:25 ` Andy Whitcroft
-- strict thread matches above, loose matches on Subject: below --
2006-11-23 16:48 Andy Whitcroft
2006-11-23 16:49 ` [PATCH 2/4] lumpy cleanup a missplaced comment and simplify some code Andy Whitcroft