LKML Archive on lore.kernel.org
* [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2
@ 2007-03-01 10:08 Mel Gorman
  2007-03-01 10:08 ` [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated Mel Gorman
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:08 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm

Changelog since v1
o Rebased to 2.6.20-rc6-mm2
o Added necessary changes to per-zone VM stats for new zone (Christoph)
o Removed unnecessary changes to __ZONE_SHIFT (Christoph)
o Added paranoid check for overflow in cmdline_parse_kernelcore (Andy)

The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE
that is only usable by allocations that specify both __GFP_HIGHMEM and
__GFP_MOVABLE. This has the effect of keeping all non-movable pages within a
single memory partition while allowing movable allocations to be satisfied
from either partition. The patches may be applied with the list-based
anti-fragmentation patches that group pages together based on mobility.

The size of the zone is determined by a kernelcore= parameter specified at
boot-time. This specifies how much memory is usable by non-movable
allocations; the remainder is used for ZONE_MOVABLE. Any range of pages within
ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
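
As an illustrative example (sizes are hypothetical, not taken from the
patches), booting a 4GB x86 machine with

	kernelcore=1G

leaves 1GB usable by all allocations while the remainder, taken from the
top of the highest populated zone, forms ZONE_MOVABLE. The parameter is
parsed with memparse(), so kernelcore=1G, kernelcore=1024M and
kernelcore=1048576K are equivalent.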

When selecting a zone to take pages from for ZONE_MOVABLE, there are two
things to consider. First, only memory from the highest populated zone is
used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM
but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second,
the amount of memory usable by the kernel will be spread evenly throughout
NUMA nodes where possible. If the nodes are not of equal size, the amount
of memory usable by the kernel on some nodes may be greater than on others.
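
As a hypothetical worked example of that spreading: with kernelcore=2G on
a two-node machine where node 0 has 3GB and node 1 has 512MB, the first
pass asks each node for 1GB. Node 1 can supply only 512MB, so a second
pass distributes the 512MB shortfall over the remaining node, leaving
node 0 with 1.5GB of kernelcore and 1.5GB of ZONE_MOVABLE while node 1 is
entirely kernelcore.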

By default, the zone is not usable for hugetlb allocations because huge pages
are pinned and non-migratable (currently at least). A sysctl is provided that
allows huge pages to be allocated from that zone. This means that the huge
page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
the system assuming that pages are not mlocked. Despite huge pages being
non-movable, we do not introduce additional external fragmentation of note
as huge pages are always the largest contiguous block we care about.

Credit goes to Andy Whitcroft for catching a large variety of problems during
review of the patches.
-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
@ 2007-03-01 10:08 ` Mel Gorman
  2007-03-01 10:08 ` [PATCH 2/8] Create the ZONE_MOVABLE zone Mel Gorman
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:08 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


(Note, this is identical to the equivalent patch in list-based
anti-fragmentation. When applying both sets of patches, only apply this once)

It is often known at allocation time when a page may be migrated or
not. This patch adds a flag called __GFP_MOVABLE and a new mask called
GFP_HIGH_MOVABLE. Allocations using __GFP_MOVABLE can be either migrated
using the page migration mechanism or reclaimed by syncing with backing
storage and discarding.
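
To make the convention concrete, here is a minimal sketch of a caller
(illustrative only, not part of this patch; alloc_page_vma() is the
existing allocator entry point):

	struct page *page;

	/* User data that can be migrated or synced and discarded */
	page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, addr);

	/* Data that may be pinned indefinitely: no __GFP_MOVABLE */
	page = alloc_page_vma(GFP_HIGHUSER, vma, addr);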

An API function very similar to alloc_zeroed_user_highpage() is added for
__GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
flags used by alloc_zeroed_user_highpage() are not changed because doing
so would change the semantics of an existing API. After this patch is
applied there are no
in-kernel users of alloc_zeroed_user_highpage() so it probably should be
marked deprecated if this patch is merged.

Note that this patch includes a minor cleanup to the use of __GFP_ZERO
in shmem.c to keep all flag modifications to inode->mapping in the
shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
Hugh Dickins.

Additional credit goes to Christoph Lameter and Linus Torvalds for shaping
the concept. Credit to Hugh Dickins for catching issues with the shmem swap
vector and ramfs allocations.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 fs/inode.c                |   10 ++++++--
 fs/ramfs/inode.c          |    1 
 include/asm-alpha/page.h  |    3 +-
 include/asm-cris/page.h   |    3 +-
 include/asm-h8300/page.h  |    3 +-
 include/asm-i386/page.h   |    3 +-
 include/asm-ia64/page.h   |    5 ++--
 include/asm-m32r/page.h   |    3 +-
 include/asm-s390/page.h   |    3 +-
 include/asm-x86_64/page.h |    3 +-
 include/linux/gfp.h       |   10 +++++++-
 include/linux/highmem.h   |   51 +++++++++++++++++++++++++++++++++++++++--
 mm/memory.c               |    8 +++---
 mm/mempolicy.c            |    4 +--
 mm/migrate.c              |    2 -
 mm/shmem.c                |    7 ++++-
 mm/swap_prefetch.c        |    2 -
 mm/swap_state.c           |    2 -
 18 files changed, 98 insertions(+), 25 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/fs/inode.c linux-2.6.20-mm2-001_mark_highmovable/fs/inode.c
--- linux-2.6.20-mm2-clean/fs/inode.c	2007-02-19 01:21:38.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/fs/inode.c	2007-02-19 09:08:29.000000000 +0000
@@ -145,7 +145,7 @@ static struct inode *alloc_inode(struct 
 		mapping->a_ops = &empty_aops;
  		mapping->host = inode;
 		mapping->flags = 0;
-		mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
+		mapping_set_gfp_mask(mapping, GFP_HIGH_MOVABLE);
 		mapping->assoc_mapping = NULL;
 		mapping->backing_dev_info = &default_backing_dev_info;
 
@@ -521,7 +521,13 @@ repeat:
  *	new_inode 	- obtain an inode
  *	@sb: superblock
  *
- *	Allocates a new inode for given superblock.
+ *	Allocates a new inode for the given superblock. The default gfp_mask
+ *	for allocations related to inode->i_mapping is GFP_HIGH_MOVABLE. If
+ *	HIGHMEM pages are unsuitable or it is known that pages allocated
+ *	for the page cache are not reclaimable or migratable,
+ *	mapping_set_gfp_mask() must be called with suitable flags on the
+ *	newly created inode's mapping
+ *
  */
 struct inode *new_inode(struct super_block *sb)
 {
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/fs/ramfs/inode.c linux-2.6.20-mm2-001_mark_highmovable/fs/ramfs/inode.c
--- linux-2.6.20-mm2-clean/fs/ramfs/inode.c	2007-02-19 01:21:42.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/fs/ramfs/inode.c	2007-02-19 09:08:29.000000000 +0000
@@ -61,6 +61,7 @@ struct inode *ramfs_get_inode(struct sup
 		inode->i_blocks = 0;
 		inode->i_mapping->a_ops = &ramfs_aops;
 		inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info;
+		mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		switch (mode & S_IFMT) {
 		default:
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-alpha/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-alpha/page.h
--- linux-2.6.20-mm2-clean/include/asm-alpha/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-alpha/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -17,7 +17,8 @@
 extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vmaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-cris/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-cris/page.h
--- linux-2.6.20-mm2-clean/include/asm-cris/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-cris/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -20,7 +20,8 @@
 #define clear_user_page(page, vaddr, pg)    clear_page(page)
 #define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-h8300/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-h8300/page.h
--- linux-2.6.20-mm2-clean/include/asm-h8300/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-h8300/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -22,7 +22,8 @@
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-i386/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-i386/page.h
--- linux-2.6.20-mm2-clean/include/asm-i386/page.h	2007-02-19 01:21:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-i386/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -34,7 +34,8 @@
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-ia64/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-ia64/page.h
--- linux-2.6.20-mm2-clean/include/asm-ia64/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-ia64/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -87,9 +87,10 @@ do {						\
 } while (0)
 
 
-#define alloc_zeroed_user_highpage(vma, vaddr) \
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
 ({						\
-	struct page *page = alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr); \
+	struct page *page = alloc_page_vma(		\
+		GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr); \
 	if (page)				\
  		flush_dcache_page(page);	\
 	page;					\
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-m32r/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-m32r/page.h
--- linux-2.6.20-mm2-clean/include/asm-m32r/page.h	2007-02-19 01:21:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-m32r/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -15,7 +15,8 @@ extern void copy_page(void *to, void *fr
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-s390/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-s390/page.h
--- linux-2.6.20-mm2-clean/include/asm-s390/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-s390/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -64,7 +64,8 @@ static inline void copy_page(void *to, v
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/asm-x86_64/page.h linux-2.6.20-mm2-001_mark_highmovable/include/asm-x86_64/page.h
--- linux-2.6.20-mm2-clean/include/asm-x86_64/page.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/asm-x86_64/page.h	2007-02-19 09:08:29.000000000 +0000
@@ -51,7 +51,8 @@ void copy_page(void *, void *);
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 /*
  * These are used to make use of C type-checking..
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/linux/gfp.h linux-2.6.20-mm2-001_mark_highmovable/include/linux/gfp.h
--- linux-2.6.20-mm2-clean/include/linux/gfp.h	2007-02-19 01:22:30.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/linux/gfp.h	2007-02-19 09:08:29.000000000 +0000
@@ -30,6 +30,9 @@ struct vm_area_struct;
  * cannot handle allocation failures.
  *
  * __GFP_NORETRY: The VM implementation must not retry indefinitely.
+ *
+ * __GFP_MOVABLE: Flag that this page will be movable by the page migration
+ * mechanism or reclaimed
  */
 #define __GFP_WAIT	((__force gfp_t)0x10u)	/* Can wait and reschedule? */
 #define __GFP_HIGH	((__force gfp_t)0x20u)	/* Should access emergency pools? */
@@ -46,6 +49,7 @@ struct vm_area_struct;
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
+#define __GFP_MOVABLE	((__force gfp_t)0x80000u) /* Page is movable */
 
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -54,7 +58,8 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
+			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|\
+			__GFP_MOVABLE)
 
 /* This equals 0, but use constants in case they ever change */
 #define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)
@@ -66,6 +71,9 @@ struct vm_area_struct;
 #define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
 #define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
 			 __GFP_HIGHMEM)
+#define GFP_HIGH_MOVABLE	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
+				 __GFP_HARDWALL | __GFP_HIGHMEM | \
+				 __GFP_MOVABLE)
 
 #ifdef CONFIG_NUMA
 #define GFP_THISNODE	(__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/include/linux/highmem.h linux-2.6.20-mm2-001_mark_highmovable/include/linux/highmem.h
--- linux-2.6.20-mm2-clean/include/linux/highmem.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/include/linux/highmem.h	2007-02-19 09:08:29.000000000 +0000
@@ -62,10 +62,27 @@ static inline void clear_user_highpage(s
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
+/**
+ * __alloc_zeroed_user_highpage - Allocate a zeroed HIGHMEM page for a VMA with caller-specified movable GFP flags
+ * @movableflags: The GFP flags related to the page's future ability to move, like __GFP_MOVABLE
+ * @vma: The VMA the page is to be allocated for
+ * @vaddr: The virtual address the page will be inserted into
+ *
+ * This function will allocate a page for a VMA but the caller is expected
+ * to specify via movableflags whether the page will be movable in the
+ * future or not
+ *
+ * An architecture may override this function by defining
+ * __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and providing their own
+ * implementation.
+ */
 static inline struct page *
-alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
+__alloc_zeroed_user_highpage(gfp_t movableflags,
+			struct vm_area_struct *vma,
+			unsigned long vaddr)
 {
-	struct page *page = alloc_page_vma(GFP_HIGHUSER, vma, vaddr);
+	struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
+			vma, vaddr);
 
 	if (page)
 		clear_user_highpage(page, vaddr);
@@ -74,6 +91,36 @@ alloc_zeroed_user_highpage(struct vm_are
 }
 #endif
 
+/**
+ * alloc_zeroed_user_highpage - Allocate a zeroed HIGHMEM page for a VMA
+ * @vma: The VMA the page is to be allocated for
+ * @vaddr: The virtual address the page will be inserted into
+ *
+ * This function will allocate a page for a VMA that the caller knows will
+ * not be able to move in the future using move_pages() or reclaim. If it
+ * is known that the page can move, use alloc_zeroed_user_highpage_movable
+ */
+static inline struct page *
+alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
+{
+	return __alloc_zeroed_user_highpage(0, vma, vaddr);
+}
+
+/**
+ * alloc_zeroed_user_highpage_movable - Allocate a zeroed HIGHMEM page for a VMA that the caller knows can move
+ * @vma: The VMA the page is to be allocated for
+ * @vaddr: The virtual address the page will be inserted into
+ *
+ * This function will allocate a page for a VMA that the caller knows will
+ * be able to migrate in the future using move_pages() or be reclaimed
+ */
+static inline struct page *
+alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
+					unsigned long vaddr)
+{
+	return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
+}
+
 static inline void clear_highpage(struct page *page)
 {
 	void *kaddr = kmap_atomic(page, KM_USER0);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/memory.c linux-2.6.20-mm2-001_mark_highmovable/mm/memory.c
--- linux-2.6.20-mm2-clean/mm/memory.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/memory.c	2007-02-19 09:08:29.000000000 +0000
@@ -1761,11 +1761,11 @@ gotten:
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 	if (old_page == ZERO_PAGE(address)) {
-		new_page = alloc_zeroed_user_highpage(vma, address);
+		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
 			goto oom;
 	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+		new_page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, address);
 		if (!new_page)
 			goto oom;
 		cow_user_page(new_page, old_page, address, vma);
@@ -2283,7 +2283,7 @@ static int do_anonymous_page(struct mm_s
 
 		if (unlikely(anon_vma_prepare(vma)))
 			goto oom;
-		page = alloc_zeroed_user_highpage(vma, address);
+		page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!page)
 			goto oom;
 
@@ -2384,7 +2384,7 @@ retry:
 
 			if (unlikely(anon_vma_prepare(vma)))
 				goto oom;
-			page = alloc_page_vma(GFP_HIGHUSER, vma, address);
+			page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, address);
 			if (!page)
 				goto oom;
 			copy_user_highpage(page, new_page, address, vma);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/mempolicy.c linux-2.6.20-mm2-001_mark_highmovable/mm/mempolicy.c
--- linux-2.6.20-mm2-clean/mm/mempolicy.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/mempolicy.c	2007-02-19 09:08:29.000000000 +0000
@@ -603,7 +603,7 @@ static void migrate_page_add(struct page
 
 static struct page *new_node_page(struct page *page, unsigned long node, int **x)
 {
-	return alloc_pages_node(node, GFP_HIGHUSER, 0);
+	return alloc_pages_node(node, GFP_HIGH_MOVABLE, 0);
 }
 
 /*
@@ -719,7 +719,7 @@ static struct page *new_vma_page(struct 
 {
 	struct vm_area_struct *vma = (struct vm_area_struct *)private;
 
-	return alloc_page_vma(GFP_HIGHUSER, vma, page_address_in_vma(page, vma));
+	return alloc_page_vma(GFP_HIGH_MOVABLE, vma, page_address_in_vma(page, vma));
 }
 #else
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/migrate.c linux-2.6.20-mm2-001_mark_highmovable/mm/migrate.c
--- linux-2.6.20-mm2-clean/mm/migrate.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/migrate.c	2007-02-19 09:08:29.000000000 +0000
@@ -755,7 +755,7 @@ static struct page *new_page_node(struct
 
 	*result = &pm->status;
 
-	return alloc_pages_node(pm->node, GFP_HIGHUSER | GFP_THISNODE, 0);
+	return alloc_pages_node(pm->node, GFP_HIGH_MOVABLE | GFP_THISNODE, 0);
 }
 
 /*
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/shmem.c linux-2.6.20-mm2-001_mark_highmovable/mm/shmem.c
--- linux-2.6.20-mm2-clean/mm/shmem.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/shmem.c	2007-02-19 09:08:29.000000000 +0000
@@ -93,8 +93,11 @@ static inline struct page *shmem_dir_all
 	 * The above definition of ENTRIES_PER_PAGE, and the use of
 	 * BLOCKS_PER_PAGE on indirect pages, assume PAGE_CACHE_SIZE:
 	 * might be reconsidered if it ever diverges from PAGE_SIZE.
+	 *
+	 * __GFP_MOVABLE is masked out as swap vectors cannot move
 	 */
-	return alloc_pages(gfp_mask, PAGE_CACHE_SHIFT-PAGE_SHIFT);
+	return alloc_pages((gfp_mask & ~__GFP_MOVABLE) | __GFP_ZERO,
+				PAGE_CACHE_SHIFT-PAGE_SHIFT);
 }
 
 static inline void shmem_dir_free(struct page *page)
@@ -371,7 +374,7 @@ static swp_entry_t *shmem_swp_alloc(stru
 		}
 
 		spin_unlock(&info->lock);
-		page = shmem_dir_alloc(mapping_gfp_mask(inode->i_mapping) | __GFP_ZERO);
+		page = shmem_dir_alloc(mapping_gfp_mask(inode->i_mapping));
 		if (page)
 			set_page_private(page, 0);
 		spin_lock(&info->lock);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/swap_prefetch.c linux-2.6.20-mm2-001_mark_highmovable/mm/swap_prefetch.c
--- linux-2.6.20-mm2-clean/mm/swap_prefetch.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/swap_prefetch.c	2007-02-19 09:08:29.000000000 +0000
@@ -204,7 +204,7 @@ static enum trickle_return trickle_swap_
 	 * Get a new page to read from swap. We have already checked the
 	 * watermarks so __alloc_pages will not call on reclaim.
 	 */
-	page = alloc_pages_node(node, GFP_HIGHUSER & ~__GFP_WAIT, 0);
+	page = alloc_pages_node(node, GFP_HIGH_MOVABLE & ~__GFP_WAIT, 0);
 	if (unlikely(!page)) {
 		ret = TRICKLE_DELAY;
 		goto out;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-clean/mm/swap_state.c linux-2.6.20-mm2-001_mark_highmovable/mm/swap_state.c
--- linux-2.6.20-mm2-clean/mm/swap_state.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-001_mark_highmovable/mm/swap_state.c	2007-02-19 09:08:29.000000000 +0000
@@ -340,7 +340,7 @@ struct page *read_swap_cache_async(swp_e
 		 * Get a new page to read into from swap.
 		 */
 		if (!new_page) {
-			new_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+			new_page = alloc_page_vma(GFP_HIGH_MOVABLE, vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
 		}


* [PATCH 2/8] Create the ZONE_MOVABLE zone
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
  2007-03-01 10:08 ` [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated Mel Gorman
@ 2007-03-01 10:08 ` Mel Gorman
  2007-03-01 10:09 ` [PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE Mel Gorman
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:08 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


This patch creates an additional zone, ZONE_MOVABLE.  This zone is only
usable by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE.
Hot-added memory continues to be placed in its existing destination as
there is no mechanism to redirect it to a specific zone.
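
A rough sketch of the zone selection this enables (this mirrors the
gfp_zone() change in the patch below rather than adding new code, and
assumes CONFIG_HIGHMEM):

	gfp_zone(GFP_HIGH_MOVABLE)		/* -> ZONE_MOVABLE */
	gfp_zone(GFP_HIGHUSER)			/* -> ZONE_HIGHMEM */
	gfp_zone(GFP_USER | __GFP_MOVABLE)	/* -> ZONE_NORMAL  */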


Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 include/linux/gfp.h    |    3 
 include/linux/mm.h     |    1 
 include/linux/mmzone.h |   19 +++
 include/linux/vmstat.h |    5 
 mm/highmem.c           |    7 +
 mm/page_alloc.c        |  229 +++++++++++++++++++++++++++++++++++++++++++-
 mm/vmstat.c            |    2 
 7 files changed, 256 insertions(+), 10 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/include/linux/gfp.h linux-2.6.20-mm2-002_create_movable_zone/include/linux/gfp.h
--- linux-2.6.20-mm2-001_mark_highmovable/include/linux/gfp.h	2007-02-19 09:08:29.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/include/linux/gfp.h	2007-02-19 09:10:58.000000000 +0000
@@ -101,6 +101,9 @@ static inline enum zone_type gfp_zone(gf
 	if (flags & __GFP_DMA32)
 		return ZONE_DMA32;
 #endif
+	if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
+			(__GFP_HIGHMEM | __GFP_MOVABLE))
+		return ZONE_MOVABLE;
 #ifdef CONFIG_HIGHMEM
 	if (flags & __GFP_HIGHMEM)
 		return ZONE_HIGHMEM;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/include/linux/mm.h linux-2.6.20-mm2-002_create_movable_zone/include/linux/mm.h
--- linux-2.6.20-mm2-001_mark_highmovable/include/linux/mm.h	2007-02-19 01:22:30.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/include/linux/mm.h	2007-02-19 09:10:58.000000000 +0000
@@ -977,6 +977,7 @@ extern unsigned long find_max_pfn_with_a
 extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
+extern int cmdline_parse_kernelcore(char *p);
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
 extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/include/linux/mmzone.h linux-2.6.20-mm2-002_create_movable_zone/include/linux/mmzone.h
--- linux-2.6.20-mm2-001_mark_highmovable/include/linux/mmzone.h	2007-02-19 01:22:30.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/include/linux/mmzone.h	2007-02-19 09:10:58.000000000 +0000
@@ -142,6 +142,7 @@ enum zone_type {
 	 */
 	ZONE_HIGHMEM,
 #endif
+	ZONE_MOVABLE,
 	MAX_NR_ZONES
 };
 
@@ -163,6 +164,7 @@ enum zone_type {
 	+ defined(CONFIG_ZONE_DMA32)	\
 	+ 1				\
 	+ defined(CONFIG_HIGHMEM)	\
+	+ 1				\
 )
 #if __ZONE_COUNT < 2
 #define ZONES_SHIFT 0
@@ -498,10 +500,21 @@ static inline int populated_zone(struct 
 	return (!!zone->present_pages);
 }
 
+extern int movable_zone;
+static inline int zone_movable_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+	return movable_zone == ZONE_HIGHMEM;
+#else
+	return 0;
+#endif
+}
+
 static inline int is_highmem_idx(enum zone_type idx)
 {
 #ifdef CONFIG_HIGHMEM
-	return (idx == ZONE_HIGHMEM);
+	return (idx == ZONE_HIGHMEM ||
+		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
 #else
 	return 0;
 #endif
@@ -521,7 +534,9 @@ static inline int is_normal_idx(enum zon
 static inline int is_highmem(struct zone *zone)
 {
 #ifdef CONFIG_HIGHMEM
-	return zone == zone->zone_pgdat->node_zones + ZONE_HIGHMEM;
+	int zone_idx = zone - zone->zone_pgdat->node_zones;
+	return zone_idx == ZONE_HIGHMEM ||
+		(zone_idx == ZONE_MOVABLE && zone_movable_is_highmem());
 #else
 	return 0;
 #endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/include/linux/vmstat.h linux-2.6.20-mm2-002_create_movable_zone/include/linux/vmstat.h
--- linux-2.6.20-mm2-001_mark_highmovable/include/linux/vmstat.h	2007-02-19 01:22:32.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/include/linux/vmstat.h	2007-02-19 09:10:58.000000000 +0000
@@ -25,7 +25,7 @@
 #define HIGHMEM_ZONE(xx)
 #endif
 
-#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL HIGHMEM_ZONE(xx)
+#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL HIGHMEM_ZONE(xx) , xx##_MOVABLE
 
 enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
@@ -172,7 +172,8 @@ static inline unsigned long node_page_st
 #ifdef CONFIG_HIGHMEM
 		zone_page_state(&zones[ZONE_HIGHMEM], item) +
 #endif
-		zone_page_state(&zones[ZONE_NORMAL], item);
+		zone_page_state(&zones[ZONE_NORMAL], item) +
+		zone_page_state(&zones[ZONE_MOVABLE], item);
 }
 
 extern void zone_statistics(struct zonelist *, struct zone *);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/mm/highmem.c linux-2.6.20-mm2-002_create_movable_zone/mm/highmem.c
--- linux-2.6.20-mm2-001_mark_highmovable/mm/highmem.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/mm/highmem.c	2007-02-19 09:10:58.000000000 +0000
@@ -46,9 +46,14 @@ unsigned int nr_free_highpages (void)
 	pg_data_t *pgdat;
 	unsigned int pages = 0;
 
-	for_each_online_pgdat(pgdat)
+	for_each_online_pgdat(pgdat) {
 		pages += zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
 			NR_FREE_PAGES);
+		if (zone_movable_is_highmem())
+			pages += zone_page_state(
+					&pgdat->node_zones[ZONE_MOVABLE],
+					NR_FREE_PAGES);
+	}
 
 	return pages;
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/mm/page_alloc.c linux-2.6.20-mm2-002_create_movable_zone/mm/page_alloc.c
--- linux-2.6.20-mm2-001_mark_highmovable/mm/page_alloc.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/mm/page_alloc.c	2007-02-19 09:10:58.000000000 +0000
@@ -80,8 +80,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 	 256,
 #endif
 #ifdef CONFIG_HIGHMEM
-	 32
+	 32,
 #endif
+	 32,
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -95,8 +96,9 @@ static char * const zone_names[MAX_NR_ZO
 #endif
 	 "Normal",
 #ifdef CONFIG_HIGHMEM
-	 "HighMem"
+	 "HighMem",
 #endif
+	 "Movable",
 };
 
 int min_free_kbytes = 1024;
@@ -134,6 +136,12 @@ static unsigned long __initdata dma_rese
   unsigned long __initdata node_boundary_start_pfn[MAX_NUMNODES];
   unsigned long __initdata node_boundary_end_pfn[MAX_NUMNODES];
 #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
+  unsigned long __initdata required_kernelcore;
+  unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
+
+  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
+  int movable_zone;
+  EXPORT_SYMBOL(movable_zone);
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
 #ifdef CONFIG_DEBUG_VM
@@ -1578,7 +1586,7 @@ unsigned int nr_free_buffer_pages(void)
  */
 unsigned int nr_free_pagecache_pages(void)
 {
-	return nr_free_zone_pages(gfp_zone(GFP_HIGHUSER));
+	return nr_free_zone_pages(gfp_zone(GFP_HIGH_MOVABLE));
 }
 
 /*
@@ -2567,6 +2575,63 @@ void __init get_pfn_range_for_nid(unsign
 }
 
 /*
+ * This finds a zone that can be used for ZONE_MOVABLE pages. The
+ * assumption is made that zones within a node are ordered in monotonically
+ * increasing memory addresses so that the "highest" populated zone is used
+ */
+void __init find_usable_zone_for_movable(void)
+{
+	int zone_index;
+	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
+		if (zone_index == ZONE_MOVABLE)
+			continue;
+
+		if (arch_zone_highest_possible_pfn[zone_index] >
+				arch_zone_lowest_possible_pfn[zone_index])
+			break;
+	}
+
+	VM_BUG_ON(zone_index == -1);
+	movable_zone = zone_index;
+}
+
+/*
+ * The zone ranges provided by the architecture do not include ZONE_MOVABLE
+ * because it is sized independently of the architecture. Unlike the other zones,
+ * the starting point for ZONE_MOVABLE is not fixed. It may be different
+ * in each node depending on the size of each node and how evenly kernelcore
+ * is distributed. This helper function adjusts the zone ranges
+ * provided by the architecture for a given node by using the end of the
+ * highest usable zone for ZONE_MOVABLE. This preserves the assumption that
+ * zones within a node are in order of monotonically increasing memory addresses
+ */
+void __init adjust_zone_range_for_zone_movable(int nid,
+					unsigned long zone_type,
+					unsigned long node_start_pfn,
+					unsigned long node_end_pfn,
+					unsigned long *zone_start_pfn,
+					unsigned long *zone_end_pfn)
+{
+	/* Only adjust if ZONE_MOVABLE is on this node */
+	if (zone_movable_pfn[nid]) {
+		/* Size ZONE_MOVABLE */
+		if (zone_type == ZONE_MOVABLE) {
+			*zone_start_pfn = zone_movable_pfn[nid];
+			*zone_end_pfn = min(node_end_pfn,
+				arch_zone_highest_possible_pfn[movable_zone]);
+
+		/* Adjust for ZONE_MOVABLE starting within this range */
+		} else if (*zone_start_pfn < zone_movable_pfn[nid] &&
+				*zone_end_pfn > zone_movable_pfn[nid]) {
+			*zone_end_pfn = zone_movable_pfn[nid];
+
+		/* Check if this whole range is within ZONE_MOVABLE */
+		} else if (*zone_start_pfn >= zone_movable_pfn[nid])
+			*zone_start_pfn = *zone_end_pfn;
+	}
+}
+
+/*
  * Return the number of pages a zone spans in a node, including holes
  * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node()
  */
@@ -2581,6 +2646,9 @@ unsigned long __init zone_spanned_pages_
 	get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
 	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
 	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+	adjust_zone_range_for_zone_movable(nid, zone_type,
+				node_start_pfn, node_end_pfn,
+				&zone_start_pfn, &zone_end_pfn);
 
 	/* Check that this node has pages within the zone's required range */
 	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
@@ -2671,6 +2739,9 @@ unsigned long __init zone_absent_pages_i
 	zone_end_pfn = min(arch_zone_highest_possible_pfn[zone_type],
 							node_end_pfn);
 
+	adjust_zone_range_for_zone_movable(nid, zone_type,
+			node_start_pfn, node_end_pfn,
+			&zone_start_pfn, &zone_end_pfn);
 	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
 }
 
@@ -3031,6 +3102,117 @@ unsigned long __init find_max_pfn_with_a
 	return max_pfn;
 }
 
+/*
+ * Find the PFN the Movable zone begins in each node. Kernel memory
+ * is spread evenly between nodes as long as the nodes have enough
+ * memory. When they don't, some nodes will have more kernelcore than
+ * others
+ */
+void __init find_zone_movable_pfns_for_nodes(unsigned long *movable_pfn)
+{
+	int i, nid;
+	unsigned long usable_startpfn;
+	unsigned long kernelcore_node, kernelcore_remaining;
+	int usable_nodes = num_online_nodes();
+
+	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
+	if (!required_kernelcore)
+		return;
+
+	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
+	find_usable_zone_for_movable();
+	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+
+restart:
+	/* Spread kernelcore memory as evenly as possible throughout nodes */
+	kernelcore_node = required_kernelcore / usable_nodes;
+	for_each_online_node(nid) {
+		/*
+		 * Recalculate kernelcore_node if the division per node
+		 * now exceeds what is necessary to satisfy the requested
+		 * amount of memory for the kernel
+		 */
+		if (required_kernelcore < kernelcore_node)
+			kernelcore_node = required_kernelcore / usable_nodes;
+
+		/*
+		 * As the map is walked, we track how much memory is usable
+		 * by the kernel using kernelcore_remaining. When it is
+		 * 0, the rest of the node is usable by ZONE_MOVABLE
+		 */
+		kernelcore_remaining = kernelcore_node;
+
+		/* Go through each range of PFNs within this node */
+		for_each_active_range_index_in_nid(i, nid) {
+			unsigned long start_pfn, end_pfn;
+			unsigned long size_pages;
+
+			start_pfn = max(early_node_map[i].start_pfn,
+						zone_movable_pfn[nid]);
+			end_pfn = early_node_map[i].end_pfn;
+			if (start_pfn >= end_pfn)
+				continue;
+
+			/* Account for what is only usable for kernelcore */
+			if (start_pfn < usable_startpfn) {
+				unsigned long kernel_pages;
+				kernel_pages = min(end_pfn, usable_startpfn)
+								- start_pfn;
+
+				kernelcore_remaining -= min(kernel_pages,
+							kernelcore_remaining);
+				required_kernelcore -= min(kernel_pages,
+							required_kernelcore);
+
+				/* Continue if range is now fully accounted */
+				if (end_pfn <= usable_startpfn) {
+
+					/*
+					 * Push zone_movable_pfn to the end so
+					 * that if we have to rebalance
+					 * kernelcore across nodes, we will
+					 * not double account here
+					 */
+					zone_movable_pfn[nid] = end_pfn;
+					continue;
+				}
+				start_pfn = usable_startpfn;
+			}
+
+			/*
+			 * The usable PFN range for ZONE_MOVABLE is from
+			 * start_pfn->end_pfn. Calculate size_pages as the
+			 * number of pages used as kernelcore
+			 */
+			size_pages = end_pfn - start_pfn;
+			if (size_pages > kernelcore_remaining)
+				size_pages = kernelcore_remaining;
+			zone_movable_pfn[nid] = start_pfn + size_pages;
+
+			/*
+			 * Some kernelcore has been met, update counts and
+			 * break if the kernelcore for this node has been
+			 * satisfied
+			 */
+			required_kernelcore -= min(required_kernelcore,
+								size_pages);
+			kernelcore_remaining -= size_pages;
+			if (!kernelcore_remaining)
+				break;
+		}
+	}
+
+	/*
+	 * If there is still required_kernelcore, we do another pass with one
+	 * less node in the count. This will push zone_movable_pfn[nid] further
+	 * along on the nodes that still have memory until kernelcore is
+	 * satisfied
+	 */
+	usable_nodes--;
+	if (usable_nodes && required_kernelcore > usable_nodes)
+		goto restart;
+}
+
 /**
  * free_area_init_nodes - Initialise all pg_data_t and zone data
  * @max_zone_pfn: an array of max PFNs for each zone
@@ -3060,22 +3242,42 @@ void __init free_area_init_nodes(unsigne
 	arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
 	arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
 	for (i = 1; i < MAX_NR_ZONES; i++) {
+		if (i == ZONE_MOVABLE)
+			continue;
+
 		arch_zone_lowest_possible_pfn[i] =
 			arch_zone_highest_possible_pfn[i-1];
 		arch_zone_highest_possible_pfn[i] =
 			max(max_zone_pfn[i], arch_zone_lowest_possible_pfn[i]);
 	}
+	arch_zone_lowest_possible_pfn[ZONE_MOVABLE] = 0;
+	arch_zone_highest_possible_pfn[ZONE_MOVABLE] = 0;
 
 	/* Print out the page size for debugging meminit problems */
 	printk(KERN_DEBUG "sizeof(struct page) = %zd\n", sizeof(struct page));
 
+	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
+	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
+	find_zone_movable_pfns_for_nodes(zone_movable_pfn);
+
 	/* Print out the zone ranges */
 	printk("Zone PFN ranges:\n");
-	for (i = 0; i < MAX_NR_ZONES; i++)
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		if (i == ZONE_MOVABLE)
+			continue;
+
 		printk("  %-8s %8lu -> %8lu\n",
 				zone_names[i],
 				arch_zone_lowest_possible_pfn[i],
 				arch_zone_highest_possible_pfn[i]);
+	}
+
+	/* Print out the PFNs ZONE_MOVABLE begins at in each node */
+	printk("Movable zone start PFN for each node\n");
+	for (i = 0; i < MAX_NUMNODES; i++) {
+		if (zone_movable_pfn[i])
+			printk("  Node %d: %lu\n", i, zone_movable_pfn[i]);
+	}
 
 	/* Print out the early_node_map[] */
 	printk("early_node_map[%d] active PFN ranges\n", nr_nodemap_entries);
@@ -3092,6 +3294,25 @@ void __init free_area_init_nodes(unsigne
 				find_min_pfn_for_node(nid), NULL);
 	}
 }
+
+/*
+ * kernelcore=size sets the amount of memory usable by allocations that
+ * cannot be reclaimed or migrated.
+ */
+int __init cmdline_parse_kernelcore(char *p)
+{
+	unsigned long long coremem;
+	if (!p)
+		return -EINVAL;
+
+	coremem = memparse(p, &p);
+	required_kernelcore = coremem >> PAGE_SHIFT;
+
+	/* Paranoid check that UL is enough for required_kernelcore */
+	WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX);
+
+	return 0;
+}
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
 /**
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-001_mark_highmovable/mm/vmstat.c linux-2.6.20-mm2-002_create_movable_zone/mm/vmstat.c
--- linux-2.6.20-mm2-001_mark_highmovable/mm/vmstat.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-002_create_movable_zone/mm/vmstat.c	2007-02-19 09:10:58.000000000 +0000
@@ -427,7 +427,7 @@ const struct seq_operations fragmentatio
 #endif
 
 #define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
-					TEXT_FOR_HIGHMEM(xx)
+					TEXT_FOR_HIGHMEM(xx) xx "_movable",
 
 static const char * const vmstat_text[] = {
 	/* Zoned VM counters */


* [PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
  2007-03-01 10:08 ` [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated Mel Gorman
  2007-03-01 10:08 ` [PATCH 2/8] Create the ZONE_MOVABLE zone Mel Gorman
@ 2007-03-01 10:09 ` Mel Gorman
  2007-03-01 10:09 ` [PATCH 4/8] x86 - Specify amount of kernel memory at boot time Mel Gorman
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:09 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed,
it can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage
pool at runtime depending on the size of ZONE_MOVABLE.

This patch adds a new sysctl called hugepages_treat_as_movable. When
a non-zero value is written to it, future allocations for the huge page
pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.
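
As a usage sketch (the paths follow the sysctl added below; the pool size
of 1024 is an arbitrary example):

	# Allow the huge page pool to be grown into ZONE_MOVABLE
	echo 1 > /proc/sys/vm/hugepages_treat_as_movable
	echo 1024 > /proc/sys/vm/nr_hugepages

	# Restore the default behaviour for future pool growth
	echo 0 > /proc/sys/vm/hugepages_treat_as_movable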

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 include/linux/hugetlb.h   |    3 +++
 include/linux/mempolicy.h |    6 +++---
 include/linux/sysctl.h    |    1 +
 kernel/sysctl.c           |    8 ++++++++
 mm/hugetlb.c              |   23 ++++++++++++++++++++---
 mm/mempolicy.c            |    5 +++--
 6 files changed, 38 insertions(+), 8 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/include/linux/hugetlb.h linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/hugetlb.h
--- linux-2.6.20-mm2-002_create_movable_zone/include/linux/hugetlb.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/hugetlb.h	2007-02-19 09:13:13.000000000 +0000
@@ -14,6 +14,7 @@ static inline int is_vm_hugetlb_page(str
 }
 
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
+int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
 int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int);
 void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
@@ -28,6 +29,8 @@ int hugetlb_reserve_pages(struct inode *
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 
 extern unsigned long max_huge_pages;
+extern unsigned long hugepages_treat_as_movable;
+extern gfp_t htlb_alloc_mask;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
 extern int sysctl_hugetlb_shm_group;
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/include/linux/mempolicy.h linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/mempolicy.h
--- linux-2.6.20-mm2-002_create_movable_zone/include/linux/mempolicy.h	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/mempolicy.h	2007-02-19 09:13:13.000000000 +0000
@@ -159,7 +159,7 @@ extern void mpol_fix_fork_child_flag(str
 
 extern struct mempolicy default_policy;
 extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
-		unsigned long addr);
+		unsigned long addr, gfp_t gfp_flags);
 extern unsigned slab_node(struct mempolicy *policy);
 
 extern enum zone_type policy_zone;
@@ -256,9 +256,9 @@ static inline void mpol_fix_fork_child_f
 #define set_cpuset_being_rebound(x) do {} while (0)
 
 static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma,
-		unsigned long addr)
+		unsigned long addr, gfp_t gfp_flags)
 {
-	return NODE_DATA(0)->node_zonelists + gfp_zone(GFP_HIGHUSER);
+	return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags);
 }
 
 static inline int do_migrate_pages(struct mm_struct *mm,
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/include/linux/sysctl.h linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/sysctl.h
--- linux-2.6.20-mm2-002_create_movable_zone/include/linux/sysctl.h	2007-02-19 01:22:32.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/include/linux/sysctl.h	2007-02-19 09:13:13.000000000 +0000
@@ -207,6 +207,7 @@ enum
 	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
 	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
 	VM_MIN_SLAB=35,		 /* Percent pages ignored by zone reclaim */
+	VM_HUGETLB_TREAT_MOVABLE=36, /* Allocate hugepages from ZONE_MOVABLE */
 
 	/* s390 vm cmm sysctls */
 	VM_CMM_PAGES=1111,
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/kernel/sysctl.c linux-2.6.20-mm2-003_mark_hugepages_movable/kernel/sysctl.c
--- linux-2.6.20-mm2-002_create_movable_zone/kernel/sysctl.c	2007-02-19 01:22:34.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/kernel/sysctl.c	2007-02-19 09:13:13.000000000 +0000
@@ -737,6 +737,14 @@ static ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	 },
+	 {
+		.ctl_name	= VM_HUGETLB_TREAT_MOVABLE,
+		.procname	= "hugepages_treat_as_movable",
+		.data		= &hugepages_treat_as_movable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &hugetlb_treat_movable_handler,
+	},
 #endif
 	{
 		.ctl_name	= VM_LOWMEM_RESERVE_RATIO,
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/mm/hugetlb.c linux-2.6.20-mm2-003_mark_hugepages_movable/mm/hugetlb.c
--- linux-2.6.20-mm2-002_create_movable_zone/mm/hugetlb.c	2007-02-19 01:22:35.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/mm/hugetlb.c	2007-02-19 09:13:13.000000000 +0000
@@ -27,6 +27,9 @@ unsigned long max_huge_pages;
 static struct list_head hugepage_freelists[MAX_NUMNODES];
 static unsigned int nr_huge_pages_node[MAX_NUMNODES];
 static unsigned int free_huge_pages_node[MAX_NUMNODES];
+gfp_t htlb_alloc_mask = GFP_HIGHUSER;
+unsigned long hugepages_treat_as_movable;
+
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
  */
@@ -68,12 +71,13 @@ static struct page *dequeue_huge_page(st
 {
 	int nid = numa_node_id();
 	struct page *page = NULL;
-	struct zonelist *zonelist = huge_zonelist(vma, address);
+	struct zonelist *zonelist = huge_zonelist(vma, address,
+						htlb_alloc_mask);
 	struct zone **z;
 
 	for (z = zonelist->zones; *z; z++) {
 		nid = zone_to_nid(*z);
-		if (cpuset_zone_allowed_softwall(*z, GFP_HIGHUSER) &&
+		if (cpuset_zone_allowed_softwall(*z, htlb_alloc_mask) &&
 		    !list_empty(&hugepage_freelists[nid]))
 			break;
 	}
@@ -103,7 +107,7 @@ static int alloc_fresh_huge_page(void)
 {
 	static int nid = 0;
 	struct page *page;
-	page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN,
+	page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
 					HUGETLB_PAGE_ORDER);
 	nid = next_node(nid, node_online_map);
 	if (nid == MAX_NUMNODES)
@@ -243,6 +247,19 @@ int hugetlb_sysctl_handler(struct ctl_ta
 	max_huge_pages = set_max_huge_pages(max_huge_pages);
 	return 0;
 }
+
+int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
+			struct file *file, void __user *buffer,
+			size_t *length, loff_t *ppos)
+{
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (hugepages_treat_as_movable)
+		htlb_alloc_mask = GFP_HIGH_MOVABLE;
+	else
+		htlb_alloc_mask = GFP_HIGHUSER;
+	return 0;
+}
+
 #endif /* CONFIG_SYSCTL */
 
 int hugetlb_report_meminfo(char *buf)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-002_create_movable_zone/mm/mempolicy.c linux-2.6.20-mm2-003_mark_hugepages_movable/mm/mempolicy.c
--- linux-2.6.20-mm2-002_create_movable_zone/mm/mempolicy.c	2007-02-19 09:08:29.000000000 +0000
+++ linux-2.6.20-mm2-003_mark_hugepages_movable/mm/mempolicy.c	2007-02-19 09:13:13.000000000 +0000
@@ -1211,7 +1211,8 @@ static inline unsigned interleave_nid(st
 
 #ifdef CONFIG_HUGETLBFS
 /* Return a zonelist suitable for a huge page allocation. */
-struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr)
+struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
+							gfp_t gfp_flags)
 {
 	struct mempolicy *pol = get_vma_policy(current, vma, addr);
 
@@ -1219,7 +1220,7 @@ struct zonelist *huge_zonelist(struct vm
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
-		return NODE_DATA(nid)->node_zonelists + gfp_zone(GFP_HIGHUSER);
+		return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
 	}
 	return zonelist_policy(GFP_HIGHUSER, pol);
 }


* [PATCH 4/8] x86 - Specify amount of kernel memory at boot time
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
                   ` (2 preceding siblings ...)
  2007-03-01 10:09 ` [PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE Mel Gorman
@ 2007-03-01 10:09 ` Mel Gorman
  2007-03-01 10:09 ` [PATCH 5/8] ppc and powerpc " Mel Gorman
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:09 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


This patch adds the kernelcore= parameter for x86.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 setup.c |    1 +
 1 files changed, 1 insertion(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-003_mark_hugepages_movable/arch/i386/kernel/setup.c linux-2.6.20-mm2-004_x86_set_kernelcore/arch/i386/kernel/setup.c
--- linux-2.6.20-mm2-003_mark_hugepages_movable/arch/i386/kernel/setup.c	2007-02-19 01:19:26.000000000 +0000
+++ linux-2.6.20-mm2-004_x86_set_kernelcore/arch/i386/kernel/setup.c	2007-02-19 09:15:29.000000000 +0000
@@ -195,6 +195,7 @@ static int __init parse_mem(char *arg)
 	return 0;
 }
 early_param("mem", parse_mem);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 #ifdef CONFIG_PROC_VMCORE
 /* elfcorehdr= specifies the location of elf core header


* [PATCH 5/8] ppc and powerpc - Specify amount of kernel memory at boot time
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
                   ` (3 preceding siblings ...)
  2007-03-01 10:09 ` [PATCH 4/8] x86 - Specify amount of kernel memory at boot time Mel Gorman
@ 2007-03-01 10:09 ` Mel Gorman
  2007-03-01 10:10 ` [PATCH 6/8] x86_64 " Mel Gorman
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:09 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


This patch adds the kernelcore= parameter for ppc and powerpc.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 powerpc/kernel/prom.c |    1 +
 ppc/mm/init.c         |    2 ++
 2 files changed, 3 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-004_x86_set_kernelcore/arch/powerpc/kernel/prom.c linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/powerpc/kernel/prom.c
--- linux-2.6.20-mm2-004_x86_set_kernelcore/arch/powerpc/kernel/prom.c	2007-02-19 01:19:32.000000000 +0000
+++ linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/powerpc/kernel/prom.c	2007-02-19 09:17:41.000000000 +0000
@@ -431,6 +431,7 @@ static int __init early_parse_mem(char *
 	return 0;
 }
 early_param("mem", early_parse_mem);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 /*
  * The device tree may be allocated below our memory limit, or inside the
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-004_x86_set_kernelcore/arch/ppc/mm/init.c linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/ppc/mm/init.c
--- linux-2.6.20-mm2-004_x86_set_kernelcore/arch/ppc/mm/init.c	2007-02-04 18:44:54.000000000 +0000
+++ linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/ppc/mm/init.c	2007-02-19 09:17:41.000000000 +0000
@@ -214,6 +214,8 @@ void MMU_setup(void)
 	}
 }
 
+early_param("kernelcore", cmdline_parse_kernelcore);
+
 /*
  * MMU_init sets up the basic memory mappings for the kernel,
  * including both RAM and possibly some I/O regions,


* [PATCH 6/8] x86_64 - Specify amount of kernel memory at boot time
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
                   ` (4 preceding siblings ...)
  2007-03-01 10:09 ` [PATCH 5/8] ppc and powerpc " Mel Gorman
@ 2007-03-01 10:10 ` Mel Gorman
  2007-03-01 10:10 ` [PATCH 7/8] ia64 " Mel Gorman
  2007-03-01 10:10 ` [PATCH 8/8] Add documentation for additional boot parameter and sysctl Mel Gorman
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:10 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


This patch adds the kernelcore= parameter for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 e820.c |    1 +
 1 files changed, 1 insertion(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/x86_64/kernel/e820.c linux-2.6.20-mm2-006_x8664_set_kernelcore/arch/x86_64/kernel/e820.c
--- linux-2.6.20-mm2-005_ppc64_set_kernelcore/arch/x86_64/kernel/e820.c	2007-02-19 01:19:38.000000000 +0000
+++ linux-2.6.20-mm2-006_x8664_set_kernelcore/arch/x86_64/kernel/e820.c	2007-02-19 09:19:53.000000000 +0000
@@ -617,6 +617,7 @@ static int __init parse_memopt(char *p)
 	return 0;
 } 
 early_param("mem", parse_memopt);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 static int userdef __initdata;
 


* [PATCH 7/8] ia64 - Specify amount of kernel memory at boot time
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
                   ` (5 preceding siblings ...)
  2007-03-01 10:10 ` [PATCH 6/8] x86_64 " Mel Gorman
@ 2007-03-01 10:10 ` Mel Gorman
  2007-03-01 10:10 ` [PATCH 8/8] Add documentation for additional boot parameter and sysctl Mel Gorman
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:10 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


This patch adds the kernelcore= parameter for ia64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 efi.c |    3 +++
 1 files changed, 3 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c linux-2.6.20-mm2-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c
--- linux-2.6.20-mm2-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c	2007-02-19 01:19:27.000000000 +0000
+++ linux-2.6.20-mm2-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c	2007-02-19 09:22:05.000000000 +0000
@@ -28,6 +28,7 @@
 #include <linux/time.h>
 #include <linux/efi.h>
 #include <linux/kexec.h>
+#include <linux/mm.h>
 
 #include <asm/io.h>
 #include <asm/kregs.h>
@@ -422,6 +423,8 @@ efi_init (void)
 			mem_limit = memparse(cp + 4, &cp);
 		} else if (memcmp(cp, "max_addr=", 9) == 0) {
 			max_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
+		} else if (memcmp(cp, "kernelcore=",11) == 0) {
+			cmdline_parse_kernelcore(cp+11);
 		} else if (memcmp(cp, "min_addr=", 9) == 0) {
 			min_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
 		} else {

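Unlike the other architectures, ia64 scans the EFI boot command line by hand
inside efi_init(), presumably because that runs before the generic
early_param() handling, so the patch calls cmdline_parse_kernelcore()
directly (the new linux/mm.h include supplies its prototype). Below is a
stand-alone sketch of the same scanning pattern; parse_kernelcore() stands
in for the kernel's parser, and all names and the suffix handling are
illustrative:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	static unsigned long long kernelcore_bytes;

	/* Stand-in for cmdline_parse_kernelcore(); the suffix handling
	 * mirrors what memparse() does in-kernel. */
	static void parse_kernelcore(const char *p)
	{
		char *end;

		kernelcore_bytes = strtoull(p, &end, 0);
		switch (*end) {
		case 'G': case 'g': kernelcore_bytes <<= 10; /* fall through */
		case 'M': case 'm': kernelcore_bytes <<= 10; /* fall through */
		case 'K': case 'k': kernelcore_bytes <<= 10;
		}
	}

	static void scan_cmdline(const char *cp)
	{
		while (*cp) {
			/* efi_init() uses memcmp(); strncmp() stops at a
			 * terminating NUL, so it is safer here. */
			if (strncmp(cp, "kernelcore=", 11) == 0)
				parse_kernelcore(cp + 11);
			while (*cp && *cp != ' ')	/* skip this option */
				cp++;
			while (*cp == ' ')		/* and separators */
				cp++;
		}
	}

	int main(void)
	{
		scan_cmdline("mem=2G kernelcore=512M max_addr=4G");
		printf("kernelcore = %llu bytes\n", kernelcore_bytes);
		return 0;
	}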
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 8/8] Add documentation for additional boot parameter and sysctl
  2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
                   ` (6 preceding siblings ...)
  2007-03-01 10:10 ` [PATCH 7/8] ia64 " Mel Gorman
@ 2007-03-01 10:10 ` Mel Gorman
  7 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-03-01 10:10 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, linux-kernel, linux-mm


Once all patches are applied, a new command-line parameter and a new sysctl
exist. This patch adds the necessary documentation.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 filesystems/proc.txt  |   15 +++++++++++++++
 kernel-parameters.txt |   16 ++++++++++++++++
 sysctl/vm.txt         |    3 ++-
 3 files changed, 33 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/filesystems/proc.txt linux-2.6.20-mm2-008_documentation/Documentation/filesystems/proc.txt
--- linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/filesystems/proc.txt	2007-02-19 01:19:19.000000000 +0000
+++ linux-2.6.20-mm2-008_documentation/Documentation/filesystems/proc.txt	2007-02-19 09:24:19.000000000 +0000
@@ -1289,6 +1289,21 @@ nr_hugepages configures number of hugetl
 hugetlb_shm_group contains group id that is allowed to create SysV shared
 memory segment using hugetlb page.
 
+hugepages_treat_as_movable
+--------------------------
+
+This parameter is only useful when kernelcore= is specified at boot time to
+create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
+are not movable, so they are not normally allocated from ZONE_MOVABLE. A
+non-zero value written to hugepages_treat_as_movable allows huge pages to be
+allocated from ZONE_MOVABLE.
+
+Once enabled, ZONE_MOVABLE is treated as an area of memory that the huge
+page pool can easily grow or shrink within. Assuming that no running
+applications mlock() large amounts of memory, the huge page pool can likely
+grow to the size of ZONE_MOVABLE by repeatedly writing the desired value
+to nr_hugepages and triggering page reclaim.
+
 laptop_mode
 -----------
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/kernel-parameters.txt linux-2.6.20-mm2-008_documentation/Documentation/kernel-parameters.txt
--- linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/kernel-parameters.txt	2007-02-19 01:19:20.000000000 +0000
+++ linux-2.6.20-mm2-008_documentation/Documentation/kernel-parameters.txt	2007-02-19 09:24:19.000000000 +0000
@@ -762,6 +762,22 @@ and is between 256 and 4096 characters. 
 	js=		[HW,JOY] Analog joystick
 			See Documentation/input/joystick.txt.
 
+	kernelcore=nn[KMG]	[KNL,IA-32,IA-64,PPC,X86-64] This parameter
+			specifies the amount of memory usable by the kernel
+			for non-movable allocations.  The requested amount is
+			spread evenly throughout all nodes in the system. The
+			remaining memory in each node is used for Movable
+			pages. In the event that a node is too small to have
+			both kernelcore and Movable pages, kernelcore pages will
+			take priority and other nodes will have a larger number
+			of kernelcore pages.  The Movable zone is used for the
+			allocation of pages that may be reclaimed or moved
+			by the page migration subsystem.  This means that
+			HugeTLB pages may not be allocated from this zone.
+			Note that allocations like PTEs-from-HighMem still
+			use the HighMem zone if it exists, and the Normal
+			zone if it does not.
+
 	keepinitrd	[HW,ARM]
 
 	kstack=N	[IA-32,X86-64] Print N words from the kernel stack
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/sysctl/vm.txt linux-2.6.20-mm2-008_documentation/Documentation/sysctl/vm.txt
--- linux-2.6.20-mm2-007_ia64_set_kernelcore/Documentation/sysctl/vm.txt	2007-02-19 01:19:20.000000000 +0000
+++ linux-2.6.20-mm2-008_documentation/Documentation/sysctl/vm.txt	2007-02-19 09:24:19.000000000 +0000
@@ -39,7 +39,8 @@ Currently, these files are in /proc/sys/
 
 dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
 dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump, swap_token_timeout, drop-caches:
+block_dump, swap_token_timeout, drop-caches,
+hugepages_treat_as_movable:
 
 See Documentation/filesystems/proc.txt
 

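Taken together, the boot parameter and the sysctl might be used as follows;
the sizes are examples only. Booting with kernelcore=512M keeps 512MB,
spread across the nodes, for non-movable kernel allocations and turns the
remainder of the highest zone into ZONE_MOVABLE. At run time, the huge page
pool can then be grown into that zone:

	# echo 1 > /proc/sys/vm/hugepages_treat_as_movable
	# echo 64 > /proc/sys/vm/nr_hugepages

The first write allows huge pages to be allocated from ZONE_MOVABLE; the
second resizes the pool, triggering reclaim within the zone as needed to
find contiguous blocks.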
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 7/8] ia64 - Specify amount of kernel memory at boot time
  2007-01-25 23:44 [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages Mel Gorman
@ 2007-01-25 23:47 ` Mel Gorman
  0 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2007-01-25 23:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman, linux-kernel


This patch adds the kernelcore= parameter for ia64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 efi.c |    3 +++
 1 files changed, 3 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c
--- linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c	2007-01-07 05:45:51.000000000 +0000
+++ linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c	2007-01-25 17:42:15.000000000 +0000
@@ -27,6 +27,7 @@
 #include <linux/time.h>
 #include <linux/efi.h>
 #include <linux/kexec.h>
+#include <linux/mm.h>
 
 #include <asm/io.h>
 #include <asm/kregs.h>
@@ -422,6 +423,8 @@ efi_init (void)
 			mem_limit = memparse(cp + 4, &cp);
 		} else if (memcmp(cp, "max_addr=", 9) == 0) {
 			max_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
+		} else if (memcmp(cp, "kernelcore=",11) == 0) {
+			cmdline_parse_kernelcore(cp+11);
 		} else if (memcmp(cp, "min_addr=", 9) == 0) {
 			min_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
 		} else {

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-03-01 10:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-01 10:08 [PATCH 0/8] Create optional ZONE_MOVABLE to partition memory between movable and non-movable pages v2 Mel Gorman
2007-03-01 10:08 ` [PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated Mel Gorman
2007-03-01 10:08 ` [PATCH 2/8] Create the ZONE_MOVABLE zone Mel Gorman
2007-03-01 10:09 ` [PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE Mel Gorman
2007-03-01 10:09 ` [PATCH 4/8] x86 - Specify amount of kernel memory at boot time Mel Gorman
2007-03-01 10:09 ` [PATCH 5/8] ppc and powerpc " Mel Gorman
2007-03-01 10:10 ` [PATCH 6/8] x86_64 " Mel Gorman
2007-03-01 10:10 ` [PATCH 7/8] ia64 " Mel Gorman
2007-03-01 10:10 ` [PATCH 8/8] Add documentation for additional boot parameter and sysctl Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2007-01-25 23:44 [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages Mel Gorman
2007-01-25 23:47 ` [PATCH 7/8] ia64 - Specify amount of kernel memory at boot time Mel Gorman
