LKML Archive on lore.kernel.org
* [QUICKLIST 1/5] Quicklists for page table pages V3
@ 2007-03-19 23:37 Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

Quicklists for page table pages V3

V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly
  and default to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
  http://marc.info/?l=linux-kernel&m=117391339914767&w=2

V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
  http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists
of recently used page table pages to replace the existing (ab)use of the
slab for that purpose.

1. Proven code from the IA64 arch.

	The method used here has been fine tuned for years and
	is NUMA aware. It is based on the knowledge that accesses
	to page table pages are sparse in nature. Taking a page
	off the freelist instead of allocating a zeroed page
	reduces the number of cachelines touched, in addition
	to getting rid of the slab overhead, so performance
	improves. This is particularly useful if pgds contain
	standard mappings. We can save on the teardown and setup
	of such a page if we have some on the quicklists. This
	includes avoiding the list operations that are otherwise
	necessary on alloc and free to track pgds.

2. Lightweight alternative to using the slab to manage page sized pages

	Slab overhead is significant and even page allocator use
	is pretty heavy weight. The use of a per cpu quicklist
	means that we touch only two cachelines for an allocation.
	There is no need to access the page_struct (unless arch code
	needs to fiddle around with it). So the fast path just
	means bringing in one cacheline at the beginning of the
	page. That same cacheline may then be used to store the
	page table entry. Or a second cacheline may be used
	if the page table entry is not in the first cacheline of
	the page. The current code will zero the page, which means
	touching 32 cachelines (assuming a 4K page and 128 byte
	cachelines). We get down from 32 to 2 cachelines in the
	fast path.
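The cacheline arithmetic behind the 32-to-2 claim can be checked with a tiny
sketch (the helper function is purely illustrative, not part of the patch):
zeroing a whole page touches page_size / cacheline_size lines, while a
quicklist hit touches only the per cpu quicklist line and the first line of
the page.

```c
#include <assert.h>

/* Illustrative helper (not in the patch): number of cachelines touched
 * when zeroing an entire page of the given size. */
static int cachelines_to_zero(int page_size, int cacheline_size)
{
	return page_size / cacheline_size;
}
```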

3. Fix conflicting use of page_structs by slab and arch code.

	F.e. both arches use the ->private and ->index fields to
	create lists of pgds, and i386 also uses other page flags. The slab
	can also use the ->private field for allocations that
	are larger than page size which would occur if one enables
	debugging. In that case the arch code would overwrite the
	pointer to the first page of the compound page allocated
	by the slab. SLAB has been modified to not enable
	debugging for such slabs (!).

	There is the potential for additional conflicts here,
	especially since some arches also use page flags to mark
	page table pages.

	The patch removes these conflicts by no longer using
	the slab for these purposes. The page allocator is more
	suitable since PAGE_SIZE chunks are its domain.
	Then we can start using standard list operations via
	page->lru instead of improvising linked lists.

	SLUB makes more extensive use of the page struct and so
	far had to create workarounds for these slabs. The ->index
	field is used for the SLUB freelist. So SLUB cannot allow
	the use of a freelist for these slabs and, like slab,
	currently does not allow debugging and forces such slabs
	to contain only a single object (avoiding the freelist).

	If we do not get rid of these issues then both SLAB and SLUB
	have to continue to provide special code paths to support these
	slabs.

4. i386 gets lightweight NUMA aware management of page table pages.

	Note that the use of SLAB on NUMA systems will require the
	use of alien caches to efficiently remove remote page
	table pages, which (for a PAGE_SIZEd allocation) is a lengthy
	and expensive process. With quicklists no alien caches are
	needed. Pages can simply be returned to the correct node.
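The node check this relies on amounts to a single comparison on the free
path. A minimal user-space sketch of that decision (the function name is
ours, not the patch's):

```c
#include <assert.h>

/* Illustrative decision from the description above: only node-local pages
 * go back onto the per cpu quicklist; pages belonging to a remote node are
 * returned to the page allocator directly. */
static int goes_on_quicklist(int page_nid, int current_nid)
{
	return page_nid == current_nid;
}
```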

5. x86_64 gets lightweight page table page management.

	This will allow x86_64 arch code to repopulate pgds
	and other page table entries faster. The list operations for pgds
	are reduced in the same way as for i386 to the point where
	a pgd is allocated from the page allocator and when it is
	freed back to the page allocator. A pgd can pass through
	the quicklists without having to be reinitialized.

6. Consolidation of code from multiple arches

	So far arches have their own implementations of quicklist
	management. This patch moves that feature into the core, allowing
	easier maintenance and consistent management of quicklists.

Page table pages have the characteristic that they are typically zero
or in a known state when they are freed. This is usually exactly the
same state as needed after allocation. So it makes sense to build a list
of freed page table pages and satisfy allocations from that list first.
Those pages have already been initialized correctly (thus there is no
need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively. Page table pages are used in
a sparse way, so zeroing them on allocation is not too useful.
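The scheme just described can be sketched in user space (illustrative only;
the real code uses per cpu variables and the page allocator): freed pages
form a LIFO list, with the next pointer threaded through the first word of
each free page, so no separate list nodes are needed.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space sketch of the quicklist idea, not the kernel code. */
struct sketch_quicklist {
	void *page;	/* head of the LIFO freelist */
	int nr_pages;
};

static void sketch_free(struct sketch_quicklist *q, void *page)
{
	*(void **)page = q->page;	/* link via the page's first word */
	q->page = page;
	q->nr_pages++;
}

static void *sketch_alloc(struct sketch_quicklist *q)
{
	void **p = q->page;

	if (p) {
		q->page = p[0];	/* pop the head */
		p[0] = NULL;	/* restore the zeroed first word */
		q->nr_pages--;
		return p;
	}
	return calloc(1, 4096);	/* fall back to a fresh zeroed "page" */
}
```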

Such an implementation already exists for ia64. However, that implementation
did not support constructors and destructors as needed by i386 / x86_64.
It also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.

Quicklists are enabled by an arch defining CONFIG_QUICKLIST. If more
than one quicklist is necessary then the arch can define NR_QUICK for
additional lists. F.e. i386 needs two and thus has

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:


quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)


Page table pages can be freed using:


quicklist_free(<quicklist-nr>, <destructor>, <page>)


Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.

If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
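A user-space sketch of the constructor contract (names and the 4096-byte
"page" are illustrative, not the patch's code): the constructor runs only
when the quicklist is empty and a fresh page must come from the underlying
allocator; pages recycled from the list keep their constructed state, which
is what makes repopulating pgds cheap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static void *list_head;		/* one quicklist, LIFO */
static int ctor_calls;		/* counts constructor invocations */

static void demo_ctor(void *page)
{
	ctor_calls++;
	memset(page, 0, 4096);	/* stand-in for establishing mappings */
}

static void *demo_alloc(void (*ctor)(void *))
{
	void **p = list_head;

	if (p) {		/* recycled page: state already established */
		list_head = p[0];
		p[0] = NULL;
		return p;
	}
	p = malloc(4096);
	if (ctor && p)
		ctor(p);	/* fresh page: establish the definite state */
	return p;
}

static void demo_free(void *page)
{
	*(void **)page = list_head;
	list_head = page;
}
```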

Tested on:
i386 UP / SMP, x86_64 UP, NUMA emulation, IA64 NUMA.

Index: linux-2.6.21-rc3-mm2/include/linux/quicklist.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h	2007-03-16 02:19:15.000000000 -0700
@@ -0,0 +1,95 @@
+#ifndef LINUX_QUICKLIST_H
+#define LINUX_QUICKLIST_H
+/*
+ * Fast allocations and disposal of pages. Pages must be in the condition
+ * as needed after allocation when they are freed. Per cpu lists of pages
+ * are kept that only contain node local pages.
+ *
+ * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/percpu.h>
+
+#ifdef CONFIG_QUICKLIST
+
+#ifndef CONFIG_NR_QUICK
+#define CONFIG_NR_QUICK 1
+#endif
+
+struct quicklist {
+	void *page;
+	int nr_pages;
+};
+
+DECLARE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+/*
+ * The two key functions quicklist_alloc and quicklist_free are inline so
+ * that they may be custom compiled for the platform.
+ * Specifying a NULL ctor can remove constructor support. Specifying
+ * a constant quicklist allows the determination of the exact address
+ * in the per cpu area.
+ *
+ * The fast path in quicklist_alloc touches only a per cpu cacheline and
+ * the first cacheline of the page itself. There is minimal overhead involved.
+ */
+static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
+{
+	struct quicklist *q;
+	void **p = NULL;
+
+	q = &get_cpu_var(quicklist)[nr];
+	p = q->page;
+	if (likely(p)) {
+		q->page = p[0];
+		p[0] = NULL;
+		q->nr_pages--;
+	}
+	put_cpu_var(quicklist);
+	if (likely(p))
+		return p;
+
+	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	if (ctor && p)
+		ctor(p);
+	return p;
+}
+
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+{
+	struct quicklist *q;
+	void **p = pp;
+	struct page *page = virt_to_page(p);
+	int nid = page_to_nid(page);
+
+	if (unlikely(nid != numa_node_id())) {
+		if (dtor)
+			dtor(p);
+		free_page((unsigned long)p);
+		return;
+	}
+
+	q = &get_cpu_var(quicklist)[nr];
+	p[0] = q->page;
+	q->page = p;
+	q->nr_pages++;
+	put_cpu_var(quicklist);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *));
+unsigned long quicklist_total_size(void);
+
+#else
+static inline void quicklist_check(int nr, void (*dtor)(void *))
+{
+}
+
+static inline unsigned long quicklist_total_size(void)
+{
+	return 0;
+}
+#endif
+
+#endif /* LINUX_QUICKLIST_H */
+
Index: linux-2.6.21-rc3-mm2/mm/Makefile
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/Makefile	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/Makefile	2007-03-16 02:16:22.000000000 -0700
@@ -30,3 +30,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_QUICKLIST) += quicklist.o
+
Index: linux-2.6.21-rc3-mm2/mm/quicklist.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3-mm2/mm/quicklist.c	2007-03-16 02:16:22.000000000 -0700
@@ -0,0 +1,81 @@
+/*
+ * Quicklist support.
+ *
+ * Quicklists are light weight lists of pages that have a defined state
+ * on alloc and free. Pages must be in the quicklist specific defined state
+ * (zero by default) when the page is freed. It seems that the initial idea
+ * for such lists first came from Dave Miller and then various other people
+ * improved on it.
+ *
+ * Copyright (C) 2007 SGI,
+ * 	Christoph Lameter <clameter@sgi.com>
+ * 		Generalized, added support for multiple lists and
+ * 		constructors / destructors.
+ */
+#include <linux/kernel.h>
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/quicklist.h>
+
+DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+#define MIN_PAGES		25
+#define MAX_FREES_PER_PASS	16
+#define FRACTION_OF_NODE_MEM	16
+
+static unsigned long max_pages(void)
+{
+	unsigned long node_free_pages, max;
+
+	node_free_pages = node_page_state(numa_node_id(),
+			NR_FREE_PAGES);
+	max = node_free_pages / FRACTION_OF_NODE_MEM;
+	return max(max, (unsigned long)MIN_PAGES);
+}
+
+static long min_pages_to_free(struct quicklist *q)
+{
+	long pages_to_free;
+
+	pages_to_free = q->nr_pages - max_pages();
+
+	return min(pages_to_free, (long)MAX_FREES_PER_PASS);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *))
+{
+	long pages_to_free;
+	struct quicklist *q;
+
+	q = &get_cpu_var(quicklist)[nr];
+	if (q->nr_pages > MIN_PAGES) {
+		pages_to_free = min_pages_to_free(q);
+
+		while (pages_to_free > 0) {
+			void *p = quicklist_alloc(nr, 0, NULL);
+
+			if (dtor)
+				dtor(p);
+			free_page((unsigned long)p);
+			pages_to_free--;
+		}
+	}
+	put_cpu_var(quicklist);
+}
+
+unsigned long quicklist_total_size(void)
+{
+	unsigned long count = 0;
+	int cpu;
+	struct quicklist *ql, *q;
+
+	for_each_online_cpu(cpu) {
+		ql = per_cpu(quicklist, cpu);
+		for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
+			count += q->nr_pages;
+	}
+	return count;
+}
+

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [QUICKLIST 2/5] Quicklist support for IA64
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
@ 2007-03-19 23:37 ` Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

Quicklist for IA64

IA64 is the origin of the quicklist implementation. So cut out the pieces
that are now in core code and convert the callers to the new functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/init.c	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c	2007-03-16 02:33:46.000000000 -0700
@@ -39,9 +39,6 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
 extern void ia64_tlb_init (void);
 
 unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
 struct page *zero_page_memmap_ptr;	/* map entry for zero page */
 EXPORT_SYMBOL(zero_page_memmap_ptr);
 
-#define MIN_PGT_PAGES			25UL
-#define MAX_PGT_FREES_PER_PASS		16L
-#define PGT_FRACTION_OF_NODE_MEM	16
-
-static inline long
-max_pgt_pages(void)
-{
-	u64 node_free_pages, max_pgt_pages;
-
-#ifndef	CONFIG_NUMA
-	node_free_pages = nr_free_pages();
-#else
-	node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
-	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
-	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
-	return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
-	long pages_to_free;
-
-	pages_to_free = pgtable_quicklist_size - max_pgt_pages();
-	pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
-	return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
-	long pages_to_free;
-
-	if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
-		return;
-
-	preempt_disable();
-	while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
-		while (pages_to_free--) {
-			free_page((unsigned long)pgtable_quicklist_alloc());
-		}
-		preempt_enable();
-		preempt_disable();
-	}
-	preempt_enable();
-}
-
 void
 lazy_mmu_prot_update (pte_t pte)
 {
Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h	2007-03-16 02:33:46.000000000 -0700
@@ -18,71 +18,18 @@
 #include <linux/mm.h>
 #include <linux/page-flags.h>
 #include <linux/threads.h>
+#include <linux/quicklist.h>
 
 #include <asm/mmu_context.h>
 
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
-	long ql_size = 0;
-	int cpuid;
-
-	for_each_online_cpu(cpuid) {
-		ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
-	}
-	return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
-	unsigned long *ret = NULL;
-
-	preempt_disable();
-
-	ret = pgtable_quicklist;
-	if (likely(ret != NULL)) {
-		pgtable_quicklist = (unsigned long *)(*ret);
-		ret[0] = 0;
-		--pgtable_quicklist_size;
-		preempt_enable();
-	} else {
-		preempt_enable();
-		ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-	}
-
-	return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
-	int nid = page_to_nid(virt_to_page(pgtable_entry));
-
-	if (unlikely(nid != numa_node_id())) {
-		free_page((unsigned long)pgtable_entry);
-		return;
-	}
-#endif
-
-	preempt_disable();
-	*(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
-	pgtable_quicklist = (unsigned long *)pgtable_entry;
-	++pgtable_quicklist_size;
-	preempt_enable();
-}
-
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t * pgd)
 {
-	pgtable_quicklist_free(pgd);
+	quicklist_free(0, NULL, pgd);
 }
 
 #ifdef CONFIG_PGTABLE_4
@@ -94,12 +41,12 @@ pgd_populate(struct mm_struct *mm, pgd_t
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pud_free(pud_t * pud)
 {
-	pgtable_quicklist_free(pud);
+	quicklist_free(0, NULL, pud);
 }
 #define __pud_free_tlb(tlb, pud)	pud_free(pud)
 #endif /* CONFIG_PGTABLE_4 */
@@ -112,12 +59,12 @@ pud_populate(struct mm_struct *mm, pud_t
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pmd_free(pmd_t * pmd)
 {
-	pgtable_quicklist_free(pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 #define __pmd_free_tlb(tlb, pmd)	pmd_free(pmd)
@@ -137,28 +84,31 @@ pmd_populate_kernel(struct mm_struct *mm
 static inline struct page *pte_alloc_one(struct mm_struct *mm,
 					 unsigned long addr)
 {
-	void *pg = pgtable_quicklist_alloc();
+	void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
 	return pg ? virt_to_page(pg) : NULL;
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long addr)
 {
-	return pgtable_quicklist_alloc();
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pte_free(struct page *pte)
 {
-	pgtable_quicklist_free(page_address(pte));
+	quicklist_free(0, NULL, page_address(pte));
 }
 
 static inline void pte_free_kernel(pte_t * pte)
 {
-	pgtable_quicklist_free(pte);
+	quicklist_free(0, NULL, pte);
 }
 
-#define __pte_free_tlb(tlb, pte)	pte_free(pte)
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 
-extern void check_pgt_cache(void);
+#define __pte_free_tlb(tlb, pte)	pte_free(pte)
 
 #endif				/* _ASM_IA64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/contig.c	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/contig.c	2007-03-16 02:33:46.000000000 -0700
@@ -88,7 +88,7 @@ void show_mem(void)
 	printk(KERN_INFO "%d pages shared\n", total_shared);
 	printk(KERN_INFO "%d pages swap cached\n", total_cached);
 	printk(KERN_INFO "Total of %ld pages in page table cache\n",
-	       pgtable_quicklist_total_size());
+	       quicklist_total_size());
 	printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
 }
 
Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/discontig.c	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/discontig.c	2007-03-16 02:33:46.000000000 -0700
@@ -563,7 +563,7 @@ void show_mem(void)
 	printk(KERN_INFO "%d pages shared\n", total_shared);
 	printk(KERN_INFO "%d pages swap cached\n", total_cached);
 	printk(KERN_INFO "Total of %ld pages in page table cache\n",
-	       pgtable_quicklist_total_size());
+	       quicklist_total_size());
 	printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
 }
 
Index: linux-2.6.21-rc3-mm2/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/Kconfig	2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/Kconfig	2007-03-16 02:33:46.000000000 -0700
@@ -29,6 +29,10 @@ config ZONE_DMA
 	def_bool y
 	depends on !IA64_SGI_SN2
 
+config QUICKLIST
+	bool
+	default y
+
 config MMU
 	bool
 	default y


* [QUICKLIST 3/5] Quicklist support for i386
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
@ 2007-03-19 23:37 ` Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

i386: Convert to quicklists

Implement the i386 management of pgd and pmds using quicklists.

The i386 management of page table pages currently uses page sized slabs.
Getting rid of that using quicklists allows full use of the page flags
and the page->lru. So get rid of the improvised linked lists using
page->index and page->private.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3-mm2/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/init.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/init.c	2007-03-19 15:55:33.000000000 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif
 
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
-	if (PTRS_PER_PMD > 1) {
-		pmd_cache = kmem_cache_create("pmd",
-					PTRS_PER_PMD*sizeof(pmd_t),
-					PTRS_PER_PMD*sizeof(pmd_t),
-					0,
-					pmd_ctor,
-					NULL);
-		if (!pmd_cache)
-			panic("pgtable_cache_init(): cannot create pmd cache");
-	}
-	pgd_cache = kmem_cache_create("pgd",
-				PTRS_PER_PGD*sizeof(pgd_t),
-				PTRS_PER_PGD*sizeof(pgd_t),
-				0,
-				pgd_ctor,
-				PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
-	if (!pgd_cache)
-		panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
 /*
  * This function cannot be __init, since exceptions don't work in that
  * section.  Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c	2007-03-19 15:59:37.000000000 -0700
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/spinlock.h>
 #include <linux/module.h>
+#include <linux/quicklist.h>
 
 #include <asm/system.h>
 #include <asm/pgtable.h>
@@ -198,11 +199,6 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
-	memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +207,18 @@ void pmd_ctor(void *pmd, struct kmem_cac
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-	page->index = (unsigned long)pgd_list;
-	if (pgd_list)
-		set_page_private(pgd_list, (unsigned long)&page->index);
-	pgd_list = page;
-	set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
 
-static inline void pgd_list_del(pgd_t *pgd)
-{
-	struct page *next, **pprev, *page = virt_to_page(pgd);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page_private(page);
-	*pprev = next;
-	if (next)
-		set_page_private(next, (unsigned long)pprev);
-}
+#define QUICK_PGD 0
+#define QUICK_PMD 1
 
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
 {
 	unsigned long flags;
+	struct page *page = virt_to_page(pgd);
 
 	if (PTRS_PER_PMD == 1) {
 		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -259,31 +237,32 @@ void pgd_ctor(void *pgd, struct kmem_cac
 			__pa(swapper_pg_dir) >> PAGE_SHIFT,
 			USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
 
-	pgd_list_add(pgd);
+	list_add(&page->lru, &pgd_list);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
 /* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
 {
 	unsigned long flags; /* can be called from interrupt context */
+	struct page *page = virt_to_page(pgd);
 
 	paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
 	spin_lock_irqsave(&pgd_lock, flags);
-	pgd_list_del(pgd);
+	list_del(&page->lru);
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
-	pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+	pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
 
 	if (PTRS_PER_PMD == 1 || !pgd)
 		return pgd;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+		pmd_t *pmd = quicklist_alloc(QUICK_PMD, GFP_KERNEL, NULL);
 		if (!pmd)
 			goto out_oom;
 		paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +275,9 @@ out_oom:
 		pgd_t pgdent = pgd[i];
 		void* pmd = (void *)__va(pgd_val(pgdent)-1);
 		paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-		kmem_cache_free(pmd_cache, pmd);
+		quicklist_free(QUICK_PMD, NULL, pmd);
 	}
-	kmem_cache_free(pgd_cache, pgd);
+	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 	return NULL;
 }
 
@@ -312,8 +291,14 @@ void pgd_free(pgd_t *pgd)
 			pgd_t pgdent = pgd[i];
 			void* pmd = (void *)__va(pgd_val(pgdent)-1);
 			paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-			kmem_cache_free(pmd_cache, pmd);
+			quicklist_free(QUICK_PMD, NULL, pmd);
 		}
 	/* in the non-PAE case, free_pgtables() clears user pgd entries */
-	kmem_cache_free(pgd_cache, pgd);
+	quicklist_free(QUICK_PGD, pgd_ctor, pgd);
+}
+
+void check_pgt_cache(void)
+{
+	quicklist_check(QUICK_PGD, pgd_dtor);
+	quicklist_check(QUICK_PMD, NULL);
 }
Index: linux-2.6.21-rc3-mm2/arch/i386/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/Kconfig	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/Kconfig	2007-03-19 15:55:33.000000000 -0700
@@ -55,6 +55,14 @@ config ZONE_DMA
 	bool
 	default y
 
+config QUICKLIST
+	bool
+	default y
+
+config NR_QUICK
+	int
+	default 2
+
 config SBUS
 	bool
 
Index: linux-2.6.21-rc3-mm2/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-i386/pgtable.h	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-i386/pgtable.h	2007-03-19 15:55:33.000000000 -0700
@@ -35,15 +35,12 @@ struct vm_area_struct;
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 extern unsigned long empty_zero_page[1024];
 extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
-extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
 
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
+void check_pgt_cache(void);
+
+extern spinlock_t pgd_lock;
+extern struct list_head pgd_list;
+static inline void pgtable_cache_init(void) {};
 void paging_init(void);
 
 /*
Index: linux-2.6.21-rc3-mm2/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/kernel/smp.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/kernel/smp.c	2007-03-19 15:55:33.000000000 -0700
@@ -437,7 +437,7 @@ void flush_tlb_mm (struct mm_struct * mm
 	}
 	if (!cpus_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+	check_pgt_cache();
 	preempt_enable();
 }
 
Index: linux-2.6.21-rc3-mm2/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/kernel/process.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/kernel/process.c	2007-03-19 15:55:33.000000000 -0700
@@ -181,6 +181,7 @@ void cpu_idle(void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 
Index: linux-2.6.21-rc3-mm2/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-i386/pgalloc.h	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-i386/pgalloc.h	2007-03-19 15:55:33.000000000 -0700
@@ -65,6 +65,6 @@ do {									\
 #define pud_populate(mm, pmd, pte)	BUG()
 #endif
 
-#define check_pgt_cache()	do { } while (0)
+extern void check_pgt_cache(void);
 
 #endif /* _I386_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/fault.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/fault.c	2007-03-19 15:55:33.000000000 -0700
@@ -623,11 +623,10 @@ void vmalloc_sync_all(void)
 			struct page *page;
 
 			spin_lock_irqsave(&pgd_lock, flags);
-			for (page = pgd_list; page; page =
-					(struct page *)page->index)
+			list_for_each_entry(page, &pgd_list, lru)
 				if (!vmalloc_sync_one(page_address(page),
 								address)) {
-					BUG_ON(page != pgd_list);
+					BUG();
 					break;
 				}
 			spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pageattr.c	2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pageattr.c	2007-03-19 15:55:33.000000000 -0700
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
 		return;
 
 	spin_lock_irqsave(&pgd_lock, flags);
-	for (page = pgd_list; page; page = (struct page *)page->index) {
+	list_for_each_entry(page, &pgd_list, lru) {
 		pgd_t *pgd;
 		pud_t *pud;
 		pmd_t *pmd;


* [QUICKLIST 4/5] Quicklist support for x86_64
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
@ 2007-03-19 23:37 ` Christoph Lameter
  2007-03-19 23:37 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

Convert x86_64 to use quicklists

This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

A second quicklist is used to separate out PGD handling. Thus we can carry
the initialized pgds of terminating processes over to the next process
needing them.

Also clean up the pgd_list handling to use regular list macros. Not using
the slab allocator frees up the lru field so we can use regular list macros.

The adding and removal of pgds to the pgd list is moved into the
constructor / destructor. We can then avoid moving pgds off the list
while they are still in the quicklists, reducing pgd creation and
allocation overhead further.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/Kconfig	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig	2007-03-15 22:00:04.000000000 -0700
@@ -56,6 +56,14 @@ config ZONE_DMA
 	bool
 	default y
 
+config QUICKLIST
+	bool
+	default y
+
+config NR_QUICK
+	int
+	default 2
+
 config ISA
 	bool
 
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h	2007-03-15 21:59:31.000000000 -0700
@@ -4,6 +4,10 @@
 #include <asm/pda.h>
 #include <linux/threads.h>
 #include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0	/* We preserve special mappings over free */
+#define QUICK_PT 1	/* Other page table pages that are zero on free */
 
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	free_page((unsigned long)pmd);
+	quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+	unsigned boundary;
+	pgd_t *pgd = x;
 	struct page *page = virt_to_page(pgd);
 
+	/*
+	 * Copy kernel pointers in from init.
+	 */
+	boundary = pgd_index(__PAGE_OFFSET);
+	memcpy(pgd + boundary,
+		init_level4_pgt + boundary,
+		(PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
 	spin_lock(&pgd_lock);
-	page->index = (pgoff_t)pgd_list;
-	if (pgd_list)
-		pgd_list->private = (unsigned long)&page->index;
-	pgd_list = page;
-	page->private = (unsigned long)&pgd_list;
+	list_add(&page->lru, &pgd_list);
 	spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-	struct page *next, **pprev, *page = virt_to_page(pgd);
+	pgd_t *pgd = x;
+	struct page *page = virt_to_page(pgd);
 
 	spin_lock(&pgd_lock);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page->private;
-	*pprev = next;
-	if (next)
-		next->private = (unsigned long)pprev;
+	list_del(&page->lru);
 	spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
-	if (!pgd)
-		return NULL;
-	pgd_list_add(pgd);
-	/*
-	 * Copy kernel pointers in from init.
-	 * Could keep a freelist or slab cache of those because the kernel
-	 * part never changes.
-	 */
-	boundary = pgd_index(__PAGE_OFFSET);
-	memset(pgd, 0, boundary * sizeof(pgd_t));
-	memcpy(pgd + boundary,
-	       init_level4_pgt + boundary,
-	       (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+			 GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
 	return pgd;
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
-	pgd_list_del(pgd);
-	free_page((unsigned long)pgd);
+	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!p)
 		return NULL;
 	return virt_to_page(p);
@@ -111,17 +106,22 @@ static inline struct page *pte_alloc_one
 static inline void pte_free_kernel(pte_t *pte)
 {
 	BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-	free_page((unsigned long)pte); 
+	quicklist_free(QUICK_PT, NULL, pte);
 }
 
 static inline void pte_free(struct page *pte)
 {
 	__free_page(pte);
-} 
+}
 
 #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
 
 #define __pmd_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 #define __pud_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(QUICK_PGD, pgd_dtor);
+	quicklist_check(QUICK_PT, NULL);
+}
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/kernel/process.c	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/kernel/process.c	2007-03-15 21:59:31.000000000 -0700
@@ -207,6 +207,7 @@ void cpu_idle (void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 			if (!idle)
Index: linux-2.6.21-rc3-mm2/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/kernel/smp.c	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/kernel/smp.c	2007-03-15 21:59:31.000000000 -0700
@@ -242,7 +242,7 @@ void flush_tlb_mm (struct mm_struct * mm
 	}
 	if (!cpus_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+	check_pgt_cache();
 	preempt_enable();
 }
 EXPORT_SYMBOL(flush_tlb_mm);
Index: linux-2.6.21-rc3-mm2/arch/x86_64/mm/fault.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/mm/fault.c	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/mm/fault.c	2007-03-15 21:59:31.000000000 -0700
@@ -585,7 +585,7 @@ do_sigbus:
 }
 
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
 
 void vmalloc_sync_all(void)
 {
@@ -605,8 +605,7 @@ void vmalloc_sync_all(void)
 			if (pgd_none(*pgd_ref))
 				continue;
 			spin_lock(&pgd_lock);
-			for (page = pgd_list; page;
-			     page = (struct page *)page->index) {
+			list_for_each_entry(page, &pgd_list, lru) {
 				pgd_t *pgd;
 				pgd = (pgd_t *)page_address(page) + pgd_index(address);
 				if (pgd_none(*pgd))
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgtable.h	2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgtable.h	2007-03-15 21:59:31.000000000 -0700
@@ -402,7 +402,7 @@ static inline pte_t pte_modify(pte_t pte
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val })
 
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 void vmalloc_sync_all(void);
 
 #endif /* !__ASSEMBLY__ */
@@ -419,7 +419,6 @@ extern int kern_addr_valid(unsigned long
 #define HAVE_ARCH_UNMAPPED_AREA
 
 #define pgtable_cache_init()   do { } while (0)
-#define check_pgt_cache()      do { } while (0)
 
 #define PAGE_AGP    PAGE_KERNEL_NOCACHE
 #define HAVE_PAGE_AGP 1


* [QUICKLIST 5/5] Quicklist support for sparc64
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-03-19 23:37 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
@ 2007-03-19 23:37 ` Christoph Lameter
  2007-03-19 23:53 ` [QUICKLIST 1/5] Quicklists for page table pages V3 Andrew Morton
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:37 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

From: David Miller <davem@davemloft.net>

[QUICKLIST]: Add sparc64 quicklist support.

I ported this to sparc64 as per the patch below, tested on a
UP SunBlade 1500 and a 24-cpu Niagara T1000.

Signed-off-by: David S. Miller <davem@davemloft.net>

Index: linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/Kconfig	2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig	2007-03-15 22:01:40.000000000 -0700
@@ -26,6 +26,10 @@ config MMU
 	bool
 	default y
 
+config QUICKLIST
+	bool
+	default y
+
 config STACKTRACE_SUPPORT
 	bool
 	default y
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/init.c	2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c	2007-03-15 22:00:44.000000000 -0700
@@ -176,30 +176,6 @@ unsigned long sparc64_kern_sec_context _
 
 int bigkernel = 0;
 
-struct kmem_cache *pgtable_cache __read_mostly;
-
-static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags)
-{
-	clear_page(addr);
-}
-
-extern void tsb_cache_init(void);
-
-void pgtable_cache_init(void)
-{
-	pgtable_cache = kmem_cache_create("pgtable_cache",
-					  PAGE_SIZE, PAGE_SIZE,
-					  SLAB_HWCACHE_ALIGN |
-					  SLAB_MUST_HWCACHE_ALIGN,
-					  zero_ctor,
-					  NULL);
-	if (!pgtable_cache) {
-		prom_printf("Could not create pgtable_cache\n");
-		prom_halt();
-	}
-	tsb_cache_init();
-}
-
 #ifdef CONFIG_DEBUG_DCFLUSH
 atomic_t dcpage_flushes = ATOMIC_INIT(0);
 #ifdef CONFIG_SMP
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/tsb.c	2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c	2007-03-15 22:00:44.000000000 -0700
@@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] = 
 	"tsb_1MB",
 };
 
-void __init tsb_cache_init(void)
+void __init pgtable_cache_init(void)
 {
 	unsigned long i;
 
Index: linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h	2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h	2007-03-15 22:00:44.000000000 -0700
@@ -6,6 +6,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/quicklist.h>
 
 #include <asm/spitfire.h>
 #include <asm/cpudata.h>
@@ -13,52 +14,50 @@
 #include <asm/page.h>
 
 /* Page table allocation/freeing. */
-extern struct kmem_cache *pgtable_cache;
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
-	kmem_cache_free(pgtable_cache, pgd);
+	quicklist_free(0, NULL, pgd);
 }
 
 #define pud_populate(MM, PUD, PMD)	pud_set(PUD, PMD)
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache,
-				GFP_KERNEL|__GFP_REPEAT);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline void pmd_free(pmd_t *pmd)
 {
-	kmem_cache_free(pgtable_cache, pmd);
+	quicklist_free(0, NULL, pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return kmem_cache_alloc(pgtable_cache,
-				GFP_KERNEL|__GFP_REPEAT);
+	return quicklist_alloc(0, GFP_KERNEL, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm,
 					 unsigned long address)
 {
-	return virt_to_page(pte_alloc_one_kernel(mm, address));
+	void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
+	return pg ? virt_to_page(pg) : NULL;
 }
 		
 static inline void pte_free_kernel(pte_t *pte)
 {
-	kmem_cache_free(pgtable_cache, pte);
+	quicklist_free(0, NULL, pte);
 }
 
 static inline void pte_free(struct page *ptepage)
 {
-	pte_free_kernel(page_address(ptepage));
+	quicklist_free(0, NULL, page_address(ptepage));
 }
 
 
@@ -66,6 +65,9 @@ static inline void pte_free(struct page 
 #define pmd_populate(MM,PMD,PTE_PAGE)		\
 	pmd_populate_kernel(MM,PMD,page_address(PTE_PAGE))
 
-#define check_pgt_cache()	do { } while (0)
+static inline void check_pgt_cache(void)
+{
+	quicklist_check(0, NULL);
+}
 
 #endif /* _SPARC64_PGALLOC_H */


* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-03-19 23:37 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
@ 2007-03-19 23:53 ` Andrew Morton
  2007-03-19 23:57   ` Christoph Lameter
  2007-03-19 23:58   ` David Miller
  2007-03-20  0:21 ` Andrew Morton
  2007-03-20  3:12 ` Paul Mackerras
  6 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2007-03-19 23:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007 15:37:16 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> This patchset introduces an arch independent framework to handle lists
> of recently used page table pages to replace the existing (ab)use of the
> slab for that purpose.
> 
> 1. Proven code from the IA64 arch.

Has it been proven that quicklists are superior to simply going direct to the
page allocator for these pages?

Would it provide a superior solution if we were to a) stop zeroing out the
pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
for these pages?


* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:53 ` [QUICKLIST 1/5] Quicklists for page table pages V3 Andrew Morton
@ 2007-03-19 23:57   ` Christoph Lameter
  2007-03-20  0:07     ` Andrew Morton
  2007-03-19 23:58   ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2007-03-19 23:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007, Andrew Morton wrote:

> Has it been proven that quicklists are superior to simply going direct to the
> page allocator for these pages?

Yes.
 
> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?

I doubt it. The zeroing is a by-product of our way of serializing pte 
handling. It's going to be difficult to change that.




* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:53 ` [QUICKLIST 1/5] Quicklists for page table pages V3 Andrew Morton
  2007-03-19 23:57   ` Christoph Lameter
@ 2007-03-19 23:58   ` David Miller
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2007-03-19 23:58 UTC (permalink / raw)
  To: akpm; +Cc: clameter, linux-mm, linux-kernel

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 19 Mar 2007 16:53:29 -0700

> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?

While you could avoid zeroing them out, you certainly can't avoid
reading them into the cpu caches.

And for the PGDs you have to initialize these things partially to
non-zero values on x86{,_64} on every new PGD you allocate, which is a
complete waste of cpu cache dirtying.  Avoiding this overhead alone
justifies the quicklists I think.  It's not just a "zero" thing,
so GFP_ZERO cannot help you here.

The more I think about it the more I like the quicklists.



* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:57   ` Christoph Lameter
@ 2007-03-20  0:07     ` Andrew Morton
  2007-03-20  0:44       ` Christoph Lameter
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-20  0:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007 16:57:55 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Mon, 19 Mar 2007, Andrew Morton wrote:
> 
> > Has it been proven that quicklists are superior to simply going direct to the
> > page allocator for these pages?
> 
> Yes.

Sigh.

Please provide proof that quicklists are superior to simply going direct to
the page allocator for these pages.

> > Would it provide a superior solution if we were to a) stop zeroing out the
> > pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> > for these pages?
> 
> I doubt it. The zeroing is a by product of our way of serializing pte 
> handling. Its going to be difficult to change that.

Nick didn't think so, and I don't see the problem either.

We'll save on some bus traffic by avoiding the writeback, but how much
effect that will have we don't know.  Presumably little.



* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-03-19 23:53 ` [QUICKLIST 1/5] Quicklists for page table pages V3 Andrew Morton
@ 2007-03-20  0:21 ` Andrew Morton
  2007-03-20  1:06   ` Christoph Lameter
  2007-03-20  3:12 ` Paul Mackerras
  6 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-20  0:21 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007 15:37:16 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> ...
>
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h	2007-03-16 02:19:15.000000000 -0700
> @@ -0,0 +1,95 @@
> +#ifndef LINUX_QUICKLIST_H
> +#define LINUX_QUICKLIST_H
> +/*
> + * Fast allocations and disposal of pages. Pages must be in the condition
> + * as needed after allocation when they are freed. Per cpu lists of pages
> + * are kept that only contain node local pages.
> + *
> + * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
> + */
> +#include <linux/kernel.h>
> +#include <linux/gfp.h>
> +#include <linux/percpu.h>
> +
> +#ifdef CONFIG_QUICKLIST
> +
> +#ifndef CONFIG_NR_QUICK
> +#define CONFIG_NR_QUICK 1
> +#endif

No, please don't define config items like this.  Do it in Kconfig.

> +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> +{
> +	struct quicklist *q;
> +	void **p = NULL;
> +
> +	q =&get_cpu_var(quicklist)[nr];
> +	p = q->page;
> +	if (likely(p)) {
> +		q->page = p[0];
> +		p[0] = NULL;
> +		q->nr_pages--;
> +	}
> +	put_cpu_var(quicklist);
> +	if (likely(p))
> +		return p;
> +
> +	p = (void *)__get_free_page(flags | __GFP_ZERO);
> +	if (ctor && p)
> +		ctor(p);
> +	return p;
> +}
> +
> +static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
> +{
> +	struct quicklist *q;
> +	void **p = pp;
> +	struct page *page = virt_to_page(p);
> +	int nid = page_to_nid(page);
> +
> +	if (unlikely(nid != numa_node_id())) {
> +		if (dtor)
> +			dtor(p);
> +		free_page((unsigned long)p);
> +		return;
> +	}
> +
> +	q = &get_cpu_var(quicklist)[nr];
> +	p[0] = q->page;
> +	q->page = p;
> +	q->nr_pages++;
> +	put_cpu_var(quicklist);
> +}

These guys seem to have multiple callsites for ia64 at least and probably
would benefit from being uninlined.

> +void quicklist_check(int nr, void (*dtor)(void *));
> +unsigned long quicklist_total_size(void);
> +
> +#else
> +void quicklist_check(int nr, void (*dtor)(void *))
> +{
> +}
> +
> +unsigned long quicklist_total_size(void)
> +{
> +	return 0;
> +}
> +#endif

That obviously won't link and wasn't tested.  Making these static inline
will help.

> +/*
> + * Quicklist support.
> + *
> + * Quicklists are light weight lists of pages that have a defined state
> + * on alloc and free. Pages must be in the quicklist specific defined state
> + * (zero by default) when the page is freed. It seems that the initial idea
> + * for such lists first came from Dave Miller and then various other people
> + * improved on it.
> + *
> + * Copyright (C) 2007 SGI,
> + * 	Christoph Lameter <clameter@sgi.com>
> + * 		Generalized, added support for multiple lists and
> + * 		constructors / destructors.
> + */
> +#include <linux/kernel.h>
> +
> +#include <linux/mm.h>
> +#include <linux/mmzone.h>
> +#include <linux/module.h>
> +#include <linux/quicklist.h>
> +
> +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];

If we uninline those big inlines, this can perhaps be made static.

> +#define MIN_PAGES		25
> +#define MAX_FREES_PER_PASS	16
> +#define FRACTION_OF_NODE_MEM	16

Are these constants optimal for all architectures?

> +static unsigned long max_pages(void)
> +{
> +	unsigned long node_free_pages, max;
> +
> +	node_free_pages = node_page_state(numa_node_id(),
> +			NR_FREE_PAGES);
> +	max = node_free_pages / FRACTION_OF_NODE_MEM;
> +	return max(max, (unsigned long)MIN_PAGES);
> +}
> +
> +static long min_pages_to_free(struct quicklist *q)
> +{
> +	long pages_to_free;
> +
> +	pages_to_free = q->nr_pages - max_pages();
> +
> +	return min(pages_to_free, (long)MAX_FREES_PER_PASS);
> +}

min_t and max_t are the standard way of avoiding that warning.  Or stick a
UL on the constants (which is probably better).

> +void quicklist_check(int nr, void (*dtor)(void *))
> +{
> +	long pages_to_free;
> +	struct quicklist *q;
> +
> +	q = &get_cpu_var(quicklist)[nr];
> +	if (q->nr_pages > MIN_PAGES) {
> +		pages_to_free = min_pages_to_free(q);
> +
> +		while (pages_to_free > 0) {
> +			void *p = quicklist_alloc(nr, 0, NULL);
> +
> +			if (dtor)
> +				dtor(p);
> +			free_page((unsigned long)p);
> +			pages_to_free--;
> +		}
> +	}
> +	put_cpu_var(quicklist);
> +}

The use of a literal 0 as a gfp_t is a bit ugly.  I assume that we don't
care because we should never actually call into the page allocator for this
caller.  But it's not terribly clear because there is no commentary
describing what this function is supposed to do.

The name foo_check() is unfortunate: it implies that the function checks
something (ie: has no side-effects).  But this function _does_ change
things and perhaps should be called quicklist_trim() or something like
that.

This function lacks any commentary, but I was able to work it out.  I
think.  Some nice comments would be, umm, nice.

> +unsigned long quicklist_total_size(void)
> +{
> +	unsigned long count = 0;
> +	int cpu;
> +	struct quicklist *ql, *q;
> +
> +	for_each_online_cpu(cpu) {
> +		ql = per_cpu(quicklist, cpu);
> +		for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
> +			count += q->nr_pages;
> +	}
> +	return count;
> +}
> +


* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  0:07     ` Andrew Morton
@ 2007-03-20  0:44       ` Christoph Lameter
  2007-03-20  0:55         ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20  0:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007, Andrew Morton wrote:

> Please provide proof that quicklists are superior to simply going direct to
> the page allocator for these pages.

See the patch. We are only touching 2 cachelines instead of 32. So even 
without considering the page allocator overhead and the slab allocator 
overhead (which will make the situation even better) it's superior.

> > I doubt it. The zeroing is a by product of our way of serializing pte 
> > handling. Its going to be difficult to change that.
> 
> Nick didn't think so, and I don't see the problem either.

You do not think that our current way of handling ptes is okay? If we do 
not zero the ptes then we need to separate munmap from process shutdown.

> We'll save on some bus traffic by avoiding the writeback, but how much
> effect that will have we don't know.  Presumably little.

The advantage of the quicklists is that it does not require a rework of 
the pte serialization.



* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  0:44       ` Christoph Lameter
@ 2007-03-20  0:55         ` Andrew Morton
  2007-03-20  1:03           ` Christoph Lameter
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-20  0:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007 17:44:28 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Mon, 19 Mar 2007, Andrew Morton wrote:
> 
> > Please provide proof that quicklists are superior to simply going direct to
> > the page allocator for these pages.
> 
> See the patch. We are only touching 2 cachelines instead of 32. So even 
> without considering the page allocator overhead and the slab allocator 
> overhead (which will make the situation even better) its superior.

That's not proof, it is handwaving.  I could wave right back at you and
claim that the benefit from returning a cache-hot pte page back to the page
allocator for reuse exceeds the benefit which you waved at me above.

You may well be right, but nothing is proven, afaict.

> > > I doubt it. The zeroing is a by product of our way of serializing pte 
> > > handling. Its going to be difficult to change that.
> > 
> > Nick didn't think so, and I don't see the problem either.
> 
> You do not think that our current way of handling ptes is okay? If we do 
> not zero the ptes then we need to separate munmap from process shutdown.

Yep.  It's possible that process shutdown is a sufficiently common and
costly special-case for it to be worth special-casing.

> > We'll save on some bus traffic by avoiding the writeback, but how much
> > effect that will have we don't know.  Presumably little.
> 
> The advantage of the quicklists is that it does not require a rework of 
> the pte serialization.

No, these are unrelated.  We can get pte pages from the page allocator and
zero them without touching the munmap handling.

But it's possible that if we _were_ to optimise the munmap handling as
suggested, the end result would be superior.



* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  0:55         ` Andrew Morton
@ 2007-03-20  1:03           ` Christoph Lameter
  2007-03-20  1:32             ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20  1:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007, Andrew Morton wrote:

> > See the patch. We are only touching 2 cachelines instead of 32. So even 
> > without considering the page allocator overhead and the slab allocator 
> > overhead (which will make the situation even better) its superior.
> 
> That's not proof, it is handwaving.  I could wave right back at you and
> claim that the benefit from returning a cache-hot pte page back to the page
> allocator for reuse exceeds the benefit which you waved at me above.

No you cannot make that claim. That would mean that you have to touch 
32 cachelines, which is inferior.
  
> You may well be right, but nothing is proven, afaict.

Nothing can be proven except within a rigorously defined mathematical 
system, but even there we are limited by such things as Russell's paradox.

It's obvious that this is right. And there has been significant work 
invested into retaining page table pages on i386, sparc64 and ia64 for 
exactly this purpose. This patch does not change that at all for these 3 
arches. There is no doubt about the correctness of the approach here.

> > You do not think that our current way of handling ptes is okay? If we do 
> > not zero the ptes then we need to separate munmap from process shutdown.
> 
> Yep.  It's possible that process shutdown is a sufficiently common and
> costly special-case for it to be worth special-casing.

Ok great idea but what does this have to do with this patch? This patch 
simply generalizes something that has been there for ages.

> > The advantage of the quicklists is that it does not require a rework of 
> > the pte serialization.
> 
> No, these are unrelated.  We can get pte pages from the page allocator and
> zero them without touching the munmap handling.
> 
> But it's possible that if we _were_ to optimise the munmap handling as
> suggested, the end result would be superior.

Andrew, this is utter crap and unrelated to this work. The main thing here 
is to generalize something that various arches already do and to avoid the 
page struct handling collisions. You use pie-in-the-sky to argue against 
consolidating code and fixing up usage conflicts of the slab with arch 
code?


* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  0:21 ` Andrew Morton
@ 2007-03-20  1:06   ` Christoph Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20  1:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007, Andrew Morton wrote:

> > +
> > +#ifdef CONFIG_QUICKLIST
> > +
> > +#ifndef CONFIG_NR_QUICK
> > +#define CONFIG_NR_QUICK 1
> > +#endif
> 
> No, please don't define config items like this.  Do it in Kconfig.

They can be set up in the arch specific Kconfig. Ok. I moved the 
#ifndef .. #endif into mm/Kconfig.

> These guys seem to have multiple callsites for ia64 at least and probably
> would benefit from being uninlined.

Then they would no longer be optimizable. Right now one can compile out 
the constructor / destructor support and provide a constant list number 
as well as constant gfp masks. This can be very small and benefit 
tremendously from inlining.

Many arches do not need some features and there are only a few call 
sites.

> > +void quicklist_check(int nr, void (*dtor)(void *));
> > +unsigned long quicklist_total_size(void);
> > +
> > +#else
> > +void quicklist_check(int nr, void (*dtor)(void *))
> > +{
> > +}
> > +
> > +unsigned long quicklist_total_size(void)
> > +{
> > +	return 0;
> > +}
> > +#endif
> 
> That obviouslty won't link and wasn't tested.  Making these static inline
> will help.

Hmmm... We could drop these completely. If an arch does not use 
quicklists then they should not be calling these.

> > +#include <linux/module.h>
> > +#include <linux/quicklist.h>
> > +
> > +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> 
> If we uninline those big inlines, this can perhaps be made static.

Yeah but we want the inlines.

> 
> > +#define MIN_PAGES		25
> > +#define MAX_FREES_PER_PASS	16
> > +#define FRACTION_OF_NODE_MEM	16
> 
> Are these constants optimal for all architectures?

I added them as parameters to quicklist_trim so that an arch 
can specify their own settings.

> > +	return min(pages_to_free, (long)MAX_FREES_PER_PASS);
> > +}
> 
> min_t and max_t are the standard way of avoiding that warning.  Or stick a
> UL on the constants (which is probably better).

We do not need those since the constants are now parameters.

> 
> > +void quicklist_check(int nr, void (*dtor)(void *))
> > +{
> > +	long pages_to_free;
> > +	struct quicklist *q;
> > +
> > +	q = &get_cpu_var(quicklist)[nr];
> > +	if (q->nr_pages > MIN_PAGES) {
> > +		pages_to_free = min_pages_to_free(q);
> > +
> > +		while (pages_to_free > 0) {
> > +			void *p = quicklist_alloc(nr, 0, NULL);
> > +
> > +			if (dtor)
> > +				dtor(p);
> > +			free_page((unsigned long)p);
> > +			pages_to_free--;
> > +		}
> > +	}
> > +	put_cpu_var(quicklist);
> > +}
> 
> The use of a literal 0 as a gfp_t is a bit ugly.  I assume that we don't
> care because we should never actually call into the page allocator for this
> caller.  But it's not terribly clear because there is no commentary
> describing what this function is supposed to do.

Right. Will add comments.

> The name foo_check() is unfortunate: it implies that the function checks
> something (ie: has no side-effects).  But this function _does_ change
> things and perhaps should be called quicklist_trim() or something like
> that.

Tradition. Dave initially named it check_pgt_cache, it seems.
 
> This function lacks any commentary, but I was able to work it out.  I
> think.  Some nice comments would be, umm, nice.

ok. Here is a fixup patch:

Index: linux-2.6.21-rc3-mm2/include/linux/quicklist.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/linux/quicklist.h	2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h	2007-03-19 17:47:34.000000000 -0700
@@ -13,10 +13,6 @@
 
 #ifdef CONFIG_QUICKLIST
 
-#ifndef CONFIG_NR_QUICK
-#define CONFIG_NR_QUICK 1
-#endif
-
 struct quicklist {
 	void *page;
 	int nr_pages;
@@ -77,18 +73,11 @@ static inline void quicklist_free(int nr
 	put_cpu_var(quicklist);
 }
 
-void quicklist_check(int nr, void (*dtor)(void *));
-unsigned long quicklist_total_size(void);
+void quicklist_trim(int nr, void (*dtor)(void *),
+	unsigned long min_pages, unsigned long max_free);
 
-#else
-void quicklist_check(int nr, void (*dtor)(void *))
-{
-}
+unsigned long quicklist_total_size(void);
 
-unsigned long quicklist_total_size(void)
-{
-	return 0;
-}
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
Index: linux-2.6.21-rc3-mm2/mm/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/Kconfig	2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/Kconfig	2007-03-19 17:42:49.000000000 -0700
@@ -220,3 +220,7 @@ config DEBUG_READAHEAD
 
 	  Say N for production servers.
 
+config NR_QUICK
+	depends on QUICKLIST
+	default 1
+
Index: linux-2.6.21-rc3-mm2/mm/quicklist.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/quicklist.c	2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/quicklist.c	2007-03-19 17:53:45.000000000 -0700
@@ -21,39 +21,46 @@
 
 DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
 
-#define MIN_PAGES		25
-#define MAX_FREES_PER_PASS	16
 #define FRACTION_OF_NODE_MEM	16
 
-static unsigned long max_pages(void)
+static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
 
 	node_free_pages = node_page_state(numa_node_id(),
 			NR_FREE_PAGES);
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
-	return max(max, (unsigned long)MIN_PAGES);
+	return max(max, min_pages);
 }
 
-static long min_pages_to_free(struct quicklist *q)
+static long min_pages_to_free(struct quicklist *q,
+	unsigned long min_pages, long max_free)
 {
 	long pages_to_free;
 
-	pages_to_free = q->nr_pages - max_pages();
+	pages_to_free = q->nr_pages - max_pages(min_pages);
 
-	return min(pages_to_free, (long)MAX_FREES_PER_PASS);
+	return min(pages_to_free, max_free);
 }
 
-void quicklist_check(int nr, void (*dtor)(void *))
+/*
+ * Trim down the number of pages in the quicklist
+ */
+void quicklist_trim(int nr, void (*dtor)(void *),
+	unsigned long min_pages, unsigned long max_free)
 {
 	long pages_to_free;
 	struct quicklist *q;
 
 	q = &get_cpu_var(quicklist)[nr];
-	if (q->nr_pages > MIN_PAGES) {
-		pages_to_free = min_pages_to_free(q);
+	if (q->nr_pages > min_pages) {
+		pages_to_free = min_pages_to_free(q, min_pages, max_free);
 
 		while (pages_to_free > 0) {
+			/*
+			 * We pass a gfp_t of 0 to quicklist_alloc here
+			 * because we will never call into the page allocator.
+			 */
 			void *p = quicklist_alloc(nr, 0, NULL);
 
 			if (dtor)
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c	2007-03-19 17:42:44.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c	2007-03-19 17:42:49.000000000 -0700
@@ -299,6 +299,6 @@ void pgd_free(pgd_t *pgd)
 
 void check_pgt_cache(void)
 {
-	quicklist_check(QUICK_PGD, pgd_dtor);
-	quicklist_check(QUICK_PMD, NULL);
+	quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+	quicklist_trim(QUICK_PMD, NULL, 25, 16);
 }
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h	2007-03-19 17:42:46.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h	2007-03-19 17:42:49.000000000 -0700
@@ -121,7 +121,7 @@ static inline void pte_free(struct page 
 
 static inline void check_pgt_cache(void)
 {
-	quicklist_check(QUICK_PGD, pgd_dtor);
-	quicklist_check(QUICK_PT, NULL);
+	quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+	quicklist_trim(QUICK_PT, NULL, 25, 16);
 }
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h	2007-03-19 17:42:47.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h	2007-03-19 17:42:49.000000000 -0700
@@ -67,7 +67,7 @@ static inline void pte_free(struct page 
 
 static inline void check_pgt_cache(void)
 {
-	quicklist_check(0, NULL);
+	quicklist_trim(0, NULL, 25, 16);
 }
 
 #endif /* _SPARC64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h	2007-03-19 17:42:43.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h	2007-03-19 17:42:59.000000000 -0700
@@ -106,7 +106,7 @@ static inline void pte_free_kernel(pte_t
 
 static inline void check_pgt_cache(void)
 {
-	quicklist_check(0, NULL);
+	quicklist_trim(0, NULL, 25, 16);
 }
 
 #define __pte_free_tlb(tlb, pte)	pte_free(pte)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  1:03           ` Christoph Lameter
@ 2007-03-20  1:32             ` Andrew Morton
  2007-03-20 19:41               ` Christoph Lameter
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-20  1:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007 18:03:54 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> On Mon, 19 Mar 2007, Andrew Morton wrote:
> 
> > > See the patch. We are only touching 2 cachelines instead of 32. So even 
> > > without considering the page allocator overhead and the slab allocator 
> > > overhead (which will make the situation even better) its superior.
> > 
> > That's not proof, it is handwaving.  I could wave right back at you and
> > claim that the benefit from returning a cache-hot pte page back to the page
> > allocator for reuse exceeds the benefit which you waved at me above.
> 
> No you cannot make that claim. That would mean that you have to touch 
> 32 pages which is inferior.

For pte pages (which are far more common), more than a single cacheline
will be in cache.

Yes, a common quicklist implementation is good.  But no quicklist
implementation at all is better.  You say that will be slower, and you may
well be right, but I say let's demonstrate that (please) rather than
speculating.

Then we can look at the difference and decide whether it is worth the
additional complexity of this special-purpose private allocator.

> > You may well be right, but nothing is proven, afaict.
> 
> Nothing can be proven except within a rigorously defined mathematical 
> system but even there we are limited by such things as Russel's paradox.
> 
> Its obvious that this is right. And there has been significant work 
> invested into retaining page table pages on i386, sparc64 and ia64 for 
> exactly the specified.

I believe that work predated per-cpu-pages.

> This patch does not change that at all for these 3 
> arches. There is no doubt about the correctness of the approach here.
> 
> > > You do not think that our current way of handling ptes is okay? If we do 
> > > not zero the ptes then we need to separate munmap from process shutdown.
> > 
> > Yep.  It's possible that process shutdown is a sufficiently common and
> > costly special-case for it to be worth special-casing.
> 
> Ok great idea but what does this have to do with this patch? This patch 
> simply generalizes something that has been there for ages.

It has a lot to do with this patch.

If we decide that it is useful to optimise the full-mm teardown case then
we will need to zero these pages when we start to use them so we might as
well get them straight from the page allocator.  Hence this patch goes into
the bitbucket.

> > > The advantage of the quicklists is that it does not require a rework of 
> > > the pte serialization.
> > 
> > No, these are unrelated.  We can get pte pages from the page allocator and
> > zero them without touching the munmap handling.
> > 
> > But it's possible that if we _were_ to optimise the munmap handling as
> > suggested, the end result would be superior.
> 
> Andrew, this is utter crap and unrelated to this work. The main thing here 
> is to generalize something that various arches already do and to avoid the 
> page struct handling collisions. You use pie-in-the-sky to argue against 
> consolidating code and fixing up usage conflicts of the slab with arch 
> code?

It is not pie-in-the-sky to ask "is this code still useful?".



* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-03-20  0:21 ` Andrew Morton
@ 2007-03-20  3:12 ` Paul Mackerras
  2007-03-20 19:43   ` Christoph Lameter
  6 siblings, 1 reply; 19+ messages in thread
From: Paul Mackerras @ 2007-03-20  3:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, linux-kernel

Christoph Lameter writes:

> +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> +{

...

> +	p = (void *)__get_free_page(flags | __GFP_ZERO);

This will cause problems on 64-bit powerpc, at least with 4k pages,
since the pmd and pgd levels only use 1/4 of a page.

Paul.


* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  1:32             ` Andrew Morton
@ 2007-03-20 19:41               ` Christoph Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 19:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Mon, 19 Mar 2007, Andrew Morton wrote:

> Yes, a common quicklist implementation is good.  But no quicklist
> implementation at all is better.  You say that will be slower, and you may
> well be right, but I say let's demonstrate that (please) rather than
> speculating.

There are at least 3 arches already using this scheme; there is no 
speculation here. The slab use in i386 is for exactly the same purpose. 
There is nothing new here. It consolidates code and fixes the page struct 
use conflict between slab and arch code. The conflict is the main reason 
why I want this. That way I will not have the special casing in SLUB and 
we can make SLAB support debugging for all slab caches.
 
> > Its obvious that this is right. And there has been significant work 
> > invested into retaining page table pages on i386, sparc64 and ia64 for 
> > exactly the specified.
> 
> I believe that work predated per-cpu-pages.

Lots of arch code depends on page table pages being in a known state for 
reuse; this is nothing new.

> > Ok great idea but what does this have to do with this patch? This patch 
> > simply generalizes something that has been there for ages.
> 
> It has a lot to do with this patch.
> 
> If we decide that it is useful to optimise the full-mm teardown case then
> we will need to zero these pages when we start to use them so we might as
> well get them straight from the page allocator.  Hence this patch goes into
> the bitbucket.

If you decide to optimise the full-mm teardown then you will have to 
rework more than half of the arch handling of page table pages since they 
all rely on pages being zero on return.

> It is not pie-in-the-sky to ask "is this code still useful?".

Yes it is if it's a funky idea without code or any data to support a major 
change in the way we handle page table pages. And this falls into that 
category.




* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
  2007-03-20  3:12 ` Paul Mackerras
@ 2007-03-20 19:43   ` Christoph Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 19:43 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: akpm, linux-mm, linux-kernel

On Tue, 20 Mar 2007, Paul Mackerras wrote:

> Christoph Lameter writes:
> 
> > +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> > +{
> 
> ...
> 
> > +	p = (void *)__get_free_page(flags | __GFP_ZERO);
> 
> This will cause problems on 64-bit powerpc, at least with 4k pages,
> since the pmd and pgd levels only use 1/4 of a page.

Quicklists are only useful for page-sized allocations. If you have smaller 
sizes then by all means continue to use slab. You do not have a page 
struct for each pmd or pgd to do bad things with (like i386), so 
everything is just fine.


* [QUICKLIST 4/5] Quicklist support for x86_64
  2007-03-23  6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
@ 2007-03-23  6:28 ` Christoph Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-23  6:28 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel

Convert x86_64 to use quicklists

This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.

A second quicklist is used to separate out PGD handling. Thus we can carry
the initialized pgds of terminating processes over to the next process
needing them.

Also clean up the pgd_list handling to use regular list macros. Not using
the slab allocator frees up the lru field so we can use regular list macros.

The adding and removal of the pgds to the pgd_list is moved into the
constructor / destructor. We can then avoid moving pgds off the list that
are still in the quicklists, reducing the pgd creation and allocation
overhead further.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc4-mm1/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/Kconfig	2007-03-20 14:20:34.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/Kconfig	2007-03-20 14:21:57.000000000 -0700
@@ -56,6 +56,14 @@ config ZONE_DMA
 	bool
 	default y
 
+config QUICKLIST
+	bool
+	default y
+
+config NR_QUICK
+	int
+	default 2
+
 config ISA
 	bool
 
Index: linux-2.6.21-rc4-mm1/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-x86_64/pgalloc.h	2007-03-20 14:21:06.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-x86_64/pgalloc.h	2007-03-20 14:55:47.000000000 -0700
@@ -4,6 +4,10 @@
 #include <asm/pda.h>
 #include <linux/threads.h>
 #include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0	/* We preserve special mappings over free */
+#define QUICK_PT 1	/* Other page table pages that are zero on free */
 
 #define pmd_populate_kernel(mm, pmd, pte) \
 		set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
 static inline void pmd_free(pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	free_page((unsigned long)pmd);
+	quicklist_free(QUICK_PT, NULL, pmd);
 }
 
 static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline void pud_free (pud_t *pud)
 {
 	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-	free_page((unsigned long)pud);
+	quicklist_free(QUICK_PT, NULL, pud);
 }
 
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
 {
+	unsigned boundary;
+	pgd_t *pgd = x;
 	struct page *page = virt_to_page(pgd);
 
+	/*
+	 * Copy kernel pointers in from init.
+	 */
+	boundary = pgd_index(__PAGE_OFFSET);
+	memcpy(pgd + boundary,
+		init_level4_pgt + boundary,
+		(PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
 	spin_lock(&pgd_lock);
-	page->index = (pgoff_t)pgd_list;
-	if (pgd_list)
-		pgd_list->private = (unsigned long)&page->index;
-	pgd_list = page;
-	page->private = (unsigned long)&pgd_list;
+	list_add(&page->lru, &pgd_list);
 	spin_unlock(&pgd_lock);
 }
 
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
 {
-	struct page *next, **pprev, *page = virt_to_page(pgd);
+	pgd_t *pgd = x;
+	struct page *page = virt_to_page(pgd);
 
 	spin_lock(&pgd_lock);
-	next = (struct page *)page->index;
-	pprev = (struct page **)page->private;
-	*pprev = next;
-	if (next)
-		next->private = (unsigned long)pprev;
+	list_del(&page->lru);
 	spin_unlock(&pgd_lock);
 }
 
+
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	unsigned boundary;
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
-	if (!pgd)
-		return NULL;
-	pgd_list_add(pgd);
-	/*
-	 * Copy kernel pointers in from init.
-	 * Could keep a freelist or slab cache of those because the kernel
-	 * part never changes.
-	 */
-	boundary = pgd_index(__PAGE_OFFSET);
-	memset(pgd, 0, boundary * sizeof(pgd_t));
-	memcpy(pgd + boundary,
-	       init_level4_pgt + boundary,
-	       (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+	pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+			 GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
 	return pgd;
 }
 
 static inline void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
-	pgd_list_del(pgd);
-	free_page((unsigned long)pgd);
+	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 }
 
 static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
-	void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+	void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
 	if (!p)
 		return NULL;
 	return virt_to_page(p);
@@ -111,17 +106,22 @@ static inline struct page *pte_alloc_one
 static inline void pte_free_kernel(pte_t *pte)
 {
 	BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-	free_page((unsigned long)pte); 
+	quicklist_free(QUICK_PT, NULL, pte);
 }
 
 static inline void pte_free(struct page *pte)
 {
 	__free_page(pte);
-} 
+}
 
 #define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
 
 #define __pmd_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 #define __pud_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
 
+static inline void check_pgt_cache(void)
+{
+	quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+	quicklist_trim(QUICK_PT, NULL, 25, 16);
+}
 #endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc4-mm1/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/kernel/process.c	2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/kernel/process.c	2007-03-20 14:21:57.000000000 -0700
@@ -207,6 +207,7 @@ void cpu_idle (void)
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
+			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 			if (!idle)
Index: linux-2.6.21-rc4-mm1/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/kernel/smp.c	2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/kernel/smp.c	2007-03-20 14:21:57.000000000 -0700
@@ -242,7 +242,7 @@ void flush_tlb_mm (struct mm_struct * mm
 	}
 	if (!cpus_empty(cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+	check_pgt_cache();
 	preempt_enable();
 }
 EXPORT_SYMBOL(flush_tlb_mm);
Index: linux-2.6.21-rc4-mm1/arch/x86_64/mm/fault.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/mm/fault.c	2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/mm/fault.c	2007-03-20 14:21:57.000000000 -0700
@@ -585,7 +585,7 @@ do_sigbus:
 }
 
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
 
 void vmalloc_sync_all(void)
 {
@@ -605,8 +605,7 @@ void vmalloc_sync_all(void)
 			if (pgd_none(*pgd_ref))
 				continue;
 			spin_lock(&pgd_lock);
-			for (page = pgd_list; page;
-			     page = (struct page *)page->index) {
+			list_for_each_entry(page, &pgd_list, lru) {
 				pgd_t *pgd;
 				pgd = (pgd_t *)page_address(page) + pgd_index(address);
 				if (pgd_none(*pgd))
Index: linux-2.6.21-rc4-mm1/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-x86_64/pgtable.h	2007-03-20 14:21:06.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-x86_64/pgtable.h	2007-03-20 14:21:57.000000000 -0700
@@ -402,7 +402,7 @@ static inline pte_t pte_modify(pte_t pte
 #define __swp_entry_to_pte(x)		((pte_t) { (x).val })
 
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 void vmalloc_sync_all(void);
 
 #endif /* !__ASSEMBLY__ */
@@ -419,7 +419,6 @@ extern int kern_addr_valid(unsigned long
 #define HAVE_ARCH_UNMAPPED_AREA
 
 #define pgtable_cache_init()   do { } while (0)
-#define check_pgt_cache()      do { } while (0)
 
 #define PAGE_AGP    PAGE_KERNEL_NOCACHE
 #define HAVE_PAGE_AGP 1


end of thread, other threads:[~2007-03-23  6:29 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
2007-03-19 23:37 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
2007-03-19 23:37 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
2007-03-19 23:37 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
2007-03-19 23:37 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
2007-03-19 23:53 ` [QUICKLIST 1/5] Quicklists for page table pages V3 Andrew Morton
2007-03-19 23:57   ` Christoph Lameter
2007-03-20  0:07     ` Andrew Morton
2007-03-20  0:44       ` Christoph Lameter
2007-03-20  0:55         ` Andrew Morton
2007-03-20  1:03           ` Christoph Lameter
2007-03-20  1:32             ` Andrew Morton
2007-03-20 19:41               ` Christoph Lameter
2007-03-19 23:58   ` David Miller
2007-03-20  0:21 ` Andrew Morton
2007-03-20  1:06   ` Christoph Lameter
2007-03-20  3:12 ` Paul Mackerras
2007-03-20 19:43   ` Christoph Lameter
2007-03-23  6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
2007-03-23  6:28 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
