LKML Archive on lore.kernel.org
* [QUICKLIST 1/5] Quicklists for page table pages V3
From: Christoph Lameter @ 2007-03-19 23:37 UTC
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Quicklists for page table pages V3
V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly
and defaulting to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
http://marc.info/?l=linux-kernel&m=117391339914767&w=2
V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
http://marc.info/?l=linux-kernel&m=117357922219342&w=2
This patchset introduces an arch independent framework to handle lists
of recently used page table pages to replace the existing (ab)use of the
slab for that purpose.
1. Proven code from the IA64 arch.
The method used here has been fine tuned for years and
is NUMA aware. It is based on the knowledge that accesses
to page table pages are sparse in nature. Taking a page
off the freelists instead of allocating a zeroed page
allows a reduction in the number of cachelines touched,
in addition to getting rid of the slab overhead. So
performance improves. This is particularly useful if pgds
contain standard mappings. We can save on the teardown
and setup of such a page if we have some on the quicklists.
This includes avoiding the list operations that are otherwise
necessary on alloc and free to track pgds.
2. Lightweight alternative to using the slab to manage page size pages
Slab overhead is significant and even page allocator use
is pretty heavyweight. The use of a per cpu quicklist
means that we touch only two cachelines for an allocation.
There is no need to access the page_struct (unless arch code
needs to fiddle around with it). So the fast path just
means bringing in one cacheline at the beginning of the
page. That same cacheline may then be used to store the
page table entry. Or a second cacheline may be used
if the page table entry is not in the first cacheline of
the page. The current code will zero the page, which means
touching 32 cachelines (assuming 128 byte cachelines:
4096 / 128 = 32). We get down from 32 to 2 cachelines
in the fast path.
3. Fix conflicting use of page_structs by slab and arch code.
F.e. both arches use the ->private and ->index fields to
create lists of pgds, and i386 also uses other page flags. The slab
can also use the ->private field for allocations that
are larger than page size, which would occur if one enables
debugging. In that case the arch code would overwrite the
pointer to the first page of the compound page allocated
by the slab. SLAB has been modified to not enable
debugging for such slabs (!).
There is potential for additional conflicts
here, especially since some arches also use page flags to mark
page table pages.
The patch removes these conflicts by no longer using
the slab for these purposes. The page allocator is more
suitable since PAGE_SIZE chunks are its domain.
Then we can start using standard list operations via
page->lru instead of improvising linked lists.
SLUB makes more extensive use of the page struct and so
far had to create workarounds for these slabs. The ->index
field is used for the SLUB freelist. So SLUB cannot use
a freelist for these slabs and, like SLAB, currently
does not allow debugging and forces such slabs to
contain only a single object (avoiding the freelist).
If we do not get rid of these issues then both SLAB and SLUB
have to continue to provide special code paths to support these
slabs.
4. i386 gets lightweight NUMA aware management of page table pages.
Note that the use of SLAB on NUMA systems requires
alien caches to efficiently remove remote page
table pages, which (for a PAGE_SIZEd allocation) is a lengthy
and expensive process. With quicklists no alien caches are
needed. Pages can simply be returned to the correct node.
5. x86_64 gets lightweight page table page management.
This will allow x86_64 arch code to repopulate pgds
and other page table entries faster. The list operations for pgds
are reduced in the same way as for i386, to the point where
they only occur when a pgd is allocated from the page allocator
or freed back to it. A pgd can pass through
the quicklists without having to be reinitialized.
6. Consolidation of code from multiple arches
So far arches have had their own implementations of quicklist
management. This patch moves that feature into the core, allowing
easier maintenance and consistent management of quicklists.
Page table pages have the characteristic that they are typically zero
or in a known state when they are freed. This is usually exactly the
same state as needed after allocation. So it makes sense to build a
list of freed page table pages and then consume those pages first on
allocation. Such pages have already been initialized correctly (thus no
need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively. Page table pages are used in
a sparse way so zeroing them on allocation is not too useful.
Such an implementation already exists for ia64. However, that implementation
did not support constructors and destructors as needed by i386 / x86_64.
It also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.
Quicklist support is enabled by an arch defining CONFIG_QUICKLIST. If more
than one quicklist is necessary then NR_QUICK can be defined for additional
lists. F.e. i386 needs two and thus has
config NR_QUICK
int
default 2
If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:
quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)
Page table pages can be freed using:
quicklist_free(<quicklist-nr>, <destructor>, <page>)
Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.
If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.
Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
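As an illustration (not part of the patches themselves), an arch with a
single quicklist and no constructor wires up its pte hooks roughly the
way the IA64 conversion in patch 2/5 does:

	static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
						  unsigned long addr)
	{
		/* Zeroed page from quicklist 0, falling back to the
		   page allocator if the list is empty. */
		return quicklist_alloc(0, GFP_KERNEL, NULL);
	}

	static inline void pte_free_kernel(pte_t *pte)
	{
		/* No destructor: the page must be zero again here. */
		quicklist_free(0, NULL, pte);
	}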
Tested on:
i386 UP / SMP, x86_64 UP, NUMA emulation, IA64 NUMA.
Index: linux-2.6.21-rc3-mm2/include/linux/quicklist.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h 2007-03-16 02:19:15.000000000 -0700
@@ -0,0 +1,95 @@
+#ifndef LINUX_QUICKLIST_H
+#define LINUX_QUICKLIST_H
+/*
+ * Fast allocations and disposal of pages. Pages must be in the condition
+ * as needed after allocation when they are freed. Per cpu lists of pages
+ * are kept that only contain node local pages.
+ *
+ * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/percpu.h>
+
+#ifdef CONFIG_QUICKLIST
+
+#ifndef CONFIG_NR_QUICK
+#define CONFIG_NR_QUICK 1
+#endif
+
+struct quicklist {
+ void *page;
+ int nr_pages;
+};
+
+DECLARE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+/*
+ * The two key functions quicklist_alloc and quicklist_free are inline so
+ * that they may be custom compiled for the platform.
+ * Specifying a NULL ctor can remove constructor support. Specifying
+ * a constant quicklist allows the determination of the exact address
+ * in the per cpu area.
+ *
+ * The fast path in quicklist_alloc touches only a per cpu cacheline and
+ * the first cacheline of the page itself. There is minimal overhead involved.
+ */
+static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
+{
+ struct quicklist *q;
+ void **p = NULL;
+
+ q = &get_cpu_var(quicklist)[nr];
+ p = q->page;
+ if (likely(p)) {
+ q->page = p[0];
+ p[0] = NULL;
+ q->nr_pages--;
+ }
+ put_cpu_var(quicklist);
+ if (likely(p))
+ return p;
+
+ p = (void *)__get_free_page(flags | __GFP_ZERO);
+ if (ctor && p)
+ ctor(p);
+ return p;
+}
+
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+{
+ struct quicklist *q;
+ void **p = pp;
+ struct page *page = virt_to_page(p);
+ int nid = page_to_nid(page);
+
+ if (unlikely(nid != numa_node_id())) {
+ if (dtor)
+ dtor(p);
+ free_page((unsigned long)p);
+ return;
+ }
+
+ q = &get_cpu_var(quicklist)[nr];
+ p[0] = q->page;
+ q->page = p;
+ q->nr_pages++;
+ put_cpu_var(quicklist);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *));
+unsigned long quicklist_total_size(void);
+
+#else
+void quicklist_check(int nr, void (*dtor)(void *))
+{
+}
+
+unsigned long quicklist_total_size(void)
+{
+ return 0;
+}
+#endif
+
+#endif /* LINUX_QUICKLIST_H */
+
Index: linux-2.6.21-rc3-mm2/mm/Makefile
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/Makefile 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/Makefile 2007-03-16 02:16:22.000000000 -0700
@@ -30,3 +30,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_QUICKLIST) += quicklist.o
+
Index: linux-2.6.21-rc3-mm2/mm/quicklist.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc3-mm2/mm/quicklist.c 2007-03-16 02:16:22.000000000 -0700
@@ -0,0 +1,81 @@
+/*
+ * Quicklist support.
+ *
+ * Quicklists are light weight lists of pages that have a defined state
+ * on alloc and free. Pages must be in the quicklist specific defined state
+ * (zero by default) when the page is freed. It seems that the initial idea
+ * for such lists first came from Dave Miller and then various other people
+ * improved on it.
+ *
+ * Copyright (C) 2007 SGI,
+ * Christoph Lameter <clameter@sgi.com>
+ * Generalized, added support for multiple lists and
+ * constructors / destructors.
+ */
+#include <linux/kernel.h>
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/quicklist.h>
+
+DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+#define MIN_PAGES 25
+#define MAX_FREES_PER_PASS 16
+#define FRACTION_OF_NODE_MEM 16
+
+static unsigned long max_pages(void)
+{
+ unsigned long node_free_pages, max;
+
+ node_free_pages = node_page_state(numa_node_id(),
+ NR_FREE_PAGES);
+ max = node_free_pages / FRACTION_OF_NODE_MEM;
+ return max(max, (unsigned long)MIN_PAGES);
+}
+
+static long min_pages_to_free(struct quicklist *q)
+{
+ long pages_to_free;
+
+ pages_to_free = q->nr_pages - max_pages();
+
+ return min(pages_to_free, (long)MAX_FREES_PER_PASS);
+}
+
+void quicklist_check(int nr, void (*dtor)(void *))
+{
+ long pages_to_free;
+ struct quicklist *q;
+
+ q = &get_cpu_var(quicklist)[nr];
+ if (q->nr_pages > MIN_PAGES) {
+ pages_to_free = min_pages_to_free(q);
+
+ while (pages_to_free > 0) {
+ void *p = quicklist_alloc(nr, 0, NULL);
+
+ if (dtor)
+ dtor(p);
+ free_page((unsigned long)p);
+ pages_to_free--;
+ }
+ }
+ put_cpu_var(quicklist);
+}
+
+unsigned long quicklist_total_size(void)
+{
+ unsigned long count = 0;
+ int cpu;
+ struct quicklist *ql, *q;
+
+ for_each_online_cpu(cpu) {
+ ql = per_cpu(quicklist, cpu);
+ for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
+ count += q->nr_pages;
+ }
+ return count;
+}
+
* [QUICKLIST 2/5] Quicklist support for IA64
From: Christoph Lameter @ 2007-03-19 23:37 UTC
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Quicklist for IA64
IA64 is the origin of the quicklist implementation. So cut out the pieces
that are now in core code and modify the functions called.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/init.c 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c 2007-03-16 02:33:46.000000000 -0700
@@ -39,9 +39,6 @@
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
extern void ia64_tlb_init (void);
unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
struct page *zero_page_memmap_ptr; /* map entry for zero page */
EXPORT_SYMBOL(zero_page_memmap_ptr);
-#define MIN_PGT_PAGES 25UL
-#define MAX_PGT_FREES_PER_PASS 16L
-#define PGT_FRACTION_OF_NODE_MEM 16
-
-static inline long
-max_pgt_pages(void)
-{
- u64 node_free_pages, max_pgt_pages;
-
-#ifndef CONFIG_NUMA
- node_free_pages = nr_free_pages();
-#else
- node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
- max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
- max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
- return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
- long pages_to_free;
-
- pages_to_free = pgtable_quicklist_size - max_pgt_pages();
- pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
- return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
- long pages_to_free;
-
- if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
- return;
-
- preempt_disable();
- while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
- while (pages_to_free--) {
- free_page((unsigned long)pgtable_quicklist_alloc());
- }
- preempt_enable();
- preempt_disable();
- }
- preempt_enable();
-}
-
void
lazy_mmu_prot_update (pte_t pte)
{
Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h 2007-03-16 02:33:46.000000000 -0700
@@ -18,71 +18,18 @@
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/threads.h>
+#include <linux/quicklist.h>
#include <asm/mmu_context.h>
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
- long ql_size = 0;
- int cpuid;
-
- for_each_online_cpu(cpuid) {
- ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
- }
- return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
- unsigned long *ret = NULL;
-
- preempt_disable();
-
- ret = pgtable_quicklist;
- if (likely(ret != NULL)) {
- pgtable_quicklist = (unsigned long *)(*ret);
- ret[0] = 0;
- --pgtable_quicklist_size;
- preempt_enable();
- } else {
- preempt_enable();
- ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
- }
-
- return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
- int nid = page_to_nid(virt_to_page(pgtable_entry));
-
- if (unlikely(nid != numa_node_id())) {
- free_page((unsigned long)pgtable_entry);
- return;
- }
-#endif
-
- preempt_disable();
- *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
- pgtable_quicklist = (unsigned long *)pgtable_entry;
- ++pgtable_quicklist_size;
- preempt_enable();
-}
-
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pgd_free(pgd_t * pgd)
{
- pgtable_quicklist_free(pgd);
+ quicklist_free(0, NULL, pgd);
}
#ifdef CONFIG_PGTABLE_4
@@ -94,12 +41,12 @@ pgd_populate(struct mm_struct *mm, pgd_t
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pud_free(pud_t * pud)
{
- pgtable_quicklist_free(pud);
+ quicklist_free(0, NULL, pud);
}
#define __pud_free_tlb(tlb, pud) pud_free(pud)
#endif /* CONFIG_PGTABLE_4 */
@@ -112,12 +59,12 @@ pud_populate(struct mm_struct *mm, pud_t
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pmd_free(pmd_t * pmd)
{
- pgtable_quicklist_free(pmd);
+ quicklist_free(0, NULL, pmd);
}
#define __pmd_free_tlb(tlb, pmd) pmd_free(pmd)
@@ -137,28 +84,31 @@ pmd_populate_kernel(struct mm_struct *mm
static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long addr)
{
- void *pg = pgtable_quicklist_alloc();
+ void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
return pg ? virt_to_page(pg) : NULL;
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pte_free(struct page *pte)
{
- pgtable_quicklist_free(page_address(pte));
+ quicklist_free(0, NULL, page_address(pte));
}
static inline void pte_free_kernel(pte_t * pte)
{
- pgtable_quicklist_free(pte);
+ quicklist_free(0, NULL, pte);
}
-#define __pte_free_tlb(tlb, pte) pte_free(pte)
+static inline void check_pgt_cache(void)
+{
+ quicklist_check(0, NULL);
+}
-extern void check_pgt_cache(void);
+#define __pte_free_tlb(tlb, pte) pte_free(pte)
#endif /* _ASM_IA64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/contig.c 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/contig.c 2007-03-16 02:33:46.000000000 -0700
@@ -88,7 +88,7 @@ void show_mem(void)
printk(KERN_INFO "%d pages shared\n", total_shared);
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
- pgtable_quicklist_total_size());
+ quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
}
Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/discontig.c 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/mm/discontig.c 2007-03-16 02:33:46.000000000 -0700
@@ -563,7 +563,7 @@ void show_mem(void)
printk(KERN_INFO "%d pages shared\n", total_shared);
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
- pgtable_quicklist_total_size());
+ quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
}
Index: linux-2.6.21-rc3-mm2/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/ia64/Kconfig 2007-03-16 02:15:24.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/ia64/Kconfig 2007-03-16 02:33:46.000000000 -0700
@@ -29,6 +29,10 @@ config ZONE_DMA
def_bool y
depends on !IA64_SGI_SN2
+config QUICKLIST
+ bool
+ default y
+
config MMU
bool
default y
* [QUICKLIST 3/5] Quicklist support for i386
From: Christoph Lameter @ 2007-03-19 23:37 UTC
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
i386: Convert to quicklists
Implement the i386 management of pgds and pmds using quicklists.
The i386 management of page table pages currently uses page sized slabs.
Getting rid of that using quicklists allows full use of the page flags
and the page->lru. So get rid of the improvised linked lists using
page->index and page->private.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/init.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/init.c 2007-03-19 15:55:33.000000000 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
EXPORT_SYMBOL_GPL(remove_memory);
#endif
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
- if (PTRS_PER_PMD > 1) {
- pmd_cache = kmem_cache_create("pmd",
- PTRS_PER_PMD*sizeof(pmd_t),
- PTRS_PER_PMD*sizeof(pmd_t),
- 0,
- pmd_ctor,
- NULL);
- if (!pmd_cache)
- panic("pgtable_cache_init(): cannot create pmd cache");
- }
- pgd_cache = kmem_cache_create("pgd",
- PTRS_PER_PGD*sizeof(pgd_t),
- PTRS_PER_PGD*sizeof(pgd_t),
- 0,
- pgd_ctor,
- PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
- if (!pgd_cache)
- panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
/*
* This function cannot be __init, since exceptions don't work in that
* section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c 2007-03-19 15:59:37.000000000 -0700
@@ -13,6 +13,7 @@
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/module.h>
+#include <linux/quicklist.h>
#include <asm/system.h>
#include <asm/pgtable.h>
@@ -198,11 +199,6 @@ struct page *pte_alloc_one(struct mm_str
return pte;
}
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
- memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
/*
* List of all pgd's needed for non-PAE so it can invalidate entries
* in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +207,18 @@ void pmd_ctor(void *pmd, struct kmem_cac
* against pageattr.c; it is the unique case in which a valid change
* of kernel pagetables can't be lazily synchronized by vmalloc faults.
* vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
* -- wli
*/
DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
- struct page *page = virt_to_page(pgd);
- page->index = (unsigned long)pgd_list;
- if (pgd_list)
- set_page_private(pgd_list, (unsigned long)&page->index);
- pgd_list = page;
- set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
-static inline void pgd_list_del(pgd_t *pgd)
-{
- struct page *next, **pprev, *page = virt_to_page(pgd);
- next = (struct page *)page->index;
- pprev = (struct page **)page_private(page);
- *pprev = next;
- if (next)
- set_page_private(next, (unsigned long)pprev);
-}
+#define QUICK_PGD 0
+#define QUICK_PMD 1
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
{
unsigned long flags;
+ struct page *page = virt_to_page(pgd);
if (PTRS_PER_PMD == 1) {
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -259,31 +237,32 @@ void pgd_ctor(void *pgd, struct kmem_cac
__pa(swapper_pg_dir) >> PAGE_SHIFT,
USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
- pgd_list_add(pgd);
+ list_add(&page->lru, &pgd_list);
spin_unlock_irqrestore(&pgd_lock, flags);
}
/* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
{
unsigned long flags; /* can be called from interrupt context */
+ struct page *page = virt_to_page(pgd);
paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
spin_lock_irqsave(&pgd_lock, flags);
- pgd_list_del(pgd);
+ list_del(&page->lru);
spin_unlock_irqrestore(&pgd_lock, flags);
}
pgd_t *pgd_alloc(struct mm_struct *mm)
{
int i;
- pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+ pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
if (PTRS_PER_PMD == 1 || !pgd)
return pgd;
for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+ pmd_t *pmd = quicklist_alloc(QUICK_PMD, GFP_KERNEL, NULL);
if (!pmd)
goto out_oom;
paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +275,9 @@ out_oom:
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
return NULL;
}
@@ -312,8 +291,14 @@ void pgd_free(pgd_t *pgd)
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
/* in the non-PAE case, free_pgtables() clears user pgd entries */
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_ctor, pgd);
+}
+
+void check_pgt_cache(void)
+{
+ quicklist_check(QUICK_PGD, pgd_dtor);
+ quicklist_check(QUICK_PMD, NULL);
}
Index: linux-2.6.21-rc3-mm2/arch/i386/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/Kconfig 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/Kconfig 2007-03-19 15:55:33.000000000 -0700
@@ -55,6 +55,14 @@ config ZONE_DMA
bool
default y
+config QUICKLIST
+ bool
+ default y
+
+config NR_QUICK
+ int
+ default 2
+
config SBUS
bool
Index: linux-2.6.21-rc3-mm2/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-i386/pgtable.h 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-i386/pgtable.h 2007-03-19 15:55:33.000000000 -0700
@@ -35,15 +35,12 @@ struct vm_area_struct;
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
extern unsigned long empty_zero_page[1024];
extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
-extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
+void check_pgt_cache(void);
+
+extern spinlock_t pgd_lock;
+extern struct list_head pgd_list;
+static inline void pgtable_cache_init(void) {};
void paging_init(void);
/*
Index: linux-2.6.21-rc3-mm2/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/kernel/smp.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/kernel/smp.c 2007-03-19 15:55:33.000000000 -0700
@@ -437,7 +437,7 @@ void flush_tlb_mm (struct mm_struct * mm
}
if (!cpus_empty(cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+ check_pgt_cache();
preempt_enable();
}
Index: linux-2.6.21-rc3-mm2/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/kernel/process.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/kernel/process.c 2007-03-19 15:55:33.000000000 -0700
@@ -181,6 +181,7 @@ void cpu_idle(void)
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
+ check_pgt_cache();
rmb();
idle = pm_idle;
Index: linux-2.6.21-rc3-mm2/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-i386/pgalloc.h 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-i386/pgalloc.h 2007-03-19 15:55:33.000000000 -0700
@@ -65,6 +65,6 @@ do { \
#define pud_populate(mm, pmd, pte) BUG()
#endif
-#define check_pgt_cache() do { } while (0)
+extern void check_pgt_cache(void);
#endif /* _I386_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/fault.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/fault.c 2007-03-19 15:55:33.000000000 -0700
@@ -623,11 +623,10 @@ void vmalloc_sync_all(void)
struct page *page;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page =
- (struct page *)page->index)
+ list_for_each_entry(page, &pgd_list, lru)
if (!vmalloc_sync_one(page_address(page),
address)) {
- BUG_ON(page != pgd_list);
+ BUG();
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pageattr.c 2007-03-19 15:54:28.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pageattr.c 2007-03-19 15:55:33.000000000 -0700
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
return;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page = (struct page *)page->index) {
+ list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
* [QUICKLIST 4/5] Quicklist support for x86_64
From: Christoph Lameter @ 2007-03-19 23:37 UTC
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Convert x86_64 to using quicklists
This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.
A second quicklist is used to separate out PGD handling. Thus we can carry
the initialized pgds of terminating processes over to the next process
needing them.
Also clean up the pgd_list handling to use regular list macros. Not using
the slab allocator frees up the lru field, which makes this possible.
The addition and removal of pgds to the pgd list is moved into the
constructor / destructor. We can then avoid taking pgds off the list while
they are still in the quicklists, reducing the pgd creation and allocation
overhead further.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/Kconfig 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig 2007-03-15 22:00:04.000000000 -0700
@@ -56,6 +56,14 @@ config ZONE_DMA
bool
default y
+config QUICKLIST
+ bool
+ default y
+
+config NR_QUICK
+ int
+ default 2
+
config ISA
bool
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h 2007-03-15 21:59:31.000000000 -0700
@@ -4,6 +4,10 @@
#include <asm/pda.h>
#include <linux/threads.h>
#include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0 /* We preserve special mappings over free */
+#define QUICK_PT 1 /* Other page table pages that are zero on free */
#define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
static inline void pmd_free(pmd_t *pmd)
{
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
- free_page((unsigned long)pmd);
+ quicklist_free(QUICK_PT, NULL, pmd);
}
static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
{
- return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline void pud_free (pud_t *pud)
{
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
- free_page((unsigned long)pud);
+ quicklist_free(QUICK_PT, NULL, pud);
}
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
{
+ unsigned boundary;
+ pgd_t *pgd = x;
struct page *page = virt_to_page(pgd);
+ /*
+ * Copy kernel pointers in from init.
+ */
+ boundary = pgd_index(__PAGE_OFFSET);
+ memcpy(pgd + boundary,
+ init_level4_pgt + boundary,
+ (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
spin_lock(&pgd_lock);
- page->index = (pgoff_t)pgd_list;
- if (pgd_list)
- pgd_list->private = (unsigned long)&page->index;
- pgd_list = page;
- page->private = (unsigned long)&pgd_list;
+ list_add(&page->lru, &pgd_list);
spin_unlock(&pgd_lock);
}
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
{
- struct page *next, **pprev, *page = virt_to_page(pgd);
+ pgd_t *pgd = x;
+ struct page *page = virt_to_page(pgd);
spin_lock(&pgd_lock);
- next = (struct page *)page->index;
- pprev = (struct page **)page->private;
- *pprev = next;
- if (next)
- next->private = (unsigned long)pprev;
+ list_del(&page->lru);
spin_unlock(&pgd_lock);
}
+
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- unsigned boundary;
- pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
- if (!pgd)
- return NULL;
- pgd_list_add(pgd);
- /*
- * Copy kernel pointers in from init.
- * Could keep a freelist or slab cache of those because the kernel
- * part never changes.
- */
- boundary = pgd_index(__PAGE_OFFSET);
- memset(pgd, 0, boundary * sizeof(pgd_t));
- memcpy(pgd + boundary,
- init_level4_pgt + boundary,
- (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+ pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+ GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
return pgd;
}
static inline void pgd_free(pgd_t *pgd)
{
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
- pgd_list_del(pgd);
- free_page((unsigned long)pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
- void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
if (!p)
return NULL;
return virt_to_page(p);
@@ -111,17 +106,22 @@ static inline struct page *pte_alloc_one
static inline void pte_free_kernel(pte_t *pte)
{
BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
- free_page((unsigned long)pte);
+ quicklist_free(QUICK_PT, NULL, pte);
}
static inline void pte_free(struct page *pte)
{
__free_page(pte);
-}
+}
#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
#define __pmd_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
#define __pud_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
+static inline void check_pgt_cache(void)
+{
+ quicklist_check(QUICK_PGD, pgd_dtor);
+ quicklist_check(QUICK_PT, NULL);
+}
#endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/kernel/process.c 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/kernel/process.c 2007-03-15 21:59:31.000000000 -0700
@@ -207,6 +207,7 @@ void cpu_idle (void)
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
+ check_pgt_cache();
rmb();
idle = pm_idle;
if (!idle)
Index: linux-2.6.21-rc3-mm2/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/kernel/smp.c 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/kernel/smp.c 2007-03-15 21:59:31.000000000 -0700
@@ -242,7 +242,7 @@ void flush_tlb_mm (struct mm_struct * mm
}
if (!cpus_empty(cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+ check_pgt_cache();
preempt_enable();
}
EXPORT_SYMBOL(flush_tlb_mm);
Index: linux-2.6.21-rc3-mm2/arch/x86_64/mm/fault.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/x86_64/mm/fault.c 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/x86_64/mm/fault.c 2007-03-15 21:59:31.000000000 -0700
@@ -585,7 +585,7 @@ do_sigbus:
}
DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
void vmalloc_sync_all(void)
{
@@ -605,8 +605,7 @@ void vmalloc_sync_all(void)
if (pgd_none(*pgd_ref))
continue;
spin_lock(&pgd_lock);
- for (page = pgd_list; page;
- page = (struct page *)page->index) {
+ list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pgd = (pgd_t *)page_address(page) + pgd_index(address);
if (pgd_none(*pgd))
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgtable.h 2007-03-13 00:09:50.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgtable.h 2007-03-15 21:59:31.000000000 -0700
@@ -402,7 +402,7 @@ static inline pte_t pte_modify(pte_t pte
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
void vmalloc_sync_all(void);
#endif /* !__ASSEMBLY__ */
@@ -419,7 +419,6 @@ extern int kern_addr_valid(unsigned long
#define HAVE_ARCH_UNMAPPED_AREA
#define pgtable_cache_init() do { } while (0)
-#define check_pgt_cache() do { } while (0)
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1
* [QUICKLIST 5/5] Quicklist support for sparc64
From: Christoph Lameter @ 2007-03-19 23:37 UTC
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
From: David Miller <davem@davemloft.net>
[QUICKLIST]: Add sparc64 quicklist support.
I ported this to sparc64 as per the patch below, tested on
UP SunBlade1500 and 24 cpu Niagara T1000.
Signed-off-by: David S. Miller <davem@davemloft.net>
Index: linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/Kconfig 2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig 2007-03-15 22:01:40.000000000 -0700
@@ -26,6 +26,10 @@ config MMU
bool
default y
+config QUICKLIST
+ bool
+ default y
+
config STACKTRACE_SUPPORT
bool
default y
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/init.c 2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c 2007-03-15 22:00:44.000000000 -0700
@@ -176,30 +176,6 @@ unsigned long sparc64_kern_sec_context _
int bigkernel = 0;
-struct kmem_cache *pgtable_cache __read_mostly;
-
-static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags)
-{
- clear_page(addr);
-}
-
-extern void tsb_cache_init(void);
-
-void pgtable_cache_init(void)
-{
- pgtable_cache = kmem_cache_create("pgtable_cache",
- PAGE_SIZE, PAGE_SIZE,
- SLAB_HWCACHE_ALIGN |
- SLAB_MUST_HWCACHE_ALIGN,
- zero_ctor,
- NULL);
- if (!pgtable_cache) {
- prom_printf("Could not create pgtable_cache\n");
- prom_halt();
- }
- tsb_cache_init();
-}
-
#ifdef CONFIG_DEBUG_DCFLUSH
atomic_t dcpage_flushes = ATOMIC_INIT(0);
#ifdef CONFIG_SMP
Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/tsb.c 2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c 2007-03-15 22:00:44.000000000 -0700
@@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] =
"tsb_1MB",
};
-void __init tsb_cache_init(void)
+void __init pgtable_cache_init(void)
{
unsigned long i;
Index: linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h 2007-03-13 00:09:30.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h 2007-03-15 22:00:44.000000000 -0700
@@ -6,6 +6,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/quicklist.h>
#include <asm/spitfire.h>
#include <asm/cpudata.h>
@@ -13,52 +14,50 @@
#include <asm/page.h>
/* Page table allocation/freeing. */
-extern struct kmem_cache *pgtable_cache;
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pgd_free(pgd_t *pgd)
{
- kmem_cache_free(pgtable_cache, pgd);
+ quicklist_free(0, NULL, pgd);
}
#define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD)
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pmd_free(pmd_t *pmd)
{
- kmem_cache_free(pgtable_cache, pmd);
+ quicklist_free(0, NULL, pmd);
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
- return virt_to_page(pte_alloc_one_kernel(mm, address));
+ void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
+ return pg ? virt_to_page(pg) : NULL;
}
static inline void pte_free_kernel(pte_t *pte)
{
- kmem_cache_free(pgtable_cache, pte);
+ quicklist_free(0, NULL, pte);
}
static inline void pte_free(struct page *ptepage)
{
- pte_free_kernel(page_address(ptepage));
+ quicklist_free(0, NULL, page_address(ptepage));
}
@@ -66,6 +65,9 @@ static inline void pte_free(struct page
#define pmd_populate(MM,PMD,PTE_PAGE) \
pmd_populate_kernel(MM,PMD,page_address(PTE_PAGE))
-#define check_pgt_cache() do { } while (0)
+static inline void check_pgt_cache(void)
+{
+ quicklist_check(0, NULL);
+}
#endif /* _SPARC64_PGALLOC_H */
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Andrew Morton @ 2007-03-19 23:53 UTC
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007 15:37:16 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> This patchset introduces an arch independent framework to handle lists
> of recently used page table pages to replace the existing (ab)use of the
> slab for that purpose.
>
> 1. Proven code from the IA64 arch.
Has it been proven that quicklists are superior to simply going direct to the
page allocator for these pages?
Would it provide a superior solution if we were to a) stop zeroing out the
pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
for these pages?
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Christoph Lameter @ 2007-03-19 23:57 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007, Andrew Morton wrote:
> Has it been proven that quicklists are superior to simply going direct to the
> page allocator for these pages?
Yes.
> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?
I doubt it. The zeroing is a byproduct of our way of serializing pte
handling. It's going to be difficult to change that.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: David Miller @ 2007-03-19 23:58 UTC
To: akpm; +Cc: clameter, linux-mm, linux-kernel
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 19 Mar 2007 16:53:29 -0700
> Would it provide a superior solution if we were to a) stop zeroing out the
> pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> for these pages?
While you could avoid zero'ing them out, you certainly can't avoid
reading them into the cpu caches.
And for the PGDs you have to initialize these things partially to
non-zero values on x86{,_64} on every new PGD you allocate, which is a
complete waste of cpu cache dirtying. Avoiding this overhead alone
justifies the quicklists I think. It's not just a "zero" thing,
so GFP_ZERO cannot help you here.
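(Patch 4/5's pgd_ctor shows the x86_64 case concretely: every fresh
pgd needs the kernel half copied in,

	boundary = pgd_index(__PAGE_OFFSET);
	memcpy(pgd + boundary, init_level4_pgt + boundary,
	       (PTRS_PER_PGD - boundary) * sizeof(pgd_t));

and a pgd sitting on a quicklist keeps that state across free/alloc.)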
The more I think about it the more I like the quicklists.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Andrew Morton @ 2007-03-20 0:07 UTC
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007 16:57:55 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Mon, 19 Mar 2007, Andrew Morton wrote:
>
> > Has it been proven that quicklists are superior to simply going direct to the
> > page allocator for these pages?
>
> Yes.
Sigh.
Please provide proof that quicklists are superior to simply going direct to
the page allocator for these pages.
> > Would it provide a superior solution if we were to a) stop zeroing out the
> > pte's when doing a fullmm==1 teardown and b) go direct to the page allocator
> > for these pages?
>
> I doubt it. The zeroing is a byproduct of our way of serializing pte
> handling. It's going to be difficult to change that.
Nick didn't think so, and I don't see the problem either.
We'll save on some bus traffic by avoiding the writeback, but how much
effect that will have we don't know. Presumably little.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Andrew Morton @ 2007-03-20 0:21 UTC
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007 15:37:16 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> ...
>
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h 2007-03-16 02:19:15.000000000 -0700
> @@ -0,0 +1,95 @@
> +#ifndef LINUX_QUICKLIST_H
> +#define LINUX_QUICKLIST_H
> +/*
> + * Fast allocations and disposal of pages. Pages must be in the condition
> + * as needed after allocation when they are freed. Per cpu lists of pages
> + * are kept that only contain node local pages.
> + *
> + * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
> + */
> +#include <linux/kernel.h>
> +#include <linux/gfp.h>
> +#include <linux/percpu.h>
> +
> +#ifdef CONFIG_QUICKLIST
> +
> +#ifndef CONFIG_NR_QUICK
> +#define CONFIG_NR_QUICK 1
> +#endif
No, please don't define config items like this. Do it in Kconfig.
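I.e. give the symbol a default in Kconfig, presumably something like
the i386 NR_QUICK entry, so that arches which don't care inherit it:

	config NR_QUICK
		int
		default 1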
> +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> +{
> + struct quicklist *q;
> + void **p = NULL;
> +
> + q = &get_cpu_var(quicklist)[nr];
> + p = q->page;
> + if (likely(p)) {
> + q->page = p[0];
> + p[0] = NULL;
> + q->nr_pages--;
> + }
> + put_cpu_var(quicklist);
> + if (likely(p))
> + return p;
> +
> + p = (void *)__get_free_page(flags | __GFP_ZERO);
> + if (ctor && p)
> + ctor(p);
> + return p;
> +}
> +
> +static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
> +{
> + struct quicklist *q;
> + void **p = pp;
> + struct page *page = virt_to_page(p);
> + int nid = page_to_nid(page);
> +
> + if (unlikely(nid != numa_node_id())) {
> + if (dtor)
> + dtor(p);
> + free_page((unsigned long)p);
> + return;
> + }
> +
> + q = &get_cpu_var(quicklist)[nr];
> + p[0] = q->page;
> + q->page = p;
> + q->nr_pages++;
> + put_cpu_var(quicklist);
> +}
These guys seem to have multiple callsites for ia64 at least and probably
would benefit from being uninlined.
> +void quicklist_check(int nr, void (*dtor)(void *));
> +unsigned long quicklist_total_size(void);
> +
> +#else
> +void quicklist_check(int nr, void (*dtor)(void *))
> +{
> +}
> +
> +unsigned long quicklist_total_size(void)
> +{
> + return 0;
> +}
> +#endif
That obviously won't link and wasn't tested. Making these static inline
will help.
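I.e. the !CONFIG_QUICKLIST stubs presumably want to become something
like:

	static inline void quicklist_check(int nr, void (*dtor)(void *))
	{
	}

	static inline unsigned long quicklist_total_size(void)
	{
		return 0;
	}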
> +/*
> + * Quicklist support.
> + *
> + * Quicklists are light weight lists of pages that have a defined state
> + * on alloc and free. Pages must be in the quicklist specific defined state
> + * (zero by default) when the page is freed. It seems that the initial idea
> + * for such lists first came from Dave Miller and then various other people
> + * improved on it.
> + *
> + * Copyright (C) 2007 SGI,
> + * Christoph Lameter <clameter@sgi.com>
> + * Generalized, added support for multiple lists and
> + * constructors / destructors.
> + */
> +#include <linux/kernel.h>
> +
> +#include <linux/mm.h>
> +#include <linux/mmzone.h>
> +#include <linux/module.h>
> +#include <linux/quicklist.h>
> +
> +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
If we uninline those big inlines, this can perhaps be made static.
> +#define MIN_PAGES 25
> +#define MAX_FREES_PER_PASS 16
> +#define FRACTION_OF_NODE_MEM 16
Are these constants optimal for all architectures?
> +static unsigned long max_pages(void)
> +{
> + unsigned long node_free_pages, max;
> +
> + node_free_pages = node_page_state(numa_node_id(),
> + NR_FREE_PAGES);
> + max = node_free_pages / FRACTION_OF_NODE_MEM;
> + return max(max, (unsigned long)MIN_PAGES);
> +}
> +
> +static long min_pages_to_free(struct quicklist *q)
> +{
> + long pages_to_free;
> +
> + pages_to_free = q->nr_pages - max_pages();
> +
> + return min(pages_to_free, (long)MAX_FREES_PER_PASS);
> +}
min_t and max_t are the standard way of avoiding that warning. Or stick a
UL on the constants (which is probably better).
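E.g.:

	return min_t(long, pages_to_free, MAX_FREES_PER_PASS);

or, for the max() above, something like #define MIN_PAGES 25UL.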
> +void quicklist_check(int nr, void (*dtor)(void *))
> +{
> + long pages_to_free;
> + struct quicklist *q;
> +
> + q = &get_cpu_var(quicklist)[nr];
> + if (q->nr_pages > MIN_PAGES) {
> + pages_to_free = min_pages_to_free(q);
> +
> + while (pages_to_free > 0) {
> + void *p = quicklist_alloc(nr, 0, NULL);
> +
> + if (dtor)
> + dtor(p);
> + free_page((unsigned long)p);
> + pages_to_free--;
> + }
> + }
> + put_cpu_var(quicklist);
> +}
The use of a literal 0 as a gfp_t is a bit ugly. I assume that we don't
care because we should never actually call into the page allocator for this
caller. But it's not terribly clear because there is no commentary
describing what this function is supposed to do.
The name foo_check() is unfortunate: it implies that the function checks
something (ie: has no side-effects). But this function _does_ change
things and perhaps should be called quicklist_trim() or something like
that.
This function lacks any commentary, but I was able to work it out. I
think. Some nice comments would be, umm, nice.
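Perhaps something along the lines of:

	/*
	 * Trim this cpu's quicklist: if the list has grown beyond a
	 * fraction of the node's free memory, give up to
	 * MAX_FREES_PER_PASS pages back to the page allocator,
	 * invoking the destructor on each page first.
	 */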
> +unsigned long quicklist_total_size(void)
> +{
> + unsigned long count = 0;
> + int cpu;
> + struct quicklist *ql, *q;
> +
> + for_each_online_cpu(cpu) {
> + ql = per_cpu(quicklist, cpu);
> + for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
> + count += q->nr_pages;
> + }
> + return count;
> +}
> +
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Christoph Lameter @ 2007-03-20 0:44 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007, Andrew Morton wrote:
> Please provide proof that quicklists are superior to simply going direct to
> the page allocator for these pages.
See the patch. We are only touching 2 cachelines instead of 32. So even
without considering the page allocator overhead and the slab allocator
overhead (which will make the situation even better) it's superior.
> > I doubt it. The zeroing is a byproduct of our way of serializing pte
> > handling. It's going to be difficult to change that.
>
> Nick didn't think so, and I don't see the problem either.
You do not think that our current way of handling ptes is okay? If we do
not zero the ptes then we need to separate munmap from process shutdown.
> We'll save on some bus traffic by avoiding the writeback, but how much
> effect that will have we don't know. Presumably little.
The advantage of the quicklists is that it does not require a rework of
the pte serialization.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
From: Andrew Morton @ 2007-03-20 0:55 UTC
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007 17:44:28 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Mon, 19 Mar 2007, Andrew Morton wrote:
>
> > Please provide proof that quicklists are superior to simply going direct to
> > the page allocator for these pages.
>
> See the patch. We are only touching 2 cachelines instead of 32. So even
> without considering the page allocator overhead and the slab allocator
> overhead (which will make the situation even better) it's superior.
That's not proof, it is handwaving. I could wave right back at you and
claim that the benefit from returning a cache-hot pte page back to the page
allocator for reuse exceeds the benefit which you waved at me above.
You may well be right, but nothing is proven, afaict.
> > > I doubt it. The zeroing is a byproduct of our way of serializing pte
> > > handling. It's going to be difficult to change that.
> >
> > Nick didn't think so, and I don't see the problem either.
>
> You do not think that our current way of handling ptes is okay? If we do
> not zero the ptes then we need to separate munmap from process shutdown.
Yep. It's possible that process shutdown is a sufficiently common and
costly special-case for it to be worth special-casing.
> > We'll save on some bus traffic by avoiding the writeback, but how much
> > effect that will have we don't know. Presumably little.
>
> > The advantage of the quicklists is that they do not require a rework of
> > the pte serialization.
No, these are unrelated. We can get pte pages from the page allocator and
zero them without touching the munmap handling.
But it's possible that if we _were_ to optimise the munmap handling as
suggested, the end result would be superior.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-20 0:55 ` Andrew Morton
@ 2007-03-20 1:03 ` Christoph Lameter
2007-03-20 1:32 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 1:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007, Andrew Morton wrote:
> > See the patch. We are only touching 2 cachelines instead of 32. So even
> > without considering the page allocator overhead and the slab allocator
> > overhead (which will make the situation even better), it's superior.
>
> That's not proof, it is handwaving. I could wave right back at you and
> claim that the benefit from returning a cache-hot pte page back to the page
> allocator for reuse exceeds the benefit which you waved at me above.
No, you cannot make that claim. That would mean that you have to touch
32 cachelines, which is inferior.
> You may well be right, but nothing is proven, afaict.
Nothing can be proven except within a rigorously defined mathematical
system, but even there we are limited by such things as Russell's paradox.
It's obvious that this is right. And there has been significant work
invested into retaining page table pages on i386, sparc64 and ia64 for
exactly the specified purpose. This patch does not change that at all for
these 3 arches. There is no doubt about the correctness of the approach here.
> > You do not think that our current way of handling ptes is okay? If we do
> > not zero the ptes then we need to separate munmap from process shutdown.
>
> Yep. It's possible that process shutdown is a sufficiently common and
> costly special-case for it to be worth special-casing.
OK, great idea, but what does this have to do with this patch? This patch
simply generalizes something that has been there for ages.
> > The advantage of the quicklists is that they do not require a rework of
> > the pte serialization.
>
> No, these are unrelated. We can get pte pages from the page allocator and
> zero them without touching the munmap handling.
>
> But it's possible that if we _were_ to optimise the munmap handling as
> suggested, the end result would be superior.
Andrew, this is utter crap and unrelated to this work. The main thing here
is to generalize something that various arches already do and to avoid the
page struct handling collisions. Are you using pie-in-the-sky speculation
to argue against consolidating code and fixing up usage conflicts of the
slab with arch code?
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-20 0:21 ` Andrew Morton
@ 2007-03-20 1:06 ` Christoph Lameter
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007, Andrew Morton wrote:
> > +
> > +#ifdef CONFIG_QUICKLIST
> > +
> > +#ifndef CONFIG_NR_QUICK
> > +#define CONFIG_NR_QUICK 1
> > +#endif
>
> No, please don't define config items like this. Do it in Kconfig.
They can be set up in the arch-specific Kconfig. OK, I moved the
#ifndef .. #endif into mm/Kconfig.
> These guys seem to have multiple callsites for ia64 at least and probably
> would benefit from being uninlined.
Then they would no longer be optimizable. Right now one can compile out
the constructor/destructor support and provide a constant list number
as well as constant gfp masks. The resulting code can be very small and
benefits tremendously from inlining.
Many arches do not need some of these features, and there are only a few
call sites.
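To illustrate that point (codegen details are of course
compiler-dependent): with quicklist_alloc() inline, a call site such as

	pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);

has the list number, the gfp mask and the ctor pointer all known at
compile time, so the compiler can fold the constant index into the
per-cpu address, drop the NULL test on ctor, and specialize the gfp
handling. Uninlined, all three become runtime parameters on every call.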
> > +void quicklist_check(int nr, void (*dtor)(void *));
> > +unsigned long quicklist_total_size(void);
> > +
> > +#else
> > +void quicklist_check(int nr, void (*dtor)(void *))
> > +{
> > +}
> > +
> > +unsigned long quicklist_total_size(void)
> > +{
> > + return 0;
> > +}
> > +#endif
>
> That obviously won't link and wasn't tested. Making these static inline
> will help.
Hmmm... We could drop these completely. If an arch does not use
quicklists then it should not be calling these.
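(For completeness, the static-inline stubs Andrew is suggesting would
look something like the following sketch; the fixup patch below takes the
other route and drops the stubs entirely:

	#else
	static inline void quicklist_check(int nr, void (*dtor)(void *))
	{
	}

	static inline unsigned long quicklist_total_size(void)
	{
		return 0;
	}
	#endif

Being static inline, unused stubs are simply discarded per translation
unit instead of producing duplicate symbols at link time.)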
> > +#include <linux/module.h>
> > +#include <linux/quicklist.h>
> > +
> > +DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
>
> If we uninline those big inlines, this can perhaps be made static.
Yeah, but we want the inlines.
>
> > +#define MIN_PAGES 25
> > +#define MAX_FREES_PER_PASS 16
> > +#define FRACTION_OF_NODE_MEM 16
>
> Are these constants optimal for all architectures?
I added them as parameters to quicklist_trim so that an arch
can specify its own settings.
> > + return min(pages_to_free, (long)MAX_FREES_PER_PASS);
> > +}
>
> min_t and max_t are the standard way of avoiding that warning. Or stick a
> UL on the constants (which is probably better).
We do not need those since the constants are now parameters.
>
> > +void quicklist_check(int nr, void (*dtor)(void *))
> > +{
> > + long pages_to_free;
> > + struct quicklist *q;
> > +
> > + q = &get_cpu_var(quicklist)[nr];
> > + if (q->nr_pages > MIN_PAGES) {
> > + pages_to_free = min_pages_to_free(q);
> > +
> > + while (pages_to_free > 0) {
> > + void *p = quicklist_alloc(nr, 0, NULL);
> > +
> > + if (dtor)
> > + dtor(p);
> > + free_page((unsigned long)p);
> > + pages_to_free--;
> > + }
> > + }
> > + put_cpu_var(quicklist);
> > +}
>
> The use of a literal 0 as a gfp_t is a bit ugly. I assume that we don't
> care because we should never actually call into the page allocator for this
> caller. But it's not terribly clear because there is no commentary
> describing what this function is supposed to do.
Right. Will add comments.
> The name foo_check() is unfortunate: it implies that the function checks
> something (ie: has no side-effects). But this function _does_ change
> things and perhaps should be called quicklist_trim() or something like
> that.
Tradition. Dave initially named it check_pgt_cache, it seems.
> This function lacks any commentary, but I was able to work it out. I
> think. Some nice comments would be, umm, nice.
OK. Here is a fixup patch:
Index: linux-2.6.21-rc3-mm2/include/linux/quicklist.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/linux/quicklist.h 2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/linux/quicklist.h 2007-03-19 17:47:34.000000000 -0700
@@ -13,10 +13,6 @@
#ifdef CONFIG_QUICKLIST
-#ifndef CONFIG_NR_QUICK
-#define CONFIG_NR_QUICK 1
-#endif
-
struct quicklist {
void *page;
int nr_pages;
@@ -77,18 +73,11 @@ static inline void quicklist_free(int nr
put_cpu_var(quicklist);
}
-void quicklist_check(int nr, void (*dtor)(void *));
-unsigned long quicklist_total_size(void);
+void quicklist_trim(int nr, void (*dtor)(void *),
+ unsigned long min_pages, unsigned long max_free);
-#else
-void quicklist_check(int nr, void (*dtor)(void *))
-{
-}
+unsigned long quicklist_total_size(void);
-unsigned long quicklist_total_size(void)
-{
- return 0;
-}
#endif
#endif /* LINUX_QUICKLIST_H */
Index: linux-2.6.21-rc3-mm2/mm/Kconfig
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/Kconfig 2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/Kconfig 2007-03-19 17:42:49.000000000 -0700
@@ -220,3 +220,8 @@ config DEBUG_READAHEAD
Say N for production servers.
+config NR_QUICK
+ int
+ depends on QUICKLIST
+ default 1
+
Index: linux-2.6.21-rc3-mm2/mm/quicklist.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/mm/quicklist.c 2007-03-19 17:41:42.000000000 -0700
+++ linux-2.6.21-rc3-mm2/mm/quicklist.c 2007-03-19 17:53:45.000000000 -0700
@@ -21,39 +21,46 @@
DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
-#define MIN_PAGES 25
-#define MAX_FREES_PER_PASS 16
#define FRACTION_OF_NODE_MEM 16
-static unsigned long max_pages(void)
+static unsigned long max_pages(unsigned long min_pages)
{
unsigned long node_free_pages, max;
node_free_pages = node_page_state(numa_node_id(),
NR_FREE_PAGES);
max = node_free_pages / FRACTION_OF_NODE_MEM;
- return max(max, (unsigned long)MIN_PAGES);
+ return max(max, min_pages);
}
-static long min_pages_to_free(struct quicklist *q)
+static long min_pages_to_free(struct quicklist *q,
+ unsigned long min_pages, long max_free)
{
long pages_to_free;
- pages_to_free = q->nr_pages - max_pages();
+ pages_to_free = q->nr_pages - max_pages(min_pages);
- return min(pages_to_free, (long)MAX_FREES_PER_PASS);
+ return min(pages_to_free, max_free);
}
-void quicklist_check(int nr, void (*dtor)(void *))
+/*
+ * Trim down the number of pages in the quicklist
+ */
+void quicklist_trim(int nr, void (*dtor)(void *),
+ unsigned long min_pages, unsigned long max_free)
{
long pages_to_free;
struct quicklist *q;
q = &get_cpu_var(quicklist)[nr];
- if (q->nr_pages > MIN_PAGES) {
- pages_to_free = min_pages_to_free(q);
+ if (q->nr_pages > min_pages) {
+ pages_to_free = min_pages_to_free(q, min_pages, max_free);
while (pages_to_free > 0) {
+ /*
+ * We pass a gfp_t of 0 to quicklist_alloc here
+ * because we will never call into the page allocator.
+ */
void *p = quicklist_alloc(nr, 0, NULL);
if (dtor)
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c 2007-03-19 17:42:44.000000000 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c 2007-03-19 17:42:49.000000000 -0700
@@ -299,6 +299,6 @@ void pgd_free(pgd_t *pgd)
void check_pgt_cache(void)
{
- quicklist_check(QUICK_PGD, pgd_dtor);
- quicklist_check(QUICK_PMD, NULL);
+ quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+ quicklist_trim(QUICK_PMD, NULL, 25, 16);
}
Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h 2007-03-19 17:42:46.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h 2007-03-19 17:42:49.000000000 -0700
@@ -121,7 +121,7 @@ static inline void pte_free(struct page
static inline void check_pgt_cache(void)
{
- quicklist_check(QUICK_PGD, pgd_dtor);
- quicklist_check(QUICK_PT, NULL);
+ quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+ quicklist_trim(QUICK_PT, NULL, 25, 16);
}
#endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h 2007-03-19 17:42:47.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h 2007-03-19 17:42:49.000000000 -0700
@@ -67,7 +67,7 @@ static inline void pte_free(struct page
static inline void check_pgt_cache(void)
{
- quicklist_check(0, NULL);
+ quicklist_trim(0, NULL, 25, 16);
}
#endif /* _SPARC64_PGALLOC_H */
Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h 2007-03-19 17:42:43.000000000 -0700
+++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h 2007-03-19 17:42:59.000000000 -0700
@@ -106,7 +106,7 @@ static inline void pte_free_kernel(pte_t
static inline void check_pgt_cache(void)
{
- quicklist_check(0, NULL);
+ quicklist_trim(0, NULL, 25, 16);
}
#define __pte_free_tlb(tlb, pte) pte_free(pte)
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-20 1:03 ` Christoph Lameter
@ 2007-03-20 1:32 ` Andrew Morton
2007-03-20 19:41 ` Christoph Lameter
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-20 1:32 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007 18:03:54 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Mon, 19 Mar 2007, Andrew Morton wrote:
>
> > > See the patch. We are only touching 2 cachelines instead of 32. So even
> > > without considering the page allocator overhead and the slab allocator
> > > overhead (which will make the situation even better), it's superior.
> >
> > That's not proof, it is handwaving. I could wave right back at you and
> > claim that the benefit from returning a cache-hot pte page back to the page
> > allocator for reuse exceeds the benefit which you waved at me above.
>
> No, you cannot make that claim. That would mean that you have to touch
> 32 cachelines, which is inferior.
For pte pages (which are far more common), more than a single cacheline
will be in cache.
Yes, a common quicklist implementation is good. But no quicklist
implementation at all is better. You say that will be slower, and you may
well be right, but I say let's demonstrate that (please) rather than
speculating.
Then we can look at the difference and decide whether it is worth the
additional complexity of this special-purpose private allocator.
> > You may well be right, but nothing is proven, afaict.
>
> Nothing can be proven except within a rigorously defined mathematical
> system, but even there we are limited by such things as Russell's paradox.
>
> It's obvious that this is right. And there has been significant work
> invested into retaining page table pages on i386, sparc64 and ia64 for
> exactly the specified purpose.
I believe that work predated per-cpu-pages.
> This patch does not change that at all for these 3
> arches. There is no doubt about the correctness of the approach here.
>
> > > You do not think that our current way of handling ptes is okay? If we do
> > > not zero the ptes then we need to separate munmap from process shutdown.
> >
> > Yep. It's possible that process shutdown is a sufficiently common and
> > costly special-case for it to be worth special-casing.
>
> OK, great idea, but what does this have to do with this patch? This patch
> simply generalizes something that has been there for ages.
It has a lot to do with this patch.
If we decide that it is useful to optimise the full-mm teardown case, then
we will need to zero these pages when we start to use them, so we might as
well get them straight from the page allocator. Hence this patch goes into
the bitbucket.
> > > The advantage of the quicklists is that they do not require a rework of
> > > the pte serialization.
> >
> > No, these are unrelated. We can get pte pages from the page allocator and
> > zero them without touching the munmap handling.
> >
> > But it's possible that if we _were_ to optimise the munmap handling as
> > suggested, the end result would be superior.
>
> Andrew, this is utter crap and unrelated to this work. The main thing here
> is to generalize something that various arches already do and to avoid the
> page struct handling collisions. Are you using pie-in-the-sky speculation
> to argue against consolidating code and fixing up usage conflicts of the
> slab with arch code?
It is not pie-in-the-sky to ask "is this code still useful?".
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-19 23:37 [QUICKLIST 1/5] Quicklists for page table pages V3 Christoph Lameter
` (5 preceding siblings ...)
2007-03-20 0:21 ` Andrew Morton
@ 2007-03-20 3:12 ` Paul Mackerras
2007-03-20 19:43 ` Christoph Lameter
6 siblings, 1 reply; 19+ messages in thread
From: Paul Mackerras @ 2007-03-20 3:12 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm, linux-kernel
Christoph Lameter writes:
> +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> +{
...
> + p = (void *)__get_free_page(flags | __GFP_ZERO);
This will cause problems on 64-bit powerpc, at least with 4k pages,
since the pmd and pgd levels only use 1/4 of a page.
Paul.
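(For scale, with illustrative figures only: if a pmd or pgd level holds
128 eight-byte entries, the table occupies 128 * 8 = 1024 bytes, a
quarter of a 4096-byte page, so taking a whole page per table would waste
the other three quarters, and __GFP_ZERO would clear memory the table
never uses. The exact entry counts depend on the powerpc configuration.)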
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-20 1:32 ` Andrew Morton
@ 2007-03-20 19:41 ` Christoph Lameter
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 19:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 19 Mar 2007, Andrew Morton wrote:
> Yes, a common quicklist implementation is good. But no quicklist
> implementation at all is better. You say that will be slower, and you may
> well be right, but I say let's demonstrate that (please) rather than
> speculating.
There are at least 3 arches already using this scheme; there is no
speculation here. The slab use in i386 is for exactly the same purpose.
There is nothing new here. It consolidates code and fixes the page struct
use conflict between slab and arch code. The conflict is the main reason
why I want this. That way I will not have the special casing in SLUB and
we can make SLAB support debugging for all slab caches.
> > It's obvious that this is right. And there has been significant work
> > invested into retaining page table pages on i386, sparc64 and ia64 for
> > exactly the specified purpose.
>
> I believe that work predated per-cpu-pages.
Lots of arch code depends on page table pages being in a known state for
reuse; this is nothing new.
> > OK, great idea, but what does this have to do with this patch? This patch
> > simply generalizes something that has been there for ages.
>
> It has a lot to do with this patch.
>
> If we decide that it is useful to optimise the full-mm teardown case then
> we will need to zero these pages when we start to use them so we might as
> well get them straight from the page allocator. Hence this patch goes into
> the bitbucket.
If you decide to optimise the full-mm teardown, then you will have to
rework more than half of the arch handling of page table pages, since they
all rely on pages being zero on return.
> It is not pie-in-the-sky to ask "is this code still useful?".
Yes, it is, if it's a funky idea without code or any data to support a major
change in the way we handle page table pages. And this falls into that
category.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V3
2007-03-20 3:12 ` Paul Mackerras
@ 2007-03-20 19:43 ` Christoph Lameter
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-20 19:43 UTC (permalink / raw)
To: Paul Mackerras; +Cc: akpm, linux-mm, linux-kernel
On Tue, 20 Mar 2007, Paul Mackerras wrote:
> Christoph Lameter writes:
>
> > +static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
> > +{
>
> ...
>
> > + p = (void *)__get_free_page(flags | __GFP_ZERO);
>
> This will cause problems on 64-bit powerpc, at least with 4k pages,
> since the pmd and pgd levels only use 1/4 of a page.
Quicklists are only useful for page-sized allocations. If you have smaller
sizes, then by all means continue the use of slab. You do not have a page
struct for each pmd/pgd to do bad things with (as i386 does), so everything
is just fine.
* [QUICKLIST 3/5] Quicklist support for i386
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
@ 2007-03-23 6:28 ` Christoph Lameter
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:28 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
i386: Convert to quicklists
Implement the i386 management of pgd and pmds using quicklists.
The i386 management of page table pages currently uses page-sized slabs.
Getting rid of that using quicklists allows full use of the page flags
and the page->lru. So get rid of the improvised linked lists using
page->index and page->private.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/init.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/init.c 2007-03-20 14:21:52.000000000 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
EXPORT_SYMBOL_GPL(remove_memory);
#endif
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
- if (PTRS_PER_PMD > 1) {
- pmd_cache = kmem_cache_create("pmd",
- PTRS_PER_PMD*sizeof(pmd_t),
- PTRS_PER_PMD*sizeof(pmd_t),
- 0,
- pmd_ctor,
- NULL);
- if (!pmd_cache)
- panic("pgtable_cache_init(): cannot create pmd cache");
- }
- pgd_cache = kmem_cache_create("pgd",
- PTRS_PER_PGD*sizeof(pgd_t),
- PTRS_PER_PGD*sizeof(pgd_t),
- 0,
- pgd_ctor,
- PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
- if (!pgd_cache)
- panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
/*
* This function cannot be __init, since exceptions don't work in that
* section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/pgtable.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/pgtable.c 2007-03-20 14:55:47.000000000 -0700
@@ -13,6 +13,7 @@
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/module.h>
+#include <linux/quicklist.h>
#include <asm/system.h>
#include <asm/pgtable.h>
@@ -198,11 +199,6 @@ struct page *pte_alloc_one(struct mm_str
return pte;
}
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
- memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
/*
* List of all pgd's needed for non-PAE so it can invalidate entries
* in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +207,18 @@ void pmd_ctor(void *pmd, struct kmem_cac
* against pageattr.c; it is the unique case in which a valid change
* of kernel pagetables can't be lazily synchronized by vmalloc faults.
* vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
* -- wli
*/
DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
- struct page *page = virt_to_page(pgd);
- page->index = (unsigned long)pgd_list;
- if (pgd_list)
- set_page_private(pgd_list, (unsigned long)&page->index);
- pgd_list = page;
- set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
-static inline void pgd_list_del(pgd_t *pgd)
-{
- struct page *next, **pprev, *page = virt_to_page(pgd);
- next = (struct page *)page->index;
- pprev = (struct page **)page_private(page);
- *pprev = next;
- if (next)
- set_page_private(next, (unsigned long)pprev);
-}
+#define QUICK_PGD 0
+#define QUICK_PMD 1
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
{
unsigned long flags;
+ struct page *page = virt_to_page(pgd);
if (PTRS_PER_PMD == 1) {
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -259,31 +237,32 @@ void pgd_ctor(void *pgd, struct kmem_cac
__pa(swapper_pg_dir) >> PAGE_SHIFT,
USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
- pgd_list_add(pgd);
+ list_add(&page->lru, &pgd_list);
spin_unlock_irqrestore(&pgd_lock, flags);
}
/* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
{
unsigned long flags; /* can be called from interrupt context */
+ struct page *page = virt_to_page(pgd);
paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
spin_lock_irqsave(&pgd_lock, flags);
- pgd_list_del(pgd);
+ list_del(&page->lru);
spin_unlock_irqrestore(&pgd_lock, flags);
}
pgd_t *pgd_alloc(struct mm_struct *mm)
{
int i;
- pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+ pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
if (PTRS_PER_PMD == 1 || !pgd)
return pgd;
for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+ pmd_t *pmd = quicklist_alloc(QUICK_PMD, GFP_KERNEL, NULL);
if (!pmd)
goto out_oom;
paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +275,9 @@ out_oom:
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
return NULL;
}
@@ -312,8 +291,14 @@ void pgd_free(pgd_t *pgd)
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
/* in the non-PAE case, free_pgtables() clears user pgd entries */
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
+}
+
+void check_pgt_cache(void)
+{
+ quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+ quicklist_trim(QUICK_PMD, NULL, 25, 16);
}
Index: linux-2.6.21-rc4-mm1/arch/i386/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/Kconfig 2007-03-20 14:20:27.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/Kconfig 2007-03-20 14:21:52.000000000 -0700
@@ -55,6 +55,14 @@ config ZONE_DMA
bool
default y
+config QUICKLIST
+ bool
+ default y
+
+config NR_QUICK
+ int
+ default 2
+
config SBUS
bool
Index: linux-2.6.21-rc4-mm1/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-i386/pgtable.h 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-i386/pgtable.h 2007-03-20 14:21:52.000000000 -0700
@@ -35,15 +35,12 @@ struct vm_area_struct;
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
extern unsigned long empty_zero_page[1024];
extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
-extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
+void check_pgt_cache(void);
+
+extern spinlock_t pgd_lock;
+extern struct list_head pgd_list;
+static inline void pgtable_cache_init(void) {};
void paging_init(void);
/*
Index: linux-2.6.21-rc4-mm1/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/kernel/smp.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/kernel/smp.c 2007-03-20 14:21:52.000000000 -0700
@@ -437,7 +437,7 @@ void flush_tlb_mm (struct mm_struct * mm
}
if (!cpus_empty(cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+ check_pgt_cache();
preempt_enable();
}
Index: linux-2.6.21-rc4-mm1/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/kernel/process.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/kernel/process.c 2007-03-20 14:21:52.000000000 -0700
@@ -181,6 +181,7 @@ void cpu_idle(void)
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
+ check_pgt_cache();
rmb();
idle = pm_idle;
Index: linux-2.6.21-rc4-mm1/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-i386/pgalloc.h 2007-03-20 14:21:00.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-i386/pgalloc.h 2007-03-20 14:21:52.000000000 -0700
@@ -65,6 +65,6 @@ do { \
#define pud_populate(mm, pmd, pte) BUG()
#endif
-#define check_pgt_cache() do { } while (0)
+extern void check_pgt_cache(void);
#endif /* _I386_PGALLOC_H */
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/fault.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/fault.c 2007-03-20 14:21:52.000000000 -0700
@@ -623,11 +623,10 @@ void vmalloc_sync_all(void)
struct page *page;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page =
- (struct page *)page->index)
+ list_for_each_entry(page, &pgd_list, lru)
if (!vmalloc_sync_one(page_address(page),
address)) {
- BUG_ON(page != pgd_list);
+ BUG();
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/pageattr.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/pageattr.c 2007-03-20 14:21:52.000000000 -0700
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
return;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page = (struct page *)page->index) {
+ list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;