LKML Archive on lore.kernel.org
* [QUICKLIST 1/5] Quicklists for page table pages V4
@ 2007-03-23 6:28 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
` (4 more replies)
0 siblings, 5 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:28 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Quicklists for page table pages V4
V3->V4
- Rename quicklist_check to quicklist_trim and allow parameters
to specify how to clean quicklists.
- Remove dead code
V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly
and defaulting to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
http://marc.info/?l=linux-kernel&m=117391339914767&w=2
V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
http://marc.info/?l=linux-kernel&m=117357922219342&w=2
This patchset introduces an arch independent framework to handle lists
of recently used page table pages to replace the existing (ab)use of the
slab for that purpose.
1. Proven code from the IA64 arch.
The method used here has been fine tuned for years and
is NUMA aware. It is based on the knowledge that accesses
to page table pages are sparse in nature. Taking a page
off the freelists instead of allocating a zeroed page
reduces the number of cachelines touched
in addition to getting rid of the slab overhead. So
performance improves. This is particularly useful if pgds
contain standard mappings. We can save on the teardown
and setup of such a page if we have some on the quicklists.
This includes avoiding list operations that are otherwise
necessary on alloc and free to track pgds.
2. Lightweight alternative to using the slab to manage page sized pages
Slab overhead is significant and even page allocator use
is pretty heavy weight. The use of a per cpu quicklist
means that we touch only two cachelines for an allocation.
There is no need to access the page_struct (unless arch code
needs to fiddle around with it). So the fast path just
means bringing in one cacheline at the beginning of the
page. That same cacheline may then be used to store the
page table entry. Or a second cacheline may be used
if the page table entry is not in the first cacheline of
the page. The current code will zero the page, which means
touching 32 cachelines (assuming 128 byte cachelines). We get down
from 32 to 2 cachelines in the fast path. (A small user-space model
of this fast path follows after this list.)
3. Fix conflicting use of page_structs by slab and arch code.
F.e. both arches use the ->private and ->index fields to
create lists of pgds, and i386 also uses other page flags. The slab
can also use the ->private field for allocations that
are larger than page size which would occur if one enables
debugging. In that case the arch code would overwrite the
pointer to the first page of the compound page allocated
by the slab. SLAB has been modified to not enable
debugging for such slabs (!).
There is the potential for additional conflicts
here, especially since some arches also use page flags to mark
page table pages.
The patch removes these conflicts by no longer using
the slab for these purposes. The page allocator is more
suitable since PAGE_SIZE chunks are its domain.
Then we can start using standard list operations via
page->lru instead of improvising linked lists.
SLUB makes more extensive use of the page struct and so
far had to create workarounds for these slabs. The ->index
field is used for the SLUB freelist. So SLUB cannot allow
the use of a freelist for these slabs and--like slab--
currently does not allow debugging and forces slabs to
only contain a single object (avoids freelist).
If we do not get rid of these issues then both SLAB and SLUB
have to continue to provide special code paths to support these
slabs.
4. i386 gets lightweight NUMA aware management of page table pages.
Note that the use of SLAB on NUMA systems will require the
use of alien caches to efficiently remove remote page
table pages, which (for a PAGE_SIZEd allocation) is a lengthy
and expensive process. With quicklists no alien caches are
needed. Pages can be simply returned to the correct node.
5. x86_64 gets lightweight page table page management.
This will allow the x86_64 arch code to repopulate pgds
and other page table entries faster. The list operations for pgds
are reduced in the same way as for i386 to the point where
a pgd is allocated from the page allocator and when it is
freed back to the page allocator. A pgd can pass through
the quicklists without having to be reinitialized.
6. Consolidation of code from multiple arches
So far arches have their own implementation of quicklist
management. This patch moves that feature into the core, allowing
easier maintenance and consistent management of quicklists.
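To make the fast path in point 2 concrete, here is a small user-space model
of the freelist trick used here (purely illustrative, not part of the
patchset; the real quicklists are per cpu and NUMA aware, as the code
further down shows):

#include <stdlib.h>

/*
 * Toy model of a quicklist: the first word of each free page is used as
 * the link to the next free page, so taking a page off the list or
 * putting it back touches only the list head and the first cacheline
 * of the page itself.
 */
struct toy_quicklist {
	void *page;		/* head of the singly linked freelist */
	int nr_pages;
};

static void toy_free(struct toy_quicklist *q, void *page)
{
	void **p = page;

	p[0] = q->page;		/* link is stored in the page itself */
	q->page = p;
	q->nr_pages++;
}

static void *toy_alloc(struct toy_quicklist *q, size_t page_size)
{
	void **p = q->page;

	if (p) {		/* fast path: reuse a parked page */
		q->page = p[0];
		p[0] = NULL;
		q->nr_pages--;
		return p;
	}
	/* slow path: fall back to a zeroed allocation */
	return calloc(1, page_size);
}

int main(void)
{
	struct toy_quicklist q = { NULL, 0 };
	void *page = toy_alloc(&q, 4096);	/* slow path: calloc */

	toy_free(&q, page);			/* park the page */
	page = toy_alloc(&q, 4096);		/* fast path: reuse it */
	free(page);
	return 0;
}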
Page table pages have the characteristics that they are typically zero
or in a known state when they are freed. This is usually exactly the
same state as needed after allocation. So it makes sense to build a list
of freed page table pages and then reuse those pages
first. Those pages have already been initialized correctly (thus no
need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively. Page table pages are used in
a sparse way so zeroing them on allocation is not too useful.
Such an implementation already exists for ia64. However, that implementation
did not support constructors and destructors as needed by i386 / x86_64.
It also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.
Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more
than one quicklist is necessary then we can define NR_QUICK for additional
lists. F.e. i386 needs two and thus has
config NR_QUICK
int
default 2
If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:
quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)
Page table pages can be freed using:
quicklist_free(<quicklist-nr>, <destructor>, <page>)
Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.
If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.
Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
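For illustration, a minimal sketch of how an arch might wire these hooks up
(this is not part of the patch; the example_* names are made up, QUICK_PGD /
QUICK_PMD and the trim parameters 25 / 16 follow the i386 patch later in
this series, and pgd_ctor / pgd_dtor stand for the arch's constructor and
destructor):

#include <linux/mm.h>
#include <linux/quicklist.h>

#define QUICK_PGD 0	/* pgds keep their kernel mappings while cached */
#define QUICK_PMD 1	/* pmds are simply zero while cached */

/* Provided by the arch; see the i386/x86_64 patches below */
extern void pgd_ctor(void *pgd);
extern void pgd_dtor(void *pgd);

static inline pgd_t *example_pgd_alloc(struct mm_struct *mm)
{
	/*
	 * The ctor runs only for pages taken fresh from the page
	 * allocator; pages coming off the quicklist are already in
	 * ctor state.
	 */
	return quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
}

static inline void example_pgd_free(pgd_t *pgd)
{
	/*
	 * The dtor runs only when the page is actually handed back to
	 * the page allocator, not when it is parked on the quicklist.
	 */
	quicklist_free(QUICK_PGD, pgd_dtor, pgd);
}

static inline void example_check_pgt_cache(void)
{
	/* keep at least 25 pages cached, free at most 16 per pass */
	quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
	quicklist_trim(QUICK_PMD, NULL, 25, 16);
}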
Tested on:
i386 UP / SMP, x86_64 UP, NUMA emulation, IA64 NUMA.
Index: linux-2.6.21-rc4-mm1/include/linux/quicklist.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc4-mm1/include/linux/quicklist.h 2007-03-20 15:03:05.000000000 -0700
@@ -0,0 +1,84 @@
+#ifndef LINUX_QUICKLIST_H
+#define LINUX_QUICKLIST_H
+/*
+ * Fast allocations and disposal of pages. Pages must be in the condition
+ * as needed after allocation when they are freed. Per cpu lists of pages
+ * are kept that only contain node local pages.
+ *
+ * (C) 2007, SGI. Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/percpu.h>
+
+#ifdef CONFIG_QUICKLIST
+
+struct quicklist {
+ void *page;
+ int nr_pages;
+};
+
+DECLARE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+/*
+ * The two key functions quicklist_alloc and quicklist_free are inline so
+ * that they may be custom compiled for the platform.
+ * Specifying a NULL ctor can remove constructor support. Specifying
+ * a constant quicklist allows the determination of the exact address
+ * in the per cpu area.
+ *
+ * The fast path in quicklist_alloc touches only a per cpu cacheline and
+ * the first cacheline of the page itself. There is minimal overhead involved.
+ */
+static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
+{
+ struct quicklist *q;
+ void **p = NULL;
+
+ q =&get_cpu_var(quicklist)[nr];
+ p = q->page;
+ if (likely(p)) {
+ q->page = p[0];
+ p[0] = NULL;
+ q->nr_pages--;
+ }
+ put_cpu_var(quicklist);
+ if (likely(p))
+ return p;
+
+ p = (void *)__get_free_page(flags | __GFP_ZERO);
+ if (ctor && p)
+ ctor(p);
+ return p;
+}
+
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+{
+ struct quicklist *q;
+ void **p = pp;
+ struct page *page = virt_to_page(p);
+ int nid = page_to_nid(page);
+
+ if (unlikely(nid != numa_node_id())) {
+ if (dtor)
+ dtor(p);
+ free_page((unsigned long)p);
+ return;
+ }
+
+ q = &get_cpu_var(quicklist)[nr];
+ p[0] = q->page;
+ q->page = p;
+ q->nr_pages++;
+ put_cpu_var(quicklist);
+}
+
+void quicklist_trim(int nr, void (*dtor)(void *),
+ unsigned long min_pages, unsigned long max_free);
+
+unsigned long quicklist_total_size(void);
+
+#endif
+
+#endif /* LINUX_QUICKLIST_H */
+
Index: linux-2.6.21-rc4-mm1/mm/Makefile
===================================================================
--- linux-2.6.21-rc4-mm1.orig/mm/Makefile 2007-03-20 15:02:58.000000000 -0700
+++ linux-2.6.21-rc4-mm1/mm/Makefile 2007-03-20 15:59:50.000000000 -0700
@@ -30,3 +30,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_QUICKLIST) += quicklist.o
+
Index: linux-2.6.21-rc4-mm1/mm/quicklist.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.21-rc4-mm1/mm/quicklist.c 2007-03-20 15:03:05.000000000 -0700
@@ -0,0 +1,88 @@
+/*
+ * Quicklist support.
+ *
+ * Quicklists are light weight lists of pages that have a defined state
+ * on alloc and free. Pages must be in the quicklist specific defined state
+ * (zero by default) when the page is freed. It seems that the initial idea
+ * for such lists first came from Dave Miller and then various other people
+ * improved on it.
+ *
+ * Copyright (C) 2007 SGI,
+ * Christoph Lameter <clameter@sgi.com>
+ * Generalized, added support for multiple lists and
+ * constructors / destructors.
+ */
+#include <linux/kernel.h>
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/quicklist.h>
+
+DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
+
+#define FRACTION_OF_NODE_MEM 16
+
+static unsigned long max_pages(unsigned long min_pages)
+{
+ unsigned long node_free_pages, max;
+
+ node_free_pages = node_page_state(numa_node_id(),
+ NR_FREE_PAGES);
+ max = node_free_pages / FRACTION_OF_NODE_MEM;
+ return max(max, min_pages);
+}
+
+static long min_pages_to_free(struct quicklist *q,
+ unsigned long min_pages, long max_free)
+{
+ long pages_to_free;
+
+ pages_to_free = q->nr_pages - max_pages(min_pages);
+
+ return min(pages_to_free, max_free);
+}
+
+/*
+ * Trim down the number of pages in the quicklist
+ */
+void quicklist_trim(int nr, void (*dtor)(void *),
+ unsigned long min_pages, unsigned long max_free)
+{
+ long pages_to_free;
+ struct quicklist *q;
+
+ q = &get_cpu_var(quicklist)[nr];
+ if (q->nr_pages > min_pages) {
+ pages_to_free = min_pages_to_free(q, min_pages, max_free);
+
+ while (pages_to_free > 0) {
+ /*
+ * We pass a gfp_t of 0 to quicklist_alloc here
+ * because we will never call into the page allocator.
+ */
+ void *p = quicklist_alloc(nr, 0, NULL);
+
+ if (dtor)
+ dtor(p);
+ free_page((unsigned long)p);
+ pages_to_free--;
+ }
+ }
+ put_cpu_var(quicklist);
+}
+
+unsigned long quicklist_total_size(void)
+{
+ unsigned long count = 0;
+ int cpu;
+ struct quicklist *ql, *q;
+
+ for_each_online_cpu(cpu) {
+ ql = per_cpu(quicklist, cpu);
+ for (q = ql; q < ql + CONFIG_NR_QUICK; q++)
+ count += q->nr_pages;
+ }
+ return count;
+}
+
Index: linux-2.6.21-rc4-mm1/mm/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/mm/Kconfig 2007-03-20 15:03:04.000000000 -0700
+++ linux-2.6.21-rc4-mm1/mm/Kconfig 2007-03-20 16:00:22.000000000 -0700
@@ -220,3 +220,8 @@ config DEBUG_READAHEAD
Say N for production servers.
+config NR_QUICK
+ int
+ depends on QUICKLIST
+ default "1"
+
* [QUICKLIST 2/5] Quicklist support for IA64
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
@ 2007-03-23 6:28 ` Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
` (3 subsequent siblings)
4 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:28 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Quicklist for IA64
IA64 is the origin of the quicklist implementation. So cut out the pieces
that are now in core code and modify the functions called.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc4-mm1/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/ia64/mm/init.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/ia64/mm/init.c 2007-03-20 14:21:47.000000000 -0700
@@ -39,9 +39,6 @@
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist);
-DEFINE_PER_CPU(long, __pgtable_quicklist_size);
-
extern void ia64_tlb_init (void);
unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
@@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map);
struct page *zero_page_memmap_ptr; /* map entry for zero page */
EXPORT_SYMBOL(zero_page_memmap_ptr);
-#define MIN_PGT_PAGES 25UL
-#define MAX_PGT_FREES_PER_PASS 16L
-#define PGT_FRACTION_OF_NODE_MEM 16
-
-static inline long
-max_pgt_pages(void)
-{
- u64 node_free_pages, max_pgt_pages;
-
-#ifndef CONFIG_NUMA
- node_free_pages = nr_free_pages();
-#else
- node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
-#endif
- max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
- max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
- return max_pgt_pages;
-}
-
-static inline long
-min_pages_to_free(void)
-{
- long pages_to_free;
-
- pages_to_free = pgtable_quicklist_size - max_pgt_pages();
- pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS);
- return pages_to_free;
-}
-
-void
-check_pgt_cache(void)
-{
- long pages_to_free;
-
- if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES))
- return;
-
- preempt_disable();
- while (unlikely((pages_to_free = min_pages_to_free()) > 0)) {
- while (pages_to_free--) {
- free_page((unsigned long)pgtable_quicklist_alloc());
- }
- preempt_enable();
- preempt_disable();
- }
- preempt_enable();
-}
-
void
lazy_mmu_prot_update (pte_t pte)
{
Index: linux-2.6.21-rc4-mm1/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-ia64/pgalloc.h 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-ia64/pgalloc.h 2007-03-20 14:55:47.000000000 -0700
@@ -18,71 +18,18 @@
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/threads.h>
+#include <linux/quicklist.h>
#include <asm/mmu_context.h>
-DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
-#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist)
-DECLARE_PER_CPU(long, __pgtable_quicklist_size);
-#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size)
-
-static inline long pgtable_quicklist_total_size(void)
-{
- long ql_size = 0;
- int cpuid;
-
- for_each_online_cpu(cpuid) {
- ql_size += per_cpu(__pgtable_quicklist_size, cpuid);
- }
- return ql_size;
-}
-
-static inline void *pgtable_quicklist_alloc(void)
-{
- unsigned long *ret = NULL;
-
- preempt_disable();
-
- ret = pgtable_quicklist;
- if (likely(ret != NULL)) {
- pgtable_quicklist = (unsigned long *)(*ret);
- ret[0] = 0;
- --pgtable_quicklist_size;
- preempt_enable();
- } else {
- preempt_enable();
- ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
- }
-
- return ret;
-}
-
-static inline void pgtable_quicklist_free(void *pgtable_entry)
-{
-#ifdef CONFIG_NUMA
- int nid = page_to_nid(virt_to_page(pgtable_entry));
-
- if (unlikely(nid != numa_node_id())) {
- free_page((unsigned long)pgtable_entry);
- return;
- }
-#endif
-
- preempt_disable();
- *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
- pgtable_quicklist = (unsigned long *)pgtable_entry;
- ++pgtable_quicklist_size;
- preempt_enable();
-}
-
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pgd_free(pgd_t * pgd)
{
- pgtable_quicklist_free(pgd);
+ quicklist_free(0, NULL, pgd);
}
#ifdef CONFIG_PGTABLE_4
@@ -94,12 +41,12 @@ pgd_populate(struct mm_struct *mm, pgd_t
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pud_free(pud_t * pud)
{
- pgtable_quicklist_free(pud);
+ quicklist_free(0, NULL, pud);
}
#define __pud_free_tlb(tlb, pud) pud_free(pud)
#endif /* CONFIG_PGTABLE_4 */
@@ -112,12 +59,12 @@ pud_populate(struct mm_struct *mm, pud_t
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pmd_free(pmd_t * pmd)
{
- pgtable_quicklist_free(pmd);
+ quicklist_free(0, NULL, pmd);
}
#define __pmd_free_tlb(tlb, pmd) pmd_free(pmd)
@@ -137,28 +84,31 @@ pmd_populate_kernel(struct mm_struct *mm
static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long addr)
{
- void *pg = pgtable_quicklist_alloc();
+ void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
return pg ? virt_to_page(pg) : NULL;
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long addr)
{
- return pgtable_quicklist_alloc();
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pte_free(struct page *pte)
{
- pgtable_quicklist_free(page_address(pte));
+ quicklist_free(0, NULL, page_address(pte));
}
static inline void pte_free_kernel(pte_t * pte)
{
- pgtable_quicklist_free(pte);
+ quicklist_free(0, NULL, pte);
}
-#define __pte_free_tlb(tlb, pte) pte_free(pte)
+static inline void check_pgt_cache(void)
+{
+ quicklist_trim(0, NULL, 25, 16);
+}
-extern void check_pgt_cache(void);
+#define __pte_free_tlb(tlb, pte) pte_free(pte)
#endif /* _ASM_IA64_PGALLOC_H */
Index: linux-2.6.21-rc4-mm1/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/ia64/mm/contig.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/ia64/mm/contig.c 2007-03-20 14:21:47.000000000 -0700
@@ -88,7 +88,7 @@ void show_mem(void)
printk(KERN_INFO "%d pages shared\n", total_shared);
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
- pgtable_quicklist_total_size());
+ quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
}
Index: linux-2.6.21-rc4-mm1/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/ia64/mm/discontig.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/ia64/mm/discontig.c 2007-03-20 14:21:47.000000000 -0700
@@ -563,7 +563,7 @@ void show_mem(void)
printk(KERN_INFO "%d pages shared\n", total_shared);
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
- pgtable_quicklist_total_size());
+ quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
}
Index: linux-2.6.21-rc4-mm1/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/ia64/Kconfig 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/ia64/Kconfig 2007-03-20 14:21:47.000000000 -0700
@@ -29,6 +29,10 @@ config ZONE_DMA
def_bool y
depends on !IA64_SGI_SN2
+config QUICKLIST
+ bool
+ default y
+
config MMU
bool
default y
* [QUICKLIST 3/5] Quicklist support for i386
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
@ 2007-03-23 6:28 ` Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
` (2 subsequent siblings)
4 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:28 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
i386: Convert to quicklists
Implement the i386 management of pgds and pmds using quicklists.
The i386 management of page table pages currently uses page sized slabs.
Getting rid of that using quicklists allows full use of the page flags
and the page->lru. So get rid of the improvised linked lists using
page->index and page->private.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/init.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/init.c 2007-03-20 14:21:52.000000000 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
EXPORT_SYMBOL_GPL(remove_memory);
#endif
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
- if (PTRS_PER_PMD > 1) {
- pmd_cache = kmem_cache_create("pmd",
- PTRS_PER_PMD*sizeof(pmd_t),
- PTRS_PER_PMD*sizeof(pmd_t),
- 0,
- pmd_ctor,
- NULL);
- if (!pmd_cache)
- panic("pgtable_cache_init(): cannot create pmd cache");
- }
- pgd_cache = kmem_cache_create("pgd",
- PTRS_PER_PGD*sizeof(pgd_t),
- PTRS_PER_PGD*sizeof(pgd_t),
- 0,
- pgd_ctor,
- PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
- if (!pgd_cache)
- panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
/*
* This function cannot be __init, since exceptions don't work in that
* section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/pgtable.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/pgtable.c 2007-03-20 14:55:47.000000000 -0700
@@ -13,6 +13,7 @@
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/module.h>
+#include <linux/quicklist.h>
#include <asm/system.h>
#include <asm/pgtable.h>
@@ -198,11 +199,6 @@ struct page *pte_alloc_one(struct mm_str
return pte;
}
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
- memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
/*
* List of all pgd's needed for non-PAE so it can invalidate entries
* in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +207,18 @@ void pmd_ctor(void *pmd, struct kmem_cac
* against pageattr.c; it is the unique case in which a valid change
* of kernel pagetables can't be lazily synchronized by vmalloc faults.
* vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
* -- wli
*/
DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
- struct page *page = virt_to_page(pgd);
- page->index = (unsigned long)pgd_list;
- if (pgd_list)
- set_page_private(pgd_list, (unsigned long)&page->index);
- pgd_list = page;
- set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
-static inline void pgd_list_del(pgd_t *pgd)
-{
- struct page *next, **pprev, *page = virt_to_page(pgd);
- next = (struct page *)page->index;
- pprev = (struct page **)page_private(page);
- *pprev = next;
- if (next)
- set_page_private(next, (unsigned long)pprev);
-}
+#define QUICK_PGD 0
+#define QUICK_PMD 1
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_ctor(void *pgd)
{
unsigned long flags;
+ struct page *page = virt_to_page(pgd);
if (PTRS_PER_PMD == 1) {
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
@@ -259,31 +237,32 @@ void pgd_ctor(void *pgd, struct kmem_cac
__pa(swapper_pg_dir) >> PAGE_SHIFT,
USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
- pgd_list_add(pgd);
+ list_add(&page->lru, &pgd_list);
spin_unlock_irqrestore(&pgd_lock, flags);
}
/* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
+void pgd_dtor(void *pgd)
{
unsigned long flags; /* can be called from interrupt context */
+ struct page *page = virt_to_page(pgd);
paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
spin_lock_irqsave(&pgd_lock, flags);
- pgd_list_del(pgd);
+ list_del(&page->lru);
spin_unlock_irqrestore(&pgd_lock, flags);
}
pgd_t *pgd_alloc(struct mm_struct *mm)
{
int i;
- pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+ pgd_t *pgd = quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
if (PTRS_PER_PMD == 1 || !pgd)
return pgd;
for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+ pmd_t *pmd = quicklist_alloc(QUICK_PMD, GFP_KERNEL, NULL);
if (!pmd)
goto out_oom;
paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +275,9 @@ out_oom:
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
return NULL;
}
@@ -312,8 +291,14 @@ void pgd_free(pgd_t *pgd)
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ quicklist_free(QUICK_PMD, NULL, pmd);
}
/* in the non-PAE case, free_pgtables() clears user pgd entries */
- kmem_cache_free(pgd_cache, pgd);
+ quicklist_free(QUICK_PGD, pgd_ctor, pgd);
+}
+
+void check_pgt_cache(void)
+{
+ quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+ quicklist_trim(QUICK_PMD, NULL, 25, 16);
}
Index: linux-2.6.21-rc4-mm1/arch/i386/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/Kconfig 2007-03-20 14:20:27.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/Kconfig 2007-03-20 14:21:52.000000000 -0700
@@ -55,6 +55,14 @@ config ZONE_DMA
bool
default y
+config QUICKLIST
+ bool
+ default y
+
+config NR_QUICK
+ int
+ default 2
+
config SBUS
bool
Index: linux-2.6.21-rc4-mm1/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-i386/pgtable.h 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-i386/pgtable.h 2007-03-20 14:21:52.000000000 -0700
@@ -35,15 +35,12 @@ struct vm_area_struct;
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
extern unsigned long empty_zero_page[1024];
extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
-extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
+void check_pgt_cache(void);
+
+extern spinlock_t pgd_lock;
+extern struct list_head pgd_list;
+static inline void pgtable_cache_init(void) {};
void paging_init(void);
/*
Index: linux-2.6.21-rc4-mm1/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/kernel/smp.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/kernel/smp.c 2007-03-20 14:21:52.000000000 -0700
@@ -437,7 +437,7 @@ void flush_tlb_mm (struct mm_struct * mm
}
if (!cpus_empty(cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+ check_pgt_cache();
preempt_enable();
}
Index: linux-2.6.21-rc4-mm1/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/kernel/process.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/kernel/process.c 2007-03-20 14:21:52.000000000 -0700
@@ -181,6 +181,7 @@ void cpu_idle(void)
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
+ check_pgt_cache();
rmb();
idle = pm_idle;
Index: linux-2.6.21-rc4-mm1/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-i386/pgalloc.h 2007-03-20 14:21:00.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-i386/pgalloc.h 2007-03-20 14:21:52.000000000 -0700
@@ -65,6 +65,6 @@ do { \
#define pud_populate(mm, pmd, pte) BUG()
#endif
-#define check_pgt_cache() do { } while (0)
+extern void check_pgt_cache(void);
#endif /* _I386_PGALLOC_H */
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/fault.c 2007-03-20 14:20:28.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/fault.c 2007-03-20 14:21:52.000000000 -0700
@@ -623,11 +623,10 @@ void vmalloc_sync_all(void)
struct page *page;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page =
- (struct page *)page->index)
+ list_for_each_entry(page, &pgd_list, lru)
if (!vmalloc_sync_one(page_address(page),
address)) {
- BUG_ON(page != pgd_list);
+ BUG();
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
Index: linux-2.6.21-rc4-mm1/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/i386/mm/pageattr.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/i386/mm/pageattr.c 2007-03-20 14:21:52.000000000 -0700
@@ -95,7 +95,7 @@ static void set_pmd_pte(pte_t *kpte, uns
return;
spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page = (struct page *)page->index) {
+ list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
* [QUICKLIST 4/5] Quicklist support for x86_64
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
@ 2007-03-23 6:28 ` Christoph Lameter
2007-03-23 6:29 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
2007-03-23 6:39 ` [QUICKLIST 1/5] Quicklists for page table pages V4 Andrew Morton
4 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:28 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
Convert x86_64 to using quicklists
This adds caching of pgds, puds, pmds and ptes. That way we can
avoid costly zeroing and initialization of special mappings in the
pgd.
A second quicklist is used to separate out PGD handling. Thus we can carry
the initialized pgds of terminating processes over to the next process
needing them.
Also clean up the pgd_list handling to use regular list macros. Not using
the slab allocator frees up the lru field, which makes this possible.
Adding pgds to and removing them from the pgd list is moved into the
constructor / destructor. We can then avoid taking pgds off the list while
they are still in the quicklists, reducing pgd creation and allocation
overhead further.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc4-mm1/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/Kconfig 2007-03-20 14:20:34.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/Kconfig 2007-03-20 14:21:57.000000000 -0700
@@ -56,6 +56,14 @@ config ZONE_DMA
bool
default y
+config QUICKLIST
+ bool
+ default y
+
+config NR_QUICK
+ int
+ default 2
+
config ISA
bool
Index: linux-2.6.21-rc4-mm1/include/asm-x86_64/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-x86_64/pgalloc.h 2007-03-20 14:21:06.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-x86_64/pgalloc.h 2007-03-20 14:55:47.000000000 -0700
@@ -4,6 +4,10 @@
#include <asm/pda.h>
#include <linux/threads.h>
#include <linux/mm.h>
+#include <linux/quicklist.h>
+
+#define QUICK_PGD 0 /* We preserve special mappings over free */
+#define QUICK_PT 1 /* Other page table pages that are zero on free */
#define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
@@ -20,86 +24,77 @@ static inline void pmd_populate(struct m
static inline void pmd_free(pmd_t *pmd)
{
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
- free_page((unsigned long)pmd);
+ quicklist_free(QUICK_PT, NULL, pmd);
}
static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
{
- return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline void pud_free (pud_t *pud)
{
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
- free_page((unsigned long)pud);
+ quicklist_free(QUICK_PT, NULL, pud);
}
-static inline void pgd_list_add(pgd_t *pgd)
+static inline void pgd_ctor(void *x)
{
+ unsigned boundary;
+ pgd_t *pgd = x;
struct page *page = virt_to_page(pgd);
+ /*
+ * Copy kernel pointers in from init.
+ */
+ boundary = pgd_index(__PAGE_OFFSET);
+ memcpy(pgd + boundary,
+ init_level4_pgt + boundary,
+ (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+
spin_lock(&pgd_lock);
- page->index = (pgoff_t)pgd_list;
- if (pgd_list)
- pgd_list->private = (unsigned long)&page->index;
- pgd_list = page;
- page->private = (unsigned long)&pgd_list;
+ list_add(&page->lru, &pgd_list);
spin_unlock(&pgd_lock);
}
-static inline void pgd_list_del(pgd_t *pgd)
+static inline void pgd_dtor(void *x)
{
- struct page *next, **pprev, *page = virt_to_page(pgd);
+ pgd_t *pgd = x;
+ struct page *page = virt_to_page(pgd);
spin_lock(&pgd_lock);
- next = (struct page *)page->index;
- pprev = (struct page **)page->private;
- *pprev = next;
- if (next)
- next->private = (unsigned long)pprev;
+ list_del(&page->lru);
spin_unlock(&pgd_lock);
}
+
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- unsigned boundary;
- pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
- if (!pgd)
- return NULL;
- pgd_list_add(pgd);
- /*
- * Copy kernel pointers in from init.
- * Could keep a freelist or slab cache of those because the kernel
- * part never changes.
- */
- boundary = pgd_index(__PAGE_OFFSET);
- memset(pgd, 0, boundary * sizeof(pgd_t));
- memcpy(pgd + boundary,
- init_level4_pgt + boundary,
- (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+ pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
+ GFP_KERNEL|__GFP_REPEAT, pgd_ctor);
+
return pgd;
}
static inline void pgd_free(pgd_t *pgd)
{
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
- pgd_list_del(pgd);
- free_page((unsigned long)pgd);
+ quicklist_free(QUICK_PGD, pgd_dtor, pgd);
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
}
static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
- void *p = (void *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
if (!p)
return NULL;
return virt_to_page(p);
@@ -111,17 +106,22 @@ static inline struct page *pte_alloc_one
static inline void pte_free_kernel(pte_t *pte)
{
BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
- free_page((unsigned long)pte);
+ quicklist_free(QUICK_PT, NULL, pte);
}
static inline void pte_free(struct page *pte)
{
__free_page(pte);
-}
+}
#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
#define __pmd_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
#define __pud_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x))
+static inline void check_pgt_cache(void)
+{
+ quicklist_trim(QUICK_PGD, pgd_dtor, 25, 16);
+ quicklist_trim(QUICK_PT, NULL, 25, 16);
+}
#endif /* _X86_64_PGALLOC_H */
Index: linux-2.6.21-rc4-mm1/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/kernel/process.c 2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/kernel/process.c 2007-03-20 14:21:57.000000000 -0700
@@ -207,6 +207,7 @@ void cpu_idle (void)
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
+ check_pgt_cache();
rmb();
idle = pm_idle;
if (!idle)
Index: linux-2.6.21-rc4-mm1/arch/x86_64/kernel/smp.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/kernel/smp.c 2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/kernel/smp.c 2007-03-20 14:21:57.000000000 -0700
@@ -242,7 +242,7 @@ void flush_tlb_mm (struct mm_struct * mm
}
if (!cpus_empty(cpu_mask))
flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
-
+ check_pgt_cache();
preempt_enable();
}
EXPORT_SYMBOL(flush_tlb_mm);
Index: linux-2.6.21-rc4-mm1/arch/x86_64/mm/fault.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/x86_64/mm/fault.c 2007-03-20 14:20:35.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/x86_64/mm/fault.c 2007-03-20 14:21:57.000000000 -0700
@@ -585,7 +585,7 @@ do_sigbus:
}
DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
void vmalloc_sync_all(void)
{
@@ -605,8 +605,7 @@ void vmalloc_sync_all(void)
if (pgd_none(*pgd_ref))
continue;
spin_lock(&pgd_lock);
- for (page = pgd_list; page;
- page = (struct page *)page->index) {
+ list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pgd = (pgd_t *)page_address(page) + pgd_index(address);
if (pgd_none(*pgd))
Index: linux-2.6.21-rc4-mm1/include/asm-x86_64/pgtable.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-x86_64/pgtable.h 2007-03-20 14:21:06.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-x86_64/pgtable.h 2007-03-20 14:21:57.000000000 -0700
@@ -402,7 +402,7 @@ static inline pte_t pte_modify(pte_t pte
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
void vmalloc_sync_all(void);
#endif /* !__ASSEMBLY__ */
@@ -419,7 +419,6 @@ extern int kern_addr_valid(unsigned long
#define HAVE_ARCH_UNMAPPED_AREA
#define pgtable_cache_init() do { } while (0)
-#define check_pgt_cache() do { } while (0)
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1
* [QUICKLIST 5/5] Quicklist support for sparc64
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
` (2 preceding siblings ...)
2007-03-23 6:28 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
@ 2007-03-23 6:29 ` Christoph Lameter
2007-03-23 6:39 ` [QUICKLIST 1/5] Quicklists for page table pages V4 Andrew Morton
4 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:29 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, Christoph Lameter, linux-kernel
From: David Miller <davem@davemloft.net>
[QUICKLIST]: Add sparc64 quicklist support.
I ported this to sparc64 as per the patch below, tested on
UP SunBlade1500 and 24 cpu Niagara T1000.
Signed-off-by: David S. Miller <davem@davemloft.net>
Index: linux-2.6.21-rc4-mm1/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/sparc64/Kconfig 2007-03-20 14:20:33.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/sparc64/Kconfig 2007-03-20 14:22:03.000000000 -0700
@@ -26,6 +26,10 @@ config MMU
bool
default y
+config QUICKLIST
+ bool
+ default y
+
config STACKTRACE_SUPPORT
bool
default y
Index: linux-2.6.21-rc4-mm1/arch/sparc64/mm/init.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/sparc64/mm/init.c 2007-03-20 14:20:33.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/sparc64/mm/init.c 2007-03-20 14:22:03.000000000 -0700
@@ -178,30 +178,6 @@ unsigned long sparc64_kern_sec_context _
int bigkernel = 0;
-struct kmem_cache *pgtable_cache __read_mostly;
-
-static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags)
-{
- clear_page(addr);
-}
-
-extern void tsb_cache_init(void);
-
-void pgtable_cache_init(void)
-{
- pgtable_cache = kmem_cache_create("pgtable_cache",
- PAGE_SIZE, PAGE_SIZE,
- SLAB_HWCACHE_ALIGN |
- SLAB_MUST_HWCACHE_ALIGN,
- zero_ctor,
- NULL);
- if (!pgtable_cache) {
- prom_printf("Could not create pgtable_cache\n");
- prom_halt();
- }
- tsb_cache_init();
-}
-
#ifdef CONFIG_DEBUG_DCFLUSH
atomic_t dcpage_flushes = ATOMIC_INIT(0);
#ifdef CONFIG_SMP
Index: linux-2.6.21-rc4-mm1/arch/sparc64/mm/tsb.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/arch/sparc64/mm/tsb.c 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/arch/sparc64/mm/tsb.c 2007-03-20 14:22:03.000000000 -0700
@@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] =
"tsb_1MB",
};
-void __init tsb_cache_init(void)
+void __init pgtable_cache_init(void)
{
unsigned long i;
Index: linux-2.6.21-rc4-mm1/include/asm-sparc64/pgalloc.h
===================================================================
--- linux-2.6.21-rc4-mm1.orig/include/asm-sparc64/pgalloc.h 2007-03-15 17:20:01.000000000 -0700
+++ linux-2.6.21-rc4-mm1/include/asm-sparc64/pgalloc.h 2007-03-20 14:55:47.000000000 -0700
@@ -6,6 +6,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/quicklist.h>
#include <asm/spitfire.h>
#include <asm/cpudata.h>
@@ -13,52 +14,50 @@
#include <asm/page.h>
/* Page table allocation/freeing. */
-extern struct kmem_cache *pgtable_cache;
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pgd_free(pgd_t *pgd)
{
- kmem_cache_free(pgtable_cache, pgd);
+ quicklist_free(0, NULL, pgd);
}
#define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD)
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline void pmd_free(pmd_t *pmd)
{
- kmem_cache_free(pgtable_cache, pmd);
+ quicklist_free(0, NULL, pmd);
}
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return quicklist_alloc(0, GFP_KERNEL, NULL);
}
static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
- return virt_to_page(pte_alloc_one_kernel(mm, address));
+ void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
+ return pg ? virt_to_page(pg) : NULL;
}
static inline void pte_free_kernel(pte_t *pte)
{
- kmem_cache_free(pgtable_cache, pte);
+ quicklist_free(0, NULL, pte);
}
static inline void pte_free(struct page *ptepage)
{
- pte_free_kernel(page_address(ptepage));
+ quicklist_free(0, NULL, page_address(ptepage));
}
@@ -66,6 +65,9 @@ static inline void pte_free(struct page
#define pmd_populate(MM,PMD,PTE_PAGE) \
pmd_populate_kernel(MM,PMD,page_address(PTE_PAGE))
-#define check_pgt_cache() do { } while (0)
+static inline void check_pgt_cache(void)
+{
+ quicklist_trim(0, NULL, 25, 16);
+}
#endif /* _SPARC64_PGALLOC_H */
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
` (3 preceding siblings ...)
2007-03-23 6:29 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
@ 2007-03-23 6:39 ` Andrew Morton
2007-03-23 6:52 ` Christoph Lameter
4 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-03-23 6:39 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> 1. Proven code from the IA64 arch.
>
> The method used here has been fine tuned for years and
> is NUMA aware. It is based on the knowledge that accesses
> to page table pages are sparse in nature. Taking a page
> off the freelists instead of allocating a zeroed pages
> allows a reduction of number of cachelines touched
> in addition to getting rid of the slab overhead. So
> performance improves.
By how much?
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 6:39 ` [QUICKLIST 1/5] Quicklists for page table pages V4 Andrew Morton
@ 2007-03-23 6:52 ` Christoph Lameter
2007-03-23 7:48 ` Andrew Morton
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 6:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Thu, 22 Mar 2007, Andrew Morton wrote:
> On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
>
> > 1. Proven code from the IA64 arch.
> >
> > The method used here has been fine tuned for years and
> > is NUMA aware. It is based on the knowledge that accesses
> > to page table pages are sparse in nature. Taking a page
> > off the freelists instead of allocating a zeroed pages
> > allows a reduction of number of cachelines touched
> > in addition to getting rid of the slab overhead. So
> > performance improves.
>
> By how much?
About 40% on fork+exit. See
http://marc.info/?l=linux-ia64&m=110942798406005&w=2
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 6:52 ` Christoph Lameter
@ 2007-03-23 7:48 ` Andrew Morton
2007-03-23 11:23 ` William Lee Irwin III
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Andrew Morton @ 2007-03-23 7:48 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Thu, 22 Mar 2007 23:52:05 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Thu, 22 Mar 2007, Andrew Morton wrote:
>
> > On Thu, 22 Mar 2007 23:28:41 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> >
> > > 1. Proven code from the IA64 arch.
> > >
> > > The method used here has been fine tuned for years and
> > > is NUMA aware. It is based on the knowledge that accesses
> > > to page table pages are sparse in nature. Taking a page
> > > off the freelists instead of allocating a zeroed pages
> > > allows a reduction of number of cachelines touched
> > > in addition to getting rid of the slab overhead. So
> > > performance improves.
> >
> > By how much?
>
> About 40% on fork+exit. See
>
> http://marc.info/?l=linux-ia64&m=110942798406005&w=2
>
afacit that two-year-old, totally-different patch has nothing to do with my
repeatedly-asked question. It appears to be consolidating three separate
quicklist allocators into one common implementation.
In an attempt to answer my own question (and hence to justify the retention
of this custom allocator) I did this:
diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
*/
static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
{
- struct quicklist *q;
- void **p = NULL;
-
- q =&get_cpu_var(quicklist)[nr];
- p = q->page;
- if (likely(p)) {
- q->page = p[0];
- p[0] = NULL;
- q->nr_pages--;
- }
- put_cpu_var(quicklist);
- if (likely(p))
- return p;
-
- p = (void *)__get_free_page(flags | __GFP_ZERO);
+ void *p = (void *)__get_free_page(flags | __GFP_ZERO);
if (ctor && p)
ctor(p);
return p;
}
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
{
- struct quicklist *q;
- void **p = pp;
- struct page *page = virt_to_page(p);
- int nid = page_to_nid(page);
-
- if (unlikely(nid != numa_node_id())) {
- if (dtor)
- dtor(p);
- free_page((unsigned long)p);
- return;
- }
-
- q = &get_cpu_var(quicklist)[nr];
- p[0] = q->page;
- q->page = p;
- q->nr_pages++;
- put_cpu_var(quicklist);
+ if (dtor)
+ dtor(p);
+ free_page((unsigned long)p);
}
void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
#endif
#endif /* LINUX_QUICKLIST_H */
-
_
but it crashes early in the page allocator (i386) and I don't see why. It
makes me wonder if we have a use-after-free which is hidden by the presence
of the quicklist buffering or something.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 7:48 ` Andrew Morton
@ 2007-03-23 11:23 ` William Lee Irwin III
2007-03-23 14:58 ` Christoph Lameter
2007-03-23 11:29 ` William Lee Irwin III
` (2 subsequent siblings)
3 siblings, 1 reply; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-23 11:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 7:48 ` Andrew Morton
2007-03-23 11:23 ` William Lee Irwin III
@ 2007-03-23 11:29 ` William Lee Irwin III
2007-03-23 14:57 ` William Lee Irwin III
2007-03-23 11:39 ` Nick Piggin
2007-03-23 15:08 ` Christoph Lameter
3 siblings, 1 reply; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-23 11:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.
> In an attempt to answer my own question (and hence to justify the retention
> of this custom allocator) I did this:
[... patch changing allocator alloc()/free() to bare page allocations ...]
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.
Sorry I flubbed the first message. Anyway this does mean something is
seriously wrong and needs to be debugged. Looking into it now.
-- wli
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 7:48 ` Andrew Morton
2007-03-23 11:23 ` William Lee Irwin III
2007-03-23 11:29 ` William Lee Irwin III
@ 2007-03-23 11:39 ` Nick Piggin
2007-03-24 5:14 ` Andrew Morton
2007-03-23 15:08 ` Christoph Lameter
3 siblings, 1 reply; 25+ messages in thread
From: Nick Piggin @ 2007-03-23 11:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
Andrew Morton wrote:
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.
Does CONFIG_DEBUG_PAGEALLOC catch it?
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 11:29 ` William Lee Irwin III
@ 2007-03-23 14:57 ` William Lee Irwin III
2007-03-23 19:17 ` William Lee Irwin III
0 siblings, 1 reply; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-23 14:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>> afacit that two-year-old, totally-different patch has nothing to do with my
>> repeatedly-asked question. It appears to be consolidating three separate
>> quicklist allocators into one common implementation.
>> In an attempt to answer my own question (and hence to justify the retention
>> of this custom allocator) I did this:
> [... patch changing allocator alloc()/free() to bare page allocations ...]
>> but it crashes early in the page allocator (i386) and I don't see why. It
>> makes me wonder if we have a use-after-free which is hidden by the presence
>> of the quicklist buffering or something.
On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
> Sorry I flubbed the first message. Anyway this does mean something is
> seriously wrong and needs to be debugged. Looking into it now.
I know what's happening. I just need to catch the culprit.
-- wli
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 11:23 ` William Lee Irwin III
@ 2007-03-23 14:58 ` Christoph Lameter
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 14:58 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, linux-mm, linux-kernel
On Fri, 23 Mar 2007, William Lee Irwin III wrote:
> [... patch changing allocator alloc()/free() to bare page allocations ...]
> > but it crashes early in the page allocator (i386) and I don't see why. It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.
Sorry, there seem to be some email dropouts today. I am getting
fragments of the slab and quicklist discussions. Maybe I can get the whole story from
the mailing lists.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 7:48 ` Andrew Morton
` (2 preceding siblings ...)
2007-03-23 11:39 ` Nick Piggin
@ 2007-03-23 15:08 ` Christoph Lameter
2007-03-23 17:54 ` Christoph Lameter
3 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 15:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Thu, 22 Mar 2007, Andrew Morton wrote:
> > About 40% on fork+exit. See
> >
> > http://marc.info/?l=linux-ia64&m=110942798406005&w=2
> >
>
> afacit that two-year-old, totally-different patch has nothing to do with my
> repeatedly-asked question. It appears to be consolidating three separate
> quicklist allocators into one common implementation.
Yes, it shows the performance gains from the quicklist approach. This is the
work Robin Holt did on the problem. The problem is how to validate the
patch, because there should be no change at all on ia64, and on i386 we
basically measure the overhead of the slab allocations. One could
measure the impact on x86_64 because this introduces quicklists to that
platform.
The earlier discussion focused on avoiding zeroing of pte as far as I can
recall.
> but it crashes early in the page allocator (i386) and I don't see why. It
> makes me wonder if we have a use-after-free which is hidden by the presence
> of the quicklist buffering or something.
This was on i386? Could be hidden now by the slab use there.
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 15:08 ` Christoph Lameter
@ 2007-03-23 17:54 ` Christoph Lameter
2007-03-24 6:21 ` Andrew Morton
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2007-03-23 17:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
Here are the results of aim9 tests on x86_64. There are some minor performance
improvements and some fluctuations. The page size is only a fourth of that on
ia64, so the resulting benefit in terms of saved cacheline fetches is smaller.
The benefit is also likely higher on i386 because it fits twice as many
page table entries into a page.
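(A rough sanity check on the cacheline argument, assuming the 128-byte
cachelines used in the patch description and the 16 KB ia64 page size implied
by "a fourth": zeroing a 4 KB x86_64 page touches 4096 / 128 = 32 cachelines,
a 16 KB ia64 page touches 16384 / 128 = 128, while the quicklist fast path
touches about two in either case, so the saving per page table allocation is
roughly four times larger on ia64.)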
(Columns: test number, test name, 2.6.21-rc4 bare, 2.6.21-rc4 x86_64 quicklist, difference, % change, units; the full runs follow below.)
1 add_double 1096039.60 1096039.60 0.00 0.00% Thousand Double Precision Additions/second
2 add_float 1087128.71 1099009.90 11881.19 1.09% Thousand Single Precision Additions/second
3 add_long 4019704.43 4374384.24 354679.81 8.82% Thousand Long Integer Additions/second
4 add_int 3772277.23 3772277.23 0.00 0.00% Thousand Integer Additions/second
5 add_short 3754455.45 3761194.03 6738.58 0.18% Thousand Short Integer Additions/second
6 creat-clo 259405.94 267164.18 7758.24 2.99% File Creations and Closes/second
7 page_test 233118.81 235970.15 2851.34 1.22% System Allocations & Pages/second
8 brk_test 3425247.52 3408457.71 -16789.81 -0.49% System Memory Allocations/second
9 jmp_test 21819306.93 21808457.71 -10849.22 -0.05% Non-local gotos/second
10 signal_test 669154.23 689552.24 20398.01 3.05% Signal Traps/second
11 exec_test 747.52 743.78 -3.74 -0.50% Program Loads/second
12 fork_test 8267.33 8457.71 190.38 2.30% Task Creations/second
13 link_test 43819.31 44318.32 499.01 1.14% Link/Unlink Pairs/second
28 fun_cal 326463366.34 326559203.98 95837.64 0.03% Function Calls (no arguments)/second
29 fun_cal1 358906930.69 388202985.07 29296054.38 8.16% Function Calls (1 argument)/second
30 fun_cal2 356372277.23 356362189.05 -10088.18 -0.00% Function Calls (2 arguments)/second
31 fun_cal15 156641584.16 156656716.42 15132.26 0.01% Function Calls (15 arguments)/second
45 mem_rtns_2 1588762.38 1610298.51 21536.13 1.36% Block Memory Operations/second
46 sort_rtns_1 935.32 1004.98 69.66 7.45% Sort Operations/second
47 misc_rtns_1 17099.01 17268.66 169.65 0.99% Auxiliary Loops/second
48 dir_rtns_1 5925742.57 6313432.84 387690.27 6.54% Directory Operations/second
52 series_1 11469950.50 11625771.14 155820.64 1.36% Series Evaluations/second
53 shared_memory 1187313.43 1177910.45 -9402.98 -0.79% Shared Memory Operations/second
54 tcp_test 83183.17 83507.46 324.29 0.39% TCP/IP Messages/second
55 udp_test 273514.85 269801.00 -3713.85 -1.36% UDP/IP DataGrams/second
56 fifo_test 741237.62 803930.35 62692.73 8.46% FIFO Messages/second
57 stream_pipe 885099.01 1058059.70 172960.69 19.54% Stream Pipe Messages/second
58 dgram_pipe 881782.18 957213.93 75431.75 8.55% DataGram Pipe Messages/second
59 pipe_cpy 1355891.09 1316766.17 -39124.92 -2.89% Pipe Messages/second
2.6.21-rc4 bare
------------------------------------------------------------------------------------------------------------
Test Test Elapsed Iteration Iteration Operation
Number Name Time (sec) Count Rate (loops/sec) Rate (ops/sec)
------------------------------------------------------------------------------------------------------------
1 add_double 2.02 123 60.89109 1096039.60 Thousand Double Precision Additions/second
2 add_float 2.02 183 90.59406 1087128.71 Thousand Single Precision Additions/second
3 add_long 2.03 136 66.99507 4019704.43 Thousand Long Integer Additions/second
4 add_int 2.02 127 62.87129 3772277.23 Thousand Integer Additions/second
5 add_short 2.02 316 156.43564 3754455.45 Thousand Short Integer Additions/second
6 creat-clo 2.02 524 259.40594 259405.94 File Creations and Closes/second
7 page_test 2.02 277 137.12871 233118.81 System Allocations & Pages/second
8 brk_test 2.02 407 201.48515 3425247.52 System Memory Allocations/second
9 jmp_test 2.02 44075 21819.30693 21819306.93 Non-local gotos/second
10 signal_test 2.01 1345 669.15423 669154.23 Signal Traps/second
11 exec_test 2.02 302 149.50495 747.52 Program Loads/second
12 fork_test 2.02 167 82.67327 8267.33 Task Creations/second
13 link_test 2.02 1405 695.54455 43819.31 Link/Unlink Pairs/second
14 disk_rr 2.02 65 32.17822 164752.48 Random Disk Reads (K)/second
15 disk_rw 2.03 55 27.09360 138719.21 Random Disk Writes (K)/second
16 disk_rd 2.02 467 231.18812 1183683.17 Sequential Disk Reads (K)/second
17 disk_wrt 2.02 81 40.09901 205306.93 Sequential Disk Writes (K)/second
18 disk_cp 2.04 65 31.86275 163137.25 Disk Copies (K)/second
19 sync_disk_rw 2.05 2 0.97561 2497.56 Sync Random Disk Writes (K)/second
20 sync_disk_wrt 2.15 1 0.46512 1190.70 Sync Sequential Disk Writes (K)/second
21 sync_disk_cp 2.49 1 0.40161 1028.11 Sync Disk Copies (K)/second
22 disk_src 2.02 1049 519.30693 38948.02 Directory Searches/second
23 div_double 2.02 141 69.80198 209405.94 Thousand Double Precision Divides/second
24 div_float 2.02 141 69.80198 209405.94 Thousand Single Precision Divides/second
25 div_long 2.04 68 33.33333 30000.00 Thousand Long Integer Divides/second
26 div_int 2.02 120 59.40594 53465.35 Thousand Integer Divides/second
27 div_short 2.03 119 58.62069 52758.62 Thousand Short Integer Divides/second
28 fun_cal 2.02 1288 637.62376 326463366.34 Function Calls (no arguments)/second
29 fun_cal1 2.02 1416 700.99010 358906930.69 Function Calls (1 argument)/second
30 fun_cal2 2.02 1406 696.03960 356372277.23 Function Calls (2 arguments)/second
31 fun_cal15 2.02 618 305.94059 156641584.16 Function Calls (15 arguments)/second
32 sieve 2.15 10 4.65116 23.26 Integer Sieves/second
33 mul_double 2.02 185 91.58416 1099009.90 Thousand Double Precision Multiplies/second
34 mul_float 2.02 185 91.58416 1099009.90 Thousand Single Precision Multiplies/second
35 mul_long 2.02 6129 3034.15842 728198.02 Thousand Long Integer Multiplies/second
36 mul_int 2.02 8517 4216.33663 1011920.79 Thousand Integer Multiplies/second
37 mul_short 2.02 6817 3374.75248 1012425.74 Thousand Short Integer Multiplies/second
38 num_rtns_1 2.02 6260 3099.00990 309900.99 Numeric Functions/second
39 new_raph 2.02 6305 3121.28713 624257.43 Zeros Found/second
40 trig_rtns 2.02 243 120.29703 1202970.30 Trigonometric Functions/second
41 matrix_rtns 2.02 69718 34513.86139 3451386.14 Point Transformations/second
42 array_rtns 2.01 165 82.08955 1641.79 Linear Systems Solved/second
43 string_rtns 2.02 120 59.40594 5940.59 String Manipulations/second
44 mem_rtns_1 2.02 370 183.16832 5495049.50 Dynamic Memory Operations/second
45 mem_rtns_2 2.02 32093 15887.62376 1588762.38 Block Memory Operations/second
46 sort_rtns_1 2.01 188 93.53234 935.32 Sort Operations/second
47 misc_rtns_1 2.02 3454 1709.90099 17099.01 Auxiliary Loops/second
48 dir_rtns_1 2.02 1197 592.57426 5925742.57 Directory Operations/second
49 shell_rtns_1 2.02 328 162.37624 162.38 Shell Scripts/second
50 shell_rtns_2 2.02 327 161.88119 161.88 Shell Scripts/second
51 shell_rtns_3 2.02 327 161.88119 161.88 Shell Scripts/second
52 series_1 2.02 231693 114699.50495 11469950.50 Series Evaluations/second
53 shared_memory 2.01 23865 11873.13433 1187313.43 Shared Memory Operations/second
54 tcp_test 2.02 1867 924.25743 83183.17 TCP/IP Messages/second
55 udp_test 2.02 5525 2735.14851 273514.85 UDP/IP DataGrams/second
56 fifo_test 2.02 14973 7412.37624 741237.62 FIFO Messages/second
57 stream_pipe 2.02 17879 8850.99010 885099.01 Stream Pipe Messages/second
58 dgram_pipe 2.02 17812 8817.82178 881782.18 DataGram Pipe Messages/second
59 pipe_cpy 2.02 27389 13558.91089 1355891.09 Pipe Messages/second
60 ram_copy 2.02 353141 174822.27723 4374053376.24 Memory to Memory Copy/second
2.6.21-rc4 x86_64 quicklist
------------------------------------------------------------------------------------------------------------
Test Test Elapsed Iteration Iteration Operation
Number Name Time (sec) Count Rate (loops/sec) Rate (ops/sec)
------------------------------------------------------------------------------------------------------------
1 add_double 2.02 123 60.89109 1096039.60 Thousand Double Precision Additions/second
2 add_float 2.02 185 91.58416 1099009.90 Thousand Single Precision Additions/second
3 add_long 2.03 148 72.90640 4374384.24 Thousand Long Integer Additions/second
4 add_int 2.02 127 62.87129 3772277.23 Thousand Integer Additions/second
5 add_short 2.01 315 156.71642 3761194.03 Thousand Short Integer Additions/second
6 creat-clo 2.01 537 267.16418 267164.18 File Creations and Closes/second
7 page_test 2.01 279 138.80597 235970.15 System Allocations & Pages/second
8 brk_test 2.01 403 200.49751 3408457.71 System Memory Allocations/second
9 jmp_test 2.01 43835 21808.45771 21808457.71 Non-local gotos/second
10 signal_test 2.01 1386 689.55224 689552.24 Signal Traps/second
11 exec_test 2.01 299 148.75622 743.78 Program Loads/second
12 fork_test 2.01 170 84.57711 8457.71 Task Creations/second
13 link_test 2.02 1421 703.46535 44318.32 Link/Unlink Pairs/second
14 disk_rr 2.01 63 31.34328 160477.61 Random Disk Reads (K)/second
15 disk_rw 2.02 53 26.23762 134336.63 Random Disk Writes (K)/second
16 disk_rd 2.01 498 247.76119 1268537.31 Sequential Disk Reads (K)/second
17 disk_wrt 2.02 78 38.61386 197702.97 Sequential Disk Writes (K)/second
18 disk_cp 2.04 64 31.37255 160627.45 Disk Copies (K)/second
19 sync_disk_rw 2.65 2 0.75472 1932.08 Sync Random Disk Writes (K)/second
20 sync_disk_wrt 3.96 2 0.50505 1292.93 Sync Sequential Disk Writes (K)/second
21 sync_disk_cp 2.31 1 0.43290 1108.23 Sync Disk Copies (K)/second
22 disk_src 2.01 1079 536.81592 40261.19 Directory Searches/second
23 div_double 2.02 141 69.80198 209405.94 Thousand Double Precision Divides/second
24 div_float 2.01 140 69.65174 208955.22 Thousand Single Precision Divides/second
25 div_long 2.01 67 33.33333 30000.00 Thousand Long Integer Divides/second
26 div_int 2.02 120 59.40594 53465.35 Thousand Integer Divides/second
27 div_short 2.01 118 58.70647 52835.82 Thousand Short Integer Divides/second
28 fun_cal 2.01 1282 637.81095 326559203.98 Function Calls (no arguments)/second
29 fun_cal1 2.01 1524 758.20896 388202985.07 Function Calls (1 argument)/second
30 fun_cal2 2.01 1399 696.01990 356362189.05 Function Calls (2 arguments)/second
31 fun_cal15 2.01 615 305.97015 156656716.42 Function Calls (15 arguments)/second
32 sieve 2.16 10 4.62963 23.15 Integer Sieves/second
33 mul_double 2.02 185 91.58416 1099009.90 Thousand Double Precision Multiplies/second
34 mul_float 2.02 185 91.58416 1099009.90 Thousand Single Precision Multiplies/second
35 mul_long 2.02 6128 3033.66337 728079.21 Thousand Long Integer Multiplies/second
36 mul_int 2.01 8475 4216.41791 1011940.30 Thousand Integer Multiplies/second
37 mul_short 2.01 6783 3374.62687 1012388.06 Thousand Short Integer Multiplies/second
38 num_rtns_1 2.01 6264 3116.41791 311641.79 Numeric Functions/second
39 new_raph 2.01 6261 3114.92537 622985.07 Zeros Found/second
40 trig_rtns 2.01 239 118.90547 1189054.73 Trigonometric Functions/second
41 matrix_rtns 2.01 69555 34604.47761 3460447.76 Point Transformations/second
42 array_rtns 2.01 165 82.08955 1641.79 Linear Systems Solved/second
43 string_rtns 2.01 118 58.70647 5870.65 String Manipulations/second
44 mem_rtns_1 2.01 370 184.07960 5522388.06 Dynamic Memory Operations/second
45 mem_rtns_2 2.01 32367 16102.98507 1610298.51 Block Memory Operations/second
46 sort_rtns_1 2.01 202 100.49751 1004.98 Sort Operations/second
47 misc_rtns_1 2.01 3471 1726.86567 17268.66 Auxiliary Loops/second
48 dir_rtns_1 2.01 1269 631.34328 6313432.84 Directory Operations/second
49 shell_rtns_1 2.01 321 159.70149 159.70 Shell Scripts/second
50 shell_rtns_2 2.01 324 161.19403 161.19 Shell Scripts/second
51 shell_rtns_3 2.01 325 161.69154 161.69 Shell Scripts/second
52 series_1 2.01 233678 116257.71144 11625771.14 Series Evaluations/second
53 shared_memory 2.01 23676 11779.10448 1177910.45 Shared Memory Operations/second
54 tcp_test 2.01 1865 927.86070 83507.46 TCP/IP Messages/second
55 udp_test 2.01 5423 2698.00995 269801.00 UDP/IP DataGrams/second
56 fifo_test 2.01 16159 8039.30348 803930.35 FIFO Messages/second
57 stream_pipe 2.01 21267 10580.59701 1058059.70 Stream Pipe Messages/second
58 dgram_pipe 2.01 19240 9572.13930 957213.93 DataGram Pipe Messages/second
59 pipe_cpy 2.01 26467 13167.66169 1316766.17 Pipe Messages/second
60 ram_copy 2.01 351052 174652.73632 4369811462.69 Memory to Memory Copy/second
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 14:57 ` William Lee Irwin III
@ 2007-03-23 19:17 ` William Lee Irwin III
0 siblings, 0 replies; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-23 19:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Thu, Mar 22, 2007 at 11:48:48PM -0800, Andrew Morton wrote:
>>> afacit that two-year-old, totally-different patch has nothing to do with my
>>> repeatedly-asked question. It appears to be consolidating three separate
>>> quicklist allocators into one common implementation.
>>> In an attempt to answer my own question (and hence to justify the retention
>>> of this custom allocator) I did this:
>> [... patch changing allocator alloc()/free() to bare page allocations ...]
>>> but it crashes early in the page allocator (i386) and I don't see why. It
>>> makes me wonder if we have a use-after-free which is hidden by the presence
>>> of the quicklist buffering or something.
On Fri, Mar 23, 2007 at 04:29:20AM -0700, William Lee Irwin III wrote:
>> Sorry I flubbed the first message. Anyway this does mean something is
>> seriously wrong and needs to be debugged. Looking into it now.
On Fri, Mar 23, 2007 at 07:57:07AM -0700, William Lee Irwin III wrote:
> I know what's happening. I just need to catch the culprit.
Are you tripping the BUG_ON() in include/linux/mm.h:256 with
CONFIG_DEBUG_VM set?
-- wli
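(For reference: the CONFIG_DEBUG_VM-only check in that neighbourhood of
include/linux/mm.h in 2.6.21-era kernels is, assuming the line number refers
to the reference-count sanity check, put_page_testzero(); quoted approximately
from memory.)

static inline int put_page_testzero(struct page *page)
{
	/* VM_BUG_ON() expands to BUG_ON() only when CONFIG_DEBUG_VM is set */
	VM_BUG_ON(atomic_read(&page->_count) == 0);
	return atomic_dec_and_test(&page->_count);
}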
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 11:39 ` Nick Piggin
@ 2007-03-24 5:14 ` Andrew Morton
0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2007-03-24 5:14 UTC (permalink / raw)
To: Nick Piggin
Cc: Christoph Lameter, linux-mm, linux-kernel, William Lee Irwin III
On Fri, 23 Mar 2007 22:39:24 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
>
> > but it crashes early in the page allocator (i386) and I don't see why. It
> > makes me wonder if we have a use-after-free which is hidden by the presence
> > of the quicklist buffering or something.
>
> Does CONFIG_DEBUG_PAGEALLOC catch it?
It'll be a while before I can get onto doing anything with this.
I do have an oops trace:
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 296k freed
Write protecting the kernel read-only data: 921k
BUG: unable to handle kernel paging request at virtual address 00100104
printing eip:
c015b676
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<c015b676>] Not tainted VLI
EFLAGS: 00010002 (2.6.21-rc4 #6)
EIP is at get_page_from_freelist+0x166/0x3d0
eax: c1b110bc ebx: 00000001 ecx: 00100100 edx: 00200200
esi: c1b11090 edi: c04cc500 ebp: f67d3b88 esp: f67d3b34
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process default.hotplug (pid: 872, ti=f67d2000 task=f6748030 task.ti=f67d2000)
Stack: 00000001 00000044 c067eae8 00000001 00000001 00000000 c04cc6c0 c04cc4a0
00000001 00000000 000284d0 c04ccb78 00000286 00000001 00000000 f67b6000
00000000 00000001 c04cc4a0 f6748030 000084d0 f67d3bcc c015b92e 00000044
Call Trace:
[<c0103e6a>] show_trace_log_lvl+0x1a/0x30
[<c0103f29>] show_stack_log_lvl+0xa9/0xd0
[<c0104139>] show_registers+0x1e9/0x2f0
[<c0104355>] die+0x115/0x250
[<c011561e>] do_page_fault+0x27e/0x630
[<c03d5f64>] error_code+0x7c/0x84
[<c015b92e>] __alloc_pages+0x4e/0x2f0
[<c0114c84>] pte_alloc_one+0x14/0x20
[<c0163d1b>] __pte_alloc+0x1b/0xa0
[<c016459d>] __handle_mm_fault+0x7fd/0x940
[<c01154b9>] do_page_fault+0x119/0x630
[<c03d5f64>] error_code+0x7c/0x84
[<c01a5e8f>] padzero+0x1f/0x30
[<c01a744e>] load_elf_binary+0x76e/0x1a80
[<c017c2c7>] search_binary_handler+0x97/0x220
[<c01a5886>] load_script+0x1d6/0x220
[<c017c2c7>] search_binary_handler+0x97/0x220
[<c017dd0f>] do_execve+0x14f/0x200
[<c010140e>] sys_execve+0x2e/0x80
[<c0102dcc>] sysenter_past_esp+0x5d/0x99
=======================
Code: 06 8b 4d c0 8b 7d c8 8d 04 81 8d 44 82 20 01 c7 9c 8f 45 dc fa e8 4b f4 fd ff 8b 07 85 c0 74 7b 8b 47 0c 8b 08 8d 70 d4 8b 50 04 <89> 51 04 89 0a c7 40 04 00 02 20 00 c7 00 00 01 10 00 ff 0f 8b
EIP: [<c015b676>] get_page_from_freelist+0x166/0x3d0 SS:ESP 0068:f67d3b34
Not pretty. That was bare mainline + Christoph's patches + that patch which I sent.
Using http://userweb.kernel.org/~akpm/config-vmm.txt
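(For readers decoding the oops: the values in %ecx/%edx and the faulting
address 00100104 match the list poison constants from include/linux/poison.h,
0x00100104 being LIST_POISON1 + 4, i.e. the ->prev slot of a 32-bit list_head
whose links were already poisoned. That pattern usually means a page's lru
list_head is being linked or unlinked again after list_del() poisoned it.)

#define LIST_POISON1  ((void *) 0x00100100)	/* written to ->next by list_del() */
#define LIST_POISON2  ((void *) 0x00200200)	/* written to ->prev by list_del() */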
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-23 17:54 ` Christoph Lameter
@ 2007-03-24 6:21 ` Andrew Morton
2007-03-26 16:52 ` Christoph Lameter
0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-03-24 6:21 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> Here are the results of aim9 tests on x86_64. There are some minor performance
> improvements and some fluctuations.
There are a lot of numbers there - what do they tell us?
> 2.6.21-rc4 bare
> 2.6.21-rc4 x86_64 quicklist
So what has changed here? From a quick look it appears that x86_64 is
using get_zeroed_page() for ptes, puds and pmds and is using a custom
quicklist for pgds.
After your patches, x86_64 is using a common quicklist allocator for puds,
pmds and pgds and continues to use get_zeroed_page() for ptes.
Or something totally different, dunno. I tire.
My question is pretty simple: how do we justify the retention of this
custom allocator?
Because simply removing it is the preferable way of fixing the SLUB
problem.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-24 6:21 ` Andrew Morton
@ 2007-03-26 16:52 ` Christoph Lameter
2007-03-26 18:14 ` Christoph Lameter
2007-03-26 18:26 ` Andrew Morton
0 siblings, 2 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-26 16:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Fri, 23 Mar 2007, Andrew Morton wrote:
> On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
>
> > Here are the results of aim9 tests on x86_64. There are some minor performance
> > improvements and some fluctuations.
>
> There are a lot of numbers there - what do they tell us?
That there are performance improvements because of quicklists.
> So what has changed here? From a quick look it appears that x86_64 is
> using get_zeroed_page() for ptes, puds and pmds and is using a custom
> quicklist for pgds.
x86_64 is only using a list in order to track pgds. There is no
quicklist without this patchset.
> After your patches, x86_64 is using a common quicklist allocator for puds,
> pmds and pgds and continues to use get_zeroed_page() for ptes.
x86_64 should be using quicklists for all ptes after this patch. I did not
convert pte_free() since it is only used for freeing ptes during races
(see __pte_alloc). Since pte_free() is passed a struct page, the page would
first have to be converted back to a virtual address before it could be put
onto the freelist. Not worth doing.
Hmmm... Then how does x86_64 free the ptes? It seems we do
free_page_and_swap_cache() in tlb_remove_page(). Yup, so ptes are not
handled, which limits the speed improvement that we see.
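(To make the mismatch concrete, the pte_free() conversion being declared not
worth doing would look roughly like the sketch below; QUICK_PT and the gfp
flags are illustrative names, not taken from the actual patches.)

/* quicklist_alloc()/quicklist_free() traffic in kernel virtual addresses
 * (see the quoted include/linux/quicklist.h) ... */
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
					  unsigned long address)
{
	return quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
}

/* ... whereas pte_free() is handed a struct page, so the page would have
 * to be converted back to its virtual address first: */
static inline void pte_free(struct page *pte)
{
	quicklist_free(QUICK_PT, NULL, page_address(pte));
}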
> My question is pretty simple: how do we justify the retention of this
> custom allocator?
I would expect this functionality (I never thought about it as an allocator)
to extract common code from the many arches that use one form or another of
preserving zeroed pages for page table pages. I saw lots of arches doing
the same, with some getting into trouble with the page structs. Having a
common code base that does not have this issue would clean up the kernel
and deal with the slab issue.
> Because simply removing it is the preferable way of fixing the SLUB
> problem.
That would reduce performance. I did not think that a common feature
that is used throughout many arches would need rejustification.
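(For what it is worth, the shape of the consolidation being argued for is
roughly the following; QUICK_PGD, pgd_ctor and the exact hooks are assumptions
for illustration, not code from the patches.)

pgd_t *pgd_alloc(struct mm_struct *mm)
{
	/* a ctor would install the shared kernel mappings once; pages
	 * recycled through the per-cpu list keep them, which is where
	 * the "avoid teardown and setup" savings come from */
	return quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
}

void pgd_free(pgd_t *pgd)
{
	/* no dtor: the page goes back onto the per-cpu list intact */
	quicklist_free(QUICK_PGD, NULL, pgd);
}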
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-26 16:52 ` Christoph Lameter
@ 2007-03-26 18:14 ` Christoph Lameter
2007-03-26 18:26 ` Andrew Morton
1 sibling, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-26 18:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Mon, 26 Mar 2007, Christoph Lameter wrote:
> > After your patches, x86_64 is using a common quicklist allocator for puds,
> > pmds and pgds and continues to use get_zeroed_page() for ptes.
>
> x86_64 should be using quicklists for all ptes after this patch. I did not
> convert pte_free() since it is only used for freeing ptes during races
> (see __pte_alloc). Since pte_free gets passed a page struct it would require
> virt_to_page before being put onto the freelist. Not worth doing.
>
> Hmmm... Then how does x86_64 free the ptes? Seems that we do
> free_page_and_swap_cache() in tlb_remove_pages. Yup so ptes are not
> handled which limits the speed improvements that we see.
And if we tried to put the ptes onto quicklists then we would get into
more difficulties with the tlb shootdown code. Sigh. We cannot easily
deal with ptes. Quicklists on i386 and x86_64 only work for pgds, puds and
pmds, and as was pointed out elsewhere in this thread, the performance
gains are therefore limited on these platforms.
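(For readers wondering where the ptes actually go: the 2.6.21-era
asm-generic/tlb.h path looks roughly like this, paraphrased from memory,
which is why pte pages bypass the quicklists entirely.)

static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
	tlb->need_flush = 1;
	if (tlb_fast_mode(tlb)) {
		free_page_and_swap_cache(page);	/* freed immediately */
		return;
	}
	tlb->pages[tlb->nr++] = page;		/* batched ... */
	if (tlb->nr >= FREE_PTE_NR)
		tlb_flush_mmu(tlb, 0, 0);	/* ... flush TLBs, then
						 * free_pages_and_swap_cache() */
}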
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-26 16:52 ` Christoph Lameter
2007-03-26 18:14 ` Christoph Lameter
@ 2007-03-26 18:26 ` Andrew Morton
2007-03-27 1:06 ` William Lee Irwin III
2007-03-27 11:19 ` William Lee Irwin III
1 sibling, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2007-03-26 18:26 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linux-kernel
On Mon, 26 Mar 2007 09:52:17 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Fri, 23 Mar 2007, Andrew Morton wrote:
>
> > On Fri, 23 Mar 2007 10:54:12 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> >
> > > Here are the results of aim9 tests on x86_64. There are some minor performance
> > > improvements and some fluctuations.
> >
> > There are a lot of numbers there - what do they tell us?
>
> That there are performance improvements because of quicklists.
Christoph, you can continue to be obtuse, and I can continue to ignore
these patches until
a) it has been demonstrated that this patch is superior to simply removing
the quicklists and
b) we understand why the below simple modification crashes i386.
diff -puN include/linux/quicklist.h~qlhack include/linux/quicklist.h
--- a/include/linux/quicklist.h~qlhack
+++ a/include/linux/quicklist.h
@@ -32,45 +32,17 @@ DECLARE_PER_CPU(struct quicklist, quickl
  */
 static inline void *quicklist_alloc(int nr, gfp_t flags, void (*ctor)(void *))
 {
-	struct quicklist *q;
-	void **p = NULL;
-
-	q =&get_cpu_var(quicklist)[nr];
-	p = q->page;
-	if (likely(p)) {
-		q->page = p[0];
-		p[0] = NULL;
-		q->nr_pages--;
-	}
-	put_cpu_var(quicklist);
-	if (likely(p))
-		return p;
-
-	p = (void *)__get_free_page(flags | __GFP_ZERO);
+	void *p = (void *)__get_free_page(flags | __GFP_ZERO);
 	if (ctor && p)
 		ctor(p);
 	return p;
 }
-static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
+static inline void quicklist_free(int nr, void (*dtor)(void *), void *p)
 {
-	struct quicklist *q;
-	void **p = pp;
-	struct page *page = virt_to_page(p);
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		free_page((unsigned long)p);
-		return;
-	}
-
-	q = &get_cpu_var(quicklist)[nr];
-	p[0] = q->page;
-	q->page = p;
-	q->nr_pages++;
-	put_cpu_var(quicklist);
+	if (dtor)
+		dtor(p);
+	free_page((unsigned long)p);
 }
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -81,4 +53,3 @@ unsigned long quicklist_total_size(void)
 #endif
 #endif /* LINUX_QUICKLIST_H */
-
_
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-26 18:26 ` Andrew Morton
@ 2007-03-27 1:06 ` William Lee Irwin III
2007-03-27 1:22 ` Christoph Lameter
2007-03-27 1:45 ` David Miller
2007-03-27 11:19 ` William Lee Irwin III
1 sibling, 2 replies; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-27 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> a) it has been demonstrated that this patch is superior to simply removing
> the quicklists and
Not that clameter really needs my help, but I agree with his position
on several fronts, and advocate accordingly, so here is where I'm at.
From prior experience, I believe I know how to extract positive results,
and that's primarily by PTE caching because they're the most frequently
zeroed pagetable nodes. The upper levels of pagetables will remain in
the noise until the leaf level bottleneck is dealt with.
PTEs need a custom tlb.h to deal with the TLB issues noted above; the
asm-generic variant will not suffice. Results above the noise level
need PTE caching. Sparse fault handling (esp. after execve() is done)
is one place in particular where improvements should be most readily
demonstrable, as only single cachelines on each allocated node should
be touched. lmbench should have a fault handling latency test for this.
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.
Full eager zeroing patches not dependent on quicklist code don't crash,
so there is no latent use-after-free issue covered up by caching. I'll
help out more on the i386 front as needed.
-- wli
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-27 1:06 ` William Lee Irwin III
@ 2007-03-27 1:22 ` Christoph Lameter
2007-03-27 1:45 ` David Miller
1 sibling, 0 replies; 25+ messages in thread
From: Christoph Lameter @ 2007-03-27 1:22 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, linux-mm, linux-kernel
On Mon, 26 Mar 2007, William Lee Irwin III wrote:
> Not that clameter really needs my help, but I agree with his position
> on several fronts, and advocate accordingly, so here is where I'm at.
Yes, thank you. i386 is not my field; I have no interest per se in
improving i386 performance, and without your help I would have to drop this
and keep the special casing in SLUB for i386. Generic tlb.h changes may
also help to introduce quicklists to x86_64. The current quicklist patches
can only work on higher levels due to the freeing of ptes via
tlb_remove_page().
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-27 1:06 ` William Lee Irwin III
2007-03-27 1:22 ` Christoph Lameter
@ 2007-03-27 1:45 ` David Miller
1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2007-03-27 1:45 UTC (permalink / raw)
To: wli; +Cc: akpm, clameter, linux-mm, linux-kernel
From: William Lee Irwin III <wli@holomorphy.com>
Date: Mon, 26 Mar 2007 18:06:24 -0700
> On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> > b) we understand why the below simple modification crashes i386.
>
> Full eager zeroing patches not dependent on quicklist code don't crash,
> so there is no latent use-after-free issue covered up by caching. I'll
> help out more on the i386 front as-needed.
I've looked into this a few times and I am quite mystified as
to why that simple test patch crashes.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [QUICKLIST 1/5] Quicklists for page table pages V4
2007-03-26 18:26 ` Andrew Morton
2007-03-27 1:06 ` William Lee Irwin III
@ 2007-03-27 11:19 ` William Lee Irwin III
1 sibling, 0 replies; 25+ messages in thread
From: William Lee Irwin III @ 2007-03-27 11:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, linux-kernel
On Mon, Mar 26, 2007 at 10:26:51AM -0800, Andrew Morton wrote:
> b) we understand why the below simple modification crashes i386.
This doesn't crash i386 in qemu here on a port of the quicklist patches
to 2.6.21-rc5-mm2. I suppose I'll have to dump it on some real hardware
to see if I can reproduce it there.
-- wli
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2007-03-27 11:19 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-23 6:28 [QUICKLIST 1/5] Quicklists for page table pages V4 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 2/5] Quicklist support for IA64 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 3/5] Quicklist support for i386 Christoph Lameter
2007-03-23 6:28 ` [QUICKLIST 4/5] Quicklist support for x86_64 Christoph Lameter
2007-03-23 6:29 ` [QUICKLIST 5/5] Quicklist support for sparc64 Christoph Lameter
2007-03-23 6:39 ` [QUICKLIST 1/5] Quicklists for page table pages V4 Andrew Morton
2007-03-23 6:52 ` Christoph Lameter
2007-03-23 7:48 ` Andrew Morton
2007-03-23 11:23 ` William Lee Irwin III
2007-03-23 14:58 ` Christoph Lameter
2007-03-23 11:29 ` William Lee Irwin III
2007-03-23 14:57 ` William Lee Irwin III
2007-03-23 19:17 ` William Lee Irwin III
2007-03-23 11:39 ` Nick Piggin
2007-03-24 5:14 ` Andrew Morton
2007-03-23 15:08 ` Christoph Lameter
2007-03-23 17:54 ` Christoph Lameter
2007-03-24 6:21 ` Andrew Morton
2007-03-26 16:52 ` Christoph Lameter
2007-03-26 18:14 ` Christoph Lameter
2007-03-26 18:26 ` Andrew Morton
2007-03-27 1:06 ` William Lee Irwin III
2007-03-27 1:22 ` Christoph Lameter
2007-03-27 1:45 ` David Miller
2007-03-27 11:19 ` William Lee Irwin III