LKML Archive on lore.kernel.org
* [PATCH] [0/8] GBpages support for x86-64, v2
@ 2008-01-19  6:48 Andi Kleen
  2008-01-19  6:48 ` [PATCH] [1/8] Handle kernel near memory hole in clear_kernel_mapping Andi Kleen
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


This patch series adds support for using the new GB (1GB) pages introduced
with AMD quad-core (Fam10h) CPUs for the kernel direct mapping.

I believe I have addressed all reasonable feedback on the previous version.

Changes against previous version:
- Ported on top of latest git-x86 with PAT series
- Fixed some white space
- Clarified the clear_kernel_mapping comments
- Minor cleanups

-Andi


* [PATCH] [1/8] Handle kernel near memory hole in clear_kernel_mapping
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19  6:48 ` [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit Andi Kleen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: ebiederm, vgoyal, mingo, tglx, linux-kernel


This fixes a long-standing, obscure problem in the relocatable kernel. The
AMD GART driver needs to unmap part of the GART in the kernel direct mapping
to prevent cache corruption. With the relocatable kernel it is in theory
possible that the separate kernel text mapping straddles that area too.

Normally this should not happen because the GART aperture tends to sit at
>= 2GB and the kernel is normally not loaded that high, but it is possible
in theory.

Teach clear_kernel_mapping() about this case.

This will become more important once the kernel mapping uses 1GB pages.

Cc: ebiederm@xmission.com
Cc: vgoyal@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/init_64.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -415,7 +415,8 @@ void __init paging_init(void)
    from the CPU leading to inconsistent cache lines. address and size
    must be aligned to 2MB boundaries. 
    Does nothing when the mapping doesn't exist. */
-void __init clear_kernel_mapping(unsigned long address, unsigned long size) 
+static void __init
+__clear_kernel_mapping(unsigned long address, unsigned long size)
 {
 	unsigned long end = address + size;
 
@@ -445,6 +446,28 @@ void __init clear_kernel_mapping(unsigne
 	__flush_tlb_all();
 } 
 
+#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))
+
+void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+{
+	int sh = PMD_SHIFT;
+	unsigned long kernel = __pa(__START_KERNEL_map);
+
+	/*
+	 * Note that we cannot unmap the kernel itself because the unmapped
+	 * holes here are always at least 2MB aligned.
+	 * This just applies to the trailing areas of the 40MB kernel mapping.
+	 */
+	if (overlaps(kernel >> sh, (kernel + KERNEL_TEXT_SIZE) >> sh,
+			__pa(address) >> sh, __pa(address + size) >> sh)) {
+		printk(KERN_WARNING
+			"Kernel mapping at %lx within 2MB of memory hole\n",
+				kernel);
+		__clear_kernel_mapping(__START_KERNEL_map+__pa(address), size);
+	}
+	__clear_kernel_mapping(address, size);
+}
+
 /*
  * Memory hotplug specific functions
  */
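
A minimal standalone sketch (not part of the patch) of the interval-overlap
test used above, with made-up physical addresses; PMD_SHIFT_DEMO stands in
for the kernel's PMD_SHIFT (21 on x86-64, i.e. 2MB granularity):

#include <stdio.h>

#define PMD_SHIFT_DEMO 21	/* compare in 2MB units, as the patch does */
#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))

int main(void)
{
	/* hypothetical: kernel text mapping covers phys 16MB..56MB (40MB) */
	unsigned long kernel_start = 16UL << 20, kernel_end = 56UL << 20;
	/* hypothetical: GART aperture hole to unmap at phys 32MB..96MB */
	unsigned long hole_start = 32UL << 20, hole_end = 96UL << 20;

	int hit = overlaps(kernel_start >> PMD_SHIFT_DEMO,
			   kernel_end   >> PMD_SHIFT_DEMO,
			   hole_start   >> PMD_SHIFT_DEMO,
			   hole_end     >> PMD_SHIFT_DEMO);

	printf("hole overlaps kernel text mapping: %s\n", hit ? "yes" : "no");
	return 0;
}

When the test fires, clear_kernel_mapping() above also clears the alias of
the hole inside the __START_KERNEL_map text mapping, not just the direct
mapping.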


* [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
  2008-01-19  6:48 ` [PATCH] [1/8] Handle kernel near memory hole in clear_kernel_mapping Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-23 21:26   ` Jan Engelhardt
  2008-01-19  6:48 ` [PATCH] [3/8] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE Andi Kleen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-x86/cpufeature.h |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -49,6 +49,7 @@
 #define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_GBPAGES	(1*32+26) /* GB pages */
 #define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
 #define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */
@@ -173,6 +174,7 @@
 #define cpu_has_bts		boot_cpu_has(X86_FEATURE_BTS)
 #define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
 #define cpu_has_ss		boot_cpu_has(X86_FEATURE_SELFSNOOP)
+#define cpu_has_gbpages 	boot_cpu_has(X86_FEATURE_GBPAGES)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg 	1
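
As a side note, the CPUID bit behind the new X86_FEATURE_GBPAGES macro is
bit 26 of EDX in leaf 0x80000001 (AMD's PDPE1GB). A standalone user-space
sketch (not part of the patch, and not how the kernel queries it) that
checks the same bit via gcc's <cpuid.h>:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
		puts("extended CPUID leaf not available");
		return 1;
	}
	printf("1GB pages (PDPE1GB): %ssupported\n",
	       (edx & (1u << 26)) ? "" : "not ");
	return 0;
}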


* [PATCH] [3/8] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
  2008-01-19  6:48 ` [PATCH] [1/8] Handle kernel near memory hole in clear_kernel_mapping Andi Kleen
  2008-01-19  6:48 ` [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19  6:48 ` [PATCH] [4/8] Add pgtable accessor functions for GB pages Andi Kleen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Split the existing LARGE_PAGE_SIZE/MASK macro into two new macros
PUD_PAGE_SIZE/MASK and PMD_PAGE_SIZE/MASK. 

Fix up all callers to use the new names.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/boot/compressed/head_64.S |    8 ++++----
 arch/x86/kernel/head_64.S          |    4 ++--
 arch/x86/kernel/pci-gart_64.c      |    2 +-
 arch/x86/mm/init_64.c              |    6 +++---
 arch/x86/mm/pageattr_64.c          |    4 ++--
 include/asm-x86/page.h             |    4 ++--
 include/asm-x86/page_32.h          |    4 ++++
 include/asm-x86/page_64.h          |    3 +++
 8 files changed, 21 insertions(+), 14 deletions(-)

Index: linux/include/asm-x86/page_64.h
===================================================================
--- linux.orig/include/asm-x86/page_64.h
+++ linux/include/asm-x86/page_64.h
@@ -23,6 +23,9 @@
 #define MCE_STACK 5
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
+#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
+
 #define __PAGE_OFFSET		_AC(0xffff810000000000, UL)
 
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
Index: linux/arch/x86/boot/compressed/head_64.S
===================================================================
--- linux.orig/arch/x86/boot/compressed/head_64.S
+++ linux/arch/x86/boot/compressed/head_64.S
@@ -80,8 +80,8 @@ startup_32:
 
 #ifdef CONFIG_RELOCATABLE
 	movl	%ebp, %ebx
-	addl	$(LARGE_PAGE_SIZE -1), %ebx
-	andl	$LARGE_PAGE_MASK, %ebx
+	addl	$(PMD_PAGE_SIZE -1), %ebx
+	andl	$PMD_PAGE_MASK, %ebx
 #else
 	movl	$CONFIG_PHYSICAL_START, %ebx
 #endif
@@ -220,8 +220,8 @@ ENTRY(startup_64)
 	/* Start with the delta to where the kernel will run at. */
 #ifdef CONFIG_RELOCATABLE
 	leaq	startup_32(%rip) /* - $startup_32 */, %rbp
-	addq	$(LARGE_PAGE_SIZE - 1), %rbp
-	andq	$LARGE_PAGE_MASK, %rbp
+	addq	$(PMD_PAGE_SIZE - 1), %rbp
+	andq	$PMD_PAGE_MASK, %rbp
 	movq	%rbp, %rbx
 #else
 	movq	$CONFIG_PHYSICAL_START, %rbp
Index: linux/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux.orig/arch/x86/kernel/pci-gart_64.c
+++ linux/arch/x86/kernel/pci-gart_64.c
@@ -501,7 +501,7 @@ static __init unsigned long check_iommu_
 	}
 
 	a = aper + iommu_size;
-	iommu_size -= round_up(a, LARGE_PAGE_SIZE) - a;
+	iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;
 
 	if (iommu_size < 64*1024*1024) {
 		printk(KERN_WARNING
Index: linux/arch/x86/kernel/head_64.S
===================================================================
--- linux.orig/arch/x86/kernel/head_64.S
+++ linux/arch/x86/kernel/head_64.S
@@ -63,7 +63,7 @@ startup_64:
 
 	/* Is the address not 2M aligned? */
 	movq	%rbp, %rax
-	andl	$~LARGE_PAGE_MASK, %eax
+	andl	$~PMD_PAGE_MASK, %eax
 	testl	%eax, %eax
 	jnz	bad_address
 
@@ -88,7 +88,7 @@ startup_64:
 
 	/* Add an Identity mapping if I am above 1G */
 	leaq	_text(%rip), %rdi
-	andq	$LARGE_PAGE_MASK, %rdi
+	andq	$PMD_PAGE_MASK, %rdi
 
 	movq	%rdi, %rax
 	shrq	$PUD_SHIFT, %rax
Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -420,10 +420,10 @@ __clear_kernel_mapping(unsigned long add
 {
 	unsigned long end = address + size;
 
-	BUG_ON(address & ~LARGE_PAGE_MASK);
-	BUG_ON(size & ~LARGE_PAGE_MASK); 
+	BUG_ON(address & ~PMD_PAGE_MASK);
+	BUG_ON(size & ~PMD_PAGE_MASK);
 	
-	for (; address < end; address += LARGE_PAGE_SIZE) { 
+	for (; address < end; address += PMD_PAGE_SIZE) {
 		pgd_t *pgd = pgd_offset_k(address);
 		pud_t *pud;
 		pmd_t *pmd;
Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -70,7 +70,7 @@ static struct page *split_large_page(uns
 	page_private(base) = 0;
 
 	address = __pa(address);
-	addr = address & LARGE_PAGE_MASK; 
+	addr = address & PMD_PAGE_MASK;
 	pbase = (pte_t *)page_address(base);
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
 		pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
@@ -150,7 +150,7 @@ static void revert_page(unsigned long ad
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, address);
 	BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
-	pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT;
+	pfn = (__pa(address) & PMD_PAGE_MASK) >> PAGE_SHIFT;
 	large_pte = pfn_pte(pfn, ref_prot);
 	large_pte = pte_mkhuge(large_pte);
 	set_pte((pte_t *)pmd, large_pte);
Index: linux/include/asm-x86/page_32.h
===================================================================
--- linux.orig/include/asm-x86/page_32.h
+++ linux/include/asm-x86/page_32.h
@@ -13,6 +13,10 @@
  */
 #define __PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
 
+/* Eventually 32bit should be moved over to the new names too */
+#define LARGE_PAGE_SIZE PMD_PAGE_SIZE
+#define LARGE_PAGE_MASK PMD_PAGE_MASK
+
 #ifdef CONFIG_X86_PAE
 #define __PHYSICAL_MASK_SHIFT	36
 #define __VIRTUAL_MASK_SHIFT	32
Index: linux/include/asm-x86/page.h
===================================================================
--- linux.orig/include/asm-x86/page.h
+++ linux/include/asm-x86/page.h
@@ -13,8 +13,8 @@
 #define PHYSICAL_PAGE_MASK	(PAGE_MASK & __PHYSICAL_MASK)
 #define PTE_MASK		(_AT(long, PHYSICAL_PAGE_MASK))
 
-#define LARGE_PAGE_SIZE 	(_AC(1,UL) << PMD_SHIFT)
-#define LARGE_PAGE_MASK 	(~(LARGE_PAGE_SIZE-1))
+#define PMD_PAGE_SIZE		(_AC(1, UL) << PMD_SHIFT)
+#define PMD_PAGE_MASK		(~(PMD_PAGE_SIZE-1))
 
 #define HPAGE_SHIFT		PMD_SHIFT
 #define HPAGE_SIZE		(_AC(1,UL) << HPAGE_SHIFT)
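
A standalone sketch (not part of the patch) of the alignment arithmetic the
renamed macros express; the shift values simply mirror x86-64's PMD_SHIFT
(21, 2MB) and PUD_SHIFT (30, 1GB), and the example address is arbitrary:

#include <stdio.h>

#define PMD_PAGE_SIZE (1UL << 21)		/* 2MB, was LARGE_PAGE_SIZE */
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE - 1))
#define PUD_PAGE_SIZE (1UL << 30)		/* 1GB, new in this series */
#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE - 1))

int main(void)
{
	unsigned long addr = 0x40301000UL;	/* arbitrary example address */

	printf("address           %#lx\n", addr);
	printf("2MB-aligned down  %#lx\n", addr & PMD_PAGE_MASK);
	printf("1GB-aligned down  %#lx\n", addr & PUD_PAGE_MASK);
	/* rounding up, as head_64.S does for the relocated kernel base */
	printf("2MB-aligned up    %#lx\n",
	       (addr + PMD_PAGE_SIZE - 1) & PMD_PAGE_MASK);
	return 0;
}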


* [PATCH] [4/8] Add pgtable accessor functions for GB pages
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
                   ` (2 preceding siblings ...)
  2008-01-19  6:48 ` [PATCH] [3/8] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19  6:48 ` [PATCH] [5/8] GBPAGES: Support gbpages in pagetable dump Andi Kleen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-x86/pgtable_64.h |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -208,6 +208,12 @@ static inline unsigned long pmd_bad(pmd_
 #define pud_offset(pgd, address) ((pud_t *) pgd_page_vaddr(*(pgd)) + pud_index(address))
 #define pud_present(pud) (pud_val(pud) & _PAGE_PRESENT)
 
+static inline int pud_large(pud_t pte)
+{
+	return (pud_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
+		(_PAGE_PSE|_PAGE_PRESENT);
+}
+
 /* PMD  - Level 2 access */
 #define pmd_page_vaddr(pmd) ((unsigned long) __va(pmd_val(pmd) & PTE_MASK))
 #define pmd_page(pmd)		(pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))


* [PATCH] [5/8] GBPAGES: Support gbpages in pagetable dump
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
                   ` (3 preceding siblings ...)
  2008-01-19  6:48 ` [PATCH] [4/8] Add pgtable accessor functions for GB pages Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19  6:48 ` [PATCH] [6/8] Add an option to disable direct mapping gbpages and a global variable Andi Kleen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/fault_64.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/fault_64.c
===================================================================
--- linux.orig/arch/x86/mm/fault_64.c
+++ linux/arch/x86/mm/fault_64.c
@@ -200,7 +200,8 @@ void dump_pagetable(unsigned long addres
 	pud = pud_offset(pgd, address);
 	if (bad_address(pud)) goto bad;
 	printk("PUD %lx ", pud_val(*pud));
-	if (!pud_present(*pud))	goto ret;
+	if (!pud_present(*pud) || pud_large(*pud))
+		goto ret;
 
 	pmd = pmd_offset(pud, address);
 	if (bad_address(pmd)) goto bad;


* [PATCH] [6/8] Add an option to disable direct mapping gbpages and a global variable
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
                   ` (4 preceding siblings ...)
  2008-01-19  6:48 ` [PATCH] [5/8] GBPAGES: Support gbpages in pagetable dump Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19  6:48 ` [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr() Andi Kleen
  2008-01-19  6:48 ` [PATCH] [8/8] GBPAGES: Do kernel direct mapping at boot using GB pages Andi Kleen
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
 Documentation/x86_64/boot-options.txt |    3 +++
 arch/x86/mm/init_64.c                 |   12 ++++++++++++
 include/asm-x86/pgtable_64.h          |    2 ++
 3 files changed, 17 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -57,6 +57,18 @@ static unsigned long dma_reserve __initd
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
+int direct_gbpages;
+
+static int __init parse_direct_gbpages(char *arg)
+{
+	if (!strcmp(arg, "off")) {
+		direct_gbpages = -1;
+		return 0;
+	}
+	return -1;
+}
+early_param("direct_gbpages", parse_direct_gbpages);
+
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
  * physical space so we can cache the place of the first one and move
Index: linux/include/asm-x86/pgtable_64.h
===================================================================
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -248,6 +248,8 @@ static inline int pud_large(pud_t pte)
 
 #define update_mmu_cache(vma,address,pte) do { } while (0)
 
+extern int direct_gbpages;
+
 /* Encode and de-code a swap entry */
 #define __swp_type(x)			(((x).val >> 1) & 0x3f)
 #define __swp_offset(x)			((x).val >> 8)
Index: linux/Documentation/x86_64/boot-options.txt
===================================================================
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -307,3 +307,6 @@ Debugging
 			stuck (default)
 
 Miscellaneous
+
+	direct_gbpages=off
+		Do not use GB pages for kernel direct mapping.
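
For reference, the new direct_gbpages variable ends up as a tri-state once
the last patch in the series is applied: -1 after direct_gbpages=off, 0 if
nothing has been decided yet, 1 once GB pages are actually selected. A
standalone sketch (not part of the patch) of that interpretation:

#include <stdio.h>

static int direct_gbpages;	/* 0 by default, -1 after "direct_gbpages=off" */

static void decide(int cpu_has_gbpages)
{
	/* mirrors the check init_memory_mapping() gains in patch 8/8 */
	if (direct_gbpages >= 0 && cpu_has_gbpages)
		direct_gbpages = 1;
	else
		direct_gbpages = 0;
}

int main(void)
{
	decide(1);
	printf("CPU has GB pages, no option given: %d\n", direct_gbpages);

	direct_gbpages = -1;	/* as if booted with direct_gbpages=off */
	decide(1);
	printf("CPU has GB pages, =off on cmdline: %d\n", direct_gbpages);
	return 0;
}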


* [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr()
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
                   ` (5 preceding siblings ...)
  2008-01-19  6:48 ` [PATCH] [6/8] Add an option to disable direct mapping gbpages and a global variable Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  2008-01-19 18:53   ` Ingo Molnar
  2008-01-19  6:48 ` [PATCH] [8/8] GBPAGES: Do kernel direct mapping at boot using GB pages Andi Kleen
  7 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


Teach c_p_a() to split and unsplit GB pages.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/pageattr_64.c |  150 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 119 insertions(+), 31 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -40,6 +40,9 @@ pte_t *lookup_address(unsigned long addr
 	pud = pud_offset(pgd, address);
 	if (!pud_present(*pud))
 		return NULL; 
+	*level = 2;
+	if (pud_large(*pud))
+		return (pte_t *)pud;
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return NULL; 
@@ -53,30 +56,85 @@ pte_t *lookup_address(unsigned long addr
 	return pte;
 } 
 
-static struct page *split_large_page(unsigned long address, pgprot_t prot,
-				     pgprot_t ref_prot)
-{ 
-	int i; 
+static pte_t *alloc_split_page(struct page **base)
+{
+	struct page *p = alloc_page(GFP_KERNEL);
+	if (!p)
+		return NULL;
+	SetPagePrivate(p);
+	page_private(p) = 0;
+	*base = p;
+	return page_address(p);
+}
+
+static struct page *free_split_page(struct page *base)
+{
+	BUG_ON(!PagePrivate(base));
+	BUG_ON(page_private(base) != 0);
+	ClearPagePrivate(base);
+	__free_page(base);
+	return NULL;
+}
+
+static struct page *
+split_pmd(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+	int i;
 	unsigned long addr;
-	struct page *base = alloc_pages(GFP_KERNEL, 0);
-	pte_t *pbase;
-	if (!base) 
+	struct page *base;
+	pte_t *pbase = alloc_split_page(&base);
+	if (!pbase)
 		return NULL;
-	/*
-	 * page_private is used to track the number of entries in
-	 * the page table page have non standard attributes.
-	 */
-	SetPagePrivate(base);
-	page_private(base) = 0;
 
-	address = __pa(address);
-	addr = address & PMD_PAGE_MASK;
-	pbase = (pte_t *)page_address(base);
-	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
-		pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
-				   addr == address ? prot : ref_prot);
+	addr = paddr & PMD_PAGE_MASK;
+	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
+		pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
+			   addr == paddr ? prot : ref_prot);
+
+	return base;
+}
+
+static struct page *
+split_gb(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+	unsigned long addr;
+	int i;
+	struct page *base;
+	pte_t *pbase = alloc_split_page(&base);
+
+	if (!pbase)
+		return NULL;
+	addr = paddr & PUD_PAGE_MASK;
+	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_PAGE_SIZE) {
+		if (paddr >= addr && paddr < addr + PMD_PAGE_SIZE) {
+			struct page *l3;
+			l3 = split_pmd(paddr, prot, ref_prot);
+			if (!l3)
+				return free_split_page(base);
+			page_private(l3)++;
+			pbase[i] = mk_pte(l3, ref_prot);
+		} else {
+			pbase[i] = pfn_pte(addr>>PAGE_SHIFT, ref_prot);
+			pbase[i] = pte_mkhuge(pbase[i]);
+		}
 	}
 	return base;
+}
+
+static struct page *split_large_page(unsigned long address, pgprot_t prot,
+				     pgprot_t ref_prot, int level)
+{
+	unsigned long paddr = __pa(address);
+	if (level == 2)
+		return split_gb(paddr, prot, ref_prot);
+	else if (level == 3)
+		return split_pmd(paddr, prot, ref_prot);
+	else {
+		printk("address %lx\n", address);
+		dump_pagetable(address);
+		BUG();
+	}
+	return NULL;
 } 
 
 struct flush_arg {
@@ -132,17 +190,40 @@ static inline void save_page(struct page
 		list_add(&fpage->lru, &deferred_pages);
 }
 
+static void reset_large_pte(pte_t *pte, unsigned long addr, pgprot_t prot)
+{
+	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
+	set_pte(pte, pte_mkhuge(pfn_pte(pfn, prot)));
+}
+
+static void
+revert_gb(unsigned long address, pud_t *pud, pmd_t *pmd, pgprot_t ref_prot)
+{
+	struct page *p = virt_to_page(pmd);
+
+	/* Reserved pages have been already set up at boot. Don't touch those. */
+	if (PageReserved(p))
+		return;
+
+	--page_private(p);
+	BUG_ON(page_private(p) < 0);
+	if (page_private(p) == 0) {
+		save_page(p);
+		reset_large_pte((pte_t *)pud, address & PUD_PAGE_MASK,
+					ref_prot);
+	}
+}
+
 /* 
  * No more special protections in this 2MB area - revert to a
- * large page again. 
+ * large or GB page again.
  */
+
 static void revert_page(unsigned long address, pgprot_t ref_prot)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t large_pte;
-	unsigned long pfn;
 
 	pgd = pgd_offset_k(address);
 	BUG_ON(pgd_none(*pgd));
@@ -150,10 +231,9 @@ static void revert_page(unsigned long ad
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, address);
 	BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
-	pfn = (__pa(address) & PMD_PAGE_MASK) >> PAGE_SHIFT;
-	large_pte = pfn_pte(pfn, ref_prot);
-	large_pte = pte_mkhuge(large_pte);
-	set_pte((pte_t *)pmd, large_pte);
+	reset_large_pte((pte_t *)pmd, address & PMD_PAGE_MASK, ref_prot);
+
+	revert_gb(address, pud, pmd, ref_prot);
 }      
 
 /*
@@ -189,6 +269,7 @@ static void set_tlb_flush(unsigned long 
 static const unsigned short pat_bit[5] = {
 	[4] = _PAGE_PAT,
 	[3] = _PAGE_PAT_LARGE,
+	[2] = _PAGE_PAT_LARGE,
 };
 
 static int cache_attr_changed(pte_t pte, pgprot_t prot, int level)
@@ -228,15 +309,14 @@ __change_page_attr(unsigned long address
 				page_private(kpte_page)++;
 			set_pte(kpte, pfn_pte(pfn, prot));
 		} else {
-			/*
-			 * split_large_page will take the reference for this
-			 * change_page_attr on the split page.
-			 */
 			struct page *split;
 			ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
-			split = split_large_page(address, prot, ref_prot2);
+			split = split_large_page(address, prot, ref_prot2,
+						level);
 			if (!split)
 				return -ENOMEM;
+			if (level == 3 && !PageReserved(kpte_page))
+				page_private(kpte_page)++;
 			pgprot_val(ref_prot2) &= ~_PAGE_NX;
 			set_pte(kpte, mk_pte(split, ref_prot2));
 			kpte_page = split;
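
The index arithmetic behind split_gb()/split_pmd() above, as a standalone
sketch (not part of the patch): which 2MB slot of a 1GB page and which 4KB
slot of a 2MB page a target physical address falls into. The constants
mirror x86-64 (PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30, 512 entries per
table); the address is an arbitrary example:

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PTRS_PER_TABLE	512

int main(void)
{
	unsigned long paddr = 0x4a37f000UL;	/* arbitrary example */

	unsigned long pmd_idx = (paddr >> PMD_SHIFT) & (PTRS_PER_TABLE - 1);
	unsigned long pte_idx = (paddr >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1);

	printf("paddr %#lx lies in the 1GB page starting at %#lx\n",
	       paddr, paddr & ~((1UL << PUD_SHIFT) - 1));
	printf("  split_gb replaces 2MB slot %lu with a PMD table,\n", pmd_idx);
	printf("  split_pmd applies the new protection to 4KB slot %lu\n",
	       pte_idx);
	return 0;
}

The other 511 2MB slots of the split 1GB page keep ref_prot and stay huge,
just as split_pmd leaves its other 511 4KB entries at ref_prot.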


* [PATCH] [8/8] GBPAGES: Do kernel direct mapping at boot using GB pages
  2008-01-19  6:48 [PATCH] [0/8] GBpages support for x86-64, v2 Andi Kleen
                   ` (6 preceding siblings ...)
  2008-01-19  6:48 ` [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr() Andi Kleen
@ 2008-01-19  6:48 ` Andi Kleen
  7 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19  6:48 UTC (permalink / raw)
  To: mingo, tglx, linux-kernel


This should decrease TLB pressure because the kernel will take
fewer TLB misses for its own data accesses.

Only done for 64bit because i386 does not support GB page tables.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends 
against using GB mappings for code.

Can be disabled with direct_gbpages=off

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/mm/init_64.c |   63 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 54 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -268,13 +268,20 @@ void early_iounmap(void *addr, unsigned 
 	__flush_tlb();
 }
 
+static unsigned long direct_entry(unsigned long paddr)
+{
+	unsigned long entry;
+	entry = __PAGE_KERNEL_LARGE|paddr;
+	entry &= __supported_pte_mask;
+	return entry;
+}
+
 static void __meminit
 phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
 {
 	int i = pmd_index(address);
 
 	for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
-		unsigned long entry;
 		pmd_t *pmd = pmd_page + pmd_index(address);
 
 		if (address >= end) {
@@ -287,9 +294,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
 		if (pmd_val(*pmd))
 			continue;
 
-		entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
-		entry &= __supported_pte_mask;
-		set_pmd(pmd, __pmd(entry));
+		set_pmd(pmd, __pmd(direct_entry(address)));
 	}
 }
 
@@ -317,7 +322,13 @@ static void __meminit phys_pud_init(pud_
 			break;
 
 		if (pud_val(*pud)) {
-			phys_pmd_update(pud, addr, end);
+			if (!pud_large(*pud))
+				phys_pmd_update(pud, addr, end);
+			continue;
+		}
+
+		if (direct_gbpages > 0) {
+			set_pud(pud, __pud(direct_entry(addr)));
 			continue;
 		}
 
@@ -336,9 +347,11 @@ static void __init find_early_table_spac
 	unsigned long puds, pmds, tables, start;
 
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-	pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-	tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
-		 round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+	tables = round_up(puds * sizeof(pud_t), PAGE_SIZE);
+	if (!direct_gbpages) {
+		pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+		tables += round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+	}
 
  	/* RED-PEN putting page tables only on node 0 could
  	   cause a hotspot and fill up ZONE_DMA. The page tables
@@ -373,8 +386,15 @@ void __init_refok init_memory_mapping(un
 	 * mapped.  Unfortunately this is done currently before the nodes are 
 	 * discovered.
 	 */
-	if (!after_bootmem)
+	if (!after_bootmem) {
+		if (direct_gbpages >= 0 && cpu_has_gbpages) {
+			printk(KERN_INFO "Using GB pages for direct mapping\n");
+			direct_gbpages = 1;
+		} else
+			direct_gbpages = 0;
+
 		find_early_table_space(end);
+	}
 
 	start = (unsigned long)__va(start);
 	end = (unsigned long)__va(end);
@@ -423,6 +443,27 @@ void __init paging_init(void)
 }
 #endif
 
+static void split_gb_page(pud_t *pud, unsigned long paddr)
+{
+	int i;
+	pmd_t *pmd;
+	struct page *p = alloc_page(GFP_KERNEL);
+	if (!p)
+		return;
+
+	Dprintk("split_gb_page %lx\n", paddr);
+
+	SetPagePrivate(p);
+	/* Set reference to 1 so that c_p_a() does not undo it */
+	page_private(p) = 1;
+
+	paddr &= PUD_PAGE_MASK;
+	pmd = page_address(p);
+	for (i = 0; i < PTRS_PER_PTE; i++, paddr += PMD_PAGE_SIZE)
+		pmd[i] = __pmd(direct_entry(paddr));
+	pud_populate(NULL, pud, pmd);
+}
+
 /* Unmap a kernel mapping if it exists. This is useful to avoid prefetches
    from the CPU leading to inconsistent cache lines. address and size
    must be aligned to 2MB boundaries. 
@@ -434,6 +475,8 @@ __clear_kernel_mapping(unsigned long add
 
 	BUG_ON(address & ~PMD_PAGE_MASK);
 	BUG_ON(size & ~PMD_PAGE_MASK);
+
+	Dprintk("clear_kernel_mapping %lx-%lx\n", address, address+size);
 	
 	for (; address < end; address += PMD_PAGE_SIZE) {
 		pgd_t *pgd = pgd_offset_k(address);
@@ -442,6 +485,8 @@ __clear_kernel_mapping(unsigned long add
 		if (pgd_none(*pgd))
 			continue;
 		pud = pud_offset(pgd, address);
+		if (pud_large(*pud))
+			split_gb_page(pud, __pa(address));
 		if (pud_none(*pud))
 			continue; 
 		pmd = pmd_offset(pud, address);
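
A rough, standalone sketch (not part of the patch) of the early page-table
space the modified find_early_table_space() no longer has to reserve when GB
pages are used; 64GB is just an example size, and sizeof(pud_t) ==
sizeof(pmd_t) == 8 on x86-64:

#include <stdio.h>

#define PAGE_SIZE	(4UL << 10)
#define PMD_SIZE	(2UL << 20)	/* one PMD entry maps 2MB */
#define PUD_SIZE	(1UL << 30)	/* one PUD entry maps 1GB */

static unsigned long round_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	unsigned long end = 64UL << 30;		/* map 64GB, for example */
	unsigned long puds = (end + PUD_SIZE - 1) / PUD_SIZE;
	unsigned long pmds = (end + PMD_SIZE - 1) / PMD_SIZE;

	unsigned long with_gb    = round_up(puds * 8, PAGE_SIZE);
	unsigned long without_gb = with_gb + round_up(pmds * 8, PAGE_SIZE);

	printf("early tables for 64GB: %lu KB with GB pages, %lu KB without\n",
	       with_gb >> 10, without_gb >> 10);
	return 0;
}

On top of the smaller tables, each GB mapping takes a single data-TLB entry
where 512 2MB entries would otherwise be needed, which is where the reduced
TLB pressure mentioned above comes from.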


* Re: [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr()
  2008-01-19  6:48 ` [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr() Andi Kleen
@ 2008-01-19 18:53   ` Ingo Molnar
  2008-01-19 19:27     ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2008-01-19 18:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: tglx, linux-kernel, H. Peter Anvin


* Andi Kleen <ak@suse.de> wrote:

>  arch/x86/mm/pageattr_64.c | 150 
>  ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 119 insertions(+), 31 deletions(-)

Please unify the files first; we don't want to let pageattr_32.c and
pageattr_64.c diverge even more. Once we get these files unified we can
layer more features on top of them. While gbpages is not available on
32-bit and probably won't ever be, this code has historically been very
fragile, so having a single codebase to look at is very important.

	Ingo


* Re: [PATCH] [7/8] CPA: Implement GBpages support in change_page_attr()
  2008-01-19 18:53   ` Ingo Molnar
@ 2008-01-19 19:27     ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-19 19:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: tglx, linux-kernel, H. Peter Anvin

On Saturday 19 January 2008 19:53:14 Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> >  arch/x86/mm/pageattr_64.c | 150 
> >  ++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 119 insertions(+), 31 deletions(-)
> 
> Please unify the files first; we don't want to let pageattr_32.c and
> pageattr_64.c diverge even more. Once we get these files unified we can
> layer more features on top of them.

They work significantly differently in a few important areas, particularly
regarding NX handling and the kernel mapping. Off the top of my head I don't
know of a clean way to unify them; the 32-bit and 64-bit kernel mappings
differ in many significant ways. Maybe it's possible, but it's certainly not
something I would want to tackle short term in a hurry.

However, I would definitely like GB pages to make the .25 merge window,
and they won't work without the upgraded c_p_a().

So please reconsider.

> While gbpages is not available on
> 32-bit and probably won't ever be, this code has historically been very
> fragile,

I'm sure a hurried up unification would make it even more fragile.

-Andi


* Re: [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit
  2008-01-19  6:48 ` [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit Andi Kleen
@ 2008-01-23 21:26   ` Jan Engelhardt
  2008-01-24  6:57     ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2008-01-23 21:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: mingo, tglx, linux-kernel


On Jan 19 2008 07:48, Andi Kleen wrote:
>Subject: [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit

Is there already a flag for /proc/cpuinfo or could you add one?

>Index: linux/include/asm-x86/cpufeature.h
>===================================================================
>--- linux.orig/include/asm-x86/cpufeature.h
>+++ linux/include/asm-x86/cpufeature.h
>@@ -49,6 +49,7 @@
> #define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
> #define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
> #define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
>+#define X86_FEATURE_GBPAGES	(1*32+26) /* GB pages */
> #define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
> #define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
> #define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */


* Re: [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit
  2008-01-23 21:26   ` Jan Engelhardt
@ 2008-01-24  6:57     ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2008-01-24  6:57 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: mingo, tglx, linux-kernel

On Wednesday 23 January 2008 22:26:35 Jan Engelhardt wrote:
> On Jan 19 2008 07:48, Andi Kleen wrote:
> >Subject: [PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid
> > bit
>
> Is there already a flag for /proc/cpuinfo or could you add one?

There is already one, called pdpe1gb. I don't think it's a very clear name,
although it is what AMD calls it. Calling it gbpages in /proc/cpuinfo would
probably have been better (and my original patch did that), but I didn't
catch the new name, submitted by someone else, in time.

-Andi


