LKML Archive on lore.kernel.org
* [patch 00/11] PAT x86: PAT support for x86
@ 2008-01-10 18:48 venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 01/11] PAT x86: Make acpi/other drivers map memory instead of assuming identity map venkatesh.pallipadi
                   ` (10 more replies)
  0 siblings, 11 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel

This series is heavily derived from the PAT patchset by Eric Biederman and
Andi Kleen.
http://www.firstfloor.org/pub/ak/x86_64/pat/

This patchset is a followup of "PAT support for X86_64"
http://www.ussg.iu.edu/hypermail/linux/kernel/0712.1/2268.html

Changes from the above (Dec 13 2007) version:
* The PAT mappings now used are: (0,WB) (1,WT) (2,WC) (3,UC).
* Covers both i386 and x86_64.
* Resolve the /sysfs issue by exporting wc and uc interfaces.
* Piggyback PAT initialization on the existing MTRR initialization, as
  they have the same setup rules.
* Avoid the early table allocation problem on x86_64 by doing the
  reserved region pruning later in the boot. Handle both the memory
  identity mapping and the kernel text mapping.
* Handle fork() and /dev/mem mapping and unmapping cases.

The patchset is against Ingo's x86 branch from 2 days ago. It will need
some merging effort with Andi's CPA changes and a few other changes,
such as the pgtable.h unification.

Not mapping reserved regions in the identity map is a fairly big change
that can potentially have side effects on drivers (especially on x86_64)
that assume the entire address range is mapped in the identity map and
use __va() to access reserved regions instead of ioremap()/early_ioremap().
We have changed a few such common cases, but there may be more in
drivers/ land.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

-- 


* [patch 01/11] PAT x86: Make acpi/other drivers map memory instead of assuming identity map
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text venkatesh.pallipadi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: map_instead_of_va.patch --]
[-- Type: text/plain, Size: 14407 bytes --]



This patch:

Some boot code assumes that the entire memory is mapped in the identity
mapping. Fix those places to use an explicit mapping instead. Places fixed:
* Generic __acpi_map_table
* Looking for the RSD PTR at boot time
* Looking for the MP table
* get_bios_ebda and the EBDA size
* pci-calgary (Compile tested only. It would be great if someone who has
               this hardware could verify that this change works fine)
              (This patch is testable as a standalone patch)

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

The patchset is against Ingo's x86 branch from 2 days ago. It will need
some merging effort with Andi's CPA changes and a few other changes,
such as the pgtable.h unification.


Index: linux-2.6.git/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/acpi/boot.c	2008-01-08 03:31:31.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/acpi/boot.c	2008-01-08 03:43:46.000000000 -0800
@@ -105,16 +105,20 @@
 
 #ifdef	CONFIG_X86_64
 
-/* rely on all ACPI tables being in the direct mapping */
 char *__acpi_map_table(unsigned long phys_addr, unsigned long size)
 {
 	if (!phys_addr || !size)
 		return NULL;
 
-	if (phys_addr+size <= (end_pfn_map << PAGE_SHIFT) + PAGE_SIZE)
-		return __va(phys_addr);
+	return early_ioremap(phys_addr, size);
+}
 
-	return NULL;
+void __acpi_unmap_table(void * addr, unsigned long size)
+{
+	if (!addr || !size)
+		return;
+
+	early_iounmap(addr, size);
 }
 
 #else
@@ -158,6 +162,11 @@
 
 	return ((unsigned char *)base + offset);
 }
+
+void __acpi_unmap_table(void * addr, unsigned long size)
+{
+}
+
 #endif
 
 #ifdef CONFIG_PCI_MMCONFIG
@@ -586,17 +595,23 @@
 {
 	unsigned long offset = 0;
 	unsigned long sig_len = sizeof("RSD PTR ") - 1;
+	char * virt_addr;
 
+	virt_addr = __acpi_map_table(start, length);
+	if (!virt_addr)
+		return 0;
 	/*
 	 * Scan all 16-byte boundaries of the physical memory region for the
 	 * RSDP signature.
 	 */
 	for (offset = 0; offset < length; offset += 16) {
-		if (strncmp((char *)(phys_to_virt(start) + offset), "RSD PTR ", sig_len))
+		if (strncmp(virt_addr + offset, "RSD PTR ", sig_len))
 			continue;
+		__acpi_unmap_table(virt_addr, length);
 		return (start + offset);
 	}
 
+	__acpi_unmap_table(virt_addr, length);
 	return 0;
 }
 
Index: linux-2.6.git/drivers/acpi/osl.c
===================================================================
--- linux-2.6.git.orig/drivers/acpi/osl.c	2008-01-08 03:31:31.000000000 -0800
+++ linux-2.6.git/drivers/acpi/osl.c	2008-01-08 03:43:46.000000000 -0800
@@ -231,6 +231,8 @@
 {
 	if (acpi_gbl_permanent_mmap) {
 		iounmap(virt);
+	} else {
+		__acpi_unmap_table(virt, size);
 	}
 }
 EXPORT_SYMBOL_GPL(acpi_os_unmap_memory);
Index: linux-2.6.git/include/linux/acpi.h
===================================================================
--- linux-2.6.git.orig/include/linux/acpi.h	2008-01-08 03:31:38.000000000 -0800
+++ linux-2.6.git/include/linux/acpi.h	2008-01-08 03:43:46.000000000 -0800
@@ -79,6 +79,7 @@
 typedef int (*acpi_table_entry_handler) (struct acpi_subtable_header *header, const unsigned long end);
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
+void __acpi_unmap_table (void * addr, unsigned long size);
 unsigned long acpi_find_rsdp (void);
 int acpi_boot_init (void);
 int acpi_boot_table_init (void);
Index: linux-2.6.git/arch/x86/kernel/mpparse_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/mpparse_64.c	2008-01-08 03:31:31.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/mpparse_64.c	2008-01-08 03:43:46.000000000 -0800
@@ -29,6 +29,7 @@
 #include <asm/io_apic.h>
 #include <asm/proto.h>
 #include <asm/acpi.h>
+#include <asm/bios_ebda.h>
 
 /* Have we found an MP table */
 int smp_found_config;
@@ -535,9 +536,12 @@
 static int __init smp_scan_config (unsigned long base, unsigned long length)
 {
 	extern void __bad_mpf_size(void); 
-	unsigned int *bp = phys_to_virt(base);
+	unsigned int *bp = (unsigned int *)__acpi_map_table(base, length);
 	struct intel_mp_floating *mpf;
 
+	if (!bp)
+		return 0;
+
 	Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
 	if (sizeof(*mpf) != 16)
 		__bad_mpf_size();
@@ -555,11 +559,13 @@
 			if (mpf->mpf_physptr)
 				reserve_bootmem_generic(mpf->mpf_physptr, PAGE_SIZE);
 			mpf_found = mpf;
+			__acpi_unmap_table((char *)bp, length);
 			return 1;
 		}
 		bp += 4;
 		length -= 16;
 	}
+	__acpi_unmap_table((char *)bp, length);
 	return 0;
 }
 
@@ -592,11 +598,11 @@
 	 * should be fixed.
 	 */
 
-	address = *(unsigned short *)phys_to_virt(0x40E);
-	address <<= 4;
-	if (smp_scan_config(address, 0x1000))
+	address = get_bios_ebda();
+	if (address && smp_scan_config(address, 0x1000))
 		return;
 
+
 	/* If we have come this far, we did not find an MP table  */
 	 printk(KERN_INFO "No mptable found.\n");
 }
Index: linux-2.6.git/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/init_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/init_64.c	2008-01-08 03:43:46.000000000 -0800
@@ -208,7 +208,7 @@
 } 
 
 /* Must run before zap_low_mappings */
-__meminit void *early_ioremap(unsigned long addr, unsigned long size)
+void *early_ioremap(unsigned long addr, unsigned long size)
 {
 	unsigned long vaddr;
 	pmd_t *pmd, *last_pmd;
@@ -237,7 +237,7 @@
 }
 
 /* To avoid virtual aliases later */
-__meminit void early_iounmap(void *addr, unsigned long size)
+void early_iounmap(void *addr, unsigned long size)
 {
 	unsigned long vaddr;
 	pmd_t *pmd;
Index: linux-2.6.git/include/asm-x86/rio.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/rio.h	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/include/asm-x86/rio.h	2008-01-08 03:43:46.000000000 -0800
@@ -8,6 +8,8 @@
 #ifndef __ASM_RIO_H
 #define __ASM_RIO_H
 
+#include <asm/bios_ebda.h>
+
 #define RIO_TABLE_VERSION	3
 
 struct rio_table_hdr {
@@ -60,15 +62,4 @@
 	ALT_CALGARY	= 5,  /* Second Planar Calgary      */
 };
 
-/*
- * there is a real-mode segmented pointer pointing to the
- * 4K EBDA area at 0x40E.
- */
-static inline unsigned long get_bios_ebda(void)
-{
-	unsigned long address = *(unsigned short *)phys_to_virt(0x40EUL);
-	address <<= 4;
-	return address;
-}
-
 #endif /* __ASM_RIO_H */
Index: linux-2.6.git/arch/x86/kernel/mpparse_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/mpparse_32.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/mpparse_32.c	2008-01-08 03:43:46.000000000 -0800
@@ -27,11 +27,11 @@
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
 #include <asm/io_apic.h>
+#include <asm/bios_ebda.h>
 
 #include <mach_apic.h>
 #include <mach_apicdef.h>
 #include <mach_mpparse.h>
-#include <bios_ebda.h>
 
 /* Have we found an MP table */
 int smp_found_config;
@@ -718,9 +718,12 @@
 
 static int __init smp_scan_config (unsigned long base, unsigned long length)
 {
-	unsigned long *bp = phys_to_virt(base);
+	unsigned long *bp = (unsigned long *)__acpi_map_table(base, length);
 	struct intel_mp_floating *mpf;
 
+	if (!bp)
+		return 0;
+
 	Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
 	if (sizeof(*mpf) != 16)
 		printk("Error: MPF size\n");
@@ -755,11 +758,13 @@
 			}
 
 			mpf_found = mpf;
+			__acpi_unmap_table((char *)bp, length);
 			return 1;
 		}
 		bp += 4;
 		length -= 16;
 	}
+	__acpi_unmap_table((char *)bp, length);
 	return 0;
 }
 
Index: linux-2.6.git/arch/x86/kernel/pci-calgary_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/pci-calgary_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/pci-calgary_64.c	2008-01-08 03:43:46.000000000 -0800
@@ -1180,6 +1180,7 @@
 		}
 	}
 
+	early_iounmap((void *)rio_table_hdr, sizeof(struct rio_table_hdr));
 	return 0;
 
 error:
@@ -1188,6 +1189,7 @@
 		if (bus_info[bus].bbar)
 			iounmap(bus_info[bus].bbar);
 
+	early_iounmap((void *)rio_table_hdr, sizeof(struct rio_table_hdr));
 	return ret;
 }
 
@@ -1337,7 +1339,8 @@
 	int bus;
 	void *tbl;
 	int calgary_found = 0;
-	unsigned long ptr;
+	unsigned long addr;
+	unsigned short *ptr;
 	unsigned int offset, prev_offset;
 	int ret;
 
@@ -1356,7 +1359,9 @@
 
 	printk(KERN_DEBUG "Calgary: detecting Calgary via BIOS EBDA area\n");
 
-	ptr = (unsigned long)phys_to_virt(get_bios_ebda());
+	addr = get_bios_ebda();
+	if (!addr)
+		return;
 
 	rio_table_hdr = NULL;
 	prev_offset = 0;
@@ -1366,14 +1371,22 @@
 	 * Only parse up until the offset increases:
 	 */
 	while (offset > prev_offset) {
+		ptr = early_ioremap(addr + offset, 4);
+		if (!ptr)
+			break;
+
 		/* The block id is stored in the 2nd word */
-		if (*((unsigned short *)(ptr + offset + 2)) == 0x4752){
+		if (ptr[1] == 0x4752){
+			early_iounmap(ptr, 4);
 			/* set the pointer past the offset & block id */
-			rio_table_hdr = (struct rio_table_hdr *)(ptr + offset + 4);
+			ptr = early_ioremap(addr + offset + 4,
+			              sizeof(struct rio_table_hdr));
+			rio_table_hdr = (struct rio_table_hdr *)ptr;
 			break;
 		}
 		prev_offset = offset;
-		offset = *((unsigned short *)(ptr + offset));
+		offset = ptr[0];
+		early_iounmap(ptr, 4);
 	}
 	if (!rio_table_hdr) {
 		printk(KERN_DEBUG "Calgary: Unable to locate Rio Grande table "
@@ -1384,6 +1397,8 @@
 	ret = build_detail_arrays();
 	if (ret) {
 		printk(KERN_DEBUG "Calgary: build_detail_arrays ret %d\n", ret);
+		early_iounmap((void *)rio_table_hdr,
+		              sizeof(struct rio_table_hdr));
 		return;
 	}
 
@@ -1423,6 +1438,10 @@
 		printk(KERN_INFO "PCI-DMA: Calgary TCE table spec is %d, "
 		       "CONFIG_IOMMU_DEBUG is %s.\n", specified_table_size,
 		       debugging ? "enabled" : "disabled");
+		/* rio_table_hdr will be unmapped in calgary_locate_bbars() */
+	} else {
+		early_iounmap((void *)rio_table_hdr,
+		              sizeof(struct rio_table_hdr));
 	}
 	return;
 
@@ -1433,6 +1452,7 @@
 		if (info->tce_space)
 			free_tce_table(info->tce_space);
 	}
+	early_iounmap((void *)rio_table_hdr, sizeof(struct rio_table_hdr));
 }
 
 int __init calgary_iommu_init(void)
Index: linux-2.6.git/arch/x86/kernel/setup_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/setup_32.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/setup_32.c	2008-01-08 03:43:46.000000000 -0800
@@ -61,7 +61,7 @@
 #include <asm/io.h>
 #include <asm/vmi.h>
 #include <setup_arch.h>
-#include <bios_ebda.h>
+#include <asm/bios_ebda.h>
 #include <asm/cacheflush.h>
 
 /* This value is set up by the early boot code to point to the value
Index: linux-2.6.git/arch/x86/kernel/setup_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/setup_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/setup_64.c	2008-01-08 03:47:57.000000000 -0800
@@ -62,6 +62,7 @@
 #include <asm/sections.h>
 #include <asm/dmi.h>
 #include <asm/cacheflush.h>
+#include <asm/bios_ebda.h>
 #include <asm/mce.h>
 #include <asm/ds.h>
 
@@ -243,31 +244,36 @@
 {}
 #endif
 
-#define EBDA_ADDR_POINTER 0x40E
-
 unsigned __initdata ebda_addr;
 unsigned __initdata ebda_size;
 
 static void discover_ebda(void)
 {
+	unsigned short *ptr;
 	/*
 	 * there is a real-mode segmented pointer pointing to the
 	 * 4K EBDA area at 0x40E
 	 */
-	ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
 	/*
 	 * There can be some situations, like paravirtualized guests,
 	 * in which there is no available ebda information. In such
 	 * case, just skip it
 	 */
+
+	ebda_addr = get_bios_ebda();
 	if (!ebda_addr) {
 		ebda_size = 0;
 		return;
 	}
 
-	ebda_addr <<= 4;
-
-	ebda_size = *(unsigned short *)__va(ebda_addr);
+	ptr = (unsigned short *)__acpi_map_table(ebda_addr, 2);
+	if (!ptr) {
+		ebda_addr = 0;
+		ebda_size = 0;
+		return;
+	}
+	ebda_size = *(unsigned short *)ptr;
+	__acpi_unmap_table((char *)ptr, 2);
 
 	/* Round EBDA up to pages */
 	if (ebda_size == 0)
Index: linux-2.6.git/include/asm-x86/bios_ebda.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/include/asm-x86/bios_ebda.h	2008-01-08 03:43:46.000000000 -0800
@@ -0,0 +1,26 @@
+
+#ifndef _BIOS_EBDA_H
+#define _BIOS_EBDA_H
+
+#include <linux/acpi.h>
+
+/*
+ * there is a real-mode segmented pointer pointing to the
+ * 4K EBDA area at 0x40E.
+ */
+static inline unsigned long get_bios_ebda(void)
+{
+	unsigned short	*bp;
+	unsigned long address;
+	bp = (unsigned short *)__acpi_map_table(0x40EUL, 2);
+	if (!bp)
+		return 0;
+
+	address = *bp;
+	address <<= 4;
+	__acpi_unmap_table((char *)bp, 2);
+
+	return address;
+}
+
+#endif /* _MACH_BIOS_EBDA_H */
Index: linux-2.6.git/include/asm-x86/mach-default/bios_ebda.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/mach-default/bios_ebda.h	2008-01-08 03:31:38.000000000 -0800
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,15 +0,0 @@
-#ifndef _MACH_BIOS_EBDA_H
-#define _MACH_BIOS_EBDA_H
-
-/*
- * there is a real-mode segmented pointer pointing to the
- * 4K EBDA area at 0x40E.
- */
-static inline unsigned int get_bios_ebda(void)
-{
-	unsigned int address = *(unsigned short *)phys_to_virt(0x40E);
-	address <<= 4;
-	return address;	/* 0 means none */
-}
-
-#endif /* _MACH_BIOS_EBDA_H */

-- 


* [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 01/11] PAT x86: Make acpi/other drivers map memory instead of assuming identity map venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:06   ` Andi Kleen
  2008-01-10 21:05   ` Linus Torvalds
  2008-01-10 18:48 ` [patch 03/11] PAT x86: Map only usable memory in i386 identity map venkatesh.pallipadi
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: usable_only_map.patch --]
[-- Type: text/plain, Size: 12057 bytes --]

x86_64: Map only usable memory in the identity map; all reserved memory
maps to a zero page. This is done later in the boot process, by pruning
the page tables set up earlier to remove mappings for the reserved
regions. The pruning is done after mem_init(), so that we can allocate
pages as needed, and before the APs start.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Index: linux-2.6.git/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/e820_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/e820_64.c	2008-01-08 04:00:59.000000000 -0800
@@ -121,6 +121,35 @@
 }
 EXPORT_SYMBOL_GPL(e820_any_mapped);
 
+int e820_any_non_reserved(unsigned long start, unsigned long end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_any_non_reserved);
+
+int is_memory_any_valid(unsigned long start, unsigned long end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_any_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_any_valid);
+
 /*
  * This function checks if the entire range <start,end> is mapped with type.
  *
@@ -156,6 +185,47 @@
 	return 0;
 }
 
+int e820_all_non_reserved(unsigned long start, unsigned long end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+
+		/* is the region (part) in overlap with the current region ?*/
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+
+		/*
+		 * if the region is at the beginning of <start,end> we move
+		 * start to the end of the region since it's ok until there
+		 */
+		if (ei->addr <= start)
+			start = ei->addr + ei->size;
+
+		/* if start is at or beyond end, we're done, full coverage */
+		if (start >= end)
+			return 1; /* we're done */
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_all_non_reserved);
+
+int is_memory_all_valid(unsigned long start, unsigned long end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+ 	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_all_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_all_valid);
+
 /*
  * Find a free area in a specific range.
  */
Index: linux-2.6.git/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/init_64.c	2008-01-08 03:43:46.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/init_64.c	2008-01-08 03:59:28.000000000 -0800
@@ -215,8 +215,9 @@
 	int i, pmds;
 
 	pmds = ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
-	vaddr = __START_KERNEL_map;
-	pmd = level2_kernel_pgt;
+	/* Skip PMDs meant for kernel text */
+	vaddr = __START_KERNEL_map + KERNEL_TEXT_SIZE;
+	pmd = level2_kernel_pgt + (KERNEL_TEXT_SIZE / PMD_SIZE);
 	last_pmd = level2_kernel_pgt + PTRS_PER_PMD - 1;
 	for (; pmd <= last_pmd; pmd++, vaddr += PMD_SIZE) {
 		for (i = 0; i < pmds; i++) {
@@ -299,11 +300,6 @@
 		if (addr >= end)
 			break;
 
-		if (!after_bootmem && !e820_any_mapped(addr,addr+PUD_SIZE,0)) {
-			set_pud(pud, __pud(0)); 
-			continue;
-		} 
-
 		if (pud_val(*pud)) {
 			phys_pmd_update(pud, addr, end);
 			continue;
@@ -344,6 +340,8 @@
 		(table_start << PAGE_SHIFT) + tables);
 }
 
+static unsigned long max_addr;
+
 /* Setup the direct mapping of the physical memory at PAGE_OFFSET.
    This runs before bootmem is initialized and gets pages directly from the 
    physical memory. To access them they are temporarily mapped. */
@@ -370,10 +368,13 @@
 		pgd_t *pgd = pgd_offset_k(start);
 		pud_t *pud;
 
-		if (after_bootmem)
+		if (after_bootmem) {
 			pud = pud_offset(pgd, start & PGDIR_MASK);
-		else
+		} else {
 			pud = alloc_low_page(&pud_phys);
+			if (end > max_addr)
+				max_addr = end;
+		}
 
 		next = start + PGDIR_SIZE;
 		if (next > end) 
@@ -489,6 +490,187 @@
 static struct kcore_list kcore_mem, kcore_vmalloc, kcore_kernel, kcore_modules,
 			 kcore_vsyscall;
 
+
+static unsigned long __init get_res_page(void)
+{
+	static unsigned long res_phys_page;
+	if (!res_phys_page) {
+		pte_t *pte;
+		pte = alloc_low_page(&res_phys_page);
+		unmap_low_page(pte);
+	}
+	return res_phys_page;
+}
+
+static unsigned long __init get_res_ptepage(void)
+{
+	static unsigned long res_phys_ptepage;
+	if (!res_phys_ptepage) {
+		pte_t *pte_page;
+		unsigned long page_phys;
+		unsigned long entry;
+		int i;
+
+		pte_page = alloc_low_page(&res_phys_ptepage);
+
+		page_phys = get_res_page();
+		entry = _PAGE_NX | _KERNPG_TABLE | _PAGE_GLOBAL | page_phys;
+		entry &= __supported_pte_mask;
+		for (i = 0; i < PTRS_PER_PTE; i++) {
+			pte_t *pte = pte_page + i;
+			set_pte(pte, __pte(entry));
+		}
+
+		unmap_low_page(pte_page);
+	}
+	return res_phys_ptepage;
+}
+
+static void __init phys_pte_prune(pte_t *pte_page, unsigned long address,
+		unsigned long end, unsigned long vaddr, unsigned int exec)
+{
+	int i = pte_index(vaddr);
+
+	for (; i < PTRS_PER_PTE; i++, address = (address & PAGE_MASK) + PAGE_SIZE, vaddr = (vaddr & PAGE_MASK) + PAGE_SIZE) {
+		unsigned long entry;
+		pte_t *pte = pte_page + i;
+
+		if (address >= end)
+			break;
+
+		if (pte_val(*pte))
+			continue;
+
+		/* Nothing to map. Map the null page */
+		if (!(address & (~PAGE_MASK)) &&
+		    (address + PAGE_SIZE <= end) &&
+		    !is_memory_any_valid(address, address + PAGE_SIZE)) {
+			unsigned long phys_page;
+
+			phys_page = get_res_page();
+			entry = _PAGE_NX | _KERNPG_TABLE | _PAGE_GLOBAL |
+				phys_page;
+
+			entry &= __supported_pte_mask;
+			set_pte(pte, __pte(entry));
+
+			continue;
+		}
+
+		if (exec)
+			entry = _PAGE_NX|_KERNPG_TABLE|_PAGE_GLOBAL|address;
+		else
+			entry = _KERNPG_TABLE|_PAGE_GLOBAL|address;
+		entry &= __supported_pte_mask;
+		set_pte(pte, __pte(entry));
+	}
+}
+
+static void __init phys_pmd_prune(pmd_t *pmd_page, unsigned long address,
+		unsigned long end, unsigned long vaddr, unsigned int exec)
+{
+	int i = pmd_index(vaddr);
+
+	for (; i < PTRS_PER_PMD; i++, address = (address & PMD_MASK) + PMD_SIZE,
+			vaddr = (vaddr & PMD_MASK) + PMD_SIZE) {
+		pmd_t *pmd = pmd_page + i;
+		pte_t *pte;
+		unsigned long pte_phys;
+
+		if (address >= end)
+			break;
+
+		if (!pmd_val(*pmd))
+			continue;
+
+		/* Nothing to map. Map the null page */
+		if (!(address & (~PMD_MASK)) &&
+		    (address + PMD_SIZE <= end) &&
+		    !is_memory_any_valid(address, address + PMD_SIZE)) {
+
+			pte_phys = get_res_ptepage();
+			set_pmd(pmd, __pmd(pte_phys | _KERNPG_TABLE));
+
+			continue;
+		}
+
+		/* Map with 2M pages */
+		if (is_memory_all_valid(address, address + PMD_SIZE)) {
+			/* Init already done */
+			continue;
+		}
+
+		/* Map with 4k pages */
+		pte = alloc_low_page(&pte_phys);
+		phys_pte_prune(pte, address, address + PMD_SIZE, vaddr, exec);
+		set_pmd(pmd, __pmd(pte_phys | _KERNPG_TABLE));
+		unmap_low_page(pte);
+
+	}
+}
+
+static void __init phys_pud_prune(pud_t *pud_page, unsigned long addr,
+	       unsigned long end, unsigned long vaddr, unsigned int exec)
+{
+	int i = pud_index(vaddr);
+
+	for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE,
+			vaddr = (vaddr & PUD_MASK) + PUD_SIZE) {
+		pud_t *pud = pud_page + i;
+
+		if (addr >= end)
+			break;
+
+		if (pud_val(*pud)) {
+			pmd_t *pmd = pmd_offset(pud,0);
+			phys_pmd_prune(pmd, addr, end, vaddr, exec);
+		}
+	}
+}
+
+void __init prune_reserved_region_maps(void)
+{
+	unsigned long start, end, next;
+
+	/* Prune physical memory identity map */
+	start = (unsigned long)__va(0);
+	end = max_addr;
+	for (; start < end; start = next) {
+		pgd_t *pgd = pgd_offset_k(start);
+		pud_t *pud;
+
+		pud = pud_offset(pgd, start & PGDIR_MASK);
+
+		next = start + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		phys_pud_prune(pud, __pa(start), __pa(next), start, 0);
+	}
+
+	/* Prune kernel text region */
+	start = (unsigned long)KERNEL_TEXT_START;
+	end = start + (unsigned long)KERNEL_TEXT_SIZE;
+	for (; start < end; start = next) {
+		pgd_t *pgd = pgd_offset_k(start);
+		pud_t *pud;
+
+		pud = pud_offset(pgd, start & PGDIR_MASK);
+
+		next = (start & PGDIR_MASK) + (unsigned long)PGDIR_SIZE;
+		if (!next || next > end)
+			next = end;
+
+		phys_pud_prune(pud,
+		               start - (unsigned long)KERNEL_TEXT_START,
+		               next - (unsigned long)KERNEL_TEXT_START,
+			       start,
+			       1);
+	}
+
+	__flush_tlb();
+}
+
 void __init mem_init(void)
 {
 	long codesize, reservedpages, datasize, initsize;
@@ -538,6 +720,8 @@
 		reservedpages << (PAGE_SHIFT-10),
 		datasize >> 10,
 		initsize >> 10);
+
+	prune_reserved_region_maps();
 }
 
 void free_init_pages(char *what, unsigned long begin, unsigned long end)
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c	2008-01-08 03:59:28.000000000 -0800
@@ -19,6 +19,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 #include <asm/proto.h>
+#include <asm/e820.h>
 
 unsigned long __phys_addr(unsigned long x)
 {
@@ -28,9 +29,6 @@
 }
 EXPORT_SYMBOL(__phys_addr);
 
-#define ISA_START_ADDRESS      0xa0000
-#define ISA_END_ADDRESS                0x100000
-
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
Index: linux-2.6.git/arch/x86/mm/pageattr_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pageattr_64.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pageattr_64.c	2008-01-08 04:03:33.000000000 -0800
@@ -53,9 +53,11 @@
 	/*
 	 * page_private is used to track the number of entries in
 	 * the page table page have non standard attributes.
+	 * Count of 1 indicates page split by split_large_page(),
+	 * additional count indicates the number of pages with non-std attr.
 	 */
 	SetPagePrivate(base);
-	page_private(base) = 0;
+	page_private(base) = 1;
 
 	address = __pa(address);
 	addr = address & LARGE_PAGE_MASK;
@@ -176,11 +178,8 @@
 			BUG();
 	}
 
-	/* on x86-64 the direct mapping set at boot is not using 4k pages */
-	BUG_ON(PageReserved(kpte_page));
-
 	save_page(kpte_page);
-	if (page_private(kpte_page) == 0)
+	if (page_private(kpte_page) == 1)
 		revert_page(address, ref_prot);
 	return 0;
 }
Index: linux-2.6.git/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/e820_64.h	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/include/asm-x86/e820_64.h	2008-01-08 03:59:28.000000000 -0800
@@ -26,6 +26,10 @@
 extern void e820_mark_nosave_regions(void);
 extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
 extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);
+extern int e820_any_non_reserved(unsigned long start, unsigned long end);
+extern int is_memory_any_valid(unsigned long start, unsigned long end);
+extern int e820_all_non_reserved(unsigned long start, unsigned long end);
+extern int is_memory_all_valid(unsigned long start, unsigned long end);
 extern unsigned long e820_hole_size(unsigned long start, unsigned long end);
 
 extern void e820_setup_gap(void);
@@ -38,6 +42,10 @@
 
 extern unsigned ebda_addr, ebda_size;
 extern unsigned long nodemap_addr, nodemap_size;
+
+#define ISA_START_ADDRESS	0xa0000
+#define ISA_END_ADDRESS		0x100000
+
 #endif/*!__ASSEMBLY__*/
 
 #endif/*__E820_HEADER*/

-- 


* [patch 03/11] PAT x86: Map only usable memory in i386 identity map
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 01/11] PAT x86: Make acpi/other drivers map memory instead of assuming identity map venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:10   ` Andi Kleen
  2008-01-10 18:48 ` [patch 04/11] PAT x86: Basic PAT implementation venkatesh.pallipadi
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: usable_only_map_i386.patch --]
[-- Type: text/plain, Size: 9353 bytes --]

i386: Map only usable memory in the identity map. Reserved memory maps
to a zero page.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Index: linux-2.6.git/arch/x86/kernel/e820_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/e820_32.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/e820_32.c	2008-01-08 04:05:24.000000000 -0800
@@ -673,14 +673,42 @@
 }
 EXPORT_SYMBOL_GPL(e820_any_mapped);
 
+int e820_any_non_reserved(u64 start, u64 end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_any_non_reserved);
+
+int is_memory_any_valid(u64 start, u64 end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_any_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_any_valid);
+
  /*
   * This function checks if the entire range <start,end> is mapped with type.
   *
   * Note: this function only works correct if the e820 table is sorted and
   * not-overlapping, which is the case
   */
-int __init
-e820_all_mapped(unsigned long s, unsigned long e, unsigned type)
+int e820_all_mapped(u64 s, u64 e, unsigned type)
 {
 	u64 start = s;
 	u64 end = e;
@@ -705,6 +733,56 @@
 	return 0;
 }
 
+int e820_all_non_reserved(u64 start, u64 end)
+{
+	int i;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == E820_RESERVED)
+			continue;
+
+		/* does the range overlap this map entry? */
+		if (ei->addr >= end || ei->addr + ei->size <= start)
+			continue;
+
+		/*
+		 * if the region is at the beginning of <start,end> we move
+		 * start to the end of the region since it's ok until there
+		 */
+		if (ei->addr <= start)
+			start = ei->addr + ei->size;
+
+		/* if start is at or beyond end, we're done, full coverage */
+		if (start >= end)
+			return 1; /* we're done */
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(e820_all_non_reserved);
+
+int is_memory_all_valid(u64 start, u64 end)
+{
+	/*
+	 * Keep low PCI/ISA area always mapped.
+	 * Note: end address is exclusive and start is inclusive here
+	 */
+	if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS)
+		return 1;
+
+	/* Switch to efi or e820 in future here */
+	return e820_all_non_reserved(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_all_valid);
+
+int is_memory_all_reserved(u64 start, u64 end)
+{
+	/* Switch to efi or e820 in future here */
+	if (e820_all_mapped(start, end, E820_RESERVED) == 1)
+		return 1;
+	return !is_memory_any_valid(start, end);
+}
+EXPORT_SYMBOL_GPL(is_memory_all_reserved);
+
 static int __init parse_memmap(char *arg)
 {
 	if (!arg)
Index: linux-2.6.git/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/init_32.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/init_32.c	2008-01-08 04:20:44.000000000 -0800
@@ -143,6 +143,50 @@
 	return 0;
 }
 
+static unsigned long __init get_res_page(void)
+{
+	static unsigned long res_phys_page;
+	if (!res_phys_page) {
+
+		res_phys_page = (unsigned long)
+		                alloc_bootmem_low_pages(PAGE_SIZE);
+		if (!res_phys_page)
+			BUG();
+
+		memset((char *)res_phys_page, 0xe, PAGE_SIZE);
+		res_phys_page = __pa(res_phys_page);
+	}
+	return res_phys_page;
+}
+
+static unsigned long __init get_res_ptepage(void)
+{
+	static unsigned long res_phys_ptepage;
+	pte_t *pte;
+	int pte_ofs;
+	unsigned long pfn;
+
+	if (!res_phys_ptepage) {
+
+		res_phys_ptepage = (unsigned long)
+		                   alloc_bootmem_low_pages(PAGE_SIZE);
+		if (!res_phys_ptepage)
+			BUG();
+
+		paravirt_alloc_pt(&init_mm,
+		                  __pa(res_phys_ptepage) >> PAGE_SHIFT);
+
+		/* Set all PTEs in the range to zero page */
+		pfn = get_res_page() >> PAGE_SHIFT;
+		pte = (pte_t *)res_phys_ptepage;
+		for (pte_ofs = 0; pte_ofs < PTRS_PER_PTE; pte++, pte_ofs++)
+			set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
+
+		res_phys_ptepage = __pa(res_phys_ptepage);
+	}
+	return res_phys_ptepage;
+}
+
 /*
  * This maps the physical memory to kernel virtual address space, a total 
  * of max_low_pfn pages, by creating page tables starting from address 
@@ -155,6 +199,7 @@
 	pmd_t *pmd;
 	pte_t *pte;
 	int pgd_idx, pmd_idx, pte_ofs;
+	unsigned long temp_pfn;
 
 	pgd_idx = pgd_index(PAGE_OFFSET);
 	pgd = pgd_base + pgd_idx;
@@ -168,15 +213,19 @@
 		     pmd_idx < PTRS_PER_PMD && pfn < max_low_pfn;
 		     pmd++, pmd_idx++) {
 			unsigned int address = pfn * PAGE_SIZE + PAGE_OFFSET;
+			unsigned int paddr = pfn * PAGE_SIZE;
 
-			/* Map with big pages if possible, otherwise
-			   create normal page tables. */
-			if (cpu_has_pse) {
+			/*
+			 * Map with big pages if possible, otherwise create
+			 * normal page tables.
+			 */
+			if (cpu_has_pse &&
+			    is_memory_all_valid(paddr, paddr + PMD_SIZE)) {
 				unsigned int address2;
 				pgprot_t prot = PAGE_KERNEL_LARGE;
 
-				address2 = (pfn + PTRS_PER_PTE - 1) * PAGE_SIZE +
-					PAGE_OFFSET + PAGE_SIZE-1;
+				address2 = (pfn + PTRS_PER_PTE) * PAGE_SIZE +
+				           PAGE_OFFSET - 1;
 
 				if (is_kernel_text(address) ||
 				    is_kernel_text(address2))
@@ -185,19 +234,42 @@
 				set_pmd(pmd, pfn_pmd(pfn, prot));
 
 				pfn += PTRS_PER_PTE;
-			} else {
-				pte = one_page_table_init(pmd);
-
-				for (pte_ofs = 0;
-				     pte_ofs < PTRS_PER_PTE && pfn < max_low_pfn;
-				     pte++, pfn++, pte_ofs++, address += PAGE_SIZE) {
-					pgprot_t prot = PAGE_KERNEL;
+				continue;
+			}
+			if (cpu_has_pse &&
+			    !is_memory_any_valid(paddr, paddr + PMD_SIZE)) {
 
-					if (is_kernel_text(address))
-						prot = PAGE_KERNEL_EXEC;
+				temp_pfn = get_res_ptepage();
+				set_pmd(pmd, __pmd(temp_pfn | _PAGE_TABLE));
+				pfn += PTRS_PER_PTE;
+				continue;
+			}
 
-					set_pte(pte, pfn_pte(pfn, prot));
+			/*
+			 * Either !cpu_has_pse or we have some reserved holes
+			 * in the memory region
+			 */
+			pte = one_page_table_init(pmd);
+
+			for (pte_ofs = 0;
+			     pte_ofs < PTRS_PER_PTE && pfn < max_low_pfn;
+			     pte++, pfn++, pte_ofs++,
+			     address += PAGE_SIZE, paddr += PAGE_SIZE) {
+				pgprot_t prot = PAGE_KERNEL;
+
+				if (!is_memory_any_valid(paddr,
+				                         paddr + PAGE_SIZE)) {
+
+					temp_pfn = get_res_page() >> PAGE_SHIFT;
+					set_pte(pte,
+					        pfn_pte(temp_pfn, PAGE_KERNEL));
+					continue;
 				}
+
+				if (is_kernel_text(address))
+					prot = PAGE_KERNEL_EXEC;
+
+				set_pte(pte, pfn_pte(pfn, prot));
 			}
 		}
 	}
@@ -713,6 +785,8 @@
 	if (boot_cpu_data.wp_works_ok < 0)
 		test_wp_bit();
 
+	printk(KERN_DEBUG "Prune to be done here. max_low_pfn %lu\n", max_low_pfn);
+	/* prune_kernel_identity_map(); */
 	/*
 	 * Subtle. SMP is doing it's boot stuff late (because it has to
 	 * fork idle threads) - but it also needs low mappings for the
Index: linux-2.6.git/include/asm-x86/e820_32.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/e820_32.h	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/include/asm-x86/e820_32.h	2008-01-08 04:05:24.000000000 -0800
@@ -20,9 +20,13 @@
 
 extern struct e820map e820;
 
-extern int e820_all_mapped(unsigned long start, unsigned long end,
-			   unsigned type);
+extern int e820_all_mapped(u64 start, u64 end, unsigned type);
 extern int e820_any_mapped(u64 start, u64 end, unsigned type);
+extern int e820_any_non_reserved(u64 start, u64 end);
+extern int is_memory_any_valid(u64 start, u64 end);
+extern int e820_all_non_reserved(u64 start, u64 end);
+extern int is_memory_all_valid(u64 start, u64 end);
+extern int is_memory_all_reserved(u64 start, u64 end);
 extern void find_max_pfn(void);
 extern void register_bootmem_low_pages(unsigned long max_low_pfn);
 extern void e820_register_memory(void);
@@ -41,5 +45,8 @@
 #endif
 
 
+#define ISA_START_ADDRESS	0xa0000
+#define ISA_END_ADDRESS		0x100000
+
 #endif/*!__ASSEMBLY__*/
 #endif/*__E820_HEADER*/
Index: linux-2.6.git/arch/x86/mm/pageattr_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pageattr_32.c	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pageattr_32.c	2008-01-08 04:05:24.000000000 -0800
@@ -55,9 +55,11 @@
 	/*
 	 * page_private is used to track the number of entries in
 	 * the page table page that have non standard attributes.
+	 * A count of 1 means the page was split by split_large_page();
+	 * extra counts are the number of entries with non-standard attrs.
 	 */
 	SetPagePrivate(base);
-	page_private(base) = 0;
+	page_private(base) = 1;
 
 	address = __pa(address);
 	addr = address & LARGE_PAGE_MASK;
@@ -203,7 +205,7 @@
 
 	save_page(kpte_page);
 	if (!PageReserved(kpte_page)) {
-		if (cpu_has_pse && (page_private(kpte_page) == 0)) {
+		if (cpu_has_pse && (page_private(kpte_page) == 1)) {
 			paravirt_release_pt(page_to_pfn(kpte_page));
 			revert_page(kpte_page, address);
 		}

-- 

* [patch 04/11] PAT x86: Basic PAT implementation
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (2 preceding siblings ...)
  2008-01-10 18:48 ` [patch 03/11] PAT x86: Map only usable memory in i386 identity map venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 05/11] PAT x86: drm driver changes for PAT venkatesh.pallipadi
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-base.patch --]
[-- Type: text/plain, Size: 15749 bytes --]

Originally based on a patch from Eric Biederman, but heavily changed.

PAT is set up as below:
PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC)
So the only change from the boot setting is UC_MINUS -> WC.

Also, PAT WC is enabled only on recent Intel CPUs. Other CPUs can be added as
they are tested with these patches.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.git/arch/x86/mm/Makefile_64
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/Makefile_64	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/Makefile_64	2008-01-08 12:43:13.000000000 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux x86_64-specific parts of the memory manager.
 #
 
-obj-y	 := init_64.o fault_64.o ioremap_64.o extable.o pageattr_64.o mmap.o
+obj-y	 := init_64.o fault_64.o ioremap_64.o extable.o pageattr_64.o mmap.o pat.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
Index: linux-2.6.git/arch/x86/mm/pat.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/arch/x86/mm/pat.c	2008-01-08 12:43:13.000000000 -0800
@@ -0,0 +1,70 @@
+/* Handle caching attributes in page tables (PAT) */
+#include <linux/mm.h>
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <asm/msr.h>
+#include <asm/tlbflush.h>
+#include <asm/processor.h>
+
+static u64 boot_pat_state;
+int pat_wc_enabled = 0;
+
+enum {
+	PAT_UC = 0,   	/* uncached */
+	PAT_WC = 1,		/* Write combining */
+	PAT_WT = 4,		/* Write Through */
+	PAT_WP = 5,		/* Write Protected */
+	PAT_WB = 6,		/* Write Back (default) */
+	PAT_UC_MINUS = 7,	/* UC, but can be overridden by MTRR */
+};
+
+#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
+
+static int pat_known_cpu(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    (boot_cpu_data.x86 == 0xF ||
+	     (boot_cpu_data.x86 == 0x6 && boot_cpu_data.x86_model >= 0x15)))
+		return cpu_has_pat;
+
+	return 0;
+}
+
+void pat_init(void)
+{
+	u64 pat;
+	if (!smp_processor_id() && !pat_known_cpu())
+		return;
+
+	if (smp_processor_id() && !pat_wc_enabled)
+		return;
+
+	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
+	/* PTE encoding used in Linux:
+	      PAT
+	      |PCD
+	      ||PWT
+	      |||
+	      000 WB         default
+	      010 WC         _PAGE_WC
+	      011 UC         _PAGE_PCD
+		PAT bit unused */
+	pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
+	      PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
+
+	if (!pat_wc_enabled) {
+		rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+		pat_wc_enabled = 1;
+	}
+
+	wrmsrl(MSR_IA32_CR_PAT, pat);
+	printk(KERN_INFO "cpu %d, old 0x%Lx, new 0x%Lx\n", smp_processor_id(),
+			boot_pat_state, pat);
+}
+
+#undef PAT
+
+void pat_shutdown(void)
+{
+}
+
Index: linux-2.6.git/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.git.orig/arch/x86/pci/i386.c	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/pci/i386.c	2008-01-08 12:43:13.000000000 -0800
@@ -301,7 +301,6 @@
 			enum pci_mmap_state mmap_state, int write_combine)
 {
 	unsigned long prot;
-
 	/* I/O space cannot be accessed via normal processor loads and
 	 * stores on this platform.
 	 */
@@ -311,14 +310,25 @@
 	/* Leave vm_pgoff as-is, the PCI space address is the physical
 	 * address on this platform.
 	 */
-	prot = pgprot_val(vma->vm_page_prot);
-	if (boot_cpu_data.x86 > 3)
-		prot |= _PAGE_PCD | _PAGE_PWT;
-	vma->vm_page_prot = __pgprot(prot);
+	if (pat_wc_enabled) {
+		if (write_combine) {
+			vma->vm_page_prot =
+				pgprot_writecombine(vma->vm_page_prot);
+		} else {
+			vma->vm_page_prot =
+				pgprot_noncached(vma->vm_page_prot);
+		}
+	} else {
+		/* Write-combine setting is ignored, it is changed via the mtrr
+		 * interfaces on this platform.
+		 */
+		prot = pgprot_val(vma->vm_page_prot);
+		if (boot_cpu_data.x86 > 3) {
+			prot |= _PAGE_PCD;
+		}
+		vma->vm_page_prot = __pgprot(prot);
+	}
 
-	/* Write-combine setting is ignored, it is changed via the mtrr
-	 * interfaces on this platform.
-	 */
 	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       vma->vm_end - vma->vm_start,
 			       vma->vm_page_prot))
Index: linux-2.6.git/include/asm-x86/cpufeature.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/cpufeature.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-x86/cpufeature.h	2008-01-08 12:43:13.000000000 -0800
@@ -167,6 +167,7 @@
 #define cpu_has_pebs		boot_cpu_has(X86_FEATURE_PEBS)
 #define cpu_has_clflush		boot_cpu_has(X86_FEATURE_CLFLSH)
 #define cpu_has_bts		boot_cpu_has(X86_FEATURE_BTS)
+#define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
Index: linux-2.6.git/include/asm-x86/msr-index.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/msr-index.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-x86/msr-index.h	2008-01-08 12:43:13.000000000 -0800
@@ -70,6 +70,7 @@
 #define DEBUGCTLMSR_LBR		(1UL << _DEBUGCTLMSR_LBR)
 #define DEBUGCTLMSR_BTF		(1UL << _DEBUGCTLMSR_BTF)
 
+#define MSR_IA32_CR_PAT			0x00000277
 #define MSR_IA32_MC0_CTL		0x00000400
 #define MSR_IA32_MC0_STATUS		0x00000401
 #define MSR_IA32_MC0_ADDR		0x00000402
Index: linux-2.6.git/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/pgtable_64.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-x86/pgtable_64.h	2008-01-08 12:43:13.000000000 -0800
@@ -158,7 +158,7 @@
 #define _PAGE_RW	(_AC(1, UL)<<_PAGE_BIT_RW)
 #define _PAGE_USER	(_AC(1, UL)<<_PAGE_BIT_USER)
 #define _PAGE_PWT	(_AC(1, UL)<<_PAGE_BIT_PWT)
-#define _PAGE_PCD	(_AC(1, UL)<<_PAGE_BIT_PCD)
+#define _PAGE_PCD	((_AC(1, UL)<<_PAGE_BIT_PCD) | _PAGE_PWT)
 #define _PAGE_ACCESSED	(_AC(1, UL)<<_PAGE_BIT_ACCESSED)
 #define _PAGE_DIRTY	(_AC(1, UL)<<_PAGE_BIT_DIRTY)
 /* 2MB page */
@@ -167,6 +167,10 @@
 #define _PAGE_FILE	(_AC(1, UL)<<_PAGE_BIT_FILE)
 /* Global TLB entry */
 #define _PAGE_GLOBAL	(_AC(1, UL)<<_PAGE_BIT_GLOBAL)
+/* We redefine PCD to be write combining. PAT bit is not used */
+#define _PAGE_WC	(_AC(1, UL)<<_PAGE_BIT_PCD)
+
+#define _PAGE_CACHE_MASK	(_PAGE_PCD)
 
 #define _PAGE_PROTNONE	0x080	/* If not present */
 #define _PAGE_NX        (_AC(1, UL)<<_PAGE_BIT_NX)
@@ -189,13 +193,15 @@
 #define __PAGE_KERNEL_EXEC \
 	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_NX)
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_PCD | _PAGE_ACCESSED | _PAGE_NX)
+#define __PAGE_KERNEL_WC \
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_WC | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_RO \
 	(_PAGE_PRESENT | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_NX)
 #define __PAGE_KERNEL_VSYSCALL \
 	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE \
-	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD | _PAGE_PWT)
+	(_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED | _PAGE_PCD)
 #define __PAGE_KERNEL_LARGE \
 	(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC \
@@ -207,6 +213,7 @@
 #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RO MAKE_GLOBAL(__PAGE_KERNEL_RO)
 #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_WC MAKE_GLOBAL(__PAGE_KERNEL_WC)
 #define PAGE_KERNEL_VSYSCALL32 __pgprot(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_VSYSCALL MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL)
 #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
@@ -302,8 +309,24 @@
 
 /*
  * Macro to mark a page protection value as "uncacheable".
+ * Accesses through an uncached translation bypass the cache
+ * and do not allow consecutive writes to be combined.
  */
-#define pgprot_noncached(prot)	(__pgprot(pgprot_val(prot) | _PAGE_PCD | _PAGE_PWT))
+#define pgprot_noncached(prot) \
+	__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_MASK) | _PAGE_PCD)
+
+/*
+ * Macro to mark a page protection value as "write-combining".
+ * Accesses through a write-combining translation bypass the
+ * caches, but allow consecutive writes to be combined into
+ * single (larger) write transactions.
+ * This is mostly useful for IO accesses; for memory it is often slower.
+ * It also implies uncached.
+ */
+#define pgprot_writecombine(prot) \
+	__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_MASK) | _PAGE_WC)
+
+#define pgprot_nonstd(prot) (pgprot_val(prot) & _PAGE_CACHE_MASK)
 
 static inline int pmd_large(pmd_t pte) { 
 	return (pmd_val(pte) & __LARGE_PTE) == __LARGE_PTE; 
@@ -417,6 +440,7 @@
 #define pgtable_cache_init()   do { } while (0)
 #define check_pgt_cache()      do { } while (0)
 
+/* AGP users use MTRRs for now. Need to add an ioctl to agpgart for WC */
 #define PAGE_AGP    PAGE_KERNEL_NOCACHE
 #define HAVE_PAGE_AGP 1
 
Index: linux-2.6.git/arch/x86/kernel/cpu/mtrr/generic.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/mtrr/generic.c	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/kernel/cpu/mtrr/generic.c	2008-01-08 12:43:13.000000000 -0800
@@ -79,19 +79,23 @@
 			base, base + step - 1, mtrr_attrib_to_str(*types));
 }
 
+static void prepare_set(void);
+static void post_set(void);
+
 /*  Grab all of the MTRR state for this CPU into *state  */
 void __init get_mtrr_state(void)
 {
 	unsigned int i;
 	struct mtrr_var_range *vrs;
 	unsigned lo, dummy;
+	unsigned long flags;
 
 	if (!mtrr_state.var_ranges) {
-		mtrr_state.var_ranges = kmalloc(num_var_ranges * sizeof (struct mtrr_var_range), 
+		mtrr_state.var_ranges = kmalloc(num_var_ranges * sizeof (struct mtrr_var_range),
 						GFP_KERNEL);
 		if (!mtrr_state.var_ranges)
 			return;
-	} 
+	}
 	vrs = mtrr_state.var_ranges;
 
 	rdmsr(MTRRcap_MSR, lo, dummy);
@@ -137,6 +141,16 @@
 				printk(KERN_INFO "MTRR %u disabled\n", i);
 		}
 	}
+
+	/* PAT setup for BP. we need to go through sync steps here */
+	local_irq_save(flags);
+	prepare_set();
+
+	pat_init();
+
+	post_set();
+	local_irq_restore(flags);
+
 }
 
 /*  Some BIOS's are fucked and don't set all MTRRs the same!  */
@@ -399,6 +413,9 @@
 	/* Actually set the state */
 	mask = set_mtrr_state();
 
+	/* also set PAT */
+	pat_init();
+
 	post_set();
 	local_irq_restore(flags);
 
Index: linux-2.6.git/arch/x86/mm/Makefile_32
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/Makefile_32	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/Makefile_32	2008-01-08 12:43:13.000000000 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux i386-specific parts of the memory manager.
 #
 
-obj-y	:= init_32.o pgtable_32.o fault_32.o ioremap_32.o extable.o pageattr_32.o mmap.o
+obj-y	:= init_32.o pgtable_32.o fault_32.o ioremap_32.o extable.o pageattr_32.o mmap.o pat.o
 
 obj-$(CONFIG_NUMA) += discontig_32.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
Index: linux-2.6.git/include/asm-x86/pgtable_32.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/pgtable_32.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-x86/pgtable_32.h	2008-01-08 12:43:36.000000000 -0800
@@ -107,7 +107,7 @@
 #define _PAGE_RW	0x002
 #define _PAGE_USER	0x004
 #define _PAGE_PWT	0x008
-#define _PAGE_PCD	0x010
+#define _PAGE_PCD	0x018
 #define _PAGE_ACCESSED	0x020
 #define _PAGE_DIRTY	0x040
 #define _PAGE_PSE	0x080	/* 4 MB (or 2MB) page, Pentium+, if present.. */
@@ -116,6 +116,12 @@
 #define _PAGE_UNUSED2	0x400
 #define _PAGE_UNUSED3	0x800
 
+/* We redefine PCD to be write combining. PAT bit is not used */
+
+#define _PAGE_WC	0x10
+
+#define _PAGE_CACHE_MASK	0x18
+
 /* If _PAGE_PRESENT is clear, we use these: */
 #define _PAGE_FILE	0x040	/* nonlinear file mapping, saved PTE; unset:swap */
 #define _PAGE_PROTNONE	0x080	/* if the user mapped it with PROT_NONE;
@@ -156,7 +162,7 @@
 extern unsigned long long __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
 #define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
-#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
 
@@ -358,7 +364,18 @@
  * it, this is a no-op.
  */
 #define pgprot_noncached(prot)	((boot_cpu_data.x86 > 3)					  \
-				 ? (__pgprot(pgprot_val(prot) | _PAGE_PCD | _PAGE_PWT)) : (prot))
+				 ? (__pgprot(pgprot_val(prot) | _PAGE_PCD)) : (prot))
+
+/*
+ * Macro to mark a page protection value as "write-combining".
+ * Accesses through a write-combining translation bypass the
+ * caches, but allow consecutive writes to be combined into
+ * single (larger) write transactions.
+ * This is mostly useful for IO accesses; for memory it is often slower.
+ * It also implies uncached.
+ */
+#define pgprot_writecombine(prot) \
+	__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_MASK) | _PAGE_WC)
 
 /*
  * Conversion functions: convert a page and protection to a page entry,
Index: linux-2.6.git/include/asm-x86/processor.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/processor.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-x86/processor.h	2008-01-08 12:43:13.000000000 -0800
@@ -647,6 +647,10 @@
 
 extern int force_mwait;
 
+extern int pat_wc_enabled;
+extern void pat_init(void);
+extern void pat_shutdown(void);
+
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 
 extern unsigned long boot_option_idle_override;
Index: linux-2.6.git/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.git.orig/include/asm-generic/pgtable.h	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/include/asm-generic/pgtable.h	2008-01-08 12:43:13.000000000 -0800
@@ -154,6 +154,10 @@
 })
 #endif
 
+#ifndef pgprot_writecombine
+#define pgprot_writecombine pgprot_noncached
+#endif
+
 /*
  * When walking page tables, we usually want to skip any p?d_none entries;
  * and any p?d_bad entries - reporting the error before resetting to none.
Index: linux-2.6.git/arch/x86/mm/ioremap_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_32.c	2008-01-08 12:43:13.000000000 -0800
@@ -119,7 +119,7 @@
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
 	unsigned long last_addr;
-	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
+	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD);
 	if (!p) 
 		return p; 
 
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c	2008-01-08 12:42:58.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c	2008-01-08 12:43:13.000000000 -0800
@@ -138,7 +138,7 @@
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-	return __ioremap(phys_addr, size, _PAGE_PCD | _PAGE_PWT);
+	return __ioremap(phys_addr, size, _PAGE_PCD);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 

-- 

* [patch 05/11] PAT x86: drm driver changes for PAT
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (3 preceding siblings ...)
  2008-01-10 18:48 ` [patch 04/11] PAT x86: Basic PAT implementation venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 06/11] PAT x86: Refactoring i386 cpa venkatesh.pallipadi
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-drivers.patch --]
[-- Type: text/plain, Size: 3439 bytes --]

Straightforward port of pat-drivers.patch to the x86 tree.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.24-rc/drivers/char/drm/drm_proc.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/char/drm/drm_proc.c
+++ linux-2.6.24-rc/drivers/char/drm/drm_proc.c
@@ -480,7 +480,7 @@ static int drm__vma_info(char *buf, char
 	int len = 0;
 	struct drm_vma_entry *pt;
 	struct vm_area_struct *vma;
-#if defined(__i386__)
+#ifdef CONFIG_X86
 	unsigned int pgprot;
 #endif
 
@@ -510,13 +510,13 @@ static int drm__vma_info(char *buf, char
 			       vma->vm_flags & VM_IO ? 'i' : '-',
 			       vma->vm_pgoff);
 
-#if defined(__i386__)
+#ifdef CONFIG_X86
 		pgprot = pgprot_val(vma->vm_page_prot);
 		DRM_PROC_PRINT(" %c%c%c%c%c%c%c%c%c",
 			       pgprot & _PAGE_PRESENT ? 'p' : '-',
 			       pgprot & _PAGE_RW ? 'w' : 'r',
 			       pgprot & _PAGE_USER ? 'u' : 's',
-			       pgprot & _PAGE_PWT ? 't' : 'b',
+			       ((pgprot & _PAGE_CACHE_MASK) == _PAGE_WC) ? 'w' : 'b',
 			       pgprot & _PAGE_PCD ? 'u' : 'c',
 			       pgprot & _PAGE_ACCESSED ? 'a' : '-',
 			       pgprot & _PAGE_DIRTY ? 'd' : '-',
Index: linux-2.6.24-rc/drivers/char/drm/drm_vm.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/char/drm/drm_vm.c
+++ linux-2.6.24-rc/drivers/char/drm/drm_vm.c
@@ -45,11 +45,8 @@ static pgprot_t drm_io_prot(uint32_t map
 {
 	pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
-#if defined(__i386__) || defined(__x86_64__)
-	if (boot_cpu_data.x86 > 3 && map_type != _DRM_AGP) {
-		pgprot_val(tmp) |= _PAGE_PCD;
-		pgprot_val(tmp) &= ~_PAGE_PWT;
-	}
+#ifdef CONFIG_X86
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 #elif defined(__powerpc__)
 	pgprot_val(tmp) |= _PAGE_NO_CACHE;
 	if (map_type == _DRM_REGISTERS)
Index: linux-2.6.24-rc/drivers/video/gbefb.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/video/gbefb.c
+++ linux-2.6.24-rc/drivers/video/gbefb.c
@@ -57,7 +57,7 @@ struct gbefb_par {
 #endif
 #endif
 #ifdef CONFIG_X86
-#define pgprot_fb(_prot) ((_prot) | _PAGE_PCD)
+#define pgprot_fb(_prot) pgprot_writecombine(_prot)
 #endif
 
 /*
Index: linux-2.6.24-rc/drivers/video/sgivwfb.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/video/sgivwfb.c
+++ linux-2.6.24-rc/drivers/video/sgivwfb.c
@@ -714,8 +714,7 @@ static int sgivwfb_mmap(struct fb_info *
 	if (offset + size > sgivwfb_mem_size)
 		return -EINVAL;
 	offset += sgivwfb_mem_phys;
-	pgprot_val(vma->vm_page_prot) =
-	    pgprot_val(vma->vm_page_prot) | _PAGE_PCD;
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 	vma->vm_flags |= VM_IO;
 	if (remap_pfn_range(vma, vma->vm_start, offset >> PAGE_SHIFT,
 						size, vma->vm_page_prot))
Index: linux-2.6.24-rc/include/asm-x86/fb.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/fb.h
+++ linux-2.6.24-rc/include/asm-x86/fb.h
@@ -8,8 +8,7 @@
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
-	if (boot_cpu_data.x86 > 3)
-		pgprot_val(vma->vm_page_prot) |= _PAGE_PCD;
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
 
 #ifdef CONFIG_X86_32

-- 

* [patch 06/11] PAT x86: Refactoring i386 cpa
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (4 preceding siblings ...)
  2008-01-10 18:48 ` [patch 05/11] PAT x86: drm driver changes for PAT venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:00   ` Andi Kleen
  2008-01-10 18:48 ` [patch 07/11] PAT x86: pat-conflict resolution using linear list venkatesh.pallipadi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: cpa_i386.patch --]
[-- Type: text/plain, Size: 2388 bytes --]

This makes the 32-bit cpa code similar to x86_64 and makes the following PAT
patches easier.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Index: linux-2.6.git/arch/x86/mm/pageattr_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pageattr_32.c	2008-01-08 04:05:24.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pageattr_32.c	2008-01-08 05:10:44.000000000 -0800
@@ -153,15 +153,12 @@
 		list_add(&kpte_page->lru, &df_list);
 }
 
-static int __change_page_attr(struct page *page, pgprot_t prot)
+static int __change_page_attr(unsigned long address, unsigned long pfn,
+                              pgprot_t prot)
 {
 	struct page *kpte_page;
-	unsigned long address;
 	pte_t *kpte;
 
-	BUG_ON(PageHighMem(page));
-	address = (unsigned long)page_address(page);
-
 	kpte = lookup_address(address);
 	if (!kpte)
 		return -EINVAL;
@@ -172,7 +169,7 @@
 
 	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
 		if (!pte_huge(*kpte)) {
-			set_pte_atomic(kpte, mk_pte(page, prot));
+			set_pte_atomic(kpte, pfn_pte(pfn, prot));
 		} else {
 			struct page *split;
 			pgprot_t ref_prot;
@@ -190,7 +187,7 @@
 		page_private(kpte_page)++;
 	} else {
 		if (!pte_huge(*kpte)) {
-			set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
+			set_pte_atomic(kpte, pfn_pte(pfn, PAGE_KERNEL));
 			BUG_ON(page_private(kpte_page) == 0);
 			page_private(kpte_page)--;
 		} else
@@ -231,14 +228,15 @@
  *
  * Caller must call global_flush_tlb() after this.
  */
-int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
 {
 	unsigned long flags;
 	int err = 0, i;
 
 	spin_lock_irqsave(&cpa_lock, flags);
-	for (i = 0; i < numpages; i++, page++) {
-		err = __change_page_attr(page, prot);
+	for (i = 0; i < numpages; i++, address += PAGE_SIZE) {
+		unsigned long pfn = __pa(address) >> PAGE_SHIFT;
+		err = __change_page_attr(address, pfn, prot);
 		if (err)
 			break;
 	}
@@ -248,6 +246,12 @@
 }
 EXPORT_SYMBOL(change_page_attr);
 
+int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+{
+	unsigned long addr = (unsigned long)page_address(page);
+	return change_page_attr_addr(addr, numpages, prot);
+}
+
 void global_flush_tlb(void)
 {
 	struct page *pg, *next;

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 07/11] PAT x86: pat-conflict resolution using linear list
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (5 preceding siblings ...)
  2008-01-10 18:48 ` [patch 06/11] PAT x86: Refactoring i386 cpa venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:13   ` Andi Kleen
  2008-01-10 18:48 ` [patch 08/11] PAT x86: pci mmap conflict patch venkatesh.pallipadi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pat-conflict.patch --]
[-- Type: text/plain, Size: 9354 bytes --]

Straightforward port of pat-conflict.patch to the x86 tree. Use a linear
list to keep track of all reserved region mappings.
Only UC access is allowed for RAM regions for now.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c	2008-01-08 12:43:13.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c	2008-01-08 12:44:20.000000000 -0800
@@ -20,6 +20,7 @@
 #include <asm/cacheflush.h>
 #include <asm/proto.h>
 #include <asm/e820.h>
+#include <asm/pat.h>
 
 unsigned long __phys_addr(unsigned long x)
 {
@@ -105,12 +106,23 @@
 		remove_vm_area((void *)(PAGE_MASK & (unsigned long) addr));
 		return NULL;
 	}
+
+	/* For plain ioremap() get the existing attributes. Otherwise
+	   check against the existing ones */
+	if (reserve_mattr(phys_addr, phys_addr + size, flags,
+			  flags ? NULL : &flags) < 0)
+		goto out;
+
 	if (flags && ioremap_change_attr(phys_addr, size, flags) < 0) {
-		area->flags &= 0xffffff;
-		vunmap(addr);
-		return NULL;
+		free_mattr(phys_addr, phys_addr + size, flags);
+		goto out;
 	}
 	return (__force void __iomem *) (offset + (char *)addr);
+
+out:
+	area->flags &= 0xffffff;
+	vunmap(addr);
+	return NULL;
 }
 EXPORT_SYMBOL(__ioremap);
 
@@ -178,8 +190,11 @@
 	}
 
 	/* Reset the direct mapping. Can block */
-	if (p->flags >> 20)
-		ioremap_change_attr(p->phys_addr, p->size, 0);
+	if (p->flags >> 20) {
+		free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
+		           p->flags>>20);
+		ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
+	}
 
 	/* Finally remove it */
 	o = remove_vm_area((void *)addr);
Index: linux-2.6.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pat.c	2008-01-08 12:43:13.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pat.c	2008-01-08 12:45:05.000000000 -0800
@@ -5,6 +5,9 @@
 #include <asm/msr.h>
 #include <asm/tlbflush.h>
 #include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pat.h>
+#include <asm/e820.h>
 
 static u64 boot_pat_state;
 int pat_wc_enabled = 0;
@@ -68,3 +71,116 @@
 {
 }
 
+/* The global memattr list keeps track of caching attributes for specific
+   physical memory areas. Conflicting caching attributes in different
+   mappings can cause CPU cache corruption. To avoid this we keep track.
+
+   The list is sorted and can contain multiple entries for each address
+   (this allows reference counting for overlapping areas). All the aliases
+   have the same cache attributes of course.  Zero attributes are represented
+   as holes.
+
+   Currently the data structure is a list because the number of mappings
+   is expected to be relatively small. If this should become a problem
+   it could be changed to an rbtree or similar.
+
+   mattr_lock protects the whole list. */
+
+struct memattr {
+	u64 start;
+	u64 end;
+	unsigned long attr;
+	struct list_head nd;
+};
+
+static LIST_HEAD(mattr_list);
+static DEFINE_SPINLOCK(mattr_lock); 	/* protects memattr list */
+
+int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr)
+{
+	struct memattr *ma = NULL, *ml;
+	int err = 0;
+
+	if (fattr)
+		*fattr = attr;
+
+	if (is_memory_any_valid(start, end)) {
+		if (!is_memory_all_valid(start, end) && !fattr)
+			return -EINVAL;
+
+		if (attr & _PAGE_WC) {
+			if (!fattr)
+				return -EINVAL;
+			else
+				*fattr  = _PAGE_PCD;
+		}
+
+		return 0;
+	}
+
+	ma  = kmalloc(sizeof(struct memattr), GFP_KERNEL);
+	if (!ma)
+		return -ENOMEM;
+	ma->start = start;
+	ma->end = end;
+	ma->attr = attr;
+
+	spin_lock(&mattr_lock);
+	list_for_each_entry(ml, &mattr_list, nd) {
+		if (ml->start <= start && ml->end >= end) {
+			if (fattr)
+				ma->attr = *fattr = ml->attr;
+
+			if (!fattr && attr != ml->attr) {
+				printk(
+	KERN_DEBUG "%s:%d conflicting cache attribute %Lx-%Lx %lx<->%lx\n",
+					current->comm, current->pid,
+					start, end, attr, ml->attr);
+				err = -EBUSY;
+				break;
+			}
+		} else if (ml->start >= end) {
+			list_add(&ma->nd, ml->nd.prev);
+			ma = NULL;
+			break;
+		}
+	}
+
+	if (err)
+		kfree(ma);
+	else if (ma)
+		list_add_tail(&ma->nd, &mattr_list);
+
+	spin_unlock(&mattr_lock);
+	return err;
+}
+
+int free_mattr(u64 start, u64 end, unsigned long attr)
+{
+	struct memattr *ml;
+	int err = attr ? -EBUSY : 0;
+
+	if (is_memory_any_valid(start, end))
+		return 0;
+
+	spin_lock(&mattr_lock);
+	list_for_each_entry(ml, &mattr_list, nd) {
+		if (ml->start == start && ml->end == end) {
+			if (ml->attr != attr)
+				printk(KERN_DEBUG
+	"%s:%d conflicting cache attributes on free %Lx-%Lx %lx<->%lx\n",
+			current->comm, current->pid, start, end, attr,ml->attr);
+			list_del(&ml->nd);
+			kfree(ml);
+			err = 0;
+			break;
+		}
+	}
+	spin_unlock(&mattr_lock);
+	if (err)
+		printk(KERN_DEBUG "%s:%d freeing invalid mattr %Lx-%Lx %lx\n",
+			current->comm, current->pid,
+			start, end, attr);
+	return err;
+}
+
Index: linux-2.6.git/include/asm-x86/pat.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/include/asm-x86/pat.h	2008-01-08 12:44:20.000000000 -0800
@@ -0,0 +1,12 @@
+#ifndef _ASM_PAT_H
+#define _ASM_PAT_H 1
+
+#include <linux/types.h>
+
+/* Handle the page attribute table (PAT) of the CPU */
+
+int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr);
+int free_mattr(u64 start, u64 end, unsigned long attr);
+
+#endif
+
Index: linux-2.6.git/arch/x86/mm/ioremap_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c	2008-01-08 12:43:13.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_32.c	2008-01-08 12:44:20.000000000 -0800
@@ -17,6 +17,7 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
+#include <asm/pat.h>
 
 #define ISA_START_ADDRESS	0xa0000
 #define ISA_END_ADDRESS		0x100000
@@ -26,6 +27,42 @@
  */
 
 /*
+ * Fix up the linear direct mapping of the kernel to avoid cache attribute
+ * conflicts.
+ */
+static int
+ioremap_change_attr(unsigned long phys_addr, unsigned long size,
+					unsigned long flags)
+{
+	unsigned long last_addr;
+	int err = 0;
+
+	/* Guaranteed to be > phys_addr, as per __ioremap() */
+	last_addr = phys_addr + size - 1;
+	if (last_addr < virt_to_phys(high_memory) - 1) {
+		unsigned long vaddr = (unsigned long)__va(phys_addr);
+		unsigned long npages;
+
+		phys_addr &= PAGE_MASK;
+
+		/* This might overflow and become zero.. */
+		last_addr = PAGE_ALIGN(last_addr);
+
+		/* .. but that's ok, because modulo-2**n arithmetic will make
+	 	* the page-aligned "last - first" come out right.
+	 	*/
+		npages = (last_addr - phys_addr) >> PAGE_SHIFT;
+
+		err = change_page_attr_addr(vaddr, npages,
+		                       __pgprot(__PAGE_KERNEL|flags));
+		if (!err)
+			global_flush_tlb();
+	}
+
+	return err;
+}
+
+/*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. Needed when the kernel wants to access high addresses
  * directly.
@@ -90,7 +127,25 @@
 		vunmap((void __force *) addr);
 		return NULL;
 	}
+
+	/*
+	 * For plain ioremap() get the existing attributes. Otherwise
+	 * check against the existing ones.
+	 */
+	if (reserve_mattr(phys_addr, phys_addr + size, flags,
+	                  flags ? NULL : &flags) < 0)
+		goto out;
+
+	if (flags && ioremap_change_attr(phys_addr, size, flags) < 0) {
+		free_mattr(phys_addr, phys_addr + size, flags);
+		goto out;
+	}
 	return (void __iomem *) (offset + (char __iomem *)addr);
+
+out:
+	area->flags &= 0xffffff;
+	vunmap(addr);
+	return NULL;
 }
 EXPORT_SYMBOL(__ioremap);
 
@@ -118,36 +173,7 @@
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-	unsigned long last_addr;
-	void __iomem *p = __ioremap(phys_addr, size, _PAGE_PCD);
-	if (!p) 
-		return p; 
-
-	/* Guaranteed to be > phys_addr, as per __ioremap() */
-	last_addr = phys_addr + size - 1;
-
-	if (last_addr < virt_to_phys(high_memory) - 1) {
-		struct page *ppage = virt_to_page(__va(phys_addr));		
-		unsigned long npages;
-
-		phys_addr &= PAGE_MASK;
-
-		/* This might overflow and become zero.. */
-		last_addr = PAGE_ALIGN(last_addr);
-
-		/* .. but that's ok, because modulo-2**n arithmetic will make
-	 	* the page-aligned "last - first" come out right.
-	 	*/
-		npages = (last_addr - phys_addr) >> PAGE_SHIFT;
-
-		if (change_page_attr(ppage, npages, PAGE_KERNEL_NOCACHE) < 0) { 
-			iounmap(p); 
-			p = NULL;
-		}
-		global_flush_tlb();
-	}
-
-	return p;					
+	return __ioremap(phys_addr, size, _PAGE_PCD);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -194,12 +220,11 @@
 	}
 
 	/* Reset the direct mapping. Can block */
-	if ((p->flags >> 20) && p->phys_addr < virt_to_phys(high_memory) - 1) {
-		change_page_attr(virt_to_page(__va(p->phys_addr)),
-				 get_vm_area_size(p) >> PAGE_SHIFT,
-				 PAGE_KERNEL);
-		global_flush_tlb();
-	} 
+	if (p->flags >> 20) {
+		free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
+		           p->flags>>20);
+		ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
+	}
 
 	/* Finally remove it */
 	o = remove_vm_area((void *)addr);

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 08/11] PAT x86: pci mmap conflict patch
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (6 preceding siblings ...)
  2008-01-10 18:48 ` [patch 07/11] PAT x86: pat-conflict resolution using linear list venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 09/11] PAT x86: Add ioremap_wc support venkatesh.pallipadi
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: pci-mmap-conflict.patch --]
[-- Type: text/plain, Size: 2383 bytes --]

Forward port of pci-mmap-conflict.patch to the x86 tree.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.git/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.git.orig/arch/x86/pci/i386.c	2008-01-08 04:30:53.000000000 -0800
+++ linux-2.6.git/arch/x86/pci/i386.c	2008-01-08 05:15:09.000000000 -0800
@@ -30,6 +30,8 @@
 #include <linux/init.h>
 #include <linux/ioport.h>
 #include <linux/errno.h>
+#include <asm/pat.h>
+#include <asm/cacheflush.h>
 
 #include "pci.h"
 
@@ -297,10 +299,37 @@
 	pci_write_config_byte(dev, PCI_LATENCY_TIMER, lat);
 }
 
+static void pci_unmap_page_range(struct vm_area_struct *vma)
+{
+	u64 adr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long flags = pgprot_val(vma->vm_page_prot)
+				& _PAGE_CACHE_MASK;
+	free_mattr(adr, adr + vma->vm_end - vma->vm_start, flags);
+}
+
+static void pci_track_mmap_page_range(struct vm_area_struct *vma)
+{
+	u64 adr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long flags = pgprot_val(vma->vm_page_prot)
+				& _PAGE_CACHE_MASK;
+
+	reserve_mattr(adr, adr + vma->vm_end - vma->vm_start, flags, NULL);
+}
+
+static struct vm_operations_struct pci_mmap_ops = {
+	.open  = pci_track_mmap_page_range,
+	.close = pci_unmap_page_range
+};
+
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
 			enum pci_mmap_state mmap_state, int write_combine)
 {
 	unsigned long prot;
+	u64 addr = vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long len = vma->vm_end - vma->vm_start;
+	unsigned long attr;
+	int err;
+
 	/* I/O space cannot be accessed via normal processor loads and
 	 * stores on this platform.
 	 */
@@ -329,10 +358,25 @@
 		vma->vm_page_prot = __pgprot(prot);
 	}
 
+	attr = pgprot_val(vma->vm_page_prot) & _PAGE_CACHE_MASK;
+	err = reserve_mattr(addr, addr+len, attr, NULL);
+	if (err)
+		return -EBUSY;
+
+	err = change_page_attr_addr((unsigned long)__va(addr),
+	                            len >> PAGE_SHIFT,
+	                            __pgprot(__PAGE_KERNEL | attr));
+	if (err) {
+		free_mattr(addr, addr+len, attr);
+		return err;
+	}
+
 	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       vma->vm_end - vma->vm_start,
 			       vma->vm_page_prot))
 		return -EAGAIN;
 
+	vma->vm_ops = &pci_mmap_ops;
+
 	return 0;
 }

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 09/11] PAT x86: Add ioremap_wc support
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (7 preceding siblings ...)
  2008-01-10 18:48 ` [patch 08/11] PAT x86: pci mmap conflict patch venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:08   ` Andi Kleen
  2008-01-10 18:48 ` [patch 10/11] PAT x86: Handle /dev/mem mappings venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource venkatesh.pallipadi
  10 siblings, 1 reply; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: ioremap_wc.patch --]
[-- Type: text/plain, Size: 5276 bytes --]

Forward port of ioremap.patch to the x86 tree.
Code shared between i386 and x86_64 lives in the shared files ioremap.c and io.h.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c	2008-01-08 05:12:14.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c	2008-01-08 05:15:56.000000000 -0800
@@ -127,7 +127,7 @@
 EXPORT_SYMBOL(__ioremap);
 
 /**
- * ioremap_nocache     -   map bus memory into CPU space
+ * ioremap_nocache     -   map bus memory into CPU space uncached
  * @offset:    bus address of the memory
  * @size:      size of the resource to map
  *
@@ -154,6 +154,7 @@
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
+
 /**
  * iounmap - Free a IO remapping
  * @addr: virtual address from ioremap_*
Index: linux-2.6.git/include/asm-x86/io_64.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/io_64.h	2008-01-08 03:41:30.000000000 -0800
+++ linux-2.6.git/include/asm-x86/io_64.h	2008-01-08 05:15:56.000000000 -0800
@@ -165,7 +165,7 @@
  * it's useful if some control registers are in such an area and write combining
  * or read caching is not desirable:
  */
-extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size);
+extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long size);
 extern void iounmap(volatile void __iomem *addr);
 extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
 
Index: linux-2.6.git/include/asm-generic/iomap.h
===================================================================
--- linux-2.6.git.orig/include/asm-generic/iomap.h	2008-01-08 03:31:37.000000000 -0800
+++ linux-2.6.git/include/asm-generic/iomap.h	2008-01-08 05:15:56.000000000 -0800
@@ -65,4 +65,8 @@
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
 
+#ifndef ioremap_wc
+#define ioremap_wc ioremap_nocache
+#endif
+
 #endif
Index: linux-2.6.git/arch/x86/mm/Makefile_32
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/Makefile_32	2008-01-08 04:43:09.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/Makefile_32	2008-01-08 05:16:50.000000000 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux i386-specific parts of the memory manager.
 #
 
-obj-y	:= init_32.o pgtable_32.o fault_32.o ioremap_32.o extable.o pageattr_32.o mmap.o pat.o
+obj-y	:= init_32.o pgtable_32.o fault_32.o ioremap_32.o extable.o pageattr_32.o mmap.o pat.o ioremap.o
 
 obj-$(CONFIG_NUMA) += discontig_32.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
Index: linux-2.6.git/arch/x86/mm/Makefile_64
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/Makefile_64	2008-01-08 04:32:05.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/Makefile_64	2008-01-08 05:16:32.000000000 -0800
@@ -2,7 +2,7 @@
 # Makefile for the linux x86_64-specific parts of the memory manager.
 #
 
-obj-y	 := init_64.o fault_64.o ioremap_64.o extable.o pageattr_64.o mmap.o pat.o
+obj-y	 := init_64.o fault_64.o ioremap_64.o extable.o pageattr_64.o mmap.o pat.o ioremap.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
Index: linux-2.6.git/arch/x86/mm/ioremap.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.git/arch/x86/mm/ioremap.c	2008-01-08 05:15:56.000000000 -0800
@@ -0,0 +1,31 @@
+#include <linux/module.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/processor.h>
+
+/**
+ * ioremap_wc    -   map bus memory into CPU space write combined
+ * @offset:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_wc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked write combining.
+ * Write combining allows faster writes to some hardware devices.
+ * See also ioremap_nocache for more details.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+{
+	if (pat_wc_enabled)
+		return __ioremap(phys_addr, size, _PAGE_WC);
+	else
+		return ioremap_nocache(phys_addr, size);
+}
+EXPORT_SYMBOL(ioremap_wc);
Index: linux-2.6.git/include/asm-x86/io.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/io.h	2008-01-08 03:31:38.000000000 -0800
+++ linux-2.6.git/include/asm-x86/io.h	2008-01-08 05:15:56.000000000 -0800
@@ -1,5 +1,14 @@
+#ifndef _ASM_X86_IO_H
+#define _ASM_X86_IO_H
+
+#define ioremap_wc ioremap_wc
+
 #ifdef CONFIG_X86_32
 # include "io_32.h"
 #else
 # include "io_64.h"
 #endif
+
+extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
+
+#endif

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 10/11] PAT x86: Handle /dev/mem mappings
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (8 preceding siblings ...)
  2008-01-10 18:48 ` [patch 09/11] PAT x86: Add ioremap_wc support venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 18:48 ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource venkatesh.pallipadi
  10 siblings, 0 replies; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: devmem.patch --]
[-- Type: text/plain, Size: 10412 bytes --]

Forward port of devmem.patch to the x86 tree, with an added bug fix:
perform cpa only with non-zero flags.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pat.c	2008-01-08 12:45:05.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pat.c	2008-01-08 12:51:02.000000000 -0800
@@ -2,12 +2,15 @@
 #include <linux/mm.h>
 #include <linux/kernel.h>
 #include <linux/gfp.h>
+#include <linux/fs.h>
 #include <asm/msr.h>
 #include <asm/tlbflush.h>
 #include <asm/processor.h>
 #include <asm/pgtable.h>
 #include <asm/pat.h>
 #include <asm/e820.h>
+#include <asm/cacheflush.h>
+#include <asm/fcntl.h>
 
 static u64 boot_pat_state;
 int pat_wc_enabled = 0;
@@ -71,6 +74,16 @@
 {
 }
 
+static char *cattr_name(unsigned long flags)
+{
+	switch (flags & _PAGE_CACHE_MASK) {
+	case _PAGE_WC:  return "write combining";
+	case _PAGE_PCD: return "uncached";
+	case 0: 	return "default";
+	default: 	return "broken";
+	}
+}
+
 /* The global memattr list keeps track of caching attributes for specific
    physical memory areas. Conflicting caching attributes in different
    mappings can cause CPU cache corruption. To avoid this we keep track.
@@ -108,7 +121,7 @@
 		if (!is_memory_all_valid(start, end) && !fattr)
 			return -EINVAL;
 
-		if (attr & _PAGE_WC) {
+		if (attr == _PAGE_WC) {
 			if (!fattr)
 				return -EINVAL;
 			else
@@ -133,9 +146,10 @@
 
 			if (!fattr && attr != ml->attr) {
 				printk(
-	KERN_DEBUG "%s:%d conflicting cache attribute %Lx-%Lx %lx<->%lx\n",
+	KERN_DEBUG "%s:%d conflicting cache attribute %Lx-%Lx %s<->%s\n",
 					current->comm, current->pid,
-					start, end, attr, ml->attr);
+					start, end,
+					cattr_name(attr), cattr_name(ml->attr));
 				err = -EBUSY;
 				break;
 			}
@@ -168,8 +182,9 @@
 		if (ml->start == start && ml->end == end) {
 			if (ml->attr != attr)
 				printk(KERN_DEBUG
-	"%s:%d conflicting cache attributes on free %Lx-%Lx %lx<->%lx\n",
-			current->comm, current->pid, start, end, attr,ml->attr);
+	"%s:%d conflicting cache attributes on free %Lx-%Lx %s<->%s\n",
+			current->comm, current->pid, start, end,
+			cattr_name(attr), cattr_name(ml->attr));
 			list_del(&ml->nd);
 			kfree(ml);
 			err = 0;
@@ -178,9 +193,89 @@
 	}
 	spin_unlock(&mattr_lock);
 	if (err)
-		printk(KERN_DEBUG "%s:%d freeing invalid mattr %Lx-%Lx %lx\n",
+		printk(KERN_DEBUG "%s:%d freeing invalid mattr %Lx-%Lx %s\n",
 			current->comm, current->pid,
-			start, end, attr);
+			start, end, cattr_name(attr));
 	return err;
 }
 
+/* /dev/mem interface. Use the previous mapping */
+pgprot_t
+phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size,
+		     pgprot_t vma_prot)
+{
+	u64 offset = pfn << PAGE_SHIFT;
+	unsigned long flags;
+	unsigned long want_flags = 0;
+	if (file->f_flags & O_SYNC)
+		want_flags = _PAGE_PCD;
+
+#ifdef CONFIG_X86_32
+	/*
+	 * On the PPro and successors, the MTRRs are used to set
+ 	 * memory types for physical addresses outside main memory,
+	 * so blindly setting PCD or PWT on those pages is wrong.
+	 * For Pentiums and earlier, the surround logic should disable
+	 * caching for the high addresses through the KEN pin, but
+	 * we maintain the tradition of paranoia in this code.
+	 */
+	if (!pat_wc_enabled &&
+	    ! ( test_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability) ||
+		test_bit(X86_FEATURE_K6_MTRR, boot_cpu_data.x86_capability) ||
+		test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
+		test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability)) &&
+	   offset >= __pa(high_memory))
+		want_flags = _PAGE_PCD;
+#endif
+
+	/* ignore error because we can't handle it here */
+	reserve_mattr(offset, offset+size, want_flags, &flags);
+	if (flags != want_flags) {
+		printk(KERN_DEBUG
+	"%s:%d /dev/mem expected mapping type %s for %Lx-%Lx, got %s\n",
+			current->comm, current->pid,
+			cattr_name(want_flags),
+			offset, offset+size,
+			cattr_name(flags));
+	}
+
+	if (offset < __pa(high_memory) && flags) {
+		/* RED-PEN when the kernel memory was write protected
+		   or similar before we'll destroy that here. need a pgprot
+		   mask in cpa? */
+		change_page_attr_addr((unsigned long)__va(offset),
+		                      size >> PAGE_SHIFT,
+		                      __pgprot(__PAGE_KERNEL | flags));
+	}
+	return __pgprot((pgprot_val(vma_prot) & ~_PAGE_CACHE_MASK)|flags);
+}
+
+void map_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
+{
+	u64 addr = (u64)pfn << PAGE_SHIFT;
+	unsigned long flags;
+	unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+
+	reserve_mattr(addr, addr+size, want_flags, &flags);
+	if (flags != want_flags) {
+		printk(KERN_DEBUG
+	"%s:%d /dev/mem expected mapping type %s for %Lx-%Lx, got %s\n",
+			current->comm, current->pid,
+			cattr_name(want_flags),
+			addr, addr+size,
+			cattr_name(flags));
+	}
+}
+
+void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
+{
+	u64 addr = (u64)pfn << PAGE_SHIFT;
+	unsigned long flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+
+	free_mattr(addr, addr+size, flags);
+	if (addr < __pa(high_memory) &&
+	   (pgprot_val(vma_prot) & _PAGE_CACHE_MASK))
+		change_page_attr_addr((unsigned long)__va(addr),
+		                      size >> PAGE_SHIFT,
+		                      PAGE_KERNEL);
+}
Index: linux-2.6.git/drivers/char/mem.c
===================================================================
--- linux-2.6.git.orig/drivers/char/mem.c	2008-01-08 12:40:52.000000000 -0800
+++ linux-2.6.git/drivers/char/mem.c	2008-01-08 12:47:47.000000000 -0800
@@ -41,36 +41,7 @@
  */
 static inline int uncached_access(struct file *file, unsigned long addr)
 {
-#if defined(__i386__) && !defined(__arch_um__)
-	/*
-	 * On the PPro and successors, the MTRRs are used to set
-	 * memory types for physical addresses outside main memory,
-	 * so blindly setting PCD or PWT on those pages is wrong.
-	 * For Pentiums and earlier, the surround logic should disable
-	 * caching for the high addresses through the KEN pin, but
-	 * we maintain the tradition of paranoia in this code.
-	 */
-	if (file->f_flags & O_SYNC)
-		return 1;
- 	return !( test_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_K6_MTRR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
-		  test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability) )
-	  && addr >= __pa(high_memory);
-#elif defined(__x86_64__) && !defined(__arch_um__)
-	/* 
-	 * This is broken because it can generate memory type aliases,
-	 * which can cause cache corruptions
-	 * But it is only available for root and we have to be bug-to-bug
-	 * compatible with i386.
-	 */
-	if (file->f_flags & O_SYNC)
-		return 1;
-	/* same behaviour as i386. PAT always set to cached and MTRRs control the
-	   caching behaviour. 
-	   Hopefully a full PAT implementation will fix that soon. */	   
-	return 0;
-#elif defined(CONFIG_IA64)
+#if defined(CONFIG_IA64)
 	/*
 	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
 	 */
@@ -271,6 +242,35 @@
 }
 #endif
 
+void __attribute__((weak))
+map_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+	/* nothing. architectures can override. */
+}
+
+void __attribute__((weak))
+unmap_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+	/* nothing. architectures can override. */
+}
+
+static void mmap_mem_open(struct vm_area_struct *vma)
+{
+	map_devmem(vma->vm_pgoff,  vma->vm_end - vma->vm_start,
+		   vma->vm_page_prot);
+}
+
+static void mmap_mem_close(struct vm_area_struct *vma)
+{
+	unmap_devmem(vma->vm_pgoff,  vma->vm_end - vma->vm_start,
+		     vma->vm_page_prot);
+}
+
+static struct vm_operations_struct mmap_mem_ops = {
+	.open  = mmap_mem_open,
+	.close = mmap_mem_close
+};
+
 static int mmap_mem(struct file * file, struct vm_area_struct * vma)
 {
 	size_t size = vma->vm_end - vma->vm_start;
@@ -285,6 +285,8 @@
 						 size,
 						 vma->vm_page_prot);
 
+	vma->vm_ops = &mmap_mem_ops;
+
 	/* Remap-pfn-range will mark the range VM_IO and VM_RESERVED */
 	if (remap_pfn_range(vma,
 			    vma->vm_start,
Index: linux-2.6.git/include/asm-x86/pgtable.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/pgtable.h	2008-01-08 12:40:52.000000000 -0800
+++ linux-2.6.git/include/asm-x86/pgtable.h	2008-01-08 12:47:47.000000000 -0800
@@ -1,5 +1,17 @@
+#ifndef _ASM_X86_PGTABLE_H
+#define _ASM_X86_PGTABLE_H
+
 #ifdef CONFIG_X86_32
 # include "pgtable_32.h"
 #else
 # include "pgtable_64.h"
 #endif
+
+#ifndef __ASSEMBLY__
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+			      unsigned long size, pgprot_t vma_prot);
+#endif
+
+#endif
Index: linux-2.6.git/include/asm-x86/io.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/io.h	2008-01-08 12:45:17.000000000 -0800
+++ linux-2.6.git/include/asm-x86/io.h	2008-01-08 12:47:47.000000000 -0800
@@ -11,4 +11,9 @@
 
 extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+
+extern int valid_phys_addr_range(unsigned long addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t size);
+
 #endif
Index: linux-2.6.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap.c	2008-01-08 12:45:17.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap.c	2008-01-08 12:47:47.000000000 -0800
@@ -1,5 +1,7 @@
 #include <linux/module.h>
+#include <linux/mm.h>
 
+#include <asm/e820.h>
 #include <asm/io.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -29,3 +31,18 @@
 		return ioremap_nocache(phys_addr, size);
 }
 EXPORT_SYMBOL(ioremap_wc);
+
+int valid_phys_addr_range(unsigned long addr, size_t count)
+{
+	if (addr + count > __pa(high_memory))
+		return 0;
+
+	return 1;
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
+{
+	u64 address = pfn << PAGE_SHIFT;
+	return (is_memory_all_valid(address, address + size) ||
+		!is_memory_any_valid(address, address + size));
+}

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource
  2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
                   ` (9 preceding siblings ...)
  2008-01-10 18:48 ` [patch 10/11] PAT x86: Handle /dev/mem mappings venkatesh.pallipadi
@ 2008-01-10 18:48 ` venkatesh.pallipadi
  2008-01-10 19:43   ` Greg KH
  10 siblings, 1 reply; 48+ messages in thread
From: venkatesh.pallipadi @ 2008-01-10 18:48 UTC (permalink / raw)
  To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, akpm, arjan, jesse.barnes, davem
  Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha

[-- Attachment #1: sysfs_resource_wc_mapping.patch --]
[-- Type: text/plain, Size: 4629 bytes --]

New interfaces are exported for UC and WC accesses. Applications have to
change to use these new interfaces.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Index: linux-2.6.git/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.git.orig/drivers/pci/pci-sysfs.c	2008-01-08 10:53:10.000000000 -0800
+++ linux-2.6.git/drivers/pci/pci-sysfs.c	2008-01-08 12:51:22.000000000 -0800
@@ -426,7 +426,7 @@
  */
 static int
 pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
-		  struct vm_area_struct *vma)
+		  struct vm_area_struct *vma, int write_combine)
 {
 	struct pci_dev *pdev = to_pci_dev(container_of(kobj,
 						       struct device, kobj));
@@ -449,7 +449,21 @@
 	vma->vm_pgoff += start >> PAGE_SHIFT;
 	mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
 
-	return pci_mmap_page_range(pdev, vma, mmap_type, 0);
+	return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
+}
+
+static int
+pci_mmap_resource_uc(struct kobject *kobj, struct bin_attribute *attr,
+		     struct vm_area_struct *vma)
+{
+	return pci_mmap_resource(kobj, attr, vma, 0);
+}
+
+static int
+pci_mmap_resource_wc(struct kobject *kobj, struct bin_attribute *attr,
+		     struct vm_area_struct *vma)
+{
+	return pci_mmap_resource(kobj, attr, vma, 1);
 }
 
 /**
@@ -472,9 +486,46 @@
 			sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
 			kfree(res_attr);
 		}
+
+		res_attr = pdev->res_attr_wc[i];
+		if (res_attr) {
+			sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
+			kfree(res_attr);
+		}
 	}
 }
 
+static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
+{
+	/* allocate attribute structure, piggyback attribute name */
+	int name_len = write_combine ? 13 : 10;
+	struct bin_attribute *res_attr;
+	int retval;
+
+	res_attr = kzalloc(sizeof(*res_attr) + name_len, GFP_ATOMIC);
+	if (res_attr) {
+		char *res_attr_name = (char *)(res_attr + 1);
+
+		if (write_combine) {
+			pdev->res_attr_wc[num] = res_attr;
+			sprintf(res_attr_name, "resource%d_wc", num);
+			res_attr->mmap = pci_mmap_resource_wc;
+		} else {
+			pdev->res_attr[num] = res_attr;
+			sprintf(res_attr_name, "resource%d", num);
+			res_attr->mmap = pci_mmap_resource_uc;
+		}
+		res_attr->attr.name = res_attr_name;
+		res_attr->attr.mode = S_IRUSR | S_IWUSR;
+		res_attr->size = pci_resource_len(pdev, num);
+		res_attr->private = &pdev->resource[num];
+		retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
+	} else
+		retval = -ENOMEM;
+
+	return retval;
+}
+
 /**
  * pci_create_resource_files - create resource files in sysfs for @dev
  * @dev: dev in question
@@ -488,31 +539,18 @@
 
 	/* Expose the PCI resources from this device as files */
 	for (i = 0; i < PCI_ROM_RESOURCE; i++) {
-		struct bin_attribute *res_attr;
-
 		/* skip empty resources */
 		if (!pci_resource_len(pdev, i))
 			continue;
 
-		/* allocate attribute structure, piggyback attribute name */
-		res_attr = kzalloc(sizeof(*res_attr) + 10, GFP_ATOMIC);
-		if (res_attr) {
-			char *res_attr_name = (char *)(res_attr + 1);
+		retval = pci_create_attr(pdev, i, 0);
+		/* for prefetchable resources, create a WC mappable file */
+		if (!retval && pdev->resource[i].flags & IORESOURCE_PREFETCH)
+			retval = pci_create_attr(pdev, i, 1);
 
-			pdev->res_attr[i] = res_attr;
-			sprintf(res_attr_name, "resource%d", i);
-			res_attr->attr.name = res_attr_name;
-			res_attr->attr.mode = S_IRUSR | S_IWUSR;
-			res_attr->size = pci_resource_len(pdev, i);
-			res_attr->mmap = pci_mmap_resource;
-			res_attr->private = &pdev->resource[i];
-			retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
-			if (retval) {
-				pci_remove_resource_files(pdev);
-				return retval;
-			}
-		} else {
-			return -ENOMEM;
+		if (retval) {
+			pci_remove_resource_files(pdev);
+			return retval;
 		}
 	}
 	return 0;
Index: linux-2.6.git/include/linux/pci.h
===================================================================
--- linux-2.6.git.orig/include/linux/pci.h	2008-01-08 10:53:10.000000000 -0800
+++ linux-2.6.git/include/linux/pci.h	2008-01-08 12:51:22.000000000 -0800
@@ -201,6 +201,7 @@
 	struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
 	int rom_attr_enabled;		/* has display of the rom attribute been enabled? */
 	struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
+	struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
 #ifdef CONFIG_PCI_MSI
 	struct list_head msi_list;
 #endif

-- 


* Re: [patch 06/11] PAT x86: Refactoring i386 cpa
  2008-01-10 18:48 ` [patch 06/11] PAT x86: Refactoring i386 cpa venkatesh.pallipadi
@ 2008-01-10 19:00   ` Andi Kleen
  2008-01-14 16:47     ` Ingo Molnar
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:00 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Suresh Siddha

venkatesh.pallipadi@intel.com writes:

> This makes 32 bit cpa similar to x86_64 and makes it easier for following PAT
> patches.

Please don't do this -- that would be a nightmare merging with my CPA
series.

This means adding the _addr() entry point is ok (although it should not 
be needed on i386 because it doesn't do the end_pfn_mapped trick), but not 
a wide scale refactoring.

-Andi


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 18:48 ` [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text venkatesh.pallipadi
@ 2008-01-10 19:06   ` Andi Kleen
  2008-01-10 19:17     ` Pallipadi, Venkatesh
  2008-01-10 21:05   ` Linus Torvalds
  1 sibling, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:06 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Suresh Siddha

venkatesh.pallipadi@intel.com writes:

> x86_64: Map only usable memory in identity map. 

I don't think that is needed or makes sense for reserved/ACPI * etc. 
Only e820 holes should be truly unmapped because only those should
contain mmio.

> All reserved memory maps to a
> zero page. 

Why zero page?  Why not unmap.

Anyway, you could make that a zillion times simpler by just rounding
the e820 areas to 2MB -- for the holes only that should be OK, I think;
I would expect them to nearly always be suitably aligned already.

In short this can be all done much simpler.

-Andi


* Re: [patch 09/11] PAT x86: Add ioremap_wc support
  2008-01-10 18:48 ` [patch 09/11] PAT x86: Add ioremap_wc support venkatesh.pallipadi
@ 2008-01-10 19:08   ` Andi Kleen
  2008-01-10 19:25     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:08 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Suresh Siddha

venkatesh.pallipadi@intel.com writes:
> Index: linux-2.6.git/include/asm-generic/iomap.h
> ===================================================================
> --- linux-2.6.git.orig/include/asm-generic/iomap.h	2008-01-08 03:31:37.000000000 -0800
> +++ linux-2.6.git/include/asm-generic/iomap.h	2008-01-08 05:15:56.000000000 -0800
> @@ -65,4 +65,8 @@
>  extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
>  extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
>  
> +#ifndef ioremap_wc
> +#define ioremap_wc ioremap_nocache
> +#endif

I don't think that's a good idea. Drivers should be able to detect this somehow.
Handling UC mappings as WC will probably give very poor results.

-Andi


* Re: [patch 03/11] PAT x86: Map only usable memory in i386 identity map
  2008-01-10 18:48 ` [patch 03/11] PAT x86: Map only usable memory in i386 identity map venkatesh.pallipadi
@ 2008-01-10 19:10   ` Andi Kleen
  0 siblings, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:10 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Suresh Siddha

venkatesh.pallipadi@intel.com writes:

> i386: Map only usable memory in identity map. Reserved memory maps to a
> zero page.

Same comments as for x86-64 version.

-Andi


* Re: [patch 07/11] PAT x86: pat-conflict resolution using linear list
  2008-01-10 18:48 ` [patch 07/11] PAT x86: pat-conflict resolution using linear list venkatesh.pallipadi
@ 2008-01-10 19:13   ` Andi Kleen
  2008-01-10 20:08     ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:13 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Suresh Siddha

venkatesh.pallipadi@intel.com writes:
>  
>  	/* Reset the direct mapping. Can block */
> -	if (p->flags >> 20)
> -		ioremap_change_attr(p->phys_addr, p->size, 0);
> +	if (p->flags >> 20) {
> +		free_mattr(p->phys_addr, p->phys_addr + get_vm_area_size(p),
> +		           p->flags>>20);
> +		ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);

If you really unmap all holes and forbid (or let it just return the
__va address) ioremap on anything mapped (which is probably ok) then
you can eliminate that completely.

-Andi


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 19:06   ` Andi Kleen
@ 2008-01-10 19:17     ` Pallipadi, Venkatesh
  2008-01-10 19:28       ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 19:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Siddha, Suresh B

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andi Kleen
>Sent: Thursday, January 10, 2008 11:07 AM
>To: Pallipadi, Venkatesh
>Cc: ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; gregkh@suse.de; 
>airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
>tglx@linutronix.de; hpa@zytor.co; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 02/11] PAT x86: Map only usable memory in 
>x86_64 identity map and kernel text
>
>venkatesh.pallipadi@intel.com writes:
>
>> x86_64: Map only usable memory in identity map. 
>
>I don't think that is needed or makes sense for reserved/ACPI * etc. 
>Only e820 holes should be truly unmapped because only those should
>contain mmio.

Do you mean just the regions that are not listed in e820 at all? We
should also not map anything marked "RESERVED" in e820. Right?

>> All reserved memory maps to a
>> zero page. 
>
>Why zero page?  Why not unmap.

I had it unmapped first, then thought of zero-mapping so that dd of
/dev/mem continues to work. Maybe there are apps that depend on that?
Also, dd of /dev/mem already seems to be broken with big memory, even
without any of these changes.
 
>Anyways you could make that a zillion times more simple by 
>just rounding
>the e820 areas to 2MB -- for the holes only that should be ok I think; 
>i would expect them to be near always already suitably aligned.
>
>In short this can be all done much simpler.

On the systems I tested, ACPI regions are typically not 2MB aligned, and
on some systems there are a few 4k pages of reserved holes just before
0xa0000. PCI reserved regions are 2MB aligned, however. I agree that
2MB alignment would make this patch a lot simpler, but not all
reserved regions seem to be aligned that way.

Thanks,
Venki


* RE: [patch 09/11] PAT x86: Add ioremap_wc support
  2008-01-10 19:08   ` Andi Kleen
@ 2008-01-10 19:25     ` Pallipadi, Venkatesh
  2008-01-12  0:18       ` Roland Dreier
  0 siblings, 1 reply; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 19:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	hpa, linux-kernel, Siddha, Suresh B


>-----Original Message-----
>From: Andi Kleen [mailto:andi@firstfloor.org] 
>Sent: Thursday, January 10, 2008 11:09 AM
>To: Pallipadi, Venkatesh
>Cc: ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; gregkh@suse.de; 
>airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
>tglx@linutronix.de; hpa@zytor.co; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 09/11] PAT x86: Add ioremap_wc support
>
>venkatesh.pallipadi@intel.com writes:
>> Index: linux-2.6.git/include/asm-generic/iomap.h
>> ===================================================================
>> --- linux-2.6.git.orig/include/asm-generic/iomap.h	
>2008-01-08 03:31:37.000000000 -0800
>> +++ linux-2.6.git/include/asm-generic/iomap.h	
>2008-01-08 05:15:56.000000000 -0800
>> @@ -65,4 +65,8 @@
>>  extern void __iomem *pci_iomap(struct pci_dev *dev, int 
>bar, unsigned long max);
>>  extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
>>  
>> +#ifndef ioremap_wc
>> +#define ioremap_wc ioremap_nocache
>> +#endif
>
>I don't think that's a good idea. Drivers should be able to 
>detect this somehow.
>Handling UC mappings as WC will probably give very poor results.
>

It is the other way around: ioremap_wc aliases to ioremap_nocache.
This was based on earlier feedback from Roland:
>From: Roland Dreier [mailto:rdreier@cisco.com] 
>I think ioremap_wc() needs to be available on all archs for this to be
>really useful to drivers.  It can be a fallback to ioremap_nocache()
>everywhere except 64-bit x86, but it's not nice for every driver that
>wants to use this to need an "#ifdef X86" or whatever.

Thanks,
Venki


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 19:17     ` Pallipadi, Venkatesh
@ 2008-01-10 19:28       ` Andi Kleen
  2008-01-10 20:50         ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 19:28 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, ebiederm, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, hpa, linux-kernel, Siddha, Suresh B

On Thu, Jan 10, 2008 at 11:17:07AM -0800, Pallipadi, Venkatesh wrote:
> >I don't think that is needed or makes sense for reserved/ACPI * etc. 
> >Only e820 holes should be truly unmapped because only those should
> >contain mmio.
> 
> Do you mean just the regions that are not listed in e820 at all? We
> should also not map anything marked "RESERVED" in e820. Right?

RESERVED is usually memory used by the BIOS. Properly, MMIO areas
should be in holes.

Of course there might be buggy BIOSes that violate that, but the only
way to find out is to check for the case in ioremap and warn. I would
still be optimistic about it being correct.

Another way would be to double-check against the MTRRs - if a region is
UC then it should be unmapped. Maybe that would be a good idea. That
should catch all true MMIO holes, unless a BIOS maps them cached, but if
it does that it's already beyond help.

> 
> >> All reserved memory maps to a
> >> zero page. 
> >
> >Why zero page?  Why not unmap.
> 
> I had it unmapped first. Then thought of zero mapping for dd of devmem
> to continue working. May be there are apps that depend on that?
> Also, dd of devmem seems to be already broken with big memory without
> any of these changes.

Exactly it's already broken.

Anyway, if someone accesses MMIO through /dev/mem, I think they
definitely want the real mappings, not a zero page, and /dev/mem should
provide them. The trick is just to do it without caching-attribute
violations, but with mattr it is possible.

>  
> >Anyways you could make that a zillion times more simple by 
> >just rounding
> >the e820 areas to 2MB -- for the holes only that should be ok I think; 
> >i would expect them to be near always already suitably aligned.
> >
> >In short this can be all done much simpler.
> 
> On systems I tested, ACPI regions are typically not 2MB aligned. And on

ACPI regions don't need to be unmapped.

> some systems there are few 4k pages of reserved holes just before

reserved shouldn't be unmapped, just holes. Do they have holes
there or reserved areas?

I still hope 2MB alignment will work out.

-Andi



* Re: [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource
  2008-01-10 18:48 ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource venkatesh.pallipadi
@ 2008-01-10 19:43   ` Greg KH
  2008-01-10 20:54     ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource Pallipadi, Venkatesh
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2008-01-10 19:43 UTC (permalink / raw)
  To: venkatesh.pallipadi
  Cc: ak, ebiederm, rdreier, torvalds, airlied, davej, mingo, tglx,
	hpa, akpm, arjan, jesse.barnes, davem, linux-kernel,
	Suresh Siddha

On Thu, Jan 10, 2008 at 10:48:51AM -0800, venkatesh.pallipadi@intel.com wrote:
> New interfaces exported for uc and wc accesses. Apps has to change to use
> these new interfaces.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

Please update the documentation for this change, as well as adding
something to Documentation/ABI/ for these new sysfs files.

thanks,

greg k-h


* RE: [patch 07/11] PAT x86: pat-conflict resolution using linear list
  2008-01-10 19:13   ` Andi Kleen
@ 2008-01-10 20:08     ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 20:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	linux-kernel, Siddha, Suresh B


>-----Original Message-----
>From: Andi Kleen [mailto:andi@firstfloor.org] 
>Sent: Thursday, January 10, 2008 11:13 AM
>To: Pallipadi, Venkatesh
>Cc: ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; gregkh@suse.de; 
>airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
>tglx@linutronix.de; hpa@zytor.co; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 07/11] PAT x86: pat-conflict resolution 
>using linear list
>
>venkatesh.pallipadi@intel.com writes:
>>  
>>  	/* Reset the direct mapping. Can block */
>> -	if (p->flags >> 20)
>> -		ioremap_change_attr(p->phys_addr, p->size, 0);
>> +	if (p->flags >> 20) {
>> +		free_mattr(p->phys_addr, p->phys_addr + 
>get_vm_area_size(p),
>> +		           p->flags>>20);
>> +		ioremap_change_attr(p->phys_addr, 
>get_vm_area_size(p), 0);
>
>If you really unmap all holes and forbid (or let it just return the
>__va address) ioremap on anything mapped (which is probably ok) then
>you can eliminate that completely.
>

We heard X can allocate a page, map it UC, and use it through the GART.
So I don't think we can forbid all ioremaps of RAM.

Thanks,
Venki


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 19:28       ` Andi Kleen
@ 2008-01-10 20:50         ` Pallipadi, Venkatesh
  2008-01-10 21:16           ` Andi Kleen
  2008-01-14 16:43           ` Ingo Molnar
  0 siblings, 2 replies; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 20:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Andi Kleen [mailto:andi@firstfloor.org] 
>Sent: Thursday, January 10, 2008 11:28 AM
>To: Pallipadi, Venkatesh
>Cc: Andi Kleen; ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; gregkh@suse.de; 
>airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
>tglx@linutronix.de; hpa@zytor.co; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 02/11] PAT x86: Map only usable memory in 
>x86_64 identity map and kernel text
>
>On Thu, Jan 10, 2008 at 11:17:07AM -0800, Pallipadi, Venkatesh wrote:
>> >I don't think that is needed or makes sense for 
>reserved/ACPI * etc. 
>> >Only e820 holes should be truly unmapped because only those should
>> >contain mmio.
>> 
>> Do you mean just the regions that are not listed in e820 at all? We
>> should also not map anything marked "RESERVED" in e820. Right?
>
>RESERVED is usually memory used by the BIOS. Properly MMIO areas
>should be in holes.
>
>Of course there might be buggy BIOS who violate that but the
>only way to find out is to check for the case in ioremap and 
>warn. I would
>be still optimistic of it being correct.
>
>Another way would be to double check against the MTRRs - if 
>it's UC then
>it should be unmapped. Maybe that would be a good idea. That should
>catch all true mmio holes unless a BIOS maps them cached but if it does
>that it's already beyond help.

One of the test systems I have has following E820
 BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
 BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
 BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
 BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
 BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)

I think it is unsafe to access any reserved area through "WB", not just
MMIO regions. In the above case, 0xe0000000-0xf0000000 is one such
region.

Also, relying on MTRRs is like giving the BIOS writer more importance
than required :-). I think the best way to deal with MTRRs is just not
to touch them: leave them as they are and do not assume that they are
correct, as frequently they will not be.

>> >> All reserved memory maps to a
>> >> zero page. 
>> >
>> >Why zero page?  Why not unmap.
>> 
>> I had it unmapped first. Then thought of zero mapping for dd 
>of devmem
>> to continue working. May be there are apps that depend on that?
>> Also, dd of devmem seems to be already broken with big memory without
>> any of these changes.
>
>Exactly it's already broken.
>
>Anyways if someone accesses mmio through /dev/mem I think they 
>definitely
>want the real mappings, not a zero page.  And dev/mem should provide.
>The trick is just to do it without caching attribute violations, 
>but with mattr it is possible.

I don't like /dev/mem supporting access to MMIO. We do not know what
attributes to use for these regions. We could potentially map all these
pages uncacheable, but there may also be cases where reading an address
can block?

>> >Anyways you could make that a zillion times more simple by 
>> >just rounding
>> >the e820 areas to 2MB -- for the holes only that should be 
>ok I think; 
>> >i would expect them to be near always already suitably aligned.
>> >
>> >In short this can be all done much simpler.
>> 
>> On systems I tested, ACPI regions are typically not 2MB 
>aligned. And on
>
>ACPI regions don't need to be unmapped.
>
>> some systems there are few 4k pages of reserved holes just before
>
>reserved shouldn't be unmapped, just holes. Do they have holes
>there or reserved areas?
>
>I still hope 2MB alignment will work out.

The e820 above has a combination of reserved regions and holes.
The problem is that we end up depending on specific e820 maps and
platform-specific problems/workarounds. This is not a real problem for
i386 at all, as we map only < 1G of memory there, so it is limited to
x86_64 systems, which should be fewer in number.

Thanks,
Venki 


* RE: [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs for pci_mmap_resource
  2008-01-10 19:43   ` Greg KH
@ 2008-01-10 20:54     ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 20:54 UTC (permalink / raw)
  To: Greg KH
  Cc: ak, ebiederm, rdreier, torvalds, airlied, davej, mingo, tglx,
	akpm, arjan, Barnes, Jesse, davem, linux-kernel, Siddha,
	Suresh B

 

>-----Original Message-----
>From: Greg KH [mailto:gregkh@suse.de] 
>Sent: Thursday, January 10, 2008 11:43 AM
>To: Pallipadi, Venkatesh
>Cc: ak@muc.de; ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; airlied@skynet.ie; 
>davej@redhat.com; mingo@elte.hu; tglx@linutronix.de; 
>hpa@zytor.com; akpm@linux-foundation.org; arjan@infradead.org; 
>Barnes, Jesse; davem@davemloft.net; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 11/11] PAT x86: Expose uc and wc 
>interfaces in /sysfsvor pci_mmap_resource
>
>On Thu, Jan 10, 2008 at 10:48:51AM -0800, 
>venkatesh.pallipadi@intel.com wrote:
>> New interfaces exported for uc and wc accesses. Apps has to 
>change to use
>> these new interfaces.
>> 
>> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>
>Please update the documentation for this change, as well as adding
>something to Documentation/ABI/ for these new sysfs files.
>

OK. Will do.

Thanks,
Venki


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 18:48 ` [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text venkatesh.pallipadi
  2008-01-10 19:06   ` Andi Kleen
@ 2008-01-10 21:05   ` Linus Torvalds
  2008-01-10 21:57     ` Pallipadi, Venkatesh
  1 sibling, 1 reply; 48+ messages in thread
From: Linus Torvalds @ 2008-01-10 21:05 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: ak, ebiederm, rdreier, gregkh, airlied, davej, mingo, tglx, hpa,
	akpm, arjan, jesse.barnes, davem, linux-kernel, Suresh Siddha



On Thu, 10 Jan 2008, venkatesh.pallipadi@intel.com wrote:
>
> x86_64: Map only usable memory in identity map. All reserved memory maps to a
> zero page.

I don't mind this horribly per se, but why a zero page?

Accessing that page without mapping it explicitly would be a bug with 
your change - if only because you'd get the wrong value!

So why map it at all? The only thing mapping it can do is to hide bugs.

			Linus


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 20:50         ` Pallipadi, Venkatesh
@ 2008-01-10 21:16           ` Andi Kleen
  2008-01-10 22:25             ` Pallipadi, Venkatesh
  2008-01-14 16:43           ` Ingo Molnar
  1 sibling, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 21:16 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, ebiederm, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, linux-kernel, Siddha, Suresh B

> I think it is unsafe to access any reserved areas through "WB" not just
> mmio regions. In the above case 0xe0000000-0xf0000000 is one such
> region.

That is 2MB aligned.

> 
> Also, relying on MTRR, is like giving more importance to BIOS writer

Let's call it double checking. 

Besides, MTRRs will not go away anyway. The goal is just to not require
_more_ MTRRs in Linux than we do currently. But using the existing ones
is no problem.

> than required :-). I think the best way to deal with MTRR is just to not
> touch it. Leave it as it is and do not try to assume that they are
> correct, as frequently they will not be.

This means you have to trust the e820 map then. It's really the
same thing.

Anyways if you don't like checking the MTRRs that's fine too, but
then the e820 map has to be trusted. If that works it is great.

If not some double checking will be needed and the MTRRs would
be more convenient for that. The code would be somewhat ugly though.


> 
> >> >> All reserved memory maps to a
> >> >> zero page. 
> >> >
> >> >Why zero page?  Why not unmap.
> >> 
> >> I had it unmapped first. Then thought of zero mapping for dd 
> >of devmem
> >> to continue working. May be there are apps that depend on that?
> >> Also, dd of devmem seems to be already broken with big memory without
> >> any of these changes.
> >
> >Exactly it's already broken.
> >
> >Anyways if someone accesses mmio through /dev/mem I think they 
> >definitely
> >want the real mappings, not a zero page.  And dev/mem should provide.
> >The trick is just to do it without caching attribute violations, 
> >but with mattr it is possible.
> 
> I don't like /dev/mem supporting access to mmio. We do not know what

But it always did that. I'm sure you'll break stuff if you forbid
it suddenly.

> attributes to use for these regions.  We can potentially map all these
> pages uncacheable. 

That is what current /dev/mem does.

> But there may be cases where reading an address can
> block too possibly?

Yes sure, machine may hang, but that was always the case and I don't
think it can be changed.

> 
> >> >Anyways you could make that a zillion times more simple by 
> >> >just rounding
> >> >the e820 areas to 2MB -- for the holes only that should be 
> >ok I think; 
> >> >i would expect them to be near always already suitably aligned.
> >> >
> >> >In short this can be all done much simpler.
> >> 
> >> On systems I tested, ACPI regions are typically not 2MB 
> >aligned. And on
> >
> >ACPI regions don't need to be unmapped.
> >
> >> some systems there are few 4k pages of reserved holes just before
> >
> >reserved shouldn't be unmapped, just holes. Do they have holes
> >there or reserved areas?
> >
> >I still hope 2MB alignment will work out.
> 
> E820 above has a combination of reserved and holes.
> The problem is that we end up depending on specific e820s and paltform
> specific problems/workarounds. This is not a real problem for i386 at

> all, as we map only < 1G memory there. 

First, there are the 2GB and, in theory, 1GB/3GB splits too, which are
supported. And then in theory someone could put MMIO in the first 1GB
anyway (e.g. in the 1MB hole).

I don't think you can ignore i386 here.

-Andi


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 21:05   ` Linus Torvalds
@ 2008-01-10 21:57     ` Pallipadi, Venkatesh
  2008-01-10 22:15       ` Linus Torvalds
  0 siblings, 1 reply; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 21:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: ak, ebiederm, rdreier, gregkh, airlied, davej, mingo, tglx, akpm,
	arjan, Barnes, Jesse, davem, linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Linus Torvalds [mailto:torvalds@linux-foundation.org] 
>Sent: Thursday, January 10, 2008 1:05 PM
>To: Pallipadi, Venkatesh
>Cc: ak@muc.de; ebiederm@xmission.com; rdreier@cisco.com; 
>gregkh@suse.de; airlied@skynet.ie; davej@redhat.com; 
>mingo@elte.hu; tglx@linutronix.de; hpa@zytor.com; 
>akpm@linux-foundation.org; arjan@infradead.org; Barnes, Jesse; 
>davem@davemloft.net; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 02/11] PAT x86: Map only usable memory in 
>x86_64 identity map and kernel text
>
>
>
>On Thu, 10 Jan 2008, venkatesh.pallipadi@intel.com wrote:
>>
>> x86_64: Map only usable memory in identity map. All reserved 
>memory maps to a
>> zero page.
>
>I don't mind this horribly per se, but why a zero page?
>
>Accessing that page without mapping it explicitly would be a bug with 
>your change - if only because you'd get the wrong value!
>
>So why map it at all? The only thing mapping it can do is to hide bugs.
>

Yes. I had those pages not mapped at all earlier. The reason I switched
to the zero page is to continue supporting cases like:
 BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
 BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)

In this case, if someone does a dd of /dev/mem, they can read the
contents of usable memory in the 0x100000-0xcff60000 range. But if I do
not map the reserved regions, dd will stop after the first such hole.
Even though this may not be a good usage model, I thought there may be
apps depending on such behavior. Having said that, I do not like having
the dummy zero page there very much. So, if we do not see any
regressions due to usages like the above, I will be happy to remove the
mapping of reserved regions altogether.

Thanks,
Venki


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 21:57     ` Pallipadi, Venkatesh
@ 2008-01-10 22:15       ` Linus Torvalds
  2008-01-10 22:27         ` Pallipadi, Venkatesh
  2008-01-10 22:50         ` Valdis.Kletnieks
  0 siblings, 2 replies; 48+ messages in thread
From: Linus Torvalds @ 2008-01-10 22:15 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: ak, ebiederm, rdreier, gregkh, airlied, davej, mingo, tglx, akpm,
	arjan, Barnes, Jesse, davem, linux-kernel, Siddha, Suresh B



On Thu, 10 Jan 2008, Pallipadi, Venkatesh wrote:
> 
> Yes. I had those pages not mapped at all earlier. The reason I switched
> to zero page is to continue support cases like:
>  BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
>  BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
>  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
> 
> In this case if some one does a dd of /dev/mem before they can read the
> contents of usable memory in 0x100000-0xcff60000 range.

Well, I think that /dev/mem should simply give them the right info. That's 
what people use /dev/mem for - doing things like reading BIOS images etc. 

So returning *either* a zero page *or* stopping at the first hole is both 
equally wrong. 

			Linus


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 21:16           ` Andi Kleen
@ 2008-01-10 22:25             ` Pallipadi, Venkatesh
  2008-01-10 22:35               ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 22:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo, tglx,
	linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andi Kleen
>Sent: Thursday, January 10, 2008 1:17 PM
>To: Pallipadi, Venkatesh
>Cc: Andi Kleen; ebiederm@xmission.com; rdreier@cisco.com; 
>torvalds@linux-foundation.org; gregkh@suse.de; 
>airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
>tglx@linutronix.de; linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: Re: [patch 02/11] PAT x86: Map only usable memory in 
>x86_64 identity map and kernel text
>
>> I think it is unsafe to access any reserved areas through 
>"WB" not just
>> mmio regions. In the above case 0xe0000000-0xf0000000 is one such
>> region.
>
>That is 2MB aligned.

That e820 also has a reserved region at 0x9d000.

 BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
 BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)

If we keep mappings for such pages, it will be problematic: if a driver
later does an ioremap, we have to go through page splitting and cpa.
By not mapping any reserved regions at all, we can avoid cpa for all
mappings of reserved regions. Reducing the complication at setup would
just make the code more complicated at ioremap, etc.

Most of the holes/reserved areas will be 2M aligned, other than the
initial 2M and possibly 2M around the ACPI region. So, we may end up
mapping some of those pages with small pages. Even though it was not
enforced until now, I feel that is required for correctness.

>> >
>> >Exactly it's already broken.
>> >
>> >Anyways if someone accesses mmio through /dev/mem I think they 
>> >definitely
>> >want the real mappings, not a zero page.  And dev/mem 
>should provide.
>> >The trick is just to do it without caching attribute violations, 
>> >but with mattr it is possible.
>> 
>> I don't like /dev/mem supporting access to mmio. We do not know what
>
>But it always did that. I'm sure you'll break stuff if you forbid
>it suddenly.
>
>> attributes to use for these regions.  We can potentially map 
>all these
>> pages uncacheable. 
>
>That is what current /dev/mem does.

Maybe I am missing something, but I don't think I saw /dev/mem
checking whether some region is reserved and mapping those pages as
uncacheable. As I thought, it mostly works out because the MTRRs have
such a setting. If I do a dd of /dev/mem which ends up reading all the
reserved regions today, I see one of my systems dying horribly with an
NMI "dazed and confused" and the other gets SCSI errors etc. I am not
sure how some apps can depend on reading mmio regions through /dev/mem.
Any particular app you are thinking about?

>> But there may be cases where reading an address can
>> block too possibly?
>
>Yes sure, machine may hang, but that was always the case and I don't
>think it can be changed.
>
>> 
>> >> >Anyways you could make that a zillion times more simple by 
>> >> >just rounding
>> >> >the e820 areas to 2MB -- for the holes only that should be 
>> >ok I think; 
>> >> >i would expect them to be near always already suitably aligned.
>> >> >
>> >> >In short this can be all done much simpler.
>> >> 
>> >> On systems I tested, ACPI regions are typically not 2MB 
>> >aligned. And on
>> >
>> >ACPI regions don't need to be unmapped.
>> >
>> >> some systems there are few 4k pages of reserved holes just before
>> >
>> >reserved shouldn't be unmapped, just holes. Do they have holes
>> >there or reserved areas?
>> >
>> >I still hope 2MB alignment will work out.
>> 
>> E820 above has a combination of reserved and holes.
>> The problem is that we end up depending on specific e820s 
>and paltform
>> specific problems/workarounds. This is not a real problem for i386 at
>
>> all, as we map only < 1G memory there. 
>
>First there is the 2GB and in theory 1/3 GB split too which 
>are supported.
>And then in theory someone could put mmio in the first 1GB 
>anyways (e.g.
>in the 1MB hole) 
>
>I don't think you can ignore i386 here.
>

OK. I was thinking that we would have a smaller subset of systems to
worry about with x86_64. With the above, yes, we need to worry about
i386 as well.

Other than the complicated code, do you see any issues with identity
mapping only the "usable" and "ACPI" regions as per e820? We can
possibly try to simplify the code, if that is the only concern.

Thanks,
Venki


* RE: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 22:15       ` Linus Torvalds
@ 2008-01-10 22:27         ` Pallipadi, Venkatesh
  2008-01-10 22:50         ` Valdis.Kletnieks
  1 sibling, 0 replies; 48+ messages in thread
From: Pallipadi, Venkatesh @ 2008-01-10 22:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: ak, ebiederm, rdreier, gregkh, airlied, davej, mingo, tglx, akpm,
	arjan, Barnes, Jesse, davem, linux-kernel, Siddha, Suresh B

 

>-----Original Message-----
>From: Linus Torvalds [mailto:torvalds@linux-foundation.org] 
>Sent: Thursday, January 10, 2008 2:15 PM
>To: Pallipadi, Venkatesh
>Cc: ak@muc.de; ebiederm@xmission.com; rdreier@cisco.com; 
>gregkh@suse.de; airlied@skynet.ie; davej@redhat.com; 
>mingo@elte.hu; tglx@linutronix.de; akpm@linux-foundation.org; 
>arjan@infradead.org; Barnes, Jesse; davem@davemloft.net; 
>linux-kernel@vger.kernel.org; Siddha, Suresh B
>Subject: RE: [patch 02/11] PAT x86: Map only usable memory in 
>x86_64 identity map and kernel text
>
>
>
>On Thu, 10 Jan 2008, Pallipadi, Venkatesh wrote:
>> 
>> Yes. I had those pages not mapped at all earlier. The reason 
>I switched
>> to zero page is to continue support cases like:
>>  BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
>>  BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
>>  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
>>  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
>>  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
>> 
>> In this case if some one does a dd of /dev/mem before they 
>can read the
>> contents of usable memory in 0x100000-0xcff60000 range.
>
>Well, I think that /dev/mem should simply give them the right 
>info. That's 
>what people use /dev/mem for - doing things like reading BIOS 
>images etc. 
>
>So returning *either* a zero page *or* stopping at the first 
>hole is both 
>equally wrong. 
>

I was not fully clear in my earlier email. Mapping /dev/mem would still
work with our changes, as mappings go through the proper map interface.
It is reads of /dev/mem, as with dd, that have the problem. I was
wondering about apps using dd.

Thanks,
Venki


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 22:25             ` Pallipadi, Venkatesh
@ 2008-01-10 22:35               ` Andi Kleen
  0 siblings, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-01-10 22:35 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, ebiederm, rdreier, torvalds, gregkh, airlied, davej,
	mingo, tglx, linux-kernel, Siddha, Suresh B

On Thu, Jan 10, 2008 at 02:25:29PM -0800, Pallipadi, Venkatesh wrote:
>  
> 
> >-----Original Message-----
> >From: linux-kernel-owner@vger.kernel.org 
> >[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andi Kleen
> >Sent: Thursday, January 10, 2008 1:17 PM
> >To: Pallipadi, Venkatesh
> >Cc: Andi Kleen; ebiederm@xmission.com; rdreier@cisco.com; 
> >torvalds@linux-foundation.org; gregkh@suse.de; 
> >airlied@skynet.ie; davej@redhat.com; mingo@elte.hu; 
> >tglx@linutronix.de; linux-kernel@vger.kernel.org; Siddha, Suresh B
> >Subject: Re: [patch 02/11] PAT x86: Map only usable memory in 
> >x86_64 identity map and kernel text
> >
> >> I think it is unsafe to access any reserved areas through 
> >"WB" not just
> >> mmio regions. In the above case 0xe0000000-0xf0000000 is one such
> >> region.
> >
> >That is 2MB aligned.
> 
> That e820 also has a reserved here at 0x9d000.

That's not a hole.

> 
>  BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
>  BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
> 
> If we keep mapping for such pages, it will be problematic as if a driver
> later does a ioremap, then we have to go through split-pages and cpa.

It should not do an uncacheable ioremap from a reserved region because
there shouldn't be an MMIO hole in there. It can do ioremap_cachable(),
but that is ok.

> 
> Most of the holes/reserved areas will be 2M aligned, other than initial
> 2M and possible 2M around ACPI region. So, we may end up mapping some of
> those pages with small pages. Even though it was not enforced until now,
> I feel that is required for correctness.

If it's rare enough mapping in 2MB chunks around the holes is ok.
> 
> >> >
> >> >Exactly it's already broken.
> >> >
> >> >Anyways if someone accesses mmio through /dev/mem I think they 
> >> >definitely
> >> >want the real mappings, not a zero page.  And dev/mem 
> >should provide.
> >> >The trick is just to do it without caching attribute violations, 
> >> >but with mattr it is possible.
> >> 
> >> I don't like /dev/mem supporting access to mmio. We do not know what
> >
> >But it always did that. I'm sure you'll break stuff if you forbid
> >it suddenly.
> >
> >> attributes to use for these regions.  We can potentially map 
> >all these
> >> pages uncacheable. 
> >
> >That is what current /dev/mem does.
> 
> May be I am missing something. But, I don't think I saw /dev/mem
> checking whether some region is reserved and mapping those pages as
> uncacheable. 

It relies partly on the MTRRs and partly checks for >= end_pfn.
Yes it's a gross hack, but it works.

> As I though, its mostly done as MTRR has such setting. If I
> do dd of devmem which ends up reading all reserved regions today, I see
> one of my systems dying horribly with NMI dazed and confused and the
> other gets SCSI errors etc. I am not sure how can some apps depend on
> reading mmio regions through /dev/mem. Any particular app you are
> thinking about?

The older X servers for one, or x86emu in user space, and likely
various others. There are all kinds of scary utilities using /dev/mem
around (like BIOS flash updaters etc.)

I know some people who don't trust the VM for large memory areas and
like to boot with a small mem=... and then grab memory through
/dev/mem. I suspect if that didn't work anymore there would eventually
be complaints too, although a case might be made for not supporting
that anymore.

But really, /dev/mem is widely used and full compatibility is fairly
important.


> Other than the complicated code, do you see any issues of identity
> mapping only "usable" and "ACPI" regions as per e820? We can possible
> try to simplify the code, if that is the only concern.

The basic idea is fine.

-Andi



* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 22:15       ` Linus Torvalds
  2008-01-10 22:27         ` Pallipadi, Venkatesh
@ 2008-01-10 22:50         ` Valdis.Kletnieks
  2008-01-18 18:27           ` Dave Jones
  1 sibling, 1 reply; 48+ messages in thread
From: Valdis.Kletnieks @ 2008-01-10 22:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pallipadi, Venkatesh, ak, ebiederm, rdreier, gregkh, airlied,
	davej, mingo, tglx, akpm, arjan, Barnes, Jesse, davem,
	linux-kernel, Siddha, Suresh B


On Thu, 10 Jan 2008 14:15:25 PST, Linus Torvalds said:

> Well, I think that /dev/mem should simply give them the right info. That's 
> what people use /dev/mem for - doing things like reading BIOS images etc. 
> 
> So returning *either* a zero page *or* stopping at the first hole is both 
> equally wrong. 

A case could be made that the /dev/mem driver should at *least* prohibit access
to those memory ranges that the kernel already knows have (or might have)
memory-mapped control registers with Bad Juju side-effects attached to them.

Of course, a case could also be made that it should be permitted, because
anybody who tries to read such memory addresses either (a) knows what they're
doing or (b) is about to become an example of evolution in action... ;)

(Personally, I keep a copy of Arjan's "restrict devmem" patch from Fedora
around, so I guess that says which camp I belong in, and the fact it's a Fedora
patch and not mainstream says something too...)




* Re: [patch 09/11] PAT x86: Add ioremap_wc support
  2008-01-10 19:25     ` Pallipadi, Venkatesh
@ 2008-01-12  0:18       ` Roland Dreier
  0 siblings, 0 replies; 48+ messages in thread
From: Roland Dreier @ 2008-01-12  0:18 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, ebiederm, torvalds, gregkh, airlied, davej, mingo,
	tglx, hpa, linux-kernel, Siddha, Suresh B

 > >I don't think that's a good idea. Drivers should be able to 
 > >detect this somehow.
 > >Handling UC mappings as WC will probably give very poor results.

 > It is the other way. ioremap_wc aliases to ioremap_nocache.
 > This was based on earlier feedback from Roland.

 > >From: Roland Dreier [mailto:rdreier@cisco.com] 
 > >I think ioremap_wc() needs to be available on all archs for this to be
 > >really useful to drivers.  It can be a fallback to ioremap_nocache()
 > >everywhere except 64-bit x86, but it's not nice for every driver that
 > >wants to use this to need an "#ifdef X86" or whatever.

Yes... my basic point is simply that the kernel interfaces to this WC
stuff must be available on all architectures, even if it's stubbed out
on architectures that don't support write-combining or where it hasn't
been implemented yet.  I think the only sane way to stub it out is to
fall back to uncached...

It's not going to be useful if drivers need crazy #ifdefs to decide
whether to use ioremap_wc(), pgprot_writecombine() or whatever.  Just
look at the mess in drm_io_prot() in drm_vm.c to see what I want to
avoid.

Just to be explicit, my interest in this is that I want to be able to
merge the patch below and have the mlx4 driver still build and work on
every arch that has PCI support:

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index d8287d9..5eac9c6 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -381,8 +381,7 @@ static int mlx4_ib_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 				       PAGE_SIZE, vma->vm_page_prot))
 			return -EAGAIN;
 	} else if (vma->vm_pgoff == 1 && dev->dev->caps.bf_reg_size != 0) {
-		/* FIXME want pgprot_writecombine() for BlueFlame pages */
-		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 
 		if (io_remap_pfn_range(vma, vma->vm_start,
 				       to_mucontext(context)->uar.pfn +



* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 20:50         ` Pallipadi, Venkatesh
  2008-01-10 21:16           ` Andi Kleen
@ 2008-01-14 16:43           ` Ingo Molnar
  2008-01-14 21:21             ` Siddha, Suresh B
  1 sibling, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2008-01-14 16:43 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Andi Kleen, ebiederm, rdreier, torvalds, gregkh, airlied, davej,
	tglx, linux-kernel, Siddha, Suresh B, Arjan van de Ven


* Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> wrote:

> Also, relying on MTRR, is like giving more importance to BIOS writer 
> than required :-). I think the best way to deal with MTRR is just to 
> not touch it. Leave it as it is and do not try to assume that they are 
> correct, as frequently they will not be.

i'd suggest the following strategy on PAT-capable CPUs:

 - do not try to write MTRRs. Ever.

 - _read_ the current MTRR settings (including the default MTRR) and 
   check them against the e820 map. I can see two basic types of 
   mismatches:

     - RAM area marked fine in e820 but marked UC by MTRR: this 
       currently results in a slow system. (NOTE: UC- would be fine and 
       overridable by PAT, hence it's not a conflict we should detect.)

     - mmio area marked cacheable in the MTRR (results in broken system)

   then emit a warning and exclude all such areas from any further Linux 
   use. I.e. if it's RAM then clip it from our memory map. If it's mmio 
   area then try to exclude it from BAR sizing/positioning.

   this way we'll only have two sorts of physical pages put into 
   pagetables by Linux:

     1) RAM page, marked cacheable by MTRR
     2) RAM page, marked as UC- by MTRR
     3) mmio page, marked as UC- by MTRR

 - then we'd use PAT for all these pages to differentiate their 
   caching properties. We mark RAM pages as cacheable, and we mark mmio 
   pages as WC or UC.

I.e. try to be as conservative and always have a deterministic exit 
strategy towards a 100% working system, even if BIOS writers messed up 
the MTRR defaults. _Worst-case_ we boot up with somewhat less RAM or 
with a somewhat smaller mmio area. (but there will be warnings in the 
dmesg about that so users can complain about the BIOS.) We should never 
ever allow the wrong BIOS MTRR defaults to impact Linux's correctness.

hm?

	Ingo


* Re: [patch 06/11] PAT x86: Refactoring i386 cpa
  2008-01-10 19:00   ` Andi Kleen
@ 2008-01-14 16:47     ` Ingo Molnar
  0 siblings, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2008-01-14 16:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: venkatesh.pallipadi, ebiederm, rdreier, torvalds, gregkh,
	airlied, davej, tglx, hpa, linux-kernel, Suresh Siddha


* Andi Kleen <andi@firstfloor.org> wrote:

> venkatesh.pallipadi@intel.com writes:
> 
> > This makes 32 bit cpa similar to x86_64 and makes it easier for 
> > following PAT patches.
> 
> Please don't do this -- that would be a nightmare merging with my CPA 
> series.
> 
> This means adding the _addr() entry point is ok (although it should 
> not be needed on i386 because it doesn't do the end_pfn_mapped trick), 
> but not a wide scale refactoring.

yeah, i think we can unify after your cpa fixes and enhancements just 
fine - as you did the changes symmetrically for 32-bit and 64-bit. 
Hopefully i'll be able to upload the current PAT stuff in x86.git later 
today so that you can merge the cpa items ontop of it.

	Ingo


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-14 16:43           ` Ingo Molnar
@ 2008-01-14 21:21             ` Siddha, Suresh B
  2008-01-14 21:28               ` Andi Kleen
  2008-01-15 22:17               ` Ingo Molnar
  0 siblings, 2 replies; 48+ messages in thread
From: Siddha, Suresh B @ 2008-01-14 21:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pallipadi, Venkatesh, Andi Kleen, ebiederm, rdreier, torvalds,
	gregkh, airlied, davej, tglx, linux-kernel, Siddha, Suresh B,
	Arjan van de Ven, jesse.barnes

On Mon, Jan 14, 2008 at 05:43:24PM +0100, Ingo Molnar wrote:
> 
> * Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> wrote:
> 
> > Also, relying on MTRR, is like giving more importance to BIOS writer 
> > than required :-). I think the best way to deal with MTRR is just to 
> > not touch it. Leave it as it is and do not try to assume that they are 
> > correct, as frequently they will not be.
> 
> i'd suggest the following strategy on PAT-capable CPUs:
> 
>  - do not try to write MTRRs. Ever.
> 
>  - _read_ the current MTRR settings (including the default MTRR) and 
>    check them against the e820 map. I can see two basic types of 
>    mismatches:
> 
>      - RAM area marked fine in e820 but marked UC by MTRR: this 
>        currently results in a slow system.

Time to resurrect Jesse's old patch
i386-trim-memory-not-covered-by-wb-mtrrs.patch (which was in -mm sometime back)

>        (NOTE: UC- would be fine and 
>        overridable by PAT, hence it's not a conflict we should detect.)

UC- can't be specified by MTRR's.

>      - mmio area marked cacheable in the MTRR (results in broken system)

PAT can help specify the UC/WC attribute here.

thanks,
suresh


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-14 21:21             ` Siddha, Suresh B
@ 2008-01-14 21:28               ` Andi Kleen
  2008-01-15 22:17               ` Ingo Molnar
  1 sibling, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-01-14 21:28 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Ingo Molnar, Pallipadi, Venkatesh, Andi Kleen, ebiederm, rdreier,
	torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven, jesse.barnes

> Time to resurrect Jesse's old patches 
> i386-trim-memory-not-covered-by-wb-mtrrs.patch(which was in -mm sometime back)

They broke booting on my AMD QuadCore system here. Never quite figured
out what the problem was unfortunately.

-Andi


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-14 21:21             ` Siddha, Suresh B
  2008-01-14 21:28               ` Andi Kleen
@ 2008-01-15 22:17               ` Ingo Molnar
  2008-01-15 23:11                 ` Andi Kleen
  2008-01-15 23:21                 ` Siddha, Suresh B
  1 sibling, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2008-01-15 22:17 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Pallipadi, Venkatesh, Andi Kleen, ebiederm, rdreier, torvalds,
	gregkh, airlied, davej, tglx, linux-kernel, Arjan van de Ven,
	jesse.barnes


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> On Mon, Jan 14, 2008 at 05:43:24PM +0100, Ingo Molnar wrote:
> > 
> > * Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> wrote:
> > 
> > > Also, relying on MTRR, is like giving more importance to BIOS writer 
> > > than required :-). I think the best way to deal with MTRR is just to 
> > > not touch it. Leave it as it is and do not try to assume that they are 
> > > correct, as frequently they will not be.
> > 
> > i'd suggest the following strategy on PAT-capable CPUs:
> > 
> >  - do not try to write MTRRs. Ever.
> > 
> >  - _read_ the current MTRR settings (including the default MTRR) and 
> >    check them against the e820 map. I can see two basic types of 
> >    mismatches:
> > 
> >      - RAM area marked fine in e820 but marked UC by MTRR: this 
> >        currently results in a slow system.
> 
> Time to resurrect Jesse's old patches 
> i386-trim-memory-not-covered-by-wb-mtrrs.patch(which was in -mm 
> sometime back)

just to make sure i understood the attribute priorities right: we cannot 
just mark it WB in the PAT and expect it to be write-back - the UC of 
the MTRR will control?

> >        (NOTE: UC- would be fine and 
> >        overridable by PAT, hence it's not a conflict we should detect.)
> 
> UC- can't be specified by MTRR's.

hm, only by PATs? Not even by the default MTRR?

> >      - mmio area marked cacheable in the MTRR (results in broken 
> >      system)
> 
> PAT can help specify the UC/WC attribute here.

ok. So it seems we dont even need all that many special cases, a "dont 
write MTRRs" and "use PATs everywhere" rule would just do the right 
thing all across?

	Ingo


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-15 22:17               ` Ingo Molnar
@ 2008-01-15 23:11                 ` Andi Kleen
  2008-01-15 23:21                 ` Siddha, Suresh B
  1 sibling, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-01-15 23:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, Pallipadi, Venkatesh, Andi Kleen, ebiederm,
	rdreier, torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven, jesse.barnes

> just to make sure i understood the attribute priorities right: we cannot 
> just mark it WB in the PAT and expect it to be write-back - the UC of 
> the MTRR will control?

There are different kinds of UC: UC+ and UC-. One controls, the other doesn't.

-Andi


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-15 22:17               ` Ingo Molnar
  2008-01-15 23:11                 ` Andi Kleen
@ 2008-01-15 23:21                 ` Siddha, Suresh B
  2008-01-18 12:01                   ` Ingo Molnar
  1 sibling, 1 reply; 48+ messages in thread
From: Siddha, Suresh B @ 2008-01-15 23:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, Pallipadi, Venkatesh, Andi Kleen, ebiederm,
	rdreier, torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven, jesse.barnes

On Tue, Jan 15, 2008 at 11:17:58PM +0100, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > Time to resurrect Jesse's old patches 
> > i386-trim-memory-not-covered-by-wb-mtrrs.patch(which was in -mm 
> > sometime back)
> 
> just to make sure i understood the attribute priorities right: we cannot 
> just mark it WB in the PAT and expect it to be write-back - the UC of 
> the MTRR will control?

Unfortunately PAT is not always the overriding winner. It all depends
on the individual attributes. For WB in PAT, the MTRR always takes
precedence.

> 
> > >        (NOTE: UC- would be fine and 
> > >        overridable by PAT, hence it's not a conflict we should detect.)
> > 
> > UC- can't be specified by MTRR's.
> 
> hm, only by PATs? Not even by the default MTRR?

No.

> > >      - mmio area marked cacheable in the MTRR (results in broken 
> > >      system)
> > 
> > PAT can help specify the UC/WC attribute here.
> 
> ok. So it seems we dont even need all that many special cases, a "dont 
> write MTRRs" and "use PATs everywhere" rule would just do the right 
> thing all across?

Yes. The main thing required is along the lines of Jesse's patch.
If the MTRR def type is not WB, then we need to check whether any of the
RAM is not covered by the MTRR range registers and trim the RAM accordingly.

thanks,
suresh


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-15 23:21                 ` Siddha, Suresh B
@ 2008-01-18 12:01                   ` Ingo Molnar
  2008-01-18 13:12                     ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2008-01-18 12:01 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Pallipadi, Venkatesh, Andi Kleen, ebiederm, rdreier, torvalds,
	gregkh, airlied, davej, tglx, linux-kernel, Arjan van de Ven,
	jesse.barnes


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> > ok. So it seems we dont even need all that many special cases, a 
> > "dont write MTRRs" and "use PATs everywhere" rule would just do the 
> > right thing all across?
> 
> Yes. The main thing required is on the lines of Jesse's patch. If the 
> MTRR's def type is not WB, then we need to check if any of the RAM is 
> not covered by MTRR range registers and trim the RAM accordingly.

ok. I've dusted off Jesse's patch (and Andrew's fix to it) and merged it 
to x86.git - see below.

one immediate problem is:

  +#ifdef CONFIG_X86_64

we should do this on 32-bit too.

	Ingo

----------->
Subject: x86, 32-bit: trim memory not covered by wb mtrrs
From: Jesse Barnes <jesse.barnes@intel.com>

On some machines, buggy BIOSes don't properly setup WB MTRRs to cover all
available RAM, meaning the last few megs (or even gigs) of memory will be
marked uncached.  Since Linux tends to allocate from high memory addresses
first, this causes the machine to be unusably slow as soon as the kernel
starts really using memory (i.e.  right around init time).

This patch works around the problem by scanning the MTRRs at boot and
figuring out whether the current end_pfn value (setup by early e820 code)
goes beyond the highest WB MTRR range, and if so, trimming it to match.  A
fairly obnoxious KERN_WARNING is printed too, letting the user know that
not all of their memory is available due to a likely BIOS bug.

Something similar could be done on i386 if needed, but the boot ordering
would be slightly different, since the MTRR code on i386 depends on the
boot_cpu_data structure being setup.

This patch fixes a bug in the last patch that caused the code to run on
non-Intel machines (AMD machines apparently don't need it and it's untested
on other non-Intel machines, so best keep it off).

Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt |    6 ++
 arch/x86/kernel/bugs_64.c           |    1 
 arch/x86/kernel/cpu/mtrr/generic.c  |    8 ---
 arch/x86/kernel/cpu/mtrr/if.c       |    8 ---
 arch/x86/kernel/cpu/mtrr/main.c     |   90 ++++++++++++++++++++++++++----------
 arch/x86/kernel/cpu/mtrr/mtrr.h     |    3 +
 arch/x86/kernel/setup_64.c          |    4 +
 include/asm-x86/mtrr.h              |    5 +-
 8 files changed, 86 insertions(+), 39 deletions(-)

Index: linux-x86.q/Documentation/kernel-parameters.txt
===================================================================
--- linux-x86.q.orig/Documentation/kernel-parameters.txt
+++ linux-x86.q/Documentation/kernel-parameters.txt
@@ -562,6 +562,12 @@ and is between 256 and 4096 characters. 
 			See drivers/char/README.epca and
 			Documentation/digiepca.txt.
 
+	disable_mtrr_trim [X86-64, Intel only]
+			By default the kernel will trim any uncacheable
+			memory out of your available memory pool based on
+			MTRR settings.  This parameter disables that behavior,
+			possibly causing your machine to run very slowly.
+
 	dmasound=	[HW,OSS] Sound subsystem buffers
 
 	dscc4.setup=	[NET]
Index: linux-x86.q/arch/x86/kernel/bugs_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/bugs_64.c
+++ linux-x86.q/arch/x86/kernel/bugs_64.c
@@ -13,7 +13,6 @@
 void __init check_bugs(void)
 {
 	identify_cpu(&boot_cpu_data);
-	mtrr_bp_init();
 #if !defined(CONFIG_SMP)
 	printk("CPU: ");
 	print_cpu_info(&boot_cpu_data);
Index: linux-x86.q/arch/x86/kernel/cpu/mtrr/generic.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/cpu/mtrr/generic.c
+++ linux-x86.q/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,7 +14,7 @@
 #include "mtrr.h"
 
 struct mtrr_state {
-	struct mtrr_var_range *var_ranges;
+	struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
 	mtrr_type fixed_ranges[NUM_FIXED_RANGES];
 	unsigned char enabled;
 	unsigned char have_fixed;
@@ -90,12 +90,6 @@ void __init get_mtrr_state(void)
 	unsigned lo, dummy;
 	unsigned long flags;
 
-	if (!mtrr_state.var_ranges) {
-		mtrr_state.var_ranges = kmalloc(num_var_ranges * sizeof (struct mtrr_var_range),
-						GFP_KERNEL);
-		if (!mtrr_state.var_ranges)
-			return;
-	}
 	vrs = mtrr_state.var_ranges;
 
 	rdmsr(MTRRcap_MSR, lo, dummy);
Index: linux-x86.q/arch/x86/kernel/cpu/mtrr/if.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/cpu/mtrr/if.c
+++ linux-x86.q/arch/x86/kernel/cpu/mtrr/if.c
@@ -11,10 +11,6 @@
 #include <asm/mtrr.h>
 #include "mtrr.h"
 
-/* RED-PEN: this is accessed without any locking */
-extern unsigned int *usage_table;
-
-
 #define FILE_FCOUNT(f) (((struct seq_file *)((f)->private_data))->private)
 
 static const char *const mtrr_strings[MTRR_NUM_TYPES] =
@@ -397,7 +393,7 @@ static int mtrr_seq_show(struct seq_file
 	for (i = 0; i < max; i++) {
 		mtrr_if->get(i, &base, &size, &type);
 		if (size == 0)
-			usage_table[i] = 0;
+			mtrr_usage_table[i] = 0;
 		else {
 			if (size < (0x100000 >> PAGE_SHIFT)) {
 				/* less than 1MB */
@@ -411,7 +407,7 @@ static int mtrr_seq_show(struct seq_file
 			len += seq_printf(seq, 
 				   "reg%02i: base=0x%05lx000 (%4luMB), size=%4lu%cB: %s, count=%d\n",
 			     i, base, base >> (20 - PAGE_SHIFT), size, factor,
-			     mtrr_attrib_to_str(type), usage_table[i]);
+			     mtrr_attrib_to_str(type), mtrr_usage_table[i]);
 		}
 	}
 	return 0;
Index: linux-x86.q/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-x86.q/arch/x86/kernel/cpu/mtrr/main.c
@@ -38,8 +38,8 @@
 #include <linux/cpu.h>
 #include <linux/mutex.h>
 
+#include <asm/e820.h>
 #include <asm/mtrr.h>
-
 #include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
@@ -47,7 +47,7 @@
 
 u32 num_var_ranges = 0;
 
-unsigned int *usage_table;
+unsigned int mtrr_usage_table[MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
@@ -121,13 +121,8 @@ static void __init init_table(void)
 	int i, max;
 
 	max = num_var_ranges;
-	if ((usage_table = kmalloc(max * sizeof *usage_table, GFP_KERNEL))
-	    == NULL) {
-		printk(KERN_ERR "mtrr: could not allocate\n");
-		return;
-	}
 	for (i = 0; i < max; i++)
-		usage_table[i] = 1;
+		mtrr_usage_table[i] = 1;
 }
 
 struct set_mtrr_data {
@@ -383,7 +378,7 @@ int mtrr_add_page(unsigned long base, un
 			goto out;
 		}
 		if (increment)
-			++usage_table[i];
+			++mtrr_usage_table[i];
 		error = i;
 		goto out;
 	}
@@ -391,15 +386,15 @@ int mtrr_add_page(unsigned long base, un
 	i = mtrr_if->get_free_region(base, size, replace);
 	if (i >= 0) {
 		set_mtrr(i, base, size, type);
-		if (likely(replace < 0))
-			usage_table[i] = 1;
-		else {
-			usage_table[i] = usage_table[replace];
+		if (likely(replace < 0)) {
+			mtrr_usage_table[i] = 1;
+		} else {
+			mtrr_usage_table[i] = mtrr_usage_table[replace];
 			if (increment)
-				usage_table[i]++;
+				mtrr_usage_table[i]++;
 			if (unlikely(replace != i)) {
 				set_mtrr(replace, 0, 0, 0);
-				usage_table[replace] = 0;
+				mtrr_usage_table[replace] = 0;
 			}
 		}
 	} else
@@ -529,11 +524,11 @@ int mtrr_del_page(int reg, unsigned long
 		printk(KERN_WARNING "mtrr: MTRR %d not used\n", reg);
 		goto out;
 	}
-	if (usage_table[reg] < 1) {
+	if (mtrr_usage_table[reg] < 1) {
 		printk(KERN_WARNING "mtrr: reg: %d has count=0\n", reg);
 		goto out;
 	}
-	if (--usage_table[reg] < 1)
+	if (--mtrr_usage_table[reg] < 1)
 		set_mtrr(reg, 0, 0, 0);
 	error = reg;
  out:
@@ -593,16 +588,11 @@ struct mtrr_value {
 	unsigned long	lsize;
 };
 
-static struct mtrr_value * mtrr_state;
+static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
 
 static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
 {
 	int i;
-	int size = num_var_ranges * sizeof(struct mtrr_value);
-
-	mtrr_state = kzalloc(size,GFP_ATOMIC);
-	if (!mtrr_state)
-		return -ENOMEM;
 
 	for (i = 0; i < num_var_ranges; i++) {
 		mtrr_if->get(i,
@@ -624,7 +614,6 @@ static int mtrr_restore(struct sys_devic
 				 mtrr_state[i].lsize,
 				 mtrr_state[i].ltype);
 	}
-	kfree(mtrr_state);
 	return 0;
 }
 
@@ -635,6 +624,59 @@ static struct sysdev_driver mtrr_sysdev_
 	.resume		= mtrr_restore,
 };
 
+static int disable_mtrr_trim;
+
+static int __init disable_mtrr_trim_setup(char *str)
+{
+	disable_mtrr_trim = 1;
+	return 0;
+}
+early_param("disable_mtrr_trim", disable_mtrr_trim_setup);
+
+#ifdef CONFIG_X86_64
+/**
+ * mtrr_trim_uncached_memory - trim RAM not covered by MTRRs
+ *
+ * Some buggy BIOSes don't setup the MTRRs properly for systems with certain
+ * memory configurations.  This routine checks to make sure the MTRRs having
+ * a write back type cover all of the memory the kernel is intending to use.
+ * If not, it'll trim any memory off the end by adjusting end_pfn, removing
+ * it from the kernel's allocation pools, warning the user with an obnoxious
+ * message.
+ */
+void __init mtrr_trim_uncached_memory(void)
+{
+	unsigned long i, base, size, highest_addr = 0, def, dummy;
+	mtrr_type type;
+
+	/* Make sure we only trim uncachable memory on Intel machines */
+	rdmsr(MTRRdefType_MSR, def, dummy);
+	def &= 0xff;
+	if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)
+		return;
+
+	/* Find highest cached pfn */
+	for (i = 0; i < num_var_ranges; i++) {
+		mtrr_if->get(i, &base, &size, &type);
+		if (type != MTRR_TYPE_WRBACK)
+			continue;
+		base <<= PAGE_SHIFT;
+		size <<= PAGE_SHIFT;
+		if (highest_addr < base + size)
+			highest_addr = base + size;
+	}
+
+	if ((highest_addr >> PAGE_SHIFT) != end_pfn) {
+		printk(KERN_WARNING "***************\n");
+		printk(KERN_WARNING "**** WARNING: likely BIOS bug\n");
+		printk(KERN_WARNING "**** MTRRs don't cover all of "
+		       "memory, trimmed %ld pages\n", end_pfn -
+		       (highest_addr >> PAGE_SHIFT));
+		printk(KERN_WARNING "***************\n");
+		end_pfn = highest_addr >> PAGE_SHIFT;
+	}
+}
+#endif
 
 /**
  * mtrr_bp_init - initialize mtrrs on the boot CPU
Index: linux-x86.q/arch/x86/kernel/cpu/mtrr/mtrr.h
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ linux-x86.q/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -12,6 +12,7 @@
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
 #define NUM_FIXED_RANGES 88
+#define MAX_VAR_RANGES 256
 #define MTRRfix64K_00000_MSR 0x250
 #define MTRRfix16K_80000_MSR 0x258
 #define MTRRfix16K_A0000_MSR 0x259
@@ -32,6 +33,8 @@
    an 8 bit field: */
 typedef u8 mtrr_type;
 
+extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+
 struct mtrr_ops {
 	u32	vendor;
 	u32	use_intel_if;
Index: linux-x86.q/arch/x86/kernel/setup_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/setup_64.c
+++ linux-x86.q/arch/x86/kernel/setup_64.c
@@ -322,6 +322,10 @@ void __init setup_arch(char **cmdline_p)
 	 * we are rounding upwards:
 	 */
 	end_pfn = e820_end_of_ram();
+	/* Trim memory not covered by WB MTRRs */
+	mtrr_bp_init();
+	mtrr_trim_uncached_memory();
+
 	num_physpages = end_pfn;
 
 	check_efer();
Index: linux-x86.q/include/asm-x86/mtrr.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/mtrr.h
+++ linux-x86.q/include/asm-x86/mtrr.h
@@ -97,6 +97,7 @@ extern int mtrr_del_page (int reg, unsig
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
 extern void mtrr_ap_init(void);
 extern void mtrr_bp_init(void);
+extern void mtrr_trim_uncached_memory(void);
 #  else
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
@@ -120,7 +121,9 @@ static __inline__ int mtrr_del_page (int
 {
     return -ENODEV;
 }
-
+static __inline__ void mtrr_trim_uncached_memory(void)
+{
+}
 static __inline__ void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi) {;}
 
 #define mtrr_ap_init() do {} while (0)

Subject: x86, 32-bit: trim memory not covered by wb mtrrs, fix
From: Andrew Morton <akpm@linux-foundation.org>

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jesse Barnes <jesse.barnes@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---

 arch/x86/kernel/cpu/mtrr/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-x86.q/arch/x86/kernel/cpu/mtrr/main.c
@@ -666,7 +666,7 @@ void __init mtrr_trim_uncached_memory(vo
 			highest_addr = base + size;
 	}
 
-	if ((highest_addr >> PAGE_SHIFT) != end_pfn) {
+	if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
 		printk(KERN_WARNING "***************\n");
 		printk(KERN_WARNING "**** WARNING: likely BIOS bug\n");
 		printk(KERN_WARNING "**** MTRRs don't cover all of "


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 12:01                   ` Ingo Molnar
@ 2008-01-18 13:12                     ` Andi Kleen
  2008-01-18 16:46                       ` Jesse Barnes
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-18 13:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Siddha, Suresh B, Pallipadi, Venkatesh, Andi Kleen, ebiederm,
	rdreier, torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven, jesse.barnes

> (AMD machines apparently don't need it 

That's not true -- we had AMD systems in the past with broken MTRRs on
large memory configurations too.  Mostly it was pre-revE, though.

-Andi


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 13:12                     ` Andi Kleen
@ 2008-01-18 16:46                       ` Jesse Barnes
  2008-01-18 18:12                         ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Jesse Barnes @ 2008-01-18 16:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Siddha, Suresh B, Pallipadi, Venkatesh, ebiederm,
	rdreier, torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven

On Friday, January 18, 2008 5:12 am Andi Kleen wrote:
> > (AMD machines apparently don't need it
>
> That's not true -- we had AMD systems in the past with broken MTRRs on
> large memory configurations too.  Mostly it was pre-revE, though.

It should be easy enough to enable it for AMD as well, and it would also be 
good to track down the one failure you found...  I don't *think* the 
re-ordering of MTRR initialization should affect AMD any more than it does 
Intel, but someone familiar with the boot code would have to do a quick audit 
to be sure.

Jesse


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 16:46                       ` Jesse Barnes
@ 2008-01-18 18:12                         ` Andi Kleen
  2008-01-18 19:02                           ` Jesse Barnes
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-01-18 18:12 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Siddha, Suresh B, Pallipadi, Venkatesh,
	ebiederm, rdreier, torvalds, gregkh, airlied, davej, tglx,
	linux-kernel, Arjan van de Ven

On Fri, Jan 18, 2008 at 08:46:02AM -0800, Jesse Barnes wrote:
> On Friday, January 18, 2008 5:12 am Andi Kleen wrote:
> > > (AMD machines apparently don't need it
> >
> > That's not true -- we had AMD systems in the past with broken MTRRs on
> > large memory configurations too.  Mostly it was pre-revE, though.
> 
> It should be easy enough to enable it for AMD as well, and it would also be 
> good to track down the one failure you found...  I don't *think* the 
> re-ordering of MTRR initialization should affect AMD any more than it does 
> Intel, but someone familiar with the boot code would have to do a quick audit 
> to be sure.

I looked back when I had bisected it down, and I admit I didn't spot the 
problem from source review. I think it came from the reordering, so blacklisting 
AMD alone wouldn't have helped. It might have been some
subtle race (e.g. long ago we had such races in the MTRR code,
triggered by the first HT CPUs).

Anyway, I just test-booted the latest git-x86 with your patches included on 
the QC system, and it booted now. However, it has both more RAM and newer CPUs 
(the original ones were pre-production, which is why I also didn't send you logs[1] ..) 
than when I tested originally. So this means either the problem was somewhere 
else or the different configuration hides it.

I guess you will hear about it if it's still broken on other machines.

Currently it looks good.

I think it should be enabled on AMD too though. If the reordering breaks
it then blacklisting won't help anyways.

-Andi

[1] but I checked the known errata and there was nothing related to MTRR.



* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-10 22:50         ` Valdis.Kletnieks
@ 2008-01-18 18:27           ` Dave Jones
  2008-01-18 20:54             ` Ingo Molnar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Jones @ 2008-01-18 18:27 UTC (permalink / raw)
  To: Valdis.Kletnieks, Linus Torvalds, Pallipadi, Venkatesh, ak,
	ebiederm, rdreier, gregkh, airlied, davej, mingo, tglx, akpm,
	arjan, Barnes, Jesse, davem, linux-kernel, Siddha, Suresh B

On Thu, Jan 10, 2008 at 05:50:41PM -0500, Valdis.Kletnieks@vt.edu wrote:

 > (Personally, I keep a copy of Arjan's "restrict devmem" patch from Fedora
 > around, so I guess that says which camp I belong in, and the fact it's a Fedora
 > patch and not mainstream says something too...)

The way that patch is right now (and has been for some time) is kind of nasty
and could use cleaning up.  I made a half-assed attempt at improving it
a little over xmas, but it could use rewriting from scratch.
That's the only reason I never bothered pushing it (and, I assume, the reason
that Arjan didn't push it when he wrote the original).

     Dave

--
http://www.codemonkey.org.uk


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 18:12                         ` Andi Kleen
@ 2008-01-18 19:02                           ` Jesse Barnes
  2008-01-19  2:42                             ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Jesse Barnes @ 2008-01-18 19:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Siddha, Suresh B, Pallipadi, Venkatesh, ebiederm,
	rdreier, torvalds, gregkh, airlied, davej, tglx, linux-kernel,
	Arjan van de Ven

On Friday, January 18, 2008 10:12 am Andi Kleen wrote:
> I looked back then when I had bisected it down and I admit I didn't spot
> the problem from source review. I think it came from the reordering so
> blacklisting AMD alone wouldn't have helped. Might have been some
> subtle race (e.g. long ago we had such races in the MTRR code
> triggered by the first HT CPUs)
>
> Anyways I just test booted latest git-x86 with your patches included on
> the QC system and it booted now. However it has both more RAM and newer
> CPUs (the original ones were pre-production, that is why I also didn't send
> you logs[1] ..) than when I tested originally. So this means either the
> problem was somewhere else or the different configuration hides it.
>
> I guess you will hear about it if it's still broken on other machines.
>
> Currently it looks good.
>
> I think it should be enabled on AMD too though. If the reordering breaks
> it then blacklisting won't help anyways.
>
> -Andi
>
> [1] but I checked the known errata and there was nothing related to MTRR.

Ah, ok, that explains your reticence earlier.  Thanks for testing again, I 
guess the patch is good to go.

Jesse


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 18:27           ` Dave Jones
@ 2008-01-18 20:54             ` Ingo Molnar
  0 siblings, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2008-01-18 20:54 UTC (permalink / raw)
  To: Dave Jones, Valdis.Kletnieks, Linus Torvalds, Pallipadi,
	Venkatesh, ak, ebiederm, rdreier, gregkh, airlied, tglx, akpm,
	arjan, Barnes, Jesse, davem, linux-kernel, Siddha, Suresh B


* Dave Jones <davej@redhat.com> wrote:

> On Thu, Jan 10, 2008 at 05:50:41PM -0500, Valdis.Kletnieks@vt.edu wrote:
> 
>  > (Personally, I keep a copy of Arjan's "restrict devmem" patch from Fedora
>  > around, so I guess that says which camp I belong in, and the fact it's a Fedora
>  > patch and not mainstream says something too...)
> 
> The way that patch is right now (and has been for some time) is kind 
> of nasty, and could use cleaning up.  I made a half-assed attempt at 
> improving it a little over xmas, but it could use rewriting from scratch. 
> That's the only reason I never bothered pushing it (and I assume, the 
> reason that Arjan didn't push it when he wrote the original).

could someone please send it into this thread so that we can have a go 
at integrating it into x86.git?

	Ingo


* Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text
  2008-01-18 19:02                           ` Jesse Barnes
@ 2008-01-19  2:42                             ` Andi Kleen
  0 siblings, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-01-19  2:42 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Siddha, Suresh B, Pallipadi, Venkatesh,
	ebiederm, rdreier, torvalds, gregkh, airlied, davej, tglx,
	linux-kernel, Arjan van de Ven

> > I think it should be enabled on AMD too though. If the reordering breaks
> > it then blacklisting won't help anyways.

Actually, it is already enabled on AMD. You check for is_cpu(INTEL),
but that just checks for the generic MTRR architecture, and all AMD CPUs
since K7 use that one too.

That is ok imho.

Perhaps it would be good to fix the incorrect comment, though.


> >
> > -Andi
> >
> > [1] but I checked the known errata and there was nothing related to MTRR.
> 
> Ah, ok, that explains your reticence earlier.  Thanks for testing again, I 
> guess the patch is good to go.

I see a failure here now on an AMD system where it trims a lot of memory, but
probably should not (or at least I haven't noticed any malfunction
before without it). Investigating.

-Andi



end of thread, other threads:[~2008-01-19  2:38 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-10 18:48 [patch 00/11] PAT x86: PAT support for x86 venkatesh.pallipadi
2008-01-10 18:48 ` [patch 01/11] PAT x86: Make acpi/other drivers map memory instead of assuming identity map venkatesh.pallipadi
2008-01-10 18:48 ` [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text venkatesh.pallipadi
2008-01-10 19:06   ` Andi Kleen
2008-01-10 19:17     ` Pallipadi, Venkatesh
2008-01-10 19:28       ` Andi Kleen
2008-01-10 20:50         ` Pallipadi, Venkatesh
2008-01-10 21:16           ` Andi Kleen
2008-01-10 22:25             ` Pallipadi, Venkatesh
2008-01-10 22:35               ` Andi Kleen
2008-01-14 16:43           ` Ingo Molnar
2008-01-14 21:21             ` Siddha, Suresh B
2008-01-14 21:28               ` Andi Kleen
2008-01-15 22:17               ` Ingo Molnar
2008-01-15 23:11                 ` Andi Kleen
2008-01-15 23:21                 ` Siddha, Suresh B
2008-01-18 12:01                   ` Ingo Molnar
2008-01-18 13:12                     ` Andi Kleen
2008-01-18 16:46                       ` Jesse Barnes
2008-01-18 18:12                         ` Andi Kleen
2008-01-18 19:02                           ` Jesse Barnes
2008-01-19  2:42                             ` Andi Kleen
2008-01-10 21:05   ` Linus Torvalds
2008-01-10 21:57     ` Pallipadi, Venkatesh
2008-01-10 22:15       ` Linus Torvalds
2008-01-10 22:27         ` Pallipadi, Venkatesh
2008-01-10 22:50         ` Valdis.Kletnieks
2008-01-18 18:27           ` Dave Jones
2008-01-18 20:54             ` Ingo Molnar
2008-01-10 18:48 ` [patch 03/11] PAT x86: Map only usable memory in i386 identity map venkatesh.pallipadi
2008-01-10 19:10   ` Andi Kleen
2008-01-10 18:48 ` [patch 04/11] PAT x86: Basic PAT implementation venkatesh.pallipadi
2008-01-10 18:48 ` [patch 05/11] PAT x86: drm driver changes for PAT venkatesh.pallipadi
2008-01-10 18:48 ` [patch 06/11] PAT x86: Refactoring i386 cpa venkatesh.pallipadi
2008-01-10 19:00   ` Andi Kleen
2008-01-14 16:47     ` Ingo Molnar
2008-01-10 18:48 ` [patch 07/11] PAT x86: pat-conflict resolution using linear list venkatesh.pallipadi
2008-01-10 19:13   ` Andi Kleen
2008-01-10 20:08     ` Pallipadi, Venkatesh
2008-01-10 18:48 ` [patch 08/11] PAT x86: pci mmap conlfict patch venkatesh.pallipadi
2008-01-10 18:48 ` [patch 09/11] PAT x86: Add ioremap_wc support venkatesh.pallipadi
2008-01-10 19:08   ` Andi Kleen
2008-01-10 19:25     ` Pallipadi, Venkatesh
2008-01-12  0:18       ` Roland Dreier
2008-01-10 18:48 ` [patch 10/11] PAT x86: Handle /dev/mem mappings venkatesh.pallipadi
2008-01-10 18:48 ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfs vor pci_mmap_resource venkatesh.pallipadi
2008-01-10 19:43   ` Greg KH
2008-01-10 20:54     ` [patch 11/11] PAT x86: Expose uc and wc interfaces in /sysfsvor pci_mmap_resource Pallipadi, Venkatesh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).