LKML Archive on lore.kernel.org
* [patch 00/10] System z10 patches.
@ 2008-03-12 17:31 Martin Schwidefsky
  2008-03-12 17:31 ` [patch 01/10] Add new fields for System z10 to /proc/sysinfo Martin Schwidefsky
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:31 UTC (permalink / raw)
  To: linux-kernel, linux-s390

Greetings,
System z10 was announced two weeks ago. This patchset adds support
for some of the new features of this machine. The two main features
are large page support (1MB pages) and cpu topology support. There are
some common code dependencies for these two, see patches #3, #4, #5, #8
and #9. Most notably we found that we need to add a new tlb flush for the
copy-on-write of a large page (patch #8). We think that this is a bug,
but obviously one that hasn't shown up so far on any of the other large
page architectures.

The patches are queued in the features/for-andrew branches of git390 and
will be included in linux-next and -mm (keeping fingers crossed that we
don't create any rejects).

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 01/10] Add new fields for System z10 to /proc/sysinfo
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
@ 2008-03-12 17:31 ` Martin Schwidefsky
  2008-03-12 17:57   ` Josef 'Jeff' Sipek
  2008-03-12 17:31 ` [patch 02/10] Export stfle Martin Schwidefsky
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:31 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Martin Schwidefsky

[-- Attachment #1: 101-sysinfo.diff --]
[-- Type: text/plain, Size: 2251 bytes --]

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Add permanent and temporary model capacity and the corresponding
capacity value fields for the three capacity identifiers to the
output of /proc/sysinfo.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
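
Note: the rating fields are declared as char[4] and read with a
*(u32 *) cast, which relies on the big-endian layout of s390. A minimal
user-space sketch of the same read and output format, assuming a
big-endian 32-bit value; read_be32() and format_capacity() are
hypothetical names used only for illustration, not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: assemble a big-endian 32-bit value from 4 raw bytes. */
static uint32_t read_be32(const char raw[4])
{
	const unsigned char *b = (const unsigned char *)raw;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Hypothetical helper: format one capacity line the way the patch does,
 * a 22 column label, a 16 column identifier and a zero padded rating.
 */
static int format_capacity(char *out, size_t size, const char *label,
			   const char name[16], const char rating[4])
{
	assert(out != NULL && size > 0);
	return snprintf(out, size, "%-22s%-16.16s %08u\n", label, name,
			(unsigned int)read_be32(rating));
}
```

Assembling the value byte by byte gives the same result on any host
and does not depend on the alignment of the char array.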

 drivers/s390/sysinfo.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Index: quilt-2.6/drivers/s390/sysinfo.c
===================================================================
--- quilt-2.6.orig/drivers/s390/sysinfo.c
+++ quilt-2.6/drivers/s390/sysinfo.c
@@ -26,6 +26,11 @@ struct sysinfo_1_1_1 {
 	char sequence[16];
 	char plant[4];
 	char model[16];
+	char model_perm_cap[16];
+	char model_temp_cap[16];
+	char model_cap_rating[4];
+	char model_perm_cap_rating[4];
+	char model_temp_cap_rating[4];
 };
 
 struct sysinfo_1_2_1 {
@@ -133,6 +138,8 @@ static int stsi_1_1_1(struct sysinfo_1_1
 	EBCASC(info->sequence, sizeof(info->sequence));
 	EBCASC(info->plant, sizeof(info->plant));
 	EBCASC(info->model_capacity, sizeof(info->model_capacity));
+	EBCASC(info->model_perm_cap, sizeof(info->model_perm_cap));
+	EBCASC(info->model_temp_cap, sizeof(info->model_temp_cap));
 	len += sprintf(page + len, "Manufacturer:         %-16.16s\n",
 		       info->manufacturer);
 	len += sprintf(page + len, "Type:                 %-4.4s\n",
@@ -155,8 +162,18 @@ static int stsi_1_1_1(struct sysinfo_1_1
 		       info->sequence);
 	len += sprintf(page + len, "Plant:                %-4.4s\n",
 		       info->plant);
-	len += sprintf(page + len, "Model Capacity:       %-16.16s\n",
-		       info->model_capacity);
+	len += sprintf(page + len, "Model Capacity:       %-16.16s %08u\n",
+		       info->model_capacity, *(u32 *) info->model_cap_rating);
+	if (info->model_perm_cap[0] != '\0')
+		len += sprintf(page + len,
+			       "Model Perm. Capacity: %-16.16s %08u\n",
+			       info->model_perm_cap,
+			       *(u32 *) info->model_perm_cap_rating);
+	if (info->model_temp_cap[0] != '\0')
+		len += sprintf(page + len,
+			       "Model Temp. Capacity: %-16.16s %08u\n",
+			       info->model_temp_cap,
+			       *(u32 *) info->model_temp_cap_rating);
 	return len;
 }
 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 02/10] Export stfle.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
  2008-03-12 17:31 ` [patch 01/10] Add new fields for System z10 to /proc/sysinfo Martin Schwidefsky
@ 2008-03-12 17:31 ` Martin Schwidefsky
  2008-03-12 17:31 ` [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file Martin Schwidefsky
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:31 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 102-stfle-global.diff --]
[-- Type: text/plain, Size: 2107 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Make stfle visible so other code can call it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
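
Note: callers of stfle() get back a list of doublewords in which the
facility bits are numbered from the most significant bit. A minimal
sketch of a bit test under that assumption; test_facility() is a
hypothetical helper and not something this patch adds:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if facility number nr is set in a list
 * of 64-bit doublewords. Facility 0 is the most significant bit of
 * list[0], facility 63 the least significant one.
 */
static int test_facility(const unsigned long long *list, int doublewords,
			 unsigned int nr)
{
	assert(list != NULL);
	if (doublewords <= 0 || nr / 64 >= (unsigned int)doublewords)
		return 0;
	return (int)((list[nr / 64] >> (63 - nr % 64)) & 1);
}
```

With this numbering the shift 1ULL << (64 - 43) used in setup_hwcaps()
corresponds to test_facility(list, 1, 42).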

 arch/s390/kernel/setup.c  |   11 +++++++++--
 include/asm-s390/system.h |    2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

Index: quilt-2.6/arch/s390/kernel/setup.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/setup.c
+++ quilt-2.6/arch/s390/kernel/setup.c
@@ -687,7 +687,7 @@ static __init unsigned int stfl(void)
 	return S390_lowcore.stfl_fac_list;
 }
 
-static __init int stfle(unsigned long long *list, int doublewords)
+static int __init __stfle(unsigned long long *list, int doublewords)
 {
 	typedef struct { unsigned long long _[doublewords]; } addrtype;
 	register unsigned long __nr asm("0") = doublewords - 1;
@@ -697,6 +697,13 @@ static __init int stfle(unsigned long lo
 	return __nr + 1;
 }
 
+int __init stfle(unsigned long long *list, int doublewords)
+{
+	if (!(stfl() & (1UL << 24)))
+		return -EOPNOTSUPP;
+	return __stfle(list, doublewords);
+}
+
 /*
  * Setup hardware capabilities.
  */
@@ -741,7 +748,7 @@ static void __init setup_hwcaps(void)
 	 *   HWCAP_S390_DFP bit 6.
 	 */
 	if ((elf_hwcap & (1UL << 2)) &&
-	    stfle(&facility_list_extended, 1) > 0) {
+	    __stfle(&facility_list_extended, 1) > 0) {
 		if (facility_list_extended & (1ULL << (64 - 43)))
 			elf_hwcap |= 1UL << 6;
 	}
Index: quilt-2.6/include/asm-s390/system.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/system.h
+++ quilt-2.6/include/asm-s390/system.h
@@ -406,6 +406,8 @@ __set_psw_mask(unsigned long mask)
 #define local_mcck_enable()  __set_psw_mask(psw_kernel_bits)
 #define local_mcck_disable() __set_psw_mask(psw_kernel_bits & ~PSW_MASK_MCHECK)
 
+int stfle(unsigned long long *list, int doublewords);
+
 #ifdef CONFIG_SMP
 
 extern void smp_ctl_set_bit(int cr, int bit);

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
  2008-03-12 17:31 ` [patch 01/10] Add new fields for System z10 to /proc/sysinfo Martin Schwidefsky
  2008-03-12 17:31 ` [patch 02/10] Export stfle Martin Schwidefsky
@ 2008-03-12 17:31 ` Martin Schwidefsky
  2008-03-12 23:03   ` Andrew Morton
  2008-03-21 12:29   ` Ingo Molnar
  2008-03-12 17:31 ` [patch 04/10] sched: Add arch_update_cpu_topology hook Martin Schwidefsky
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:31 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 103-nodes-export.diff --]
[-- Type: text/plain, Size: 1224 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Needed so it can be called from outside of sched.c.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 include/linux/sched.h |    4 ++++
 kernel/sched.c        |    2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

Index: quilt-2.6/include/linux/sched.h
===================================================================
--- quilt-2.6.orig/include/linux/sched.h
+++ quilt-2.6/include/linux/sched.h
@@ -791,6 +791,10 @@ struct sched_domain {
 
 extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new);
 
+#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
+extern int arch_reinit_sched_domains(void);
+#endif
+
 #endif	/* CONFIG_SMP */
 
 /*
Index: quilt-2.6/kernel/sched.c
===================================================================
--- quilt-2.6.orig/kernel/sched.c
+++ quilt-2.6/kernel/sched.c
@@ -6917,7 +6917,7 @@ match2:
 }
 
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-static int arch_reinit_sched_domains(void)
+int arch_reinit_sched_domains(void)
 {
 	int err;
 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 04/10] sched: Add arch_update_cpu_topology hook.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (2 preceding siblings ...)
  2008-03-12 17:31 ` [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file Martin Schwidefsky
@ 2008-03-12 17:31 ` Martin Schwidefsky
  2008-03-21 12:30   ` Ingo Molnar
  2008-03-12 17:32 ` [patch 05/10] cpu topology: convert siblings_show macro to accept non-lvalues Martin Schwidefsky
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:31 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 104-nodes-hook.diff --]
[-- Type: text/plain, Size: 1589 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Will be called each time the scheduling domains are rebuilt.
Needed for architectures that don't have a static cpu topology.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
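
Note: the hook relies on a weak default definition that an architecture
can override with a normal (strong) one. A minimal user-space sketch of
that mechanism, assuming GCC or Clang; the call counter and
init_domains_sketch() are hypothetical and only make the effect
observable (the default in the patch is an empty function):

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the weak default ran; illustration only. */
static int default_hook_calls;

/*
 * Weak default. If another object file provides a non-weak
 * arch_update_cpu_topology(), the linker uses that one instead.
 * __attribute__((weak)) is a GCC/Clang extension.
 */
void __attribute__((weak)) arch_update_cpu_topology(void)
{
	default_hook_calls++;
}

/* Hypothetical stand-in for arch_init_sched_domains(): run the hook first. */
static int init_domains_sketch(int *ndoms)
{
	assert(ndoms != NULL);
	arch_update_cpu_topology();
	*ndoms = 1;
	return 0;
}
```

Architectures with a static topology need no code at all; they simply
keep the weak default.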

 include/linux/topology.h |    2 ++
 kernel/sched.c           |    5 +++++
 2 files changed, 7 insertions(+)

Index: quilt-2.6/include/linux/topology.h
===================================================================
--- quilt-2.6.orig/include/linux/topology.h
+++ quilt-2.6/include/linux/topology.h
@@ -50,6 +50,8 @@
 	for_each_online_node(node)						\
 		if (nr_cpus_node(node))
 
+void arch_update_cpu_topology(void);
+
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
 #define REMOTE_DISTANCE		20
Index: quilt-2.6/kernel/sched.c
===================================================================
--- quilt-2.6.orig/kernel/sched.c
+++ quilt-2.6/kernel/sched.c
@@ -6804,6 +6804,10 @@ static int ndoms_cur;		/* number of sche
  */
 static cpumask_t fallback_doms;
 
+void __attribute__((weak)) arch_update_cpu_topology(void)
+{
+}
+
 /*
  * Set up scheduler domains and groups. Callers must hold the hotplug lock.
  * For now this just excludes isolated cpus, but could be used to
@@ -6813,6 +6817,7 @@ static int arch_init_sched_domains(const
 {
 	int err;
 
+	arch_update_cpu_topology();
 	ndoms_cur = 1;
 	doms_cur = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
 	if (!doms_cur)

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 05/10] cpu topology: convert siblings_show macro to accept non-lvalues.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (3 preceding siblings ...)
  2008-03-12 17:31 ` [patch 04/10] sched: Add arch_update_cpu_topology hook Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 17:32 ` [patch 06/10] cpu topology support for s390 Martin Schwidefsky
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 105-nodes-siblings.diff --]
[-- Type: text/plain, Size: 1835 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

The sibling cpu masks on s390 can change because of dynamic cpu
reconfiguration. Therefore accesses to these masks are protected with
a lock so there aren't concurrent read and write accesses at the same
time. cpumask_scnprintf in define_siblings_show_func expects an lvalue
for the cpu mask, which would make the locking down in the s390 arch
code pointless. To solve this, change the topology code to save a
snapshot of the sibling cpu mask and use that for output purposes.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
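
Note: the idea is that the accessor copies the mask while holding the
lock and returns the copy by value, and the show function only ever
formats its private snapshot. A minimal user-space sketch with a
pthread mutex standing in for the s390 lock; cpumask_sketch_t,
core_siblings() and show_siblings() are hypothetical names, and the
output format is simplified compared to cpumask_scnprintf:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS_SKETCH 64

typedef struct {
	unsigned long long bits;
} cpumask_sketch_t;

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static cpumask_sketch_t siblings[NR_CPUS_SKETCH];

/* Copy the mask under the lock; the return value is not an lvalue. */
static cpumask_sketch_t core_siblings(unsigned int cpu)
{
	cpumask_sketch_t mask;

	assert(cpu < NR_CPUS_SKETCH);
	pthread_mutex_lock(&state_lock);
	mask = siblings[cpu];
	pthread_mutex_unlock(&state_lock);
	return mask;
}

/* Format a private snapshot, so later updates cannot race with the output. */
static int show_siblings(unsigned int cpu, char *buf, size_t size)
{
	cpumask_sketch_t mask = core_siblings(cpu);

	return snprintf(buf, size, "%016llx\n", mask.bits);
}
```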

 drivers/base/topology.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

Index: quilt-2.6/drivers/base/topology.c
===================================================================
--- quilt-2.6.orig/drivers/base/topology.c
+++ quilt-2.6/drivers/base/topology.c
@@ -40,13 +40,16 @@ static ssize_t show_##name(struct sys_de
 	return sprintf(buf, "%d\n", topology_##name(cpu));	\
 }
 
-#define define_siblings_show_func(name)					\
-static ssize_t show_##name(struct sys_device *dev, char *buf)		\
-{									\
-	ssize_t len = -1;						\
-	unsigned int cpu = dev->id;					\
-	len = cpumask_scnprintf(buf, NR_CPUS+1, topology_##name(cpu));	\
-	return (len + sprintf(buf + len, "\n"));			\
+#define define_siblings_show_func(name)				\
+static ssize_t show_##name(struct sys_device *dev, char *buf)	\
+{								\
+	ssize_t len = -1;					\
+	unsigned int cpu = dev->id;				\
+	cpumask_t mask; 					\
+								\
+	mask = topology_##name(cpu);				\
+	len = cpumask_scnprintf(buf, NR_CPUS + 1, mask);	\
+	return len + sprintf(buf + len, "\n");			\
 }
 
 #ifdef	topology_physical_package_id

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 06/10] cpu topology support for s390.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (4 preceding siblings ...)
  2008-03-12 17:32 ` [patch 05/10] cpu topology: convert siblings_show macro to accept non-lvalues Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 23:11   ` Andrew Morton
  2008-03-12 17:32 ` [patch 07/10] Vertical cpu management Martin Schwidefsky
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 106-nodes-support.diff --]
[-- Type: text/plain, Size: 10978 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Add s390 backend so we can give the scheduler some hints about the
cpu topology.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
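
Note: add_cpus_to_core() walks the set bits of a topology entry's cpu
mask from the least significant bit upwards and converts each bit to a
cpu address with CPU_BITS - 1 - bit + origin. A minimal user-space
sketch of just that conversion, assuming the whole mask fits into one
64-bit word; tl_mask_to_addresses() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

#define CPU_BITS 64

/*
 * Hypothetical helper: store the cpu address of every set mask bit in
 * out[] (at most max entries) and return the number of entries stored.
 * The most significant mask bit maps to the address "origin".
 */
static int tl_mask_to_addresses(unsigned long long mask, unsigned short origin,
				unsigned int *out, int max)
{
	unsigned int bit;
	int n = 0;

	assert(out != NULL);
	for (bit = 0; bit < CPU_BITS && n < max; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;
		out[n++] = CPU_BITS - 1 - bit + origin;
	}
	return n;
}
```

In the patch each resulting address is then compared against
__cpu_logical_map[] to find the matching logical cpu.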

 arch/s390/Kconfig           |    4 
 arch/s390/defconfig         |    1 
 arch/s390/kernel/Makefile   |    2 
 arch/s390/kernel/setup.c    |    2 
 arch/s390/kernel/smp.c      |    4 
 arch/s390/kernel/topology.c |  271 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/s390/sysinfo.c      |    2 
 include/asm-s390/smp.h      |    2 
 include/asm-s390/system.h   |    1 
 include/asm-s390/topology.h |   16 ++
 10 files changed, 300 insertions(+), 5 deletions(-)

Index: quilt-2.6/arch/s390/defconfig
===================================================================
--- quilt-2.6.orig/arch/s390/defconfig
+++ quilt-2.6/arch/s390/defconfig
@@ -3,6 +3,7 @@
 # Linux kernel version: 2.6.25-rc5
 # Wed Mar 12 14:10:52 2008
 #
+CONFIG_SCHED_MC=y
 CONFIG_MMU=y
 CONFIG_ZONE_DMA=y
 CONFIG_LOCKDEP_SUPPORT=y
Index: quilt-2.6/arch/s390/Kconfig
===================================================================
--- quilt-2.6.orig/arch/s390/Kconfig
+++ quilt-2.6/arch/s390/Kconfig
@@ -3,6 +3,10 @@
 # see Documentation/kbuild/kconfig-language.txt.
 #
 
+config SCHED_MC
+	def_bool y
+	depends on SMP
+
 config MMU
 	def_bool y
 
Index: quilt-2.6/arch/s390/kernel/Makefile
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/Makefile
+++ quilt-2.6/arch/s390/kernel/Makefile
@@ -19,7 +19,7 @@ obj-y	+= $(if $(CONFIG_64BIT),reipl64.o,
 extra-y				+= head.o init_task.o vmlinux.lds
 
 obj-$(CONFIG_MODULES)		+= s390_ksyms.o module.o
-obj-$(CONFIG_SMP)		+= smp.o
+obj-$(CONFIG_SMP)		+= smp.o topology.o
 
 obj-$(CONFIG_AUDIT)		+= audit.o
 compat-obj-$(CONFIG_AUDIT)	+= compat_audit.o
Index: quilt-2.6/arch/s390/kernel/setup.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/setup.c
+++ quilt-2.6/arch/s390/kernel/setup.c
@@ -39,6 +39,7 @@
 #include <linux/pfn.h>
 #include <linux/ctype.h>
 #include <linux/reboot.h>
+#include <linux/topology.h>
 
 #include <asm/ipl.h>
 #include <asm/uaccess.h>
@@ -830,6 +831,7 @@ setup_arch(char **cmdline_p)
 
         cpu_init();
         __cpu_logical_map[0] = S390_lowcore.cpu_data.cpu_addr;
+	s390_init_cpu_topology();
 
 	/*
 	 * Setup capabilities (ELF_HWCAP & ELF_PLATFORM).
Index: quilt-2.6/arch/s390/kernel/smp.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/smp.c
+++ quilt-2.6/arch/s390/kernel/smp.c
@@ -67,9 +67,7 @@ enum s390_cpu_state {
 	CPU_STATE_CONFIGURED,
 };
 
-#ifdef CONFIG_HOTPLUG_CPU
-static DEFINE_MUTEX(smp_cpu_state_mutex);
-#endif
+DEFINE_MUTEX(smp_cpu_state_mutex);
 static int smp_cpu_state[NR_CPUS];
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
Index: quilt-2.6/arch/s390/kernel/topology.c
===================================================================
--- /dev/null
+++ quilt-2.6/arch/s390/kernel/topology.c
@@ -0,0 +1,271 @@
+/*
+ *  arch/s390/kernel/topology.c
+ *
+ *    Copyright IBM Corp. 2007
+ *    Author(s): Heiko Carstens <heiko.carstens@de.ibm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/device.h>
+#include <linux/bootmem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+#include <asm/delay.h>
+#include <asm/s390_ext.h>
+
+#define CPU_BITS 64
+
+struct tl_cpu {
+	unsigned char reserved[6];
+	unsigned short origin;
+	unsigned long mask[CPU_BITS / BITS_PER_LONG];
+};
+
+struct tl_container {
+	unsigned char reserved[8];
+};
+
+union tl_entry {
+	unsigned char nl;
+	struct tl_cpu cpu;
+	struct tl_container container;
+};
+
+#define NR_MAG 6
+
+struct tl_info {
+	unsigned char reserved0[2];
+	unsigned short length;
+	unsigned char mag[NR_MAG];
+	unsigned char reserved1;
+	unsigned char mnest;
+	unsigned char reserved2[4];
+	union tl_entry tle[0];
+};
+
+struct core_info {
+	struct core_info *next;
+	cpumask_t mask;
+};
+
+static void topology_work_fn(struct work_struct *work);
+static struct tl_info *tl_info;
+static struct core_info core_info;
+static int machine_has_topology;
+static int machine_has_topology_irq;
+static struct timer_list topology_timer;
+static void set_topology_timer(void);
+static DECLARE_WORK(topology_work, topology_work_fn);
+
+cpumask_t cpu_coregroup_map(unsigned int cpu)
+{
+	struct core_info *core = &core_info;
+	cpumask_t mask;
+
+	cpus_clear(mask);
+	if (!machine_has_topology)
+		return cpu_present_map;
+	mutex_lock(&smp_cpu_state_mutex);
+	while (core) {
+		if (cpu_isset(cpu, core->mask)) {
+			mask = core->mask;
+			break;
+		}
+		core = core->next;
+	}
+	mutex_unlock(&smp_cpu_state_mutex);
+	if (cpus_empty(mask))
+		mask = cpumask_of_cpu(cpu);
+	return mask;
+}
+
+static void add_cpus_to_core(struct tl_cpu *tl_cpu, struct core_info *core)
+{
+	unsigned int cpu;
+
+	for (cpu = find_first_bit(&tl_cpu->mask[0], CPU_BITS);
+	     cpu < CPU_BITS;
+	     cpu = find_next_bit(&tl_cpu->mask[0], CPU_BITS, cpu + 1))
+	{
+		unsigned int rcpu, lcpu;
+
+		rcpu = CPU_BITS - 1 - cpu + tl_cpu->origin;
+		for_each_present_cpu(lcpu) {
+			if (__cpu_logical_map[lcpu] == rcpu)
+				cpu_set(lcpu, core->mask);
+		}
+	}
+}
+
+static void clear_cores(void)
+{
+	struct core_info *core = &core_info;
+
+	while (core) {
+		cpus_clear(core->mask);
+		core = core->next;
+	}
+}
+
+static union tl_entry *next_tle(union tl_entry *tle)
+{
+	if (tle->nl)
+		return (union tl_entry *)((struct tl_container *)tle + 1);
+	else
+		return (union tl_entry *)((struct tl_cpu *)tle + 1);
+}
+
+static void tl_to_cores(struct tl_info *info)
+{
+	union tl_entry *tle, *end;
+	struct core_info *core = &core_info;
+
+	mutex_lock(&smp_cpu_state_mutex);
+	clear_cores();
+	tle = (union tl_entry *)&info->tle;
+	end = (union tl_entry *)((unsigned long)info + info->length);
+	while (tle < end) {
+		switch (tle->nl) {
+		case 5:
+		case 4:
+		case 3:
+		case 2:
+			break;
+		case 1:
+			core = core->next;
+			break;
+		case 0:
+			add_cpus_to_core(&tle->cpu, core);
+			break;
+		default:
+			clear_cores();
+			machine_has_topology = 0;
+			return;
+		}
+		tle = next_tle(tle);
+	}
+	mutex_unlock(&smp_cpu_state_mutex);
+}
+
+static int ptf(void)
+{
+	int rc;
+
+	asm volatile(
+		"	.insn	rre,0xb9a20000,%1,%1\n"
+		"	ipm	%0\n"
+		"	srl	%0,28\n"
+		: "=d" (rc)
+		: "d" (2UL)  : "cc");
+	return rc;
+}
+
+void arch_update_cpu_topology(void)
+{
+	struct tl_info *info = tl_info;
+	struct sys_device *sysdev;
+	int cpu;
+
+	if (!machine_has_topology)
+		return;
+	ptf();
+	stsi(info, 15, 1, 2);
+	tl_to_cores(info);
+	for_each_online_cpu(cpu) {
+		sysdev = get_cpu_sysdev(cpu);
+		kobject_uevent(&sysdev->kobj, KOBJ_CHANGE);
+	}
+}
+
+static void topology_work_fn(struct work_struct *work)
+{
+	arch_reinit_sched_domains();
+}
+
+static void topology_timer_fn(unsigned long ignored)
+{
+	if (ptf())
+		schedule_work(&topology_work);
+	set_topology_timer();
+}
+
+static void set_topology_timer(void)
+{
+	topology_timer.function = topology_timer_fn;
+	topology_timer.data = 0;
+	topology_timer.expires = jiffies + 60 * HZ;
+	add_timer(&topology_timer);
+}
+
+static void topology_interrupt(__u16 code)
+{
+	schedule_work(&topology_work);
+}
+
+static int __init init_topology_update(void)
+{
+	int rc;
+
+	if (!machine_has_topology)
+		return 0;
+	init_timer(&topology_timer);
+	if (machine_has_topology_irq) {
+		rc = register_external_interrupt(0x2005, topology_interrupt);
+		if (rc)
+			return rc;
+		ctl_set_bit(0, 8);
+	}
+	else
+		set_topology_timer();
+	return 0;
+}
+__initcall(init_topology_update);
+
+void __init s390_init_cpu_topology(void)
+{
+	unsigned long long facility_bits;
+	struct tl_info *info;
+	struct core_info *core;
+	int nr_cores;
+	int i;
+
+	if (stfle(&facility_bits, 1) <= 0)
+		return;
+	if (!(facility_bits & (1ULL << 52)) || !(facility_bits & (1ULL << 61)))
+		return;
+	machine_has_topology = 1;
+
+	if (facility_bits & (1ULL << 51))
+		machine_has_topology_irq = 1;
+
+	tl_info = alloc_bootmem_pages(PAGE_SIZE);
+	if (!tl_info)
+		goto error;
+	info = tl_info;
+	stsi(info, 15, 1, 2);
+
+	nr_cores = info->mag[NR_MAG - 2];
+	for (i = 0; i < info->mnest - 2; i++)
+		nr_cores *= info->mag[NR_MAG - 3 - i];
+
+	printk(KERN_INFO "CPU topology:");
+	for (i = 0; i < NR_MAG; i++)
+		printk(" %d", info->mag[i]);
+	printk(" / %d\n", info->mnest);
+
+	core = &core_info;
+	for (i = 0; i < nr_cores; i++) {
+		core->next = alloc_bootmem(sizeof(struct core_info));
+		core = core->next;
+		if (!core)
+			goto error;
+	}
+	return;
+error:
+	machine_has_topology = 0;
+	machine_has_topology_irq = 0;
+}
Index: quilt-2.6/drivers/s390/sysinfo.c
===================================================================
--- quilt-2.6.orig/drivers/s390/sysinfo.c
+++ quilt-2.6/drivers/s390/sysinfo.c
@@ -105,7 +105,7 @@ struct sysinfo_3_2_2 {
 	} vm[8];
 };
 
-static inline int stsi(void *sysinfo, int fc, int sel1, int sel2)
+int stsi(void *sysinfo, int fc, int sel1, int sel2)
 {
 	register int r0 asm("0") = (fc << 28) | sel1;
 	register int r1 asm("1") = sel2;
Index: quilt-2.6/include/asm-s390/smp.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/smp.h
+++ quilt-2.6/include/asm-s390/smp.h
@@ -90,6 +90,8 @@ extern void __cpu_die (unsigned int cpu)
 extern void cpu_die (void) __attribute__ ((noreturn));
 extern int __cpu_up (unsigned int cpu);
 
+extern struct mutex smp_cpu_state_mutex;
+
 extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
 	void *info, int wait);
 #endif
Index: quilt-2.6/include/asm-s390/system.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/system.h
+++ quilt-2.6/include/asm-s390/system.h
@@ -406,6 +406,7 @@ __set_psw_mask(unsigned long mask)
 #define local_mcck_enable()  __set_psw_mask(psw_kernel_bits)
 #define local_mcck_disable() __set_psw_mask(psw_kernel_bits & ~PSW_MASK_MCHECK)
 
+int stsi(void *sysinfo, int fc, int sel1, int sel2);
 int stfle(unsigned long long *list, int doublewords);
 
 #ifdef CONFIG_SMP
Index: quilt-2.6/include/asm-s390/topology.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/topology.h
+++ quilt-2.6/include/asm-s390/topology.h
@@ -1,6 +1,22 @@
 #ifndef _ASM_S390_TOPOLOGY_H
 #define _ASM_S390_TOPOLOGY_H
 
+#include <linux/cpumask.h>
+
+#define mc_capable()	(1)
+
+cpumask_t cpu_coregroup_map(unsigned int cpu);
+
+#define topology_core_siblings(cpu)	(cpu_coregroup_map(cpu))
+
+#ifdef CONFIG_SMP
+void s390_init_cpu_topology(void);
+#else
+static inline void s390_init_cpu_topology(void)
+{
+};
+#endif
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_S390_TOPOLOGY_H */

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 07/10] Vertical cpu management.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (5 preceding siblings ...)
  2008-03-12 17:32 ` [patch 06/10] cpu topology support for s390 Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 17:32 ` [patch 08/10] Add missing TLB flush to hugetlb_cow() Martin Schwidefsky
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Heiko Carstens, Martin Schwidefsky

[-- Attachment #1: 107-vertical-cpus.diff --]
[-- Type: text/plain, Size: 9167 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 arch/s390/kernel/smp.c      |   83 ++++++++++++++++++++++++++++++++++++++++++--
 arch/s390/kernel/topology.c |   66 +++++++++++++++++++++++++++++-----
 include/asm-s390/smp.h      |    1 
 include/asm-s390/topology.h |    9 ++++
 4 files changed, 146 insertions(+), 13 deletions(-)

Index: quilt-2.6/arch/s390/kernel/smp.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/smp.c
+++ quilt-2.6/arch/s390/kernel/smp.c
@@ -68,7 +68,9 @@ enum s390_cpu_state {
 };
 
 DEFINE_MUTEX(smp_cpu_state_mutex);
+int smp_cpu_polarization[NR_CPUS];
 static int smp_cpu_state[NR_CPUS];
+static int cpu_management;
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 DEFINE_PER_CPU(struct s390_idle_data, s390_idle);
@@ -454,6 +456,7 @@ static int smp_rescan_cpus_sigp(cpumask_
 		if (cpu_known(cpu_id))
 			continue;
 		__cpu_logical_map[logical_cpu] = cpu_id;
+		smp_cpu_polarization[logical_cpu] = POLARIZATION_UNKNWN;
 		if (!cpu_stopped(logical_cpu))
 			continue;
 		cpu_set(logical_cpu, cpu_present_map);
@@ -487,6 +490,7 @@ static int smp_rescan_cpus_sclp(cpumask_
 		if (cpu_known(cpu_id))
 			continue;
 		__cpu_logical_map[logical_cpu] = cpu_id;
+		smp_cpu_polarization[logical_cpu] = POLARIZATION_UNKNWN;
 		cpu_set(logical_cpu, cpu_present_map);
 		if (cpu >= info->configured)
 			smp_cpu_state[logical_cpu] = CPU_STATE_STANDBY;
@@ -844,6 +848,7 @@ void __init smp_prepare_boot_cpu(void)
 	S390_lowcore.percpu_offset = __per_cpu_offset[0];
 	current_set[0] = current;
 	smp_cpu_state[0] = CPU_STATE_CONFIGURED;
+	smp_cpu_polarization[0] = POLARIZATION_UNKNWN;
 	spin_lock_init(&(&__get_cpu_var(s390_idle))->lock);
 }
 
@@ -895,15 +900,19 @@ static ssize_t cpu_configure_store(struc
 	case 0:
 		if (smp_cpu_state[cpu] == CPU_STATE_CONFIGURED) {
 			rc = sclp_cpu_deconfigure(__cpu_logical_map[cpu]);
-			if (!rc)
+			if (!rc) {
 				smp_cpu_state[cpu] = CPU_STATE_STANDBY;
+				smp_cpu_polarization[cpu] = POLARIZATION_UNKNWN;
+			}
 		}
 		break;
 	case 1:
 		if (smp_cpu_state[cpu] == CPU_STATE_STANDBY) {
 			rc = sclp_cpu_configure(__cpu_logical_map[cpu]);
-			if (!rc)
+			if (!rc) {
 				smp_cpu_state[cpu] = CPU_STATE_CONFIGURED;
+				smp_cpu_polarization[cpu] = POLARIZATION_UNKNWN;
+			}
 		}
 		break;
 	default:
@@ -917,6 +926,34 @@ out:
 static SYSDEV_ATTR(configure, 0644, cpu_configure_show, cpu_configure_store);
 #endif /* CONFIG_HOTPLUG_CPU */
 
+static ssize_t cpu_polarization_show(struct sys_device *dev, char *buf)
+{
+	int cpu = dev->id;
+	ssize_t count;
+
+	mutex_lock(&smp_cpu_state_mutex);
+	switch (smp_cpu_polarization[cpu]) {
+	case POLARIZATION_HRZ:
+		count = sprintf(buf, "horizontal\n");
+		break;
+	case POLARIZATION_VL:
+		count = sprintf(buf, "vertical:low\n");
+		break;
+	case POLARIZATION_VM:
+		count = sprintf(buf, "vertical:medium\n");
+		break;
+	case POLARIZATION_VH:
+		count = sprintf(buf, "vertical:high\n");
+		break;
+	default:
+		count = sprintf(buf, "unknown\n");
+		break;
+	}
+	mutex_unlock(&smp_cpu_state_mutex);
+	return count;
+}
+static SYSDEV_ATTR(polarization, 0444, cpu_polarization_show, NULL);
+
 static ssize_t show_cpu_address(struct sys_device *dev, char *buf)
 {
 	return sprintf(buf, "%d\n", __cpu_logical_map[dev->id]);
@@ -929,6 +966,7 @@ static struct attribute *cpu_common_attr
 	&attr_configure.attr,
 #endif
 	&attr_address.attr,
+	&attr_polarization.attr,
 	NULL,
 };
 
@@ -1073,11 +1111,48 @@ static ssize_t __ref rescan_store(struct
 out:
 	put_online_cpus();
 	mutex_unlock(&smp_cpu_state_mutex);
+	if (!cpus_empty(newcpus))
+		topology_schedule_update();
 	return rc ? rc : count;
 }
 static SYSDEV_ATTR(rescan, 0200, NULL, rescan_store);
 #endif /* CONFIG_HOTPLUG_CPU */
 
+static ssize_t dispatching_show(struct sys_device *dev, char *buf)
+{
+	ssize_t count;
+
+	mutex_lock(&smp_cpu_state_mutex);
+	count = sprintf(buf, "%d\n", cpu_management);
+	mutex_unlock(&smp_cpu_state_mutex);
+	return count;
+}
+
+static ssize_t dispatching_store(struct sys_device *dev, const char *buf,
+				 size_t count)
+{
+	int val, rc;
+	char delim;
+
+	if (sscanf(buf, "%d %c", &val, &delim) != 1)
+		return -EINVAL;
+	if (val != 0 && val != 1)
+		return -EINVAL;
+	rc = 0;
+	mutex_lock(&smp_cpu_state_mutex);
+	get_online_cpus();
+	if (cpu_management == val)
+		goto out;
+	rc = topology_set_cpu_management(val);
+	if (!rc)
+		cpu_management = val;
+out:
+	put_online_cpus();
+	mutex_unlock(&smp_cpu_state_mutex);
+	return rc ? rc : count;
+}
+static SYSDEV_ATTR(dispatching, 0644, dispatching_show, dispatching_store);
+
 static int __init topology_init(void)
 {
 	int cpu;
@@ -1091,6 +1166,10 @@ static int __init topology_init(void)
 	if (rc)
 		return rc;
 #endif
+	rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+			       &attr_dispatching.attr);
+	if (rc)
+		return rc;
 	for_each_present_cpu(cpu) {
 		rc = smp_add_present_cpu(cpu);
 		if (rc)
Index: quilt-2.6/arch/s390/kernel/topology.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/topology.c
+++ quilt-2.6/arch/s390/kernel/topology.c
@@ -18,9 +18,17 @@
 #include <asm/s390_ext.h>
 
 #define CPU_BITS 64
+#define NR_MAG 6
+
+#define PTF_HORIZONTAL	(0UL)
+#define PTF_VERTICAL	(1UL)
+#define PTF_CHECK	(2UL)
 
 struct tl_cpu {
-	unsigned char reserved[6];
+	unsigned char reserved0[4];
+	unsigned char :6;
+	unsigned char pp:2;
+	unsigned char reserved1;
 	unsigned short origin;
 	unsigned long mask[CPU_BITS / BITS_PER_LONG];
 };
@@ -35,8 +43,6 @@ union tl_entry {
 	struct tl_container container;
 };
 
-#define NR_MAG 6
-
 struct tl_info {
 	unsigned char reserved0[2];
 	unsigned short length;
@@ -95,8 +101,10 @@ static void add_cpus_to_core(struct tl_c
 
 		rcpu = CPU_BITS - 1 - cpu + tl_cpu->origin;
 		for_each_present_cpu(lcpu) {
-			if (__cpu_logical_map[lcpu] == rcpu)
+			if (__cpu_logical_map[lcpu] == rcpu) {
 				cpu_set(lcpu, core->mask);
+				smp_cpu_polarization[lcpu] = tl_cpu->pp;
+			}
 		}
 	}
 }
@@ -151,7 +159,17 @@ static void tl_to_cores(struct tl_info *
 	mutex_unlock(&smp_cpu_state_mutex);
 }
 
-static int ptf(void)
+static void topology_update_polarization_simple(void)
+{
+	int cpu;
+
+	mutex_lock(&smp_cpu_state_mutex);
+	for_each_present_cpu(cpu)
+		smp_cpu_polarization[cpu] = POLARIZATION_HRZ;
+	mutex_unlock(&smp_cpu_state_mutex);
+}
+
+static int ptf(unsigned long fc)
 {
 	int rc;
 
@@ -160,7 +178,25 @@ static int ptf(void)
 		"	ipm	%0\n"
 		"	srl	%0,28\n"
 		: "=d" (rc)
-		: "d" (2UL)  : "cc");
+		: "d" (fc)  : "cc");
+	return rc;
+}
+
+int topology_set_cpu_management(int fc)
+{
+	int cpu;
+	int rc;
+
+	if (!machine_has_topology)
+		return -EOPNOTSUPP;
+	if (fc)
+		rc = ptf(PTF_VERTICAL);
+	else
+		rc = ptf(PTF_HORIZONTAL);
+	if (rc)
+		return -EBUSY;
+	for_each_present_cpu(cpu)
+		smp_cpu_polarization[cpu] = POLARIZATION_UNKNWN;
 	return rc;
 }
 
@@ -170,9 +206,10 @@ void arch_update_cpu_topology(void)
 	struct sys_device *sysdev;
 	int cpu;
 
-	if (!machine_has_topology)
+	if (!machine_has_topology) {
+		topology_update_polarization_simple();
 		return;
-	ptf();
+	}
 	stsi(info, 15, 1, 2);
 	tl_to_cores(info);
 	for_each_online_cpu(cpu) {
@@ -186,10 +223,15 @@ static void topology_work_fn(struct work
 	arch_reinit_sched_domains();
 }
 
+void topology_schedule_update(void)
+{
+	schedule_work(&topology_work);
+}
+
 static void topology_timer_fn(unsigned long ignored)
 {
-	if (ptf())
-		schedule_work(&topology_work);
+	if (ptf(PTF_CHECK))
+		topology_schedule_update();
 	set_topology_timer();
 }
 
@@ -210,8 +252,10 @@ static int __init init_topology_update(v
 {
 	int rc;
 
-	if (!machine_has_topology)
+	if (!machine_has_topology) {
+		topology_update_polarization_simple();
 		return 0;
+	}
 	init_timer(&topology_timer);
 	if (machine_has_topology_irq) {
 		rc = register_external_interrupt(0x2005, topology_interrupt);
Index: quilt-2.6/include/asm-s390/smp.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/smp.h
+++ quilt-2.6/include/asm-s390/smp.h
@@ -91,6 +91,7 @@ extern void cpu_die (void) __attribute__
 extern int __cpu_up (unsigned int cpu);
 
 extern struct mutex smp_cpu_state_mutex;
+extern int smp_cpu_polarization[];
 
 extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
 	void *info, int wait);
Index: quilt-2.6/include/asm-s390/topology.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/topology.h
+++ quilt-2.6/include/asm-s390/topology.h
@@ -9,6 +9,15 @@ cpumask_t cpu_coregroup_map(unsigned int
 
 #define topology_core_siblings(cpu)	(cpu_coregroup_map(cpu))
 
+int topology_set_cpu_management(int fc);
+void topology_schedule_update(void);
+
+#define POLARIZATION_UNKNWN	(-1)
+#define POLARIZATION_HRZ	(0)
+#define POLARIZATION_VL		(1)
+#define POLARIZATION_VM		(2)
+#define POLARIZATION_VH		(3)
+
 #ifdef CONFIG_SMP
 void s390_init_cpu_topology(void);
 #else

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 08/10] Add missing TLB flush to hugetlb_cow().
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (6 preceding siblings ...)
  2008-03-12 17:32 ` [patch 07/10] Vertical cpu management Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 17:32 ` [patch 09/10] Hugetlb common code update for System z Martin Schwidefsky
  2008-03-12 17:32 ` [patch 10/10] System z large page support Martin Schwidefsky
  9 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390
  Cc: Andi Kleen, David S. Miller, H. Peter Anvin, Ingo Molnar,
	Paul Mackerras, Paul Mundt, Thomas Gleixner, Tony Luck,
	Gerald Schaefer, Martin Schwidefsky

[-- Attachment #1: 108-hugetlb-tlbflush.diff --]
[-- Type: text/plain, Size: 2224 bytes --]

From: Gerald Schaefer <geraldsc@de.ibm.com>

A COW break on a hugetlbfs page with page_count > 1 sets a new pte
with set_huge_pte_at() without any TLB flush. The old pte remains in
the TLB, and subsequent write accesses to the page result in a page
fault loop that lasts until the TLB is flushed from somewhere else.
This patch introduces an architecture-specific huge_ptep_clear_flush()
function, which is called before set_huge_pte_at() in hugetlb_cow().
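
For illustration only, the stale-entry problem can be modelled in user
space. The sketch below is not kernel code: struct toy_mmu and all toy_*
helpers are hypothetical names, with toy_set_pte() standing in for
set_huge_pte_at() and toy_clear_flush() for huge_ptep_clear_flush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption, not kernel code): one page-table slot plus
 * one translation cached by the "hardware". */
struct toy_mmu {
	unsigned long pte;	/* entry currently stored in memory */
	unsigned long tlb;	/* cached copy of an earlier entry */
	bool tlb_valid;
};

/* Hardware lookup: a valid cached translation wins over memory. */
static unsigned long toy_translate(struct toy_mmu *mmu)
{
	assert(mmu != NULL);
	if (!mmu->tlb_valid) {
		mmu->tlb = mmu->pte;
		mmu->tlb_valid = true;
	}
	return mmu->tlb;
}

/* Stand-in for set_huge_pte_at(): updates memory only. */
static void toy_set_pte(struct toy_mmu *mmu, unsigned long pte)
{
	mmu->pte = pte;
}

/* Stand-in for huge_ptep_clear_flush(): clears the entry and drops
 * the cached translation. */
static void toy_clear_flush(struct toy_mmu *mmu)
{
	mmu->pte = 0;
	mmu->tlb_valid = false;
}

/* COW break: returns the translation the next access will see. */
static unsigned long toy_cow(struct toy_mmu *mmu, unsigned long new_pte,
			     bool flush)
{
	toy_translate(mmu);		/* old entry is now cached */
	if (flush)
		toy_clear_flush(mmu);
	toy_set_pte(mmu, new_pte);
	return toy_translate(mmu);
}
```

Without the flush, toy_cow() keeps returning the old entry, which in
the real case corresponds to the write fault repeating. With the flush,
the next access picks up the new entry.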

Cc: Andi Kleen <ak@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 include/linux/hugetlb.h |    4 ++++
 mm/hugetlb.c            |    1 +
 2 files changed, 5 insertions(+)

Index: quilt-2.6/include/linux/hugetlb.h
===================================================================
--- quilt-2.6.orig/include/linux/hugetlb.h
+++ quilt-2.6/include/linux/hugetlb.h
@@ -80,6 +80,10 @@ static inline int prepare_hugepage_range
 int prepare_hugepage_range(unsigned long addr, unsigned long len);
 #endif
 
+#ifndef ARCH_HAS_HUGEPAGE_CLEAR_FLUSH
+#define huge_ptep_clear_flush(vma, addr, ptep)	do { } while (0)
+#endif
+
 #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE
 #define set_huge_pte_at(mm, addr, ptep, pte)	set_pte_at(mm, addr, ptep, pte)
 #define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep)
Index: quilt-2.6/mm/hugetlb.c
===================================================================
--- quilt-2.6.orig/mm/hugetlb.c
+++ quilt-2.6/mm/hugetlb.c
@@ -864,6 +864,7 @@ static int hugetlb_cow(struct mm_struct 
 	ptep = huge_pte_offset(mm, address & HPAGE_MASK);
 	if (likely(pte_same(*ptep, pte))) {
 		/* Break COW */
+		huge_ptep_clear_flush(vma, address, ptep);
 		set_huge_pte_at(mm, address, ptep,
 				make_huge_pte(vma, new_page, 1));
 		/* Make the old page be freed below */

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (7 preceding siblings ...)
  2008-03-12 17:32 ` [patch 08/10] Add missing TLB flush to hugetlb_cow() Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 17:51   ` Dave Hansen
  2008-03-12 17:32 ` [patch 10/10] System z large page support Martin Schwidefsky
  9 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Gerald Schaefer, Martin Schwidefsky

[-- Attachment #1: 109-hugetlb-defines.diff --]
[-- Type: text/plain, Size: 6154 bytes --]

From: Gerald Schaefer <geraldsc@de.ibm.com>

Huge ptes have a special type on s390 and cannot be handled with the
standard pte functions in certain cases. This patch adds some new
architecture-specific definitions and functions to hugetlb common code,
as a prerequisite for the System z large page support. They won't
affect other architectures.
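
The override pattern can be sketched in user space as follows. This is
an assumption-laden toy, not the kernel headers: all toy_* and TOY_*
names are hypothetical, and the bit values 0x020 and 0x200 are borrowed
from the _HPAGE_TYPE_* defines in patch 10. An architecture that
defines the guard macro supplies its own helpers; every other
architecture gets the generic fallbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned long val; } toy_pte_t;

/* Generic helper: an all-zero entry is empty. */
static inline bool toy_pte_none(toy_pte_t pte)
{
	return pte.val == 0;
}

/* --- hypothetical arch header: empty huge entries are encoded as
 * "invalid bit set, read-only bit clear" --- */
#define TOY_ARCH_HAS_HUGE_PTE_TYPE
#define TOY_SEGMENT_ENTRY_INV	0x020UL
#define TOY_SEGMENT_ENTRY_RO	0x200UL

static inline bool toy_huge_pte_none(toy_pte_t pte)
{
	return (pte.val & TOY_SEGMENT_ENTRY_INV) &&
		!(pte.val & TOY_SEGMENT_ENTRY_RO);
}

static inline toy_pte_t toy_huge_ptep_get(const toy_pte_t *ptep)
{
	return *ptep;
}

/* --- generic header: fallbacks, used only without an override --- */
#ifndef TOY_ARCH_HAS_HUGE_PTE_TYPE
#define toy_huge_pte_none(pte)		toy_pte_none(pte)
#define toy_huge_ptep_get(ptep)		(*(ptep))
#endif

/* --- common code: written once against the huge_* names --- */
static bool toy_slot_is_free(const toy_pte_t *ptep)
{
	assert(ptep != NULL);
	return toy_huge_pte_none(toy_huge_ptep_get(ptep));
}
```

With this encoding the generic toy_pte_none() would report an empty
huge entry as populated, which is why common code has to go through the
huge_* wrappers.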

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 include/linux/hugetlb.h |   18 ++++++++++++++++++
 mm/hugetlb.c            |   36 +++++++++++++++++++++---------------
 2 files changed, 39 insertions(+), 15 deletions(-)

Index: quilt-2.6/include/linux/hugetlb.h
===================================================================
--- quilt-2.6.orig/include/linux/hugetlb.h
+++ quilt-2.6/include/linux/hugetlb.h
@@ -80,6 +80,24 @@ static inline int prepare_hugepage_range
 int prepare_hugepage_range(unsigned long addr, unsigned long len);
 #endif
 
+#ifndef ARCH_HAS_HUGE_PTE_TYPE
+#define huge_pte_none(pte)			pte_none(pte)
+#define huge_pte_wrprotect(pte)			pte_wrprotect(pte)
+#define huge_ptep_set_wrprotect(mm, addr, ptep)	\
+	ptep_set_wrprotect(mm, addr, ptep)
+#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)	\
+	ptep_set_access_flags(vma, addr, ptep, pte, dirty)
+#define huge_ptep_get(ptep)			(*ptep)
+#endif
+
+#ifndef ARCH_HAS_PREPARE_HUGEPAGE
+#define arch_prepare_hugepage(page)		0
+#define arch_release_hugepage(page)		do { } while (0)
+#else
+int arch_prepare_hugepage(struct page *page);
+void arch_release_hugepage(struct page *page);
+#endif
+
 #ifndef ARCH_HAS_HUGEPAGE_CLEAR_FLUSH
 #define huge_ptep_clear_flush(vma, addr, ptep)	do { } while (0)
 #endif
Index: quilt-2.6/mm/hugetlb.c
===================================================================
--- quilt-2.6.orig/mm/hugetlb.c
+++ quilt-2.6/mm/hugetlb.c
@@ -129,6 +129,7 @@ static void update_and_free_page(struct 
 	}
 	set_compound_page_dtor(page, NULL);
 	set_page_refcounted(page);
+	arch_release_hugepage(page);
 	__free_pages(page, HUGETLB_PAGE_ORDER);
 }
 
@@ -198,6 +199,10 @@ static struct page *alloc_fresh_huge_pag
 		htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
 		HUGETLB_PAGE_ORDER);
 	if (page) {
+		if (arch_prepare_hugepage(page)) {
+			__free_pages(page, HUGETLB_PAGE_ORDER);
+			return NULL;
+		}
 		set_compound_page_dtor(page, free_huge_page);
 		spin_lock(&hugetlb_lock);
 		nr_huge_pages++;
@@ -707,7 +712,7 @@ static pte_t make_huge_pte(struct vm_are
 		entry =
 		    pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 	} else {
-		entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
 	}
 	entry = pte_mkyoung(entry);
 	entry = pte_mkhuge(entry);
@@ -720,8 +725,8 @@ static void set_huge_ptep_writable(struc
 {
 	pte_t entry;
 
-	entry = pte_mkwrite(pte_mkdirty(*ptep));
-	if (ptep_set_access_flags(vma, address, ptep, entry, 1)) {
+	entry = pte_mkwrite(pte_mkdirty(huge_ptep_get(ptep)));
+	if (huge_ptep_set_access_flags(vma, address, ptep, entry, 1)) {
 		update_mmu_cache(vma, address, entry);
 	}
 }
@@ -751,10 +756,10 @@ int copy_hugetlb_page_range(struct mm_st
 
 		spin_lock(&dst->page_table_lock);
 		spin_lock(&src->page_table_lock);
-		if (!pte_none(*src_pte)) {
+		if (!huge_pte_none(huge_ptep_get(src_pte))) {
 			if (cow)
-				ptep_set_wrprotect(src, addr, src_pte);
-			entry = *src_pte;
+				huge_ptep_set_wrprotect(src, addr, src_pte);
+			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
 			get_page(ptepage);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
@@ -798,7 +803,7 @@ void __unmap_hugepage_range(struct vm_ar
 			continue;
 
 		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		if (pte_none(pte))
+		if (huge_pte_none(pte))
 			continue;
 
 		page = pte_page(pte);
@@ -862,7 +867,7 @@ static int hugetlb_cow(struct mm_struct 
 	spin_lock(&mm->page_table_lock);
 
 	ptep = huge_pte_offset(mm, address & HPAGE_MASK);
-	if (likely(pte_same(*ptep, pte))) {
+	if (likely(pte_same(huge_ptep_get(ptep), pte))) {
 		/* Break COW */
 		huge_ptep_clear_flush(vma, address, ptep);
 		set_huge_pte_at(mm, address, ptep,
@@ -932,7 +937,7 @@ retry:
 		goto backout;
 
 	ret = 0;
-	if (!pte_none(*ptep))
+	if (!huge_pte_none(huge_ptep_get(ptep)))
 		goto backout;
 
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
@@ -974,8 +979,8 @@ int hugetlb_fault(struct mm_struct *mm, 
 	 * the same page in the page cache.
 	 */
 	mutex_lock(&hugetlb_instantiation_mutex);
-	entry = *ptep;
-	if (pte_none(entry)) {
+	entry = huge_ptep_get(ptep);
+	if (huge_pte_none(entry)) {
 		ret = hugetlb_no_page(mm, vma, address, ptep, write_access);
 		mutex_unlock(&hugetlb_instantiation_mutex);
 		return ret;
@@ -985,7 +990,7 @@ int hugetlb_fault(struct mm_struct *mm, 
 
 	spin_lock(&mm->page_table_lock);
 	/* Check for a racing update before calling hugetlb_cow */
-	if (likely(pte_same(entry, *ptep)))
+	if (likely(pte_same(entry, huge_ptep_get(ptep))))
 		if (write_access && !pte_write(entry))
 			ret = hugetlb_cow(mm, vma, address, ptep, entry);
 	spin_unlock(&mm->page_table_lock);
@@ -1015,7 +1020,8 @@ int follow_hugetlb_page(struct mm_struct
 		 */
 		pte = huge_pte_offset(mm, vaddr & HPAGE_MASK);
 
-		if (!pte || pte_none(*pte) || (write && !pte_write(*pte))) {
+		if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
+		    (write && !pte_write(huge_ptep_get(pte)))) {
 			int ret;
 
 			spin_unlock(&mm->page_table_lock);
@@ -1031,7 +1037,7 @@ int follow_hugetlb_page(struct mm_struct
 		}
 
 		pfn_offset = (vaddr & ~HPAGE_MASK) >> PAGE_SHIFT;
-		page = pte_page(*pte);
+		page = pte_page(huge_ptep_get(pte));
 same_page:
 		if (pages) {
 			get_page(page);
@@ -1080,7 +1086,7 @@ void hugetlb_change_protection(struct vm
 			continue;
 		if (huge_pmd_unshare(mm, &address, ptep))
 			continue;
-		if (!pte_none(*ptep)) {
+		if (!huge_pte_none(huge_ptep_get(ptep))) {
 			pte = huge_ptep_get_and_clear(mm, address, ptep);
 			pte = pte_mkhuge(pte_modify(pte, newprot));
 			set_huge_pte_at(mm, address, ptep, pte);

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* [patch 10/10] System z large page support.
  2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
                   ` (8 preceding siblings ...)
  2008-03-12 17:32 ` [patch 09/10] Hugetlb common code update for System z Martin Schwidefsky
@ 2008-03-12 17:32 ` Martin Schwidefsky
  2008-03-12 17:52   ` Dave Hansen
  9 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-12 17:32 UTC (permalink / raw)
  To: linux-kernel, linux-s390; +Cc: Gerald Schaefer, Martin Schwidefsky

[-- Attachment #1: 110-hugetlb-s390.diff --]
[-- Type: text/plain, Size: 22918 bytes --]

From: Gerald Schaefer <geraldsc@de.ibm.com>

This adds hugetlbfs support on System z, using hardware large page
support where available and software large page emulation on older
hardware. In software emulation mode, shared (large) page tables are
implemented by using page->index of the first tail page of a compound
large page to store page table information.
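
The idea behind the emulation can be sketched in user space. The code
below is a simplified illustration, not the kernel implementation: the
toy_* and TOY_* names are hypothetical, 4K base pages (shift 12) are
assumed, and plain physical addresses stand in for real ptes. One table
per large page holds one entry for each 4K piece, and every mapping of
that large page can share the same table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT		12	/* assumed 4K base pages */
#define TOY_HPAGE_SHIFT		20	/* 1MB large pages */
#define TOY_PAGE_SIZE		(1UL << TOY_PAGE_SHIFT)
#define TOY_HPAGE_SIZE		(1UL << TOY_HPAGE_SHIFT)
#define TOY_HPAGE_MASK		(~(TOY_HPAGE_SIZE - 1))
#define TOY_PTES_PER_HPAGE	(1UL << (TOY_HPAGE_SHIFT - TOY_PAGE_SHIFT))

/* Fill the shared table: entry i maps the i-th 4K piece of the
 * large page that starts at hpage_phys. */
static void toy_fill_emulation_table(unsigned long *table,
				     unsigned long hpage_phys)
{
	size_t i;

	assert((hpage_phys & ~TOY_HPAGE_MASK) == 0);	/* 1MB aligned */
	for (i = 0; i < TOY_PTES_PER_HPAGE; i++)
		table[i] = hpage_phys + i * TOY_PAGE_SIZE;
}

/* Resolve a virtual address inside the large page via the table. */
static unsigned long toy_resolve(const unsigned long *table,
				 unsigned long vaddr)
{
	unsigned long idx = (vaddr & ~TOY_HPAGE_MASK) >> TOY_PAGE_SHIFT;

	return table[idx] + (vaddr & (TOY_PAGE_SIZE - 1));
}
```

Because the table depends only on the large page itself, it can be
built once when the page is prepared and reused by every mapping.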

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 arch/s390/defconfig         |    3 
 arch/s390/kernel/early.c    |   14 +++
 arch/s390/kernel/head64.S   |   12 +++
 arch/s390/kernel/setup.c    |   19 +----
 arch/s390/mm/Makefile       |    2 
 arch/s390/mm/fault.c        |    3 
 arch/s390/mm/hugetlbpage.c  |  134 +++++++++++++++++++++++++++++++++++++
 arch/s390/mm/init.c         |   23 ------
 arch/s390/mm/vmem.c         |   55 +++++++++++++--
 fs/Kconfig                  |    3 
 include/asm-s390/page.h     |   29 ++++++--
 include/asm-s390/pgtable.h  |  158 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-s390/setup.h    |    4 +
 include/asm-s390/system.h   |   10 ++
 include/asm-s390/tlbflush.h |    1 
 15 files changed, 418 insertions(+), 52 deletions(-)

Index: quilt-2.6/arch/s390/defconfig
===================================================================
--- quilt-2.6.orig/arch/s390/defconfig
+++ quilt-2.6/arch/s390/defconfig
@@ -653,7 +653,8 @@ CONFIG_PROC_SYSCTL=y
 CONFIG_SYSFS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
-# CONFIG_HUGETLB_PAGE is not set
+CONFIG_HUGETLBFS=y
+CONFIG_HUGETLB_PAGE=y
 CONFIG_CONFIGFS_FS=m
 
 #
Index: quilt-2.6/arch/s390/kernel/early.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/early.c
+++ quilt-2.6/arch/s390/kernel/early.c
@@ -263,6 +263,19 @@ static noinline __init void setup_lowcor
 	s390_base_pgm_handler_fn = early_pgm_check_handler;
 }
 
+static noinline __init void setup_hpage(void)
+{
+#ifndef CONFIG_DEBUG_PAGEALLOC
+	unsigned int facilities;
+
+	facilities = stfl();
+	if (!(facilities & (1UL << 23)) || !(facilities & (1UL << 29)))
+		return;
+	machine_flags |= 1024;
+	__ctl_set_bit(0, 23);
+#endif
+}
+
 /*
  * Save ipl parameters, clear bss memory, initialize storage keys
  * and create a kernel NSS at startup if the SAVESYS= parm is defined
@@ -280,6 +293,7 @@ void __init startup_init(void)
 	create_kernel_nss();
 	sort_main_extable();
 	setup_lowcore_early();
+	setup_hpage();
 	sclp_read_info_early();
 	sclp_facilities_detect();
 	memsize = sclp_memory_detect();
Index: quilt-2.6/arch/s390/kernel/head64.S
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/head64.S
+++ quilt-2.6/arch/s390/kernel/head64.S
@@ -187,11 +187,21 @@ startup_continue:
 	oi	6(%r12),2		# set MVCOS flag
 1:
 
+	la	%r1,0f-.LPG1(%r13)
+	stg	%r1,__LC_PGM_NEW_PSW+8
+	lhi	%r1,-1
+	.short	0xb9af
+	.short	0x0011
+0:	tm	0x8f,0x6		# specification exception?
+	bno	1f-.LPG1(%r13)
+	oi	6(%r12),8
+1:
+
 	lpswe	.Lentry-.LPG1(13)	# jump to _stext in primary-space,
 					# virtual and never return ...
 	.align	16
 .Lentry:.quad	0x0000000180000000,_stext
-.Lctl:	.quad	0x04b50002		# cr0: various things
+.Lctl:	.quad	0x04350002		# cr0: various things
 	.quad	0			# cr1: primary space segment table
 	.quad	.Lduct			# cr2: dispatchable unit control table
 	.quad	0			# cr3: instruction authorization
Index: quilt-2.6/arch/s390/kernel/setup.c
===================================================================
--- quilt-2.6.orig/arch/s390/kernel/setup.c
+++ quilt-2.6/arch/s390/kernel/setup.c
@@ -679,15 +679,6 @@ setup_memory(void)
 #endif
 }
 
-static __init unsigned int stfl(void)
-{
-	asm volatile(
-		"	.insn	s,0xb2b10000,0(0)\n" /* stfl */
-		"0:\n"
-		EX_TABLE(0b,0b));
-	return S390_lowcore.stfl_fac_list;
-}
-
 static int __init __stfle(unsigned long long *list, int doublewords)
 {
 	typedef struct { unsigned long long _[doublewords]; } addrtype;
@@ -754,6 +745,9 @@ static void __init setup_hwcaps(void)
 			elf_hwcap |= 1UL << 6;
 	}
 
+	if (MACHINE_HAS_HPAGE)
+		elf_hwcap |= 1UL << 7;
+
 	switch (cpuinfo->cpu_id.machine) {
 	case 0x9672:
 #if !defined(CONFIG_64BIT)
@@ -873,8 +867,9 @@ void __cpuinit print_cpu_info(struct cpu
 
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
-	static const char *hwcap_str[7] = {
-		"esan3", "zarch", "stfle", "msa", "ldisp", "eimm", "dfp"
+	static const char *hwcap_str[8] = {
+		"esan3", "zarch", "stfle", "msa", "ldisp", "eimm", "dfp",
+		"edat"
 	};
         struct cpuinfo_S390 *cpuinfo;
 	unsigned long n = (unsigned long) v - 1;
@@ -889,7 +884,7 @@ static int show_cpuinfo(struct seq_file 
 			       num_online_cpus(), loops_per_jiffy/(500000/HZ),
 			       (loops_per_jiffy/(5000/HZ))%100);
 		seq_puts(m, "features\t: ");
-		for (i = 0; i < 7; i++)
+		for (i = 0; i < 8; i++)
 			if (hwcap_str[i] && (elf_hwcap & (1UL << i)))
 				seq_printf(m, "%s ", hwcap_str[i]);
 		seq_puts(m, "\n");
Index: quilt-2.6/arch/s390/mm/fault.c
===================================================================
--- quilt-2.6.orig/arch/s390/mm/fault.c
+++ quilt-2.6/arch/s390/mm/fault.c
@@ -28,6 +28,7 @@
 #include <linux/hardirq.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
+#include <linux/hugetlb.h>
 
 #include <asm/system.h>
 #include <asm/pgtable.h>
@@ -374,6 +375,8 @@ good_area:
 	}
 
 survive:
+	if (is_vm_hugetlb_page(vma))
+		address &= HPAGE_MASK;
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
Index: quilt-2.6/arch/s390/mm/hugetlbpage.c
===================================================================
--- /dev/null
+++ quilt-2.6/arch/s390/mm/hugetlbpage.c
@@ -0,0 +1,134 @@
+/*
+ *  IBM System z Huge TLB Page Support for Kernel.
+ *
+ *    Copyright 2007 IBM Corp.
+ *    Author(s): Gerald Schaefer <gerald.schaefer@de.ibm.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+				   pte_t *pteptr, pte_t pteval)
+{
+	pmd_t *pmdp = (pmd_t *) pteptr;
+	pte_t shadow_pteval = pteval;
+	unsigned long mask;
+
+	if (!MACHINE_HAS_HPAGE) {
+		pteptr = (pte_t *) pte_page(pteval)[1].index;
+		mask = pte_val(pteval) &
+				(_SEGMENT_ENTRY_INV | _SEGMENT_ENTRY_RO);
+		pte_val(pteval) = (_SEGMENT_ENTRY + __pa(pteptr)) | mask;
+		if (mm->context.noexec) {
+			pteptr += PTRS_PER_PTE;
+			pte_val(shadow_pteval) =
+					(_SEGMENT_ENTRY + __pa(pteptr)) | mask;
+		}
+	}
+
+	pmd_val(*pmdp) = pte_val(pteval);
+	if (mm->context.noexec) {
+		pmdp = get_shadow_table(pmdp);
+		pmd_val(*pmdp) = pte_val(shadow_pteval);
+	}
+}
+
+int arch_prepare_hugepage(struct page *page)
+{
+	unsigned long addr = page_to_phys(page);
+	pte_t pte;
+	pte_t *ptep;
+	int i;
+
+	if (MACHINE_HAS_HPAGE)
+		return 0;
+
+	ptep = (pte_t *) pte_alloc_one(&init_mm, addr);
+	if (!ptep)
+		return -ENOMEM;
+
+	pte = mk_pte(page, PAGE_RW);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		set_pte_at(&init_mm, addr + i * PAGE_SIZE, ptep + i, pte);
+		pte_val(pte) += PAGE_SIZE;
+	}
+	page[1].index = (unsigned long) ptep;
+	return 0;
+}
+
+void arch_release_hugepage(struct page *page)
+{
+	pte_t *ptep;
+
+	if (MACHINE_HAS_HPAGE)
+		return;
+
+	ptep = (pte_t *) page[1].index;
+	if (!ptep)
+		return;
+	pte_free(&init_mm, ptep);
+	page[1].index = 0;
+}
+
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t *pgdp;
+	pud_t *pudp;
+	pmd_t *pmdp = NULL;
+
+	pgdp = pgd_offset(mm, addr);
+	pudp = pud_alloc(mm, pgdp, addr);
+	if (pudp)
+		pmdp = pmd_alloc(mm, pudp, addr);
+	return (pte_t *) pmdp;
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t *pgdp;
+	pud_t *pudp;
+	pmd_t *pmdp = NULL;
+
+	pgdp = pgd_offset(mm, addr);
+	if (pgd_present(*pgdp)) {
+		pudp = pud_offset(pgdp, addr);
+		if (pud_present(*pudp))
+			pmdp = pmd_offset(pudp, addr);
+	}
+	return (pte_t *) pmdp;
+}
+
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+	return 0;
+}
+
+struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
+			      int write)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	if (!MACHINE_HAS_HPAGE)
+		return 0;
+
+	return !!(pmd_val(pmd) & _SEGMENT_ENTRY_LARGE);
+}
+
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+			     pmd_t *pmdp, int write)
+{
+	struct page *page;
+
+	if (!MACHINE_HAS_HPAGE)
+		return NULL;
+
+	page = pmd_page(*pmdp);
+	if (page)
+		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
+	return page;
+}
Index: quilt-2.6/arch/s390/mm/init.c
===================================================================
--- quilt-2.6.orig/arch/s390/mm/init.c
+++ quilt-2.6/arch/s390/mm/init.c
@@ -78,28 +78,6 @@ void show_mem(void)
 	printk("%lu pages pagetables\n", global_page_state(NR_PAGETABLE));
 }
 
-static void __init setup_ro_region(void)
-{
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	pte_t new_pte;
-	unsigned long address, end;
-
-	address = ((unsigned long)&_stext) & PAGE_MASK;
-	end = PFN_ALIGN((unsigned long)&_eshared);
-
-	for (; address < end; address += PAGE_SIZE) {
-		pgd = pgd_offset_k(address);
-		pud = pud_offset(pgd, address);
-		pmd = pmd_offset(pud, address);
-		pte = pte_offset_kernel(pmd, address);
-		new_pte = mk_pte_phys(address, __pgprot(_PAGE_RO));
-		*pte = new_pte;
-	}
-}
-
 /*
  * paging_init() sets up the page tables
  */
@@ -122,7 +100,6 @@ void __init paging_init(void)
 	clear_table((unsigned long *) init_mm.pgd, pgd_type,
 		    sizeof(unsigned long)*2048);
 	vmem_map_init();
-	setup_ro_region();
 
         /* enable virtual mapping in kernel mode */
 	__ctl_load(S390_lowcore.kernel_asce, 1, 1);
Index: quilt-2.6/arch/s390/mm/Makefile
===================================================================
--- quilt-2.6.orig/arch/s390/mm/Makefile
+++ quilt-2.6/arch/s390/mm/Makefile
@@ -4,4 +4,4 @@
 
 obj-y	 := init.o fault.o extmem.o mmap.o vmem.o pgtable.o
 obj-$(CONFIG_CMM) += cmm.o
-
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
Index: quilt-2.6/arch/s390/mm/vmem.c
===================================================================
--- quilt-2.6.orig/arch/s390/mm/vmem.c
+++ quilt-2.6/arch/s390/mm/vmem.c
@@ -10,10 +10,12 @@
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/list.h>
+#include <linux/hugetlb.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/tlbflush.h>
+#include <asm/sections.h>
 
 static DEFINE_MUTEX(vmem_mutex);
 
@@ -114,7 +116,7 @@ static pte_t __init_refok *vmem_pte_allo
 /*
  * Add a physical memory range to the 1:1 mapping.
  */
-static int vmem_add_range(unsigned long start, unsigned long size)
+static int vmem_add_range(unsigned long start, unsigned long size, int ro)
 {
 	unsigned long address;
 	pgd_t *pg_dir;
@@ -141,7 +143,19 @@ static int vmem_add_range(unsigned long 
 			pud_populate_kernel(&init_mm, pu_dir, pm_dir);
 		}
 
+		pte = mk_pte_phys(address, __pgprot(ro ? _PAGE_RO : 0));
 		pm_dir = pmd_offset(pu_dir, address);
+
+#ifdef __s390x__
+		if (MACHINE_HAS_HPAGE && !(address & ~HPAGE_MASK) &&
+		    (address + HPAGE_SIZE <= start + size) &&
+		    (address >= HPAGE_SIZE)) {
+			pte_val(pte) |= _SEGMENT_ENTRY_LARGE;
+			pmd_val(*pm_dir) = pte_val(pte);
+			address += HPAGE_SIZE - PAGE_SIZE;
+			continue;
+		}
+#endif
 		if (pmd_none(*pm_dir)) {
 			pt_dir = vmem_pte_alloc();
 			if (!pt_dir)
@@ -150,7 +164,6 @@ static int vmem_add_range(unsigned long 
 		}
 
 		pt_dir = pte_offset_kernel(pm_dir, address);
-		pte = pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL);
 		*pt_dir = pte;
 	}
 	ret = 0;
@@ -181,6 +194,13 @@ static void vmem_remove_range(unsigned l
 		pm_dir = pmd_offset(pu_dir, address);
 		if (pmd_none(*pm_dir))
 			continue;
+
+		if (pmd_huge(*pm_dir)) {
+			pmd_clear_kernel(pm_dir);
+			address += HPAGE_SIZE - PAGE_SIZE;
+			continue;
+		}
+
 		pt_dir = pte_offset_kernel(pm_dir, address);
 		*pt_dir = pte;
 	}
@@ -249,14 +269,14 @@ out:
 	return ret;
 }
 
-static int vmem_add_mem(unsigned long start, unsigned long size)
+static int vmem_add_mem(unsigned long start, unsigned long size, int ro)
 {
 	int ret;
 
 	ret = vmem_add_mem_map(start, size);
 	if (ret)
 		return ret;
-	return vmem_add_range(start, size);
+	return vmem_add_range(start, size, ro);
 }
 
 /*
@@ -339,7 +359,7 @@ int add_shared_memory(unsigned long star
 	if (ret)
 		goto out_free;
 
-	ret = vmem_add_mem(start, size);
+	ret = vmem_add_mem(start, size, 0);
 	if (ret)
 		goto out_remove;
 
@@ -375,14 +395,35 @@ out:
  */
 void __init vmem_map_init(void)
 {
+	unsigned long ro_start, ro_end;
+	unsigned long start, end;
 	int i;
 
 	INIT_LIST_HEAD(&init_mm.context.crst_list);
 	INIT_LIST_HEAD(&init_mm.context.pgtable_list);
 	init_mm.context.noexec = 0;
 	NODE_DATA(0)->node_mem_map = VMEM_MAP;
-	for (i = 0; i < MEMORY_CHUNKS && memory_chunk[i].size > 0; i++)
-		vmem_add_mem(memory_chunk[i].addr, memory_chunk[i].size);
+	ro_start = ((unsigned long)&_stext) & PAGE_MASK;
+	ro_end = PFN_ALIGN((unsigned long)&_eshared);
+	for (i = 0; i < MEMORY_CHUNKS && memory_chunk[i].size > 0; i++) {
+		start = memory_chunk[i].addr;
+		end = memory_chunk[i].addr + memory_chunk[i].size;
+		if (start >= ro_end || end <= ro_start)
+			vmem_add_mem(start, end - start, 0);
+		else if (start >= ro_start && end <= ro_end)
+			vmem_add_mem(start, end - start, 1);
+		else if (start >= ro_start) {
+			vmem_add_mem(start, ro_end - start, 1);
+			vmem_add_mem(ro_end, end - ro_end, 0);
+		} else if (end < ro_end) {
+			vmem_add_mem(start, ro_start - start, 0);
+			vmem_add_mem(ro_start, end - ro_start, 1);
+		} else {
+			vmem_add_mem(start, ro_start - start, 0);
+			vmem_add_mem(ro_start, ro_end - ro_start, 1);
+			vmem_add_mem(ro_end, end - ro_end, 0);
+		}
+	}
 }
 
 /*
Index: quilt-2.6/fs/Kconfig
===================================================================
--- quilt-2.6.orig/fs/Kconfig
+++ quilt-2.6/fs/Kconfig
@@ -978,7 +978,8 @@ config TMPFS_POSIX_ACL
 
 config HUGETLBFS
 	bool "HugeTLB file system support"
-	depends on X86 || IA64 || PPC64 || SPARC64 || (SUPERH && MMU) || BROKEN
+	depends on X86 || IA64 || PPC64 || SPARC64 || (SUPERH && MMU) || \
+		   (S390 && 64BIT) || BROKEN
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on
 	  ramfs. For architectures that support it, say Y here and read
Index: quilt-2.6/include/asm-s390/page.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/page.h
+++ quilt-2.6/include/asm-s390/page.h
@@ -19,17 +19,34 @@
 #define PAGE_DEFAULT_ACC	0
 #define PAGE_DEFAULT_KEY	(PAGE_DEFAULT_ACC << 4)
 
+#define HPAGE_SHIFT	20
+#define HPAGE_SIZE	(1UL << HPAGE_SHIFT)
+#define HPAGE_MASK	(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+
+#define ARCH_HAS_SETCLEAR_HUGE_PTE
+#define ARCH_HAS_HUGE_PTE_TYPE
+#define ARCH_HAS_PREPARE_HUGEPAGE
+#define ARCH_HAS_HUGEPAGE_CLEAR_FLUSH
+
 #include <asm/setup.h>
 #ifndef __ASSEMBLY__
 
 static inline void clear_page(void *page)
 {
-	register unsigned long reg1 asm ("1") = 0;
-	register void *reg2 asm ("2") = page;
-	register unsigned long reg3 asm ("3") = 4096;
-	asm volatile(
-		"	mvcl	2,0"
-		: "+d" (reg2), "+d" (reg3) : "d" (reg1) : "memory", "cc");
+	if (MACHINE_HAS_CPAGE) {
+		asm volatile(
+			"	.insn	rre,0xb9af0000,%0,%1"
+			: : "d" (0x10000), "a" (page) : "memory", "cc");
+	} else {
+		register unsigned long reg1 asm ("1") = 0;
+		register void *reg2 asm ("2") = page;
+		register unsigned long reg3 asm ("3") = 4096;
+		asm volatile(
+			"	mvcl	2,0"
+			: "+d" (reg2), "+d" (reg3) : "d" (reg1)
+			: "memory", "cc");
+	}
 }
 
 static inline void copy_page(void *to, void *from)
Index: quilt-2.6/include/asm-s390/pgtable.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/pgtable.h
+++ quilt-2.6/include/asm-s390/pgtable.h
@@ -231,6 +231,15 @@ extern char empty_zero_page[PAGE_SIZE];
 #define _PAGE_TYPE_EX_RW	0x002
 
 /*
+ * Only four types for huge pages, using the invalid bit and protection bit
+ * of a segment table entry.
+ */
+#define _HPAGE_TYPE_EMPTY	0x020	/* _SEGMENT_ENTRY_INV */
+#define _HPAGE_TYPE_NONE	0x220
+#define _HPAGE_TYPE_RO		0x200	/* _SEGMENT_ENTRY_RO  */
+#define _HPAGE_TYPE_RW		0x000
+
+/*
  * PTE type bits are rather complicated. handle_pte_fault uses pte_present,
  * pte_none and pte_file to find out the pte type WITHOUT holding the page
  * table lock. ptep_clear_flush on the other hand uses ptep_clear_flush to
@@ -315,6 +324,9 @@ extern char empty_zero_page[PAGE_SIZE];
 #define _SEGMENT_ENTRY		(0)
 #define _SEGMENT_ENTRY_EMPTY	(_SEGMENT_ENTRY_INV)
 
+#define _SEGMENT_ENTRY_LARGE	0x400	/* STE-format control, large page   */
+#define _SEGMENT_ENTRY_CO	0x100	/* change-recording override   */
+
 #endif /* __s390x__ */
 
 /*
@@ -891,6 +903,152 @@ static inline pmd_t *pmd_offset(pud_t *p
 #define pte_unmap(pte) do { } while (0)
 #define pte_unmap_nested(pte) do { } while (0)
 
+#ifdef __s390x__
+static inline pte_t pte_mkhuge(pte_t pte)
+{
+	/*
+	 * PROT_NONE needs to be remapped from the pte type to the ste type.
+	 * The HW invalid bit is also different for pte and ste. The pte
+	 * invalid bit happens to be the same as the ste _SEGMENT_ENTRY_LARGE
+	 * bit, so we don't have to clear it.
+	 */
+	if (pte_val(pte) & _PAGE_INVALID) {
+		if (pte_val(pte) & _PAGE_SWT)
+			pte_val(pte) |= _HPAGE_TYPE_NONE;
+		pte_val(pte) |= _SEGMENT_ENTRY_INV;
+	}
+	/*
+	 * Clear SW pte bits SWT and SWX, there are no SW bits in a segment
+	 * table entry.
+	 */
+	pte_val(pte) &= ~(_PAGE_SWT | _PAGE_SWX);
+	/*
+	 * Also set the change-override bit because we don't need dirty bit
+	 * tracking for hugetlbfs pages.
+	 */
+	pte_val(pte) |= (_SEGMENT_ENTRY_LARGE | _SEGMENT_ENTRY_CO);
+	return pte;
+}
+
+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+	pte_val(pte) |= _PAGE_RO;
+	return pte;
+}
+
+static inline int huge_pte_none(pte_t pte)
+{
+	return (pte_val(pte) & _SEGMENT_ENTRY_INV) &&
+		!(pte_val(pte) & _SEGMENT_ENTRY_RO);
+}
+
+static inline pte_t huge_ptep_get(pte_t *ptep)
+{
+	pte_t pte = *ptep;
+	unsigned long mask;
+
+	if (!MACHINE_HAS_HPAGE) {
+		ptep = (pte_t *) (pte_val(pte) & _SEGMENT_ENTRY_ORIGIN);
+		if (ptep) {
+			mask = pte_val(pte) &
+				(_SEGMENT_ENTRY_INV | _SEGMENT_ENTRY_RO);
+			pte = pte_mkhuge(*ptep);
+			pte_val(pte) |= mask;
+		}
+	}
+	return pte;
+}
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+					    unsigned long addr, pte_t *ptep)
+{
+	pte_t pte = huge_ptep_get(ptep);
+
+	pmd_clear((pmd_t *) ptep);
+	return pte;
+}
+
+static inline void __pmd_csp(pmd_t *pmdp)
+{
+	register unsigned long reg2 asm("2") = pmd_val(*pmdp);
+	register unsigned long reg3 asm("3") = pmd_val(*pmdp) |
+					       _SEGMENT_ENTRY_INV;
+	register unsigned long reg4 asm("4") = ((unsigned long) pmdp) + 5;
+
+	asm volatile(
+		"	csp %1,%3"
+		: "=m" (*pmdp)
+		: "d" (reg2), "d" (reg3), "d" (reg4), "m" (*pmdp) : "cc");
+	pmd_val(*pmdp) = _SEGMENT_ENTRY_INV | _SEGMENT_ENTRY;
+}
+
+static inline void __pmd_idte(unsigned long address, pmd_t *pmdp)
+{
+	unsigned long sto = (unsigned long) pmdp -
+				pmd_index(address) * sizeof(pmd_t);
+
+	if (!(pmd_val(*pmdp) & _SEGMENT_ENTRY_INV)) {
+		asm volatile(
+			"	.insn	rrf,0xb98e0000,%2,%3,0,0"
+			: "=m" (*pmdp)
+			: "m" (*pmdp), "a" (sto),
+			  "a" ((address & HPAGE_MASK))
+		);
+	}
+	pmd_val(*pmdp) = _SEGMENT_ENTRY_INV | _SEGMENT_ENTRY;
+}
+
+static inline void huge_ptep_invalidate(struct mm_struct *mm,
+					unsigned long address, pte_t *ptep)
+{
+	pmd_t *pmdp = (pmd_t *) ptep;
+
+	if (!MACHINE_HAS_IDTE) {
+		__pmd_csp(pmdp);
+		if (mm->context.noexec) {
+			pmdp = get_shadow_table(pmdp);
+			__pmd_csp(pmdp);
+		}
+		return;
+	}
+
+	__pmd_idte(address, pmdp);
+	if (mm->context.noexec) {
+		pmdp = get_shadow_table(pmdp);
+		__pmd_idte(address, pmdp);
+	}
+	return;
+}
+
+#define huge_ptep_set_access_flags(__vma, __addr, __ptep, __entry, __dirty) \
+({									    \
+	int __changed = !pte_same(huge_ptep_get(__ptep), __entry);	    \
+	if (__changed) {						    \
+		huge_ptep_invalidate((__vma)->vm_mm, __addr, __ptep);	    \
+		set_huge_pte_at((__vma)->vm_mm, __addr, __ptep, __entry);   \
+	}								    \
+	__changed;							    \
+})
+
+#define huge_ptep_set_wrprotect(__mm, __addr, __ptep)			\
+({									\
+	pte_t __pte = huge_ptep_get(__ptep);				\
+	if (pte_write(__pte)) {						\
+		if (atomic_read(&(__mm)->mm_users) > 1 ||		\
+		    (__mm) != current->active_mm)			\
+			huge_ptep_invalidate(__mm, __addr, __ptep);	\
+		set_huge_pte_at(__mm, __addr, __ptep,			\
+				huge_pte_wrprotect(__pte));		\
+	}								\
+})
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+					 unsigned long address, pte_t *ptep)
+{
+	huge_ptep_invalidate(vma->vm_mm, address, ptep);
+}
+#endif /* __s390x__ */
+
 /*
  * 31 bit swap entry format:
  * A page-table entry has some bits we have to treat in a special way.
Index: quilt-2.6/include/asm-s390/setup.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/setup.h
+++ quilt-2.6/include/asm-s390/setup.h
@@ -70,11 +70,15 @@ extern unsigned long machine_flags;
 #define MACHINE_HAS_CSP		(machine_flags & 8)
 #define MACHINE_HAS_DIAG44	(1)
 #define MACHINE_HAS_MVCOS	(0)
+#define MACHINE_HAS_HPAGE	(0)
+#define MACHINE_HAS_CPAGE	(0)
 #else /* __s390x__ */
 #define MACHINE_HAS_IEEE	(1)
 #define MACHINE_HAS_CSP		(1)
 #define MACHINE_HAS_DIAG44	(machine_flags & 32)
 #define MACHINE_HAS_MVCOS	(machine_flags & 512)
+#define MACHINE_HAS_HPAGE	(machine_flags & 1024)
+#define MACHINE_HAS_CPAGE	(machine_flags & 2048)
 #endif /* __s390x__ */
 
 #define MACHINE_HAS_SCLP	(!MACHINE_IS_P390)
Index: quilt-2.6/include/asm-s390/system.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/system.h
+++ quilt-2.6/include/asm-s390/system.h
@@ -16,6 +16,7 @@
 #include <asm/ptrace.h>
 #include <asm/setup.h>
 #include <asm/processor.h>
+#include <asm/lowcore.h>
 
 #ifdef __KERNEL__
 
@@ -423,6 +424,15 @@ extern void smp_ctl_clear_bit(int cr, in
 
 #endif /* CONFIG_SMP */
 
+static inline unsigned int stfl(void)
+{
+	asm volatile(
+		"	.insn	s,0xb2b10000,0(0)\n" /* stfl */
+		"0:\n"
+		EX_TABLE(0b,0b));
+	return S390_lowcore.stfl_fac_list;
+}
+
 extern void (*_machine_restart)(char *command);
 extern void (*_machine_halt)(void);
 extern void (*_machine_power_off)(void);
Index: quilt-2.6/include/asm-s390/tlbflush.h
===================================================================
--- quilt-2.6.orig/include/asm-s390/tlbflush.h
+++ quilt-2.6/include/asm-s390/tlbflush.h
@@ -2,6 +2,7 @@
 #define _S390_TLBFLUSH_H
 
 #include <linux/mm.h>
+#include <linux/sched.h>
 #include <asm/processor.h>
 #include <asm/pgalloc.h>
 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 17:32 ` [patch 09/10] Hugetlb common code update for System z Martin Schwidefsky
@ 2008-03-12 17:51   ` Dave Hansen
  2008-03-12 23:18     ` Gerald Schaefer
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2008-03-12 17:51 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, Gerald Schaefer

On Wed, 2008-03-12 at 18:32 +0100, Martin Schwidefsky wrote:
> +#ifndef ARCH_HAS_HUGE_PTE_TYPE
> +#define huge_pte_none(pte)                     pte_none(pte)
> +#define huge_pte_wrprotect(pte)                        pte_wrprotect(pte)
> +#define huge_ptep_set_wrprotect(mm, addr, ptep)        \
> +       ptep_set_wrprotect(mm, addr, ptep)
> +#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)        \
> +       ptep_set_access_flags(vma, addr, ptep, pte, dirty)
> +#define huge_ptep_get(ptep)                    (*ptep)
> +#endif
> +
> +#ifndef ARCH_HAS_PREPARE_HUGEPAGE

Can you guys please do these defines in Kconfig instead of headers?  I
find them much easier to track down when I have one place to look,
rather than a mess of 14 other #includes in an arch-specific header. :)

I'm also a little concerned that you just #ifdef'd in about 44 new ptep
functions in here.  Have you carefully considered doing this in a way
that would fit in better with the other architectures?

> Huge ptes have a special type on s390 and cannot be handled with the
> standard pte functions in certain cases.

Can you elaborate a bit more on that?

-- Dave



* Re: [patch 10/10] System z large page support.
  2008-03-12 17:32 ` [patch 10/10] System z large page support Martin Schwidefsky
@ 2008-03-12 17:52   ` Dave Hansen
  2008-03-12 22:14     ` Gerald Schaefer
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2008-03-12 17:52 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, Gerald Schaefer


On Wed, 2008-03-12 at 18:32 +0100, Martin Schwidefsky wrote:
> 
> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr)
> +{
> +       pgd_t *pgdp;
> +       pud_t *pudp;
> +       pmd_t *pmdp = NULL;
> +
> +       pgdp = pgd_offset(mm, addr);
> +       pudp = pud_alloc(mm, pgdp, addr);
> +       if (pudp)
> +               pmdp = pmd_alloc(mm, pudp, addr);
> +       return (pte_t *) pmdp;
> +}

That looks pretty generic.  Why can't you share with the version we
already have?

-- Dave



* Re: [patch 01/10] Add new fields for System z10 to /proc/sysinfo
  2008-03-12 17:31 ` [patch 01/10] Add new fields for System z10 to /proc/sysinfo Martin Schwidefsky
@ 2008-03-12 17:57   ` Josef 'Jeff' Sipek
  2008-03-13 10:02     ` Martin Schwidefsky
  0 siblings, 1 reply; 31+ messages in thread
From: Josef 'Jeff' Sipek @ 2008-03-12 17:57 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390

On Wed, Mar 12, 2008 at 06:31:56PM +0100, Martin Schwidefsky wrote:
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> 
> Add permanent and temporary model capacity and the corresponding
> capacity value fields for the three capacity identifiers to the
> output of /proc/sysinfo.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> ---
> 
>  drivers/s390/sysinfo.c |   21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> Index: quilt-2.6/drivers/s390/sysinfo.c
> ===================================================================
> --- quilt-2.6.orig/drivers/s390/sysinfo.c
> +++ quilt-2.6/drivers/s390/sysinfo.c
> @@ -26,6 +26,11 @@ struct sysinfo_1_1_1 {
>  	char sequence[16];
>  	char plant[4];
>  	char model[16];
> +	char model_perm_cap[16];
> +	char model_temp_cap[16];
> +	char model_cap_rating[4];
> +	char model_perm_cap_rating[4];
> +	char model_temp_cap_rating[4];
>  };

I'd try to be safer, and make the struct __attribute__((packed))...

Josef 'Jeff' Sipek.

-- 
I'm somewhere between geek and normal.
		- Linus Torvalds


* Re: [patch 10/10] System z large page support.
  2008-03-12 17:52   ` Dave Hansen
@ 2008-03-12 22:14     ` Gerald Schaefer
  0 siblings, 0 replies; 31+ messages in thread
From: Gerald Schaefer @ 2008-03-12 22:14 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Martin Schwidefsky, linux-kernel, linux-s390

On Wed, 2008-03-12 at 10:52 -0700, Dave Hansen wrote:
> > 
> > +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr)
> > +{
> > +       pgd_t *pgdp;
> > +       pud_t *pudp;
> > +       pmd_t *pmdp = NULL;
> > +
> > +       pgdp = pgd_offset(mm, addr);
> > +       pudp = pud_alloc(mm, pgdp, addr);
> > +       if (pudp)
> > +               pmdp = pmd_alloc(mm, pudp, addr);
> > +       return (pte_t *) pmdp;
> > +}
> 
> That looks pretty generic.  Why can't you share with the version we
> already have?

What version do you mean? Every architecture with hugetlbfs support has
its own huge_pte_alloc() function in arch/<arch>/mm/hugetlbpage.c, and
they are all doing slightly different things.

--
Gerald Schaefer




* Re: [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file.
  2008-03-12 17:31 ` [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file Martin Schwidefsky
@ 2008-03-12 23:03   ` Andrew Morton
  2008-03-13  9:48     ` Martin Schwidefsky
  2008-03-21 12:29   ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2008-03-12 23:03 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, heiko.carstens, schwidefsky

On Wed, 12 Mar 2008 18:31:58 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> +#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> +extern int arch_reinit_sched_domains(void);
> +#endif

I tend to recommend that the ifdefs be omitted here.

It has the downside that the build will then fail at link-time rather than
at compile-time, but I haven't seen anyone complain about that.
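
A minimal sketch of that trade-off, not taken from the patch: the
declaration is left unguarded and only the call site depends on the config
option. SKETCH_HAVE_SCHED_MC is a hypothetical stand-in for
CONFIG_SCHED_MC, and maybe_reinit() is an illustrative helper name.

```c
/*
 * Unguarded declaration. It is harmless when no definition exists,
 * as long as nothing references the symbol.
 */
extern int arch_reinit_sched_domains(void);

/* Illustrative caller (hypothetical name). */
static int maybe_reinit(void)
{
#ifdef SKETCH_HAVE_SCHED_MC	/* hypothetical stand-in for CONFIG_SCHED_MC */
	/* Without a definition this only fails at link time. */
	return arch_reinit_sched_domains();
#else
	return 0;
#endif
}
```

With the macro undefined the file compiles and links, because the symbol is
never referenced. With the macro defined and no definition available, the
error only shows up when linking.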




* Re: [patch 06/10] cpu topology support for s390.
  2008-03-12 17:32 ` [patch 06/10] cpu topology support for s390 Martin Schwidefsky
@ 2008-03-12 23:11   ` Andrew Morton
  2008-03-13 12:28     ` Martin Schwidefsky
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2008-03-12 23:11 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, heiko.carstens, schwidefsky

On Wed, 12 Mar 2008 18:32:01 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> Add s390 backend so we can give the scheduler some hints about the
> cpu topology.
> 
> ===================================================================
> --- /dev/null
> +++ quilt-2.6/arch/s390/kernel/topology.c
> @@ -0,0 +1,271 @@
> +/*
> + *  arch/s390/kernel/topology.c
> + *
> + *    Copyright IBM Corp. 2007
> + *    Author(s): Heiko Carstens <heiko.carstens@de.ibm.com>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/init.h>
> +#include <linux/device.h>
> +#include <linux/bootmem.h>
> +#include <linux/sched.h>
> +#include <linux/workqueue.h>
> +#include <linux/cpu.h>
> +#include <linux/smp.h>
> +#include <asm/delay.h>
> +#include <asm/s390_ext.h>
> +
> +#define CPU_BITS 64
> +
> +struct tl_cpu {
> +	unsigned char reserved[6];
> +	unsigned short origin;
> +	unsigned long mask[CPU_BITS / BITS_PER_LONG];
> +};

mask[] will be too small for CPU_BITS=65 ;)

> ...
>
> +static union tl_entry *next_tle(union tl_entry *tle)
> +{
> +	if (tle->nl)
> +		return (union tl_entry *)((struct tl_container *)tle + 1);
> +	else
> +		return (union tl_entry *)((struct tl_cpu *)tle + 1);
> +}

omg.

> +static void tl_to_cores(struct tl_info *info)
> +{
> +	union tl_entry *tle, *end;
> +	struct core_info *core = &core_info;
> +
> +	mutex_lock(&smp_cpu_state_mutex);
> +	clear_cores();
> +	tle = (union tl_entry *)&info->tle;

and this cast was unneeded!

> +	end = (union tl_entry *)((unsigned long)info + info->length);

I'd suggest that you take a look at all the pointer arith games which are
being played in this code and see if it can be done better with a more
appropriate use of the C type system.  Before someone dies.




* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 17:51   ` Dave Hansen
@ 2008-03-12 23:18     ` Gerald Schaefer
  2008-03-12 23:43       ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: Gerald Schaefer @ 2008-03-12 23:18 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Martin Schwidefsky, linux-kernel, linux-s390

On Wed, 2008-03-12 at 10:51 -0700, Dave Hansen wrote:
> > +#ifndef ARCH_HAS_HUGE_PTE_TYPE
> > +#define huge_pte_none(pte)                     pte_none(pte)
> > +#define huge_pte_wrprotect(pte)                        pte_wrprotect(pte)
> > +#define huge_ptep_set_wrprotect(mm, addr, ptep)        \
> > +       ptep_set_wrprotect(mm, addr, ptep)
> > +#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)        \
> > +       ptep_set_access_flags(vma, addr, ptep, pte, dirty)
> > +#define huge_ptep_get(ptep)                    (*ptep)
> > +#endif
> > +
> > +#ifndef ARCH_HAS_PREPARE_HUGEPAGE
> 
> Can you guys please do these defines in Kconfig instead of headers?  I
> find them much easier to track down when I have one place to look,
> rather than a mess of 14 other #includes in an arch-specific header. :)

There are already several ARCH_HAS_xxx defines which are being used in
include/linux/hugetlb.h. All of them are defined in
include/asm-<arch>/page.h for every architecture that needs them (with
the exception of powerpc, where it is include/asm-powerpc/page_64.h).
So there is already one place to look for them, and so we put our
defines into include/asm-s390/page.h.

> I'm also a little concerned that you just #ifdef'd in about 44 new ptep
> functions in here.  Have you carefully considered doing this in a way
> that would fit in better with the other architectures?

Other architectures should not be affected at all. Because of the
#ifdef, the new ptep functions are either a nop for them or just the
same as they were before our patch.

> > Huge ptes have a special type on s390 and cannot be handled with the
> > standard pte functions in certain cases.
> 
> Can you elaborate a bit more on that?

Large ptes are not really ptes but segment table entries (pmd entries),
in our case. This is similar to other architectures with hardware large
page support, because there simply is no page table level (and thus no
ptes) anymore. Unfortunately, the hugetlbfs common code does not
consider that discrepancy and just uses a standard pte_t and standard
pte functions, probably because it did not really make a difference on
other architectures.

On s390, a segment table entry (pmd) type is different from a pte type
mainly in the location of its invalid bit. This means that we cannot
use pte_none(), pte_wrprotect() and similar functions for large ptes,
which was the reason for the new huge_pte functions that we introduced.
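
The difference can be sketched in a few lines of standalone C. This is only
an illustration under assumptions: the segment entry bit values are copied
from patch 10/10, the pte invalid bit value (0x400) is inferred from the
patch comment saying it coincides with _SEGMENT_ENTRY_LARGE, and the
function names are made up for this example.

```c
#include <stdbool.h>

/* Segment table entry bits, values as defined in patch 10/10. */
#define SEGMENT_ENTRY_INV	0x020UL	/* invalid bit */
#define SEGMENT_ENTRY_RO	0x200UL	/* protection bit */
/* Assumed pte invalid bit, inferred from the pte_mkhuge() comment. */
#define PAGE_INVALID		0x400UL

/* Same test as huge_pte_none() in the patch: INV set and RO clear. */
static bool huge_entry_none(unsigned long ste)
{
	return (ste & SEGMENT_ENTRY_INV) && !(ste & SEGMENT_ENTRY_RO);
}

/* Simplified stand-in for a pte-style check that looks at the pte bit. */
static bool pte_style_invalid(unsigned long pte)
{
	return (pte & PAGE_INVALID) != 0;
}
```

An empty segment entry (0x020) is reported as "none" by the first helper,
while the pte-style check does not see it as invalid at all, which is the
reason the standard pte functions cannot simply be reused.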

--
Gerald Schaefer



* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 23:18     ` Gerald Schaefer
@ 2008-03-12 23:43       ` Andrew Morton
  2008-03-13 17:49         ` Gerald Schaefer
  2008-03-28 14:05         ` Gerald Schaefer
  0 siblings, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2008-03-12 23:43 UTC (permalink / raw)
  To: gerald.schaefer; +Cc: haveblue, schwidefsky, linux-kernel, linux-s390

On Thu, 13 Mar 2008 00:18:57 +0100
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> On Wed, 2008-03-12 at 10:51 -0700, Dave Hansen wrote:
> > > +#ifndef ARCH_HAS_HUGE_PTE_TYPE
> > > +#define huge_pte_none(pte)                     pte_none(pte)
> > > +#define huge_pte_wrprotect(pte)                        pte_wrprotect(pte)
> > > +#define huge_ptep_set_wrprotect(mm, addr, ptep)        \
> > > +       ptep_set_wrprotect(mm, addr, ptep)
> > > +#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)        \
> > > +       ptep_set_access_flags(vma, addr, ptep, pte, dirty)
> > > +#define huge_ptep_get(ptep)                    (*ptep)
> > > +#endif
> > > +
> > > +#ifndef ARCH_HAS_PREPARE_HUGEPAGE
> > 
> > Can you guys please do these defines in Kconfig instead of headers?  I
> > find them much easier to track down when I have one place to look,
> > rather than a mess of 14 other #includes in an arch-specific header. :)
> 
> There are already several ARCH_HAS_xxx defines which are being used in
> include/linux/hugetlb.h. All of them are defined in
> include/asm-<arch>/page.h for every architecture that needs them (with
> the exception of powerpc, where it is include/asm-powerpc/page_64.h).

Yes, but that's fugly and it would be better to put in place the
infrastructure for cleaning it up, rather than worsening it.

So...

Put this:

+#define huge_pte_none(pte)			pte_none(pte)
+#define huge_pte_wrprotect(pte)			pte_wrprotect(pte)
+#define huge_ptep_set_wrprotect(mm, addr, ptep)	\
+	ptep_set_wrprotect(mm, addr, ptep)
+#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)	\
+	ptep_set_access_flags(vma, addr, ptep, pte, dirty)
+#define huge_ptep_get(ptep)			(*ptep)

into include/asm-generic/hugetlb.h

then for each architecture except s390 add an include/asm-foo/hugetlb.h
which does

#include <asm-generic/hugetlb.h>

then in include/linux/hugetlb.h add

#include <asm/hugetlb.h>

and then in include/asm-s390/hugetlb.h, add your s390-specific versions of
huge_pte_none() and friends.

later, someone can hopefully use this new infrastructure to rid us of
ARCH_HAS_HUGEPAGE_ONLY_RANGE, ARCH_HAS_HUGETLB_FREE_PGD_RANGE,
ARCH_HAS_PREPARE_HUGEPAGE_RANGE, ARCH_HAS_SETCLEAR_HUGE_PTE and
ARCH_HAS_HUGETLB_PREFAULT_HOOK.




* Re: [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file.
  2008-03-12 23:03   ` Andrew Morton
@ 2008-03-13  9:48     ` Martin Schwidefsky
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-13  9:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-s390, heiko.carstens

On Wed, 2008-03-12 at 16:03 -0700, Andrew Morton wrote:
> > +#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> > +extern int arch_reinit_sched_domains(void);
> > +#endif
> 
> I tend to recommend that the ifdefs be omitted here.
> 
> It has the downside that the build will then fail at link-time rather than
> at compile-time, but I haven't seen anyone complain about that.

Ok, I've removed the #if.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




* Re: [patch 01/10] Add new fields for System z10 to /proc/sysinfo
  2008-03-12 17:57   ` Josef 'Jeff' Sipek
@ 2008-03-13 10:02     ` Martin Schwidefsky
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-13 10:02 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: linux-kernel, linux-s390

On Wed, 2008-03-12 at 13:57 -0400, Josef 'Jeff' Sipek wrote:
> > Index: quilt-2.6/drivers/s390/sysinfo.c
> > ===================================================================
> > --- quilt-2.6.orig/drivers/s390/sysinfo.c
> > +++ quilt-2.6/drivers/s390/sysinfo.c
> > @@ -26,6 +26,11 @@ struct sysinfo_1_1_1 {
> >  	char sequence[16];
> >  	char plant[4];
> >  	char model[16];
> > +	char model_perm_cap[16];
> > +	char model_temp_cap[16];
> > +	char model_cap_rating[4];
> > +	char model_perm_cap_rating[4];
> > +	char model_temp_cap_rating[4];
> >  };
> 
> I'd try to be safer, and make the struct __attribute__((packed))...

Hmm, that would be true for all sysinfo typedefs. The automatic
alignment does the right thing though. Don't know if it's worth the
effort - at least until you get bitten for the first time ;-)
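
As a rough check of the alignment argument, here is a sketch that contains
only the members visible in the quoted hunk. The struct name sysinfo_tail
is made up, and whatever precedes "sequence" in the real sysinfo_1_1_1 is
not shown in the hunk and is left out here.

```c
#include <stddef.h>

/* Only the members quoted in the hunk; earlier members are omitted. */
struct sysinfo_tail {
	char sequence[16];
	char plant[4];
	char model[16];
	char model_perm_cap[16];
	char model_temp_cap[16];
	char model_cap_rating[4];
	char model_perm_cap_rating[4];
	char model_temp_cap_rating[4];
};
```

All members are char arrays with an alignment requirement of one byte, so
common compilers insert no padding between them and the size is the plain
sum of the array sizes, with or without __attribute__((packed)).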

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




* Re: [patch 06/10] cpu topology support for s390.
  2008-03-12 23:11   ` Andrew Morton
@ 2008-03-13 12:28     ` Martin Schwidefsky
  2008-03-13 22:40       ` Heiko Carstens
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-13 12:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-s390, heiko.carstens

On Wed, 2008-03-12 at 16:11 -0700, Andrew Morton wrote:
> > +#include <asm/delay.h>
> > +#include <asm/s390_ext.h>
> > +
> > +#define CPU_BITS 64
> > +
> > +struct tl_cpu {
> > +	unsigned char reserved[6];
> > +	unsigned short origin;
> > +	unsigned long mask[CPU_BITS / BITS_PER_LONG];
> > +};
> 
> mask[] will be too small for CPU_BITS=65 ;)

We could add the +(BITS_PER_LONG - 1) logic but what for? The CPU_BITS
is defined right above and it will be increased in steps of 64.
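
For reference, the round-up logic mentioned above could look like the
following sketch. The macro names are invented for this example and are
not the ones used in the kernel.

```c
#include <limits.h>

/* Number of bits in an unsigned long on the build target. */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Truncating division, as used for mask[] in the patch. */
#define LONGS_TRUNCATED(bits)	((bits) / SKETCH_BITS_PER_LONG)

/* Round-up variant: add (bits per long - 1) before dividing. */
#define LONGS_ROUNDED_UP(bits) \
	(((bits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
```

Both forms give the same array size for 64 bits. They only differ for
values such as 65 that are not a multiple of the word size.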

> > ...
> >
> > +static union tl_entry *next_tle(union tl_entry *tle)
> > +{
> > +	if (tle->nl)
> > +		return (union tl_entry *)((struct tl_container *)tle + 1);
> > +	else
> > +		return (union tl_entry *)((struct tl_cpu *)tle + 1);
> > +}
> 
> omg.

The length of the current tle depends on its type, and the next tle is
located directly behind the current one. Except for the typecasting and
the union trick this is what needs to be done.
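
A sketch of the same walk done with byte offsets instead of casts. The
struct layouts are hypothetical: only the tl_cpu fields from the quoted
hunk are known here, the tl_container layout and the position of the "nl"
byte are assumptions, and count_cpu_entries() is an invented helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container entry: 8 bytes, non-zero nesting level. */
struct tl_container {
	unsigned char nl;
	unsigned char reserved[6];
	unsigned char id;
};

/* Hypothetical cpu entry: 16 bytes, nesting level zero. */
struct tl_cpu {
	unsigned char nl;
	unsigned char reserved[5];
	uint16_t origin;
	uint64_t mask;
};

/* Walk back-to-back entries and count the cpu entries. */
static size_t count_cpu_entries(const unsigned char *buf, size_t length)
{
	size_t off = 0, cpus = 0;

	while (off < length) {
		if (buf[off] != 0) {
			/* Container entry: skip its fixed size. */
			off += sizeof(struct tl_container);
		} else {
			/* Cpu entry: count it and skip its fixed size. */
			off += sizeof(struct tl_cpu);
			cpus++;
		}
	}
	return cpus;
}
```

The step size still depends on the type of the current entry, so the
structure of the loop is the same as in next_tle().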

> > +static void tl_to_cores(struct tl_info *info)
> > +{
> > +	union tl_entry *tle, *end;
> > +	struct core_info *core = &core_info;
> > +
> > +	mutex_lock(&smp_cpu_state_mutex);
> > +	clear_cores();
> > +	tle = (union tl_entry *)&info->tle;
> 
> and this cast was unneeded!
> 
> > +	end = (union tl_entry *)((unsigned long)info + info->length);
> 
> I'd suggest that you take a look at all the pointer arith games which are
> being played in this code and see if it can be done better with a more
> appropriate use of the C type system.  Before someone dies.

The only thing that I can see that we could do is to get rid of the
unions and do the pointer arithmetic with the tl_cpu / tl_container
structs by hand. The data stored by ptf is structured in a way that
makes it rather hard to write decent C code.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 23:43       ` Andrew Morton
@ 2008-03-13 17:49         ` Gerald Schaefer
  2008-03-28 14:05         ` Gerald Schaefer
  1 sibling, 0 replies; 31+ messages in thread
From: Gerald Schaefer @ 2008-03-13 17:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, schwidefsky, linux-kernel, linux-s390

On Wed, 2008-03-12 at 16:43 -0700, Andrew Morton wrote:
> Yes, but that's fugly and it would be better to put in place the
> infrastructure for cleaning it up, rather than worsening it.
> 
> So...

OK, we'll start the clean-up, thanks for the detailed suggestions.

--
Gerald Schaefer



* Re: [patch 06/10] cpu topology support for s390.
  2008-03-13 12:28     ` Martin Schwidefsky
@ 2008-03-13 22:40       ` Heiko Carstens
  0 siblings, 0 replies; 31+ messages in thread
From: Heiko Carstens @ 2008-03-13 22:40 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: Andrew Morton, linux-kernel, linux-s390

On Thu, Mar 13, 2008 at 01:28:27PM +0100, Martin Schwidefsky wrote:
> On Wed, 2008-03-12 at 16:11 -0700, Andrew Morton wrote:
> > > +#define CPU_BITS 64
> > > +
> > > +struct tl_cpu {
> > > +	unsigned char reserved[6];
> > > +	unsigned short origin;
> > > +	unsigned long mask[CPU_BITS / BITS_PER_LONG];
> > > +};
> > 
> > mask[] will be too small for CPU_BITS=65 ;)
> 
> We could add the +(BITS_PER_LONG - 1) logic but what for? The CPU_BITS
> is defined right above and it will be increased in steps of 64.

It will always be 64 and won't be increased. For more than 64 cpus "origin"
in the hardware structure above will be > 0 and each bit in the mask
would represent cpu "origin + bit number".
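
A small sketch of that decoding, under one assumption that is not stated
in this mail: bit 0 is taken to be the most significant bit of the mask.
mask_to_cpus() is an invented helper name.

```c
#include <stdint.h>

#define CPU_BITS 64

/*
 * Store the cpu numbers described by one mask into cpus[] and return
 * how many were found. Bit 0 is assumed to be the most significant bit.
 */
static int mask_to_cpus(uint16_t origin, uint64_t mask, unsigned int *cpus)
{
	int n = 0;

	for (unsigned int bit = 0; bit < CPU_BITS; bit++) {
		if (mask & (UINT64_C(1) << (CPU_BITS - 1 - bit)))
			cpus[n++] = origin + bit;
	}
	return n;
}
```

With origin 64 and the two leftmost bits set, the result is cpus 64 and
65, so the mask itself never has to grow beyond 64 bits.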

> > > +	end = (union tl_entry *)((unsigned long)info + info->length);
> > 
> > I'd suggest that you take a look at all the pointer arith games which are
> > being played in this code and see if it can be done better with a more
> > appropriate use of the C type system.  Before someone dies.
> 
> The only thing that I can see that we could do is to get rid of the
> unions and do the pointer arithmetic with the tl_cpu / tl_container
> structs by hand. The data stored by ptf is structured in a way that
> makes it rather hard to write decent C code.

Please leave as is. I did have a few different implementations but they
all looked even worse than this one. If you read the hardware specs then
the current code should be rather easily understandable.
And since the whole documentation is publicly available in the meantime
I could even add some comments ;)


* Re: [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file.
  2008-03-12 17:31 ` [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file Martin Schwidefsky
  2008-03-12 23:03   ` Andrew Morton
@ 2008-03-21 12:29   ` Ingo Molnar
  1 sibling, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-03-21 12:29 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, Heiko Carstens


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> Needed so it can be called from outside of sched.c.

thanks, applied.

> +#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> +extern int arch_reinit_sched_domains(void);
> +#endif

i removed the #ifdefs around this.

	Ingo


* Re: [patch 04/10] sched: Add arch_update_cpu_topology hook.
  2008-03-12 17:31 ` [patch 04/10] sched: Add arch_update_cpu_topology hook Martin Schwidefsky
@ 2008-03-21 12:30   ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-03-21 12:30 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel, linux-s390, Heiko Carstens


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> Will be called each time the scheduling domains are rebuilt. Needed 
> for architectures that don't have a static cpu topology.

thanks, applied.

	Ingo


* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-12 23:43       ` Andrew Morton
  2008-03-13 17:49         ` Gerald Schaefer
@ 2008-03-28 14:05         ` Gerald Schaefer
  2008-03-28 14:06           ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Gerald Schaefer @ 2008-03-28 14:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: haveblue, schwidefsky, linux-kernel, linux-s390, David S. Miller,
	Tony Luck, Paul Mackerras, Thomas Gleixner, Paul Mundt

On Wed, 2008-03-12 at 16:43 -0700, Andrew Morton wrote:
> > There are already several ARCH_HAS_xxx defines which are being used in
> > include/linux/hugetlb.h. All of them are defined in
> > include/asm-<arch>/page.h for every architecture that needs them (with
> > the exception of powerpc, where it is include/asm-powerpc/page_64.h).
> 
> Yes, but that's fugly and it would be better to put in place the
> infrastructure for cleaning it up, rather than worsening it.
> 
> So...
> 
> Put this:
> 
> +#define huge_pte_none(pte)			pte_none(pte)
> +#define huge_pte_wrprotect(pte)			pte_wrprotect(pte)
> +#define huge_ptep_set_wrprotect(mm, addr, ptep)	\
> +	ptep_set_wrprotect(mm, addr, ptep)
> +#define huge_ptep_set_access_flags(vma, addr, ptep, pte, dirty)	\
> +	ptep_set_access_flags(vma, addr, ptep, pte, dirty)
> +#define huge_ptep_get(ptep)			(*ptep)
> 
> into include/asm-generic/hugetlb.h
> 
> then for each architecture except s390 add an include/asm-foo/hugetlb.h
> which does
> 
> #include <asm-generic/hugetlb.h>
> 
> then in include/linux/hugetlb.h add
> 
> #include <asm/hugetlb.h>
> 
> and then in include/asm-s390/hugetlb.h, add your s390-specific versions of
> huge_pte_none() and friends.
> 
> later, someone can hopefully use this new infrastructure to rid us of
> ARCH_HAS_HUGEPAGE_ONLY_RANGE, ARCH_HAS_HUGETLB_FREE_PGD_RANGE,
> ARCH_HAS_PREPARE_HUGEPAGE_RANGE, ARCH_HAS_SETCLEAR_HUGE_PTE and
> ARCH_HAS_HUGETLB_PREFAULT_HOOK.

This patch moves all architecture functions for hugetlb to architecture
header files (include/asm-foo/hugetlb.h). It also removes (!)
ARCH_HAS_HUGEPAGE_ONLY_RANGE, ARCH_HAS_HUGETLB_FREE_PGD_RANGE,
ARCH_HAS_PREPARE_HUGEPAGE_RANGE, ARCH_HAS_SETCLEAR_HUGE_PTE and
ARCH_HAS_HUGETLB_PREFAULT_HOOK.

Cross-Compile tests on the affected architectures (and one unaffected)
worked fine, but I had no cross-compiler for sh.

If this patch is accepted, we will resend the s390 large page patches
so that they will use/extend this new infrastructure.

--
Gerald Schaefer

---
 include/asm-ia64/hugetlb.h    |   21 +++++++++++++++++++
 include/asm-ia64/page.h       |    6 -----
 include/asm-powerpc/hugetlb.h |   35 +++++++++++++++++++++++++++++++
 include/asm-powerpc/page_64.h |    7 ------
 include/asm-sh/hugetlb.h      |   28 +++++++++++++++++++++++++
 include/asm-sparc64/hugetlb.h |   30 +++++++++++++++++++++++++++
 include/asm-sparc64/page.h    |    2 -
 include/asm-x86/hugetlb.h     |   28 +++++++++++++++++++++++++
 include/linux/hugetlb.h       |   46 ------------------------------------------
 9 files changed, 143 insertions(+), 60 deletions(-)

Index: linux-2.6.25-rc7/include/asm-ia64/hugetlb.h
===================================================================
--- /dev/null
+++ linux-2.6.25-rc7/include/asm-ia64/hugetlb.h
@@ -0,0 +1,21 @@
+#ifndef _ASM_IA64_HUGETLB_H
+#define _ASM_IA64_HUGETLB_H
+
+#include <asm/page.h>
+
+
+#define is_hugepage_only_range(mm, addr, len)		\
+	(REGION_NUMBER(addr) == RGN_HPAGE ||	\
+	 REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE)
+
+void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+			    unsigned long end, unsigned long floor,
+			    unsigned long ceiling);
+int prepare_hugepage_range(unsigned long addr, unsigned long len);
+
+#define set_huge_pte_at(mm, addr, ptep, pte)	set_pte_at(mm, addr, ptep, pte)
+#define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep)
+
+#define hugetlb_prefault_arch_hook(mm)		do { } while (0)
+
+#endif /* _ASM_IA64_HUGETLB_H */
Index: linux-2.6.25-rc7/include/asm-sh/hugetlb.h
===================================================================
--- /dev/null
+++ linux-2.6.25-rc7/include/asm-sh/hugetlb.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_SH_HUGETLB_H
+#define _ASM_SH_HUGETLB_H
+
+#include <asm/page.h>
+
+
+#define is_hugepage_only_range(mm, addr, len)	0
+#define hugetlb_free_pgd_range			free_pgd_range
+
+/*
+ * If the arch doesn't supply something else, assume that hugepage
+ * size aligned regions are ok without further preparation.
+ */
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
+#define set_huge_pte_at(mm, addr, ptep, pte)	set_pte_at(mm, addr, ptep, pte)
+#define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep)
+
+#define hugetlb_prefault_arch_hook(mm)		do { } while (0)
+
+#endif /* _ASM_SH_HUGETLB_H */
Index: linux-2.6.25-rc7/include/asm-sparc64/hugetlb.h
===================================================================
--- /dev/null
+++ linux-2.6.25-rc7/include/asm-sparc64/hugetlb.h
@@ -0,0 +1,30 @@
+#ifndef _ASM_SPARC64_HUGETLB_H
+#define _ASM_SPARC64_HUGETLB_H
+
+#include <asm/page.h>
+
+
+#define is_hugepage_only_range(mm, addr, len)	0
+#define hugetlb_free_pgd_range			free_pgd_range
+
+/*
+ * If the arch doesn't supply something else, assume that hugepage
+ * size aligned regions are ok without further preparation.
+ */
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+		     pte_t *ptep, pte_t pte);
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep);
+
+void hugetlb_prefault_arch_hook(struct mm_struct *mm);
+
+#endif /* _ASM_SPARC64_HUGETLB_H */
Index: linux-2.6.25-rc7/include/asm-x86/hugetlb.h
===================================================================
--- /dev/null
+++ linux-2.6.25-rc7/include/asm-x86/hugetlb.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_X86_HUGETLB_H
+#define _ASM_X86_HUGETLB_H
+
+#include <asm/page.h>
+
+
+#define is_hugepage_only_range(mm, addr, len)	0
+#define hugetlb_free_pgd_range			free_pgd_range
+
+/*
+ * If the arch doesn't supply something else, assume that hugepage
+ * size aligned regions are ok without further preparation.
+ */
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
+#define set_huge_pte_at(mm, addr, ptep, pte)	set_pte_at(mm, addr, ptep, pte)
+#define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep)
+
+#define hugetlb_prefault_arch_hook(mm)		do { } while (0)
+
+#endif /* _ASM_X86_HUGETLB_H */
Index: linux-2.6.25-rc7/include/linux/hugetlb.h
===================================================================
--- linux-2.6.25-rc7.orig/include/linux/hugetlb.h
+++ linux-2.6.25-rc7/include/linux/hugetlb.h
@@ -8,6 +8,7 @@
 #include <linux/mempolicy.h>
 #include <linux/shm.h>
 #include <asm/tlbflush.h>
+#include <asm/hugetlb.h>
 
 struct ctl_table;
 
@@ -51,51 +52,6 @@ int pmd_huge(pmd_t pmd);
 void hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot);
 
-#ifndef ARCH_HAS_HUGEPAGE_ONLY_RANGE
-#define is_hugepage_only_range(mm, addr, len)	0
-#endif
-
-#ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE
-#define hugetlb_free_pgd_range	free_pgd_range
-#else
-void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
-			    unsigned long end, unsigned long floor,
-			    unsigned long ceiling);
-#endif
-
-#ifndef ARCH_HAS_PREPARE_HUGEPAGE_RANGE
-/*
- * If the arch doesn't supply something else, assume that hugepage
- * size aligned regions are ok without further preparation.
- */
-static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
-{
-	if (len & ~HPAGE_MASK)
-		return -EINVAL;
-	if (addr & ~HPAGE_MASK)
-		return -EINVAL;
-	return 0;
-}
-#else
-int prepare_hugepage_range(unsigned long addr, unsigned long len);
-#endif
-
-#ifndef ARCH_HAS_SETCLEAR_HUGE_PTE
-#define set_huge_pte_at(mm, addr, ptep, pte)	set_pte_at(mm, addr, ptep, pte)
-#define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep)
-#else
-void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, pte_t pte);
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep);
-#endif
-
-#ifndef ARCH_HAS_HUGETLB_PREFAULT_HOOK
-#define hugetlb_prefault_arch_hook(mm)		do { } while (0)
-#else
-void hugetlb_prefault_arch_hook(struct mm_struct *mm);
-#endif
-
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
Index: linux-2.6.25-rc7/include/asm-powerpc/hugetlb.h
===================================================================
--- /dev/null
+++ linux-2.6.25-rc7/include/asm-powerpc/hugetlb.h
@@ -0,0 +1,35 @@
+#ifndef _ASM_POWERPC_HUGETLB_H
+#define _ASM_POWERPC_HUGETLB_H
+
+#include <asm/page.h>
+
+
+extern int is_hugepage_only_range(struct mm_struct *m,
+				  unsigned long addr,
+				  unsigned long len);
+
+void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+			    unsigned long end, unsigned long floor,
+			    unsigned long ceiling);
+
+/*
+ * If the arch doesn't supply something else, assume that hugepage
+ * size aligned regions are ok without further preparation.
+ */
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+{
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
+		return -EINVAL;
+	return 0;
+}
+
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+		     pte_t *ptep, pte_t pte);
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep);
+
+#define hugetlb_prefault_arch_hook(mm)		do { } while (0)
+
+#endif /* _ASM_POWERPC_HUGETLB_H */
Index: linux-2.6.25-rc7/include/asm-ia64/page.h
===================================================================
--- linux-2.6.25-rc7.orig/include/asm-ia64/page.h
+++ linux-2.6.25-rc7/include/asm-ia64/page.h
@@ -54,9 +54,6 @@
 # define HPAGE_MASK		(~(HPAGE_SIZE - 1))
 
 # define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
-# define ARCH_HAS_HUGEPAGE_ONLY_RANGE
-# define ARCH_HAS_PREPARE_HUGEPAGE_RANGE
-# define ARCH_HAS_HUGETLB_FREE_PGD_RANGE
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef __ASSEMBLY__
@@ -153,9 +150,6 @@ typedef union ia64_va {
 # define htlbpage_to_page(x)	(((unsigned long) REGION_NUMBER(x) << 61)			\
 				 | (REGION_OFFSET(x) >> (HPAGE_SHIFT-PAGE_SHIFT)))
 # define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
-# define is_hugepage_only_range(mm, addr, len)		\
-	 (REGION_NUMBER(addr) == RGN_HPAGE ||	\
-	  REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE)
 extern unsigned int hpage_shift;
 #endif
 
Index: linux-2.6.25-rc7/include/asm-powerpc/page_64.h
===================================================================
--- linux-2.6.25-rc7.orig/include/asm-powerpc/page_64.h
+++ linux-2.6.25-rc7/include/asm-powerpc/page_64.h
@@ -128,11 +128,6 @@ extern void slice_init_context(struct mm
 extern void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
 #define slice_mm_new_context(mm)	((mm)->context.id == 0)
 
-#define ARCH_HAS_HUGEPAGE_ONLY_RANGE
-extern int is_hugepage_only_range(struct mm_struct *m,
-				  unsigned long addr,
-				  unsigned long len);
-
 #endif /* __ASSEMBLY__ */
 #else
 #define slice_init()
@@ -146,8 +141,6 @@ do {						\
 
 #ifdef CONFIG_HUGETLB_PAGE
 
-#define ARCH_HAS_HUGETLB_FREE_PGD_RANGE
-#define ARCH_HAS_SETCLEAR_HUGE_PTE
 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 
 #endif /* !CONFIG_HUGETLB_PAGE */
Index: linux-2.6.25-rc7/include/asm-sparc64/page.h
===================================================================
--- linux-2.6.25-rc7.orig/include/asm-sparc64/page.h
+++ linux-2.6.25-rc7/include/asm-sparc64/page.h
@@ -39,8 +39,6 @@
 #define HPAGE_SIZE		(_AC(1,UL) << HPAGE_SHIFT)
 #define HPAGE_MASK		(~(HPAGE_SIZE - 1UL))
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
-#define ARCH_HAS_SETCLEAR_HUGE_PTE
-#define ARCH_HAS_HUGETLB_PREFAULT_HOOK
 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 #endif
 




* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-28 14:05         ` Gerald Schaefer
@ 2008-03-28 14:06           ` Ingo Molnar
  2008-03-28 14:33             ` Gerald Schaefer
  2008-03-28 15:53             ` Martin Schwidefsky
  0 siblings, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-03-28 14:06 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Andrew Morton, haveblue, schwidefsky, linux-kernel, linux-s390,
	David S. Miller, Tony Luck, Paul Mackerras, Thomas Gleixner,
	Paul Mundt


* Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

>  include/asm-sh/hugetlb.h      |   28 +++++++++++++++++++++++++
>  include/asm-sparc64/hugetlb.h |   30 +++++++++++++++++++++++++++
>  include/asm-x86/hugetlb.h     |   28 +++++++++++++++++++++++++

these seem largely duplicated - shouldn't there be an 
asm-generic/hugetlb.h instead, which asm/hugetlb.h could include to get 
default behavior? It would probably reduce the linecount of your patch 
as well.

	Ingo


* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-28 14:06           ` Ingo Molnar
@ 2008-03-28 14:33             ` Gerald Schaefer
  2008-03-28 15:53             ` Martin Schwidefsky
  1 sibling, 0 replies; 31+ messages in thread
From: Gerald Schaefer @ 2008-03-28 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, haveblue, schwidefsky, linux-kernel, linux-s390,
	David S. Miller, Tony Luck, Paul Mackerras, Thomas Gleixner,
	Paul Mundt

On Fri, 2008-03-28 at 15:06 +0100, Ingo Molnar wrote:
> * Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> 
> >  include/asm-sh/hugetlb.h      |   28 +++++++++++++++++++++++++
> >  include/asm-sparc64/hugetlb.h |   30 +++++++++++++++++++++++++++
> >  include/asm-x86/hugetlb.h     |   28 +++++++++++++++++++++++++
> 
> these seem largely duplicated - shouldn't there be an 
> asm-generic/hugetlb.h instead, which asm/hugetlb.h could include to get 
> default behavior? It would probably reduce the linecount of your patch 
> as well.

Right, asm-generic was also suggested by Andrew as a first step, before
getting rid of the ARCH_HAS_xxx stuff, and it would also reduce the line
count. But it would make things complicated if an architecture later
wants to take some of the generic definitions out and define them on its
own: we would then need to touch all architecture headers again.

So I thought I'd rather not introduce the asm-generic header at all but
let all architectures have their own headers, risking code duplication
but hopefully making future updates easier.

--
Gerald Schaefer



* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-28 14:06           ` Ingo Molnar
  2008-03-28 14:33             ` Gerald Schaefer
@ 2008-03-28 15:53             ` Martin Schwidefsky
  2008-03-28 16:03               ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Schwidefsky @ 2008-03-28 15:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerald Schaefer, Andrew Morton, haveblue, linux-kernel,
	linux-s390, David S. Miller, Tony Luck, Paul Mackerras,
	Thomas Gleixner, Paul Mundt

On Fri, 2008-03-28 at 15:06 +0100, Ingo Molnar wrote:
> * Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> 
> >  include/asm-sh/hugetlb.h      |   28 +++++++++++++++++++++++++
> >  include/asm-sparc64/hugetlb.h |   30 +++++++++++++++++++++++++++
> >  include/asm-x86/hugetlb.h     |   28 +++++++++++++++++++++++++
> 
> these seem largely duplicated - shouldn't there be an 
> asm-generic/hugetlb.h instead, which asm/hugetlb.h could include to get 
> default behavior? It would probably reduce the linecount of your patch 
> as well.

Well, the hugetlbfs primitives are architecture specific, aren't they?
Just like the other page table manipulation functions. I find the usual
method of using asm-generic/<xxx> plus a lot of defines and #ifdefs to
pick up the correct definition from a generic header file rather hard to
read. In the end each arch that wants to use hugetlbfs has to define
each of the hugetlb primitives. Most of them are rather simple, e.g. the
x86 set_huge_pte_at is just a set_pte_at: one line to define the
primitive. Now we could have an #ifdef block around the default
definition that maps set_huge_pte_at to set_pte_at in asm-generic and an
ARCH_HAS_xx override for architectures that need to do something more
complicated. But that is exactly where we started.
I think the best way to get rid of the ARCH_HAS_xxx fugliness is to let
each architecture define its primitives, even if it looks like code
duplication.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




* Re: [patch 09/10] Hugetlb common code update for System z.
  2008-03-28 15:53             ` Martin Schwidefsky
@ 2008-03-28 16:03               ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-03-28 16:03 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Gerald Schaefer, Andrew Morton, haveblue, linux-kernel,
	linux-s390, David S. Miller, Tony Luck, Paul Mackerras,
	Thomas Gleixner, Paul Mundt


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> > >  include/asm-sh/hugetlb.h      |   28 +++++++++++++++++++++++++
> > >  include/asm-sparc64/hugetlb.h |   30 +++++++++++++++++++++++++++
> > >  include/asm-x86/hugetlb.h     |   28 +++++++++++++++++++++++++
> > 
> > these seem largely duplicated - shouldn't there be an 
> > asm-generic/hugetlb.h instead, which asm/hugetlb.h could include to 
> > get default behavior? It would probably reduce the linecount of your 
> > patch as well.
> 
> Well the hugetlbfs primitives are architecture specific, aren't they? 
> Just like the other page table manipulation functions. I find the 
> usual method to use asm-generic/<xxx> and a lot of defines and #ifdefs 
> to pick up the correct definition from a generic header file rather 
> hard to read. In the end each arch that wants to use hugetlbfs has to 
> define each of the hugetlb primitives. Most of them are rather simple, 
> e.g. the x86 set_huge_pte_at is just a set_pte_at. One line to define 
> the primitive. Now we could have an #ifdef block around the default 
> definition that maps set_huge_pte_at to set_pte_at in asm-generic and 
> an ARCH_HAS_xx override for architectures that need to do something 
> more complicated. Somehow that was where we started .. I think the 
> best way to get rid of the ARCH_HAS_xxx fugliness is to let each 
> architecture define their primitives, even if it looks like code 
> duplication.

sorry, i misread your patch - it indeed looks cleaner with your patch 
applied.

	Ingo


end of thread (~2008-03-28 16:04 UTC)

Thread overview: 31+ messages
2008-03-12 17:31 [patch 00/10] System z10 patches Martin Schwidefsky
2008-03-12 17:31 ` [patch 01/10] Add new fields for System z10 to /proc/sysinfo Martin Schwidefsky
2008-03-12 17:57   ` Josef 'Jeff' Sipek
2008-03-13 10:02     ` Martin Schwidefsky
2008-03-12 17:31 ` [patch 02/10] Export stfle Martin Schwidefsky
2008-03-12 17:31 ` [patch 03/10] sched: add exported arch_reinit_sched_domains() to header file Martin Schwidefsky
2008-03-12 23:03   ` Andrew Morton
2008-03-13  9:48     ` Martin Schwidefsky
2008-03-21 12:29   ` Ingo Molnar
2008-03-12 17:31 ` [patch 04/10] sched: Add arch_update_cpu_topology hook Martin Schwidefsky
2008-03-21 12:30   ` Ingo Molnar
2008-03-12 17:32 ` [patch 05/10] cpu topology: convert siblings_show macro to accept non-lvalues Martin Schwidefsky
2008-03-12 17:32 ` [patch 06/10] cpu topology support for s390 Martin Schwidefsky
2008-03-12 23:11   ` Andrew Morton
2008-03-13 12:28     ` Martin Schwidefsky
2008-03-13 22:40       ` Heiko Carstens
2008-03-12 17:32 ` [patch 07/10] Vertical cpu management Martin Schwidefsky
2008-03-12 17:32 ` [patch 08/10] Add missing TLB flush to hugetlb_cow() Martin Schwidefsky
2008-03-12 17:32 ` [patch 09/10] Hugetlb common code update for System z Martin Schwidefsky
2008-03-12 17:51   ` Dave Hansen
2008-03-12 23:18     ` Gerald Schaefer
2008-03-12 23:43       ` Andrew Morton
2008-03-13 17:49         ` Gerald Schaefer
2008-03-28 14:05         ` Gerald Schaefer
2008-03-28 14:06           ` Ingo Molnar
2008-03-28 14:33             ` Gerald Schaefer
2008-03-28 15:53             ` Martin Schwidefsky
2008-03-28 16:03               ` Ingo Molnar
2008-03-12 17:32 ` [patch 10/10] System z large page support Martin Schwidefsky
2008-03-12 17:52   ` Dave Hansen
2008-03-12 22:14     ` Gerald Schaefer
