LKML Archive on lore.kernel.org
* [patch 0/8] Immediate Values
@ 2007-08-27 15:59 Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 1/8] Immediate Values - Global Modules List and Module Mutex Mathieu Desnoyers
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel

Hi Andrew,

Here are the updated immediate values for 2.6.23-rc3-mm1. They depend on "Text
Edit Lock".

The example user is scheduler profiling, but I plan to use them mainly for my
Linux kernel markers. Immediate values should prove useful in situations where
features such as profiling are rarely used but should nevertheless be compiled
into a distribution kernel because of the information they provide.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 1/8] Immediate Values - Global Modules List and Module Mutex
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-global-modules-list-and-mutex.patch --]
[-- Type: text/plain, Size: 2184 bytes --]

Remove "static" from module_mutex and the modules list so they can be used by
other builtin objects in the kernel. Otherwise, all code depending on the
module list would have to be put in kernel/module.c. Since the immediate values
depend on the module list but are logically distinct, it makes sense to
implement them in their own file.

The alternative would be to disable preemption in the code paths that need
such synchronization, so that they are protected against module unload by
stop_machine(), but not being able to sleep while holding that protection is
limiting.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/module.h |    4 ++++
 kernel/module.c        |    4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-08-07 11:03:56.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-08-07 11:40:22.000000000 -0400
@@ -64,8 +64,8 @@ extern int module_sysfs_initialized;
 
 /* List of modules, protected by module_mutex or preempt_disable
  * (add/delete uses stop_machine). */
-static DEFINE_MUTEX(module_mutex);
-static LIST_HEAD(modules);
+DEFINE_MUTEX(module_mutex);
+LIST_HEAD(modules);
 static DECLARE_MUTEX(notify_mutex);
 
 static BLOCKING_NOTIFIER_HEAD(module_notify_list);
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-08-07 11:03:48.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-08-07 11:39:55.000000000 -0400
@@ -60,6 +60,10 @@ struct module_kobject
 	struct kobject *drivers_dir;
 };
 
+/* Protects the list of modules. */
+extern struct mutex module_mutex;
+extern struct list_head modules;
+
 /* These are either module local, or the kernel's dummy ones. */
 extern int init_module(void);
 extern void cleanup_module(void);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 2/8] Immediate Values - Architecture Independent Code
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 1/8] Immediate Values - Global Modules List and Module Mutex Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 3/8] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 15436 bytes --]

Immediate values are used as read-mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. This
saves precious cache lines that would otherwise have to be used by these
variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per-architecture immediate_read() implementations,
which use a load immediate to avoid a data cache access. When the immediate
values functionality is disabled in the kernel, it falls back to global
variables.

This patch adds a new rodata section "__immediate" to hold the pointers to the
enable value. The immediate values activation functions sit in
kernel/immediate.c.

An immediate value refers to the memory address of a previously declared
integer. This integer holds the state of the associated immediate values and
must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. This is the case if the variable it refers to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.
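
To make the API concrete, here is a minimal user-space sketch of the generic
fallback path (CONFIG_IMMEDIATE disabled). The macros are copied from the
generic half of linux/immediate.h in this patch; demo_profiling, demo_hits,
demo_hot_path() and demo_run() are hypothetical names used only for
illustration and appear nowhere in the series.

```c
#include <assert.h>

/* Copied from the generic (!CONFIG_IMMEDIATE) half of linux/immediate.h. */
#define DEFINE_IMMEDIATE_TYPE(type, name) \
	typedef struct { type value; } name
#define IMMEDIATE_INIT(i)		{ (i) }
#define _immediate_read(var)		(var)->value
#define immediate_read(var)		_immediate_read(var)
#define immediate_set(var, i)		((var)->value = (i))
#define immediate_if(var)		if (immediate_read(var))

DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);

/* Hypothetical rarely-enabled feature flag, statically initialized to off. */
static immediate_char_t demo_profiling = IMMEDIATE_INIT(0);

/* Hypothetical counter standing in for the work done when the flag is on. */
static unsigned long demo_hits;

/* Hot path: the flag is only ever read through the immediate API. */
static void demo_hot_path(void)
{
	immediate_if(&demo_profiling)
		demo_hits++;
}

/* Set the flag, run the hot path 'calls' times, return the hit count. */
static unsigned long demo_run(int enable, int calls)
{
	int i;

	assert(calls >= 0);
	demo_hits = 0;
	immediate_set(&demo_profiling, enable);
	for (i = 0; i < calls; i++)
		demo_hot_path();
	return demo_hits;
}
```

With an optimized per-architecture implementation the call sites would stay
the same; only the expansion of immediate_read() and immediate_set() changes.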

* Why should this be merged *

It improves performance on heavy memory I/O workloads.

An interesting result shows the potential of this infrastructure: the slowdown
a simple system call such as getppid() suffers when it is used under heavy
user-space cache thrashing.

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of about 10 times just to get the kernel variables
from memory. Another test on the same architecture (Intel P4) measured the
memory latency to be 559 cycles. Therefore, each cache line removed from the hot
path would improve the syscall time by about 3.5% in these conditions.
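
The arithmetic behind these two figures can be checked with a few lines of C.
This is only a sketch: the three cycle counts are the ones quoted above, while
the function names are made up for the illustration. The per-cache-line saving
is taken as one memory latency divided by the syscall time under memory
pressure, which comes out near 3.6%, in line with the roughly 3.5% quoted.

```c
#include <assert.h>

/* Cycle counts quoted in the text above (Intel P4). */
#define CYCLES_NO_PRESSURE	1573.0
#define CYCLES_WITH_PRESSURE	15589.0
#define CYCLES_MEM_LATENCY	559.0

/* Ratio between the slow and the fast getppid() timings. */
static double demo_slowdown(double slow, double fast)
{
	assert(fast > 0.0);
	return slow / fast;
}

/* Share of the syscall time, in percent, spent on one cache line miss. */
static double demo_percent_per_line(double latency, double total)
{
	assert(total > 0.0);
	return 100.0 * latency / total;
}
```

Calling demo_slowdown(CYCLES_WITH_PRESSURE, CYCLES_NO_PRESSURE) gives a factor
just under 10, and demo_percent_per_line(CYCLES_MEM_LATENCY,
CYCLES_WITH_PRESSURE) gives the per-line share.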

Changelog:

- section __immediate is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  119 +++++++++++++++++++++++++++++++++++
 include/linux/module.h            |    6 +
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  128 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   12 +++
 7 files changed, 275 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-08-27 11:49:12.000000000 -0400
@@ -0,0 +1,119 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+#include <asm/immediate.h>
+#else
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+struct module;
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ */
+#define immediate_read(var)		_immediate_read(var)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i)		((var)->value = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i)		immediate_set(var, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i)	immediate_set(var, i)
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ */
+#define immediate_if(var)		if (immediate_read(var))
+
+/*
+ * Internal update functions.
+ */
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void immediate_update_early(void) { }
+#endif
+
+/**
+ * DEFINE_IMMEDIATE_TYPE - Define an immediate type
+ * @type: type that the immediate should hold
+ * @name: name of the immediate type
+ *
+ * Define new immediate types. Naming scheme is immediate_*_t.
+ * Always access these types with the provided functions.
+ */
+#define DEFINE_IMMEDIATE_TYPE(type, name) \
+	typedef struct { type value; } name
+
+/*
+ * Standard pre-defined immediate types.
+ */
+DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);
+DEFINE_IMMEDIATE_TYPE(short, immediate_short_t);
+DEFINE_IMMEDIATE_TYPE(int, immediate_int_t);
+DEFINE_IMMEDIATE_TYPE(long, immediate_long_t);
+DEFINE_IMMEDIATE_TYPE(void*, immediate_void_ptr_t);
+
+/**
+ * IMMEDIATE_INIT - Static initialization of an immediate variable
+ * @i: required value
+ *
+ * Use this macro to initialize an immediate value to an initial static
+ * value.
+ */
+#define IMMEDIATE_INIT(i)		{ (i) }
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @var: pointer of type immediate_*_t
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(var)		(var)->value
+
+/*
+ * _immediate_if - if () statement depending on immediate value (memory load)
+ * @var: pointer of type immediate_*_t
+ *
+ * Force the use of a normal if () statement depending on an immediate value.
+ */
+#define _immediate_if(var)		if (_immediate_read(var))
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-08-27 11:16:09.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-08-27 11:49:12.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-08-27 11:49:12.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-08-27 11:49:12.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -374,6 +375,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-08-27 11:49:12.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-08-27 11:49:12.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1718,6 +1719,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex = 0;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1814,6 +1816,9 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+#ifdef CONFIG_IMMEDIATE
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
+#endif
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1964,6 +1969,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2030,6 +2040,8 @@ static struct module *load_module(void _
 	 }
 #endif
 
+	module_immediate_setup(mod);
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-08-27 11:16:09.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-08-27 11:49:12.000000000 -0400
@@ -59,6 +59,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-08-27 11:49:12.000000000 -0400
@@ -0,0 +1,128 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to an enabled state: set the enable bit.
+ */
+static inline void _immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void*)iter->var,
+					    (void*)iter->immediate, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+
+#ifdef CONFIG_MODULES
+/**
+ * module_immediate_setup - Update immediate values in a module
+ * @mod: pointer to the struct module
+ *
+ * Setup the immediate according to the variable upon which it depends.  Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules' immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+	_immediate_update_range(mod->immediate,
+				mod->immediate+mod->num_immediate);
+}
+
+/*
+ * immediate mutex nests inside the modules mutex.
+ */
+static inline void immediate_update_modules(int lock)
+{
+	struct module *mod;
+
+	if (lock)
+		mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		_immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+	if (lock)
+		mutex_unlock(&module_mutex);
+}
+#else
+static inline void immediate_update_modules(int lock) { }
+#endif
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should a module_mutex be taken ?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void immediate_update(int lock)
+{
+	/* Core kernel immediates */
+	_immediate_update_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	immediate_update_modules(lock);
+}
+EXPORT_SYMBOL_GPL(immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-08-27 11:16:09.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-08-27 11:49:12.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -529,6 +530,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 3/8] Immediate Values - Kconfig menu in EMBEDDED
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 1/8] Immediate Values - Global Modules List and Module Mutex Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 4/8] Immediate Values - Move Kprobes i386 restore_interrupt to kdebug.h Mathieu Desnoyers
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Mathieu Desnoyers, Adrian Bunk, Andi Kleen, Alexey Dobriyan,
	Christoph Hellwig

[-- Attachment #1: immediate-values-kconfig-embedded.patch --]
[-- Type: text/plain, Size: 2258 bytes --]

Immediate values provide a way to use dynamic code patching to update variables
sitting within the instruction stream. They save cache lines normally used by
static read-mostly variables. Enable them by default, but let users disable them
through the EMBEDDED menu with the "Disable immediate values" submenu entry.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
---
 init/Kconfig |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2007-08-07 16:36:20.000000000 -0400
+++ linux-2.6-lttng/init/Kconfig	2007-08-07 16:44:26.000000000 -0400
@@ -411,6 +411,17 @@ config CC_OPTIMIZE_FOR_SIZE
 config SYSCTL
 	bool
 
+config IMMEDIATE
+	default y if !DISABLE_IMMEDIATE
+	depends on X86_32 || PPC || PPC64
+	bool
+	help
+	  Immediate values are used as read mostly variables that are rarely
+	  updated. They use code patching to modify the values inscribed in the
+	  instruction stream. It provides a way to save precious cache lines
+	  that would otherwise have to be used by these variables. Can be
+	  disabled through the EMBEDDED menu.
+
 menuconfig EMBEDDED
 	bool "Configure standard kernel features (for small systems)"
 	help
@@ -649,6 +660,16 @@ config PROC_KPAGEMAP
           information on page-level memory usage. Disabling this interface
           will reduce the size of the kernel by around 600 bytes.
 
+config DISABLE_IMMEDIATE
+	default y if EMBEDDED
+	bool "Disable immediate values" if EMBEDDED
+	depends on X86_32 || PPC || PPC64
+	help
+	  Disable code patching based immediate values for embedded systems.
+	  Immediate values consume slightly more memory and require modifying
+	  the instruction stream each time a variable is updated. They should
+	  really be disabled for embedded systems with read-only text.
+
 endmenu		# General setup
 
 config RT_MUTEXES

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 4/8] Immediate Values - Move Kprobes i386 restore_interrupt to kdebug.h
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2007-08-27 15:59 ` [patch 3/8] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 5/8] Immediate Values - i386 Optimization Mathieu Desnoyers
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Mathieu Desnoyers, Christoph Hellwig, prasanna, ananth,
	anil.s.keshavamurthy, davem

[-- Attachment #1: immediate-values-move-kprobes-i386-restore-interrupt-to-kdebug-h.patch --]
[-- Type: text/plain, Size: 2294 bytes --]

Since the breakpoint handler is useful to both kprobes and immediate values, it
makes sense to make the required restore_interrupts() available through
asm-i386/kdebug.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Hellwig <hch@infradead.org>
CC: prasanna@in.ibm.com
CC: ananth@in.ibm.com
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
---
 include/asm-i386/kdebug.h  |   12 ++++++++++++
 include/asm-i386/kprobes.h |    9 ---------
 2 files changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6-lttng/include/asm-i386/kdebug.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/kdebug.h	2007-08-09 15:45:55.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/kdebug.h	2007-08-09 16:23:50.000000000 -0400
@@ -6,6 +6,9 @@
  * from x86_64 architecture.
  */
 
+#include <asm/ptrace.h>
+#include <asm/system.h>
+
 struct pt_regs;
 
 /* Grossly misnamed. */
@@ -25,4 +28,13 @@ enum die_val {
 	DIE_PAGE_FAULT_NO_CONTEXT,
 };
 
+/* trap3/1 are intr gates for kprobes.  So, restore the status of IF,
+ * if necessary, before executing the original int3/1 (trap) handler.
+ */
+static inline void restore_interrupts(struct pt_regs *regs)
+{
+	if (regs->eflags & IF_MASK)
+		local_irq_enable();
+}
+
 #endif
Index: linux-2.6-lttng/include/asm-i386/kprobes.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/kprobes.h	2007-08-09 15:45:55.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/kprobes.h	2007-08-09 15:46:39.000000000 -0400
@@ -77,15 +77,6 @@ struct kprobe_ctlblk {
 	struct prev_kprobe prev_kprobe;
 };
 
-/* trap3/1 are intr gates for kprobes.  So, restore the status of IF,
- * if necessary, before executing the original int3/1 (trap) handler.
- */
-static inline void restore_interrupts(struct pt_regs *regs)
-{
-	if (regs->eflags & IF_MASK)
-		local_irq_enable();
-}
-
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 5/8] Immediate Values - i386 Optimization
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2007-08-27 15:59 ` [patch 4/8] Immediate Values - Move Kprobes i386 restore_interrupt to kdebug.h Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 6/8] Immediate Values - Powerpc Optimization Mathieu Desnoyers
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers, Christoph Hellwig

[-- Attachment #1: immediate-values-i386-optimization.patch --]
[-- Type: text/plain, Size: 17352 bytes --]

i386 optimization of the immediate values. It uses a movl whose immediate
operand is updated by code patching, so the value is loaded into the
destination register straight from the instruction stream.
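
As a rough illustration of what patching the operand means, the sketch below
rewrites the 4-byte immediate of a "movl $imm32, %eax" instruction held in an
ordinary byte array. It is a user-space model only: the bytes are never
executed, struct demo_immediate and the demo_* functions are hypothetical
stand-ins for struct __immediate and the arch update code, the choice of %eax
(opcode 0xB8) is an assumption since the compiler picks the real register, and
none of the cross-modifying code synchronization this patch is actually about
is modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified user-space stand-in for struct __immediate. */
struct demo_immediate {
	const void *var;	/* variable holding the value to inscribe */
	uint8_t *immediate;	/* operand location inside the instruction */
	size_t size;		/* operand size in bytes */
};

/* Copy the variable's current value over the instruction operand. */
static void demo_patch(const struct demo_immediate *imm)
{
	assert(imm->size == 1 || imm->size == 2 || imm->size == 4);
	memcpy(imm->immediate, imm->var, imm->size);
}

/* Patch a "movl $0, %eax" buffer and read the new operand back. */
static uint32_t demo_patch_movl(uint32_t new_value)
{
	uint8_t insn[5] = { 0xB8, 0, 0, 0, 0 };	/* opcode, then imm32 */
	struct demo_immediate imm = {
		&new_value,
		insn + 1,	/* mirrors the "(0f)+1" offset of the 4-byte case */
		sizeof(new_value),
	};
	uint32_t operand;

	demo_patch(&imm);
	assert(insn[0] == 0xB8);	/* the opcode byte is left untouched */
	memcpy(&operand, insn + 1, sizeof(operand));
	return operand;
}
```

The 2-byte case in the header uses an offset of 2 instead of 1 because of the
66H operand-size prefix that precedes the opcode.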

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: Andi Kleen <ak@muc.de>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
---
 arch/i386/kernel/Makefile    |    1 
 arch/i386/kernel/immediate.c |  296 +++++++++++++++++++++++++++++++++++++++++++
 arch/i386/kernel/traps.c     |    8 -
 include/asm-i386/immediate.h |  137 +++++++++++++++++++
 4 files changed, 438 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/include/asm-i386/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-08-27 11:49:18.000000000 -0400
@@ -0,0 +1,137 @@
+#ifndef _ASM_I386_IMMEDIATE_H
+#define _ASM_I386_IMMEDIATE_H
+
+/*
+ * Immediate values. i386 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+struct module;
+
+struct __immediate {
+	long var;		/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	long immediate;		/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	long size;		/* Type size. */
+};
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _immediate_read() instead.
+ * Makes sure the 2 and 4 bytes update will be atomic by aligning the immediate
+ * value. 2 bytes (short) uses a 66H prefix. If size is bigger than 4 bytes,
+ * fall back on a memory read.
+ */
+#define immediate_read(var)						\
+	({								\
+		__typeof__((var)->value) value;				\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm (	".section __immediate, \"a\", @progbits;\n\t" \
+					".long %1, (0f)+1, 1;\n\t"	\
+					".previous;\n\t"		\
+					"0:\n\t"			\
+					"mov %2,%0;\n\t"		\
+				: "=r" (value)				\
+				: "m" ((var)->value),			\
+				  "i" (0));				\
+			break;						\
+		case 2:							\
+			asm (	".section __immediate, \"a\", @progbits;\n\t" \
+					".long %1, (0f)+2, 2;\n\t"	\
+					".previous;\n\t"		\
+					"1:\n\t"			\
+					".align 2;\n\t"			\
+					"0:\n\t"			\
+					"mov %2,%0;\n\t"		\
+				: "=r" (value)				\
+				: "m" ((var)->value),			\
+				  "i" (0));				\
+			break;						\
+		case 4:							\
+			asm (	".section __immediate, \"a\", @progbits;\n\t" \
+					".long %1, (0f)+1, 4;\n\t"	\
+					".previous;\n\t"		\
+					"1:\n\t"			\
+					".org (1b)+(3-((1b)%%4)), 0x90;\n\t" \
+					"0:\n\t"			\
+					"mov %2,%0;\n\t"		\
+				: "=r" (value)				\
+				: "m" ((var)->value),			\
+				  "i" (0));				\
+			break;						\
+		default:value = (var)->value;				\
+			break;						\
+		};							\
+		value;							\
+	})
+
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i) \
+	(var)->value = (i); \
+	immediate_update(1);
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i) \
+	(var)->value = (i); \
+	immediate_update(0);
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i) \
+	(var)->value = (i); \
+	immediate_update_early();
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ * Do not use in __init and __exit functions. Use _immediate_if() instead.
+ * Branch depending on an immediate value. Could eventually be optimized further
+ * by improving gcc to give the ability to patch a jump instruction instead of
+ * the value it depends on.
+ */
+#define immediate_if(var)	if (unlikely(immediate_read(var)))
+
+/*
+ * Internal update functions.
+ */
+extern void immediate_update(int lock);
+extern void module_immediate_setup(struct module *mod);
+extern void immediate_update_early(void);
+extern int arch_immediate_update(const struct __immediate *immediate);
+extern void arch_immediate_update_early(const struct __immediate *immediate);
+
+#endif /* _ASM_I386_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-08-27 11:49:18.000000000 -0400
@@ -37,6 +37,7 @@ obj-$(CONFIG_MODULES)		+= module.o
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
 obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
+obj-$(CONFIG_IMMEDIATE)		+= immediate.o
 obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
 obj-$(CONFIG_KGDB)		+= kgdb.o kgdb-jmp.o
 obj-$(CONFIG_VM86)		+= vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-08-27 11:49:18.000000000 -0400
@@ -0,0 +1,296 @@
+/*
+ * Immediate Value - i386 architecture specific code.
+ *
+ * Rationale
+ *
+ * Required because of :
+ * - Erratum 49 fix for Intel PIII.
+ * - Still present on newer processors : Intel Core 2 Duo Processor for Intel
+ *   Centrino Duo Processor Technology Specification Update, AH33.
+ *   Unsynchronized Cross-Modifying Code Operations Can Cause Unexpected
+ *   Instruction Execution Results.
+ *
+ * Permits immediate value modification by XMC with correct serialization.
+ *
+ * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
+ * location that has preemption enabled because it involves no temporary or
+ * reused data structure.
+ *
+ * Quoting Richard J Moore, source of the information motivating this
+ * implementation which differs from the one proposed by Intel which is not
+ * suitable for kernel context (does not support NMI and would require disabling
+ * interrupts on every CPU for a long period) :
+ *
+ * "There is another issue to consider when looking into using probes other
+ * than int3:
+ *
+ * Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
+ * practice of modifying code on one processor where another has prefetched
+ * the unmodified version of the code. Intel states that unpredictable general
+ * protection faults may result if a synchronizing instruction (iret, int,
+ * int3, cpuid, etc ) is not executed on the second processor before it
+ * executes the pre-fetched out-of-date copy of the instruction.
+ *
+ * When we became aware of this I had a long discussion with Intel's
+ * microarchitecture guys. It turns out that the reason for this erratum
+ * (which incidentally Intel does not intend to fix) is because the trace
+ * cache - the stream of micro-ops resulting from instruction interpretation -
+ * cannot be guaranteed to be valid. Reading between the lines I assume this
+ * issue arises because of optimization done in the trace cache, where it is
+ * no longer possible to identify the original instruction boundaries. If the
+ * CPU discovers that the trace cache has been invalidated because of
+ * unsynchronized cross-modification then instruction execution will be
+ * aborted with a GPF. Further discussion with Intel revealed that replacing
+ * the first opcode byte with an int3 would not be subject to this erratum.
+ *
+ * So, is cmpxchg reliable? One has to guarantee more than mere atomicity."
+ *
+ * Overall design
+ *
+ * The algorithm proposed by Intel does not apply well in kernel context: it
+ * would imply disabling interrupts and looping on every CPU while modifying
+ * the code and would not support instrumentation of code called from interrupt
+ * sources that cannot be disabled.
+ *
+ * Therefore, we use a different algorithm to respect Intel's erratum (see the
+ * quoted discussion above). We make sure that no CPU sees an out-of-date copy
+ * of a pre-fetched instruction by 1 - using a breakpoint, which skips the
+ * instruction that is going to be modified, 2 - issuing an IPI to every CPU to
+ * execute a sync_core(), to make sure that even when the breakpoint is removed,
+ * no CPU could possibly still have the out-of-date copy of the instruction,
+ * 3 - modifying the now unused immediate operand bytes of the instruction, and
+ * then 4 - putting back the original 1st byte of the instruction.
+ *
+ * It has exactly the same intent as the algorithm proposed by Intel, but
+ * it has fewer side-effects, scales better and supports NMI, SMI and MCE.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/notifier.h>
+#include <linux/preempt.h>
+#include <linux/smp.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/kdebug.h>
+#include <linux/rcupdate.h>
+#include <linux/kprobes.h>
+
+#include <asm/cacheflush.h>
+
+#define BREAKPOINT_INSTRUCTION  0xcc
+#define BREAKPOINT_INS_LEN	1
+#define NOP_INSTRUCTION		0x90
+#define NR_NOPS			8
+
+static long target_after_int3;		/* EIP of the target after the int3 */
+static long bypass_eip;			/* EIP of the bypass. */
+static long bypass_after_int3;		/* EIP after the end-of-bypass int3 */
+static long after_immediate;		/*
+					 * EIP where to resume after the
+					 * single-stepping.
+					 */
+
+/*
+ * Size of the movl instruction (without the immediate value) in bytes.
+ * The 2 bytes load immediate has a 66H prefix, which makes the opcode 2 bytes
+ * wide.
+ */
+static inline size_t _immediate_get_insn_size(long size)
+{
+	switch (size) {
+		case 1: return 1;
+		case 2: return 2;
+		case 4: return 1;
+		default: BUG();
+	};
+}
+
+/*
+ * Internal bypass used during value update. The bypass is skipped by the
+ * function in which it is inserted.
+ * No need to be aligned because we exclude readers from the site during
+ * update.
+ * Layout is:
+ * nop nop nop nop nop nop nop nop int3
+ * The nops are the target replaced by the instruction to single-step.
+ */
+static inline void _immediate_bypass(long *bypassaddr, long *breaknextaddr)
+{
+		asm volatile (	"jmp 2f;\n\t"
+				"0:\n\t"
+				".space 8, 0x90;\n\t"
+				"1:\n\t"
+				"int3;\n\t"
+				"2:\n\t"
+				"movl $(0b),%0;\n\t"
+				"movl $((1b)+1),%1;\n\t"
+				: "=r" (*bypassaddr),
+				  "=r" (*breaknextaddr) : );
+}
+
+static void immediate_synchronize_core(void *info)
+{
+	sync_core();	/* use cpuid to stop speculative execution */
+}
+
+/*
+ * The eip value points right after the breakpoint instruction, in the second
+ * byte of the movl.
+ * Disable preemption in the bypass to make sure no thread will be preempted in
+ * it. We can then use synchronize_sched() to make sure every bypass user has
+ * finished.
+ */
+static int immediate_notifier(struct notifier_block *nb,
+	unsigned long val, void *data)
+{
+	enum die_val die_val = (enum die_val) val;
+	struct die_args *args = data;
+
+	if (!args->regs || user_mode_vm(args->regs))
+		return NOTIFY_DONE;
+
+	if (die_val == DIE_INT3) {
+		if (args->regs->eip == target_after_int3) {
+			preempt_disable();
+			args->regs->eip = bypass_eip;
+			return NOTIFY_STOP;
+		} else if (args->regs->eip == bypass_after_int3) {
+			args->regs->eip = after_immediate;
+			preempt_enable();
+			return NOTIFY_STOP;
+		}
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block immediate_notify = {
+	.notifier_call = immediate_notifier,
+	.priority = 0x7fffffff,	/* we need to be notified first */
+};
+
+
+/**
+ * arch_immediate_update - update one immediate value
+ * @immediate: pointer of type const struct __immediate to update
+ *
+ * Update one immediate value. Must be called with immediate_mutex held.
+ */
+__kprobes int arch_immediate_update(const struct __immediate *immediate)
+{
+	int ret;
+	size_t insn_size = _immediate_get_insn_size(immediate->size);
+	long insn = immediate->immediate - insn_size;
+
+#ifdef CONFIG_KPROBES
+	/*
+	 * Fail if a kprobe has been set on this instruction.
+	 * (TODO: we could eventually do better and modify all the (possibly
+	 * nested) kprobes for this site if kprobes had an API for this.)
+	 */
+	if (unlikely(*(unsigned char*)insn == BREAKPOINT_INSTRUCTION)) {
+		printk(KERN_WARNING "Immediate value in conflict with kprobe. "
+				    "Variable at %p, "
+				    "instruction at %p, size %lu\n",
+				    (void*)immediate->var,
+				    (void*)immediate->immediate, immediate->size);
+		return -EBUSY;
+	}
+#endif
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (immediate->size) {
+		case 1:	if (*(uint8_t*)immediate->immediate
+					== *(uint8_t*)immediate->var)
+				return 0;
+			break;
+		case 2:	if (*(uint16_t*)immediate->immediate
+					== *(uint16_t*)immediate->var)
+				return 0;
+			break;
+		case 4:	if (*(uint32_t*)immediate->immediate
+					== *(uint32_t*)immediate->var)
+				return 0;
+			break;
+		default:return -EINVAL;
+	}
+
+	_immediate_bypass(&bypass_eip, &bypass_after_int3);
+
+	after_immediate = immediate->immediate + immediate->size;
+
+	text_poke((void*)bypass_eip, (void*)insn, insn_size + immediate->size);
+	/*
+	 * Fill the rest with nops.
+	 */
+	text_set((void*)(bypass_eip + insn_size + immediate->size),
+		NOP_INSTRUCTION,
+		NR_NOPS - immediate->size - insn_size);
+
+	target_after_int3 = insn + BREAKPOINT_INS_LEN;
+	/* register_die_notifier has memory barriers */
+	register_die_notifier(&immediate_notify);
+	/* The breakpoint will single-step the bypass */
+	text_set((void*)insn, BREAKPOINT_INSTRUCTION, 1);
+	wmb();
+	/*
+	 * Execute serializing instruction on each CPU.
+	 * Acts as a memory barrier.
+	 */
+	ret = on_each_cpu(immediate_synchronize_core, NULL, 1, 1);
+	BUG_ON(ret != 0);
+
+	text_poke((void*)(insn + insn_size), (void*)immediate->var,
+			immediate->size);
+	wmb();
+	text_set((void*)insn, *(char*)bypass_eip, 1);
+	/*
+	 * Wait for all int3 handlers to end
+	 * (interrupts are disabled in int3).
+	 * This CPU is clearly not in an int3 handler,
+	 * because the int3 handler is not preemptible and
+	 * there cannot be any more int3 handlers called
+	 * for this site, because we placed the original
+	 * instruction back.
+	 * synchronize_sched has memory barriers.
+	 */
+	synchronize_sched();
+	unregister_die_notifier(&immediate_notify);
+	/* unregister_die_notifier has memory barriers */
+	return 0;
+}
+
+/**
+ * arch_immediate_update_early - update one immediate value at boot time
+ * @immediate: pointer of type const struct __immediate to update
+ *
+ * Update one immediate value at boot time.
+ */
+void __init arch_immediate_update_early(const struct __immediate *immediate)
+{
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (immediate->size) {
+		case 1:	if (*(uint8_t*)immediate->immediate
+					== *(uint8_t*)immediate->var)
+				return;
+			break;
+		case 2:	if (*(uint16_t*)immediate->immediate
+					== *(uint16_t*)immediate->var)
+				return;
+			break;
+		case 4:	if (*(uint32_t*)immediate->immediate
+					== *(uint32_t*)immediate->var)
+				return;
+			break;
+		default:return;
+	}
+	memcpy((void*)immediate->immediate, (void*)immediate->var,
+			immediate->size);
+}
Index: linux-2.6-lttng/arch/i386/kernel/traps.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/traps.c	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/traps.c	2007-08-27 11:49:18.000000000 -0400
@@ -602,7 +602,7 @@ fastcall void do_##name(struct pt_regs *
 }
 
 DO_VM86_ERROR_INFO( 0, SIGFPE,  "divide error", divide_error, FPE_INTDIV, regs->eip)
-#ifndef CONFIG_KPROBES
+#if !defined(CONFIG_KPROBES) && !defined(CONFIG_IMMEDIATE)
 DO_VM86_ERROR( 3, SIGTRAP, "int3", int3)
 #endif
 DO_VM86_ERROR( 4, SIGSEGV, "overflow", overflow)
@@ -844,14 +844,14 @@ void restart_nmi(void)
 	acpi_nmi_enable();
 }
 
-#ifdef CONFIG_KPROBES
+#if defined(CONFIG_KPROBES) || defined(CONFIG_IMMEDIATE)
 fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
 {
 	if (notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
 			== NOTIFY_STOP)
 		return;
-	/* This is an interrupt gate, because kprobes wants interrupts
-	disabled.  Normal trap handlers don't. */
+	/* This is an interrupt gate, because kprobes and immediate values want
+	 * interrupts disabled. Normal trap handlers don't. */
 	restore_interrupts(regs);
 	do_trap(3, SIGTRAP, "int3", 1, regs, error_code, NULL);
 }
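
A user-space sketch of the update order described in the "Overall design"
comment of arch/i386/kernel/immediate.c may help when reviewing it. This is
not the kernel code: update_site(), observe() and the site_view names are
made up for illustration, the patched site is a plain byte array holding a
1-byte opcode followed by a 4-byte immediate, and step 2 (the IPI running
sync_core() on every CPU) has no user-space equivalent, so it only appears as
a comment. The sketch counts how many intermediate states would let a reader
execute a half-written immediate, with and without the breakpoint.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BREAKPOINT_INSTRUCTION	0xcc	/* int3, same value as in the patch */
#define IMM_SIZE		4	/* 4-byte immediate for this sketch */

/* What a CPU fetching the site could observe (hypothetical classification). */
enum site_view { VIEW_OLD, VIEW_NEW, VIEW_BREAKPOINT, VIEW_TORN };

static enum site_view observe(const uint8_t *insn, uint32_t old_val,
			      uint32_t new_val)
{
	uint32_t imm;

	if (insn[0] == BREAKPOINT_INSTRUCTION)
		return VIEW_BREAKPOINT;	/* reader is diverted to the bypass */
	memcpy(&imm, insn + 1, IMM_SIZE);
	if (imm == old_val)
		return VIEW_OLD;
	if (imm == new_val)
		return VIEW_NEW;
	return VIEW_TORN;		/* half-written immediate is visible */
}

/*
 * Replay the update on a 1-byte opcode + 4-byte immediate, one byte store at
 * a time, and return the number of intermediate states that expose a torn
 * immediate. With use_breakpoint == 0 the operand is rewritten in place.
 */
static int update_site(uint8_t *insn, uint32_t new_val, int use_breakpoint)
{
	uint8_t first_byte = insn[0];
	uint8_t new_bytes[IMM_SIZE];
	uint32_t old_val;
	int torn = 0;
	int i;

	memcpy(&old_val, insn + 1, IMM_SIZE);
	memcpy(new_bytes, &new_val, IMM_SIZE);

	if (use_breakpoint) {
		insn[0] = BREAKPOINT_INSTRUCTION;	/* 1: divert readers */
		torn += observe(insn, old_val, new_val) == VIEW_TORN;
		/* 2: the kernel runs sync_core() on every CPU here */
	}
	for (i = 0; i < IMM_SIZE; i++) {		/* 3: rewrite operand */
		insn[1 + i] = new_bytes[i];
		torn += observe(insn, old_val, new_val) == VIEW_TORN;
	}
	if (use_breakpoint)
		insn[0] = first_byte;			/* 4: restore opcode */
	torn += observe(insn, old_val, new_val) == VIEW_TORN;
	return torn;
}
```

Calling update_site(insn, 0x12345678, 1) on the bytes b8 00 00 00 00 reports
no torn state, because every intermediate state starts with the breakpoint.
Passing 0 for use_breakpoint exposes the partially written operand, which is
the situation the int3 step is there to avoid.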

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 6/8] Immediate Values - Powerpc Optimization
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2007-08-27 15:59 ` [patch 5/8] Immediate Values - i386 Optimization Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 7/8] Immediate Values - Documentation Mathieu Desnoyers
  2007-08-27 15:59 ` [patch 8/8] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers, Christoph Hellwig

[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 8389 bytes --]

PowerPC optimization of the immediate values which uses a li instruction,
patched with an immediate value.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Hellwig <hch@infradead.org>
---
 arch/powerpc/kernel/Makefile    |    1 
 arch/powerpc/kernel/immediate.c |  103 ++++++++++++++++++++++++++++++++
 include/asm-powerpc/immediate.h |  127 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+)

Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2007-08-27 11:49:21.000000000 -0400
@@ -0,0 +1,127 @@
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+struct module;
+
+struct __immediate {
+	long var;		/* Identifier variable of the immediate value */
+	long immediate;		/*
+				 * Pointer to the memory location that holds
+				 * the immediate value within the load immediate
+				 * instruction.
+				 */
+	long size;		/* Type size. */
+};
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _immediate_read() instead.
+ * Makes sure the 2-byte update will be atomic by aligning the immediate
+ * value. Uses a normal memory read for the 4-byte immediate because there is
+ * no way to atomically update it without using a seqlock read side, which
+ * would cost more in terms of total i-cache and d-cache space than a simple
+ * memory read.
+ */
+#define immediate_read(var)						\
+	({								\
+		__typeof__((var)->value) value;				\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm (	".section __immediate, \"a\", @progbits;\n\t" \
+					PPC_LONG "%1, ((0f)+3), 1;\n\t"	\
+					".previous;\n\t"		\
+					"0:\n\t"			\
+					"li %0,%2;\n\t"			\
+				: "=r" (value)				\
+				: "i" (&(var)->value),			\
+				  "i" (0));				\
+			break;						\
+		case 2:							\
+			asm (	".section __immediate, \"a\", @progbits;\n\t" \
+					PPC_LONG "%1, ((0f)+2), 2;\n\t"	\
+					".previous;\n\t"		\
+					".align 2\n\t"			\
+					"0:\n\t"			\
+					"li %0,%2;\n\t"			\
+				: "=r" (value)				\
+				: "i" (&(var)->value),			\
+				  "i" (0));				\
+			break;						\
+		default:						\
+			value = (var)->value;				\
+			break;						\
+		};							\
+		value;							\
+	})
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i) \
+	do { (var)->value = (i); \
+	     immediate_update(1); } while (0)
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i) \
+	do { (var)->value = (i); \
+	     immediate_update(0); } while (0)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i) \
+	do { (var)->value = (i); \
+	     immediate_update_early(); } while (0)
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ * Do not use in __init and __exit functions. Use _immediate_if() instead.
+ * Branch depending on an immediate value. Could eventually be optimized further
+ * by improving gcc to give the ability to patch a jump instruction instead of
+ * the value it depends on.
+ */
+#define immediate_if(var)	if (unlikely(immediate_read(var)))
+
+/*
+ * Internal update functions.
+ */
+extern void immediate_update(int lock);
+extern void module_immediate_setup(struct module *mod);
+extern void immediate_update_early(void);
+extern int arch_immediate_update(const struct __immediate *immediate);
+extern void arch_immediate_update_early(const struct __immediate *immediate);
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/powerpc/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/Makefile	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/Makefile	2007-08-27 11:49:21.000000000 -0400
@@ -104,3 +104,4 @@ obj-$(CONFIG_PPC64)		+= $(obj64-y)
 
 extra-$(CONFIG_PPC_FPU)		+= fpu.o
 extra-$(CONFIG_PPC64)		+= entry_64.o
+obj-$(CONFIG_IMMEDIATE)		+= immediate.o
Index: linux-2.6-lttng/arch/powerpc/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/powerpc/kernel/immediate.c	2007-08-27 11:49:21.000000000 -0400
@@ -0,0 +1,103 @@
+/*
+ * Powerpc optimized immediate values enabling/disabling.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/module.h>
+#include <linux/immediate.h>
+#include <linux/string.h>
+#include <linux/kprobes.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#define LI_OPCODE_LEN	2
+
+/**
+ * arch_immediate_update - update one immediate value
+ * @immediate: pointer of type const struct __immediate to update
+ *
+ * Update one immediate value. Must be called with immediate_mutex held.
+ */
+int arch_immediate_update(const struct __immediate *immediate)
+{
+#ifdef CONFIG_KPROBES
+	kprobe_opcode_t *insn;
+	/*
+	 * Fail if a kprobe has been set on this instruction.
+	 * (TODO: we could eventually do better and modify all the (possibly
+	 * nested) kprobes for this site if kprobes had an API for this.
+	 */
+	switch (immediate->size) {
+		case 1:	/* The uint8_t points to the 4th byte (offset 3) of
+			 * the instruction */
+			insn = (void*)(immediate->immediate - 1 - LI_OPCODE_LEN);
+			break;
+		case 2:	insn = (void*)(immediate->immediate - LI_OPCODE_LEN);
+			break;
+		default:
+		return -EINVAL;
+	}
+
+	if (unlikely(*insn == BREAKPOINT_INSTRUCTION)) {
+		printk(KERN_WARNING "Immediate value in conflict with kprobe. "
+				    "Variable at %p, "
+				    "instruction at %p, size %lu\n",
+				    (void*)immediate->var,
+				    (void*)immediate->immediate, immediate->size);
+		return -EBUSY;
+	}
+#endif
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (immediate->size) {
+		case 1:	if (*(uint8_t*)immediate->immediate
+					== *(uint8_t*)immediate->var)
+				return 0;
+			break;
+		case 2:	if (*(uint16_t*)immediate->immediate
+					== *(uint16_t*)immediate->var)
+				return 0;
+			break;
+		default:return -EINVAL;
+	}
+	memcpy((void*)immediate->immediate, (void*)immediate->var,
+			immediate->size);
+	flush_icache_range((unsigned long)immediate->immediate,
+		(unsigned long)immediate->immediate + immediate->size);
+	return 0;
+}
+
+/**
+ * arch_immediate_update_early - update one immediate value at boot time
+ * @immediate: pointer of type const struct __immediate to update
+ *
+ * Update one immediate value at boot time.
+ * We can use flush_icache_range, since the cpu identification has been done in
+ * the early_init stage.
+ */
+void __init arch_immediate_update_early(const struct __immediate *immediate)
+{
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (immediate->size) {
+		case 1:	if (*(uint8_t*)immediate->immediate
+					== *(uint8_t*)immediate->var)
+				return;
+			break;
+		case 2:	if (*(uint16_t*)immediate->immediate
+					== *(uint16_t*)immediate->var)
+				return;
+			break;
+		default:return;
+	}
+	memcpy((void*)immediate->immediate, (void*)immediate->var,
+			immediate->size);
+	flush_icache_range((unsigned long)immediate->immediate,
+		(unsigned long)immediate->immediate + immediate->size);
+}
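
The "(0f)+3" and "(0f)+2" offsets recorded in the __immediate section follow
from the encoding of li, which is addi with RA = 0: a 32-bit big-endian word
with the primary opcode (14) in the top 6 bits and the 16-bit immediate in the
last two bytes. The user-space sketch below illustrates that layout. The
helpers encode_li(), patch_immediate() and decode_imm() are hypothetical
names, not part of the patch, and the bytes are written in big-endian order
explicitly so that the sketch behaves the same on any host, whereas the patch
relies on memcpy() on a big-endian CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LI_OPCODE_LEN	2	/* bytes before the 16-bit immediate */

/* Encode "li rt,imm" (addi rt,0,imm) as 4 big-endian bytes. */
static void encode_li(uint8_t insn[4], unsigned int rt, int16_t imm)
{
	uint32_t word = (14u << 26) | ((rt & 31u) << 21) | (uint16_t)imm;

	insn[0] = (uint8_t)(word >> 24);
	insn[1] = (uint8_t)(word >> 16);
	insn[2] = (uint8_t)(word >> 8);
	insn[3] = (uint8_t)word;
}

/*
 * Patch a 1- or 2-byte value into the immediate field.
 * A 1-byte value lives at offset 3, a 2-byte value at offset 2.
 * Returns 0 on success, -1 for an unsupported size.
 */
static int patch_immediate(uint8_t insn[4], size_t size, uint16_t value)
{
	switch (size) {
	case 1:
		insn[LI_OPCODE_LEN + 1] = (uint8_t)value;
		return 0;
	case 2:
		insn[LI_OPCODE_LEN] = (uint8_t)(value >> 8);
		insn[LI_OPCODE_LEN + 1] = (uint8_t)value;
		return 0;
	default:
		return -1;
	}
}

/* Read back the sign-extended 16-bit immediate of the instruction. */
static int16_t decode_imm(const uint8_t insn[4])
{
	return (int16_t)(((unsigned int)insn[2] << 8) | insn[3]);
}
```

For instance, encode_li(insn, 3, 0) gives the bytes 38 60 00 00, and patching
a 1-byte value only touches the last byte, leaving the opcode and register
fields intact.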

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 7/8] Immediate Values - Documentation
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2007-08-27 15:59 ` [patch 6/8] Immediate Values - Powerpc Optimization Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  2007-09-20 10:46   ` Denys Vlasenko
  2007-08-27 15:59 ` [patch 8/8] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
  7 siblings, 1 reply; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 9274 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 Documentation/immediate.txt |  232 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 232 insertions(+)

Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt	2007-08-20 15:55:26.000000000 -0400
@@ -0,0 +1,232 @@
+		        Using the Immediate Values
+
+			    Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+* Purpose of immediate values
+
+Immediate values are used to compile into the kernel variables that sit within
+the instruction stream. They are meant to be rarely updated but read often.
+Using immediate values for these variables will save cache lines.
+
+This infrastructure is specialized in supporting dynamic patching of the values
+in the instruction stream when multiple CPUs are running without disturbing the
+normal system behavior.
+
+Compiling code meant to be rarely enabled at runtime can be done using
+immediate_if() as condition surrounding the code.
+
+* Usage
+
+To use the immediate macros, include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+immediate_char_t this_immediate;
+EXPORT_SYMBOL(this_immediate);
+
+
+Add, in your code :
+
+Use immediate_set(&this_immediate, value) to set the immediate value.
+
+Use immediate_read(&this_immediate) to read the immediate value.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+If you have to read the immediate values from a function declared as __init or
+__exit, you should explicitly use _immediate_read(), which will fall back on a
+global variable read. Failing to do so will leave a reference to the __init
+section after it is freed (it would generate a modpost warning).
+
+The preferred idiom to dynamically enable compiled-in code is to use
+immediate_if (&this_immediate), which may eventually use gcc improvements to
+provide a jump instruction patching based condition instead of an immediate
+value feeding a conditional jump. You should use _immediate_if () instead of
+immediate_if () in functions marked __init or __exit.
+
+immediate_set_early() should be used only at early kernel boot time, before SMP
+is activated.
+
+If you need to declare your own immediate types (for instance, a pointer to
+struct task_struct), use:
+
+DEFINE_IMMEDIATE_TYPE(struct task_struct*, immediate_task_struct_ptr_t);
+
+and declare your variable with:
+immediate_task_struct_ptr_t myptr;
+
+You can choose to set an initial static value to the immediate by using, for
+instance:
+
+immediate_task_struct_ptr_t myptr = IMMEDIATE_INIT(10);
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+* Performance improvement
+
+* Memory hit for a data-based branch
+
+Here are the results on a 3GHz Pentium 4:
+
+number of tests : 100
+number of branches per test : 100000
+memory hit cycles per iteration (mean) : 636.611
+L1 cache hit cycles per iteration (mean) : 89.6413
+instruction stream based test, cycles per iteration (mean) : 85.3438
+Just getting the pointer from a modulo on a pseudo-random value, doing
+  nothing with it, cycles per iteration (mean) : 77.5044
+
+So:
+Base case:                      77.50 cycles
+instruction stream based test:  +7.8394 cycles
+L1 cache hit based test:        +12.1369 cycles
+Memory load based test:         +559.1066 cycles
+
+So let's say we have a ping flood coming at
+(14014 packets transmitted, 14014 received, 0% packet loss, time 1826ms)
+7674 packets per second. If we put 2 markers for irq entry/exit, it
+brings us to 15348 markers sites executed per second.
+
+(15348 exec/s) * (559 cycles/exec) / (3G cycles/s) = 0.0029
+We therefore have a 0.29% slowdown just on this case.
+
+Compared to this, the instruction stream based test will cause a
+slowdown of:
+
+(15348 exec/s) * (7.84 cycles/exec) / (3G cycles/s) = 0.00004
+For a 0.004% slowdown.
+
+If we plan to use this for memory allocation, spinlock, and all sorts of
+very high event rate tracing, we can assume it will execute 10 to 100
+times more sites per second, which brings us to a 0.4% slowdown with the
+instruction stream based test compared to a 29% slowdown with the memory
+load based test on a system with high memory pressure.
+
+
+
+* Markers impact under heavy memory load
+
+Running a kernel with my LTTng instrumentation set, in a test that
+generates memory pressure (from userspace) by trashing L1 and L2 caches
+between calls to getppid() (note: syscall_trace is active and calls
+a marker upon syscall entry and syscall exit; markers are disarmed).
+This test is done in user-space, so there are some delays due to IRQs
+coming and to the scheduler. (UP 2.6.22-rc6-mm1 kernel, task with -20
+nice level)
+
+My first set of results : Linear cache trashing, turned out not to be
+very interesting, because it seems like the linearity of the memset on a
+full array is somehow detected and it does not "really" trash the
+caches.
+
+Now the most interesting result : Random walk L1 and L2 trashing
+surrounding a getppid() call.
+
+- Markers compiled out (but syscall_trace execution forced)
+number of tests : 10000
+No memory pressure
+Reading timestamps takes 108.033 cycles
+getppid : 1681.4 cycles
+With memory pressure
+Reading timestamps takes 102.938 cycles
+getppid : 15691.6 cycles
+
+
+- With the immediate values based markers:
+number of tests : 10000
+No memory pressure
+Reading timestamps takes 108.006 cycles
+getppid : 1681.84 cycles
+With memory pressure
+Reading timestamps takes 100.291 cycles
+getppid : 11793 cycles
+
+
+- With global variables based markers:
+number of tests : 10000
+No memory pressure
+Reading timestamps takes 107.999 cycles
+getppid : 1669.06 cycles
+With memory pressure
+Reading timestamps takes 102.839 cycles
+getppid : 12535 cycles
+
+The result is quite interesting in that the kernel is slower without
+markers than with markers. I explain it by the fact that the data
+accessed is not laid out in the same manner in the cache lines when the
+markers are compiled in or out. It seems that compiling in the markers
+aligns the function's data better in this case.
+
+But since the interesting comparison is between the immediate values and
+global variables based markers, and because they share the same memory
+layout, except for the movl being replaced by a movz, we see that the
+global variable based markers (2 markers) add 742 cycles to each system
+call (syscall entry and exit are traced and memory locations for both
+global variables lie on the same cache line).
+
+
+- Test redone with fewer iterations, but with error estimates
+
+10 runs of 100 iterations each: Tests done on a 3GHz P4. Here I run getppid with
+syscall trace inactive, comparing runs with and without memory pressure.
+(sorry, my system is not set up to execute syscall_trace this time, but it will
+make the point anyway).
+
+No memory pressure
+Reading timestamps:     150.92 cycles,     std dev.    1.01 cycles
+getppid:               1462.09 cycles,     std dev.   18.87 cycles
+
+With memory pressure
+Reading timestamps:     578.22 cycles,     std dev.  269.51 cycles
+getppid:              17113.33 cycles,     std dev. 1655.92 cycles
+
+
+Now for memory read timing: (10 runs, branches per test: 100000)
+Memory read based branch:
+                       644.09 cycles,      std dev.   11.39 cycles
+L1 cache hit based branch:
+                        88.16 cycles,      std dev.    1.35 cycles
+
+
+So, now that we have the raw results, let's calculate:
+
+Memory read:
+644.09±11.39 - 88.16±1.35 = 555.93±11.46 cycles
+
+Getppid without memory pressure:
+1462.09±18.87 - 150.92±1.01 = 1311.17±18.90 cycles
+
+Getppid with memory pressure:
+17113.33±1655.92 - 578.22±269.51 = 16535.11±1677.71 cycles
+
+Therefore, if we add 2 markers not based on immediate values to the getppid
+code, which would add 2 memory reads, we would add
+2 * 555.93±12.74 = 1111.86±25.48 cycles
+
+Therefore,
+
+1111.86±25.48 / 16535.11±1677.71 = 0.0672
+ relative error: sqrt(((25.48/1111.86)^2)+((1677.71/16535.11)^2))
+                     = 0.1040
+ absolute error: 0.1040 * 0.0672 = 0.0070
+
+Therefore: 0.0672±0.0070 * 100% = 6.72±0.70 %
+
+We can therefore affirm that adding 2 markers to getppid, on a system with high
+memory pressure, would have a performance hit of at least 6.0% on the system
+call time, all within the uncertainty limits of these tests. The same applies to
+other kernel code paths. The smaller those code paths are, the higher the
+impact ratio will be.
+
+Therefore, not only is it interesting to use the immediate values to dynamically
+activate dormant code such as the markers, but I think it should also be
+considered as a replacement for many of the "read mostly" static variables.
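
The arithmetic of the performance section above can be replayed with a few
lines of C. This is only a sketch: the helper names (meas_sub(), meas_div(),
slowdown(), near()) are made up here, errors are assumed independent and are
combined in quadrature as in the "relative error" line of the document, and
the square root is computed by Newton iteration purely to keep the example
free of libm. Results may differ from the quoted figures in the last digit.

```c
#include <assert.h>

/* A measured mean with its standard deviation. */
struct meas {
	double mean;
	double err;
};

/* Newton iteration, used only to avoid depending on libm in this sketch. */
static double sqrt_newton(double x)
{
	double g = x;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 64; i++)
		g = 0.5 * (g + x / g);
	return g;
}

/* a - b, absolute errors combined in quadrature. */
static struct meas meas_sub(struct meas a, struct meas b)
{
	struct meas r;

	r.mean = a.mean - b.mean;
	r.err = sqrt_newton(a.err * a.err + b.err * b.err);
	return r;
}

/* a / b, relative errors combined in quadrature. */
static struct meas meas_div(struct meas a, struct meas b)
{
	struct meas r;
	double ra = a.err / a.mean;
	double rb = b.err / b.mean;

	r.mean = a.mean / b.mean;
	r.err = r.mean * sqrt_newton(ra * ra + rb * rb);
	return r;
}

/* Fraction of CPU time spent at instrumented sites. */
static double slowdown(double sites_per_sec, double cycles_per_site, double hz)
{
	return sites_per_sec * cycles_per_site / hz;
}

/* Nonzero when a and b differ by at most tol. */
static int near(double a, double b, double tol)
{
	double d = a - b;

	return (d < 0.0 ? -d : d) <= tol;
}
```

Feeding it the getppid figures measured under memory pressure, meas_sub()
gives the time of the call itself, and meas_div() of the 2-marker cost by that
result gives the relative hit with its uncertainty. slowdown(15348, 559, 3e9)
replays the ping flood estimate.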

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [patch 8/8] Scheduler Profiling - Use Immediate Values
  2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2007-08-27 15:59 ` [patch 7/8] Immediate Values - Documentation Mathieu Desnoyers
@ 2007-08-27 15:59 ` Mathieu Desnoyers
  7 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 15:59 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: profiling-use-immediate-values.patch --]
[-- Type: text/plain, Size: 7121 bytes --]

Use immediate values, which reduce d-cache hits in the optimized version, as
the condition for the scheduler profiling call.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 drivers/kvm/svm.c       |    2 +-
 drivers/kvm/vmx.c       |    2 +-
 include/linux/profile.h |   10 ++++------
 kernel/profile.c        |   38 ++++++++++++++++++++++++++------------
 kernel/sched.c          |    3 ++-
 5 files changed, 34 insertions(+), 21 deletions(-)

Index: linux-2.6-lttng/kernel/profile.c
===================================================================
--- linux-2.6-lttng.orig/kernel/profile.c	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/kernel/profile.c	2007-08-27 11:49:25.000000000 -0400
@@ -42,9 +42,6 @@ int (*timer_hook)(struct pt_regs *) __re
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
 
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
-
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
@@ -52,6 +49,14 @@ static DEFINE_PER_CPU(int, cpu_profile_f
 static DEFINE_MUTEX(profile_flip_mutex);
 #endif /* CONFIG_SMP */
 
+/* Immediate values */
+immediate_char_t sleep_profiling __read_mostly,
+			sched_profiling __read_mostly,
+			kvm_profiling __read_mostly,
+			cpu_profiling __read_mostly;
+EXPORT_SYMBOL_GPL(kvm_profiling);
+EXPORT_SYMBOL_GPL(cpu_profiling);
+
 static int __init profile_setup(char * str)
 {
 	static char __initdata schedstr[] = "schedule";
@@ -60,7 +65,7 @@ static int __init profile_setup(char * s
 	int par;
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
-		prof_on = SLEEP_PROFILING;
+		immediate_set_early(&sleep_profiling, 1);
 		if (str[strlen(sleepstr)] == ',')
 			str += strlen(sleepstr) + 1;
 		if (get_option(&str, &par))
@@ -69,7 +74,7 @@ static int __init profile_setup(char * s
 			"kernel sleep profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, schedstr, strlen(schedstr))) {
-		prof_on = SCHED_PROFILING;
+		immediate_set_early(&sched_profiling, 1);
 		if (str[strlen(schedstr)] == ',')
 			str += strlen(schedstr) + 1;
 		if (get_option(&str, &par))
@@ -78,7 +83,7 @@ static int __init profile_setup(char * s
 			"kernel schedule profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
-		prof_on = KVM_PROFILING;
+		immediate_set_early(&kvm_profiling, 1);
 		if (str[strlen(kvmstr)] == ',')
 			str += strlen(kvmstr) + 1;
 		if (get_option(&str, &par))
@@ -88,7 +93,7 @@ static int __init profile_setup(char * s
 			prof_shift);
 	} else if (get_option(&str, &par)) {
 		prof_shift = par;
-		prof_on = CPU_PROFILING;
+		immediate_set_early(&cpu_profiling, 1);
 		printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
 			prof_shift);
 	}
@@ -99,7 +104,10 @@ __setup("profile=", profile_setup);
 
 void __init profile_init(void)
 {
-	if (!prof_on) 
+	if (!_immediate_read(&sleep_profiling) &&
+		!_immediate_read(&sched_profiling) &&
+		!_immediate_read(&kvm_profiling) &&
+		!_immediate_read(&cpu_profiling))
 		return;
  
 	/* only text is profiled */
@@ -288,7 +296,7 @@ void profile_hits(int type, void *__pc, 
 	int i, j, cpu;
 	struct profile_hit *hits;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
 	i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
@@ -398,7 +406,7 @@ void profile_hits(int type, void *__pc, 
 {
 	unsigned long pc;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
 	atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
@@ -555,7 +563,10 @@ static int __init create_hash_tables(voi
 	}
 	return 0;
 out_cleanup:
-	prof_on = 0;
+	immediate_set_early(&sleep_profiling, 0);
+	immediate_set_early(&sched_profiling, 0);
+	immediate_set_early(&kvm_profiling, 0);
+	immediate_set_early(&cpu_profiling, 0);
 	smp_mb();
 	on_each_cpu(profile_nop, NULL, 0, 1);
 	for_each_online_cpu(cpu) {
@@ -582,7 +593,10 @@ static int __init create_proc_profile(vo
 {
 	struct proc_dir_entry *entry;
 
-	if (!prof_on)
+	if (!_immediate_read(&sleep_profiling) &&
+		!_immediate_read(&sched_profiling) &&
+		!_immediate_read(&kvm_profiling) &&
+		!_immediate_read(&cpu_profiling))
 		return 0;
 	if (create_hash_tables())
 		return -1;
Index: linux-2.6-lttng/include/linux/profile.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/profile.h	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/include/linux/profile.h	2007-08-27 11:49:25.000000000 -0400
@@ -7,10 +7,12 @@
 #include <linux/init.h>
 #include <linux/cpumask.h>
 #include <linux/cache.h>
+#include <linux/immediate.h>
 
 #include <asm/errno.h>
 
-extern int prof_on __read_mostly;
+extern immediate_char_t sleep_profiling, sched_profiling, kvm_profiling,
+		cpu_profiling;
 
 #define CPU_PROFILING	1
 #define SCHED_PROFILING	2
@@ -35,11 +37,7 @@ void profile_hits(int, void *ip, unsigne
  */
 static inline void profile_hit(int type, void *ip)
 {
-	/*
-	 * Speedup for the common (no profiling enabled) case:
-	 */
-	if (unlikely(prof_on == type))
-		profile_hits(type, ip, 1);
+	profile_hits(type, ip, 1);
 }
 
 #ifdef CONFIG_PROC_FS
Index: linux-2.6-lttng/kernel/sched.c
===================================================================
--- linux-2.6-lttng.orig/kernel/sched.c	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/kernel/sched.c	2007-08-27 11:49:25.000000000 -0400
@@ -3440,7 +3440,8 @@ static inline void schedule_debug(struct
 	if (unlikely(in_atomic_preempt_off()) && unlikely(!prev->exit_state))
 		__schedule_bug(prev);
 
-	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
+	immediate_if (&sched_profiling)
+		profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
 	schedstat_inc(this_rq(), sched_cnt);
 }
Index: linux-2.6-lttng/drivers/kvm/svm.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/svm.c	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/svm.c	2007-08-27 11:49:25.000000000 -0400
@@ -1570,7 +1570,7 @@ again:
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING))
+	immediate_if (&kvm_profiling)
 		profile_hit(KVM_PROFILING,
 			(void *)(unsigned long)svm->vmcb->save.rip);
 
Index: linux-2.6-lttng/drivers/kvm/vmx.c
===================================================================
--- linux-2.6-lttng.orig/drivers/kvm/vmx.c	2007-08-27 11:16:08.000000000 -0400
+++ linux-2.6-lttng/drivers/kvm/vmx.c	2007-08-27 11:49:25.000000000 -0400
@@ -2232,7 +2232,7 @@ again:
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING))
+	immediate_if (&kvm_profiling)
 		profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
 
 	r = kvm_handle_exit(kvm_run, vcpu);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 7/8] Immediate Values - Documentation
  2007-08-27 15:59 ` [patch 7/8] Immediate Values - Documentation Mathieu Desnoyers
@ 2007-09-20 10:46   ` Denys Vlasenko
  2007-09-21 13:31     ` Mathieu Desnoyers
  0 siblings, 1 reply; 17+ messages in thread
From: Denys Vlasenko @ 2007-09-20 10:46 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Monday 27 August 2007 16:59, Mathieu Desnoyers wrote:
> +We can therefore affirm that adding 2 markers to getppid, on a system with high
> +memory pressure, would have a performance hit of at least 6.0% on the system
> +call time, all within the uncertainty limits of these tests. The same applies to
> +other kernel code paths. The smaller those code paths are, the higher the
> +impact ratio will be.

Immediates make code bigger, right?
What will happen on a system with high *icache* pressure?
There are a lot of inline-happy and/or C++ folks out there
in userland; they routinely have programs in the tens-of-megabytes range.

getppid is one of the lightest syscalls out there.
What kind of speedup do you see on a real-world test
(two processes exchanging data through pipes, for example)?

> +Therefore, not only is it interesting to use the immediate values to dynamically
> +activate dormant code such as the markers, but I think it should also be
> +considered as a replacement for many of the "read mostly" static variables.

What effect will that have on "size vmlinux" on AMD64?
--
vda


* Re: [patch 7/8] Immediate Values - Documentation
  2007-09-20 10:46   ` Denys Vlasenko
@ 2007-09-21 13:31     ` Mathieu Desnoyers
  2007-09-21 15:51       ` Denys Vlasenko
  0 siblings, 1 reply; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-09-21 13:31 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: akpm, linux-kernel

* Denys Vlasenko (vda.linux@googlemail.com) wrote:
> On Monday 27 August 2007 16:59, Mathieu Desnoyers wrote:
> > +We can therefore affirm that adding 2 markers to getppid, on a system with high
> > +memory pressure, would have a performance hit of at least 6.0% on the system
> > +call time, all within the uncertainty limits of these tests. The same applies to
> > +other kernel code paths. The smaller those code paths are, the higher the
> > +impact ratio will be.
> 
> Immediates make code bigger, right?

Nope.

Example:

char x;

void testb(void)
{
        if (x > 5)
                testa();
}

Would turn into:
  56:   b0 00                   mov    $0x0,%al
  58:   3c 05                   cmp    $0x5,%al
  5a:   7e 05                   jle    61 <testb+0x11>

(6 bytes)

Rather than:

  56:   80 3d 00 00 00 00 05    cmpb   $0x5,0x0
  5d:   7e 05                   jle    64 <testb+0x14>

(9 bytes)

So actually, immediate values, when used well, make the code smaller. By the
way, I recommend using the smallest immediate values required, which
will often be a single byte.

> What will happen on a system with high *icache* pressure?

It *helps* :) And by the way, icache on recent x86 and x86_64 is a trace
cache, so I don't see your point anyway.

> There are a lot of inline-happy and/or C++ folks out there
> in userland; they routinely have programs in the tens-of-megabytes range.
> 
> getppid is one of the lightest syscalls out there.
> What kind of speedup do you see on a real-world test
> (two processes exchanging data through pipes, for example)?
> 

With the size of the caches we currently have, that kind of workload
will not show any measurable difference: the signal/noise ratio is way
too small to detect that kind of performance difference under such a
workload. Try it if you want.

The real-world speedup I am interested in is to have almost -zero-
tracer impact, which implies being undetectable even in the smallest and
shortest functions. I guess nobody is interested in adding a measurable
performance hit to the kmalloc fast path, right?


> > +Therefore, not only is it interesting to use the immediate values to dynamically
> > +activate dormant code such as the markers, but I think it should also be
> > +considered as a replacement for many of the "read mostly" static variables.
> 
> What effect will that have on "size vmlinux" on AMD64?

Without considering kernel/immediate.o, it will make the code smaller
and add 3*8bytes=24bytes of data in the __immediate section per
immediate value reference (data only used for updates).
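
To make the 24-byte figure concrete, here is a minimal user-space sketch of
one __immediate entry on a 64-bit target. The field names var, immediate and
size are the ones dereferenced in kernel/immediate.c; the struct name, the
field order, the fixed-width types and the helper function are assumptions
made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of one __immediate entry on a 64-bit target.
 * Three long-sized fields; uint64_t stands in for a 64-bit long.
 * Field order is an assumption.
 */
struct immediate_entry_model {
	uint64_t var;
	uint64_t immediate;
	uint64_t size;
};

/* 3 * 8 bytes = 24 bytes per immediate value reference. */
static_assert(sizeof(struct immediate_entry_model) == 24,
	      "expected 3 * 8 bytes per entry");

/* Hypothetical helper: section data added for nr_refs references. */
static inline uint64_t immediate_section_bytes(uint64_t nr_refs)
{
	return nr_refs * sizeof(struct immediate_entry_model);
}
```

The cost scales with the number of reference sites, not with the number of
variables: a variable read from ten places adds ten entries.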

Mathieu

> --
> vda

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 7/8] Immediate Values - Documentation
  2007-09-21 13:31     ` Mathieu Desnoyers
@ 2007-09-21 15:51       ` Denys Vlasenko
  0 siblings, 0 replies; 17+ messages in thread
From: Denys Vlasenko @ 2007-09-21 15:51 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Friday 21 September 2007 14:31, Mathieu Desnoyers wrote:
> > Immediates make code bigger, right?
> 
> Nope.
> 
> Example:
> 
> char x;
> 
> void testb(void)
> {
>         if (x > 5)
>                 testa();
> }
> 
> Would turn into:
>   56:   b0 00                   mov    $0x0,%al
>   58:   3c 05                   cmp    $0x5,%al
>   5a:   7e 05                   jle    61 <testb+0x11>
> 
> (6 bytes)
> 
> Rather than:
> 
>   56:   80 3d 00 00 00 00 05    cmpb   $0x5,0x0
>   5d:   7e 05                   jle    64 <testb+0x14>
> 
> (9 bytes)

For 32-bit value, you won't be so lucky.

> So actually, immediate values well used make the code smaller. By the
> way, I recommend using the smallest immediate values required, which
> will often be a single byte.

I agree on this wholeheartedly. However, the current kernel mostly uses int
even for yes/no style flags.

> > getppid is one of the lightest syscalls out there.
> > What kind of speedup do you see on a real-world test
> > (two processes exchanging data through pipes, for example)?
> > 
> 
> With the size of the caches we currently have, that kind of workload
> will not show any measurable difference: the signal/noise ratio is way
> too small to detect that kind of performance difference under such a
> workload. Try it if you want.

Exactly my point: this speedup is not measurable on a realistic workload.

> The real-world speedup I am interested in is to have almost -zero-
> tracer impact, which implies being undetectable even in the smallest and
> shortest functions. I guess nobody is interested in adding a measurable
> performance hit to the kmalloc fast path, right?
> 
> > > +Therefore, not only is it interesting to use the immediate values to dynamically
> > > +activate dormant code such as the markers, but I think it should also be
> > > +considered as a replacement for many of the "read mostly" static variables.
> > 
> > What effect will that have on "size vmlinux" on AMD64?
> 
> Without considering kernel/immediate.o, it will make the code smaller
> and add 3*8bytes=24bytes of data in the __immediate section per
> immediate value reference (data only used for updates).

Yes. *Per immediate value reference*.

Therefore I don't think it's wise to recommend using __immediate
for any variable that is referenced many times, with "many" defined as
"more than ten".

IOW: I think that this last paragraph shouldn't be there:

On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  Documentation/immediate.txt |  228 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 228 insertions(+)
>...
> +Therefore, not only is it interesting to use the immediate values to dynamically
> +activate dormant code such as the markers, but I think it should also be
> +considered as a replacement for many of the "read-mostly" static variables.


A few crazy ideas for making it slightly less painful on 64-bit architectures:

* Pack last long ('size') into low bits of other fields.
  (I expect link stage problems, tho)


* Make the last field uint8_t and pack the whole struct into 17 bytes (__attribute__((packed)))
  instead of 24 bytes.
  Expect align-happy folks to faint left and right at such a horrendous crime :) but
  other than that, it will work. Updates of immediates will *maybe* get a tiny bit slower
  (which is unimportant anyway).

  [btw, this can be done for i386 too]


* Turn long's into int32_t, since kernel's text addresses (at least on AMD64)
  fit into int32_t (sign-extend will give you correct 64-bit address):

  ffffffff80200000 A _text
  ffffffff80200000 T startup_64
  ffffffff802000b7 t ident_complete
  ffffffff80200110 T secondary_startup_64
  ffffffff802001a8 T initial_code
  ffffffff802001b0 T init_rsp
  ffffffff802001b8 t bad_address
  ffffffff802001c0 T early_idt_handler

  [I hope there is suitable reloc type for AMD64 and ld won't complain]
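
The last two ideas can be sketched in user space as follows. This is only an
illustration under assumptions: the struct and function names are made up
here, uint64_t stands in for a 64-bit long, and nothing below says whether
the linker offers a matching relocation.

```c
#include <assert.h>
#include <stdint.h>

/* Idea 2: one-byte size field, whole struct packed (GCC attribute). */
struct immediate_entry_packed {
	uint64_t var;
	uint64_t immediate;
	uint8_t size;
} __attribute__((packed));
static_assert(sizeof(struct immediate_entry_packed) == 17,
	      "expected 8 + 8 + 1 bytes");

/* Idea 3: every field stored as a 32-bit signed value. */
struct immediate_entry_s32 {
	int32_t var;
	int32_t immediate;
	int32_t size;
};
static_assert(sizeof(struct immediate_entry_s32) == 12,
	      "expected 3 * 4 bytes");

/* Recover the full 64-bit address by sign extension. */
static inline uint64_t text_addr_from_s32(int32_t low)
{
	return (uint64_t)(int64_t)low;
}

/* An address survives the round trip iff bits 63..31 are all equal. */
static inline int text_addr_fits_s32(uint64_t addr)
{
	uint64_t top = addr >> 31;

	return top == 0 || top == 0x1ffffffffULL;
}
```

With the _text address listed above, ffffffff80200000 passes the check, while
the same low 32 bits with a zero upper half (0000000080200000) do not.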
--
vda


* [patch 2/8] Immediate Values - Architecture Independent Code
  2007-09-06 20:02 [patch 0/8] Immediate Values for 2.6.23-rc4-mm1 Mathieu Desnoyers
@ 2007-09-06 20:02 ` Mathieu Desnoyers
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-09-06 20:02 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 15435 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per architecture immediate_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__immediate" to place the pointers to the enable
value. Immediate values activation functions sit in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.
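
As a rough illustration of the generic fallback described above, the sketch
below copies the macros from the !CONFIG_IMMEDIATE branch of
linux/immediate.h into a user-space file. The demo_* names are hypothetical
and exist only for this example; an optimized architecture would replace
immediate_read() with a patched load-immediate instead of the memory load
shown here.

```c
#include <assert.h>

/* Macros copied from the generic branch of include/linux/immediate.h. */
#define DEFINE_IMMEDIATE_TYPE(type, name) \
	typedef struct { type value; } name
#define IMMEDIATE_INIT(i)		{ (i) }
#define _immediate_read(var)		(var)->value
#define immediate_read(var)		_immediate_read(var)
#define immediate_set(var, i)		((var)->value = (i))
#define immediate_if(var)		if (immediate_read(var))

DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);

/* Hypothetical flag and counter, for illustration only. */
static immediate_char_t demo_enabled = IMMEDIATE_INIT(0);
static int demo_hits;

/* Hot path: the dormant code runs only when the flag is set. */
static void demo_hot_path(void)
{
	immediate_if (&demo_enabled)
		demo_hits++;
}

/* Runs the hot path once disabled, once enabled; returns the hit count. */
int demo_run(void)
{
	immediate_set(&demo_enabled, 0);
	demo_hits = 0;
	demo_hot_path();
	assert(demo_hits == 0);	/* flag clear: dormant code skipped */
	immediate_set(&demo_enabled, 1);
	demo_hot_path();
	return demo_hits;
}
```

Callers only ever go through immediate_read(), immediate_set() and
immediate_if(), so the same source compiles against either the generic or an
architecture-specific implementation.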

* Why should this be merged *

It improves performance on workloads with heavy memory I/O.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache thrashing:

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time by about 3.5% in these conditions.

Changelog:

- section __immediate is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  119 +++++++++++++++++++++++++++++++++++
 include/linux/module.h            |    6 +
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  128 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   10 ++
 7 files changed, 273 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-09-06 15:02:50.000000000 -0400
@@ -0,0 +1,119 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+#include <asm/immediate.h>
+#else
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+struct module;
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ */
+#define immediate_read(var)		_immediate_read(var)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i)		((var)->value = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i)		immediate_set(var, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i)	immediate_set(var, i)
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ */
+#define immediate_if(var)		if (immediate_read(var))
+
+/*
+ * Internal update functions.
+ */
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void immediate_update_early(void) { }
+#endif
+
+/**
+ * DEFINE_IMMEDIATE_TYPE - Define an immediate type
+ * @type: type that the immediate should hold
+ * @name: name of the immediate type
+ *
+ * Define new immediate types. Naming scheme is immediate_*_t.
+ * Always access these types with the provided functions.
+ */
+#define DEFINE_IMMEDIATE_TYPE(type, name) \
+	typedef struct { type value; } name
+
+/*
+ * Standard pre-defined immediate types.
+ */
+DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);
+DEFINE_IMMEDIATE_TYPE(short, immediate_short_t);
+DEFINE_IMMEDIATE_TYPE(int, immediate_int_t);
+DEFINE_IMMEDIATE_TYPE(long, immediate_long_t);
+DEFINE_IMMEDIATE_TYPE(void*, immediate_void_ptr_t);
+
+/**
+ * IMMEDIATE_INIT - Static initialization of an immediate variable
+ * @i: required value
+ *
+ * Use this macro to initialize an immediate value to an initial static
+ * value.
+ */
+#define IMMEDIATE_INIT(i)		{ (i) }
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @var: pointer of type immediate_*_t
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(var)		(var)->value
+
+/*
+ * _immediate_if - if () statement depending on immediate value (memory load)
+ * @var: pointer of type immediate_*_t
+ *
+ * Force the use of a normal if () statement depending on an immediate value.
+ */
+#define _immediate_if(var)		if (_immediate_read(var))
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-06 14:32:10.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-06 15:06:36.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-09-06 15:02:49.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-09-06 15:06:36.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -374,6 +375,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-09-06 15:02:49.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-09-06 15:07:03.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1717,6 +1718,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1813,6 +1815,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1963,6 +1966,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2028,6 +2036,8 @@ static struct module *load_module(void _
 		 goto nomodsectinfo;
 #endif
 
+	module_immediate_setup(mod);
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-09-06 15:02:50.000000000 -0400
@@ -0,0 +1,128 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to an enabled state: set the enable bit.
+ */
+static inline void _immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void*)iter->immediate,
+					    (void*)iter->var, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+
+#ifdef CONFIG_MODULES
+/**
+ * module_immediate_setup - Update immediate values in a module
+ * @mod: pointer to the struct module
+ *
+ * Setup the immediate according to the variable upon which it depends.  Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules' immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+	_immediate_update_range(mod->immediate,
+				mod->immediate+mod->num_immediate);
+}
+
+/*
+ * immediate mutex nests inside the modules mutex.
+ */
+static inline void immediate_update_modules(int lock)
+{
+	struct module *mod;
+
+	if (lock)
+		mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		_immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+	if (lock)
+		mutex_unlock(&module_mutex);
+}
+#else
+static inline void immediate_update_modules(int lock) { }
+#endif
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should a module_mutex be taken ?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void immediate_update(int lock)
+{
+	/* Core kernel immediates */
+	_immediate_update_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	immediate_update_modules(lock);
+}
+EXPORT_SYMBOL_GPL(immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-09-06 14:32:10.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-09-06 15:02:50.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -525,6 +526,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-09-06 14:32:10.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-09-06 15:05:03.000000000 -0400
@@ -61,6 +61,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 2/8] Immediate Values - Architecture Independent Code
  2007-08-20 20:23 [patch 0/8] Immediate Values Mathieu Desnoyers
@ 2007-08-20 20:23 ` Mathieu Desnoyers
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-20 20:23 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 15624 bytes --]

Immediate values are read-mostly variables that are rarely updated. They use
code patching to modify the values inscribed in the instruction stream. This
saves precious cache lines that would otherwise have to be used by these
variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per-architecture immediate_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionality is disabled in the kernel, it falls back to global variables.

This patch adds a new rodata section "__immediate" to place the pointers to the
enable value. The immediate values activation functions sit in
kernel/immediate.c.
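
The table walk can be sketched in userspace. This is only an illustration under
stated assumptions: it relies on an ELF toolchain (GCC or Clang with GNU ld or a
compatible linker), which synthesizes __start_<name>/__stop_<name> symbols for a
section whose name is a valid C identifier, whereas the kernel defines these
bounds explicitly in the linker script. struct imm_entry, IMM_SITE(),
imm_update_range() and the feature_* variables are hypothetical names, and the
"cached" field merely stands in for the patched instruction operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry; the patch's struct __immediate is arch-defined. */
struct imm_entry {
	int *var;	/* variable the site depends on */
	int cached;	/* stand-in for the patched instruction operand */
};

/* Section bounds, provided by the linker for the "__immediate" section. */
extern struct imm_entry __start___immediate[];
extern struct imm_entry __stop___immediate[];

/* Place one entry in the section; pointer alignment keeps entries packed. */
#define IMM_SITE(name, variable)					\
	static struct imm_entry name					\
	__attribute__((section("__immediate"), used,			\
		       aligned(sizeof(void *)))) = { &(variable), 0 }

static int feature_a, feature_b;	/* hypothetical control variables */
IMM_SITE(site_a1, feature_a);
IMM_SITE(site_a2, feature_a);
IMM_SITE(site_b1, feature_b);

/*
 * Rough analogue of _immediate_update_range(): bring every site in
 * [begin, end) in sync with its variable. Returns how many sites changed.
 */
static size_t imm_update_range(struct imm_entry *begin, struct imm_entry *end)
{
	struct imm_entry *iter;
	size_t changed = 0;

	assert(begin <= end);
	for (iter = begin; iter < end; iter++) {
		if (iter->cached != *iter->var) {
			iter->cached = *iter->var;
			changed++;
		}
	}
	return changed;
}
```

Calling imm_update_range(__start___immediate, __stop___immediate) after changing
feature_a or feature_b resynchronizes every site that refers to them, without
the caller having to know where the sites were declared.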

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. This is the case if the variable it refers to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performance on workloads with heavy memory I/O.

The potential of this infrastructure shows in the slowdown that a simple system
call such as getppid() suffers under heavy user-space cache thrashing:

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of about 10 times just to get the kernel variables
from memory. Another test on the same architecture (Intel P4) measured the
memory latency to be 559 cycles. Therefore, each cache line removed from the hot
path would improve the syscall time by 3.5% in these conditions.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  119 +++++++++++++++++++++++++++++++++++
 include/linux/module.h            |    6 +
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  128 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   16 ++++
 7 files changed, 279 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-08-07 13:30:21.000000000 -0400
@@ -0,0 +1,119 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+#include <asm/immediate.h>
+#else
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+struct module;
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ */
+#define immediate_read(var)		_immediate_read(var)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i)		((var)->value = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i)		immediate_set(var, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i)	immediate_set(var, i)
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ */
+#define immediate_if(var)		if (immediate_read(var))
+
+/*
+ * Internal update functions.
+ */
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void immediate_update_early(void) { }
+#endif
+
+/**
+ * DEFINE_IMMEDIATE_TYPE - Define an immediate type
+ * @type: type that the immediate should hold
+ * @name: name of the immediate type
+ *
+ * Define new immediate types. Naming scheme is immediate_*_t.
+ * Always access these types with the provided functions.
+ */
+#define DEFINE_IMMEDIATE_TYPE(type, name) \
+	typedef struct { type value; } name
+
+/*
+ * Standard pre-defined immediate types.
+ */
+DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);
+DEFINE_IMMEDIATE_TYPE(short, immediate_short_t);
+DEFINE_IMMEDIATE_TYPE(int, immediate_int_t);
+DEFINE_IMMEDIATE_TYPE(long, immediate_long_t);
+DEFINE_IMMEDIATE_TYPE(void*, immediate_void_ptr_t);
+
+/**
+ * IMMEDIATE_INIT - Static initialization of an immediate variable
+ * @i: required value
+ *
+ * Use this macro to initialize an immediate value to an initial static
+ * value.
+ */
+#define IMMEDIATE_INIT(i)		{ (i) }
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @var: pointer of type immediate_*_t
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(var)		(var)->value
+
+/*
+ * _immediate_if - if () statement depending on immediate value (memory load)
+ * @var: pointer of type immediate_*_t
+ *
+ * Force the use of a normal if () statement depending on an immediate value.
+ */
+#define _immediate_if(var)		if (_immediate_read(var))
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-08-07 13:29:55.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-08-07 13:29:52.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-08-07 13:29:55.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -374,6 +375,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-08-07 13:29:52.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-08-07 13:29:55.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1719,6 +1720,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex = 0;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1815,6 +1817,9 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+#ifdef CONFIG_IMMEDIATE
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
+#endif
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1825,6 +1830,8 @@ static struct module *load_module(void _
 #endif
 	if (unwindex)
 		sechdrs[unwindex].sh_flags |= SHF_ALLOC;
+	if (immediateindex)
+		sechdrs[immediateindex].sh_flags |= SHF_ALLOC;
 
 	/* Check module struct version now, before we try to use module. */
 	if (!check_modstruct_version(sechdrs, versindex, mod)) {
@@ -1965,6 +1972,13 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	if (immediateindex) {
+		mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+		mod->num_immediate =
+			sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+	}
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2031,6 +2045,8 @@ static struct module *load_module(void _
 	 }
 #endif
 
+	module_immediate_setup(mod);
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-08-07 13:29:55.000000000 -0400
@@ -57,6 +57,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-08-07 13:30:27.000000000 -0400
@@ -0,0 +1,128 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to an enabled state: set the enable bit.
+ */
+static inline void _immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void*)iter->var,
+					    (void*)iter->immediate, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+
+#ifdef CONFIG_MODULES
+/**
+ * module_immediate_setup - Update immediate values in a module
+ * @mod: pointer to the struct module
+ *
+ * Setup the immediate according to the variable upon which it depends.  Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules' immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+	_immediate_update_range(mod->immediate,
+				mod->immediate+mod->num_immediate);
+}
+
+/*
+ * immediate mutex nests inside the modules mutex.
+ */
+static inline void immediate_update_modules(int lock)
+{
+	struct module *mod;
+
+	if (lock)
+		mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		_immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+	if (lock)
+		mutex_unlock(&module_mutex);
+}
+#else
+static inline void immediate_update_modules(int lock) { }
+#endif
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should module_mutex be taken?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void immediate_update(int lock)
+{
+	/* Core kernel immediates */
+	_immediate_update_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	immediate_update_modules(lock);
+}
+EXPORT_SYMBOL_GPL(immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-08-07 13:29:55.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -521,6 +522,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 2/8] Immediate Values - Architecture Independent Code
  2007-08-13 20:51   ` Alexey Dobriyan
@ 2007-08-16 16:02     ` Mathieu Desnoyers
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-16 16:02 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, linux-kernel

* Alexey Dobriyan (adobriyan@gmail.com) wrote:
> On Sun, Aug 12, 2007 at 11:07:04AM -0400, Mathieu Desnoyers wrote:
> > Immediate values are used as read mostly variables that are rarely updated. They
> > use code patching to modify the values inscribed in the instruction stream. It
> > provides a way to save precious cache lines that would otherwise have to be used
> > by these variables.
> > 
> > There is a generic _immediate_read() version, which uses standard global
> > variables, and optimized per architecture immediate_read() implementations,
> > which use a load immediate to remove a data cache hit. When the immediate values
> > functionnality is disabled in the kernel, it falls back to global variables.
> > 
> > It adds a new rodata section "__immediate" to place the pointers to the enable
> > value. Immediate values activation functions sits in kernel/immediate.c.
> > 
> > Immediate values refer to the memory address of a previously declared integer.
> > This integer holds the information about the state of the immediate values
> > associated, and must be accessed through the API found in linux/immediate.h.
> > 
> > At module load time, each immediate value is checked to see if it must be
> > enabled. It would be the case if the variable they refer to is exported from
> > another module and already enabled.
> > 
> > In the early stages of start_kernel(), the immediate values are updated to
> > reflect the state of the variable they refer to.
> > 
> > * Why should this be merged *
> > 
> > It improves performances on heavy memory I/O workloads.
> > 
> > An interesting result shows the potential this infrastructure has by
> > showing the slowdown a simple system call such as getppid() suffers when it is
> > used under heavy user-space cache trashing:
> > 
> > Random walk L1 and L2 trashing surrounding a getppid() call:
> > (note: in this test, do_syscal_trace was taken at each system call, see
> > Documentation/immediate.txt in these patches for details)
> > - No memory pressure :   getppid() takes  1573 cycles
> > - With memory pressure : getppid() takes 15589 cycles
> > 
> > We therefore have a slowdown of 10 times just to get the kernel variables from
> > memory. Another test on the same architecture (Intel P4) measured the memory
> > latency to be 559 cycles. Therefore, each cache line removed from the hot path
> > would improve the syscall time of 3.5% in these conditions.
> 
> I still think this is bad idea:
> 1) already existing CPU erratas against popular models. As I learned
>    yesterday -- SILENT reboot (often) or hang (rare).

I understand your fears. Actually, I had the same reaction when I heard
about the djprobes project. What I am doing with the immediate values
is a much simpler version of live code patching, which limits what has
to be patched to a very simple subset of instructions whose layout
(alignment, etc.) can be controlled.

I have taken the said errata into consideration and developed
algorithms that make sure they won't be triggered. I would suggest that
you present test cases that prove your fear right, or cite
documentation for the specific CPU models upon which you base your
comment.

Reproducing the errata on both PIII and Intel Core2 Duo seems very hard
to achieve. If anyone out there has a "known" test case to trigger the
fault, I would gladly do some more testing to show the behavior without
my algorithm (non-working) vs with my algorithm in the same conditions.


> 2) new type pretending to be standard and idiomatic -- immediate_t

Giving a dedicated type to the data referred to by immediate_read() seems
important to me, because otherwise people would update the variable
directly without using immediate_set(), which, of course, would not update
all the references to the variable. If you have better suggestions, I am
open to them.

> 3) new almost-C-keyword -- immediate_if. What does it mean? You'll never
>    guess just by looking at it.

It's a special if() statement that uses its argument (an immediate
value) as a condition for a branch. The reason why it's not written as
if (immediate_read(var)) is that I want to leave room for
improvement if gcc guys ever want to implement nop/jmp based branches
that could be patched at runtime.
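
A minimal userspace sketch of how a call site would look, assuming the generic
(non-patching) fallback macros copied from the linux/immediate.h in this patch.
tracing_enabled, events_seen, hot_path() and run_demo() are hypothetical names
used only for illustration; with CONFIG_IMMEDIATE the same call site would read
a patched instruction operand instead of loading the variable from memory.

```c
#include <assert.h>

/* Generic fallback definitions, as in the patch's linux/immediate.h. */
#define DEFINE_IMMEDIATE_TYPE(type, name) \
	typedef struct { type value; } name
DEFINE_IMMEDIATE_TYPE(int, immediate_int_t);
#define IMMEDIATE_INIT(i)		{ (i) }
#define _immediate_read(var)		(var)->value
#define immediate_read(var)		_immediate_read(var)
#define immediate_set(var, i)		((var)->value = (i))
#define immediate_if(var)		if (immediate_read(var))

/* Hypothetical feature switch and counter, not part of the patch. */
static immediate_int_t tracing_enabled = IMMEDIATE_INIT(0);
static unsigned long events_seen;

/* A hot path that only pays for the feature when it is switched on. */
static void hot_path(void)
{
	immediate_if (&tracing_enabled)
		events_seen++;
}

/* Toggle the switch around a few calls and return the number counted. */
static unsigned long run_demo(void)
{
	assert(immediate_read(&tracing_enabled) == 0);
	events_seen = 0;
	hot_path();				/* disabled: not counted */
	immediate_set(&tracing_enabled, 1);
	hot_path();				/* enabled: counted */
	hot_path();				/* enabled: counted */
	immediate_set(&tracing_enabled, 0);
	hot_path();				/* disabled again */
	return events_seen;
}
```

Because the variable is wrapped in immediate_int_t, a plain assignment such as
"tracing_enabled = 1" does not compile, which steers updates through
immediate_set().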

> 4) numbers! it's very entertaining to look at numbers with 4 digits
>    after comma and without any attempts to do error estimates.

Here are the tests redone with error estimates:

10 runs of 100 iterations each (I don't have time to redo the 10000
iterations I did the first time, sorry; the stats will take care of
showing the errors appropriately). Tests done on a 3GHz P4. Here I run
getppid with syscall trace inactive, comparing with and without memory
pressure (sorry, my system is not set up to execute syscall_trace
this time, but it will make the point anyway).

No memory pressure
Reading timestamps:     150.92 cycles,     std dev.    1.01 cycles
getppid:               1462.09 cycles,     std dev.   18.87 cycles

With memory pressure
Reading timestamps:     578.22 cycles,     std dev.  269.51 cycles
getppid:              17113.33 cycles,     std dev. 1655.92 cycles


Now for memory read timing: (10 runs, branches per test: 100000)
Memory read based branch:
                       644.09 cycles,      std dev.   11.39 cycles
L1 cache hit based branch:
                        88.16 cycles,      std dev.    1.35 cycles


So, now that we have the raw results, let's calculate:

Memory read:
644.09±11.39 - 88.16±1.35 = 555.93±11.46 cycles

Getppid without memory pressure:
1462.09±18.87 - 150.92±1.01 = 1311.17±18.90 cycles

Getppid with memory pressure:
17113.33±1655.92 - 578.22±269.51 = 16535.11±1677.71 cycles

Therefore, if we add 2 markers not based on immediate values to the getppid
code, which would add 2 memory reads, we would add
2 * 555.93±11.46 = 1111.86±22.92 cycles

Therefore,

1111.86±22.92 / 16535.11±1677.71 = 0.0672
 relative error: sqrt(((22.92/1111.86)^2)+((1677.71/16535.11)^2))
                     = 0.1035
 absolute error: 0.1035 * 0.0672 = 0.0070

Therefore: 0.0672±0.0070 * 100% = 6.72±0.70 %

We can therefore affirm that adding 2 markers to getppid, on a system
with high memory pressure, would have a performance hit of at least 6.0%
on the system call time, all within the uncertainty limits of these
tests. The same applies to other kernel code paths. The smaller those
code paths are, the higher the impact ratio will be.
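
The error propagation above can be checked with a small userspace sketch. It
assumes the measurements are independent, so that absolute errors add in
quadrature for a difference and relative errors add in quadrature for a
quotient. struct meas and the meas_*() helpers are hypothetical names, and
sqrt_newton() is a stand-in for sqrt() so the sketch builds without libm.

```c
#include <assert.h>

/* A measured value with its standard deviation, in cycles. */
struct meas {
	double val;
	double err;
};

/* Newton iteration square root, used only to avoid linking with -lm. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* a - b: absolute errors add in quadrature. */
static struct meas meas_sub(struct meas a, struct meas b)
{
	struct meas r = { a.val - b.val,
			  sqrt_newton(a.err * a.err + b.err * b.err) };
	return r;
}

/* k * a for an exact constant k: value and error both scale by k. */
static struct meas meas_scale(double k, struct meas a)
{
	struct meas r = { k * a.val, k * a.err };
	return r;
}

/* a / b: relative errors add in quadrature. */
static struct meas meas_div(struct meas a, struct meas b)
{
	struct meas r;
	double ra, rb;

	assert(a.val != 0.0 && b.val != 0.0);
	ra = a.err / a.val;
	rb = b.err / b.val;
	r.val = a.val / b.val;
	r.err = r.val * sqrt_newton(ra * ra + rb * rb);
	return r;
}

/* Cost of 2 memory-read markers relative to getppid under memory pressure. */
static struct meas marker_overhead(void)
{
	struct meas mem_branch = { 644.09, 11.39 };
	struct meas l1_branch = { 88.16, 1.35 };
	struct meas getppid_pressure = { 17113.33, 1655.92 };
	struct meas tsc_pressure = { 578.22, 269.51 };
	struct meas mem_read = meas_sub(mem_branch, l1_branch);
	struct meas syscall = meas_sub(getppid_pressure, tsc_pressure);

	return meas_div(meas_scale(2.0, mem_read), syscall);
}
```

marker_overhead() returns a ratio of about 0.067 with an uncertainty of about
0.007, which matches the 6.72±0.70 % figure.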


> 5) examples!
> 
> 	DEFINE_IMMEDIATE_TYPE(struct task_struct*, immediate_task_struct_ptr_t);
> 	immediate_task_struct_ptr_t myptr;
> 
>    this is close to hungarian notation, and threats were made to use
>    immediate stuff everywhere.

Same reason as 2). Note that the standard case is immediate_int_t,
immediate_long_t... and is not as bad as having to express the whole
pointer type in the type name. That's the best tradeoff between
restricting direct update of these pointers and code readability I have
seen. I am open to better solutions...


> 6) hundreds lines of new code playing with dynamic modifying
> 
> 
> For what you ask?
> 
> 
> For 1 (one) cacheline saving in schedule() which
> unlike getpid/getppid is non-trivial function, so all rosy numbers about
> speed savings aren't really applicable. And nobody measured how big it is
> on macro benchmark.
> 
> 
> So, this stuff should be put into CoolHacks case and put aside.
> It just not worth it!
> 

I think you are missing the whole point there. The scheduler profiling
patch is only an example of how immediate values can be used. Timer
stats would fit perfectly too, as would blktrace... actually, any feature
that is very useful to have in a distro, but for which the choice currently
must be made at compile time, can now be temporarily activated at runtime
when needed.

"unlike getpid/getppid is non-trivial function": I am instrumenting a
*lot* of kernel paths with LTTng, including do_syscall_entry and
do_syscall_exit, which happen to be executed upon *all* system calls
when tracing is active. I used getppid (tracing inactive, but
do_syscall_trace forced) to exemplify the performance difference on a
simple system call, but you must note that it applies to _every_ system
call in the kernel. I've seen people complain when the impact of a dormant
tracer on the select() system call is noticeable _at all_, so, yes, a
3.5% impact on getppid is meaningful.

> A side note, folks at KS can discuss barrier for merging speed improvements.
> My feelings are around 1%.
> 

Should they also discuss what performance deterioration is acceptable for
new infrastructure such as a tracer? In the end, they may decide that
"it can be compiled away anyway", but distros will not ship kernels with
features that slow down the kernel, and people will still complain that
Linux does not ship with a tracing alternative comparable to DTrace.

The question that arises when selecting a new feature is not "what is
the average performance improvement of this", but rather "what is the
worst case performance hit". This is exactly what I address.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 2/8] Immediate Values - Architecture Independent Code
  2007-08-12 15:07 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2007-08-13 20:51   ` Alexey Dobriyan
  2007-08-16 16:02     ` Mathieu Desnoyers
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Dobriyan @ 2007-08-13 20:51 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Sun, Aug 12, 2007 at 11:07:04AM -0400, Mathieu Desnoyers wrote:
> Immediate values are used as read mostly variables that are rarely updated. They
> use code patching to modify the values inscribed in the instruction stream. It
> provides a way to save precious cache lines that would otherwise have to be used
> by these variables.
> 
> There is a generic _immediate_read() version, which uses standard global
> variables, and optimized per architecture immediate_read() implementations,
> which use a load immediate to remove a data cache hit. When the immediate values
> functionnality is disabled in the kernel, it falls back to global variables.
> 
> It adds a new rodata section "__immediate" to place the pointers to the enable
> value. Immediate values activation functions sits in kernel/immediate.c.
> 
> Immediate values refer to the memory address of a previously declared integer.
> This integer holds the information about the state of the immediate values
> associated, and must be accessed through the API found in linux/immediate.h.
> 
> At module load time, each immediate value is checked to see if it must be
> enabled. It would be the case if the variable they refer to is exported from
> another module and already enabled.
> 
> In the early stages of start_kernel(), the immediate values are updated to
> reflect the state of the variable they refer to.
> 
> * Why should this be merged *
> 
> It improves performances on heavy memory I/O workloads.
> 
> An interesting result shows the potential this infrastructure has by
> showing the slowdown a simple system call such as getppid() suffers when it is
> used under heavy user-space cache trashing:
> 
> Random walk L1 and L2 trashing surrounding a getppid() call:
> (note: in this test, do_syscal_trace was taken at each system call, see
> Documentation/immediate.txt in these patches for details)
> - No memory pressure :   getppid() takes  1573 cycles
> - With memory pressure : getppid() takes 15589 cycles
> 
> We therefore have a slowdown of 10 times just to get the kernel variables from
> memory. Another test on the same architecture (Intel P4) measured the memory
> latency to be 559 cycles. Therefore, each cache line removed from the hot path
> would improve the syscall time of 3.5% in these conditions.

I still think this is bad idea:
1) already existing CPU erratas against popular models. As I learned
   yesterday -- SILENT reboot (often) or hang (rare).
2) new type pretending to be standard and idiomatic -- immediate_t
3) new almost-C-keyword -- immediate_if. What does it mean? You'll never
   guess just by looking at it.
4) numbers! it's very entertaining to look at numbers with 4 digits
   after comma and without any attempts to do error estimates.
5) examples!

	DEFINE_IMMEDIATE_TYPE(struct task_struct*, immediate_task_struct_ptr_t);
	immediate_task_struct_ptr_t myptr;

   this is close to hungarian notation, and threats were made to use
   immediate stuff everywhere.
6) hundreds lines of new code playing with dynamic modifying


For what you ask?


For 1 (one) cacheline saving in schedule() which
unlike getpid/getppid is non-trivial function, so all rosy numbers about
speed savings aren't really applicable. And nobody measured how big it is
on macro benchmark.


So, this stuff should be put into CoolHacks case and put aside.
It just not worth it!




A side note, folks at KS can discuss barrier for merging speed improvements.
My feelings are around 1%.



* [patch 2/8] Immediate Values - Architecture Independent Code
  2007-08-12 15:07 [patch 0/8] Immediate Values Mathieu Desnoyers
@ 2007-08-12 15:07 ` Mathieu Desnoyers
  2007-08-13 20:51   ` Alexey Dobriyan
  0 siblings, 1 reply; 17+ messages in thread
From: Mathieu Desnoyers @ 2007-08-12 15:07 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 15624 bytes --]

Immediate values are read-mostly variables that are rarely updated. They use
code patching to modify the values inscribed in the instruction stream. This
saves precious cache lines that would otherwise have to be used by these
variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per-architecture immediate_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionality is disabled in the kernel, it falls back to global variables.

This patch adds a new rodata section "__immediate" to place the pointers to the
enable value. The immediate values activation functions sit in
kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. This is the case if the variable it refers to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performance on heavy memory I/O workloads.

An interesting result shows the potential of this infrastructure by showing the
slowdown that a simple system call such as getppid() suffers when it is used
under heavy user-space cache thrashing:

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of about 10 times just to get the kernel variables
from memory. Another test on the same architecture (Intel P4) measured the
memory latency to be 559 cycles. Therefore, each cache line removed from the hot
path would improve the syscall time by about 3.5% in these conditions.
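
The two ratios can be rechecked from the cycle counts quoted above. The helper
names below are hypothetical; only the three constants come from this message.
The results come out at roughly 9.9 for the slowdown and roughly 3.6% per cache
line, in line with the rounded figures given in the text.

```c
/* Cycle counts quoted in this message (Intel P4). */
#define CYCLES_NO_PRESSURE	1573.0
#define CYCLES_WITH_PRESSURE	15589.0
#define CYCLES_MEMORY_LATENCY	559.0

/* Slowdown of getppid() under memory pressure, as a plain ratio. */
static double slowdown_factor(void)
{
	return CYCLES_WITH_PRESSURE / CYCLES_NO_PRESSURE;
}

/* Share of the loaded syscall time spent on one memory access, in percent. */
static double percent_per_cache_line(void)
{
	return 100.0 * CYCLES_MEMORY_LATENCY / CYCLES_WITH_PRESSURE;
}
```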

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  119 +++++++++++++++++++++++++++++++++++
 include/linux/module.h            |    6 +
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  128 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   16 ++++
 7 files changed, 279 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-08-07 13:30:21.000000000 -0400
@@ -0,0 +1,119 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+#include <asm/immediate.h>
+#else
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+struct module;
+
+/**
+ * immediate_read - read immediate variable
+ * @var: pointer of type immediate_*_t
+ *
+ * Reads the value of @var.
+ */
+#define immediate_read(var)		_immediate_read(var)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(var, i)		((var)->value = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Must be called with module_mutex held.
+ */
+#define _immediate_set(var, i)		immediate_set(var, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @var: pointer of type immediate_*_t
+ * @i: required value
+ *
+ * Sets the value of @var. Should be used for early boot updates.
+ */
+#define immediate_set_early(var, i)	immediate_set(var, i)
+
+/**
+ * immediate_if - if () statement depending on an immediate value
+ * @var: pointer of type immediate_*_t
+ *
+ * Use as an if () statement depending on an immediate value.
+ */
+#define immediate_if(var)		if (immediate_read(var))
+
+/*
+ * Internal update functions.
+ */
+static inline void module_immediate_setup(struct module *mod) { }
+static inline void immediate_update_early(void) { }
+#endif
+
+/**
+ * DEFINE_IMMEDIATE_TYPE - Define an immediate type
+ * @type: type that the immediate should hold
+ * @name: name of the immediate type
+ *
+ * Define new immediate types. Naming scheme is immediate_*_t.
+ * Always access these types with the provided functions.
+ */
+#define DEFINE_IMMEDIATE_TYPE(type, name) \
+	typedef struct { type value; } name
+
+/*
+ * Standard pre-defined immediate types.
+ */
+DEFINE_IMMEDIATE_TYPE(char, immediate_char_t);
+DEFINE_IMMEDIATE_TYPE(short, immediate_short_t);
+DEFINE_IMMEDIATE_TYPE(int, immediate_int_t);
+DEFINE_IMMEDIATE_TYPE(long, immediate_long_t);
+DEFINE_IMMEDIATE_TYPE(void*, immediate_void_ptr_t);
+
+/**
+ * IMMEDIATE_INIT - Static initialization of an immediate variable
+ * @i: required value
+ *
+ * Use this macro to initialize an immediate value to an initial static
+ * value.
+ */
+#define IMMEDIATE_INIT(i)		{ (i) }
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @var: pointer of type immediate_*_t
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(var)		(var)->value
+
+/*
+ * _immediate_if - if () statement depending on immediate value (memory load)
+ * @var: pointer of type immediate_*_t
+ *
+ * Force the use of a normal if () statement depending on an immediate value.
+ */
+#define _immediate_if(var)		if (_immediate_read(var))
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-08-07 13:29:55.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-08-07 13:29:52.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-08-07 13:29:55.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -374,6 +375,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-08-07 13:29:52.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-08-07 13:29:55.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1719,6 +1720,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex = 0;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1815,6 +1817,9 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+#ifdef CONFIG_IMMEDIATE
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
+#endif
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1825,6 +1830,8 @@ static struct module *load_module(void _
 #endif
 	if (unwindex)
 		sechdrs[unwindex].sh_flags |= SHF_ALLOC;
+	if (immediateindex)
+		sechdrs[immediateindex].sh_flags |= SHF_ALLOC;
 
 	/* Check module struct version now, before we try to use module. */
 	if (!check_modstruct_version(sechdrs, versindex, mod)) {
@@ -1965,6 +1972,13 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	if (immediateindex) {
+		mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+		mod->num_immediate =
+			sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+	}
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2031,6 +2045,8 @@ static struct module *load_module(void _
 	 }
 #endif
 
+	module_immediate_setup(mod);
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-08-07 13:29:55.000000000 -0400
@@ -57,6 +57,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-08-07 13:30:27.000000000 -0400
@@ -0,0 +1,128 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/*
+ * Sets a range of immediates to an enabled state: set the enable bit.
+ */
+static inline void _immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void *)iter->var,
+					    (void *)iter->immediate, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+
+#ifdef CONFIG_MODULES
+/**
+ * module_immediate_setup - Update immediate values in a module
+ * @mod: pointer to the struct module
+ *
+ * Setup the immediate according to the variable upon which it depends.  Called
+ * by load_module with module_mutex held. This mutex protects against concurrent
+ * modifications to modules' immediates. Therefore, since
+ * module_immediate_setup() does not modify builtin immediates, it does not need
+ * to take the immediate_mutex.
+ */
+void module_immediate_setup(struct module *mod)
+{
+	_immediate_update_range(mod->immediate,
+				mod->immediate+mod->num_immediate);
+}
+
+/*
+ * immediate mutex nests inside the modules mutex.
+ */
+static inline void immediate_update_modules(int lock)
+{
+	struct module *mod;
+
+	if (lock)
+		mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		_immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+	if (lock)
+		mutex_unlock(&module_mutex);
+}
+#else
+static inline void immediate_update_modules(int lock) { }
+#endif
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should module_mutex be taken?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void immediate_update(int lock)
+{
+	/* Core kernel immediates */
+	_immediate_update_range(__start___immediate, __stop___immediate);
+	/* immediates in modules. */
+	immediate_update_modules(lock);
+}
+EXPORT_SYMBOL_GPL(immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-08-07 13:28:12.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-08-07 13:29:55.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -521,6 +522,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Thread overview: 17+ messages
2007-08-27 15:59 [patch 0/8] Immediate Values Mathieu Desnoyers
2007-08-27 15:59 ` [patch 1/8] Immediate Values - Global Modules List and Module Mutex Mathieu Desnoyers
2007-08-27 15:59 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-08-27 15:59 ` [patch 3/8] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2007-08-27 15:59 ` [patch 4/8] Immediate Values - Move Kprobes i386 restore_interrupt to kdebug.h Mathieu Desnoyers
2007-08-27 15:59 ` [patch 5/8] Immediate Values - i386 Optimization Mathieu Desnoyers
2007-08-27 15:59 ` [patch 6/8] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2007-08-27 15:59 ` [patch 7/8] Immediate Values - Documentation Mathieu Desnoyers
2007-09-20 10:46   ` Denys Vlasenko
2007-09-21 13:31     ` Mathieu Desnoyers
2007-09-21 15:51       ` Denys Vlasenko
2007-08-27 15:59 ` [patch 8/8] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2007-09-06 20:02 [patch 0/8] Immediate Values for 2.6.23-rc4-mm1 Mathieu Desnoyers
2007-09-06 20:02 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-08-20 20:23 [patch 0/8] Immediate Values Mathieu Desnoyers
2007-08-20 20:23 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-08-12 15:07 [patch 0/8] Immediate Values Mathieu Desnoyers
2007-08-12 15:07 ` [patch 2/8] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-08-13 20:51   ` Alexey Dobriyan
2007-08-16 16:02     ` Mathieu Desnoyers
