LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/7] Immediate Values
@ 2008-02-02 21:08 Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel

Hi Andrew,

Here are the updated immediate values for 2.6.24-git12.

Dependencies :

# instrumentation menu removal #merged in kbuild.git
fix-arm-to-play-nicely-with-generic-instrumentation-menu.patch
create-arch-kconfig.patch
add-have-oprofile.patch
add-have-kprobes.patch
move-kconfiginstrumentation-to-arch-kconfig-and-init-kconfig.patch
#
#Kprobes mutex cleanup
kprobes-use-mutex-for-insn-pages.patch
kprobes-dont-use-kprobes-mutex-in-arch-code.patch
kprobes-declare-kprobes-mutex-static.patch
#Enhance DEBUG_RODATA support
x86-enhance-debug-rodata-support-alternatives.patch
x86-enhance-debug-rodata-support-for-hotplug-and-kprobes.patch
#Text Edit Lock (depends on Enhance DEBUG_RODATA and kprobes mutex cleanup)
text-edit-lock-architecture-independent-code.patch
text-edit-lock-kprobes-architecture-independent-support.patch

The patchset applies in the following order :

#Immediate Values
immediate-values-architecture-independent-code.patch
immediate-values-kconfig-menu-in-embedded.patch
immediate-values-x86-optimization.patch
add-text-poke-and-sync-core-to-powerpc.patch
immediate-values-powerpc-optimization.patch
immediate-values-documentation.patch
#
scheduler-profiling-use-immediate-values.patch
#

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-26 22:52   ` Jason Baron
  2008-02-02 21:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers, Rusty Russell

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 18592 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _imv_read() version, which uses standard global
variables, and optimized per architecture imv_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionnality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__imv" to place the pointers to the enable
value. Immediate values activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performances on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache trashing:

Random walk L1 and L2 trashing surrounding a getppid() call:
(note: in this test, do_syscal_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time of 3.5% in these conditions.

Changelog:

- section __imv is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove imv_*_t types, add DECLARE_IMV() and DEFINE_IMV().
  - imv_read(&var) becomes imv_read(var) because of this.
- Adding a new EXPORT_IMV_SYMBOL(_GPL).
- remove imv_if(). Should use if (unlikely(imv_read(var))) instead.
  - Wait until we have gcc support before we add the imv_if macro, since
    its form may have to change.
- Dont't declare the __imv section in vmlinux.lds.h, just put the content
  in the rodata section.
- Simplify interface : remove imv_set_early, keep track of kernel boot
  status internally.
- Remove the ALIGN(8) before the __imv section. It is packed now.
- Uses an IPI busy-loop on each CPU with interrupts disabled as a simple,
  architecture agnostic, update mechanism.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 include/asm-generic/vmlinux.lds.h |    3 
 include/linux/immediate.h         |   94 +++++++++++++++++++
 include/linux/module.h            |   16 +++
 init/main.c                       |    8 +
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  187 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   50 +++++++++-
 7 files changed, 358 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2008-02-02 15:54:04.000000000 -0500
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)						\
+	do {								\
+		name##__imv = (i);					\
+		core_imv_update();					\
+		module_imv_update();					\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_imv_update(void);
+extern void imv_update_range(const struct __imv *begin,
+	const struct __imv *end);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define imv_read(name)			_imv_read(name)
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)		(name##__imv = (i))
+
+static inline void core_imv_update(void) { }
+static inline void module_imv_update(void) { }
+
+#endif
+
+#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
+#define DEFINE_IMV(type, name)  __typeof__(type) name##__imv
+
+#define EXPORT_IMV_SYMBOL(name) EXPORT_SYMBOL(name##__imv)
+#define EXPORT_IMV_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__imv)
+
+/**
+ * _imv_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _imv_read(name)		(name##__imv)
+
+#endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2008-02-02 15:54:04.000000000 -0500
+++ linux-2.6-lttng/include/linux/module.h	2008-02-02 15:54:04.000000000 -0500
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <linux/marker.h>
 #include <asm/local.h>
 
@@ -355,6 +356,10 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+#ifdef CONFIG_IMMEDIATE
+	const struct __imv *immediate;
+	unsigned int num_immediate;
+#endif
 #ifdef CONFIG_MARKERS
 	struct marker *markers;
 	unsigned int num_markers;
@@ -467,6 +472,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void _module_imv_update(void);
+extern void module_imv_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -572,6 +580,14 @@ static inline void module_update_markers
 {
 }
 
+static inline void _module_imv_update(void)
+{
+}
+
+static inline void module_imv_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2008-02-02 15:54:04.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2008-02-02 15:54:04.000000000 -0500
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1716,6 +1717,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
 	struct module *mod;
@@ -1814,6 +1816,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__imv");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1965,6 +1968,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2032,11 +2040,16 @@ static struct module *load_module(void _
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!(mod->taints & TAINT_FORCED_MODULE)) {
 #ifdef CONFIG_MARKERS
-	if (!(mod->taints & TAINT_FORCED_MODULE))
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_IMMEDIATE
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+	}
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2573,3 +2586,38 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Module_mutex must be held be the caller.
+ */
+void _module_imv_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_imv_update);
+
+/**
+ * module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_imv_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_imv_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_imv_update);
+#endif
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2008-02-02 15:57:23.000000000 -0500
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+
+/*
+ * Kernel ready to execute the SMP update that may depend on trap and ipi.
+ */
+static int imv_early_boot_complete;
+
+extern const struct __imv __start___imv[];
+extern const struct __imv __stop___imv[];
+
+/*
+ * imv_mutex nests inside module_mutex. imv_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(imv_mutex);
+
+static atomic_t wait_sync;
+
+struct ipi_loop_data {
+	long value;
+	const struct __imv *imv;
+} loop_data;
+
+static void ipi_busy_loop(void *arg)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > loop_data.value);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > 0);
+	/*
+	 * Issuing a synchronizing instruction must be done on each CPU before
+	 * reenabling interrupts after modifying an instruction. Required by
+	 * Intel's errata.
+	 */
+	sync_core();
+	flush_icache_range(loop_data.imv->imv,
+		loop_data.imv->imv + loop_data.imv->size);
+	local_irq_restore(flags);
+}
+
+/**
+ * apply_imv_update - update one immediate value
+ * @imv: pointer of type const struct __imv to update
+ *
+ * Update one immediate value. Must be called with imv_mutex held.
+ * It makes sure all CPUs are not executing the modified code by having them
+ * busy looping with interrupts disabled.
+ * It does _not_ protect against NMI and MCE (could be a problem with Intel's
+ * errata if we use immediate values in their code path).
+ */
+static int apply_imv_update(const struct __imv *imv)
+{
+	unsigned long flags;
+	long online_cpus;
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (imv->size) {
+	case 1:	if (*(uint8_t *)imv->imv
+				== *(uint8_t *)imv->var)
+			return 0;
+		break;
+	case 2:	if (*(uint16_t *)imv->imv
+				== *(uint16_t *)imv->var)
+			return 0;
+		break;
+	case 4:	if (*(uint32_t *)imv->imv
+				== *(uint32_t *)imv->var)
+			return 0;
+		break;
+	case 8:	if (*(uint64_t *)imv->imv
+				== *(uint64_t *)imv->var)
+			return 0;
+		break;
+	default:return -EINVAL;
+	}
+
+	if (imv_early_boot_complete) {
+		kernel_text_lock();
+		get_online_cpus();
+		online_cpus = num_online_cpus();
+		atomic_set(&wait_sync, 2 * online_cpus);
+		loop_data.value = online_cpus;
+		loop_data.imv = imv;
+		smp_call_function(ipi_busy_loop, NULL, 1, 0);
+		local_irq_save(flags);
+		atomic_dec(&wait_sync);
+		do {
+			/* Make sure the wait_sync gets re-read */
+			smp_mb();
+		} while (atomic_read(&wait_sync) > online_cpus);
+		text_poke((void *)imv->imv, (void *)imv->var,
+				imv->size);
+		/*
+		 * Make sure the modified instruction is seen by all CPUs before
+		 * we continue (visible to other CPUs and local interrupts).
+		 */
+		wmb();
+		atomic_dec(&wait_sync);
+		flush_icache_range(imv->imv,
+				imv->imv + imv->size);
+		local_irq_restore(flags);
+		put_online_cpus();
+		kernel_text_unlock();
+	} else
+		text_poke_early((void *)imv->imv, (void *)imv->var,
+				imv->size);
+	return 0;
+}
+
+/**
+ * imv_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Updates a range of immediates.
+ */
+void imv_update_range(const struct __imv *begin,
+		const struct __imv *end)
+{
+	const struct __imv *iter;
+	int ret;
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&imv_mutex);
+		ret = apply_imv_update(iter);
+		if (imv_early_boot_complete && ret)
+			printk(KERN_WARNING
+				"Invalid immediate value. "
+				"Variable at %p, "
+				"instruction at %p, size %hu\n",
+				(void *)iter->imv,
+				(void *)iter->var, iter->size);
+		mutex_unlock(&imv_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(imv_update_range);
+
+/**
+ * imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void core_imv_update(void)
+{
+	/* Core kernel imvs */
+	imv_update_range(__start___imv, __stop___imv);
+}
+EXPORT_SYMBOL_GPL(core_imv_update);
+
+void __init imv_init_complete(void)
+{
+	imv_early_boot_complete = 1;
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/init/main.c	2008-02-02 15:54:04.000000000 -0500
@@ -57,6 +57,7 @@
 #include <linux/device.h>
 #include <linux/kthread.h>
 #include <linux/sched.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -101,6 +102,11 @@ static inline void mark_rodata_ro(void) 
 #ifdef CONFIG_TC
 extern void tc_init(void);
 #endif
+#ifdef CONFIG_IMMEDIATE
+extern void imv_init_complete(void);
+#else
+static inline void imv_init_complete(void) { }
+#endif
 
 enum system_states system_state;
 EXPORT_SYMBOL(system_state);
@@ -522,6 +528,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	cgroup_init_early();
+	core_imv_update();
 
 	local_irq_disable();
 	early_boot_irqs_off();
@@ -645,6 +652,7 @@ asmlinkage void __init start_kernel(void
 	cpuset_init();
 	taskstats_init_early();
 	delayacct_init();
+	imv_init_complete();
 
 	check_bugs();
 
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/kernel/Makefile	2008-02-02 15:54:04.000000000 -0500
@@ -63,6 +63,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 obj-$(CONFIG_MARKERS) += marker.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2008-02-02 15:54:04.000000000 -0500
@@ -61,6 +61,9 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
+		VMLINUX_SYMBOL(__start___imv) = .;			\
+		*(__imv)		/* Immediate values: pointers */ \
+		VMLINUX_SYMBOL(__stop___imv) = .;			\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 3/7] Immediate Values - x86 Optimization Mathieu Desnoyers
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Adrian Bunk, Andi Kleen,
	Alexey Dobriyan, Christoph Hellwig

[-- Attachment #1: immediate-values-kconfig-menu-in-embedded.patch --]
[-- Type: text/plain, Size: 2661 bytes --]

Immediate values provide a way to use dynamic code patching to update variables
sitting within the instruction stream. It saves caches lines normally used by
static read mostly variables. Enable it by default, but let users disable it
through the EMBEDDED menu with the "Disable immediate values" submenu entry.

Note: Since I think that I really should let embedded systems developers using
RO memory the option to disable the immediate values, I choose to leave this
menu option there, in the EMBEDDED menu. Also, the "CONFIG_IMMEDIATE" makes
sense because we want to compile out all the immediate code when we decide not
to use optimized immediate values at all (it removes otherwise unused code).

Changelog:
- Change ARCH_SUPPORTS_IMMEDIATE for ARCH_HAS_IMMEDIATE

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
---
 init/Kconfig |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2008-01-29 15:26:54.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig	2008-01-29 15:27:17.000000000 -0500
@@ -444,6 +444,20 @@ config CC_OPTIMIZE_FOR_SIZE
 config SYSCTL
 	bool
 
+config IMMEDIATE
+	default y if !DISABLE_IMMEDIATE
+	depends on HAVE_IMMEDIATE
+	bool
+	help
+	  Immediate values are used as read-mostly variables that are rarely
+	  updated. They use code patching to modify the values inscribed in the
+	  instruction stream. It provides a way to save precious cache lines
+	  that would otherwise have to be used by these variables. They can be
+	  disabled through the EMBEDDED menu.
+
+config HAVE_IMMEDIATE
+	def_bool n
+
 menuconfig EMBEDDED
 	bool "Configure standard kernel features (for small systems)"
 	help
@@ -679,6 +693,16 @@ config MARKERS
 
 source "arch/Kconfig"
 
+config DISABLE_IMMEDIATE
+	default y if EMBEDDED
+	bool "Disable immediate values" if EMBEDDED
+	depends on HAVE_IMMEDIATE
+	help
+	  Disable code patching based immediate values for embedded systems. It
+	  consumes slightly more memory and requires to modify the instruction
+	  stream each time a variable is updated. Should really be disabled for
+	  embedded systems with read-only text.
+
 endmenu		# General setup
 
 config SLABINFO

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 3/7] Immediate Values - x86 Optimization
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 4/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Andi Kleen, H. Peter Anvin, Chuck Ebbert,
	Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner,
	Ingo Molnar, Rusty Russell

[-- Attachment #1: immediate-values-x86-optimization.patch --]
[-- Type: text/plain, Size: 5077 bytes --]

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
  non atomic writes to a code region only touched by us (nobody can execute it
  since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.

Ok, so the most flexible solution that I see, that should fit for both
x86 and x86_64 would be :
1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
                       x86-64 it is equivalent to "r" class (for 8-bit
                       instructions that do not use upper halves).
2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
                       general register.

- "Redux" immediate values : no need to put a breakpoint, therefore, no
  need to know where the instruction starts. It's therefore OK to have a
  REX prefix.

- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
  immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
  Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 arch/x86/Kconfig            |    1 
 include/asm-x86/immediate.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/immediate.h	2008-01-30 09:33:22.000000000 -0500
@@ -0,0 +1,77 @@
+#ifndef _ASM_X86_IMMEDIATE_H
+#define _ASM_X86_IMMEDIATE_H
+
+/*
+ * Immediate values. x86 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ * If size is bigger than the architecture long size, fall back on a memory
+ * read.
+ *
+ * Make sure to populate the initial static 64 bits opcode with a value
+ * what will generate an instruction with 8 bytes immediate value (not the REX.W
+ * prefixed one that loads a sign extended 32 bits immediate value in a r64
+ * register).
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=q" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 2:							\
+		case 4:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 8:							\
+			if (sizeof(long) < 8) {				\
+				value = name##__imv;			\
+				break;					\
+			}						\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0xFEFEFEFE01010101,%0\n\t" 	\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_X86_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig	2008-01-30 09:14:21.000000000 -0500
+++ linux-2.6-lttng/arch/x86/Kconfig	2008-01-30 09:33:22.000000000 -0500
@@ -20,6 +20,7 @@ config X86
 	def_bool y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_IMMEDIATE
 
 config GENERIC_LOCKBREAK
 	def_bool n

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 4/7] Add text_poke and sync_core to powerpc
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2008-02-02 21:08 ` [patch 3/7] Immediate Values - x86 Optimization Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 5/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Christoph Hellwig, Paul Mackerras

[-- Attachment #1: add-text-poke-and-sync-core-to-powerpc.patch --]
[-- Type: text/plain, Size: 1355 bytes --]

- Needed on architectures where we must surround live instruction modification
  with "WP flag disable".
- Turns into a memcpy on powerpc since there is no WP flag activated for
  instruction pages (yet..).
- Add empty sync_core to powerpc so it can be used in architecture independent
  code.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
---
 include/asm-powerpc/cacheflush.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-powerpc/cacheflush.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/cacheflush.h	2007-11-19 12:05:50.000000000 -0500
+++ linux-2.6-lttng/include/asm-powerpc/cacheflush.h	2007-11-19 13:27:36.000000000 -0500
@@ -63,7 +63,9 @@ extern void flush_dcache_phys_range(unsi
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 	memcpy(dst, src, len)
 
-
+#define text_poke	memcpy
+#define text_poke_early	text_poke
+#define sync_core()
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 /* internal debugging function */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 5/7] Immediate Values - Powerpc Optimization
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2008-02-02 21:08 ` [patch 4/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 6/7] Immediate Values - Documentation Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 7/7] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Christoph Hellwig, Paul Mackerras

[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 3003 bytes --]

PowerPC optimization of the immediate values which uses a li instruction,
patched with an immediate value.

Changelog:
- Put imv_set and _imv_set in the architecture independent header.
- Pack the __imv section. Use smallest types required for size (char).
- Remove architecture specific update code : now handled by architecture
  agnostic code.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/Kconfig            |    1 
 include/asm-powerpc/immediate.h |   55 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2008-01-30 09:33:42.000000000 -0500
@@ -0,0 +1,55 @@
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-1)\n\t"	\
+					".byte 1\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 2:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-2)\n\t"	\
+					".byte 2\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 4:							\
+		case 8:	value = name##__imv;				\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/Kconfig	2008-01-30 09:14:21.000000000 -0500
+++ linux-2.6-lttng/arch/powerpc/Kconfig	2008-01-30 09:33:42.000000000 -0500
@@ -89,6 +89,7 @@ config PPC
 	default y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_IMMEDIATE
 
 config EARLY_PRINTK
 	bool

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 6/7] Immediate Values - Documentation
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2008-02-02 21:08 ` [patch 5/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-02 21:08 ` [patch 7/7] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers, Rusty Russell

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 8867 bytes --]

Changelog:
- Remove imv_set_early (removed from API).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 Documentation/immediate.txt |  221 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 221 insertions(+)

Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt	2008-02-01 07:42:01.000000000 -0500
@@ -0,0 +1,221 @@
+		        Using the Immediate Values
+
+			    Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+
+* Purpose of immediate values
+
+An immediate value is used to compile into the kernel variables that sit within
+the instruction stream. They are meant to be rarely updated but read often.
+Using immediate values for these variables will save cache lines.
+
+This infrastructure is specialized in supporting dynamic patching of the values
+in the instruction stream when multiple CPUs are running without disturbing the
+normal system behavior.
+
+Compiling code meant to be rarely enabled at runtime can be done using
+if (unlikely(imv_read(var))) as condition surrounding the code. The
+smallest data type required for the test (an 8 bits char) is preferred, since
+some architectures, such as powerpc, only allow up to 16 bits immediate values.
+
+
+* Usage
+
+In order to use the "immediate" macros, you should include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+DEFINE_IMV(char, this_immediate);
+EXPORT_IMV_SYMBOL(this_immediate);
+
+
+And use, in the body of a function:
+
+Use imv_set(this_immediate) to set the immediate value.
+
+Use imv_read(this_immediate) to read the immediate value.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+If you have to read the immediate values from a function declared as __init or
+__exit, you should explicitly use _imv_read(), which will fall back on a
+global variable read. Failing to do so will leave a reference to the __init
+section after it is freed (it would generate a modpost warning).
+
+You can choose to set an initial static value to the immediate by using, for
+instance:
+
+DEFINE_IMV(long, myptr) = 10;
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+
+* Performance improvement
+
+
+  * Memory hit for a data-based branch
+
+Here are the results on a 3GHz Pentium 4:
+
+number of tests: 100
+number of branches per test: 100000
+memory hit cycles per iteration (mean): 636.611
+L1 cache hit cycles per iteration (mean): 89.6413
+instruction stream based test, cycles per iteration (mean): 85.3438
+Just getting the pointer from a modulo on a pseudo-random value, doing
+  nothing with it, cycles per iteration (mean): 77.5044
+
+So:
+Base case:                      77.50 cycles
+instruction stream based test:  +7.8394 cycles
+L1 cache hit based test:        +12.1369 cycles
+Memory load based test:         +559.1066 cycles
+
+So let's say we have a ping flood coming at
+(14014 packets transmitted, 14014 received, 0% packet loss, time 1826ms)
+7674 packets per second. If we put 2 markers for irq entry/exit, it
+brings us to 15348 markers sites executed per second.
+
+(15348 exec/s) * (559 cycles/exec) / (3G cycles/s) = 0.0029
+We therefore have a 0.29% slowdown just on this case.
+
+Compared to this, the instruction stream based test will cause a
+slowdown of:
+
+(15348 exec/s) * (7.84 cycles/exec) / (3G cycles/s) = 0.00004
+For a 0.004% slowdown.
+
+If we plan to use this for memory allocation, spinlock, and all sorts of
+very high event rate tracing, we can assume it will execute 10 to 100
+times more sites per second, which brings us to 0.4% slowdown with the
+instruction stream based test compared to 29% slowdown with the memory
+load based test on a system with high memory pressure.
+
+
+
+  * Markers impact under heavy memory load
+
+Running a kernel with my LTTng instrumentation set, in a test that
+generates memory pressure (from userspace) by trashing L1 and L2 caches
+between calls to getppid() (note: syscall_trace is active and calls
+a marker upon syscall entry and syscall exit; markers are disarmed).
+This test is done in user-space, so there are some delays due to IRQs
+coming and to the scheduler. (UP 2.6.22-rc6-mm1 kernel, task with -20
+nice level)
+
+My first set of results: Linear cache trashing, turned out not to be
+very interesting, because it seems like the linearity of the memset on a
+full array is somehow detected and it does not "really" trash the
+caches.
+
+Now the most interesting result: Random walk L1 and L2 trashing
+surrounding a getppid() call.
+
+- Markers compiled out (but syscall_trace execution forced)
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.033 cycles
+getppid: 1681.4 cycles
+With memory pressure
+Reading timestamps takes 102.938 cycles
+getppid: 15691.6 cycles
+
+
+- With the immediate values based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.006 cycles
+getppid: 1681.84 cycles
+With memory pressure
+Reading timestamps takes 100.291 cycles
+getppid: 11793 cycles
+
+
+- With global variables based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 107.999 cycles
+getppid: 1669.06 cycles
+With memory pressure
+Reading timestamps takes 102.839 cycles
+getppid: 12535 cycles
+
+The result is quite interesting in that the kernel is slower without
+markers than with markers. I explain it by the fact that the data
+accessed is not laid out in the same manner in the cache lines when the
+markers are compiled in or out. It seems that it aligns the function's
+data better to compile-in the markers in this case.
+
+But since the interesting comparison is between the immediate values and
+global variables based markers, and because they share the same memory
+layout, except for the movl being replaced by a movz, we see that the
+global variable based markers (2 markers) adds 742 cycles to each system
+call (syscall entry and exit are traced and memory locations for both
+global variables lie on the same cache line).
+
+
+- Test redone with less iterations, but with error estimates
+
+10 runs of 100 iterations each: Tests done on a 3GHz P4. Here I run getppid with
+syscall trace inactive, comparing the case with memory pressure and without
+memory pressure. (sorry, my system is not setup to execute syscall_trace this
+time, but it will make the point anyway).
+
+No memory pressure
+Reading timestamps:     150.92 cycles,     std dev.    1.01 cycles
+getppid:               1462.09 cycles,     std dev.   18.87 cycles
+
+With memory pressure
+Reading timestamps:     578.22 cycles,     std dev.  269.51 cycles
+getppid:              17113.33 cycles,     std dev. 1655.92 cycles
+
+
+Now for memory read timing: (10 runs, branches per test: 100000)
+Memory read based branch:
+                       644.09 cycles,      std dev.   11.39 cycles
+L1 cache hit based branch:
+                        88.16 cycles,      std dev.    1.35 cycles
+
+
+So, now that we have the raw results, let's calculate:
+
+Memory read:
+644.09±11.39 - 88.16±1.35 = 555.93±11.46 cycles
+
+Getppid without memory pressure:
+1462.09±18.87 - 150.92±1.01 = 1311.17±18.90 cycles
+
+Getppid with memory pressure:
+17113.33±1655.92 - 578.22±269.51 = 16535.11±1677.71 cycles
+
+Therefore, if we add 2 markers not based on immediate values to the getppid
+code, which would add 2 memory reads, we would add
+2 * 555.93±12.74 = 1111.86±25.48 cycles
+
+Therefore,
+
+1111.86±25.48 / 16535.11±1677.71 = 0.0672
+ relative error: sqrt(((25.48/1111.86)^2)+((1677.71/16535.11)^2))
+                     = 0.1040
+ absolute error: 0.1040 * 0.0672 = 0.0070
+
+Therefore: 0.0672±0.0070 * 100% = 6.72±0.70 %
+
+We can therefore affirm that adding 2 markers to getppid, on a system with high
+memory pressure, would have a performance hit of at least 6.0% on the system
+call time, all within the uncertainty limits of these tests. The same applies to
+other kernel code paths. The smaller those code paths are, the highest the
+impact ratio will be.
+
+Therefore, not only is it interesting to use the immediate values to dynamically
+activate dormant code such as the markers, but I think it should also be
+considered as a replacement for many of the "read-mostly" static variables.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 7/7] Scheduler Profiling - Use Immediate Values
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2008-02-02 21:08 ` [patch 6/7] Immediate Values - Documentation Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  6 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: scheduler-profiling-use-immediate-values.patch --]
[-- Type: text/plain, Size: 5830 bytes --]

Use immediate values with lower d-cache hit in optimized version as a
condition for scheduler profiling call.

Changelog :
- Use imv_* instead of immediate_*.
- Follow the white rabbit : kvm_main.c which becomes x86.c.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kvm/x86.c      |    2 +-
 include/linux/profile.h |    5 +++--
 kernel/profile.c        |   22 +++++++++++-----------
 kernel/sched_fair.c     |    5 +----
 4 files changed, 16 insertions(+), 18 deletions(-)

Index: linux-2.6-lttng/kernel/profile.c
===================================================================
--- linux-2.6-lttng.orig/kernel/profile.c	2008-02-01 07:32:04.000000000 -0500
+++ linux-2.6-lttng/kernel/profile.c	2008-02-01 07:43:02.000000000 -0500
@@ -42,8 +42,8 @@ static int (*timer_hook)(struct pt_regs 
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
 
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
+DEFINE_IMV(char, prof_on) __read_mostly;
+EXPORT_IMV_SYMBOL_GPL(prof_on);
 
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
 #ifdef CONFIG_SMP
@@ -61,7 +61,7 @@ static int __init profile_setup(char *st
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
 #ifdef CONFIG_SCHEDSTATS
-		prof_on = SLEEP_PROFILING;
+		imv_set(prof_on, SLEEP_PROFILING);
 		if (str[strlen(sleepstr)] == ',')
 			str += strlen(sleepstr) + 1;
 		if (get_option(&str, &par))
@@ -74,7 +74,7 @@ static int __init profile_setup(char *st
 			"kernel sleep profiling requires CONFIG_SCHEDSTATS\n");
 #endif /* CONFIG_SCHEDSTATS */
 	} else if (!strncmp(str, schedstr, strlen(schedstr))) {
-		prof_on = SCHED_PROFILING;
+		imv_set(prof_on, SCHED_PROFILING);
 		if (str[strlen(schedstr)] == ',')
 			str += strlen(schedstr) + 1;
 		if (get_option(&str, &par))
@@ -83,7 +83,7 @@ static int __init profile_setup(char *st
 			"kernel schedule profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
-		prof_on = KVM_PROFILING;
+		imv_set(prof_on, KVM_PROFILING);
 		if (str[strlen(kvmstr)] == ',')
 			str += strlen(kvmstr) + 1;
 		if (get_option(&str, &par))
@@ -93,7 +93,7 @@ static int __init profile_setup(char *st
 			prof_shift);
 	} else if (get_option(&str, &par)) {
 		prof_shift = par;
-		prof_on = CPU_PROFILING;
+		imv_set(prof_on, CPU_PROFILING);
 		printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
 			prof_shift);
 	}
@@ -104,7 +104,7 @@ __setup("profile=", profile_setup);
 
 void __init profile_init(void)
 {
-	if (!prof_on)
+	if (!_imv_read(prof_on))
 		return;
 
 	/* only text is profiled */
@@ -291,7 +291,7 @@ void profile_hits(int type, void *__pc, 
 	int i, j, cpu;
 	struct profile_hit *hits;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
 	i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
@@ -401,7 +401,7 @@ void profile_hits(int type, void *__pc, 
 {
 	unsigned long pc;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
 	atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
@@ -558,7 +558,7 @@ static int __init create_hash_tables(voi
 	}
 	return 0;
 out_cleanup:
-	prof_on = 0;
+	imv_set(prof_on, 0);
 	smp_mb();
 	on_each_cpu(profile_nop, NULL, 0, 1);
 	for_each_online_cpu(cpu) {
@@ -585,7 +585,7 @@ static int __init create_proc_profile(vo
 {
 	struct proc_dir_entry *entry;
 
-	if (!prof_on)
+	if (!_imv_read(prof_on))
 		return 0;
 	if (create_hash_tables())
 		return -1;
Index: linux-2.6-lttng/include/linux/profile.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/profile.h	2008-02-01 07:32:04.000000000 -0500
+++ linux-2.6-lttng/include/linux/profile.h	2008-02-01 07:43:02.000000000 -0500
@@ -7,10 +7,11 @@
 #include <linux/init.h>
 #include <linux/cpumask.h>
 #include <linux/cache.h>
+#include <linux/immediate.h>
 
 #include <asm/errno.h>
 
-extern int prof_on __read_mostly;
+DECLARE_IMV(char, prof_on) __read_mostly;
 
 #define CPU_PROFILING	1
 #define SCHED_PROFILING	2
@@ -38,7 +39,7 @@ static inline void profile_hit(int type,
 	/*
 	 * Speedup for the common (no profiling enabled) case:
 	 */
-	if (unlikely(prof_on == type))
+	if (unlikely(imv_read(prof_on) == type))
 		profile_hits(type, ip, 1);
 }
 
Index: linux-2.6-lttng/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.orig/kernel/sched_fair.c	2008-02-01 07:34:07.000000000 -0500
+++ linux-2.6-lttng/kernel/sched_fair.c	2008-02-01 07:43:02.000000000 -0500
@@ -470,11 +470,8 @@ static void enqueue_sleeper(struct cfs_r
 		 * get a milliseconds-range estimation of the amount of
 		 * time that the task spent sleeping:
 		 */
-		if (unlikely(prof_on == SLEEP_PROFILING)) {
-
-			profile_hits(SLEEP_PROFILING, (void *)get_wchan(tsk),
+		profile_hits(SLEEP_PROFILING, (void *)get_wchan(task_of(se)),
 				     delta >> 20);
-		}
 		account_scheduler_latency(tsk, delta >> 10, 0);
 	}
 #endif
Index: linux-2.6-lttng/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kvm/x86.c	2008-02-01 07:43:17.000000000 -0500
+++ linux-2.6-lttng/arch/x86/kvm/x86.c	2008-02-01 07:43:48.000000000 -0500
@@ -2592,7 +2592,7 @@ again:
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING)) {
+	if (unlikely(imv_read(prof_on) == KVM_PROFILING)) {
 		kvm_x86_ops->cache_regs(vcpu);
 		profile_hit(KVM_PROFILING, (void *)vcpu->arch.rip);
 	}

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2008-02-26 22:52   ` Jason Baron
  2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-27 19:05     ` Mathieu Desnoyers
  0 siblings, 2 replies; 30+ messages in thread
From: Jason Baron @ 2008-02-26 22:52 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> Changelog:
> 
> - section __imv is already SHF_ALLOC
> - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
>   the if (immediateindex) is unnecessary here.
> - Remove module_mutex usage: depend on functions implemented in module.c for
>   that.

hi,

In testing this patch, i've run across a deadlock...apply_imv_update() can get
called again before, ipi_busy_loop() has had a chance to finsh, and set 
wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
deadlock below using nmi_watchdog=1 in item 1). 

After hitting this deadlock i modified apply_imv_update() to wait for 
ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
A held a read lock on tasklist_lock, then process B called apply_imv_update().
Process A received the IPI and begins executing ipi_busy_loop(). Then process
C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
process A holds up process C, and C can't get an IPI b/c interrupts are
disabled. i believe this is an inherent problem in using smp_call_function, in
that one can't synchronize the processes on each other...you can reproduce
these issues using the test module below item 2)

In order to address these issues, i've modified stop_machine_run() to take an
new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
on all cpus, item 3). I then modified kernel/immediate.c to use this new 
infrastructure item 4). the code in immediate.c simplifies quite a bit. This
new code has been robust through all testing thus far.

thanks,

-Jason

1)

 NMI Watchdog detected LOCKUP on CPU 3
CPU 3
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fbdff70  EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fbd2930
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fbdff78 R08: ffff81007fbdff78 R09: ffff81007dd91e80
R10: ffff810001030b80 R11: 0000000000000246 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001030a80
FS:  0000000000000000(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000025e8d68 CR3: 000000007549c000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fbd6000, task ffff81007fbd2930)
Stack:  0000000000000000 ffff81007fbdffa8 ffffffff8101ca4e ffff81007fbdffa8
 ffffffff8100b0e2 0000000000000003 0000000000000040 ffff81007fbd7e60
 ffffffff8100cac6 ffff81007fbd7e60 <EOI>  ffff81007fbd7ee8 0000000000000246
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6


---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 9704, comm: toggle-writer Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
RSP: 0018:ffff81003ac35da8  EFLAGS: 00000297
RAX: 00000000000008fc RBX: 000000000000000b RCX: 00000000000008fc
RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000b
RBP: ffff81003ac35e08 R08: ffffffff8840504f R09: 00007f01f07696f0
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7b3
FS:  00007f01f07696f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f01f0796000 CR3: 000000006bcdf000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process toggle-writer (pid: 9704, threadinfo ffff81003ac34000, task ffff81003ad14930)
Stack:  ffffffff8106e7b3 0000000000000000 0000000000000002 00003fff00000000
 000000000000000b ffffffff81075c03 00000010001280d2 0000000000000000
 0000000000000000 ffffffff8106e7b3 000000000000000f 0000000000000005
Call Trace:
 [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
 [<ffffffff81075c03>] ? __alloc_pages+0x8e/0x337
 [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
 [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
 [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
 [<ffffffff8106e703>] imv_update_range+0xd7/0x16e
 [<ffffffff8105494e>] _module_imv_update+0x35/0x54
 [<ffffffff81054982>] module_imv_update+0x15/0x23
 [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
 [<ffffffff810d64a4>] proc_reg_write+0x7b/0x96
 [<ffffffff81099f72>] vfs_write+0xae/0x157
 [<ffffffff8109a53f>] sys_write+0x47/0x70
 [<ffffffff8100c0e9>] tracesys+0xdc/0xe1

Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fb5f260
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: ffff810001016700
R10: ffff810001018b80 R11: 0000000000000001 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac9ac20 CR3: 00000000778e6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
 ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
 ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6

Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 7b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 82
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 0
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7e6>]  [<ffffffff8106e7e6>] ipi_busy_loop+0x33/0x41
RSP: 0018:ffffffff81498f70  EFLAGS: 00000006
RAX: 0000000000000004 RBX: 0000000000000000 RCX: ffffffff81394620
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff810062cc1e98
R10: ffff81000100cb80 R11: ffff810062cc1d58 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81466640 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac6f060 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81434000, task ffffffff81394620)
Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff8100b0e2
 ffffffff8100b0e2 0000000000000000 ffffffffffffffff ffffffff81435ec0
 ffffffff8100cac6 ffffffff81435ec0 <EOI>  ffffffff81435f48 ffff810062cc1d58
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81266ee6>] ? rest_init+0x5a/0x5c

Code: f0 ff 0d 82 6b 49 00 0f ae f0 48 63 05 78 6b 49 00 48 3b 05 5d 6b 49 00 7f ed f0 ff 0d 68
---[ end trace 738433437d960ebc ]---


2)


--- /dev/null
+++ b/samples/markers/toggle-tester.c
@@ -0,0 +1,90 @@
+/* probe-example.c
+ *
+ * toggle module to test immedidate infrastructure.
+ *
+ * (C) Copyright 2007 Jason Baron <jbaron@redhat.com>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+
+static DECLARE_MUTEX(toggle_mutex);
+DEFINE_IMV(char, toggle_value) = 0;
+
+static ssize_t proc_toggle_write(struct file *file, const char __user *buf,
+                                size_t length, loff_t *ppos)
+{
+	int i;
+
+	down(&toggle_mutex);
+	i = imv_read(toggle_value);
+	imv_set(toggle_value, !i);
+	up(&toggle_mutex);
+
+	return 0;
+}
+
+static int proc_toggle_show(struct seq_file *s, void *p)
+{
+	int i;
+
+	down(&toggle_mutex);        
+        i = imv_read(toggle_value);
+	up(&toggle_mutex);
+
+	seq_printf(s, "%u\n", i);
+
+        return 0;
+}
+
+
+static int proc_toggle_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_toggle_show, NULL);
+}
+
+
+static const struct file_operations toggle_operations = {
+	.open           = proc_toggle_open,
+	.read           = seq_read,
+	.write          = proc_toggle_write,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+	
+
+struct proc_dir_entry *pde;
+
+static int __init toggle_init(void)
+{
+	
+	pde = create_proc_entry("toggle", 0, NULL);
+	if (!pde)
+		return -ENOMEM;
+	pde->proc_fops = &toggle_operations;
+
+
+	return 0;
+}
+
+static void __exit toggle_fini(void)
+{
+	remove_proc_entry("toggle", &proc_root);
+	
+}
+
+module_init(toggle_init);
+module_exit(toggle_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jason Baron");
+MODULE_DESCRIPTION("testing immediate implementation");


3)


diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 5bfc553..1ab1c5b 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -8,6 +8,9 @@
 #include <asm/system.h>
 
 #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+
+#define RUN_ALL ~0U
+
 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 51b5ee5..e6ee46f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -23,9 +23,17 @@ enum stopmachine_state {
 	STOPMACHINE_WAIT,
 	STOPMACHINE_PREPARE,
 	STOPMACHINE_DISABLE_IRQ,
+	STOPMACHINE_RUN,
 	STOPMACHINE_EXIT,
 };
 
+struct stop_machine_data {
+	int (*fn)(void *);
+	void *data;
+	struct completion done;
+	int run_all;
+} smdata;
+
 static enum stopmachine_state stopmachine_state;
 static unsigned int stopmachine_num_threads;
 static atomic_t stopmachine_thread_ack;
@@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
 {
 	int irqs_disabled = 0;
 	int prepared = 0;
+	int ran = 0;
 
 	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
 
@@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
 			prepared = 1;
 			smp_mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
+		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
+			smdata.fn(smdata.data);
+			ran = 1;
+			smp_mb(); /* Must read state first. */
+			atomic_inc(&stopmachine_thread_ack);
 		}
 		/* Yield in first stage: migration threads need to
 		 * help our sisters onto their CPUs. */
@@ -136,12 +150,10 @@ static void restart_machine(void)
 	preempt_enable_no_resched();
 }
 
-struct stop_machine_data
+static void run_other_cpus(void)
 {
-	int (*fn)(void *);
-	void *data;
-	struct completion done;
-};
+	stopmachine_set_state(STOPMACHINE_RUN);
+}
 
 static int do_stop(void *_smdata)
 {
@@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
 	ret = stop_machine();
 	if (ret == 0) {
 		ret = smdata->fn(smdata->data);
+		if (smdata->run_all)
+			run_other_cpus();
 		restart_machine();
 	}
 
@@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
 struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
 				       unsigned int cpu)
 {
-	struct stop_machine_data smdata;
 	struct task_struct *p;
 
+	down(&stopmachine_mutex);
 	smdata.fn = fn;
 	smdata.data = data;
+	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
 	init_completion(&smdata.done);
-
-	down(&stopmachine_mutex);
-
+	smp_wmb();
 	/* If they don't care which CPU fn runs on, bind to any online one. */
-	if (cpu == NR_CPUS)
+	if (cpu == NR_CPUS || cpu == RUN_ALL)
 		cpu = raw_smp_processor_id();
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");


4)


diff --git a/kernel/immediate.c b/kernel/immediate.c
index 4c36a89..39ec13e 100644
--- a/kernel/immediate.c
+++ b/kernel/immediate.c
@@ -20,6 +20,7 @@
 #include <linux/immediate.h>
 #include <linux/memory.h>
 #include <linux/cpu.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 
@@ -30,6 +31,23 @@ static int imv_early_boot_complete;
 
 extern const struct __imv __start___imv[];
 extern const struct __imv __stop___imv[];
+int started;
+
+int stop_machine_imv_update(void *imv_ptr)
+{
+	struct __imv *imv = imv_ptr;
+
+	if (!started) {
+		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
+		started = 1;
+		smp_wmb();
+	} else
+		sync_core();
+
+	flush_icache_range(imv->imv, imv->imv + imv->size);
+
+	return 0;
+}
 
 /*
  * imv_mutex nests inside module_mutex. imv_mutex protects builtin
@@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
  */
 static DEFINE_MUTEX(imv_mutex);
 
-static atomic_t wait_sync;
-
-struct ipi_loop_data {
-	long value;
-	const struct __imv *imv;
-} loop_data;
-
-static void ipi_busy_loop(void *arg)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
-	/*
-	 * Issuing a synchronizing instruction must be done on each CPU before
-	 * reenabling interrupts after modifying an instruction. Required by
-	 * Intel's errata.
-	 */
-	sync_core();
-	flush_icache_range(loop_data.imv->imv,
-		loop_data.imv->imv + loop_data.imv->size);
-	local_irq_restore(flags);
-}
 
 /**
  * apply_imv_update - update one immediate value
@@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
  */
 static int apply_imv_update(const struct __imv *imv)
 {
-	unsigned long flags;
-	long online_cpus;
-
 	/*
 	 * If the variable and the instruction have the same value, there is
 	 * nothing to do.
@@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
 
 	if (imv_early_boot_complete) {
 		kernel_text_lock();
-		get_online_cpus();
-		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
-		loop_data.value = online_cpus;
-		loop_data.imv = imv;
-		smp_call_function(ipi_busy_loop, NULL, 1, 0);
-		local_irq_save(flags);
-		atomic_dec(&wait_sync);
-		do {
-			/* Make sure the wait_sync gets re-read */
-			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
-		text_poke((void *)imv->imv, (void *)imv->var,
-				imv->size);
-		/*
-		 * Make sure the modified instruction is seen by all CPUs before
-		 * we continue (visible to other CPUs and local interrupts).
-		 */
-		wmb();
-		atomic_dec(&wait_sync);
-		flush_icache_range(imv->imv,
-				imv->imv + imv->size);
-		local_irq_restore(flags);
-		put_online_cpus();
+		started = 0;
+		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
 		kernel_text_unlock();
 	} else
 		text_poke_early((void *)imv->imv, (void *)imv->var,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 22:52   ` Jason Baron
@ 2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-26 23:34       ` Mathieu Desnoyers
  2008-02-27 17:01       ` Jason Baron
  2008-02-27 19:05     ` Mathieu Desnoyers
  1 sibling, 2 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-26 23:12 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

* Jason Baron (jbaron@redhat.com) wrote:
> On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > Changelog:
> > 
> > - section __imv is already SHF_ALLOC
> > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> >   the if (immediateindex) is unnecessary here.
> > - Remove module_mutex usage: depend on functions implemented in module.c for
> >   that.
> 
> hi,
> 
> In testing this patch, i've run across a deadlock...apply_imv_update() can get
> called again before, ipi_busy_loop() has had a chance to finsh, and set 
> wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> deadlock below using nmi_watchdog=1 in item 1). 
> 

Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
test module in LTTng a few days ago. His fix implied to add another
barrier upon which the smp_call_function() caller must wait for the ipis
to finish. Since this immediate value code does the same I did in my
ltt-test-tsc code, the same fix will likely apply.

I'll cook something.

Hrm, does your stop machine implementation disable interrupts while the
CPUs busy loop ?

> After hitting this deadlock i modified apply_imv_update() to wait for 
> ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
> A held a read lock on tasklist_lock, then process B called apply_imv_update().
> Process A received the IPI and begins executing ipi_busy_loop(). Then process
> C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
> process A holds up process C, and C can't get an IPI b/c interrupts are
> disabled. i believe this is an inherent problem in using smp_call_function, in
> that one can't synchronize the processes on each other...you can reproduce
> these issues using the test module below item 2)
> 
> In order to address these issues, i've modified stop_machine_run() to take an
> new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
> on all cpus, item 3). I then modified kernel/immediate.c to use this new 
> infrastructure item 4). the code in immediate.c simplifies quite a bit. This
> new code has been robust through all testing thus far.
> 
> thanks,
> 
> -Jason
> 
> 1)
> 
>  NMI Watchdog detected LOCKUP on CPU 3
> CPU 3
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
> RSP: 0018:ffff81007fbdff70  EFLAGS: 00000002
> RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fbd2930
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffff81007fbdff78 R08: ffff81007fbdff78 R09: ffff81007dd91e80
> R10: ffff810001030b80 R11: 0000000000000246 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001030a80
> FS:  0000000000000000(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000025e8d68 CR3: 000000007549c000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff81007fbd6000, task ffff81007fbd2930)
> Stack:  0000000000000000 ffff81007fbdffa8 ffffffff8101ca4e ffff81007fbdffa8
>  ffffffff8100b0e2 0000000000000003 0000000000000040 ffff81007fbd7e60
>  ffffffff8100cac6 ffff81007fbd7e60 <EOI>  ffff81007fbd7ee8 0000000000000246
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
> 
> 
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 2
> CPU 2
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 9704, comm: toggle-writer Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
> RSP: 0018:ffff81003ac35da8  EFLAGS: 00000297
> RAX: 00000000000008fc RBX: 000000000000000b RCX: 00000000000008fc
> RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000b
> RBP: ffff81003ac35e08 R08: ffffffff8840504f R09: 00007f01f07696f0
> R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
> R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7b3
> FS:  00007f01f07696f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f01f0796000 CR3: 000000006bcdf000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process toggle-writer (pid: 9704, threadinfo ffff81003ac34000, task ffff81003ad14930)
> Stack:  ffffffff8106e7b3 0000000000000000 0000000000000002 00003fff00000000
>  000000000000000b ffffffff81075c03 00000010001280d2 0000000000000000
>  0000000000000000 ffffffff8106e7b3 000000000000000f 0000000000000005
> Call Trace:
>  [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
>  [<ffffffff81075c03>] ? __alloc_pages+0x8e/0x337
>  [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
>  [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
>  [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
>  [<ffffffff8106e703>] imv_update_range+0xd7/0x16e
>  [<ffffffff8105494e>] _module_imv_update+0x35/0x54
>  [<ffffffff81054982>] module_imv_update+0x15/0x23
>  [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
>  [<ffffffff810d64a4>] proc_reg_write+0x7b/0x96
>  [<ffffffff81099f72>] vfs_write+0xae/0x157
>  [<ffffffff8109a53f>] sys_write+0x47/0x70
>  [<ffffffff8100c0e9>] tracesys+0xdc/0xe1
> 
> Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 1
> CPU 1
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
> RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
> RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fb5f260
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: ffff810001016700
> R10: ffff810001018b80 R11: 0000000000000001 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
> FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000031fac9ac20 CR3: 00000000778e6000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
> Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
>  ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
>  ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 0000000000000001
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
> 
> Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 7b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 82
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 0
> Kernel panic - not syncing: Aiee, killing interrupt handler!
> CPU 0
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7e6>]  [<ffffffff8106e7e6>] ipi_busy_loop+0x33/0x41
> RSP: 0018:ffffffff81498f70  EFLAGS: 00000006
> RAX: 0000000000000004 RBX: 0000000000000000 RCX: ffffffff81394620
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff810062cc1e98
> R10: ffff81000100cb80 R11: ffff810062cc1d58 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81466640 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000031fac6f060 CR3: 0000000000201000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff81434000, task ffffffff81394620)
> Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff8100b0e2
>  ffffffff8100b0e2 0000000000000000 ffffffffffffffff ffffffff81435ec0
>  ffffffff8100cac6 ffffffff81435ec0 <EOI>  ffffffff81435f48 ffff810062cc1d58
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81266ee6>] ? rest_init+0x5a/0x5c
> 
> Code: f0 ff 0d 82 6b 49 00 0f ae f0 48 63 05 78 6b 49 00 48 3b 05 5d 6b 49 00 7f ed f0 ff 0d 68
> ---[ end trace 738433437d960ebc ]---
> 
> 
> 2)
> 
> 
> --- /dev/null
> +++ b/samples/markers/toggle-tester.c
> @@ -0,0 +1,90 @@
> +/* probe-example.c
> + *
> + * toggle module to test immedidate infrastructure.
> + *
> + * (C) Copyright 2007 Jason Baron <jbaron@redhat.com>
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/marker.h>
> +#include <asm/atomic.h>
> +#include <linux/seq_file.h>
> +#include <linux/fs.h>
> +#include <linux/proc_fs.h>
> +
> +static DECLARE_MUTEX(toggle_mutex);
> +DEFINE_IMV(char, toggle_value) = 0;
> +
> +static ssize_t proc_toggle_write(struct file *file, const char __user *buf,
> +                                size_t length, loff_t *ppos)
> +{
> +	int i;
> +
> +	down(&toggle_mutex);
> +	i = imv_read(toggle_value);
> +	imv_set(toggle_value, !i);
> +	up(&toggle_mutex);
> +
> +	return 0;
> +}
> +
> +static int proc_toggle_show(struct seq_file *s, void *p)
> +{
> +	int i;
> +
> +	down(&toggle_mutex);        
> +        i = imv_read(toggle_value);
> +	up(&toggle_mutex);
> +
> +	seq_printf(s, "%u\n", i);
> +
> +        return 0;
> +}
> +
> +
> +static int proc_toggle_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, proc_toggle_show, NULL);
> +}
> +
> +
> +static const struct file_operations toggle_operations = {
> +	.open           = proc_toggle_open,
> +	.read           = seq_read,
> +	.write          = proc_toggle_write,
> +	.llseek         = seq_lseek,
> +	.release        = single_release,
> +};
> +	
> +
> +struct proc_dir_entry *pde;
> +
> +static int __init toggle_init(void)
> +{
> +	
> +	pde = create_proc_entry("toggle", 0, NULL);
> +	if (!pde)
> +		return -ENOMEM;
> +	pde->proc_fops = &toggle_operations;
> +
> +
> +	return 0;
> +}
> +
> +static void __exit toggle_fini(void)
> +{
> +	remove_proc_entry("toggle", &proc_root);
> +	
> +}
> +
> +module_init(toggle_init);
> +module_exit(toggle_fini);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jason Baron");
> +MODULE_DESCRIPTION("testing immediate implementation");
> 
> 
> 3)
> 
> 
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index 5bfc553..1ab1c5b 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -8,6 +8,9 @@
>  #include <asm/system.h>
>  
>  #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +
> +#define RUN_ALL ~0U
> +
>  /**
>   * stop_machine_run: freeze the machine on all CPUs and run this function
>   * @fn: the function to run
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 51b5ee5..e6ee46f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -23,9 +23,17 @@ enum stopmachine_state {
>  	STOPMACHINE_WAIT,
>  	STOPMACHINE_PREPARE,
>  	STOPMACHINE_DISABLE_IRQ,
> +	STOPMACHINE_RUN,
>  	STOPMACHINE_EXIT,
>  };
>  
> +struct stop_machine_data {
> +	int (*fn)(void *);
> +	void *data;
> +	struct completion done;
> +	int run_all;
> +} smdata;
> +
>  static enum stopmachine_state stopmachine_state;
>  static unsigned int stopmachine_num_threads;
>  static atomic_t stopmachine_thread_ack;
> @@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
>  {
>  	int irqs_disabled = 0;
>  	int prepared = 0;
> +	int ran = 0;
>  
>  	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
>  
> @@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
>  			prepared = 1;
>  			smp_mb(); /* Must read state first. */
>  			atomic_inc(&stopmachine_thread_ack);
> +		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
> +			smdata.fn(smdata.data);
> +			ran = 1;
> +			smp_mb(); /* Must read state first. */
> +			atomic_inc(&stopmachine_thread_ack);
>  		}
>  		/* Yield in first stage: migration threads need to
>  		 * help our sisters onto their CPUs. */
> @@ -136,12 +150,10 @@ static void restart_machine(void)
>  	preempt_enable_no_resched();
>  }
>  
> -struct stop_machine_data
> +static void run_other_cpus(void)
>  {
> -	int (*fn)(void *);
> -	void *data;
> -	struct completion done;
> -};
> +	stopmachine_set_state(STOPMACHINE_RUN);
> +}
>  
>  static int do_stop(void *_smdata)
>  {
> @@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
>  	ret = stop_machine();
>  	if (ret == 0) {
>  		ret = smdata->fn(smdata->data);
> +		if (smdata->run_all)
> +			run_other_cpus();
>  		restart_machine();
>  	}
>  
> @@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
>  struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
>  				       unsigned int cpu)
>  {
> -	struct stop_machine_data smdata;
>  	struct task_struct *p;
>  
> +	down(&stopmachine_mutex);
>  	smdata.fn = fn;
>  	smdata.data = data;
> +	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
>  	init_completion(&smdata.done);
> -
> -	down(&stopmachine_mutex);
> -
> +	smp_wmb();
>  	/* If they don't care which CPU fn runs on, bind to any online one. */
> -	if (cpu == NR_CPUS)
> +	if (cpu == NR_CPUS || cpu == RUN_ALL)
>  		cpu = raw_smp_processor_id();
>  
>  	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 
> 
> 4)
> 
> 
> diff --git a/kernel/immediate.c b/kernel/immediate.c
> index 4c36a89..39ec13e 100644
> --- a/kernel/immediate.c
> +++ b/kernel/immediate.c
> @@ -20,6 +20,7 @@
>  #include <linux/immediate.h>
>  #include <linux/memory.h>
>  #include <linux/cpu.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -30,6 +31,23 @@ static int imv_early_boot_complete;
>  
>  extern const struct __imv __start___imv[];
>  extern const struct __imv __stop___imv[];
> +int started;
> +
> +int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!started) {
> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		started = 1;
> +		smp_wmb();
> +	} else
> +		sync_core();
> +
> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
>  
>  /*
>   * imv_mutex nests inside module_mutex. imv_mutex protects builtin
> @@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
>   */
>  static DEFINE_MUTEX(imv_mutex);
>  
> -static atomic_t wait_sync;
> -
> -struct ipi_loop_data {
> -	long value;
> -	const struct __imv *imv;
> -} loop_data;
> -
> -static void ipi_busy_loop(void *arg)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > loop_data.value);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > 0);
> -	/*
> -	 * Issuing a synchronizing instruction must be done on each CPU before
> -	 * reenabling interrupts after modifying an instruction. Required by
> -	 * Intel's errata.
> -	 */
> -	sync_core();
> -	flush_icache_range(loop_data.imv->imv,
> -		loop_data.imv->imv + loop_data.imv->size);
> -	local_irq_restore(flags);
> -}
>  
>  /**
>   * apply_imv_update - update one immediate value
> @@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
>   */
>  static int apply_imv_update(const struct __imv *imv)
>  {
> -	unsigned long flags;
> -	long online_cpus;
> -
>  	/*
>  	 * If the variable and the instruction have the same value, there is
>  	 * nothing to do.
> @@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
>  
>  	if (imv_early_boot_complete) {
>  		kernel_text_lock();
> -		get_online_cpus();
> -		online_cpus = num_online_cpus();
> -		atomic_set(&wait_sync, 2 * online_cpus);
> -		loop_data.value = online_cpus;
> -		loop_data.imv = imv;
> -		smp_call_function(ipi_busy_loop, NULL, 1, 0);
> -		local_irq_save(flags);
> -		atomic_dec(&wait_sync);
> -		do {
> -			/* Make sure the wait_sync gets re-read */
> -			smp_mb();
> -		} while (atomic_read(&wait_sync) > online_cpus);
> -		text_poke((void *)imv->imv, (void *)imv->var,
> -				imv->size);
> -		/*
> -		 * Make sure the modified instruction is seen by all CPUs before
> -		 * we continue (visible to other CPUs and local interrupts).
> -		 */
> -		wmb();
> -		atomic_dec(&wait_sync);
> -		flush_icache_range(imv->imv,
> -				imv->imv + imv->size);
> -		local_irq_restore(flags);
> -		put_online_cpus();
> +		started = 0;
> +		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
>  		kernel_text_unlock();
>  	} else
>  		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:12     ` Mathieu Desnoyers
@ 2008-02-26 23:34       ` Mathieu Desnoyers
  2008-02-27 16:44         ` Jason Baron
  2008-02-27 17:01       ` Jason Baron
  1 sibling, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-26 23:34 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Jason Baron (jbaron@redhat.com) wrote:
> > On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > > Changelog:
> > > 
> > > - section __imv is already SHF_ALLOC
> > > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> > >   the if (immediateindex) is unnecessary here.
> > > - Remove module_mutex usage: depend on functions implemented in module.c for
> > >   that.
> > 
> > hi,
> > 
> > In testing this patch, i've run across a deadlock...apply_imv_update() can get
> > called again before, ipi_busy_loop() has had a chance to finsh, and set 
> > wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> > indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> > deadlock below using nmi_watchdog=1 in item 1). 
> > 
> 
> Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
> test module in LTTng a few days ago. His fix implied to add another
> barrier upon which the smp_call_function() caller must wait for the ipis
> to finish. Since this immediate value code does the same I did in my
> ltt-test-tsc code, the same fix will likely apply.
> 
> I'll cook something.
> 

This should work. Untested for now. Can you give it a try ?

Fix Immediate Deadlock

Jason Baron <jbaron@redhat.com> :

In testing this patch, i've run across a deadlock...apply_imv_update() can get
called again before, ipi_busy_loop() has had a chance to finsh, and set
wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
indefinitely and the subsequent apply_imv_update() hangs. I've shown this
deadlock below using nmi_watchdog=1 in item 1).

Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> :

Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
test module in LTTng a few days ago. His fix implied to add another
barrier upon which the smp_call_function() caller must wait for the ipis
to finish. Since this immediate value code does the same I did in my
ltt-test-tsc code, the same fix will likely apply.

Thanks to Jason Baron for finding this bug and proposing an initial
implementation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jason Baron <jbaron@redhat.com>
---
 kernel/immediate.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/immediate.c	2008-02-26 18:16:10.000000000 -0500
+++ linux-2.6-lttng/kernel/immediate.c	2008-02-26 18:29:32.000000000 -0500
@@ -53,12 +53,12 @@ static void ipi_busy_loop(void *arg)
 	do {
 		/* Make sure the wait_sync gets re-read */
 		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
+	} while (atomic_read(&wait_sync) > 2 * loop_data.value);
 	atomic_dec(&wait_sync);
 	do {
 		/* Make sure the wait_sync gets re-read */
 		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
+	} while (atomic_read(&wait_sync) > loop_data.value);
 	/*
 	 * Issuing a synchronizing instruction must be done on each CPU before
 	 * reenabling interrupts after modifying an instruction. Required by
@@ -67,6 +67,7 @@ static void ipi_busy_loop(void *arg)
 	sync_core();
 	flush_icache_range(loop_data.imv->imv,
 		loop_data.imv->imv + loop_data.imv->size);
+	atomic_dec(&wait_sync);
 	local_irq_restore(flags);
 }
 
@@ -113,7 +114,7 @@ static int apply_imv_update(const struct
 		kernel_text_lock();
 		get_online_cpus();
 		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
+		atomic_set(&wait_sync, 3 * online_cpus);
 		loop_data.value = online_cpus;
 		loop_data.imv = imv;
 		smp_call_function(ipi_busy_loop, NULL, 1, 0);
@@ -122,7 +123,7 @@ static int apply_imv_update(const struct
 		do {
 			/* Make sure the wait_sync gets re-read */
 			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
+		} while (atomic_read(&wait_sync) > 2 * online_cpus);
 		text_poke((void *)imv->imv, (void *)imv->var,
 				imv->size);
 		/*
@@ -133,6 +134,12 @@ static int apply_imv_update(const struct
 		atomic_dec(&wait_sync);
 		flush_icache_range(imv->imv,
 				imv->imv + imv->size);
+		/*
+		 * Wait until all other CPUs are done so that we don't overwrite
+		 * loop_data or wait_sync prematurely.
+		 */
+		while (unlikely(atomic_read(&wait_sync) > 1))
+			cpu_relax();
 		local_irq_restore(flags);
 		put_online_cpus();
 		kernel_text_unlock();

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:34       ` Mathieu Desnoyers
@ 2008-02-27 16:44         ` Jason Baron
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Baron @ 2008-02-27 16:44 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

On Tue, Feb 26, 2008 at 06:34:45PM -0500, Mathieu Desnoyers wrote:
> > > In testing this patch, i've run across a deadlock...apply_imv_update() can get
> > > called again before, ipi_busy_loop() has had a chance to finsh, and set 
> > > wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> > > indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> > > deadlock below using nmi_watchdog=1 in item 1). 
> > > 
> > 
> > Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
> > test module in LTTng a few days ago. His fix implied to add another
> > barrier upon which the smp_call_function() caller must wait for the ipis
> > to finish. Since this immediate value code does the same I did in my
> > ltt-test-tsc code, the same fix will likely apply.
> > 
> > I'll cook something.
> > 
> 
> This should work. Untested for now. Can you give it a try ?
> 

this patch results in the subsequent 3 way deadlock that I described in the
previous mail. smp_call_function() can not be used with a function that 
attempts to rendezvous cpus in the manner being done here. The patch I posted 
in the previous mail addresses these limitations on smp_call_functions(). trace
of the lockup using this patch is shown below. 

thanks,

-Jason


NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 11233, comm: make Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff811248d9>]  [<ffffffff811248d9>] __write_lock_failed+0x9/0x20
RSP: 0018:ffff81006e025e30  EFLAGS: 00000087
RAX: ffff81006ece8b58 RBX: 0000000000000000 RCX: ffff81006e1f0ec8
RDX: 0000000000000011 RSI: fe60000000000000 RDI: ffffffff813d4000
RBP: ffff81006e025e38 R08: ffff81006e1f0e90 R09: 00000000ffffffff
R10: ffff810072c45e98 R11: 0000000000004111 R12: 00000000fffffff4
R13: ffff81006ece8930 R14: ffff81006818b260 R15: 0000000000000000
FS:  00002b14168b66f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000106f000 CR3: 00000000680d2000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process make (pid: 11233, threadinfo ffff81006e024000, task ffff81006818b260)
Stack:  ffffffff8127536a ffff81006e025ed8 ffffffff81034260 ffff81006e1f0e80
 0000000000000000 0000000000000000 ffff81006e025f58 00007fff94222fe0
 0000000000004111 ffff81006ece8930 0000000000000000 ffff81006ec6f040
Call Trace:
 [<ffffffff8127536a>] ? _write_lock_irq+0x13/0x15
 [<ffffffff81034260>] copy_process+0xf7c/0x1477
 [<ffffffff81034870>] do_fork+0x75/0x20a
 [<ffffffff8100c0e9>] ? tracesys+0xdc/0xe1
 [<ffffffff8100a3c6>] sys_vfork+0x20/0x22
 [<ffffffff8100c287>] ptregscall_common+0x67/0xb0

Code: e9 07 48 89 11 31 c0 c3 48 83 e9 07 eb 00 48 c7 c0 f2 ff ff ff c3 90 90 90 90 90 90 90 90
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 3
CPU 3
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 3258, comm: toggle-writer Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
RSP: 0018:ffff81006ec51da8  EFLAGS: 00000297
RAX: 00000000000008fc RBX: 0000000000000007 RCX: 000000007688fa00
RDX: 00000000000008fc RSI: 00000000000000fc RDI: 0000000000000007
RBP: ffff81006ec51e08 R08: ffffffff8840504f R09: 00007fa6423566f0
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7c3
FS:  00007fa6423566f0(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031fb603080 CR3: 000000006ec49000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process toggle-writer (pid: 3258, threadinfo ffff81006ec50000, task ffff81007899a000)
Stack:  ffffffff8106e7c3 0000000000000000 0000000300000002 ffffffff00000000
 0000000000000007 0000000000000292 ffff81007899a000 0000000000000000
 0000000000000000 ffffffff8106e7c3 000000000000000f 0000000000000005
Call Trace:
 [<ffffffff8106e7c3>] ? ipi_busy_loop+0x0/0x55
 [<ffffffff8106e7c3>] ? ipi_busy_loop+0x0/0x55
 [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
 [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
 [<ffffffff8106e702>] imv_update_range+0xd6/0x17e
 [<ffffffff8105494e>] _module_imv_update+0x35/0x54
 [<ffffffff81054982>] module_imv_update+0x15/0x23
 [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
 [<ffffffff810d64c8>] proc_reg_write+0x7b/0x96
 [<ffffffff81099f96>] vfs_write+0xae/0x157
 [<ffffffff8109a563>] sys_write+0x47/0x70
 [<ffffffff8100c0e9>] tracesys+0xdc/0xe1

Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8106e7dc>]  [<ffffffff8106e7dc>] ipi_busy_loop+0x19/0x55
RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffff81007fb5f260
RDX: 000000000000000a RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: 0000000000000002
R10: ffff810001018b80 R11: ffff810072dddc18 R12: ffffffff8106e7c3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac9afa0 CR3: 0000000072cfb000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
 ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
 ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 ffff810072dddc18
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a55>] ? start_secondary+0x3ba/0x3c6

Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 6b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 72
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 0
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 5, comm: watchdog/0 Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8106e7e9>]  [<ffffffff8106e7e9>] ipi_busy_loop+0x26/0x55
RSP: 0018:ffffffff81498f70  EFLAGS: 00000002
RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffffffff81394620
RDX: 000000000000000a RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff81000100cbe0
R10: ffff81007fb59e20 R11: 0000000000000001 R12: ffffffff8106e7c3
R13: 0000000000000000 R14: 00000000000000f0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb374 CR3: 000000006ec88000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process watchdog/0 (pid: 5, threadinfo ffff81007fb58000, task ffff81007fb54930)
Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff884059c8
 ffff81007e4d3260 ffff81007e4d3260 00000000000003b2 ffff81007fb59e60
 ffffffff8100cac6 ffff81007fb59e60 <EOI>  ffff81007fb59f20 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff81069685>] ? watchdog+0xd5/0x1c8
 [<ffffffff810695b0>] ? watchdog+0x0/0x1c8
 [<ffffffff81047fd3>] ? kthread+0x49/0x76
 [<ffffffff8100cd88>] ? child_rip+0xa/0x12
 [<ffffffff81047f8a>] ? kthread+0x0/0x76
 [<ffffffff8100cd7e>] ? child_rip+0x0/0x12


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-26 23:34       ` Mathieu Desnoyers
@ 2008-02-27 17:01       ` Jason Baron
  1 sibling, 0 replies; 30+ messages in thread
From: Jason Baron @ 2008-02-27 17:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

On Tue, Feb 26, 2008 at 06:12:48PM -0500, Mathieu Desnoyers wrote:
> Hrm, does your stop machine implementation disable interrupts while the
> CPUs busy loop ?
> 

yes. this is how stop_machine is implemented...and I believe is consistent with
the way in which your algorithm disables irqs. The logic to me, is we don't
want any cpus to see the kernel text in an inconsistent state via an irq.

-Jason


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 22:52   ` Jason Baron
  2008-02-26 23:12     ` Mathieu Desnoyers
@ 2008-02-27 19:05     ` Mathieu Desnoyers
  2008-02-28 16:33       ` [patch 1/2] add ALL_CPUS option to stop_machine_run() Jason Baron
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-27 19:05 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

* Jason Baron (jbaron@redhat.com) wrote:
> On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > Changelog:
> > 
> > - section __imv is already SHF_ALLOC
> > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> >   the if (immediateindex) is unnecessary here.
> > - Remove module_mutex usage: depend on functions implemented in module.c for
> >   that.
> 
> hi,
> 
> In testing this patch, i've run across a deadlock...apply_imv_update() can get
> called again before, ipi_busy_loop() has had a chance to finsh, and set 
> wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> deadlock below using nmi_watchdog=1 in item 1). 
> 
> After hitting this deadlock i modified apply_imv_update() to wait for 
> ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
> A held a read lock on tasklist_lock, then process B called apply_imv_update().
> Process A received the IPI and begins executing ipi_busy_loop(). Then process
> C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
> process A holds up process C, and C can't get an IPI b/c interrupts are
> disabled. i believe this is an inherent problem in using smp_call_function, in
> that one can't synchronize the processes on each other...you can reproduce
> these issues using the test module below item 2)
> 
> In order to address these issues, i've modified stop_machine_run() to take an
> new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
> on all cpus, item 3). I then modified kernel/immediate.c to use this new 
> infrastructure item 4). the code in immediate.c simplifies quite a bit. This
> new code has been robust through all testing thus far.
> 

Ok, I see why you did it that way. Comments follow inline.

> thanks,
> 
> -Jason
> 
> 1)
[...]
 
> 2)
> 
[...]
> 
> 3)
> 
> 
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index 5bfc553..1ab1c5b 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -8,6 +8,9 @@
>  #include <asm/system.h>
>  
>  #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +
> +#define RUN_ALL ~0U
> +
>  /**
>   * stop_machine_run: freeze the machine on all CPUs and run this function
>   * @fn: the function to run
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 51b5ee5..e6ee46f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -23,9 +23,17 @@ enum stopmachine_state {
>  	STOPMACHINE_WAIT,
>  	STOPMACHINE_PREPARE,
>  	STOPMACHINE_DISABLE_IRQ,
> +	STOPMACHINE_RUN,
>  	STOPMACHINE_EXIT,
>  };
>  
> +struct stop_machine_data {
> +	int (*fn)(void *);
> +	void *data;
> +	struct completion done;
> +	int run_all;
> +} smdata;
> +

Why do we now have to declare this static ? Can we pass it as a pointer
to stopmachine instead ?

>  static enum stopmachine_state stopmachine_state;
>  static unsigned int stopmachine_num_threads;
>  static atomic_t stopmachine_thread_ack;
> @@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
>  {
>  	int irqs_disabled = 0;
>  	int prepared = 0;
> +	int ran = 0;
>  
>  	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
>  
> @@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
>  			prepared = 1;
>  			smp_mb(); /* Must read state first. */
>  			atomic_inc(&stopmachine_thread_ack);
> +		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
> +			smdata.fn(smdata.data);
> +			ran = 1;
> +			smp_mb(); /* Must read state first. */
> +			atomic_inc(&stopmachine_thread_ack);
>  		}
>  		/* Yield in first stage: migration threads need to
>  		 * help our sisters onto their CPUs. */
> @@ -136,12 +150,10 @@ static void restart_machine(void)
>  	preempt_enable_no_resched();
>  }
>  
> -struct stop_machine_data
> +static void run_other_cpus(void)
>  {
> -	int (*fn)(void *);
> -	void *data;
> -	struct completion done;
> -};
> +	stopmachine_set_state(STOPMACHINE_RUN);

Hrm, the semantic of STOPMACHINE_RUN is a bit weird :
- The CPU where the do_stop thread runs will first execute (alone) the
  callback.
- Then, all the other CPUs will execute the callback concurrently.

Given that you use a "started" boolean in the callback, which is ok as
long as there is no concurrent modification (correct given the current
semantic, but fragile), I would tend to document that the first time the
callback is called, it is ran alone, without concurrency, and then all
the other callbacks are ran concurrently.


> +}
>  
>  static int do_stop(void *_smdata)
>  {
> @@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
>  	ret = stop_machine();
>  	if (ret == 0) {
>  		ret = smdata->fn(smdata->data);
> +		if (smdata->run_all)
> +			run_other_cpus();
>  		restart_machine();
>  	}
>  
> @@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
>  struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
>  				       unsigned int cpu)
>  {
> -	struct stop_machine_data smdata;
>  	struct task_struct *p;
>  
> +	down(&stopmachine_mutex);
>  	smdata.fn = fn;
>  	smdata.data = data;
> +	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
>  	init_completion(&smdata.done);
> -
> -	down(&stopmachine_mutex);
> -
> +	smp_wmb();
>  	/* If they don't care which CPU fn runs on, bind to any online one. */
> -	if (cpu == NR_CPUS)
> +	if (cpu == NR_CPUS || cpu == RUN_ALL)
>  		cpu = raw_smp_processor_id();
>  
>  	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 
> 
> 4)
> 
> 
> diff --git a/kernel/immediate.c b/kernel/immediate.c
> index 4c36a89..39ec13e 100644
> --- a/kernel/immediate.c
> +++ b/kernel/immediate.c
> @@ -20,6 +20,7 @@
>  #include <linux/immediate.h>
>  #include <linux/memory.h>
>  #include <linux/cpu.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -30,6 +31,23 @@ static int imv_early_boot_complete;
>  
>  extern const struct __imv __start___imv[];
>  extern const struct __imv __stop___imv[];
> +int started;
> +
> +int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!started) {
> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		started = 1;
> +		smp_wmb();

missing mb() comment here.

> +	} else
> +		sync_core();
> +

Note : we really want the sync_core()s to be executed after the
text_poke. This is ok given the implicit RUN_ALL semantic, but I guess
it should be documented in stop_machine that the first callback is
executed alone before all the others.

Thanks!

Mathieu

> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
>  
>  /*
>   * imv_mutex nests inside module_mutex. imv_mutex protects builtin
> @@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
>   */
>  static DEFINE_MUTEX(imv_mutex);
>  
> -static atomic_t wait_sync;
> -
> -struct ipi_loop_data {
> -	long value;
> -	const struct __imv *imv;
> -} loop_data;
> -
> -static void ipi_busy_loop(void *arg)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > loop_data.value);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > 0);
> -	/*
> -	 * Issuing a synchronizing instruction must be done on each CPU before
> -	 * reenabling interrupts after modifying an instruction. Required by
> -	 * Intel's errata.
> -	 */
> -	sync_core();
> -	flush_icache_range(loop_data.imv->imv,
> -		loop_data.imv->imv + loop_data.imv->size);
> -	local_irq_restore(flags);
> -}
>  
>  /**
>   * apply_imv_update - update one immediate value
> @@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
>   */
>  static int apply_imv_update(const struct __imv *imv)
>  {
> -	unsigned long flags;
> -	long online_cpus;
> -
>  	/*
>  	 * If the variable and the instruction have the same value, there is
>  	 * nothing to do.
> @@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
>  
>  	if (imv_early_boot_complete) {
>  		kernel_text_lock();
> -		get_online_cpus();
> -		online_cpus = num_online_cpus();
> -		atomic_set(&wait_sync, 2 * online_cpus);
> -		loop_data.value = online_cpus;
> -		loop_data.imv = imv;
> -		smp_call_function(ipi_busy_loop, NULL, 1, 0);
> -		local_irq_save(flags);
> -		atomic_dec(&wait_sync);
> -		do {
> -			/* Make sure the wait_sync gets re-read */
> -			smp_mb();
> -		} while (atomic_read(&wait_sync) > online_cpus);
> -		text_poke((void *)imv->imv, (void *)imv->var,
> -				imv->size);
> -		/*
> -		 * Make sure the modified instruction is seen by all CPUs before
> -		 * we continue (visible to other CPUs and local interrupts).
> -		 */
> -		wmb();
> -		atomic_dec(&wait_sync);
> -		flush_icache_range(imv->imv,
> -				imv->imv + imv->size);
> -		local_irq_restore(flags);
> -		put_online_cpus();
> +		started = 0;
> +		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
>  		kernel_text_unlock();
>  	} else
>  		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-27 19:05     ` Mathieu Desnoyers
@ 2008-02-28 16:33       ` Jason Baron
  2008-02-28 22:09         ` Max Krasnyanskiy
  2008-02-28 16:37       ` [patch 2/2] implement immediate updating via stop_machine_run() Jason Baron
  2008-02-28 16:50       ` [patch 1/7] Immediate Values - Architecture Independent Code Jason Baron
  2 siblings, 1 reply; 30+ messages in thread
From: Jason Baron @ 2008-02-28 16:33 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell


-allow stop_mahcine_run() to call a function on all cpus. Calling 
 stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
 stop_machine_run() proceeds as normal until the calling cpu has
 invoked 'fn'. Then, we tell all the other cpus to call 'fn'.

Signed-off-by: Jason Baron <jbaron@redhat.com>


---

 include/linux/stop_machine.h |    8 +++++++-
 kernel/stop_machine.c        |   33 +++++++++++++++++++++++----------
 2 files changed, 30 insertions(+), 11 deletions(-)


diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 5bfc553..18af011 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -8,11 +8,17 @@
 #include <asm/system.h>
 
 #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+
+#define ALL_CPUS ~0U
+
 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
  * @data: the data ptr for the @fn()
- * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
+ * @cpu: if @cpu == n, run @fn() on cpu n
+ *       if @cpu == NR_CPUS, run @fn() on any cpu
+ *       if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
+ *       concurrently on all the other cpus
  *
  * Description: This causes a thread to be scheduled on every other cpu,
  * each of which disables interrupts, and finally interrupts are disabled
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 51b5ee5..c75b4f4 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -23,9 +23,17 @@ enum stopmachine_state {
 	STOPMACHINE_WAIT,
 	STOPMACHINE_PREPARE,
 	STOPMACHINE_DISABLE_IRQ,
+	STOPMACHINE_RUN,
 	STOPMACHINE_EXIT,
 };
 
+struct stop_machine_data {
+	int (*fn)(void *);
+	void *data;
+	struct completion done;
+	int run_all;
+} smdata;
+
 static enum stopmachine_state stopmachine_state;
 static unsigned int stopmachine_num_threads;
 static atomic_t stopmachine_thread_ack;
@@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
 {
 	int irqs_disabled = 0;
 	int prepared = 0;
+	int ran = 0;
 
 	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
 
@@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
 			prepared = 1;
 			smp_mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
+		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
+			smdata.fn(smdata.data);
+			ran = 1;
+			smp_mb(); /* Must read state first. */
+			atomic_inc(&stopmachine_thread_ack);
 		}
 		/* Yield in first stage: migration threads need to
 		 * help our sisters onto their CPUs. */
@@ -136,12 +150,10 @@ static void restart_machine(void)
 	preempt_enable_no_resched();
 }
 
-struct stop_machine_data
+static void run_other_cpus(void)
 {
-	int (*fn)(void *);
-	void *data;
-	struct completion done;
-};
+	stopmachine_set_state(STOPMACHINE_RUN);
+}
 
 static int do_stop(void *_smdata)
 {
@@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
 	ret = stop_machine();
 	if (ret == 0) {
 		ret = smdata->fn(smdata->data);
+		if (smdata->run_all)
+			run_other_cpus();
 		restart_machine();
 	}
 
@@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
 struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
 				       unsigned int cpu)
 {
-	struct stop_machine_data smdata;
 	struct task_struct *p;
 
+	down(&stopmachine_mutex);
 	smdata.fn = fn;
 	smdata.data = data;
+	smdata.run_all = (cpu == ALL_CPUS) ? 1 : 0;
 	init_completion(&smdata.done);
-
-	down(&stopmachine_mutex);
-
+	smp_wmb(); /* make sure other cpus see smdata updates */
 	/* If they don't care which CPU fn runs on, bind to any online one. */
-	if (cpu == NR_CPUS)
+	if (cpu == NR_CPUS || cpu == ALL_CPUS)
 		cpu = raw_smp_processor_id();
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch 2/2] implement immediate updating via stop_machine_run()
  2008-02-27 19:05     ` Mathieu Desnoyers
  2008-02-28 16:33       ` [patch 1/2] add ALL_CPUS option to stop_machine_run() Jason Baron
@ 2008-02-28 16:37       ` Jason Baron
  2008-02-29 13:43         ` Mathieu Desnoyers
  2008-02-28 16:50       ` [patch 1/7] Immediate Values - Architecture Independent Code Jason Baron
  2 siblings, 1 reply; 30+ messages in thread
From: Jason Baron @ 2008-02-28 16:37 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell


-Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
 cpus using IPIs leads to deadlocks. Process A held a read lock on 
 tasklist_lock, then process B called apply_imv_update(). Process A received the 
 IPI and begins executing ipi_busy_loop(). Then process C takes a write lock 
 irq on the task list lock, before receiving the IPI. Thus, process A holds up 
 process C, and C can't get an IPI b/c interrupts are disabled. Solve this 
 problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which 
 runs a function on all cpus after they are busy looping and have disabled 
 irqs. Since this is done in a new process context, we don't have to worry 
 about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
 of testing...

Signed-off-by: Jason Baron <jbaron@redhat.com>

---

 kernel/immediate.c |   80 ++++++++++++++--------------------------------------
 1 files changed, 21 insertions(+), 59 deletions(-)


diff --git a/kernel/immediate.c b/kernel/immediate.c
index 4c36a89..378a452 100644
--- a/kernel/immediate.c
+++ b/kernel/immediate.c
@@ -20,6 +20,7 @@
 #include <linux/immediate.h>
 #include <linux/memory.h>
 #include <linux/cpu.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 
@@ -27,48 +28,33 @@
  * Kernel ready to execute the SMP update that may depend on trap and ipi.
  */
 static int imv_early_boot_complete;
+static int wrote_text;
 
 extern const struct __imv __start___imv[];
 extern const struct __imv __stop___imv[];
 
+static int stop_machine_imv_update(void *imv_ptr)
+{
+	struct __imv *imv = imv_ptr;
+
+	if (!started) {
+		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
+		wrote_text = 1;
+		smp_wmb(); /* make sure other cpus see that this has run */
+	} else
+		sync_core();
+
+	flush_icache_range(imv->imv, imv->imv + imv->size);
+
+	return 0;
+}
+
 /*
  * imv_mutex nests inside module_mutex. imv_mutex protects builtin
  * immediates and module immediates.
  */
 static DEFINE_MUTEX(imv_mutex);
 
-static atomic_t wait_sync;
-
-struct ipi_loop_data {
-	long value;
-	const struct __imv *imv;
-} loop_data;
-
-static void ipi_busy_loop(void *arg)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
-	/*
-	 * Issuing a synchronizing instruction must be done on each CPU before
-	 * reenabling interrupts after modifying an instruction. Required by
-	 * Intel's errata.
-	 */
-	sync_core();
-	flush_icache_range(loop_data.imv->imv,
-		loop_data.imv->imv + loop_data.imv->size);
-	local_irq_restore(flags);
-}
 
 /**
  * apply_imv_update - update one immediate value
@@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
  */
 static int apply_imv_update(const struct __imv *imv)
 {
-	unsigned long flags;
-	long online_cpus;
-
 	/*
 	 * If the variable and the instruction have the same value, there is
 	 * nothing to do.
@@ -111,30 +94,9 @@ static int apply_imv_update(const struct __imv *imv)
 
 	if (imv_early_boot_complete) {
 		kernel_text_lock();
-		get_online_cpus();
-		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
-		loop_data.value = online_cpus;
-		loop_data.imv = imv;
-		smp_call_function(ipi_busy_loop, NULL, 1, 0);
-		local_irq_save(flags);
-		atomic_dec(&wait_sync);
-		do {
-			/* Make sure the wait_sync gets re-read */
-			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
-		text_poke((void *)imv->imv, (void *)imv->var,
-				imv->size);
-		/*
-		 * Make sure the modified instruction is seen by all CPUs before
-		 * we continue (visible to other CPUs and local interrupts).
-		 */
-		wmb();
-		atomic_dec(&wait_sync);
-		flush_icache_range(imv->imv,
-				imv->imv + imv->size);
-		local_irq_restore(flags);
-		put_online_cpus();
+		wrote_text = 0;
+		stop_machine_run(stop_machine_imv_update, (void *)imv,
+					ALL_CPUS);
 		kernel_text_unlock();
 	} else
 		text_poke_early((void *)imv->imv, (void *)imv->var,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-27 19:05     ` Mathieu Desnoyers
  2008-02-28 16:33       ` [patch 1/2] add ALL_CPUS option to stop_machine_run() Jason Baron
  2008-02-28 16:37       ` [patch 2/2] implement immediate updating via stop_machine_run() Jason Baron
@ 2008-02-28 16:50       ` Jason Baron
  2 siblings, 0 replies; 30+ messages in thread
From: Jason Baron @ 2008-02-28 16:50 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

On Wed, Feb 27, 2008 at 02:05:19PM -0500, Mathieu Desnoyers wrote:
> > +struct stop_machine_data {
> > +	int (*fn)(void *);
> > +	void *data;
> > +	struct completion done;
> > +	int run_all;
> > +} smdata;
> > +
> 
> Why do we now have to declare this static ? Can we pass it as a pointer
> to stopmachine instead ?
> 

Since the other cpus need to access 'fn', i made smdata a global.
stop_machine_run() is system-wide, so we only need one of these...

thanks,

-Jason


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-28 16:33       ` [patch 1/2] add ALL_CPUS option to stop_machine_run() Jason Baron
@ 2008-02-28 22:09         ` Max Krasnyanskiy
  2008-02-28 22:14           ` Mathieu Desnoyers
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Max Krasnyanskiy @ 2008-02-28 22:09 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Rusty Russell

Jason Baron wrote:
> -allow stop_mahcine_run() to call a function on all cpus. Calling 
>  stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
>  stop_machine_run() proceeds as normal until the calling cpu has
>  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
> 

Jason, we're actually trying to reduce the usage of the stop_machine in 
general. It's a very big hammer that kills latencies and stuff. It'd be nice 
if we did not introduce any more dependencies on it. I guess in some case 
there is simply no other way to handle what need to do. But please think twice
(or more :)).

Max



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-28 22:09         ` Max Krasnyanskiy
@ 2008-02-28 22:14           ` Mathieu Desnoyers
  2008-02-29  2:39             ` Jason Baron
  2008-02-29  9:00           ` Ingo Molnar
  2008-03-02 23:32           ` Rusty Russell
  2 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-28 22:14 UTC (permalink / raw)
  To: Max Krasnyanskiy
  Cc: Jason Baron, akpm, Ingo Molnar, linux-kernel, Rusty Russell

* Max Krasnyanskiy (maxk@qualcomm.com) wrote:
> Jason Baron wrote:
>> -allow stop_mahcine_run() to call a function on all cpus. Calling  
>> stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
>>  stop_machine_run() proceeds as normal until the calling cpu has
>>  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
>
> Jason, we're actually trying to reduce the usage of the stop_machine in 
> general. It's a very big hammer that kills latencies and stuff. It'd be 
> nice if we did not introduce any more dependencies on it. I guess in some 
> case there is simply no other way to handle what need to do. But please 
> think twice
> (or more :)).
>
> Max
>
>

I have a "more complex" immediate value implementation that does not
depend on such heavy lock. I made this simplified version because Rusty
preferred it, although I say from the beginning that it kills interrupt
latency. I could propose the atomic, nmi-safe version directly if enough
people are in favor of it.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-28 22:14           ` Mathieu Desnoyers
@ 2008-02-29  2:39             ` Jason Baron
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Baron @ 2008-02-29  2:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Max Krasnyanskiy, akpm, Ingo Molnar, linux-kernel, Rusty Russell

On Thu, Feb 28, 2008 at 05:14:30PM -0500, Mathieu Desnoyers wrote:
> * Max Krasnyanskiy (maxk@qualcomm.com) wrote:
> > Jason Baron wrote:
> >> -allow stop_mahcine_run() to call a function on all cpus. Calling  
> >> stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
> >>  stop_machine_run() proceeds as normal until the calling cpu has
> >>  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
> >
> > Jason, we're actually trying to reduce the usage of the stop_machine in 
> > general. It's a very big hammer that kills latencies and stuff. It'd be 
> > nice if we did not introduce any more dependencies on it. I guess in some 
> > case there is simply no other way to handle what need to do. But please 
> > think twice
> > (or more :)).
> >
> > Max
> >
> >
> 
> I have a "more complex" immediate value implementation that does not
> depend on such heavy lock. I made this simplified version because Rusty
> preferred it, although I say from the beginning that it kills interrupt
> latency. I could propose the atomic, nmi-safe version directly if enough
> people are in favor of it.
> 
> Mathieu
> 

to me the updating of the immdiate values isn't the critical path, but 
obviously i'd be in favor of a more efficient implementation.

-Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-28 22:09         ` Max Krasnyanskiy
  2008-02-28 22:14           ` Mathieu Desnoyers
@ 2008-02-29  9:00           ` Ingo Molnar
  2008-02-29 18:24             ` Max Krasnyanskiy
  2008-03-02 23:32           ` Rusty Russell
  2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2008-02-29  9:00 UTC (permalink / raw)
  To: Max Krasnyanskiy
  Cc: Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Rusty Russell, Peter Zijlstra


* Max Krasnyanskiy <maxk@qualcomm.com> wrote:

>> -allow stop_mahcine_run() to call a function on all cpus. Calling 
>> stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
>>  stop_machine_run() proceeds as normal until the calling cpu has 
>>  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
>
> Jason, we're actually trying to reduce the usage of the stop_machine 
> in general. [...]

please talk in your own name. Stop-machine is a very elegant tool that 
simplifies a lot of hard things in the kernel and is reasonably fast as 
well. We've just recently added two new usages of it and more are 
planned.

_you_ might be the one who wants to 'reduce the usage of stop_machine' - 
but that means it is _you_ who first has to convert a number of very 
difficult pieces of code to "something else".

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 2/2] implement immediate updating via stop_machine_run()
  2008-02-28 16:37       ` [patch 2/2] implement immediate updating via stop_machine_run() Jason Baron
@ 2008-02-29 13:43         ` Mathieu Desnoyers
  0 siblings, 0 replies; 30+ messages in thread
From: Mathieu Desnoyers @ 2008-02-29 13:43 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

* Jason Baron (jbaron@redhat.com) wrote:
> 
> -Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
>  cpus using IPIs leads to deadlocks. Process A held a read lock on 
>  tasklist_lock, then process B called apply_imv_update(). Process A received the 
>  IPI and begins executing ipi_busy_loop(). Then process C takes a write lock 
>  irq on the task list lock, before receiving the IPI. Thus, process A holds up 
>  process C, and C can't get an IPI b/c interrupts are disabled. Solve this 
>  problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which 
>  runs a function on all cpus after they are busy looping and have disabled 
>  irqs. Since this is done in a new process context, we don't have to worry 
>  about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
>  of testing...
> 

Ok, I am merging it in my patchset. Note that I have additionnal patches
in my LTTng tree that implement the lockless algorithm on top of these
and does not need the stop_machine for the immediate values. Therefore,
I won't be doing much testing of these 2 patches myself.

Thanks for taking care of this,

Mathieu

> Signed-off-by: Jason Baron <jbaron@redhat.com>
> 
> ---
> 
>  kernel/immediate.c |   80 ++++++++++++++--------------------------------------
>  1 files changed, 21 insertions(+), 59 deletions(-)
> 
> 
> diff --git a/kernel/immediate.c b/kernel/immediate.c
> index 4c36a89..378a452 100644
> --- a/kernel/immediate.c
> +++ b/kernel/immediate.c
> @@ -20,6 +20,7 @@
>  #include <linux/immediate.h>
>  #include <linux/memory.h>
>  #include <linux/cpu.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -27,48 +28,33 @@
>   * Kernel ready to execute the SMP update that may depend on trap and ipi.
>   */
>  static int imv_early_boot_complete;
> +static int wrote_text;
>  
>  extern const struct __imv __start___imv[];
>  extern const struct __imv __stop___imv[];
>  
> +static int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!started) {
> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		wrote_text = 1;
> +		smp_wmb(); /* make sure other cpus see that this has run */
> +	} else
> +		sync_core();
> +
> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
> +
>  /*
>   * imv_mutex nests inside module_mutex. imv_mutex protects builtin
>   * immediates and module immediates.
>   */
>  static DEFINE_MUTEX(imv_mutex);
>  
> -static atomic_t wait_sync;
> -
> -struct ipi_loop_data {
> -	long value;
> -	const struct __imv *imv;
> -} loop_data;
> -
> -static void ipi_busy_loop(void *arg)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > loop_data.value);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > 0);
> -	/*
> -	 * Issuing a synchronizing instruction must be done on each CPU before
> -	 * reenabling interrupts after modifying an instruction. Required by
> -	 * Intel's errata.
> -	 */
> -	sync_core();
> -	flush_icache_range(loop_data.imv->imv,
> -		loop_data.imv->imv + loop_data.imv->size);
> -	local_irq_restore(flags);
> -}
>  
>  /**
>   * apply_imv_update - update one immediate value
> @@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
>   */
>  static int apply_imv_update(const struct __imv *imv)
>  {
> -	unsigned long flags;
> -	long online_cpus;
> -
>  	/*
>  	 * If the variable and the instruction have the same value, there is
>  	 * nothing to do.
> @@ -111,30 +94,9 @@ static int apply_imv_update(const struct __imv *imv)
>  
>  	if (imv_early_boot_complete) {
>  		kernel_text_lock();
> -		get_online_cpus();
> -		online_cpus = num_online_cpus();
> -		atomic_set(&wait_sync, 2 * online_cpus);
> -		loop_data.value = online_cpus;
> -		loop_data.imv = imv;
> -		smp_call_function(ipi_busy_loop, NULL, 1, 0);
> -		local_irq_save(flags);
> -		atomic_dec(&wait_sync);
> -		do {
> -			/* Make sure the wait_sync gets re-read */
> -			smp_mb();
> -		} while (atomic_read(&wait_sync) > online_cpus);
> -		text_poke((void *)imv->imv, (void *)imv->var,
> -				imv->size);
> -		/*
> -		 * Make sure the modified instruction is seen by all CPUs before
> -		 * we continue (visible to other CPUs and local interrupts).
> -		 */
> -		wmb();
> -		atomic_dec(&wait_sync);
> -		flush_icache_range(imv->imv,
> -				imv->imv + imv->size);
> -		local_irq_restore(flags);
> -		put_online_cpus();
> +		wrote_text = 0;
> +		stop_machine_run(stop_machine_imv_update, (void *)imv,
> +					ALL_CPUS);
>  		kernel_text_unlock();
>  	} else
>  		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-29  9:00           ` Ingo Molnar
@ 2008-02-29 18:24             ` Max Krasnyanskiy
  2008-02-29 19:15               ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Max Krasnyanskiy @ 2008-02-29 18:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Rusty Russell, Peter Zijlstra

Ingo Molnar wrote:
> * Max Krasnyanskiy <maxk@qualcomm.com> wrote:
> 
>>> -allow stop_mahcine_run() to call a function on all cpus. Calling 
>>> stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
>>>  stop_machine_run() proceeds as normal until the calling cpu has 
>>>  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
>> Jason, we're actually trying to reduce the usage of the stop_machine 
>> in general. [...]
> 
> please talk in your own name. Stop-machine is a very elegant tool that 
> simplifies a lot of hard things in the kernel and is reasonably fast as 
> well. We've just recently added two new usages of it and more are 
> planned.
> 
> _you_ might be the one who wants to 'reduce the usage of stop_machine' - 
> but that means it is _you_ who first has to convert a number of very 
> difficult pieces of code to "something else".
Sure I started the discussion but I suppose you missed Andi's and other 
replies. All I said that people should think twice before relying on it.
btw I'm ok if I _am_ the _one_ who has to convert those pieces of code, that's 
part of the fun :). But if people keep adding stuff which uses stom_machine 
that may be pretty difficult :).

btw Being an RT guy you do not think that stop machine is evil ? I mean from 
the overhead and especially latency perspective. By overhead I mean if you 
have 100+ cpu box that Paul and other guys have mentioned here. Every single 
CPU has to be frozen. You said it's reasonably fast. I guess it depends what's 
reasonable. And from the latency perspective all bets are off. We have no 
guaranties whatsoever as to hold long it will take for cpu X to get frozen 
(there numerous factors here) and all the other cpus have to wait for it.
As I said for some things there is just no other way but to use the 
stop_machine but we should try to minimize that as much as possible.

Max


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-29 18:24             ` Max Krasnyanskiy
@ 2008-02-29 19:15               ` Ingo Molnar
  2008-02-29 19:58                 ` Max Krasnyanskiy
  2008-03-03  4:12                 ` Rusty Russell
  0 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2008-02-29 19:15 UTC (permalink / raw)
  To: Max Krasnyanskiy
  Cc: Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Rusty Russell, Peter Zijlstra


* Max Krasnyanskiy <maxk@qualcomm.com> wrote:

> btw Being an RT guy you do not think that stop machine is evil ? [...]

i'm not "an RT guy", -rt is just one of the many projects i've been 
involved with.

and no, i dont think stop machine is "evil" - it's currently the best 
way to do certain things. If you can solve it better then sure, i'm 
awaiting your patches - but the only patch i saw from you so far was the 
one that turned off stop-machine for isolated cpus - which was 
incredibly broken and ignored the problem altogether.

Right now the answer is: "if you want to do hard RT then avoid doing 
things like loading modules". (which you should avoid while doing 
hard-RT anyway)

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-29 19:15               ` Ingo Molnar
@ 2008-02-29 19:58                 ` Max Krasnyanskiy
  2008-03-03  4:12                 ` Rusty Russell
  1 sibling, 0 replies; 30+ messages in thread
From: Max Krasnyanskiy @ 2008-02-29 19:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Rusty Russell, Peter Zijlstra

Ingo Molnar wrote:
> * Max Krasnyanskiy <maxk@qualcomm.com> wrote:
> 
>> btw Being an RT guy you do not think that stop machine is evil ? [...]
> 
> i'm not "an RT guy", -rt is just one of the many projects i've been 
> involved with.
> 
> and no, i dont think stop machine is "evil" - it's currently the best 
> way to do certain things. If you can solve it better then sure, i'm 
> awaiting your patches - but the only patch i saw from you so far was the 
> one that turned off stop-machine for isolated cpus - which was 
> incredibly broken and ignored the problem altogether.
Ingo, I got it. My patch was a hack. Moving on. Seriously there is no need to 
say it ten thousand times ;-).

You clipped the part where I elaborated what exactly is evil about the stop 
machine. I clearly said that yes for some things there is just no other way 
but in general we should _try_ to avoid it. Note that I did not say "we must" 
I'm saying we should try.

> Right now the answer is: "if you want to do hard RT then avoid doing 
> things like loading modules". (which you should avoid while doing 
> hard-RT anyway)
That's just not practical. Sure you can have some kind of stripped down 
machine but then you loose a lot of flexibility. Again "should" is the keyword 
here. For a lot of workloads hard-RT has to coexist with a bunch of other things.

Max


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-28 22:09         ` Max Krasnyanskiy
  2008-02-28 22:14           ` Mathieu Desnoyers
  2008-02-29  9:00           ` Ingo Molnar
@ 2008-03-02 23:32           ` Rusty Russell
  2 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2008-03-02 23:32 UTC (permalink / raw)
  To: Max Krasnyanskiy
  Cc: Jason Baron, Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel,
	Kathy Staples

On Friday 29 February 2008 09:09:37 Max Krasnyanskiy wrote:
> Jason Baron wrote:
> > -allow stop_mahcine_run() to call a function on all cpus. Calling
> >  stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
> >  stop_machine_run() proceeds as normal until the calling cpu has
> >  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
>
> Jason, we're actually trying to reduce the usage of the stop_machine in
> general. It's a very big hammer that kills latencies and stuff. It'd be
> nice if we did not introduce any more dependencies on it. I guess in some
> case there is simply no other way to handle what need to do. But please
> think twice (or more :)).

Well, by definition modifying an immediate value should be very rare, so it's 
a reasonable candidate.

But stop_machine needs work.  It should not be as heavy as it is.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-02-29 19:15               ` Ingo Molnar
  2008-02-29 19:58                 ` Max Krasnyanskiy
@ 2008-03-03  4:12                 ` Rusty Russell
  2008-03-04  0:30                   ` Max Krasnyanskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2008-03-03  4:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Max Krasnyanskiy, Jason Baron, Mathieu Desnoyers, akpm,
	linux-kernel, Peter Zijlstra

On Saturday 01 March 2008 06:15:59 Ingo Molnar wrote:
> Right now the answer is: "if you want to do hard RT then avoid doing
> things like loading modules". (which you should avoid while doing
> hard-RT anyway)

Hi Ingo,

     I agree, but Max is doing a service by making more people aware of the 
limitations of stop_machine so they don't sprinkle it around like candy :)

I think the module load case can be reasonably fixed; it's in my backlog now.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-03-03  4:12                 ` Rusty Russell
@ 2008-03-04  0:30                   ` Max Krasnyanskiy
  2008-03-04  2:36                     ` Rusty Russell
  0 siblings, 1 reply; 30+ messages in thread
From: Max Krasnyanskiy @ 2008-03-04  0:30 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Peter Zijlstra

Hi Rusty,

> On Saturday 01 March 2008 06:15:59 Ingo Molnar wrote:
>> Right now the answer is: "if you want to do hard RT then avoid doing
>> things like loading modules". (which you should avoid while doing
>> hard-RT anyway)
> 
> Hi Ingo,
> 
>      I agree, but Max is doing a service by making more people aware of the 
> limitations of stop_machine so they don't sprinkle it around like candy :)
Exactly. I'm glad we're on same page on this.

> I think the module load case can be reasonably fixed; it's in my backlog now.
Great. I might get some time later this week to think about this some more and 
try things out. So if you have ideas/suggestions let me know.

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-03-04  0:30                   ` Max Krasnyanskiy
@ 2008-03-04  2:36                     ` Rusty Russell
  2008-03-04  4:11                       ` Max Krasnyansky
  0 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2008-03-04  2:36 UTC (permalink / raw)
  To: Max Krasnyanskiy
  Cc: Ingo Molnar, Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Peter Zijlstra

On Tuesday 04 March 2008 11:30:52 Max Krasnyanskiy wrote:
> Great. I might get some time later this week to think about this some more
> and try things out. So if you have ideas/suggestions let me know.

Hmm, might be worth making stop_machine_run take a cpumask: I balked at it 
originally, but introducing a section special case crosses the line I think.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 1/2] add ALL_CPUS option to stop_machine_run()
  2008-03-04  2:36                     ` Rusty Russell
@ 2008-03-04  4:11                       ` Max Krasnyansky
  0 siblings, 0 replies; 30+ messages in thread
From: Max Krasnyansky @ 2008-03-04  4:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Jason Baron, Mathieu Desnoyers, akpm, linux-kernel,
	Peter Zijlstra



Rusty Russell wrote:
> On Tuesday 04 March 2008 11:30:52 Max Krasnyanskiy wrote:
>> Great. I might get some time later this week to think about this some more
>> and try things out. So if you have ideas/suggestions let me know.
> 
> Hmm, might be worth making stop_machine_run take a cpumask: I balked at it 
> originally, but introducing a section special case crosses the line I think.

Are you talking about a mask of which CPUs to stop  ?
That's essentially what I did with the cpu_isolated_map. But as Ingo and
others pointed out it's the wrong thing to do.
Or did you mean something else ?

Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2008-03-04  4:11 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-02-26 22:52   ` Jason Baron
2008-02-26 23:12     ` Mathieu Desnoyers
2008-02-26 23:34       ` Mathieu Desnoyers
2008-02-27 16:44         ` Jason Baron
2008-02-27 17:01       ` Jason Baron
2008-02-27 19:05     ` Mathieu Desnoyers
2008-02-28 16:33       ` [patch 1/2] add ALL_CPUS option to stop_machine_run() Jason Baron
2008-02-28 22:09         ` Max Krasnyanskiy
2008-02-28 22:14           ` Mathieu Desnoyers
2008-02-29  2:39             ` Jason Baron
2008-02-29  9:00           ` Ingo Molnar
2008-02-29 18:24             ` Max Krasnyanskiy
2008-02-29 19:15               ` Ingo Molnar
2008-02-29 19:58                 ` Max Krasnyanskiy
2008-03-03  4:12                 ` Rusty Russell
2008-03-04  0:30                   ` Max Krasnyanskiy
2008-03-04  2:36                     ` Rusty Russell
2008-03-04  4:11                       ` Max Krasnyansky
2008-03-02 23:32           ` Rusty Russell
2008-02-28 16:37       ` [patch 2/2] implement immediate updating via stop_machine_run() Jason Baron
2008-02-29 13:43         ` Mathieu Desnoyers
2008-02-28 16:50       ` [patch 1/7] Immediate Values - Architecture Independent Code Jason Baron
2008-02-02 21:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2008-02-02 21:08 ` [patch 3/7] Immediate Values - x86 Optimization Mathieu Desnoyers
2008-02-02 21:08 ` [patch 4/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2008-02-02 21:08 ` [patch 5/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2008-02-02 21:08 ` [patch 6/7] Immediate Values - Documentation Mathieu Desnoyers
2008-02-02 21:08 ` [patch 7/7] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).