LKML Archive on lore.kernel.org
* [patch 00/61] ANNOUNCE: lock validator -V1
@ 2006-05-29 21:21 Ingo Molnar
  2006-05-29 21:22 ` [patch 01/61] lock validator: floppy.c irq-release fix Ingo Molnar
                   ` (73 more replies)
  0 siblings, 74 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:21 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

We are pleased to announce the first release of the "lock dependency 
correctness validator" kernel debugging feature, which can be downloaded 
from:

  http://redhat.com/~mingo/lockdep-patches/

The easiest way to try lockdep on a testbox is to apply the combo patch 
to 2.6.17-rc4-mm3. The patch order is:

  http://kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.17-rc4.tar.bz2
  http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc4/2.6.17-rc4-mm3/2.6.17-rc4-mm3.bz2
  http://redhat.com/~mingo/lockdep-patches/lockdep-combo.patch

Run 'make oldconfig' and accept all the defaults for new config options,
then build and reboot into the new kernel. If everything goes well it
should boot up fine and you should have /proc/lockdep and
/proc/lockdep_stats files.

Typically, if the lock validator finds a problem it prints out
voluminous debug output that begins with "BUG: ...". Kernel developers
can use this syslog output to figure out the precise locking scenario.

What does the lock validator do? It "observes" and maps all locking 
rules as they occur dynamically (as triggered by the kernel's natural 
use of spinlocks, rwlocks, mutexes and rwsems). Whenever the lock 
validator subsystem detects a new locking scenario, it validates this 
new rule against the existing set of rules. If this new rule is 
consistent with the existing set of rules then the new rule is added 
transparently and the kernel continues as normal. If the new rule could 
create a deadlock scenario then this condition is printed out.
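
As a rough illustration of the rule check described above, here is a toy
user-space sketch. It is not the lockdep implementation: the names
(toy_add_dependency(), edge[], MAX_CLASSES) are made up for this example,
and it only models lock ordering, not irq contexts or read/write locks.
Each observed "A held while taking B" rule is an edge in a graph, and a
new rule is rejected if it would close a cycle:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 16	/* arbitrary limit for this sketch */

/* edge[a][b] is true once "a was held while b was acquired" was seen */
static bool edge[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'target' reachable from 'from' via known rules? */
static bool reachable(int from, int target, bool *visited)
{
	if (from == target)
		return true;
	visited[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (edge[from][next] && !visited[next] &&
		    reachable(next, target, visited))
			return true;
	return false;
}

/*
 * Try to record the rule "held -> acquired". Returns false, and records
 * nothing, if the new rule would create a cycle (a possible deadlock).
 */
bool toy_add_dependency(int held, int acquired)
{
	bool visited[MAX_CLASSES];

	memset(visited, 0, sizeof(visited));
	if (reachable(acquired, held, visited))
		return false;
	edge[held][acquired] = true;
	return true;
}
```

With rules 0->1 and 1->2 already recorded, toy_add_dependency(2, 0) is
refused because 0 can already reach 2, even if no task ever held all
three locks at once.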

When determining validity of locking, all possible "deadlock scenarios"
are considered: assuming an arbitrary number of CPUs, arbitrary irq
context and task context constellations, running arbitrary combinations
of all the existing locking scenarios. In a typical system this means
millions of separate scenarios. This is why we call it a "locking
correctness" validator - for all rules that are observed the lock
validator proves with mathematical certainty that a deadlock could not
occur (assuming
that the lock validator implementation itself is correct and its 
internal data structures are not corrupted by some other kernel 
subsystem). [see more details and conditionals of this statement in 
include/linux/lockdep.h and Documentation/lockdep-design.txt]

Furthermore, this "all possible scenarios" property of the validator
also enables the finding of complex, highly unlikely multi-CPU
multi-context races via single-context rules, increasing the
likelihood of finding bugs drastically. In practical terms: the lock
validator already found a bug in the upstream kernel that could only
occur on systems with 3 or more CPUs, and which needed 3 very unlikely
code sequences to occur at once on the 3 CPUs. That bug was found and
reported on a single-CPU system (!). So in essence a race will be found
piecemeal, by triggering each of the necessary components of the race
separately, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before
they actually caused a real deadlock.

To further increase the efficiency of the validator, the mapping is not 
per "lock instance", but per "lock-type". For example, all struct inode 
objects in the kernel have inode->inotify_mutex. If there are 10,000 
inodes cached, then there are 10,000 lock objects. But ->inotify_mutex 
is a single "lock type", and all locking activities that occur against 
->inotify_mutex are "unified" into this single lock-type. The advantage 
of the lock-type approach is that all historical ->inotify_mutex uses 
are mapped into a single (and as narrow as possible) set of locking 
rules - regardless of how many different tasks or inode structures it 
took to build this set of rules. The set of rules persists during the
lifetime of the kernel.
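
The lock-type idea can be sketched in user space as follows. This is an
assumption-laden toy, not the lockdep data structures: struct
toy_lock_class, struct toy_lock, struct toy_inode and the helper names
are hypothetical. Every lock instance points at one shared class object,
and all bookkeeping is done on the class:

```c
#include <assert.h>
#include <stddef.h>

/* One object per lock type; all instances of the type share it. */
struct toy_lock_class {
	const char	*name;
	unsigned long	acquisitions;	/* stand-in for the rule set */
};

struct toy_lock {
	struct toy_lock_class *class;
};

/* Hypothetical stand-in for struct inode and its ->inotify_mutex */
struct toy_inode {
	struct toy_lock inotify_mutex;
};

static struct toy_lock_class inotify_mutex_class = { "inotify_mutex", 0 };

/* Every inode initialized here gets the same lock class. */
void toy_inode_init(struct toy_inode *inode)
{
	inode->inotify_mutex.class = &inotify_mutex_class;
}

/* Activity on any instance is accounted to the shared class. */
void toy_lock_acquire(struct toy_lock *lock)
{
	lock->class->acquisitions++;
}

unsigned long toy_inotify_class_acquisitions(void)
{
	return inotify_mutex_class.acquisitions;
}
```

Two different toy_inode objects thus contribute to the same class, so
what is learned from one inode's lock applies to all of them.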

To see the rough magnitude of checking that the lock validator does, 
here's a portion of /proc/lockdep_stats, fresh after bootup:

 lock-types:                            694 [max: 2048]
 direct dependencies:                  1598 [max: 8192]
 indirect dependencies:               17896
 all direct dependencies:             16206
 dependency chains:                    1910 [max: 8192]
 in-hardirq chains:                      17
 in-softirq chains:                     105
 in-process chains:                    1065
 stack-trace entries:                 38761 [max: 131072]
 combined max dependencies:         2033928
 hardirq-safe locks:                     24
 hardirq-unsafe locks:                  176
 softirq-safe locks:                     53
 softirq-unsafe locks:                  137
 irq-safe locks:                         59
 irq-unsafe locks:                      176

The lock validator has observed 1598 actual single-thread locking 
patterns, and has validated all possible 2033928 distinct locking 
scenarios.

More details about the design of the lock validator can be found in 
Documentation/lockdep-design.txt, which can also be found at:

   http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt

The patchqueue consists of 61 patches, and the changes are quite 
extensive:

 215 files changed, 7693 insertions(+), 1247 deletions(-)

So be careful when testing.

We only plan to post the queue to lkml this time; we'll try not to flood
lkml with future releases. The fine-grained patch queue can also be seen
at:

  http://redhat.com/~mingo/lockdep-patches/patches/

(The series file, with explanations of the split-up categories of the
patches, is attached below.)

The lock validator has been build-tested with allyesconfig, and booted 
on x86 and x86_64. (Other architectures probably don't build/work yet.)

Comments, test-results, bug fixes, and improvements are welcome!

	Ingo


# locking fixes (for bugs found by lockdep), not yet in mainline or -mm:

floppy-release-fix.patch
forcedeth-deadlock-fix.patch

# fixes for upstream that only trigger with lockdep:

sound_oss_emu10k1_midi-fix.patch
mutex-section-bug.patch

# locking subsystem debugging improvements:

warn-once.patch
add-module-address.patch

generic-lock-debugging.patch
locking-selftests.patch

spinlock-init-cleanups.patch
lock-init-improvement.patch
xfs-improve-mrinit-macro.patch

# stacktrace:

x86_64-beautify-stack-backtrace.patch
x86_64-document-stack-backtrace.patch
stacktrace.patch

x86_64-use-stacktrace-for-backtrace.patch

# irq-flags state tracing:

lockdep-fown-fixes.patch
lockdep-sk-callback-lock-fixes.patch
trace-irqflags.patch
trace-irqflags-cleanups-x86.patch
trace-irqflags-cleanups-x86_64.patch
local-irq-enable-in-hardirq.patch

# percpu subsystem feature needed for lockdep:

add-per-cpu-offset.patch

# lockdep subsystem core bits:

lockdep-core.patch
lockdep-proc.patch
lockdep-docs.patch

# make use of lockdep in locking subsystems:

lockdep-prove-rwsems.patch
lockdep-prove-spin_rwlocks.patch
lockdep-prove-mutexes.patch

# lockdep utility patches:

lockdep-print-types-in-sysrq.patch
lockdep-x86_64-early-init.patch
lockdep-i386-alternatives-off.patch
lockdep-printk-recursion.patch
lockdep-disable-nmi-watchdog.patch

# map all the locking details and quirks to lockdep:

lockdep-blockdev.patch
lockdep-direct-io.patch
lockdep-serial.patch
lockdep-dcache.patch
lockdep-namei.patch
lockdep-super.patch
lockdep-futex.patch
lockdep-genirq.patch
lockdep-kgdb.patch
lockdep-completions.patch
lockdep-waitqueue.patch
lockdep-mm.patch
lockdep-slab.patch

lockdep-skb_queue_head_init.patch
lockdep-timer.patch
lockdep-sched.patch
lockdep-hrtimer.patch
lockdep-sock.patch
lockdep-af_unix.patch
lockdep-lock_sock.patch
lockdep-mmap_sem.patch

lockdep-prune_dcache-workaround.patch
lockdep-jbd.patch
lockdep-posix-timers.patch
lockdep-sch_generic.patch
lockdep-xfrm.patch
lockdep-sound-seq-ports.patch

lockdep-enable-Kconfig.patch

* [patch 01/61] lock validator: floppy.c irq-release fix
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
@ 2006-05-29 21:22 ` Ingo Molnar
  2006-05-30  1:32   ` Andrew Morton
  2006-05-29 21:23 ` [patch 02/61] lock validator: forcedeth.c fix Ingo Molnar
                   ` (72 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:22 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

floppy.c does a lot of irq-unsafe work within floppy_release_irq_and_dma():
free_irq(), release_region() ... so when executing in irq context, push
the whole function into keventd.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/block/floppy.c |   27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

Index: linux/drivers/block/floppy.c
===================================================================
--- linux.orig/drivers/block/floppy.c
+++ linux/drivers/block/floppy.c
@@ -573,6 +573,21 @@ static int floppy_grab_irq_and_dma(void)
 static void floppy_release_irq_and_dma(void);
 
 /*
+ * Interrupt, DMA and region freeing must not be done from IRQ
+ * context - e.g. irq-unregistration means /proc VFS work, region
+ * release takes an irq-unsafe lock, etc. So we push this work
+ * into keventd:
+ */
+static void fd_release_fn(void *data)
+{
+	mutex_lock(&open_lock);
+	floppy_release_irq_and_dma();
+	mutex_unlock(&open_lock);
+}
+
+static DECLARE_WORK(floppy_release_irq_and_dma_work, fd_release_fn, NULL);
+
+/*
  * The "reset" variable should be tested whenever an interrupt is scheduled,
  * after the commands have been sent. This is to ensure that the driver doesn't
  * get wedged when the interrupt doesn't come because of a failed command.
@@ -836,7 +851,7 @@ static int set_dor(int fdc, char mask, c
 	if (newdor & FLOPPY_MOTOR_MASK)
 		floppy_grab_irq_and_dma();
 	if (olddor & FLOPPY_MOTOR_MASK)
-		floppy_release_irq_and_dma();
+		schedule_work(&floppy_release_irq_and_dma_work);
 	return olddor;
 }
 
@@ -917,6 +932,8 @@ static int _lock_fdc(int drive, int inte
 
 		set_current_state(TASK_RUNNING);
 		remove_wait_queue(&fdc_wait, &wait);
+
+		flush_scheduled_work();
 	}
 	command_status = FD_COMMAND_NONE;
 
@@ -950,7 +967,7 @@ static inline void unlock_fdc(void)
 	if (elv_next_request(floppy_queue))
 		do_fd_request(floppy_queue);
 	spin_unlock_irqrestore(&floppy_lock, flags);
-	floppy_release_irq_and_dma();
+	schedule_work(&floppy_release_irq_and_dma_work);
 	wake_up(&fdc_wait);
 }
 
@@ -4647,6 +4664,12 @@ void cleanup_module(void)
 	del_timer_sync(&fd_timer);
 	blk_cleanup_queue(floppy_queue);
 
+	/*
+	 * Wait for any asynchronous floppy_release_irq_and_dma()
+	 * calls to finish first:
+	 */
+	flush_scheduled_work();
+
 	if (usage_count)
 		floppy_release_irq_and_dma();
 

* [patch 02/61] lock validator: forcedeth.c fix
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
  2006-05-29 21:22 ` [patch 01/61] lock validator: floppy.c irq-release fix Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:23 ` [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup Ingo Molnar
                   ` (71 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

nv_do_nic_poll() is called from timer softirq context, which runs with
interrupts enabled, but np->lock might also be taken from some other
interrupt context. So disable local interrupts for the rest of the
function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/net/forcedeth.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux/drivers/net/forcedeth.c
===================================================================
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -2869,6 +2869,7 @@ static void nv_do_nic_poll(unsigned long
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
+	unsigned long flags;
 	u32 mask = 0;
 
 	/*
@@ -2897,10 +2898,9 @@ static void nv_do_nic_poll(unsigned long
 			mask |= NVREG_IRQ_OTHER;
 		}
 	}
+	local_irq_save(flags);
 	np->nic_poll_irq = 0;
 
-	/* FIXME: Do we need synchronize_irq(dev->irq) here? */
-
 	writel(mask, base + NvRegIrqMask);
 	pci_push(base);
 
@@ -2924,6 +2924,7 @@ static void nv_do_nic_poll(unsigned long
 			enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector);
 		}
 	}
+	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER

* [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
  2006-05-29 21:22 ` [patch 01/61] lock validator: floppy.c irq-release fix Ingo Molnar
  2006-05-29 21:23 ` [patch 02/61] lock validator: forcedeth.c fix Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:23 ` [patch 04/61] lock validator: mutex section binutils workaround Ingo Molnar
                   ` (70 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

move the __attribute((unused)) outside of the DEFINE_SPINLOCK() argument.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 sound/oss/emu10k1/midi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/sound/oss/emu10k1/midi.c
===================================================================
--- linux.orig/sound/oss/emu10k1/midi.c
+++ linux/sound/oss/emu10k1/midi.c
@@ -45,7 +45,7 @@
 #include "../sound_config.h"
 #endif
 
-static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
+static __attribute((unused)) DEFINE_SPINLOCK(midi_spinlock);
 
 static void init_midi_hdr(struct midi_hdr *midihdr)
 {

* [patch 04/61] lock validator: mutex section binutils workaround
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (2 preceding siblings ...)
  2006-05-29 21:23 ` [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-29 21:23 ` [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond) Ingo Molnar
                   ` (69 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

work around weird section nesting build bug causing smp-alternatives
failures under certain circumstances.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/mutex.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/mutex.c
===================================================================
--- linux.orig/kernel/mutex.c
+++ linux/kernel/mutex.c
@@ -309,7 +309,7 @@ static inline int __mutex_trylock_slowpa
  * This function must not be used in interrupt context. The
  * mutex must be released by the same task that acquired it.
  */
-int fastcall mutex_trylock(struct mutex *lock)
+int fastcall __sched mutex_trylock(struct mutex *lock)
 {
 	return __mutex_fastpath_trylock(&lock->count,
 					__mutex_trylock_slowpath);

* [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (3 preceding siblings ...)
  2006-05-29 21:23 ` [patch 04/61] lock validator: mutex section binutils workaround Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:23 ` [patch 06/61] lock validator: add __module_address() method Ingo Molnar
                   ` (68 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

add WARN_ON_ONCE(cond) to print once-per-bootup messages.
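
To illustrate the once-only behaviour, here is a user-space imitation of
the pattern. It is only a sketch: TOY_WARN_ON_ONCE(), toy_check_value()
and toy_warnings_printed are invented for this example, fprintf() stands
in for WARN_ON(1), and it assumes a compiler with the GNU C
statement-expression extension (gcc or clang):

```c
#include <assert.h>
#include <stdio.h>

/* Counts warnings that were actually printed (illustration only). */
static int toy_warnings_printed;

/*
 * Each expansion of the macro gets its own static flag, so the
 * warning fires at most once per call site.
 */
#define TOY_WARN_ON_ONCE(condition)					\
({									\
	static int toy_warn_once = 1;					\
	int toy_ret = 0;						\
									\
	if (toy_warn_once && (condition)) {				\
		toy_warn_once = 0;					\
		toy_warnings_printed++;					\
		fprintf(stderr, "warning at %s:%d\n",			\
			__FILE__, __LINE__);				\
		toy_ret = 1;						\
	}								\
	toy_ret;							\
})

/* Hypothetical caller: a single call site, hence a single flag. */
int toy_check_value(int value)
{
	return TOY_WARN_ON_ONCE(value < 0);
}
```

Note that once the warning has fired, the expression evaluates to 0 even
if the condition is true again, because the flag is tested first and the
condition is then no longer evaluated.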

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/asm-generic/bug.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux/include/asm-generic/bug.h
===================================================================
--- linux.orig/include/asm-generic/bug.h
+++ linux/include/asm-generic/bug.h
@@ -44,4 +44,17 @@
 # define WARN_ON_SMP(x)			do { } while (0)
 #endif
 
+#define WARN_ON_ONCE(condition)				\
+({							\
+	static int __warn_once = 1;			\
+	int __ret = 0;					\
+							\
+	if (unlikely(__warn_once && (condition))) {	\
+		__warn_once = 0;			\
+		WARN_ON(1);				\
+		__ret = 1;				\
+	}						\
+	__ret;						\
+})
+
 #endif

* [patch 06/61] lock validator: add __module_address() method
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (4 preceding siblings ...)
  2006-05-29 21:23 ` [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond) Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:23 ` [patch 07/61] lock validator: better lock debugging Ingo Molnar
                   ` (67 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

add __module_address() method - to be used by lockdep.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/module.h |    6 ++++++
 kernel/module.c        |   14 ++++++++++++++
 2 files changed, 20 insertions(+)

Index: linux/include/linux/module.h
===================================================================
--- linux.orig/include/linux/module.h
+++ linux/include/linux/module.h
@@ -371,6 +371,7 @@ static inline int module_is_live(struct 
 /* Is this address in a module? (second is with no locks, for oops) */
 struct module *module_text_address(unsigned long addr);
 struct module *__module_text_address(unsigned long addr);
+int __module_address(unsigned long addr);
 
 /* Returns module and fills in value, defined and namebuf, or NULL if
    symnum out of range. */
@@ -509,6 +510,11 @@ static inline struct module *__module_te
 	return NULL;
 }
 
+static inline int __module_address(unsigned long addr)
+{
+	return 0;
+}
+
 /* Get/put a kernel symbol (calls should be symmetric) */
 #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
 #define symbol_put(x) do { } while(0)
Index: linux/kernel/module.c
===================================================================
--- linux.orig/kernel/module.c
+++ linux/kernel/module.c
@@ -2222,6 +2222,20 @@ const struct exception_table_entry *sear
 	return e;
 }
 
+/*
+ * Is this a valid module address? We don't grab the lock.
+ */
+int __module_address(unsigned long addr)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list)
+		if (within(addr, mod->module_core, mod->core_size))
+			return 1;
+	return 0;
+}
+
+
 /* Is this a valid kernel address?  We don't grab the lock: we are oopsing. */
 struct module *__module_text_address(unsigned long addr)
 {

* [patch 07/61] lock validator: better lock debugging
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (5 preceding siblings ...)
  2006-05-29 21:23 ` [patch 06/61] lock validator: add __module_address() method Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:23 ` [patch 08/61] lock validator: locking API self-tests Ingo Molnar
                   ` (66 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

generic lock debugging:

 - generalized lock debugging framework. For example, a bug in one lock
   subsystem turns off debugging in all lock subsystems.

 - got rid of the caller address passing from the mutex/rtmutex debugging
   code: it caused way too much prototype hackery, and lockdep will give
   the same information anyway.

 - ability to do silent tests

 - check lock freeing in vfree too.

 - more fine-grained debugging options, to allow distributions to
   turn off more expensive debugging features.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/char/sysrq.c             |    2 
 include/asm-generic/mutex-null.h |   11 -
 include/linux/debug_locks.h      |   62 ++++++++
 include/linux/init_task.h        |    1 
 include/linux/mm.h               |    8 -
 include/linux/mutex-debug.h      |   12 -
 include/linux/mutex.h            |    6 
 include/linux/rtmutex.h          |   10 -
 include/linux/sched.h            |    4 
 init/main.c                      |    9 +
 kernel/exit.c                    |    5 
 kernel/fork.c                    |    4 
 kernel/mutex-debug.c             |  289 +++----------------------------------
 kernel/mutex-debug.h             |   87 +----------
 kernel/mutex.c                   |   83 +++++++---
 kernel/mutex.h                   |   18 --
 kernel/rtmutex-debug.c           |  302 +--------------------------------------
 kernel/rtmutex-debug.h           |    8 -
 kernel/rtmutex.c                 |   45 ++---
 kernel/rtmutex.h                 |    3 
 kernel/sched.c                   |   16 +-
 lib/Kconfig.debug                |   26 ++-
 lib/Makefile                     |    2 
 lib/debug_locks.c                |   45 +++++
 lib/spinlock_debug.c             |   60 +++----
 mm/vmalloc.c                     |    2 
 26 files changed, 329 insertions(+), 791 deletions(-)

Index: linux/drivers/char/sysrq.c
===================================================================
--- linux.orig/drivers/char/sysrq.c
+++ linux/drivers/char/sysrq.c
@@ -152,7 +152,7 @@ static struct sysrq_key_op sysrq_mountro
 static void sysrq_handle_showlocks(int key, struct pt_regs *pt_regs,
 				struct tty_struct *tty)
 {
-	mutex_debug_show_all_locks();
+	debug_show_all_locks();
 }
 static struct sysrq_key_op sysrq_showlocks_op = {
 	.handler	= sysrq_handle_showlocks,
Index: linux/include/asm-generic/mutex-null.h
===================================================================
--- linux.orig/include/asm-generic/mutex-null.h
+++ linux/include/asm-generic/mutex-null.h
@@ -10,14 +10,9 @@
 #ifndef _ASM_GENERIC_MUTEX_NULL_H
 #define _ASM_GENERIC_MUTEX_NULL_H
 
-/* extra parameter only needed for mutex debugging: */
-#ifndef __IP__
-# define __IP__
-#endif
-
-#define __mutex_fastpath_lock(count, fail_fn)	      fail_fn(count __RET_IP__)
-#define __mutex_fastpath_lock_retval(count, fail_fn)  fail_fn(count __RET_IP__)
-#define __mutex_fastpath_unlock(count, fail_fn)       fail_fn(count __RET_IP__)
+#define __mutex_fastpath_lock(count, fail_fn)	      fail_fn(count)
+#define __mutex_fastpath_lock_retval(count, fail_fn)  fail_fn(count)
+#define __mutex_fastpath_unlock(count, fail_fn)       fail_fn(count)
 #define __mutex_fastpath_trylock(count, fail_fn)      fail_fn(count)
 #define __mutex_slowpath_needs_to_unlock()	      1
 
Index: linux/include/linux/debug_locks.h
===================================================================
--- /dev/null
+++ linux/include/linux/debug_locks.h
@@ -0,0 +1,62 @@
+#ifndef __LINUX_DEBUG_LOCKING_H
+#define __LINUX_DEBUG_LOCKING_H
+
+extern int debug_locks;
+extern int debug_locks_silent;
+
+/*
+ * Generic 'turn off all lock debugging' function:
+ */
+extern int debug_locks_off(void);
+
+/*
+ * In the debug case we carry the caller's instruction pointer into
+ * other functions, but we dont want the function argument overhead
+ * in the nondebug case - hence these macros:
+ */
+#define _RET_IP_		(unsigned long)__builtin_return_address(0)
+#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
+
+#define DEBUG_WARN_ON(c)						\
+({									\
+	int __ret = 0;							\
+									\
+	if (unlikely(c)) {						\
+		if (debug_locks_off())					\
+			WARN_ON(1);					\
+		__ret = 1;						\
+	}								\
+	__ret;								\
+})
+
+#ifdef CONFIG_SMP
+# define SMP_DEBUG_WARN_ON(c)			DEBUG_WARN_ON(c)
+#else
+# define SMP_DEBUG_WARN_ON(c)			do { } while (0)
+#endif
+
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+  extern void locking_selftest(void);
+#else
+# define locking_selftest()	do { } while (0)
+#endif
+
+static inline void
+debug_check_no_locks_freed(const void *from, unsigned long len)
+{
+}
+
+static inline void
+debug_check_no_locks_held(struct task_struct *task)
+{
+}
+
+static inline void debug_show_all_locks(void)
+{
+}
+
+static inline void debug_show_held_locks(struct task_struct *task)
+{
+}
+
+#endif
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h
+++ linux/include/linux/init_task.h
@@ -133,7 +133,6 @@ extern struct group_info init_groups;
 	.journal_info	= NULL,						\
 	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
 	.fs_excl	= ATOMIC_INIT(0),				\
-	INIT_RT_MUTEXES(tsk)						\
 }
 
 
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -14,6 +14,7 @@
 #include <linux/prio_tree.h>
 #include <linux/fs.h>
 #include <linux/mutex.h>
+#include <linux/debug_locks.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -1080,13 +1081,6 @@ static inline void vm_stat_account(struc
 }
 #endif /* CONFIG_PROC_FS */
 
-static inline void
-debug_check_no_locks_freed(const void *from, unsigned long len)
-{
-	mutex_debug_check_no_locks_freed(from, len);
-	rt_mutex_debug_check_no_locks_freed(from, len);
-}
-
 #ifndef CONFIG_DEBUG_PAGEALLOC
 static inline void
 kernel_map_pages(struct page *page, int numpages, int enable)
Index: linux/include/linux/mutex-debug.h
===================================================================
--- linux.orig/include/linux/mutex-debug.h
+++ linux/include/linux/mutex-debug.h
@@ -7,17 +7,11 @@
  * Mutexes - debugging helpers:
  */
 
-#define __DEBUG_MUTEX_INITIALIZER(lockname) \
-	, .held_list = LIST_HEAD_INIT(lockname.held_list), \
-	  .name = #lockname , .magic = &lockname
+#define __DEBUG_MUTEX_INITIALIZER(lockname)				\
+	, .magic = &lockname
 
-#define mutex_init(sem)		__mutex_init(sem, __FUNCTION__)
+#define mutex_init(sem)		__mutex_init(sem, __FILE__":"#sem)
 
 extern void FASTCALL(mutex_destroy(struct mutex *lock));
 
-extern void mutex_debug_show_all_locks(void);
-extern void mutex_debug_show_held_locks(struct task_struct *filter);
-extern void mutex_debug_check_no_locks_held(struct task_struct *task);
-extern void mutex_debug_check_no_locks_freed(const void *from, unsigned long len);
-
 #endif
Index: linux/include/linux/mutex.h
===================================================================
--- linux.orig/include/linux/mutex.h
+++ linux/include/linux/mutex.h
@@ -50,8 +50,6 @@ struct mutex {
 	struct list_head	wait_list;
 #ifdef CONFIG_DEBUG_MUTEXES
 	struct thread_info	*owner;
-	struct list_head	held_list;
-	unsigned long		acquire_ip;
 	const char 		*name;
 	void			*magic;
 #endif
@@ -76,10 +74,6 @@ struct mutex_waiter {
 # define __DEBUG_MUTEX_INITIALIZER(lockname)
 # define mutex_init(mutex)			__mutex_init(mutex, NULL)
 # define mutex_destroy(mutex)				do { } while (0)
-# define mutex_debug_show_all_locks()			do { } while (0)
-# define mutex_debug_show_held_locks(p)			do { } while (0)
-# define mutex_debug_check_no_locks_held(task)		do { } while (0)
-# define mutex_debug_check_no_locks_freed(from, len)	do { } while (0)
 #endif
 
 #define __MUTEX_INITIALIZER(lockname) \
Index: linux/include/linux/rtmutex.h
===================================================================
--- linux.orig/include/linux/rtmutex.h
+++ linux/include/linux/rtmutex.h
@@ -29,8 +29,6 @@ struct rt_mutex {
 	struct task_struct	*owner;
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 	int			save_state;
-	struct list_head	held_list_entry;
-	unsigned long		acquire_ip;
 	const char 		*name, *file;
 	int			line;
 	void			*magic;
@@ -98,14 +96,6 @@ extern int rt_mutex_trylock(struct rt_mu
 
 extern void rt_mutex_unlock(struct rt_mutex *lock);
 
-#ifdef CONFIG_DEBUG_RT_MUTEXES
-# define INIT_RT_MUTEX_DEBUG(tsk)					\
-	.held_list_head	= LIST_HEAD_INIT(tsk.held_list_head),		\
-	.held_list_lock	= SPIN_LOCK_UNLOCKED
-#else
-# define INIT_RT_MUTEX_DEBUG(tsk)
-#endif
-
 #ifdef CONFIG_RT_MUTEXES
 # define INIT_RT_MUTEXES(tsk)						\
 	.pi_waiters	= PLIST_HEAD_INIT(tsk.pi_waiters, tsk.pi_lock),	\
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -910,10 +910,6 @@ struct task_struct {
 	struct plist_head pi_waiters;
 	/* Deadlock detection and priority inheritance handling */
 	struct rt_mutex_waiter *pi_blocked_on;
-# ifdef CONFIG_DEBUG_RT_MUTEXES
-	spinlock_t held_list_lock;
-	struct list_head held_list_head;
-# endif
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
Index: linux/init/main.c
===================================================================
--- linux.orig/init/main.c
+++ linux/init/main.c
@@ -53,6 +53,7 @@
 #include <linux/key.h>
 #include <linux/root_dev.h>
 #include <linux/buffer_head.h>
+#include <linux/debug_locks.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -512,6 +513,14 @@ asmlinkage void __init start_kernel(void
 		panic(panic_later, panic_param);
 	profile_init();
 	local_irq_enable();
+
+	/*
+	 * Need to run this when irqs are enabled, because it wants
+	 * to self-test [hard/soft]-irqs on/off lock inversion bugs
+	 * too:
+	 */
+	locking_selftest();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 			initrd_start < min_low_pfn << PAGE_SHIFT) {
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -952,10 +952,9 @@ fastcall NORET_TYPE void do_exit(long co
 	if (unlikely(current->pi_state_cache))
 		kfree(current->pi_state_cache);
 	/*
-	 * If DEBUG_MUTEXES is on, make sure we are holding no locks:
+	 * Make sure we are holding no locks:
 	 */
-	mutex_debug_check_no_locks_held(tsk);
-	rt_mutex_debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held(tsk);
 
 	if (tsk->io_context)
 		exit_io_context();
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -921,10 +921,6 @@ static inline void rt_mutex_init_task(st
 	spin_lock_init(&p->pi_lock);
 	plist_head_init(&p->pi_waiters, &p->pi_lock);
 	p->pi_blocked_on = NULL;
-# ifdef CONFIG_DEBUG_RT_MUTEXES
-	spin_lock_init(&p->held_list_lock);
-	INIT_LIST_HEAD(&p->held_list_head);
-# endif
 #endif
 }
 
Index: linux/kernel/mutex-debug.c
===================================================================
--- linux.orig/kernel/mutex-debug.c
+++ linux/kernel/mutex-debug.c
@@ -19,37 +19,10 @@
 #include <linux/spinlock.h>
 #include <linux/kallsyms.h>
 #include <linux/interrupt.h>
+#include <linux/debug_locks.h>
 
 #include "mutex-debug.h"
 
-/*
- * We need a global lock when we walk through the multi-process
- * lock tree. Only used in the deadlock-debugging case.
- */
-DEFINE_SPINLOCK(debug_mutex_lock);
-
-/*
- * All locks held by all tasks, in a single global list:
- */
-LIST_HEAD(debug_mutex_held_locks);
-
-/*
- * In the debug case we carry the caller's instruction pointer into
- * other functions, but we dont want the function argument overhead
- * in the nondebug case - hence these macros:
- */
-#define __IP_DECL__		, unsigned long ip
-#define __IP__			, ip
-#define __RET_IP__		, (unsigned long)__builtin_return_address(0)
-
-/*
- * "mutex debugging enabled" flag. We turn it off when we detect
- * the first problem because we dont want to recurse back
- * into the tracing code when doing error printk or
- * executing a BUG():
- */
-int debug_mutex_on = 1;
-
 static void printk_task(struct task_struct *p)
 {
 	if (p)
@@ -66,157 +39,28 @@ static void printk_ti(struct thread_info
 		printk("<none>");
 }
 
-static void printk_task_short(struct task_struct *p)
-{
-	if (p)
-		printk("%s/%d [%p, %3d]", p->comm, p->pid, p, p->prio);
-	else
-		printk("<none>");
-}
-
 static void printk_lock(struct mutex *lock, int print_owner)
 {
-	printk(" [%p] {%s}\n", lock, lock->name);
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+	printk(" [%p] {%s}\n", lock, lock->dep_map.name);
+#else
+	printk(" [%p]\n", lock);
+#endif
 
 	if (print_owner && lock->owner) {
 		printk(".. held by:  ");
 		printk_ti(lock->owner);
 		printk("\n");
 	}
-	if (lock->owner) {
-		printk("... acquired at:               ");
-		print_symbol("%s\n", lock->acquire_ip);
-	}
-}
-
-/*
- * printk locks held by a task:
- */
-static void show_task_locks(struct task_struct *p)
-{
-	switch (p->state) {
-	case TASK_RUNNING:		printk("R"); break;
-	case TASK_INTERRUPTIBLE:	printk("S"); break;
-	case TASK_UNINTERRUPTIBLE:	printk("D"); break;
-	case TASK_STOPPED:		printk("T"); break;
-	case EXIT_ZOMBIE:		printk("Z"); break;
-	case EXIT_DEAD:			printk("X"); break;
-	default:			printk("?"); break;
-	}
-	printk_task(p);
-	if (p->blocked_on) {
-		struct mutex *lock = p->blocked_on->lock;
-
-		printk(" blocked on mutex:");
-		printk_lock(lock, 1);
-	} else
-		printk(" (not blocked on mutex)\n");
-}
-
-/*
- * printk all locks held in the system (if filter == NULL),
- * or all locks belonging to a single task (if filter != NULL):
- */
-void show_held_locks(struct task_struct *filter)
-{
-	struct list_head *curr, *cursor = NULL;
-	struct mutex *lock;
-	struct thread_info *t;
-	unsigned long flags;
-	int count = 0;
-
-	if (filter) {
-		printk("------------------------------\n");
-		printk("| showing all locks held by: |  (");
-		printk_task_short(filter);
-		printk("):\n");
-		printk("------------------------------\n");
-	} else {
-		printk("---------------------------\n");
-		printk("| showing all locks held: |\n");
-		printk("---------------------------\n");
-	}
-
-	/*
-	 * Play safe and acquire the global trace lock. We
-	 * cannot printk with that lock held so we iterate
-	 * very carefully:
-	 */
-next:
-	debug_spin_lock_save(&debug_mutex_lock, flags);
-	list_for_each(curr, &debug_mutex_held_locks) {
-		if (cursor && curr != cursor)
-			continue;
-		lock = list_entry(curr, struct mutex, held_list);
-		t = lock->owner;
-		if (filter && (t != filter->thread_info))
-			continue;
-		count++;
-		cursor = curr->next;
-		debug_spin_unlock_restore(&debug_mutex_lock, flags);
-
-		printk("\n#%03d:            ", count);
-		printk_lock(lock, filter ? 0 : 1);
-		goto next;
-	}
-	debug_spin_unlock_restore(&debug_mutex_lock, flags);
-	printk("\n");
-}
-
-void mutex_debug_show_all_locks(void)
-{
-	struct task_struct *g, *p;
-	int count = 10;
-	int unlock = 1;
-
-	printk("\nShowing all blocking locks in the system:\n");
-
-	/*
-	 * Here we try to get the tasklist_lock as hard as possible,
-	 * if not successful after 2 seconds we ignore it (but keep
-	 * trying). This is to enable a debug printout even if a
-	 * tasklist_lock-holding task deadlocks or crashes.
-	 */
-retry:
-	if (!read_trylock(&tasklist_lock)) {
-		if (count == 10)
-			printk("hm, tasklist_lock locked, retrying... ");
-		if (count) {
-			count--;
-			printk(" #%d", 10-count);
-			mdelay(200);
-			goto retry;
-		}
-		printk(" ignoring it.\n");
-		unlock = 0;
-	}
-	if (count != 10)
-		printk(" locked it.\n");
-
-	do_each_thread(g, p) {
-		show_task_locks(p);
-		if (!unlock)
-			if (read_trylock(&tasklist_lock))
-				unlock = 1;
-	} while_each_thread(g, p);
-
-	printk("\n");
-	show_held_locks(NULL);
-	printk("=============================================\n\n");
-
-	if (unlock)
-		read_unlock(&tasklist_lock);
 }
 
 static void report_deadlock(struct task_struct *task, struct mutex *lock,
-			    struct mutex *lockblk, unsigned long ip)
+			    struct mutex *lockblk)
 {
 	printk("\n%s/%d is trying to acquire this lock:\n",
 		current->comm, current->pid);
 	printk_lock(lock, 1);
-	printk("... trying at:                 ");
-	print_symbol("%s\n", ip);
-	show_held_locks(current);
+	debug_show_held_locks(current);
 
 	if (lockblk) {
 		printk("but %s/%d is deadlocking current task %s/%d!\n\n",
@@ -225,7 +69,7 @@ static void report_deadlock(struct task_
 			task->comm, task->pid);
 		printk_lock(lockblk, 1);
 
-		show_held_locks(task);
+		debug_show_held_locks(task);
 
 		printk("\n%s/%d's [blocked] stackdump:\n\n",
 			task->comm, task->pid);
@@ -235,7 +79,7 @@ static void report_deadlock(struct task_
 	printk("\n%s/%d's [current] stackdump:\n\n",
 		current->comm, current->pid);
 	dump_stack();
-	mutex_debug_show_all_locks();
+	debug_show_all_locks();
 	printk("[ turning off deadlock detection. Please report this. ]\n\n");
 	local_irq_disable();
 }
@@ -243,13 +87,12 @@ static void report_deadlock(struct task_
 /*
  * Recursively check for mutex deadlocks:
  */
-static int check_deadlock(struct mutex *lock, int depth,
-			  struct thread_info *ti, unsigned long ip)
+static int check_deadlock(struct mutex *lock, int depth, struct thread_info *ti)
 {
 	struct mutex *lockblk;
 	struct task_struct *task;
 
-	if (!debug_mutex_on)
+	if (!debug_locks)
 		return 0;
 
 	ti = lock->owner;
@@ -263,123 +106,46 @@ static int check_deadlock(struct mutex *
 
 	/* Self-deadlock: */
 	if (current == task) {
-		DEBUG_OFF();
+		debug_locks_off();
 		if (depth)
 			return 1;
 		printk("\n==========================================\n");
 		printk(  "[ BUG: lock recursion deadlock detected! |\n");
 		printk(  "------------------------------------------\n");
-		report_deadlock(task, lock, NULL, ip);
+		report_deadlock(task, lock, NULL);
 		return 0;
 	}
 
 	/* Ugh, something corrupted the lock data structure? */
 	if (depth > 20) {
-		DEBUG_OFF();
+		debug_locks_off();
 		printk("\n===========================================\n");
 		printk(  "[ BUG: infinite lock dependency detected!? |\n");
 		printk(  "-------------------------------------------\n");
-		report_deadlock(task, lock, lockblk, ip);
+		report_deadlock(task, lock, lockblk);
 		return 0;
 	}
 
 	/* Recursively check for dependencies: */
-	if (lockblk && check_deadlock(lockblk, depth+1, ti, ip)) {
+	if (lockblk && check_deadlock(lockblk, depth+1, ti)) {
 		printk("\n============================================\n");
 		printk(  "[ BUG: circular locking deadlock detected! ]\n");
 		printk(  "--------------------------------------------\n");
-		report_deadlock(task, lock, lockblk, ip);
+		report_deadlock(task, lock, lockblk);
 		return 0;
 	}
 	return 0;
 }
 
 /*
- * Called when a task exits, this function checks whether the
- * task is holding any locks, and reports the first one if so:
- */
-void mutex_debug_check_no_locks_held(struct task_struct *task)
-{
-	struct list_head *curr, *next;
-	struct thread_info *t;
-	unsigned long flags;
-	struct mutex *lock;
-
-	if (!debug_mutex_on)
-		return;
-
-	debug_spin_lock_save(&debug_mutex_lock, flags);
-	list_for_each_safe(curr, next, &debug_mutex_held_locks) {
-		lock = list_entry(curr, struct mutex, held_list);
-		t = lock->owner;
-		if (t != task->thread_info)
-			continue;
-		list_del_init(curr);
-		DEBUG_OFF();
-		debug_spin_unlock_restore(&debug_mutex_lock, flags);
-
-		printk("BUG: %s/%d, lock held at task exit time!\n",
-			task->comm, task->pid);
-		printk_lock(lock, 1);
-		if (lock->owner != task->thread_info)
-			printk("exiting task is not even the owner??\n");
-		return;
-	}
-	debug_spin_unlock_restore(&debug_mutex_lock, flags);
-}
-
-/*
- * Called when kernel memory is freed (or unmapped), or if a mutex
- * is destroyed or reinitialized - this code checks whether there is
- * any held lock in the memory range of <from> to <to>:
- */
-void mutex_debug_check_no_locks_freed(const void *from, unsigned long len)
-{
-	struct list_head *curr, *next;
-	const void *to = from + len;
-	unsigned long flags;
-	struct mutex *lock;
-	void *lock_addr;
-
-	if (!debug_mutex_on)
-		return;
-
-	debug_spin_lock_save(&debug_mutex_lock, flags);
-	list_for_each_safe(curr, next, &debug_mutex_held_locks) {
-		lock = list_entry(curr, struct mutex, held_list);
-		lock_addr = lock;
-		if (lock_addr < from || lock_addr >= to)
-			continue;
-		list_del_init(curr);
-		DEBUG_OFF();
-		debug_spin_unlock_restore(&debug_mutex_lock, flags);
-
-		printk("BUG: %s/%d, active lock [%p(%p-%p)] freed!\n",
-			current->comm, current->pid, lock, from, to);
-		dump_stack();
-		printk_lock(lock, 1);
-		if (lock->owner != current_thread_info())
-			printk("freeing task is not even the owner??\n");
-		return;
-	}
-	debug_spin_unlock_restore(&debug_mutex_lock, flags);
-}
-
-/*
  * Must be called with lock->wait_lock held.
  */
-void debug_mutex_set_owner(struct mutex *lock,
-			   struct thread_info *new_owner __IP_DECL__)
+void debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner)
 {
 	lock->owner = new_owner;
-	DEBUG_WARN_ON(!list_empty(&lock->held_list));
-	if (debug_mutex_on) {
-		list_add_tail(&lock->held_list, &debug_mutex_held_locks);
-		lock->acquire_ip = ip;
-	}
 }
 
-void debug_mutex_init_waiter(struct mutex_waiter *waiter)
+void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
 {
 	memset(waiter, 0x11, sizeof(*waiter));
 	waiter->magic = waiter;
@@ -401,10 +167,12 @@ void debug_mutex_free_waiter(struct mute
 }
 
 void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
-			    struct thread_info *ti __IP_DECL__)
+			    struct thread_info *ti)
 {
 	SMP_DEBUG_WARN_ON(!spin_is_locked(&lock->wait_lock));
-	check_deadlock(lock, 0, ti, ip);
+#ifdef CONFIG_DEBUG_MUTEX_DEADLOCKS
+	check_deadlock(lock, 0, ti);
+#endif
 	/* Mark the current thread as blocked on the lock: */
 	ti->task->blocked_on = waiter;
 	waiter->lock = lock;
@@ -424,13 +192,10 @@ void mutex_remove_waiter(struct mutex *l
 
 void debug_mutex_unlock(struct mutex *lock)
 {
+	DEBUG_WARN_ON(lock->owner != current_thread_info());
 	DEBUG_WARN_ON(lock->magic != lock);
 	DEBUG_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	DEBUG_WARN_ON(lock->owner != current_thread_info());
-	if (debug_mutex_on) {
-		DEBUG_WARN_ON(list_empty(&lock->held_list));
-		list_del_init(&lock->held_list);
-	}
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name)
@@ -438,10 +203,8 @@ void debug_mutex_init(struct mutex *lock
 	/*
 	 * Make sure we are not reinitializing a held lock:
 	 */
-	mutex_debug_check_no_locks_freed((void *)lock, sizeof(*lock));
+	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
 	lock->owner = NULL;
-	INIT_LIST_HEAD(&lock->held_list);
-	lock->name = name;
 	lock->magic = lock;
 }
 
Index: linux/kernel/mutex-debug.h
===================================================================
--- linux.orig/kernel/mutex-debug.h
+++ linux/kernel/mutex-debug.h
@@ -10,110 +10,43 @@
  * More details are in kernel/mutex-debug.c.
  */
 
-extern spinlock_t debug_mutex_lock;
-extern struct list_head debug_mutex_held_locks;
-extern int debug_mutex_on;
-
-/*
- * In the debug case we carry the caller's instruction pointer into
- * other functions, but we dont want the function argument overhead
- * in the nondebug case - hence these macros:
- */
-#define __IP_DECL__		, unsigned long ip
-#define __IP__			, ip
-#define __RET_IP__		, (unsigned long)__builtin_return_address(0)
-
 /*
  * This must be called with lock->wait_lock held.
  */
-extern void debug_mutex_set_owner(struct mutex *lock,
-				  struct thread_info *new_owner __IP_DECL__);
+extern void
+debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner);
 
 static inline void debug_mutex_clear_owner(struct mutex *lock)
 {
 	lock->owner = NULL;
 }
 
-extern void debug_mutex_init_waiter(struct mutex_waiter *waiter);
+extern void debug_mutex_lock_common(struct mutex *lock,
+				    struct mutex_waiter *waiter);
 extern void debug_mutex_wake_waiter(struct mutex *lock,
 				    struct mutex_waiter *waiter);
 extern void debug_mutex_free_waiter(struct mutex_waiter *waiter);
 extern void debug_mutex_add_waiter(struct mutex *lock,
 				   struct mutex_waiter *waiter,
-				   struct thread_info *ti __IP_DECL__);
+				   struct thread_info *ti);
 extern void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 				struct thread_info *ti);
 extern void debug_mutex_unlock(struct mutex *lock);
 extern void debug_mutex_init(struct mutex *lock, const char *name);
 
-#define debug_spin_lock_save(lock, flags)		\
-	do {						\
-		local_irq_save(flags);			\
-		if (debug_mutex_on)			\
-			spin_lock(lock);		\
-	} while (0)
-
-#define debug_spin_unlock_restore(lock, flags)		\
-	do {						\
-		if (debug_mutex_on)			\
-			spin_unlock(lock);		\
-		local_irq_restore(flags);		\
-		preempt_check_resched();		\
-	} while (0)
-
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
 							\
 		DEBUG_WARN_ON(in_interrupt());		\
-		debug_spin_lock_save(&debug_mutex_lock, flags); \
-		spin_lock(lock);			\
+		local_irq_save(flags);			\
+		__raw_spin_lock(&(lock)->raw_lock);	\
 		DEBUG_WARN_ON(l->magic != l);		\
 	} while (0)
 
 #define spin_unlock_mutex(lock, flags)			\
 	do {						\
-		spin_unlock(lock);			\
-		debug_spin_unlock_restore(&debug_mutex_lock, flags);	\
+		__raw_spin_unlock(&(lock)->raw_lock);	\
+		local_irq_restore(flags);		\
+		preempt_check_resched();		\
 	} while (0)
-
-#define DEBUG_OFF()					\
-do {							\
-	if (debug_mutex_on) {				\
-		debug_mutex_on = 0;			\
-		console_verbose();			\
-		if (spin_is_locked(&debug_mutex_lock))	\
-			spin_unlock(&debug_mutex_lock);	\
-	}						\
-} while (0)
-
-#define DEBUG_BUG()					\
-do {							\
-	if (debug_mutex_on) {				\
-		DEBUG_OFF();				\
-		BUG();					\
-	}						\
-} while (0)
-
-#define DEBUG_WARN_ON(c)				\
-do {							\
-	if (unlikely(c && debug_mutex_on)) {		\
-		DEBUG_OFF();				\
-		WARN_ON(1);				\
-	}						\
-} while (0)
-
-# define DEBUG_BUG_ON(c)				\
-do {							\
-	if (unlikely(c))				\
-		DEBUG_BUG();				\
-} while (0)
-
-#ifdef CONFIG_SMP
-# define SMP_DEBUG_WARN_ON(c)			DEBUG_WARN_ON(c)
-# define SMP_DEBUG_BUG_ON(c)			DEBUG_BUG_ON(c)
-#else
-# define SMP_DEBUG_WARN_ON(c)			do { } while (0)
-# define SMP_DEBUG_BUG_ON(c)			do { } while (0)
-#endif
-
Index: linux/kernel/mutex.c
===================================================================
--- linux.orig/kernel/mutex.c
+++ linux/kernel/mutex.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/debug_locks.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -38,7 +39,7 @@
  *
  * It is not allowed to initialize an already locked mutex.
  */
-void fastcall __mutex_init(struct mutex *lock, const char *name)
+__always_inline void fastcall __mutex_init(struct mutex *lock, const char *name)
 {
 	atomic_set(&lock->count, 1);
 	spin_lock_init(&lock->wait_lock);
@@ -56,7 +57,7 @@ EXPORT_SYMBOL(__mutex_init);
  * branch is predicted by the CPU as default-untaken.
  */
 static void fastcall noinline __sched
-__mutex_lock_slowpath(atomic_t *lock_count __IP_DECL__);
+__mutex_lock_slowpath(atomic_t *lock_count);
 
 /***
  * mutex_lock - acquire the mutex
@@ -79,7 +80,7 @@ __mutex_lock_slowpath(atomic_t *lock_cou
  *
  * This function is similar to (but not equivalent to) down().
  */
-void fastcall __sched mutex_lock(struct mutex *lock)
+void inline fastcall __sched mutex_lock(struct mutex *lock)
 {
 	might_sleep();
 	/*
@@ -92,7 +93,7 @@ void fastcall __sched mutex_lock(struct 
 EXPORT_SYMBOL(mutex_lock);
 
 static void fastcall noinline __sched
-__mutex_unlock_slowpath(atomic_t *lock_count __IP_DECL__);
+__mutex_unlock_slowpath(atomic_t *lock_count);
 
 /***
  * mutex_unlock - release the mutex
@@ -116,22 +117,36 @@ void fastcall __sched mutex_unlock(struc
 
 EXPORT_SYMBOL(mutex_unlock);
 
+static void fastcall noinline __sched
+__mutex_unlock_non_nested_slowpath(atomic_t *lock_count);
+
+void fastcall __sched mutex_unlock_non_nested(struct mutex *lock)
+{
+	/*
+	 * The unlocking fastpath is the 0->1 transition from 'locked'
+	 * into 'unlocked' state:
+	 */
+	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_non_nested_slowpath);
+}
+
+EXPORT_SYMBOL(mutex_unlock_non_nested);
+
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
 static inline int __sched
-__mutex_lock_common(struct mutex *lock, long state __IP_DECL__)
+__mutex_lock_common(struct mutex *lock, long state, unsigned int subtype)
 {
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned int old_val;
 	unsigned long flags;
 
-	debug_mutex_init_waiter(&waiter);
-
 	spin_lock_mutex(&lock->wait_lock, flags);
 
-	debug_mutex_add_waiter(lock, &waiter, task->thread_info, ip);
+	debug_mutex_lock_common(lock, &waiter);
+	debug_mutex_add_waiter(lock, &waiter, task->thread_info);
 
 	/* add waiting tasks to the end of the waitqueue (FIFO): */
 	list_add_tail(&waiter.list, &lock->wait_list);
@@ -173,7 +188,7 @@ __mutex_lock_common(struct mutex *lock, 
 
 	/* got the lock - rejoice! */
 	mutex_remove_waiter(lock, &waiter, task->thread_info);
-	debug_mutex_set_owner(lock, task->thread_info __IP__);
+	debug_mutex_set_owner(lock, task->thread_info);
 
 	/* set it to 0 if there are no waiters left: */
 	if (likely(list_empty(&lock->wait_list)))
@@ -183,32 +198,41 @@ __mutex_lock_common(struct mutex *lock, 
 
 	debug_mutex_free_waiter(&waiter);
 
-	DEBUG_WARN_ON(list_empty(&lock->held_list));
 	DEBUG_WARN_ON(lock->owner != task->thread_info);
 
 	return 0;
 }
 
 static void fastcall noinline __sched
-__mutex_lock_slowpath(atomic_t *lock_count __IP_DECL__)
+__mutex_lock_slowpath(atomic_t *lock_count)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 
-	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE __IP__);
+	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0);
 }
 
+#ifdef CONFIG_DEBUG_MUTEXES
+void __sched
+mutex_lock_nested(struct mutex *lock, unsigned int subtype)
+{
+	might_sleep();
+	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subtype);
+}
+
+EXPORT_SYMBOL_GPL(mutex_lock_nested);
+#endif
+
 /*
  * Release the lock, slowpath:
  */
-static fastcall noinline void
-__mutex_unlock_slowpath(atomic_t *lock_count __IP_DECL__)
+static fastcall inline void
+__mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 	unsigned long flags;
 
-	DEBUG_WARN_ON(lock->owner != current_thread_info());
-
 	spin_lock_mutex(&lock->wait_lock, flags);
+	debug_mutex_unlock(lock);
 
 	/*
 	 * some architectures leave the lock unlocked in the fastpath failure
@@ -218,8 +242,6 @@ __mutex_unlock_slowpath(atomic_t *lock_c
 	if (__mutex_slowpath_needs_to_unlock())
 		atomic_set(&lock->count, 1);
 
-	debug_mutex_unlock(lock);
-
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
@@ -237,11 +259,27 @@ __mutex_unlock_slowpath(atomic_t *lock_c
 }
 
 /*
+ * Release the lock, slowpath:
+ */
+static fastcall noinline void
+__mutex_unlock_slowpath(atomic_t *lock_count)
+{
+	__mutex_unlock_common_slowpath(lock_count, 1);
+}
+
+static fastcall noinline void
+__mutex_unlock_non_nested_slowpath(atomic_t *lock_count)
+{
+	__mutex_unlock_common_slowpath(lock_count, 0);
+}
+
+
+/*
  * Here come the less common (and hence less performance-critical) APIs:
  * mutex_lock_interruptible() and mutex_trylock().
  */
 static int fastcall noinline __sched
-__mutex_lock_interruptible_slowpath(atomic_t *lock_count __IP_DECL__);
+__mutex_lock_interruptible_slowpath(atomic_t *lock_count);
 
 /***
  * mutex_lock_interruptible - acquire the mutex, interruptable
@@ -264,11 +302,11 @@ int fastcall __sched mutex_lock_interrup
 EXPORT_SYMBOL(mutex_lock_interruptible);
 
 static int fastcall noinline __sched
-__mutex_lock_interruptible_slowpath(atomic_t *lock_count __IP_DECL__)
+__mutex_lock_interruptible_slowpath(atomic_t *lock_count)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 
-	return __mutex_lock_common(lock, TASK_INTERRUPTIBLE __IP__);
+	return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0);
 }
 
 /*
@@ -285,7 +323,8 @@ static inline int __mutex_trylock_slowpa
 
 	prev = atomic_xchg(&lock->count, -1);
 	if (likely(prev == 1))
-		debug_mutex_set_owner(lock, current_thread_info() __RET_IP__);
+		debug_mutex_set_owner(lock, current_thread_info());
+
 	/* Set it back to 0 if there are no waiters: */
 	if (likely(list_empty(&lock->wait_list)))
 		atomic_set(&lock->count, 0);
Index: linux/kernel/mutex.h
===================================================================
--- linux.orig/kernel/mutex.h
+++ linux/kernel/mutex.h
@@ -19,19 +19,15 @@
 #define DEBUG_WARN_ON(c)				do { } while (0)
 #define debug_mutex_set_owner(lock, new_owner)		do { } while (0)
 #define debug_mutex_clear_owner(lock)			do { } while (0)
-#define debug_mutex_init_waiter(waiter)			do { } while (0)
 #define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
 #define debug_mutex_free_waiter(waiter)			do { } while (0)
-#define debug_mutex_add_waiter(lock, waiter, ti, ip)	do { } while (0)
+#define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
+#define mutex_acquire(lock, subtype, trylock)	do { } while (0)
+#define mutex_release(lock, nested)		do { } while (0)
 #define debug_mutex_unlock(lock)			do { } while (0)
 #define debug_mutex_init(lock, name)			do { } while (0)
 
-/*
- * Return-address parameters/declarations. They are very useful for
- * debugging, but add overhead in the !DEBUG case - so we go the
- * trouble of using this not too elegant but zero-cost solution:
- */
-#define __IP_DECL__
-#define __IP__
-#define __RET_IP__
-
+static inline void
+debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
+{
+}
Index: linux/kernel/rtmutex-debug.c
===================================================================
--- linux.orig/kernel/rtmutex-debug.c
+++ linux/kernel/rtmutex-debug.c
@@ -26,6 +26,7 @@
 #include <linux/interrupt.h>
 #include <linux/plist.h>
 #include <linux/fs.h>
+#include <linux/debug_locks.h>
 
 #include "rtmutex_common.h"
 
@@ -45,8 +46,6 @@ do {								\
 		console_verbose();				\
 		if (spin_is_locked(&current->pi_lock))		\
 			spin_unlock(&current->pi_lock);		\
-		if (spin_is_locked(&current->held_list_lock))	\
-			spin_unlock(&current->held_list_lock);	\
 	}							\
 } while (0)
 
@@ -105,14 +104,6 @@ static void printk_task(task_t *p)
 		printk("<none>");
 }
 
-static void printk_task_short(task_t *p)
-{
-	if (p)
-		printk("%s/%d [%p, %3d]", p->comm, p->pid, p, p->prio);
-	else
-		printk("<none>");
-}
-
 static void printk_lock(struct rt_mutex *lock, int print_owner)
 {
 	if (lock->name)
@@ -128,222 +119,6 @@ static void printk_lock(struct rt_mutex 
 		printk_task(rt_mutex_owner(lock));
 		printk("\n");
 	}
-	if (rt_mutex_owner(lock)) {
-		printk("... acquired at:               ");
-		print_symbol("%s\n", lock->acquire_ip);
-	}
-}
-
-static void printk_waiter(struct rt_mutex_waiter *w)
-{
-	printk("-------------------------\n");
-	printk("| waiter struct %p:\n", w);
-	printk("| w->list_entry: [DP:%p/%p|SP:%p/%p|PRI:%d]\n",
-	       w->list_entry.plist.prio_list.prev, w->list_entry.plist.prio_list.next,
-	       w->list_entry.plist.node_list.prev, w->list_entry.plist.node_list.next,
-	       w->list_entry.prio);
-	printk("| w->pi_list_entry: [DP:%p/%p|SP:%p/%p|PRI:%d]\n",
-	       w->pi_list_entry.plist.prio_list.prev, w->pi_list_entry.plist.prio_list.next,
-	       w->pi_list_entry.plist.node_list.prev, w->pi_list_entry.plist.node_list.next,
-	       w->pi_list_entry.prio);
-	printk("\n| lock:\n");
-	printk_lock(w->lock, 1);
-	printk("| w->ti->task:\n");
-	printk_task(w->task);
-	printk("| blocked at:  ");
-	print_symbol("%s\n", w->ip);
-	printk("-------------------------\n");
-}
-
-static void show_task_locks(task_t *p)
-{
-	switch (p->state) {
-	case TASK_RUNNING:		printk("R"); break;
-	case TASK_INTERRUPTIBLE:	printk("S"); break;
-	case TASK_UNINTERRUPTIBLE:	printk("D"); break;
-	case TASK_STOPPED:		printk("T"); break;
-	case EXIT_ZOMBIE:		printk("Z"); break;
-	case EXIT_DEAD:			printk("X"); break;
-	default:			printk("?"); break;
-	}
-	printk_task(p);
-	if (p->pi_blocked_on) {
-		struct rt_mutex *lock = p->pi_blocked_on->lock;
-
-		printk(" blocked on:");
-		printk_lock(lock, 1);
-	} else
-		printk(" (not blocked)\n");
-}
-
-void rt_mutex_show_held_locks(task_t *task, int verbose)
-{
-	struct list_head *curr, *cursor = NULL;
-	struct rt_mutex *lock;
-	task_t *t;
-	unsigned long flags;
-	int count = 0;
-
-	if (!rt_trace_on)
-		return;
-
-	if (verbose) {
-		printk("------------------------------\n");
-		printk("| showing all locks held by: |  (");
-		printk_task_short(task);
-		printk("):\n");
-		printk("------------------------------\n");
-	}
-
-next:
-	spin_lock_irqsave(&task->held_list_lock, flags);
-	list_for_each(curr, &task->held_list_head) {
-		if (cursor && curr != cursor)
-			continue;
-		lock = list_entry(curr, struct rt_mutex, held_list_entry);
-		t = rt_mutex_owner(lock);
-		WARN_ON(t != task);
-		count++;
-		cursor = curr->next;
-		spin_unlock_irqrestore(&task->held_list_lock, flags);
-
-		printk("\n#%03d:            ", count);
-		printk_lock(lock, 0);
-		goto next;
-	}
-	spin_unlock_irqrestore(&task->held_list_lock, flags);
-
-	printk("\n");
-}
-
-void rt_mutex_show_all_locks(void)
-{
-	task_t *g, *p;
-	int count = 10;
-	int unlock = 1;
-
-	printk("\n");
-	printk("----------------------\n");
-	printk("| showing all tasks: |\n");
-	printk("----------------------\n");
-
-	/*
-	 * Here we try to get the tasklist_lock as hard as possible,
-	 * if not successful after 2 seconds we ignore it (but keep
-	 * trying). This is to enable a debug printout even if a
-	 * tasklist_lock-holding task deadlocks or crashes.
-	 */
-retry:
-	if (!read_trylock(&tasklist_lock)) {
-		if (count == 10)
-			printk("hm, tasklist_lock locked, retrying... ");
-		if (count) {
-			count--;
-			printk(" #%d", 10-count);
-			mdelay(200);
-			goto retry;
-		}
-		printk(" ignoring it.\n");
-		unlock = 0;
-	}
-	if (count != 10)
-		printk(" locked it.\n");
-
-	do_each_thread(g, p) {
-		show_task_locks(p);
-		if (!unlock)
-			if (read_trylock(&tasklist_lock))
-				unlock = 1;
-	} while_each_thread(g, p);
-
-	printk("\n");
-
-	printk("-----------------------------------------\n");
-	printk("| showing all locks held in the system: |\n");
-	printk("-----------------------------------------\n");
-
-	do_each_thread(g, p) {
-		rt_mutex_show_held_locks(p, 0);
-		if (!unlock)
-			if (read_trylock(&tasklist_lock))
-				unlock = 1;
-	} while_each_thread(g, p);
-
-
-	printk("=============================================\n\n");
-
-	if (unlock)
-		read_unlock(&tasklist_lock);
-}
-
-void rt_mutex_debug_check_no_locks_held(task_t *task)
-{
-	struct rt_mutex_waiter *w;
-	struct list_head *curr;
-	struct rt_mutex *lock;
-
-	if (!rt_trace_on)
-		return;
-	if (!rt_prio(task->normal_prio) && rt_prio(task->prio)) {
-		printk("BUG: PI priority boost leaked!\n");
-		printk_task(task);
-		printk("\n");
-	}
-	if (list_empty(&task->held_list_head))
-		return;
-
-	spin_lock(&task->pi_lock);
-	plist_for_each_entry(w, &task->pi_waiters, pi_list_entry) {
-		TRACE_OFF();
-
-		printk("hm, PI interest held at exit time? Task:\n");
-		printk_task(task);
-		printk_waiter(w);
-		return;
-	}
-	spin_unlock(&task->pi_lock);
-
-	list_for_each(curr, &task->held_list_head) {
-		lock = list_entry(curr, struct rt_mutex, held_list_entry);
-
-		printk("BUG: %s/%d, lock held at task exit time!\n",
-		       task->comm, task->pid);
-		printk_lock(lock, 1);
-		if (rt_mutex_owner(lock) != task)
-			printk("exiting task is not even the owner??\n");
-	}
-}
-
-int rt_mutex_debug_check_no_locks_freed(const void *from, unsigned long len)
-{
-	const void *to = from + len;
-	struct list_head *curr;
-	struct rt_mutex *lock;
-	unsigned long flags;
-	void *lock_addr;
-
-	if (!rt_trace_on)
-		return 0;
-
-	spin_lock_irqsave(&current->held_list_lock, flags);
-	list_for_each(curr, &current->held_list_head) {
-		lock = list_entry(curr, struct rt_mutex, held_list_entry);
-		lock_addr = lock;
-		if (lock_addr < from || lock_addr >= to)
-			continue;
-		TRACE_OFF();
-
-		printk("BUG: %s/%d, active lock [%p(%p-%p)] freed!\n",
-			current->comm, current->pid, lock, from, to);
-		dump_stack();
-		printk_lock(lock, 1);
-		if (rt_mutex_owner(lock) != current)
-			printk("freeing task is not even the owner??\n");
-		return 1;
-	}
-	spin_unlock_irqrestore(&current->held_list_lock, flags);
-
-	return 0;
 }
 
 void rt_mutex_debug_task_free(struct task_struct *task)
@@ -395,85 +170,41 @@ void debug_rt_mutex_print_deadlock(struc
 	       current->comm, current->pid);
 	printk_lock(waiter->lock, 1);
 
-	printk("... trying at:                 ");
-	print_symbol("%s\n", waiter->ip);
-
 	printk("\n2) %s/%d is blocked on this lock:\n", task->comm, task->pid);
 	printk_lock(waiter->deadlock_lock, 1);
 
-	rt_mutex_show_held_locks(current, 1);
-	rt_mutex_show_held_locks(task, 1);
+	debug_show_held_locks(current);
+	debug_show_held_locks(task);
 
 	printk("\n%s/%d's [blocked] stackdump:\n\n", task->comm, task->pid);
 	show_stack(task, NULL);
 	printk("\n%s/%d's [current] stackdump:\n\n",
 	       current->comm, current->pid);
 	dump_stack();
-	rt_mutex_show_all_locks();
+	debug_show_all_locks();
+
 	printk("[ turning off deadlock detection."
 	       "Please report this trace. ]\n\n");
 	local_irq_disable();
 }
 
-void debug_rt_mutex_lock(struct rt_mutex *lock __IP_DECL__)
+void debug_rt_mutex_lock(struct rt_mutex *lock)
 {
-	unsigned long flags;
-
-	if (rt_trace_on) {
-		TRACE_WARN_ON_LOCKED(!list_empty(&lock->held_list_entry));
-
-		spin_lock_irqsave(&current->held_list_lock, flags);
-		list_add_tail(&lock->held_list_entry, &current->held_list_head);
-		spin_unlock_irqrestore(&current->held_list_lock, flags);
-
-		lock->acquire_ip = ip;
-	}
 }
 
 void debug_rt_mutex_unlock(struct rt_mutex *lock)
 {
-	unsigned long flags;
-
-	if (rt_trace_on) {
-		TRACE_WARN_ON_LOCKED(rt_mutex_owner(lock) != current);
-		TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list_entry));
-
-		spin_lock_irqsave(&current->held_list_lock, flags);
-		list_del_init(&lock->held_list_entry);
-		spin_unlock_irqrestore(&current->held_list_lock, flags);
-	}
+	TRACE_WARN_ON_LOCKED(rt_mutex_owner(lock) != current);
 }
 
-void debug_rt_mutex_proxy_lock(struct rt_mutex *lock,
-			       struct task_struct *powner __IP_DECL__)
+void
+debug_rt_mutex_proxy_lock(struct rt_mutex *lock, struct task_struct *powner)
 {
-	unsigned long flags;
-
-	if (rt_trace_on) {
-		TRACE_WARN_ON_LOCKED(!list_empty(&lock->held_list_entry));
-
-		spin_lock_irqsave(&powner->held_list_lock, flags);
-		list_add_tail(&lock->held_list_entry, &powner->held_list_head);
-		spin_unlock_irqrestore(&powner->held_list_lock, flags);
-
-		lock->acquire_ip = ip;
-	}
 }
 
 void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock)
 {
-	unsigned long flags;
-
-	if (rt_trace_on) {
-		struct task_struct *owner = rt_mutex_owner(lock);
-
-		TRACE_WARN_ON_LOCKED(!owner);
-		TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list_entry));
-
-		spin_lock_irqsave(&owner->held_list_lock, flags);
-		list_del_init(&lock->held_list_entry);
-		spin_unlock_irqrestore(&owner->held_list_lock, flags);
-	}
+	TRACE_WARN_ON_LOCKED(!rt_mutex_owner(lock));
 }
 
 void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
@@ -493,14 +224,11 @@ void debug_rt_mutex_free_waiter(struct r
 
 void debug_rt_mutex_init(struct rt_mutex *lock, const char *name)
 {
-	void *addr = lock;
-
-	if (rt_trace_on) {
-		rt_mutex_debug_check_no_locks_freed(addr,
-						    sizeof(struct rt_mutex));
-		INIT_LIST_HEAD(&lock->held_list_entry);
-		lock->name = name;
-	}
+	/*
+	 * Make sure we are not reinitializing a held lock:
+	 */
+	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
+	lock->name = name;
 }
 
 void rt_mutex_deadlock_account_lock(struct rt_mutex *lock, task_t *task)
Index: linux/kernel/rtmutex-debug.h
===================================================================
--- linux.orig/kernel/rtmutex-debug.h
+++ linux/kernel/rtmutex-debug.h
@@ -9,20 +9,16 @@
  * This file contains macros used solely by rtmutex.c. Debug version.
  */
 
-#define __IP_DECL__		, unsigned long ip
-#define __IP__			, ip
-#define __RET_IP__		, (unsigned long)__builtin_return_address(0)
-
 extern void
 rt_mutex_deadlock_account_lock(struct rt_mutex *lock, struct task_struct *task);
 extern void rt_mutex_deadlock_account_unlock(struct task_struct *task);
 extern void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
 extern void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter);
 extern void debug_rt_mutex_init(struct rt_mutex *lock, const char *name);
-extern void debug_rt_mutex_lock(struct rt_mutex *lock __IP_DECL__);
+extern void debug_rt_mutex_lock(struct rt_mutex *lock);
 extern void debug_rt_mutex_unlock(struct rt_mutex *lock);
 extern void debug_rt_mutex_proxy_lock(struct rt_mutex *lock,
-				      struct task_struct *powner __IP_DECL__);
+				      struct task_struct *powner);
 extern void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock);
 extern void debug_rt_mutex_deadlock(int detect, struct rt_mutex_waiter *waiter,
 				    struct rt_mutex *lock);
Index: linux/kernel/rtmutex.c
===================================================================
--- linux.orig/kernel/rtmutex.c
+++ linux/kernel/rtmutex.c
@@ -160,8 +160,7 @@ int max_lock_depth = 1024;
 static int rt_mutex_adjust_prio_chain(task_t *task,
 				      int deadlock_detect,
 				      struct rt_mutex *orig_lock,
-				      struct rt_mutex_waiter *orig_waiter
-				      __IP_DECL__)
+				      struct rt_mutex_waiter *orig_waiter)
 {
 	struct rt_mutex *lock;
 	struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
@@ -356,7 +355,7 @@ static inline int try_to_steal_lock(stru
  *
  * Must be called with lock->wait_lock held.
  */
-static int try_to_take_rt_mutex(struct rt_mutex *lock __IP_DECL__)
+static int try_to_take_rt_mutex(struct rt_mutex *lock)
 {
 	/*
 	 * We have to be careful here if the atomic speedups are
@@ -383,7 +382,7 @@ static int try_to_take_rt_mutex(struct r
 		return 0;
 
 	/* We got the lock. */
-	debug_rt_mutex_lock(lock __IP__);
+	debug_rt_mutex_lock(lock);
 
 	rt_mutex_set_owner(lock, current, 0);
 
@@ -401,8 +400,7 @@ static int try_to_take_rt_mutex(struct r
  */
 static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 				   struct rt_mutex_waiter *waiter,
-				   int detect_deadlock
-				   __IP_DECL__)
+				   int detect_deadlock)
 {
 	struct rt_mutex_waiter *top_waiter = waiter;
 	task_t *owner = rt_mutex_owner(lock);
@@ -450,8 +448,7 @@ static int task_blocks_on_rt_mutex(struc
 
 	spin_unlock(&lock->wait_lock);
 
-	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock,
-					 waiter __IP__);
+	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter);
 
 	spin_lock(&lock->wait_lock);
 
@@ -523,7 +520,7 @@ static void wakeup_next_waiter(struct rt
  * Must be called with lock->wait_lock held
  */
 static void remove_waiter(struct rt_mutex *lock,
-			  struct rt_mutex_waiter *waiter  __IP_DECL__)
+			  struct rt_mutex_waiter *waiter)
 {
 	int first = (waiter == rt_mutex_top_waiter(lock));
 	int boost = 0;
@@ -564,7 +561,7 @@ static void remove_waiter(struct rt_mute
 
 	spin_unlock(&lock->wait_lock);
 
-	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL __IP__);
+	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL);
 
 	spin_lock(&lock->wait_lock);
 }
@@ -575,7 +572,7 @@ static void remove_waiter(struct rt_mute
 static int __sched
 rt_mutex_slowlock(struct rt_mutex *lock, int state,
 		  struct hrtimer_sleeper *timeout,
-		  int detect_deadlock __IP_DECL__)
+		  int detect_deadlock)
 {
 	struct rt_mutex_waiter waiter;
 	int ret = 0;
@@ -586,7 +583,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 	spin_lock(&lock->wait_lock);
 
 	/* Try to acquire the lock again: */
-	if (try_to_take_rt_mutex(lock __IP__)) {
+	if (try_to_take_rt_mutex(lock)) {
 		spin_unlock(&lock->wait_lock);
 		return 0;
 	}
@@ -600,7 +597,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 
 	for (;;) {
 		/* Try to acquire the lock: */
-		if (try_to_take_rt_mutex(lock __IP__))
+		if (try_to_take_rt_mutex(lock))
 			break;
 
 		/*
@@ -624,7 +621,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 		 */
 		if (!waiter.task) {
 			ret = task_blocks_on_rt_mutex(lock, &waiter,
-						      detect_deadlock __IP__);
+						      detect_deadlock);
 			/*
 			 * If we got woken up by the owner then start loop
 			 * all over without going into schedule to try
@@ -650,7 +647,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 	set_current_state(TASK_RUNNING);
 
 	if (unlikely(waiter.task))
-		remove_waiter(lock, &waiter __IP__);
+		remove_waiter(lock, &waiter);
 
 	/*
 	 * try_to_take_rt_mutex() sets the waiter bit
@@ -681,7 +678,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
  * Slow path try-lock function:
  */
 static inline int
-rt_mutex_slowtrylock(struct rt_mutex *lock __IP_DECL__)
+rt_mutex_slowtrylock(struct rt_mutex *lock)
 {
 	int ret = 0;
 
@@ -689,7 +686,7 @@ rt_mutex_slowtrylock(struct rt_mutex *lo
 
 	if (likely(rt_mutex_owner(lock) != current)) {
 
-		ret = try_to_take_rt_mutex(lock __IP__);
+		ret = try_to_take_rt_mutex(lock);
 		/*
 		 * try_to_take_rt_mutex() sets the lock waiters
 		 * bit unconditionally. Clean this up.
@@ -739,13 +736,13 @@ rt_mutex_fastlock(struct rt_mutex *lock,
 		  int detect_deadlock,
 		  int (*slowfn)(struct rt_mutex *lock, int state,
 				struct hrtimer_sleeper *timeout,
-				int detect_deadlock __IP_DECL__))
+				int detect_deadlock))
 {
 	if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 0;
 	} else
-		return slowfn(lock, state, NULL, detect_deadlock __RET_IP__);
+		return slowfn(lock, state, NULL, detect_deadlock);
 }
 
 static inline int
@@ -753,24 +750,24 @@ rt_mutex_timed_fastlock(struct rt_mutex 
 			struct hrtimer_sleeper *timeout, int detect_deadlock,
 			int (*slowfn)(struct rt_mutex *lock, int state,
 				      struct hrtimer_sleeper *timeout,
-				      int detect_deadlock __IP_DECL__))
+				      int detect_deadlock))
 {
 	if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 0;
 	} else
-		return slowfn(lock, state, timeout, detect_deadlock __RET_IP__);
+		return slowfn(lock, state, timeout, detect_deadlock);
 }
 
 static inline int
 rt_mutex_fasttrylock(struct rt_mutex *lock,
-		     int (*slowfn)(struct rt_mutex *lock __IP_DECL__))
+		     int (*slowfn)(struct rt_mutex *lock))
 {
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 1;
 	}
-	return slowfn(lock __RET_IP__);
+	return slowfn(lock);
 }
 
 static inline void
@@ -918,7 +915,7 @@ void rt_mutex_init_proxy_locked(struct r
 				struct task_struct *proxy_owner)
 {
 	__rt_mutex_init(lock, NULL);
-	debug_rt_mutex_proxy_lock(lock, proxy_owner __RET_IP__);
+	debug_rt_mutex_proxy_lock(lock, proxy_owner);
 	rt_mutex_set_owner(lock, proxy_owner, 0);
 	rt_mutex_deadlock_account_lock(lock, proxy_owner);
 }
Index: linux/kernel/rtmutex.h
===================================================================
--- linux.orig/kernel/rtmutex.h
+++ linux/kernel/rtmutex.h
@@ -10,9 +10,6 @@
  * Non-debug version.
  */
 
-#define __IP_DECL__
-#define __IP__
-#define __RET_IP__
 #define rt_mutex_deadlock_check(l)			(0)
 #define rt_mutex_deadlock_account_lock(m, t)		do { } while (0)
 #define rt_mutex_deadlock_account_unlock(l)		do { } while (0)
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -30,6 +30,7 @@
 #include <linux/capability.h>
 #include <linux/completion.h>
 #include <linux/kernel_stat.h>
+#include <linux/debug_locks.h>
 #include <linux/security.h>
 #include <linux/notifier.h>
 #include <linux/profile.h>
@@ -3158,12 +3159,13 @@ void fastcall add_preempt_count(int val)
 	/*
 	 * Underflow?
 	 */
-	BUG_ON((preempt_count() < 0));
+	if (DEBUG_WARN_ON((preempt_count() < 0)))
+		return;
 	preempt_count() += val;
 	/*
 	 * Spinlock count overflowing soon?
 	 */
-	BUG_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK-10);
+	DEBUG_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK-10);
 }
 EXPORT_SYMBOL(add_preempt_count);
 
@@ -3172,11 +3174,15 @@ void fastcall sub_preempt_count(int val)
 	/*
 	 * Underflow?
 	 */
-	BUG_ON(val > preempt_count());
+	if (DEBUG_WARN_ON(val > preempt_count()))
+		return;
 	/*
 	 * Is the spinlock portion underflowing?
 	 */
-	BUG_ON((val < PREEMPT_MASK) && !(preempt_count() & PREEMPT_MASK));
+	if (DEBUG_WARN_ON((val < PREEMPT_MASK) &&
+			!(preempt_count() & PREEMPT_MASK)))
+		return;
+
 	preempt_count() -= val;
 }
 EXPORT_SYMBOL(sub_preempt_count);
@@ -4715,7 +4721,7 @@ void show_state(void)
 	} while_each_thread(g, p);
 
 	read_unlock(&tasklist_lock);
-	mutex_debug_show_all_locks();
+	debug_show_all_locks();
 }
 
 /**
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -130,12 +130,30 @@ config DEBUG_PREEMPT
 	  will detect preemption count underflows.
 
 config DEBUG_MUTEXES
-	bool "Mutex debugging, deadlock detection"
-	default n
+	bool "Mutex debugging, basic checks"
+	default y
 	depends on DEBUG_KERNEL
 	help
-	 This allows mutex semantics violations and mutex related deadlocks
-	 (lockups) to be detected and reported automatically.
+	 This feature allows mutex semantics violations to be detected and
+	 reported.
+
+config DEBUG_MUTEX_ALLOC
+	bool "Detect incorrect freeing of live mutexes"
+	default y
+	depends on DEBUG_MUTEXES
+	help
+	 This feature will check whether any held mutex is incorrectly
+	 freed by the kernel, via any of the memory-freeing routines
+	 (kfree(), kmem_cache_free(), free_pages(), vfree(), etc.),
+	 or whether there is any lock held during task exit.
+
+config DEBUG_MUTEX_DEADLOCKS
+	bool "Detect mutex related deadlocks"
+	default y
+	depends on DEBUG_MUTEXES
+	help
+	 This feature will automatically detect and report mutex related
+	 deadlocks, as they happen.
 
 config DEBUG_RT_MUTEXES
 	bool "RT Mutex debugging, deadlock detection"
Index: linux/lib/Makefile
===================================================================
--- linux.orig/lib/Makefile
+++ linux/lib/Makefile
@@ -11,7 +11,7 @@ lib-$(CONFIG_SMP) += cpumask.o
 
 lib-y	+= kobject.o kref.o kobject_uevent.o klist.o
 
-obj-y += sort.o parser.o halfmd4.o iomap_copy.o
+obj-y += sort.o parser.o halfmd4.o iomap_copy.o debug_locks.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
Index: linux/lib/debug_locks.c
===================================================================
--- /dev/null
+++ linux/lib/debug_locks.c
@@ -0,0 +1,45 @@
+/*
+ * lib/debug_locks.c
+ *
+ * Generic place for common debugging facilities for various locks:
+ * spinlocks, rwlocks, mutexes and rwsems.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ */
+#include <linux/rwsem.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/debug_locks.h>
+
+/*
+ * We want to turn all lock-debugging facilities on/off at once,
+ * via a global flag. The reason is that once a single bug has been
+ * detected and reported, there might be a cascade of followup bugs
+ * that would just muddy the log. So we report the first one and
+ * shut up after that.
+ */
+int debug_locks = 1;
+
+/*
+ * The locking-testsuite uses <debug_locks_silent> to get a
+ * 'silent failure': nothing is printed to the console when
+ * a locking bug is detected.
+ */
+int debug_locks_silent;
+
+/*
+ * Generic 'turn off all lock debugging' function:
+ */
+int debug_locks_off(void)
+{
+	if (xchg(&debug_locks, 0)) {
+		if (!debug_locks_silent) {
+			console_verbose();
+			return 1;
+		}
+	}
+	return 0;
+}
Index: linux/lib/spinlock_debug.c
===================================================================
--- linux.orig/lib/spinlock_debug.c
+++ linux/lib/spinlock_debug.c
@@ -9,38 +9,35 @@
 #include <linux/config.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/debug_locks.h>
 #include <linux/delay.h>
+#include <linux/module.h>
 
 static void spin_bug(spinlock_t *lock, const char *msg)
 {
-	static long print_once = 1;
 	struct task_struct *owner = NULL;
 
-	if (xchg(&print_once, 0)) {
-		if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)
-			owner = lock->owner;
-		printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
-			msg, raw_smp_processor_id(),
-			current->comm, current->pid);
-		printk(KERN_EMERG " lock: %p, .magic: %08x, .owner: %s/%d, "
-				".owner_cpu: %d\n",
-			lock, lock->magic,
-			owner ? owner->comm : "<none>",
-			owner ? owner->pid : -1,
-			lock->owner_cpu);
-		dump_stack();
-#ifdef CONFIG_SMP
-		/*
-		 * We cannot continue on SMP:
-		 */
-//		panic("bad locking");
-#endif
-	}
+	if (!debug_locks_off())
+		return;
+
+	if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)
+		owner = lock->owner;
+	printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
+		msg, raw_smp_processor_id(),
+		current->comm, current->pid);
+	printk(KERN_EMERG " lock: %p, .magic: %08x, .owner: %s/%d, "
+			".owner_cpu: %d\n",
+		lock, lock->magic,
+		owner ? owner->comm : "<none>",
+		owner ? owner->pid : -1,
+		lock->owner_cpu);
+	dump_stack();
 }
 
 #define SPIN_BUG_ON(cond, lock, msg) if (unlikely(cond)) spin_bug(lock, msg)
 
-static inline void debug_spin_lock_before(spinlock_t *lock)
+static inline void
+debug_spin_lock_before(spinlock_t *lock)
 {
 	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
 	SPIN_BUG_ON(lock->owner == current, lock, "recursion");
@@ -119,20 +116,13 @@ void _raw_spin_unlock(spinlock_t *lock)
 
 static void rwlock_bug(rwlock_t *lock, const char *msg)
 {
-	static long print_once = 1;
+	if (!debug_locks_off())
+		return;
 
-	if (xchg(&print_once, 0)) {
-		printk(KERN_EMERG "BUG: rwlock %s on CPU#%d, %s/%d, %p\n",
-			msg, raw_smp_processor_id(), current->comm,
-			current->pid, lock);
-		dump_stack();
-#ifdef CONFIG_SMP
-		/*
-		 * We cannot continue on SMP:
-		 */
-		panic("bad locking");
-#endif
-	}
+	printk(KERN_EMERG "BUG: rwlock %s on CPU#%d, %s/%d, %p\n",
+		msg, raw_smp_processor_id(), current->comm,
+		current->pid, lock);
+	dump_stack();
 }
 
 #define RWLOCK_BUG_ON(cond, lock, msg) if (unlikely(cond)) rwlock_bug(lock, msg)
Index: linux/mm/vmalloc.c
===================================================================
--- linux.orig/mm/vmalloc.c
+++ linux/mm/vmalloc.c
@@ -330,6 +330,8 @@ void __vunmap(void *addr, int deallocate
 		return;
 	}
 
+	debug_check_no_locks_freed(addr, area->size);
+
 	if (deallocate_pages) {
 		int i;
 

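The "report the first bug, then shut up" behaviour of debug_locks_off()
in lib/debug_locks.c above can be modelled in userspace. This is only a
sketch under stated assumptions: C11 atomic_exchange() stands in for the
kernel's xchg(), the console_verbose() call is left out, and report_bug()
is a hypothetical caller shaped like spin_bug(), not code from the patch.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel globals; names mirror the patch. */
static atomic_int debug_locks = 1;
static int debug_locks_silent;

/*
 * Returns 1 only for the first caller, and only when not silenced.
 * atomic_exchange() plays the role of the kernel's xchg(): it clears
 * the flag and tells us whether it was still set.
 */
static int debug_locks_off(void)
{
	if (atomic_exchange(&debug_locks, 0)) {
		if (!debug_locks_silent)
			return 1;
	}
	return 0;
}

/* Hypothetical reporter, shaped like spin_bug() in the patch. */
static int report_bug(const char *msg)
{
	if (!debug_locks_off())
		return 0;	/* already off, or silenced: print nothing */

	printf("BUG: %s\n", msg);
	return 1;
}
```

Calling report_bug() twice in a row prints only the first message; the
second call finds debug_locks already cleared and returns without output.
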
* [patch 08/61] lock validator: locking API self-tests
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (6 preceding siblings ...)
  2006-05-29 21:23 ` [patch 07/61] lock validator: better lock debugging Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-29 21:23 ` [patch 09/61] lock validator: spin/rwlock init cleanups Ingo Molnar
                   ` (65 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

introduce DEBUG_LOCKING_API_SELFTESTS, which uses the generic lock
debugging code's silent-failure feature to run a matrix of testcases.
There are 210 testcases currently:

------------------------
| Locking API testsuite:
----------------------------------------------------------------------------
                                 | spin |wlock |rlock |mutex | wsem | rsem |
  --------------------------------------------------------------------------
                     A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                    double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
  --------------------------------------------------------------------------
              recursive read-lock:             |  ok  |             |  ok  |
  --------------------------------------------------------------------------
                non-nested unlock:  ok  |  ok  |  ok  |  ok  |
  ------------------------------------------------------------
     hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
     soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
     hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
     soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
       sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
       sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
         hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
         soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
         hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
         soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/132:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/132:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/213:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/213:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/231:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/231:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/312:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/312:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #1/321:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #1/321:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/123:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/123:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/132:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/132:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/213:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/213:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/231:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/231:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/312:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/312:  ok  |  ok  |  ok  |
    hard-safe-A + unsafe-B #2/321:  ok  |  ok  |  ok  |
    soft-safe-A + unsafe-B #2/321:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/123:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/123:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/132:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/132:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/213:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/213:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/231:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/231:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/312:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/312:  ok  |  ok  |  ok  |
      hard-irq lock-inversion/321:  ok  |  ok  |  ok  |
      soft-irq lock-inversion/321:  ok  |  ok  |  ok  |
      hard-irq read-recursion/123:  ok  |
      soft-irq read-recursion/123:  ok  |
      hard-irq read-recursion/132:  ok  |
      soft-irq read-recursion/132:  ok  |
      hard-irq read-recursion/213:  ok  |
      soft-irq read-recursion/213:  ok  |
      hard-irq read-recursion/231:  ok  |
      soft-irq read-recursion/231:  ok  |
      hard-irq read-recursion/312:  ok  |
      soft-irq read-recursion/312:  ok  |
      hard-irq read-recursion/321:  ok  |
      soft-irq read-recursion/321:  ok  |
-------------------------------------------------------
Good, all 210 testcases passed! |
---------------------------------
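
How a testcase can use the silent-failure feature is sketched below as a
userspace model. Everything here is an assumption made for illustration:
detect_bug(), dotest() and the two testcases are hypothetical names, and
the globals are simplified, non-atomic copies of the ones in
lib/debug_locks.c. Only the FAILURE/SUCCESS values are taken from
lib/locking-selftest.c.

```c
#include <stdio.h>

#define FAILURE		0
#define SUCCESS		1

/* Simplified stand-ins for the globals in lib/debug_locks.c. */
static int debug_locks = 1;
static int debug_locks_silent;

/* Stand-in for the lock debugging code tripping over a bug. */
static void detect_bug(const char *msg)
{
	int was_on = debug_locks;

	debug_locks = 0;
	if (was_on && !debug_locks_silent)
		printf("BUG: %s\n", msg);
}

/* Two hypothetical testcases: one buggy, one clean. */
static void testcase_double_unlock(void) { detect_bug("double unlock"); }
static void testcase_clean(void) { }

static int testcase_total, testcase_successes;

/*
 * Run one testcase silently, then compare debug_locks with the
 * expectation: FAILURE means the testcase must trip the debugging
 * code, SUCCESS means it must leave it alone.
 */
static int dotest(void (*testcase_fn)(void), int expected)
{
	int ok;

	debug_locks_silent = 1;
	testcase_fn();
	ok = (debug_locks == expected);
	fputs(ok ? "  ok  |" : "FAILED|", stdout);

	testcase_total++;
	testcase_successes += ok;

	/* Re-arm the debugging state for the next testcase. */
	debug_locks = 1;
	debug_locks_silent = 0;
	return ok;
}
```

In this model dotest(testcase_double_unlock, FAILURE) prints "  ok  |",
because the bug is caught without anything reaching the log, while
dotest(testcase_clean, FAILURE) prints "FAILED|", because the expected
bug was never detected.
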

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/kernel-parameters.txt  |    9 
 lib/Kconfig.debug                    |   12 
 lib/Makefile                         |    1 
 lib/locking-selftest-hardirq.h       |    9 
 lib/locking-selftest-mutex.h         |    5 
 lib/locking-selftest-rlock-hardirq.h |    2 
 lib/locking-selftest-rlock-softirq.h |    2 
 lib/locking-selftest-rlock.h         |    5 
 lib/locking-selftest-rsem.h          |    5 
 lib/locking-selftest-softirq.h       |    9 
 lib/locking-selftest-spin-hardirq.h  |    2 
 lib/locking-selftest-spin-softirq.h  |    2 
 lib/locking-selftest-spin.h          |    5 
 lib/locking-selftest-wlock-hardirq.h |    2 
 lib/locking-selftest-wlock-softirq.h |    2 
 lib/locking-selftest-wlock.h         |    5 
 lib/locking-selftest-wsem.h          |    5 
 lib/locking-selftest.c               | 1168 +++++++++++++++++++++++++++++++++++
 18 files changed, 1250 insertions(+)

Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -436,6 +436,15 @@ running once the system is up.
 
 	debug		[KNL] Enable kernel debugging (events log level).
 
+	debug_locks_verbose=
+			[KNL] verbose self-tests
+			Format: <0|1>
+			Print debugging info while doing the locking API
+			self-tests.
+			We default to 0 (no extra messages); setting it to
+			1 will print _a lot_ more information - normally
+			only useful to kernel developers.
+
 	decnet=		[HW,NET]
 			Format: <area>[,<node>]
 			See also Documentation/networking/decnet.txt.
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -191,6 +191,18 @@ config DEBUG_SPINLOCK_SLEEP
 	  If you say Y here, various routines which may sleep will become very
 	  noisy if they are called with a spinlock held.
 
+config DEBUG_LOCKING_API_SELFTESTS
+	bool "Locking API boot-time self-tests"
+	depends on DEBUG_KERNEL
+	default y
+	help
+	  Say Y here if you want the kernel to run a short self-test during
+	  bootup. The self-test checks whether common types of locking bugs
+	  are detected by debugging mechanisms or not. (If you disable
+	  lock debugging then those bugs won't be detected, of course.)
+	  The following locking APIs are covered: spinlocks, rwlocks,
+	  mutexes and rwsems.
+
 config DEBUG_KOBJECT
 	bool "kobject debugging"
 	depends on DEBUG_KERNEL
Index: linux/lib/Makefile
===================================================================
--- linux.orig/lib/Makefile
+++ linux/lib/Makefile
@@ -18,6 +18,7 @@ CFLAGS_kobject.o += -DDEBUG
 CFLAGS_kobject_uevent.o += -DDEBUG
 endif
 
+obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
 lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
Index: linux/lib/locking-selftest-hardirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-hardirq.h
@@ -0,0 +1,9 @@
+#undef IRQ_DISABLE
+#undef IRQ_ENABLE
+#undef IRQ_ENTER
+#undef IRQ_EXIT
+
+#define IRQ_ENABLE		HARDIRQ_ENABLE
+#define IRQ_DISABLE		HARDIRQ_DISABLE
+#define IRQ_ENTER		HARDIRQ_ENTER
+#define IRQ_EXIT		HARDIRQ_EXIT
Index: linux/lib/locking-selftest-mutex.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-mutex.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		ML
+
+#undef UNLOCK
+#define UNLOCK		MU
Index: linux/lib/locking-selftest-rlock-hardirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-rlock-hardirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-rlock.h"
+#include "locking-selftest-hardirq.h"
Index: linux/lib/locking-selftest-rlock-softirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-rlock-softirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-rlock.h"
+#include "locking-selftest-softirq.h"
Index: linux/lib/locking-selftest-rlock.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-rlock.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		RL
+
+#undef UNLOCK
+#define UNLOCK		RU
Index: linux/lib/locking-selftest-rsem.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-rsem.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		RSL
+
+#undef UNLOCK
+#define UNLOCK		RSU
Index: linux/lib/locking-selftest-softirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-softirq.h
@@ -0,0 +1,9 @@
+#undef IRQ_DISABLE
+#undef IRQ_ENABLE
+#undef IRQ_ENTER
+#undef IRQ_EXIT
+
+#define IRQ_DISABLE		SOFTIRQ_DISABLE
+#define IRQ_ENABLE		SOFTIRQ_ENABLE
+#define IRQ_ENTER		SOFTIRQ_ENTER
+#define IRQ_EXIT		SOFTIRQ_EXIT
Index: linux/lib/locking-selftest-spin-hardirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-spin-hardirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-spin.h"
+#include "locking-selftest-hardirq.h"
Index: linux/lib/locking-selftest-spin-softirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-spin-softirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-spin.h"
+#include "locking-selftest-softirq.h"
Index: linux/lib/locking-selftest-spin.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-spin.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		L
+
+#undef UNLOCK
+#define UNLOCK		U
Index: linux/lib/locking-selftest-wlock-hardirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-wlock-hardirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-wlock.h"
+#include "locking-selftest-hardirq.h"
Index: linux/lib/locking-selftest-wlock-softirq.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-wlock-softirq.h
@@ -0,0 +1,2 @@
+#include "locking-selftest-wlock.h"
+#include "locking-selftest-softirq.h"
Index: linux/lib/locking-selftest-wlock.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-wlock.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		WL
+
+#undef UNLOCK
+#define UNLOCK		WU
Index: linux/lib/locking-selftest-wsem.h
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest-wsem.h
@@ -0,0 +1,5 @@
+#undef LOCK
+#define LOCK		WSL
+
+#undef UNLOCK
+#define UNLOCK		WSU
Index: linux/lib/locking-selftest.c
===================================================================
--- /dev/null
+++ linux/lib/locking-selftest.c
@@ -0,0 +1,1168 @@
+/*
+ * lib/locking-selftest.c
+ *
+ * Testsuite for various locking APIs: spinlocks, rwlocks,
+ * mutexes and rw-semaphores.
+ *
+ * It checks for both false positives and false negatives.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ */
+#include <linux/rwsem.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/interrupt.h>
+#include <linux/debug_locks.h>
+
+/*
+ * Change this to 1 if you want to see the failure printouts:
+ */
+static unsigned int debug_locks_verbose;
+
+static int __init setup_debug_locks_verbose(char *str)
+{
+	get_option(&str, &debug_locks_verbose);
+
+	return 1;
+}
+
+__setup("debug_locks_verbose=", setup_debug_locks_verbose);
+
+#define FAILURE		0
+#define SUCCESS		1
+
+enum {
+	LOCKTYPE_SPIN,
+	LOCKTYPE_RWLOCK,
+	LOCKTYPE_MUTEX,
+	LOCKTYPE_RWSEM,
+};
+
+/*
+ * Normal standalone locks, for the circular and irq-context
+ * dependency tests:
+ */
+static DEFINE_SPINLOCK(lock_A);
+static DEFINE_SPINLOCK(lock_B);
+static DEFINE_SPINLOCK(lock_C);
+static DEFINE_SPINLOCK(lock_D);
+
+static DEFINE_RWLOCK(rwlock_A);
+static DEFINE_RWLOCK(rwlock_B);
+static DEFINE_RWLOCK(rwlock_C);
+static DEFINE_RWLOCK(rwlock_D);
+
+static DEFINE_MUTEX(mutex_A);
+static DEFINE_MUTEX(mutex_B);
+static DEFINE_MUTEX(mutex_C);
+static DEFINE_MUTEX(mutex_D);
+
+static DECLARE_RWSEM(rwsem_A);
+static DECLARE_RWSEM(rwsem_B);
+static DECLARE_RWSEM(rwsem_C);
+static DECLARE_RWSEM(rwsem_D);
+
+/*
+ * Locks that we initialize dynamically as well so that
+ * e.g. X1 and X2 become two instances of the same type,
+ * but X* and Y* are different types. We do this so that
+ * we do not trigger a real lockup:
+ */
+static DEFINE_SPINLOCK(lock_X1);
+static DEFINE_SPINLOCK(lock_X2);
+static DEFINE_SPINLOCK(lock_Y1);
+static DEFINE_SPINLOCK(lock_Y2);
+static DEFINE_SPINLOCK(lock_Z1);
+static DEFINE_SPINLOCK(lock_Z2);
+
+static DEFINE_RWLOCK(rwlock_X1);
+static DEFINE_RWLOCK(rwlock_X2);
+static DEFINE_RWLOCK(rwlock_Y1);
+static DEFINE_RWLOCK(rwlock_Y2);
+static DEFINE_RWLOCK(rwlock_Z1);
+static DEFINE_RWLOCK(rwlock_Z2);
+
+static DEFINE_MUTEX(mutex_X1);
+static DEFINE_MUTEX(mutex_X2);
+static DEFINE_MUTEX(mutex_Y1);
+static DEFINE_MUTEX(mutex_Y2);
+static DEFINE_MUTEX(mutex_Z1);
+static DEFINE_MUTEX(mutex_Z2);
+
+static DECLARE_RWSEM(rwsem_X1);
+static DECLARE_RWSEM(rwsem_X2);
+static DECLARE_RWSEM(rwsem_Y1);
+static DECLARE_RWSEM(rwsem_Y2);
+static DECLARE_RWSEM(rwsem_Z1);
+static DECLARE_RWSEM(rwsem_Z2);
+
+/*
+ * non-inlined runtime initializers, to let separate locks share
+ * the same lock-type:
+ */
+#define INIT_TYPE_FUNC(type) 				\
+static noinline void					\
+init_type_##type(spinlock_t *lock, rwlock_t *rwlock, struct mutex *mutex, \
+		 struct rw_semaphore *rwsem)		\
+{							\
+	spin_lock_init(lock);				\
+	rwlock_init(rwlock);				\
+	mutex_init(mutex);				\
+	init_rwsem(rwsem);				\
+}
+
+INIT_TYPE_FUNC(X)
+INIT_TYPE_FUNC(Y)
+INIT_TYPE_FUNC(Z)
+
+static void init_shared_types(void)
+{
+	init_type_X(&lock_X1, &rwlock_X1, &mutex_X1, &rwsem_X1);
+	init_type_X(&lock_X2, &rwlock_X2, &mutex_X2, &rwsem_X2);
+
+	init_type_Y(&lock_Y1, &rwlock_Y1, &mutex_Y1, &rwsem_Y1);
+	init_type_Y(&lock_Y2, &rwlock_Y2, &mutex_Y2, &rwsem_Y2);
+
+	init_type_Z(&lock_Z1, &rwlock_Z1, &mutex_Z1, &rwsem_Z1);
+	init_type_Z(&lock_Z2, &rwlock_Z2, &mutex_Z2, &rwsem_Z2);
+}
+
+/*
+ * For spinlocks and rwlocks we also do hardirq-safe / softirq-safe tests.
+ * The following macros simulate hardirq/softirq context; a lock taken
+ * inside them is marked as hardirq-safe/softirq-safe:
+ */
+
+#define HARDIRQ_DISABLE		local_irq_disable
+#define HARDIRQ_ENABLE		local_irq_enable
+
+#define HARDIRQ_ENTER()				\
+	local_irq_disable();			\
+	nmi_enter();				\
+	WARN_ON(!in_irq());
+
+#define HARDIRQ_EXIT()				\
+	nmi_exit();				\
+	local_irq_enable();
+
+#define SOFTIRQ_DISABLE		local_bh_disable
+#define SOFTIRQ_ENABLE		local_bh_enable
+
+#define SOFTIRQ_ENTER()				\
+		local_bh_disable();		\
+		local_irq_disable();		\
+		WARN_ON(!in_softirq());
+
+#define SOFTIRQ_EXIT()				\
+		local_irq_enable();		\
+		local_bh_enable();
+
+/*
+ * Shortcuts for lock/unlock API variants, to keep
+ * the testcases compact:
+ */
+#define L(x)			spin_lock(&lock_##x)
+#define U(x)			spin_unlock(&lock_##x)
+#define UNN(x)			spin_unlock_non_nested(&lock_##x)
+#define LU(x)			L(x); U(x)
+
+#define WL(x)			write_lock(&rwlock_##x)
+#define WU(x)			write_unlock(&rwlock_##x)
+#define WLU(x)			WL(x); WU(x)
+
+#define RL(x)			read_lock(&rwlock_##x)
+#define RU(x)			read_unlock(&rwlock_##x)
+#define RUNN(x)			read_unlock_non_nested(&rwlock_##x)
+#define RLU(x)			RL(x); RU(x)
+
+#define ML(x)			mutex_lock(&mutex_##x)
+#define MU(x)			mutex_unlock(&mutex_##x)
+#define MUNN(x)			mutex_unlock_non_nested(&mutex_##x)
+
+#define WSL(x)			down_write(&rwsem_##x)
+#define WSU(x)			up_write(&rwsem_##x)
+
+#define RSL(x)			down_read(&rwsem_##x)
+#define RSU(x)			up_read(&rwsem_##x)
+#define RSUNN(x)		up_read_non_nested(&rwsem_##x)
+
+#define LOCK_UNLOCK_2(x,y)	LOCK(x); LOCK(y); UNLOCK(y); UNLOCK(x)
+
+/*
+ * Generate different permutations of the same testcase, using
+ * the same basic lock-dependency/state events:
+ */
+
+#define GENERATE_TESTCASE(name)			\
+						\
+static void name(void) { E(); }
+
+#define GENERATE_PERMUTATIONS_2_EVENTS(name)	\
+						\
+static void name##_12(void) { E1(); E2(); }	\
+static void name##_21(void) { E2(); E1(); }
+
+#define GENERATE_PERMUTATIONS_3_EVENTS(name)		\
+							\
+static void name##_123(void) { E1(); E2(); E3(); }	\
+static void name##_132(void) { E1(); E3(); E2(); }	\
+static void name##_213(void) { E2(); E1(); E3(); }	\
+static void name##_231(void) { E2(); E3(); E1(); }	\
+static void name##_312(void) { E3(); E1(); E2(); }	\
+static void name##_321(void) { E3(); E2(); E1(); }
+
+/*
+ * AA deadlock:
+ */
+
+#define E()					\
+						\
+	LOCK(X1);				\
+	LOCK(X2); /* this one should fail */	\
+	UNLOCK(X2);				\
+	UNLOCK(X1);
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(AA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(AA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(AA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(AA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(AA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(AA_rsem)
+
+#undef E
+
+/*
+ * Special case for read-locking: rwlock readers may recurse on
+ * the same lock instance, rwsem readers may not:
+ */
+static void rlock_AA1(void)
+{
+	RL(X1);
+	RL(X1); // this one should NOT fail
+	RU(X1);
+	RU(X1);
+}
+
+static void rsem_AA1(void)
+{
+	RSL(X1);
+	RSL(X1); // this one should fail
+	RSU(X1);
+	RSU(X1);
+}
+
+/*
+ * ABBA deadlock:
+ */
+
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(B, A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABBA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABBA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABBA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABBA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABBA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABBA_rsem)
+
+#undef E
+
+/*
+ * AB BC CA deadlock:
+ */
+
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(B, C);			\
+	LOCK_UNLOCK_2(C, A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABBCCA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABBCCA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABBCCA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABBCCA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABBCCA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABBCCA_rsem)
+
+#undef E
+
+/*
+ * AB CA BC deadlock:
+ */
+
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(C, A);			\
+	LOCK_UNLOCK_2(B, C); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABCABC_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABCABC_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABCABC_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABCABC_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABCABC_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABCABC_rsem)
+
+#undef E
+
+/*
+ * AB BC CD DA deadlock:
+ */
+
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(B, C);			\
+	LOCK_UNLOCK_2(C, D);			\
+	LOCK_UNLOCK_2(D, A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABBCCDDA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABBCCDDA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABBCCDDA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABBCCDDA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABBCCDDA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABBCCDDA_rsem)
+
+#undef E
+
+/*
+ * AB CD BD DA deadlock:
+ */
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(C, D);			\
+	LOCK_UNLOCK_2(B, D);			\
+	LOCK_UNLOCK_2(D, A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABCDBDDA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABCDBDDA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABCDBDDA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABCDBDDA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABCDBDDA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABCDBDDA_rsem)
+
+#undef E
+
+/*
+ * AB CD BC DA deadlock:
+ */
+#define E()					\
+						\
+	LOCK_UNLOCK_2(A, B);			\
+	LOCK_UNLOCK_2(C, D);			\
+	LOCK_UNLOCK_2(B, C);			\
+	LOCK_UNLOCK_2(D, A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(ABCDBCDA_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(ABCDBCDA_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(ABCDBCDA_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(ABCDBCDA_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(ABCDBCDA_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(ABCDBCDA_rsem)
+
+#undef E
+
+/*
+ * Double unlock:
+ */
+#define E()					\
+						\
+	LOCK(A);				\
+	UNLOCK(A);				\
+	UNLOCK(A); /* fail */
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(double_unlock_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(double_unlock_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(double_unlock_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(double_unlock_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(double_unlock_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(double_unlock_rsem)
+
+#undef E
+
+/*
+ * Bad unlock ordering:
+ */
+#define E()					\
+						\
+	LOCK(A);				\
+	LOCK(B);				\
+	UNLOCK(A); /* fail */			\
+	UNLOCK(B);
+
+/*
+ * 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_TESTCASE(bad_unlock_order_spin)
+#include "locking-selftest-wlock.h"
+GENERATE_TESTCASE(bad_unlock_order_wlock)
+#include "locking-selftest-rlock.h"
+GENERATE_TESTCASE(bad_unlock_order_rlock)
+#include "locking-selftest-mutex.h"
+GENERATE_TESTCASE(bad_unlock_order_mutex)
+#include "locking-selftest-wsem.h"
+GENERATE_TESTCASE(bad_unlock_order_wsem)
+#include "locking-selftest-rsem.h"
+GENERATE_TESTCASE(bad_unlock_order_rsem)
+
+#undef E
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * bad unlock ordering - but using the _non_nested API,
+ * which must suppress the warning:
+ */
+static void spin_order_nn(void)
+{
+	L(A);
+	L(B);
+	UNN(A); // this one should succeed
+	UNN(B);
+}
+
+static void rlock_order_nn(void)
+{
+	RL(A);
+	RL(B);
+	RUNN(A); // this one should succeed
+	RUNN(B);
+}
+
+static void mutex_order_nn(void)
+{
+	ML(A);
+	ML(B);
+	MUNN(A); // this one should succeed
+	MUNN(B);
+}
+
+static void rsem_order_nn(void)
+{
+	RSL(A);
+	RSL(B);
+	RSUNN(A); // this one should succeed
+	RSUNN(B);
+}
+
+#endif
+
+/*
+ * locking an irq-safe lock with irqs enabled:
+ */
+#define E1()				\
+					\
+	IRQ_ENTER();			\
+	LOCK(A);			\
+	UNLOCK(A);			\
+	IRQ_EXIT();
+
+#define E2()				\
+					\
+	LOCK(A);			\
+	UNLOCK(A);
+
+/*
+ * Generate 12 testcases:
+ */
+#include "locking-selftest-spin-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_spin)
+
+#include "locking-selftest-rlock-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_rlock)
+
+#include "locking-selftest-wlock-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_wlock)
+
+#include "locking-selftest-spin-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_spin)
+
+#include "locking-selftest-rlock-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_rlock)
+
+#include "locking-selftest-wlock-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_wlock)
+
+#undef E1
+#undef E2
+
+/*
+ * Enabling hardirqs with a softirq-safe lock held:
+ */
+#define E1()				\
+					\
+	SOFTIRQ_ENTER();		\
+	LOCK(A);			\
+	UNLOCK(A);			\
+	SOFTIRQ_EXIT();
+
+#define E2()				\
+					\
+	HARDIRQ_DISABLE();		\
+	LOCK(A);			\
+	HARDIRQ_ENABLE();		\
+	UNLOCK(A);
+
+/*
+ * Generate 6 testcases:
+ */
+#include "locking-selftest-spin.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A_spin)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A_wlock)
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A_rlock)
+
+#undef E1
+#undef E2
+
+/*
+ * Enabling irqs with an irq-safe lock held:
+ */
+#define E1()				\
+					\
+	IRQ_ENTER();			\
+	LOCK(A);			\
+	UNLOCK(A);			\
+	IRQ_EXIT();
+
+#define E2()				\
+					\
+	IRQ_DISABLE();			\
+	LOCK(A);			\
+	IRQ_ENABLE();			\
+	UNLOCK(A);
+
+/*
+ * Generate 12 testcases:
+ */
+#include "locking-selftest-spin-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_spin)
+
+#include "locking-selftest-rlock-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_rlock)
+
+#include "locking-selftest-wlock-hardirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_wlock)
+
+#include "locking-selftest-spin-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_soft_spin)
+
+#include "locking-selftest-rlock-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_soft_rlock)
+
+#include "locking-selftest-wlock-softirq.h"
+GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_soft_wlock)
+
+#undef E1
+#undef E2
+
+/*
+ * Acquiring an irq-unsafe lock while holding an irq-safe lock:
+ */
+#define E1()				\
+					\
+	LOCK(A);			\
+	LOCK(B);			\
+	UNLOCK(B);			\
+	UNLOCK(A);			\
+
+#define E2()				\
+					\
+	LOCK(B);			\
+	UNLOCK(B);
+
+#define E3()				\
+					\
+	IRQ_ENTER();			\
+	LOCK(A);			\
+	UNLOCK(A);			\
+	IRQ_EXIT();
+
+/*
+ * Generate 36 testcases:
+ */
+#include "locking-selftest-spin-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_spin)
+
+#include "locking-selftest-rlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_rlock)
+
+#include "locking-selftest-wlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_wlock)
+
+#include "locking-selftest-spin-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_soft_spin)
+
+#include "locking-selftest-rlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_soft_rlock)
+
+#include "locking-selftest-wlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_soft_wlock)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * If a lock turns into softirq-safe, but earlier it took
+ * a softirq-unsafe lock:
+ */
+
+#define E1()				\
+	IRQ_DISABLE();			\
+	LOCK(A);			\
+	LOCK(B);			\
+	UNLOCK(B);			\
+	UNLOCK(A);			\
+	IRQ_ENABLE();
+
+#define E2()				\
+	LOCK(B);			\
+	UNLOCK(B);
+
+#define E3()				\
+	IRQ_ENTER();			\
+	LOCK(A);			\
+	UNLOCK(A);			\
+	IRQ_EXIT();
+
+/*
+ * Generate 36 testcases:
+ */
+#include "locking-selftest-spin-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_spin)
+
+#include "locking-selftest-rlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_rlock)
+
+#include "locking-selftest-wlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_wlock)
+
+#include "locking-selftest-spin-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_soft_spin)
+
+#include "locking-selftest-rlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_soft_rlock)
+
+#include "locking-selftest-wlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_soft_wlock)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * read-lock / write-lock irq inversion.
+ *
+ * Deadlock scenario:
+ *
+ * CPU#1 is at #1, i.e. it has write-locked A, but has not
+ * taken B yet.
+ *
+ * CPU#2 is at #2, i.e. it has locked B.
+ *
+ * Hardirq hits CPU#2 at point #2 and is trying to read-lock A.
+ *
+ * The deadlock occurs because CPU#1 will spin on B, and CPU#2
+ * will spin on A.
+ */
+
+#define E1()				\
+					\
+	IRQ_DISABLE();			\
+	WL(A);				\
+	LOCK(B);			\
+	UNLOCK(B);			\
+	WU(A);				\
+	IRQ_ENABLE();
+
+#define E2()				\
+					\
+	LOCK(B);			\
+	UNLOCK(B);
+
+#define E3()				\
+					\
+	IRQ_ENTER();			\
+	RL(A);				\
+	RU(A);				\
+	IRQ_EXIT();
+
+/*
+ * Generate 36 testcases:
+ */
+#include "locking-selftest-spin-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_hard_spin)
+
+#include "locking-selftest-rlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_hard_rlock)
+
+#include "locking-selftest-wlock-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_hard_wlock)
+
+#include "locking-selftest-spin-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_spin)
+
+#include "locking-selftest-rlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_rlock)
+
+#include "locking-selftest-wlock-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * read-lock / write-lock recursion that is actually safe.
+ */
+
+#define E1()				\
+					\
+	IRQ_DISABLE();			\
+	WL(A);				\
+	WU(A);				\
+	IRQ_ENABLE();
+
+#define E2()				\
+					\
+	RL(A);				\
+	RU(A);				\
+
+#define E3()				\
+					\
+	IRQ_ENTER();			\
+	RL(A);				\
+	L(B);				\
+	U(B);				\
+	RU(A);				\
+	IRQ_EXIT();
+
+/*
+ * Generate 12 testcases:
+ */
+#include "locking-selftest-hardirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard)
+
+#include "locking-selftest-softirq.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * read-lock / write-lock recursion that is unsafe.
+ */
+
+#define E1()				\
+					\
+	IRQ_DISABLE();			\
+	L(B);				\
+	WL(A);				\
+	WU(A);				\
+	U(B);				\
+	IRQ_ENABLE();
+
+#define E2()				\
+					\
+	RL(A);				\
+	RU(A);				\
+
+#define E3()				\
+					\
+	IRQ_ENTER();			\
+	L(B);				\
+	U(B);				\
+	IRQ_EXIT();
+
+/*
+ * Generate 12 testcases:
+ */
+#include "locking-selftest-hardirq.h"
+// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard)
+
+#include "locking-selftest-softirq.h"
+// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft)
+
+#define lockdep_reset()
+#define lockdep_reset_lock(x)
+
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+# define I_SPINLOCK(x)	lockdep_reset_lock(&lock_##x.dep_map)
+#else
+# define I_SPINLOCK(x)
+#endif
+
+#ifdef CONFIG_PROVE_RW_LOCKING
+# define I_RWLOCK(x)	lockdep_reset_lock(&rwlock_##x.dep_map)
+#else
+# define I_RWLOCK(x)
+#endif
+
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+# define I_MUTEX(x)	lockdep_reset_lock(&mutex_##x.dep_map)
+#else
+# define I_MUTEX(x)
+#endif
+
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+# define I_RWSEM(x)	lockdep_reset_lock(&rwsem_##x.dep_map)
+#else
+# define I_RWSEM(x)
+#endif
+
+#define I1(x)					\
+	do {					\
+		I_SPINLOCK(x);			\
+		I_RWLOCK(x);			\
+		I_MUTEX(x);			\
+		I_RWSEM(x);			\
+	} while (0)
+
+#define I2(x)					\
+	do {					\
+		spin_lock_init(&lock_##x);	\
+		rwlock_init(&rwlock_##x);	\
+		mutex_init(&mutex_##x);		\
+		init_rwsem(&rwsem_##x);		\
+	} while (0)
+
+static void reset_locks(void)
+{
+	local_irq_disable();
+	I1(A); I1(B); I1(C); I1(D);
+	I1(X1); I1(X2); I1(Y1); I1(Y2); I1(Z1); I1(Z2);
+	lockdep_reset();
+	I2(A); I2(B); I2(C); I2(D);
+	init_shared_types();
+	local_irq_enable();
+}
+
+#undef I
+
+static int testcase_total;
+static int testcase_successes;
+static int expected_testcase_failures;
+static int unexpected_testcase_failures;
+
+static void dotest(void (*testcase_fn)(void), int expected, int locktype)
+{
+	unsigned long saved_preempt_count = preempt_count();
+	int unexpected_failure = 0;
+
+	WARN_ON(irqs_disabled());
+
+	testcase_fn();
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+	if (locktype == LOCKTYPE_SPIN && debug_locks != expected)
+		unexpected_failure = 1;
+#endif
+#ifdef CONFIG_PROVE_RW_LOCKING
+	if (locktype == LOCKTYPE_RWLOCK && debug_locks != expected)
+		unexpected_failure = 1;
+#endif
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+	if (locktype == LOCKTYPE_MUTEX && debug_locks != expected)
+		unexpected_failure = 1;
+#endif
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+	if (locktype == LOCKTYPE_RWSEM && debug_locks != expected)
+		unexpected_failure = 1;
+#endif
+	if (debug_locks != expected) {
+		if (unexpected_failure) {
+			unexpected_testcase_failures++;
+			printk("FAILED|");
+		} else {
+			expected_testcase_failures++;
+			printk("failed|");
+		}
+	} else {
+		testcase_successes++;
+		printk("  ok  |");
+	}
+	testcase_total++;
+
+	/*
+	 * Some tests (e.g. double-unlock) might corrupt the preemption
+	 * count, so restore it:
+	 */
+	preempt_count() = saved_preempt_count;
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
+
+	reset_locks();
+}
+
+static inline void print_testname(const char *testname)
+{
+	printk("%33s:", testname);
+}
+
+#define DO_TESTCASE_1(desc, name, nr)				\
+	print_testname(desc"/"#nr);				\
+	dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK);		\
+	printk("\n");
+
+#define DO_TESTCASE_1B(desc, name, nr)				\
+	print_testname(desc"/"#nr);				\
+	dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK);		\
+	printk("\n");
+
+#define DO_TESTCASE_3(desc, name, nr)				\
+	print_testname(desc"/"#nr);				\
+	dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN);	\
+	dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK);	\
+	dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK);	\
+	printk("\n");
+
+#define DO_TESTCASE_6(desc, name)				\
+	print_testname(desc);					\
+	dotest(name##_spin, FAILURE, LOCKTYPE_SPIN);		\
+	dotest(name##_wlock, FAILURE, LOCKTYPE_RWLOCK);		\
+	dotest(name##_rlock, FAILURE, LOCKTYPE_RWLOCK);		\
+	dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);		\
+	dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);		\
+	dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);		\
+	printk("\n");
+
+/*
+ * 'read' variant: rlocks must not trigger.
+ */
+#define DO_TESTCASE_6R(desc, name)				\
+	print_testname(desc);					\
+	dotest(name##_spin, FAILURE, LOCKTYPE_SPIN);		\
+	dotest(name##_wlock, FAILURE, LOCKTYPE_RWLOCK);		\
+	dotest(name##_rlock, SUCCESS, LOCKTYPE_RWLOCK);		\
+	dotest(name##_mutex, FAILURE, LOCKTYPE_MUTEX);		\
+	dotest(name##_wsem, FAILURE, LOCKTYPE_RWSEM);		\
+	dotest(name##_rsem, FAILURE, LOCKTYPE_RWSEM);		\
+	printk("\n");
+
+#define DO_TESTCASE_2I(desc, name, nr)				\
+	DO_TESTCASE_1("hard-"desc, name##_hard, nr);		\
+	DO_TESTCASE_1("soft-"desc, name##_soft, nr);
+
+#define DO_TESTCASE_2IB(desc, name, nr)				\
+	DO_TESTCASE_1B("hard-"desc, name##_hard, nr);		\
+	DO_TESTCASE_1B("soft-"desc, name##_soft, nr);
+
+#define DO_TESTCASE_6I(desc, name, nr)				\
+	DO_TESTCASE_3("hard-"desc, name##_hard, nr);		\
+	DO_TESTCASE_3("soft-"desc, name##_soft, nr);
+
+#define DO_TESTCASE_2x3(desc, name)				\
+	DO_TESTCASE_3(desc, name, 12);				\
+	DO_TESTCASE_3(desc, name, 21);
+
+#define DO_TESTCASE_2x6(desc, name)				\
+	DO_TESTCASE_6I(desc, name, 12);				\
+	DO_TESTCASE_6I(desc, name, 21);
+
+#define DO_TESTCASE_6x2(desc, name)				\
+	DO_TESTCASE_2I(desc, name, 123);			\
+	DO_TESTCASE_2I(desc, name, 132);			\
+	DO_TESTCASE_2I(desc, name, 213);			\
+	DO_TESTCASE_2I(desc, name, 231);			\
+	DO_TESTCASE_2I(desc, name, 312);			\
+	DO_TESTCASE_2I(desc, name, 321);
+
+#define DO_TESTCASE_6x2B(desc, name)				\
+	DO_TESTCASE_2IB(desc, name, 123);			\
+	DO_TESTCASE_2IB(desc, name, 132);			\
+	DO_TESTCASE_2IB(desc, name, 213);			\
+	DO_TESTCASE_2IB(desc, name, 231);			\
+	DO_TESTCASE_2IB(desc, name, 312);			\
+	DO_TESTCASE_2IB(desc, name, 321);
+
+
+#define DO_TESTCASE_6x6(desc, name)				\
+	DO_TESTCASE_6I(desc, name, 123);			\
+	DO_TESTCASE_6I(desc, name, 132);			\
+	DO_TESTCASE_6I(desc, name, 213);			\
+	DO_TESTCASE_6I(desc, name, 231);			\
+	DO_TESTCASE_6I(desc, name, 312);			\
+	DO_TESTCASE_6I(desc, name, 321);
+
+void locking_selftest(void)
+{
+	/*
+	 * Got a locking failure before the selftest ran?
+	 */
+	if (!debug_locks) {
+		printk("----------------------------------\n");
+		printk("| Locking API testsuite disabled |\n");
+		printk("----------------------------------\n");
+		return;
+	}
+
+	/*
+	 * Run the testsuite:
+	 */
+	printk("------------------------\n");
+	printk("| Locking API testsuite:\n");
+	printk("----------------------------------------------------------------------------\n");
+	printk("                                 | spin |wlock |rlock |mutex | wsem | rsem |\n");
+	printk("  --------------------------------------------------------------------------\n");
+
+	init_shared_types();
+	debug_locks_silent = !debug_locks_verbose;
+
+	DO_TESTCASE_6("A-A deadlock", AA);
+	DO_TESTCASE_6R("A-B-B-A deadlock", ABBA);
+	DO_TESTCASE_6R("A-B-B-C-C-A deadlock", ABBCCA);
+	DO_TESTCASE_6R("A-B-C-A-B-C deadlock", ABCABC);
+	DO_TESTCASE_6R("A-B-B-C-C-D-D-A deadlock", ABBCCDDA);
+	DO_TESTCASE_6R("A-B-C-D-B-D-D-A deadlock", ABCDBDDA);
+	DO_TESTCASE_6R("A-B-C-D-B-C-D-A deadlock", ABCDBCDA);
+	DO_TESTCASE_6("double unlock", double_unlock);
+	DO_TESTCASE_6("bad unlock order", bad_unlock_order);
+
+	printk("  --------------------------------------------------------------------------\n");
+	print_testname("recursive read-lock");
+	printk("             |");
+	dotest(rlock_AA1, SUCCESS, LOCKTYPE_RWLOCK);
+	printk("             |");
+	dotest(rsem_AA1, FAILURE, LOCKTYPE_RWSEM);
+	printk("\n");
+
+	printk("  --------------------------------------------------------------------------\n");
+
+#ifdef CONFIG_LOCKDEP
+	print_testname("non-nested unlock");
+	dotest(spin_order_nn, SUCCESS, LOCKTYPE_SPIN);
+	dotest(rlock_order_nn, SUCCESS, LOCKTYPE_RWLOCK);
+	dotest(mutex_order_nn, SUCCESS, LOCKTYPE_MUTEX);
+	dotest(rsem_order_nn, SUCCESS, LOCKTYPE_RWSEM);
+	printk("\n");
+	printk("  --------------------------------------------------------------------------\n");
+#endif
+	/*
+	 * irq-context testcases:
+	 */
+	DO_TESTCASE_2x6("irqs-on + irq-safe-A", irqsafe1);
+	DO_TESTCASE_2x3("sirq-safe-A => hirqs-on", irqsafe2A);
+	DO_TESTCASE_2x6("safe-A + irqs-on", irqsafe2B);
+	DO_TESTCASE_6x6("safe-A + unsafe-B #1", irqsafe3);
+	DO_TESTCASE_6x6("safe-A + unsafe-B #2", irqsafe4);
+	DO_TESTCASE_6x6("irq lock-inversion", irq_inversion);
+
+	DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion);
+//	DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
+
+	if (unexpected_testcase_failures) {
+		printk("-----------------------------------------------------------------\n");
+		debug_locks = 0;
+		printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n",
+			unexpected_testcase_failures, testcase_total);
+		printk("-----------------------------------------------------------------\n");
+	} else if (expected_testcase_failures && testcase_successes) {
+		printk("--------------------------------------------------------\n");
+		printk("%3d out of %3d testcases failed, as expected. |\n",
+			expected_testcase_failures, testcase_total);
+		printk("----------------------------------------------------\n");
+		debug_locks = 1;
+	} else if (expected_testcase_failures && !testcase_successes) {
+		printk("--------------------------------------------------------\n");
+		printk("All %3d testcases failed, as expected. |\n",
+			expected_testcase_failures);
+		printk("----------------------------------------\n");
+		debug_locks = 1;
+	} else {
+		printk("-------------------------------------------------------\n");
+		printk("Good, all %3d testcases passed! |\n",
+			testcase_successes);
+		printk("---------------------------------\n");
+		debug_locks = 1;
+	}
+	debug_locks_silent = 0;
+}
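
The selftest above writes each scenario once as a variant-neutral E() body and then re-includes a small header before every GENERATE_TESTCASE() so that LOCK()/UNLOCK() map to a different locking API each time. The headers themselves are not shown in this message, so the following is only a userspace sketch of the technique: the #undef/#define pairs stand in for the #include lines, and record(), run() and the demo_* names are hypothetical helpers that log calls instead of taking real locks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: each "lock op" appends its name to a trace. */
static char trace[128];

static void record(const char *op, const char *name)
{
	size_t len = strlen(trace);

	snprintf(trace + len, sizeof(trace) - len, "%s(%s) ", op, name);
}

/* One variant-neutral event body, same shape as in the patch. */
#define E()		\
	LOCK(A);	\
	UNLOCK(A);

/* E() is expanded when the function is generated, not when E is defined. */
#define GENERATE_TESTCASE(name)	\
static void name(void) { E(); }

/* Stands in for: #include "locking-selftest-spin.h" (assumed content) */
#undef LOCK
#undef UNLOCK
#define LOCK(x)		record("spin_lock", #x)
#define UNLOCK(x)	record("spin_unlock", #x)
GENERATE_TESTCASE(demo_spin)

/* Stands in for: #include "locking-selftest-mutex.h" (assumed content) */
#undef LOCK
#undef UNLOCK
#define LOCK(x)		record("mutex_lock", #x)
#define UNLOCK(x)	record("mutex_unlock", #x)
GENERATE_TESTCASE(demo_mutex)

#undef E

/* Run one generated testcase and return the calls it made. */
static const char *run(void (*fn)(void))
{
	assert(fn);
	trace[0] = '\0';
	fn();
	return trace;
}
```

Because the preprocessor expands E() at the point where GENERATE_TESTCASE() is used, each generated function picks up whichever LOCK()/UNLOCK() definitions are current at that point, which is why the scenario only has to be written once.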

* [patch 09/61] lock validator: spin/rwlock init cleanups
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (7 preceding siblings ...)
  2006-05-29 21:23 ` [patch 08/61] lock validator: locking API self-tests Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-29 21:23 ` [patch 10/61] lock validator: locking init debugging improvement Ingo Molnar
                   ` (64 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.
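
The two conversion patterns can be sketched as follows. This is a userspace illustration only, not kernel code: spinlock_t, DEFINE_SPINLOCK() and spin_lock_init() are mocked with simplified, hypothetical definitions, and demo_lock, struct demo_queue and DEMO_UNLOCKED_INIT are made-up names.

```c
#include <assert.h>

/* Hypothetical userspace mock of the kernel type; not the real layout. */
typedef struct {
	int locked;
	int initialized;
} spinlock_t;

/* Mocked initializers, assumed to mirror the shape of the kernel ones. */
#define DEMO_UNLOCKED_INIT	{ 0, 1 }
#define DEFINE_SPINLOCK(x)	spinlock_t x = DEMO_UNLOCKED_INIT

static void spin_lock_init(spinlock_t *lock)
{
	lock->locked = 0;
	lock->initialized = 1;
}

/*
 * Pattern 1, static lock:
 *   old: static spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
 */
static DEFINE_SPINLOCK(demo_lock);

struct demo_queue {
	spinlock_t lock;
	int depth;
};

/*
 * Pattern 2, lock embedded in a runtime-initialized object:
 *   old: q->lock = SPIN_LOCK_UNLOCKED;
 */
static void demo_queue_init(struct demo_queue *q)
{
	assert(q);
	spin_lock_init(&q->lock);
	q->depth = 0;
}
```

Static definitions map to DEFINE_SPINLOCK() and runtime assignments map to spin_lock_init(), as in the hunks below.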

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/ia64/sn/kernel/irq.c                    |    2 +-
 arch/mips/kernel/smtc.c                      |    4 ++--
 arch/powerpc/platforms/cell/spufs/switch.c   |    2 +-
 arch/powerpc/platforms/powermac/pfunc_core.c |    2 +-
 arch/powerpc/platforms/pseries/eeh_event.c   |    2 +-
 arch/powerpc/sysdev/mmio_nvram.c             |    2 +-
 arch/xtensa/kernel/time.c                    |    2 +-
 arch/xtensa/kernel/traps.c                   |    2 +-
 drivers/char/drm/drm_memory_debug.h          |    2 +-
 drivers/char/drm/via_dmablit.c               |    2 +-
 drivers/char/epca.c                          |    2 +-
 drivers/char/moxa.c                          |    2 +-
 drivers/char/specialix.c                     |    2 +-
 drivers/char/sx.c                            |    2 +-
 drivers/isdn/gigaset/common.c                |    2 +-
 drivers/leds/led-core.c                      |    2 +-
 drivers/leds/led-triggers.c                  |    2 +-
 drivers/message/i2o/exec-osm.c               |    2 +-
 drivers/misc/ibmasm/module.c                 |    2 +-
 drivers/pcmcia/m8xx_pcmcia.c                 |    4 ++--
 drivers/rapidio/rio-access.c                 |    4 ++--
 drivers/rtc/rtc-sa1100.c                     |    2 +-
 drivers/rtc/rtc-vr41xx.c                     |    2 +-
 drivers/s390/block/dasd_eer.c                |    2 +-
 drivers/scsi/libata-core.c                   |    2 +-
 drivers/sn/ioc3.c                            |    2 +-
 drivers/usb/ip/stub_dev.c                    |    4 ++--
 drivers/usb/ip/vhci_hcd.c                    |    4 ++--
 drivers/video/backlight/hp680_bl.c           |    2 +-
 fs/gfs2/ops_fstype.c                         |    2 +-
 fs/nfsd/nfs4state.c                          |    2 +-
 fs/ocfs2/cluster/heartbeat.c                 |    2 +-
 fs/ocfs2/cluster/tcp.c                       |    2 +-
 fs/ocfs2/dlm/dlmdomain.c                     |    2 +-
 fs/ocfs2/dlm/dlmlock.c                       |    2 +-
 fs/ocfs2/dlm/dlmrecovery.c                   |    4 ++--
 fs/ocfs2/dlmglue.c                           |    2 +-
 fs/ocfs2/journal.c                           |    2 +-
 fs/reiser4/block_alloc.c                     |    2 +-
 fs/reiser4/debug.c                           |    2 +-
 fs/reiser4/fsdata.c                          |    2 +-
 fs/reiser4/txnmgr.c                          |    2 +-
 include/asm-alpha/core_t2.h                  |    2 +-
 kernel/audit.c                               |    2 +-
 mm/sparse.c                                  |    2 +-
 net/ipv6/route.c                             |    2 +-
 net/sunrpc/auth_gss/gss_krb5_seal.c          |    2 +-
 net/tipc/bcast.c                             |    4 ++--
 net/tipc/bearer.c                            |    2 +-
 net/tipc/config.c                            |    2 +-
 net/tipc/dbg.c                               |    2 +-
 net/tipc/handler.c                           |    2 +-
 net/tipc/name_table.c                        |    4 ++--
 net/tipc/net.c                               |    2 +-
 net/tipc/node.c                              |    2 +-
 net/tipc/port.c                              |    4 ++--
 net/tipc/ref.c                               |    4 ++--
 net/tipc/subscr.c                            |    2 +-
 net/tipc/user_reg.c                          |    2 +-
 59 files changed, 69 insertions(+), 69 deletions(-)

Index: linux/arch/ia64/sn/kernel/irq.c
===================================================================
--- linux.orig/arch/ia64/sn/kernel/irq.c
+++ linux/arch/ia64/sn/kernel/irq.c
@@ -27,7 +27,7 @@ static void unregister_intr_pda(struct s
 int sn_force_interrupt_flag = 1;
 extern int sn_ioif_inited;
 struct list_head **sn_irq_lh;
-static spinlock_t sn_irq_info_lock = SPIN_LOCK_UNLOCKED; /* non-IRQ lock */
+static DEFINE_SPINLOCK(sn_irq_info_lock); /* non-IRQ lock */
 
 u64 sn_intr_alloc(nasid_t local_nasid, int local_widget,
 				     struct sn_irq_info *sn_irq_info,
Index: linux/arch/mips/kernel/smtc.c
===================================================================
--- linux.orig/arch/mips/kernel/smtc.c
+++ linux/arch/mips/kernel/smtc.c
@@ -367,7 +367,7 @@ void mipsmt_prepare_cpus(void)
 	dvpe();
 	dmt();
 
-	freeIPIq.lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&freeIPIq.lock);
 
 	/*
 	 * We probably don't have as many VPEs as we do SMP "CPUs",
@@ -375,7 +375,7 @@ void mipsmt_prepare_cpus(void)
 	 */
 	for (i=0; i<NR_CPUS; i++) {
 		IPIQ[i].head = IPIQ[i].tail = NULL;
-		IPIQ[i].lock = SPIN_LOCK_UNLOCKED;
+		spin_lock_init(&IPIQ[i].lock);
 		IPIQ[i].depth = 0;
 		ipi_timer_latch[i] = 0;
 	}
Index: linux/arch/powerpc/platforms/cell/spufs/switch.c
===================================================================
--- linux.orig/arch/powerpc/platforms/cell/spufs/switch.c
+++ linux/arch/powerpc/platforms/cell/spufs/switch.c
@@ -2183,7 +2183,7 @@ void spu_init_csa(struct spu_state *csa)
 
 	memset(lscsa, 0, sizeof(struct spu_lscsa));
 	csa->lscsa = lscsa;
-	csa->register_lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&csa->register_lock);
 
 	/* Set LS pages reserved to allow for user-space mapping. */
 	for (p = lscsa->ls; p < lscsa->ls + LS_SIZE; p += PAGE_SIZE)
Index: linux/arch/powerpc/platforms/powermac/pfunc_core.c
===================================================================
--- linux.orig/arch/powerpc/platforms/powermac/pfunc_core.c
+++ linux/arch/powerpc/platforms/powermac/pfunc_core.c
@@ -545,7 +545,7 @@ struct pmf_device {
 };
 
 static LIST_HEAD(pmf_devices);
-static spinlock_t pmf_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pmf_lock);
 
 static void pmf_release_device(struct kref *kref)
 {
Index: linux/arch/powerpc/platforms/pseries/eeh_event.c
===================================================================
--- linux.orig/arch/powerpc/platforms/pseries/eeh_event.c
+++ linux/arch/powerpc/platforms/pseries/eeh_event.c
@@ -35,7 +35,7 @@
  */
 
 /* EEH event workqueue setup. */
-static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
 LIST_HEAD(eeh_eventlist);
 static void eeh_thread_launcher(void *);
 DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL);
Index: linux/arch/powerpc/sysdev/mmio_nvram.c
===================================================================
--- linux.orig/arch/powerpc/sysdev/mmio_nvram.c
+++ linux/arch/powerpc/sysdev/mmio_nvram.c
@@ -32,7 +32,7 @@
 
 static void __iomem *mmio_nvram_start;
 static long mmio_nvram_len;
-static spinlock_t mmio_nvram_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(mmio_nvram_lock);
 
 static ssize_t mmio_nvram_read(char *buf, size_t count, loff_t *index)
 {
Index: linux/arch/xtensa/kernel/time.c
===================================================================
--- linux.orig/arch/xtensa/kernel/time.c
+++ linux/arch/xtensa/kernel/time.c
@@ -29,7 +29,7 @@
 
 extern volatile unsigned long wall_jiffies;
 
-spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL(rtc_lock);
 
 
Index: linux/arch/xtensa/kernel/traps.c
===================================================================
--- linux.orig/arch/xtensa/kernel/traps.c
+++ linux/arch/xtensa/kernel/traps.c
@@ -461,7 +461,7 @@ void show_code(unsigned int *pc)
 	}
 }
 
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(die_lock);
 
 void die(const char * str, struct pt_regs * regs, long err)
 {
Index: linux/drivers/char/drm/drm_memory_debug.h
===================================================================
--- linux.orig/drivers/char/drm/drm_memory_debug.h
+++ linux/drivers/char/drm/drm_memory_debug.h
@@ -43,7 +43,7 @@ typedef struct drm_mem_stats {
 	unsigned long bytes_freed;
 } drm_mem_stats_t;
 
-static spinlock_t drm_mem_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(drm_mem_lock);
 static unsigned long drm_ram_available = 0;	/* In pages */
 static unsigned long drm_ram_used = 0;
 static drm_mem_stats_t drm_mem_stats[] =
Index: linux/drivers/char/drm/via_dmablit.c
===================================================================
--- linux.orig/drivers/char/drm/via_dmablit.c
+++ linux/drivers/char/drm/via_dmablit.c
@@ -557,7 +557,7 @@ via_init_dmablit(drm_device_t *dev)
 		blitq->num_outstanding = 0;
 		blitq->is_active = 0;
 		blitq->aborting = 0;
-		blitq->blit_lock = SPIN_LOCK_UNLOCKED;
+		spin_lock_init(&blitq->blit_lock);
 		for (j=0; j<VIA_NUM_BLIT_SLOTS; ++j) {
 			DRM_INIT_WAITQUEUE(blitq->blit_queue + j);
 		}
Index: linux/drivers/char/epca.c
===================================================================
--- linux.orig/drivers/char/epca.c
+++ linux/drivers/char/epca.c
@@ -80,7 +80,7 @@ static int invalid_lilo_config;
 /* The ISA boards do window flipping into the same spaces so its only sane
    with a single lock. It's still pretty efficient */
 
-static spinlock_t epca_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(epca_lock);
 
 /* -----------------------------------------------------------------------
 	MAXBOARDS is typically 12, but ISA and EISA cards are restricted to 
Index: linux/drivers/char/moxa.c
===================================================================
--- linux.orig/drivers/char/moxa.c
+++ linux/drivers/char/moxa.c
@@ -301,7 +301,7 @@ static struct tty_operations moxa_ops = 
 	.tiocmset = moxa_tiocmset,
 };
 
-static spinlock_t moxa_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(moxa_lock);
 
 #ifdef CONFIG_PCI
 static int moxa_get_PCI_conf(struct pci_dev *p, int board_type, moxa_board_conf * board)
Index: linux/drivers/char/specialix.c
===================================================================
--- linux.orig/drivers/char/specialix.c
+++ linux/drivers/char/specialix.c
@@ -2477,7 +2477,7 @@ static int __init specialix_init(void)
 #endif
 
 	for (i = 0; i < SX_NBOARD; i++)
-		sx_board[i].lock = SPIN_LOCK_UNLOCKED;
+		spin_lock_init(&sx_board[i].lock);
 
 	if (sx_init_drivers()) {
 		func_exit();
Index: linux/drivers/char/sx.c
===================================================================
--- linux.orig/drivers/char/sx.c
+++ linux/drivers/char/sx.c
@@ -2320,7 +2320,7 @@ static int sx_init_portstructs (int nboa
 #ifdef NEW_WRITE_LOCKING
 			port->gs.port_write_mutex = MUTEX;
 #endif
-			port->gs.driver_lock = SPIN_LOCK_UNLOCKED;
+			spin_lock_init(&port->gs.driver_lock);
 			/*
 			 * Initializing wait queue
 			 */
Index: linux/drivers/isdn/gigaset/common.c
===================================================================
--- linux.orig/drivers/isdn/gigaset/common.c
+++ linux/drivers/isdn/gigaset/common.c
@@ -981,7 +981,7 @@ exit:
 EXPORT_SYMBOL_GPL(gigaset_stop);
 
 static LIST_HEAD(drivers);
-static spinlock_t driver_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(driver_lock);
 
 struct cardstate *gigaset_get_cs_by_id(int id)
 {
Index: linux/drivers/leds/led-core.c
===================================================================
--- linux.orig/drivers/leds/led-core.c
+++ linux/drivers/leds/led-core.c
@@ -18,7 +18,7 @@
 #include <linux/leds.h>
 #include "leds.h"
 
-rwlock_t leds_list_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(leds_list_lock);
 LIST_HEAD(leds_list);
 
 EXPORT_SYMBOL_GPL(leds_list);
Index: linux/drivers/leds/led-triggers.c
===================================================================
--- linux.orig/drivers/leds/led-triggers.c
+++ linux/drivers/leds/led-triggers.c
@@ -26,7 +26,7 @@
 /*
  * Nests outside led_cdev->trigger_lock
  */
-static rwlock_t triggers_list_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(triggers_list_lock);
 static LIST_HEAD(trigger_list);
 
 ssize_t led_trigger_store(struct class_device *dev, const char *buf,
Index: linux/drivers/message/i2o/exec-osm.c
===================================================================
--- linux.orig/drivers/message/i2o/exec-osm.c
+++ linux/drivers/message/i2o/exec-osm.c
@@ -213,7 +213,7 @@ static int i2o_msg_post_wait_complete(st
 {
 	struct i2o_exec_wait *wait, *tmp;
 	unsigned long flags;
-	static spinlock_t lock = SPIN_LOCK_UNLOCKED;
+	static DEFINE_SPINLOCK(lock);
 	int rc = 1;
 
 	/*
Index: linux/drivers/misc/ibmasm/module.c
===================================================================
--- linux.orig/drivers/misc/ibmasm/module.c
+++ linux/drivers/misc/ibmasm/module.c
@@ -85,7 +85,7 @@ static int __devinit ibmasm_init_one(str
 	}
 	memset(sp, 0, sizeof(struct service_processor));
 
-	sp->lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&sp->lock);
 	INIT_LIST_HEAD(&sp->command_queue);
 
 	pci_set_drvdata(pdev, (void *)sp);
Index: linux/drivers/pcmcia/m8xx_pcmcia.c
===================================================================
--- linux.orig/drivers/pcmcia/m8xx_pcmcia.c
+++ linux/drivers/pcmcia/m8xx_pcmcia.c
@@ -157,7 +157,7 @@ MODULE_LICENSE("Dual MPL/GPL");
 
 static int pcmcia_schlvl = PCMCIA_SCHLVL;
 
-static spinlock_t events_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(events_lock);
 
 
 #define PCMCIA_SOCKET_KEY_5V 1
@@ -644,7 +644,7 @@ static struct platform_device m8xx_devic
 };
 
 static u32 pending_events[PCMCIA_SOCKETS_NO];
-static spinlock_t pending_event_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pending_event_lock);
 
 static irqreturn_t m8xx_interrupt(int irq, void *dev, struct pt_regs *regs)
 {
Index: linux/drivers/rapidio/rio-access.c
===================================================================
--- linux.orig/drivers/rapidio/rio-access.c
+++ linux/drivers/rapidio/rio-access.c
@@ -17,8 +17,8 @@
  * These interrupt-safe spinlocks protect all accesses to RIO
  * configuration space and doorbell access.
  */
-static spinlock_t rio_config_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t rio_doorbell_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(rio_config_lock);
+static DEFINE_SPINLOCK(rio_doorbell_lock);
 
 /*
  *  Wrappers for all RIO configuration access functions.  They just check
Index: linux/drivers/rtc/rtc-sa1100.c
===================================================================
--- linux.orig/drivers/rtc/rtc-sa1100.c
+++ linux/drivers/rtc/rtc-sa1100.c
@@ -45,7 +45,7 @@
 
 static unsigned long rtc_freq = 1024;
 static struct rtc_time rtc_alarm;
-static spinlock_t sa1100_rtc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(sa1100_rtc_lock);
 
 static int rtc_update_alarm(struct rtc_time *alrm)
 {
Index: linux/drivers/rtc/rtc-vr41xx.c
===================================================================
--- linux.orig/drivers/rtc/rtc-vr41xx.c
+++ linux/drivers/rtc/rtc-vr41xx.c
@@ -93,7 +93,7 @@ static void __iomem *rtc2_base;
 
 static unsigned long epoch = 1970;	/* Jan 1 1970 00:00:00 */
 
-static spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(rtc_lock);
 static char rtc_name[] = "RTC";
 static unsigned long periodic_frequency;
 static unsigned long periodic_count;
Index: linux/drivers/s390/block/dasd_eer.c
===================================================================
--- linux.orig/drivers/s390/block/dasd_eer.c
+++ linux/drivers/s390/block/dasd_eer.c
@@ -89,7 +89,7 @@ struct eerbuffer {
 };
 
 static LIST_HEAD(bufferlist);
-static spinlock_t bufferlock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bufferlock);
 static DECLARE_WAIT_QUEUE_HEAD(dasd_eer_read_wait_queue);
 
 /*
Index: linux/drivers/scsi/libata-core.c
===================================================================
--- linux.orig/drivers/scsi/libata-core.c
+++ linux/drivers/scsi/libata-core.c
@@ -5605,7 +5605,7 @@ module_init(ata_init);
 module_exit(ata_exit);
 
 static unsigned long ratelimit_time;
-static spinlock_t ata_ratelimit_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(ata_ratelimit_lock);
 
 int ata_ratelimit(void)
 {
Index: linux/drivers/sn/ioc3.c
===================================================================
--- linux.orig/drivers/sn/ioc3.c
+++ linux/drivers/sn/ioc3.c
@@ -26,7 +26,7 @@ static DECLARE_RWSEM(ioc3_devices_rwsem)
 
 static struct ioc3_submodule *ioc3_submodules[IOC3_MAX_SUBMODULES];
 static struct ioc3_submodule *ioc3_ethernet;
-static rwlock_t ioc3_submodules_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(ioc3_submodules_lock);
 
 /* NIC probing code */
 
Index: linux/drivers/usb/ip/stub_dev.c
===================================================================
--- linux.orig/drivers/usb/ip/stub_dev.c
+++ linux/drivers/usb/ip/stub_dev.c
@@ -285,13 +285,13 @@ static struct stub_device * stub_device_
 
 	sdev->ud.side = USBIP_STUB;
 	sdev->ud.status = SDEV_ST_AVAILABLE;
-	sdev->ud.lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&sdev->ud.lock);
 	sdev->ud.tcp_socket = NULL;
 
 	INIT_LIST_HEAD(&sdev->priv_init);
 	INIT_LIST_HEAD(&sdev->priv_tx);
 	INIT_LIST_HEAD(&sdev->priv_free);
-	sdev->priv_lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&sdev->priv_lock);
 
 	sdev->ud.eh_ops.shutdown = stub_shutdown_connection;
 	sdev->ud.eh_ops.reset    = stub_device_reset;
Index: linux/drivers/usb/ip/vhci_hcd.c
===================================================================
--- linux.orig/drivers/usb/ip/vhci_hcd.c
+++ linux/drivers/usb/ip/vhci_hcd.c
@@ -768,11 +768,11 @@ static void vhci_device_init(struct vhci
 
 	vdev->ud.side   = USBIP_VHCI;
 	vdev->ud.status = VDEV_ST_NULL;
-	vdev->ud.lock   = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&vdev->ud.lock);
 
 	INIT_LIST_HEAD(&vdev->priv_rx);
 	INIT_LIST_HEAD(&vdev->priv_tx);
-	vdev->priv_lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&vdev->priv_lock);
 
 	init_waitqueue_head(&vdev->waitq);
 
Index: linux/drivers/video/backlight/hp680_bl.c
===================================================================
--- linux.orig/drivers/video/backlight/hp680_bl.c
+++ linux/drivers/video/backlight/hp680_bl.c
@@ -27,7 +27,7 @@
 
 static int hp680bl_suspended;
 static int current_intensity = 0;
-static spinlock_t bl_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bl_lock);
 static struct backlight_device *hp680_backlight_device;
 
 static void hp680bl_send_intensity(struct backlight_device *bd)
Index: linux/fs/gfs2/ops_fstype.c
===================================================================
--- linux.orig/fs/gfs2/ops_fstype.c
+++ linux/fs/gfs2/ops_fstype.c
@@ -58,7 +58,7 @@ static struct gfs2_sbd *init_sbd(struct 
 	gfs2_tune_init(&sdp->sd_tune);
 
 	for (x = 0; x < GFS2_GL_HASH_SIZE; x++) {
-		sdp->sd_gl_hash[x].hb_lock = RW_LOCK_UNLOCKED;
+		rwlock_init(&sdp->sd_gl_hash[x].hb_lock);
 		INIT_LIST_HEAD(&sdp->sd_gl_hash[x].hb_list);
 	}
 	INIT_LIST_HEAD(&sdp->sd_reclaim_list);
Index: linux/fs/nfsd/nfs4state.c
===================================================================
--- linux.orig/fs/nfsd/nfs4state.c
+++ linux/fs/nfsd/nfs4state.c
@@ -123,7 +123,7 @@ static void release_stateid(struct nfs4_
  */
 
 /* recall_lock protects the del_recall_lru */
-static spinlock_t recall_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(recall_lock);
 static struct list_head del_recall_lru;
 
 static void
Index: linux/fs/ocfs2/cluster/heartbeat.c
===================================================================
--- linux.orig/fs/ocfs2/cluster/heartbeat.c
+++ linux/fs/ocfs2/cluster/heartbeat.c
@@ -54,7 +54,7 @@ static DECLARE_RWSEM(o2hb_callback_sem);
  * multiple hb threads are watching multiple regions.  A node is live
  * whenever any of the threads sees activity from the node in its region.
  */
-static spinlock_t o2hb_live_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(o2hb_live_lock);
 static struct list_head o2hb_live_slots[O2NM_MAX_NODES];
 static unsigned long o2hb_live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
 static LIST_HEAD(o2hb_node_events);
Index: linux/fs/ocfs2/cluster/tcp.c
===================================================================
--- linux.orig/fs/ocfs2/cluster/tcp.c
+++ linux/fs/ocfs2/cluster/tcp.c
@@ -107,7 +107,7 @@
 	    ##args);							\
 } while (0)
 
-static rwlock_t o2net_handler_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(o2net_handler_lock);
 static struct rb_root o2net_handler_tree = RB_ROOT;
 
 static struct o2net_node o2net_nodes[O2NM_MAX_NODES];
Index: linux/fs/ocfs2/dlm/dlmdomain.c
===================================================================
--- linux.orig/fs/ocfs2/dlm/dlmdomain.c
+++ linux/fs/ocfs2/dlm/dlmdomain.c
@@ -88,7 +88,7 @@ out_free:
  *
  */
 
-spinlock_t dlm_domain_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(dlm_domain_lock);
 LIST_HEAD(dlm_domains);
 static DECLARE_WAIT_QUEUE_HEAD(dlm_domain_events);
 
Index: linux/fs/ocfs2/dlm/dlmlock.c
===================================================================
--- linux.orig/fs/ocfs2/dlm/dlmlock.c
+++ linux/fs/ocfs2/dlm/dlmlock.c
@@ -53,7 +53,7 @@
 #define MLOG_MASK_PREFIX ML_DLM
 #include "cluster/masklog.h"
 
-static spinlock_t dlm_cookie_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(dlm_cookie_lock);
 static u64 dlm_next_cookie = 1;
 
 static enum dlm_status dlm_send_remote_lock_request(struct dlm_ctxt *dlm,
Index: linux/fs/ocfs2/dlm/dlmrecovery.c
===================================================================
--- linux.orig/fs/ocfs2/dlm/dlmrecovery.c
+++ linux/fs/ocfs2/dlm/dlmrecovery.c
@@ -101,8 +101,8 @@ static int dlm_lockres_master_requery(st
 
 static u64 dlm_get_next_mig_cookie(void);
 
-static spinlock_t dlm_reco_state_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t dlm_mig_cookie_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(dlm_reco_state_lock);
+static DEFINE_SPINLOCK(dlm_mig_cookie_lock);
 static u64 dlm_mig_cookie = 1;
 
 static u64 dlm_get_next_mig_cookie(void)
Index: linux/fs/ocfs2/dlmglue.c
===================================================================
--- linux.orig/fs/ocfs2/dlmglue.c
+++ linux/fs/ocfs2/dlmglue.c
@@ -242,7 +242,7 @@ static void ocfs2_build_lock_name(enum o
 	mlog_exit_void();
 }
 
-static spinlock_t ocfs2_dlm_tracking_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(ocfs2_dlm_tracking_lock);
 
 static void ocfs2_add_lockres_tracking(struct ocfs2_lock_res *res,
 				       struct ocfs2_dlm_debug *dlm_debug)
Index: linux/fs/ocfs2/journal.c
===================================================================
--- linux.orig/fs/ocfs2/journal.c
+++ linux/fs/ocfs2/journal.c
@@ -49,7 +49,7 @@
 
 #include "buffer_head_io.h"
 
-spinlock_t trans_inc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(trans_inc_lock);
 
 static int ocfs2_force_read_journal(struct inode *inode);
 static int ocfs2_recover_node(struct ocfs2_super *osb,
Index: linux/fs/reiser4/block_alloc.c
===================================================================
--- linux.orig/fs/reiser4/block_alloc.c
+++ linux/fs/reiser4/block_alloc.c
@@ -499,7 +499,7 @@ void cluster_reserved2free(int count)
 	spin_unlock_reiser4_super(sbinfo);
 }
 
-static spinlock_t fake_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(fake_lock);
 static reiser4_block_nr fake_gen = 0;
 
 /* obtain a block number for new formatted node which will be used to refer
Index: linux/fs/reiser4/debug.c
===================================================================
--- linux.orig/fs/reiser4/debug.c
+++ linux/fs/reiser4/debug.c
@@ -52,7 +52,7 @@ static char panic_buf[REISER4_PANIC_MSG_
 /*
  * lock protecting consistency of panic_buf under concurrent panics
  */
-static spinlock_t panic_guard = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(panic_guard);
 
 /* Your best friend. Call it on each occasion.  This is called by
     fs/reiser4/debug.h:reiser4_panic(). */
Index: linux/fs/reiser4/fsdata.c
===================================================================
--- linux.orig/fs/reiser4/fsdata.c
+++ linux/fs/reiser4/fsdata.c
@@ -17,7 +17,7 @@ static LIST_HEAD(cursor_cache);
 static unsigned long d_cursor_unused = 0;
 
 /* spinlock protecting manipulations with dir_cursor's hash table and lists */
-spinlock_t d_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(d_lock);
 
 static reiser4_file_fsdata *create_fsdata(struct file *file);
 static int file_is_stateless(struct file *file);
Index: linux/fs/reiser4/txnmgr.c
===================================================================
--- linux.orig/fs/reiser4/txnmgr.c
+++ linux/fs/reiser4/txnmgr.c
@@ -905,7 +905,7 @@ jnode *find_first_dirty_jnode(txn_atom *
 
 /* this spin lock is used to prevent races during steal on capture.
    FIXME: should be per filesystem or even per atom */
-spinlock_t scan_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(scan_lock);
 
 /* Scan atom->writeback_nodes list and dispatch jnodes according to their state:
  * move dirty and !writeback jnodes to @fq, clean jnodes to atom's clean
Index: linux/include/asm-alpha/core_t2.h
===================================================================
--- linux.orig/include/asm-alpha/core_t2.h
+++ linux/include/asm-alpha/core_t2.h
@@ -435,7 +435,7 @@ static inline void t2_outl(u32 b, unsign
 	set_hae(msb); \
 }
 
-static spinlock_t t2_hae_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(t2_hae_lock);
 
 __EXTERN_INLINE u8 t2_readb(const volatile void __iomem *xaddr)
 {
Index: linux/kernel/audit.c
===================================================================
--- linux.orig/kernel/audit.c
+++ linux/kernel/audit.c
@@ -787,7 +787,7 @@ err:
  */
 unsigned int audit_serial(void)
 {
-	static spinlock_t serial_lock = SPIN_LOCK_UNLOCKED;
+	static DEFINE_SPINLOCK(serial_lock);
 	static unsigned int serial = 0;
 
 	unsigned long flags;
Index: linux/mm/sparse.c
===================================================================
--- linux.orig/mm/sparse.c
+++ linux/mm/sparse.c
@@ -45,7 +45,7 @@ static struct mem_section *sparse_index_
 
 static int sparse_index_init(unsigned long section_nr, int nid)
 {
-	static spinlock_t index_init_lock = SPIN_LOCK_UNLOCKED;
+	static DEFINE_SPINLOCK(index_init_lock);
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
 	int ret = 0;
Index: linux/net/ipv6/route.c
===================================================================
--- linux.orig/net/ipv6/route.c
+++ linux/net/ipv6/route.c
@@ -343,7 +343,7 @@ static struct rt6_info *rt6_select(struc
 	    (strict & RT6_SELECT_F_REACHABLE) &&
 	    last && last != rt0) {
 		/* no entries matched; do round-robin */
-		static spinlock_t lock = SPIN_LOCK_UNLOCKED;
+		static DEFINE_SPINLOCK(lock);
 		spin_lock(&lock);
 		*head = rt0->u.next;
 		rt0->u.next = last->u.next;
Index: linux/net/sunrpc/auth_gss/gss_krb5_seal.c
===================================================================
--- linux.orig/net/sunrpc/auth_gss/gss_krb5_seal.c
+++ linux/net/sunrpc/auth_gss/gss_krb5_seal.c
@@ -70,7 +70,7 @@
 # define RPCDBG_FACILITY        RPCDBG_AUTH
 #endif
 
-spinlock_t krb5_seq_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(krb5_seq_lock);
 
 u32
 gss_get_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_buf *text,
Index: linux/net/tipc/bcast.c
===================================================================
--- linux.orig/net/tipc/bcast.c
+++ linux/net/tipc/bcast.c
@@ -102,7 +102,7 @@ struct bclink {
 static struct bcbearer *bcbearer = NULL;
 static struct bclink *bclink = NULL;
 static struct link *bcl = NULL;
-static spinlock_t bc_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bc_lock);
 
 char tipc_bclink_name[] = "multicast-link";
 
@@ -783,7 +783,7 @@ int tipc_bclink_init(void)
 	memset(bclink, 0, sizeof(struct bclink));
 	INIT_LIST_HEAD(&bcl->waiting_ports);
 	bcl->next_out_no = 1;
-	bclink->node.lock =  SPIN_LOCK_UNLOCKED;        
+	spin_lock_init(&bclink->node.lock);
 	bcl->owner = &bclink->node;
         bcl->max_pkt = MAX_PKT_DEFAULT_MCAST;
 	tipc_link_set_queue_limits(bcl, BCLINK_WIN_DEFAULT);
Index: linux/net/tipc/bearer.c
===================================================================
--- linux.orig/net/tipc/bearer.c
+++ linux/net/tipc/bearer.c
@@ -552,7 +552,7 @@ restart:
 		b_ptr->link_req = tipc_disc_init_link_req(b_ptr, &m_ptr->bcast_addr,
 							  bcast_scope, 2);
 	}
-	b_ptr->publ.lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&b_ptr->publ.lock);
 	write_unlock_bh(&tipc_net_lock);
 	info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
 	     name, addr_string_fill(addr_string, bcast_scope), priority);
Index: linux/net/tipc/config.c
===================================================================
--- linux.orig/net/tipc/config.c
+++ linux/net/tipc/config.c
@@ -63,7 +63,7 @@ struct manager {
 
 static struct manager mng = { 0};
 
-static spinlock_t config_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(config_lock);
 
 static const void *req_tlv_area;	/* request message TLV area */
 static int req_tlv_space;		/* request message TLV area size */
Index: linux/net/tipc/dbg.c
===================================================================
--- linux.orig/net/tipc/dbg.c
+++ linux/net/tipc/dbg.c
@@ -41,7 +41,7 @@
 #define MAX_STRING 512
 
 static char print_string[MAX_STRING];
-static spinlock_t print_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(print_lock);
 
 static struct print_buf cons_buf = { NULL, 0, NULL, NULL };
 struct print_buf *TIPC_CONS = &cons_buf;
Index: linux/net/tipc/handler.c
===================================================================
--- linux.orig/net/tipc/handler.c
+++ linux/net/tipc/handler.c
@@ -44,7 +44,7 @@ struct queue_item {
 
 static kmem_cache_t *tipc_queue_item_cache;
 static struct list_head signal_queue_head;
-static spinlock_t qitem_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(qitem_lock);
 static int handler_enabled = 0;
 
 static void process_signal_queue(unsigned long dummy);
Index: linux/net/tipc/name_table.c
===================================================================
--- linux.orig/net/tipc/name_table.c
+++ linux/net/tipc/name_table.c
@@ -101,7 +101,7 @@ struct name_table {
 
 static struct name_table table = { NULL } ;
 static atomic_t rsv_publ_ok = ATOMIC_INIT(0);
-rwlock_t tipc_nametbl_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(tipc_nametbl_lock);
 
 
 static int hash(int x)
@@ -172,7 +172,7 @@ static struct name_seq *tipc_nameseq_cre
 	}
 
 	memset(nseq, 0, sizeof(*nseq));
-	nseq->lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&nseq->lock);
 	nseq->type = type;
 	nseq->sseqs = sseq;
 	dbg("tipc_nameseq_create() nseq = %x type %u, ssseqs %x, ff: %u\n",
Index: linux/net/tipc/net.c
===================================================================
--- linux.orig/net/tipc/net.c
+++ linux/net/tipc/net.c
@@ -115,7 +115,7 @@
  *     - A local spin_lock protecting the queue of subscriber events.
 */
 
-rwlock_t tipc_net_lock = RW_LOCK_UNLOCKED;
+DEFINE_RWLOCK(tipc_net_lock);
 struct network tipc_net = { NULL };
 
 struct node *tipc_net_select_remote_node(u32 addr, u32 ref) 
Index: linux/net/tipc/node.c
===================================================================
--- linux.orig/net/tipc/node.c
+++ linux/net/tipc/node.c
@@ -64,7 +64,7 @@ struct node *tipc_node_create(u32 addr)
         if (n_ptr != NULL) {
                 memset(n_ptr, 0, sizeof(*n_ptr));
                 n_ptr->addr = addr;
-                n_ptr->lock =  SPIN_LOCK_UNLOCKED;	
+                spin_lock_init(&n_ptr->lock);
                 INIT_LIST_HEAD(&n_ptr->nsub);
 	
 		c_ptr = tipc_cltr_find(addr);
Index: linux/net/tipc/port.c
===================================================================
--- linux.orig/net/tipc/port.c
+++ linux/net/tipc/port.c
@@ -57,8 +57,8 @@
 static struct sk_buff *msg_queue_head = NULL;
 static struct sk_buff *msg_queue_tail = NULL;
 
-spinlock_t tipc_port_list_lock = SPIN_LOCK_UNLOCKED;
-static spinlock_t queue_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(tipc_port_list_lock);
+static DEFINE_SPINLOCK(queue_lock);
 
 static LIST_HEAD(ports);
 static void port_handle_node_down(unsigned long ref);
Index: linux/net/tipc/ref.c
===================================================================
--- linux.orig/net/tipc/ref.c
+++ linux/net/tipc/ref.c
@@ -63,7 +63,7 @@
 
 struct ref_table tipc_ref_table = { NULL };
 
-static rwlock_t ref_table_lock = RW_LOCK_UNLOCKED;
+static DEFINE_RWLOCK(ref_table_lock);
 
 /**
  * tipc_ref_table_init - create reference table for objects
@@ -87,7 +87,7 @@ int tipc_ref_table_init(u32 requested_si
 	index_mask = sz - 1;
 	for (i = sz - 1; i >= 0; i--) {
 		table[i].object = NULL;
-		table[i].lock = SPIN_LOCK_UNLOCKED;
+		spin_lock_init(&table[i].lock);
 		table[i].data.next_plus_upper = (start & ~index_mask) + i - 1;
 	}
 	tipc_ref_table.entries = table;
Index: linux/net/tipc/subscr.c
===================================================================
--- linux.orig/net/tipc/subscr.c
+++ linux/net/tipc/subscr.c
@@ -457,7 +457,7 @@ int tipc_subscr_start(void)
 	int res = -1;
 
 	memset(&topsrv, 0, sizeof (topsrv));
-	topsrv.lock = SPIN_LOCK_UNLOCKED;
+	spin_lock_init(&topsrv.lock);
 	INIT_LIST_HEAD(&topsrv.subscriber_list);
 
 	spin_lock_bh(&topsrv.lock);
Index: linux/net/tipc/user_reg.c
===================================================================
--- linux.orig/net/tipc/user_reg.c
+++ linux/net/tipc/user_reg.c
@@ -67,7 +67,7 @@ struct tipc_user {
 
 static struct tipc_user *users = NULL;
 static u32 next_free_user = MAX_USERID + 1;
-static spinlock_t reg_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(reg_lock);
 
 /**
  * reg_init - create TIPC user registry (but don't activate it)

* [patch 10/61] lock validator: locking init debugging improvement
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (8 preceding siblings ...)
  2006-05-29 21:23 ` [patch 09/61] lock validator: spin/rwlock init cleanups Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-29 21:23 ` [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup Ingo Molnar
                   ` (63 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

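locking init debugging improvements:

 - introduce __SPIN_LOCK_UNLOCKED(lockname) and use it for the
   initializers of locks embedded in structures and arrays, so that
   the name string of each lock is passed in for use by debugging

 - convert SEQLOCK_UNLOCKED initializers to DEFINE_SEQLOCK()

 - add explicit init_completion() calls before several completions
   are used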

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
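Note: the naming mechanism described in the changelog can be sketched in
plain userspace C. This is a minimal mock-up under stated assumptions, not
the kernel's implementation: mock_spinlock_t, MOCK_SPIN_LOCK_UNLOCKED_NAMED(),
MOCK_DEFINE_SPINLOCK() and mock_lock_name() are hypothetical stand-ins, and
it assumes the name is captured by stringifying the macro argument. It only
illustrates why an initializer that takes the lock's expression as an
argument can carry a name, while a bare constant initializer cannot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for a spinlock; NOT the kernel's spinlock_t layout. */
typedef struct {
	int locked;		/* 0 = unlocked */
	const char *name;	/* debug name recorded at init time */
} mock_spinlock_t;

/*
 * Hypothetical analogue of __SPIN_LOCK_UNLOCKED(lockname): the argument
 * is stringified, so the initializer carries the lock's name.
 */
#define MOCK_SPIN_LOCK_UNLOCKED_NAMED(lockname) \
	{ .locked = 0, .name = #lockname }

/* Hypothetical analogue of DEFINE_SPINLOCK(x) for standalone locks. */
#define MOCK_DEFINE_SPINLOCK(x) \
	mock_spinlock_t x = MOCK_SPIN_LOCK_UNLOCKED_NAMED(x)

/* A structure with an embedded lock, loosely modelled on entropy_store. */
struct mock_pool {
	const char *name;
	mock_spinlock_t lock;
};

/* Embedded lock: the name has to be passed in explicitly. */
static struct mock_pool input_pool = {
	.name = "input",
	.lock = MOCK_SPIN_LOCK_UNLOCKED_NAMED(&input_pool.lock),
};

/* Standalone lock: the definition macro supplies the name itself. */
static MOCK_DEFINE_SPINLOCK(rtc_lock);

/* Return the recorded name, or a placeholder if none was recorded. */
static inline const char *mock_lock_name(const mock_spinlock_t *lock)
{
	return lock->name ? lock->name : "(unnamed)";
}

/* Print both names; returns 1 if the standalone lock got its own name. */
int mock_report_locks(void)
{
	assert(input_pool.lock.locked == 0 && rtc_lock.locked == 0);
	printf("%s\n", mock_lock_name(&input_pool.lock));
	printf("%s\n", mock_lock_name(&rtc_lock));
	return strcmp(mock_lock_name(&rtc_lock), "rtc_lock") == 0;
}
```

In this sketch the embedded lock is named after the exact expression
passed to the macro, ampersand included, and the standalone lock is named
after its variable. A debugging facility can then report locks by name
rather than by address alone.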
 arch/x86_64/kernel/smpboot.c   |    3 +++
 arch/x86_64/kernel/vsyscall.c  |    2 +-
 block/ll_rw_blk.c              |    1 +
 drivers/char/random.c          |    6 +++---
 drivers/ide/ide-io.c           |    2 ++
 drivers/scsi/libata-core.c     |    2 ++
 drivers/spi/spi.c              |    1 +
 fs/dcache.c                    |    2 +-
 include/linux/idr.h            |    2 +-
 include/linux/init_task.h      |   10 +++++-----
 include/linux/notifier.h       |    2 +-
 include/linux/seqlock.h        |   12 ++++++++++--
 include/linux/spinlock_types.h |   15 +++++++++------
 include/linux/wait.h           |    2 +-
 kernel/kmod.c                  |    2 ++
 kernel/rcupdate.c              |    4 ++--
 kernel/timer.c                 |    2 +-
 mm/swap_state.c                |    2 +-
 net/ipv4/tcp_ipv4.c            |    2 +-
 net/ipv4/tcp_minisocks.c       |    2 +-
 net/ipv4/xfrm4_policy.c        |    4 ++--
 21 files changed, 51 insertions(+), 29 deletions(-)

Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -771,8 +771,11 @@ static int __cpuinit do_boot_cpu(int cpu
 		.cpu = cpu,
 		.done = COMPLETION_INITIALIZER(c_idle.done),
 	};
+
 	DECLARE_WORK(work, do_fork_idle, &c_idle);
 
+	init_completion(&c_idle.done);
+
 	/* allocate memory for gdts of secondary cpus. Hotplug is considered */
 	if (!cpu_gdt_descr[cpu].address &&
 		!(cpu_gdt_descr[cpu].address = get_zeroed_page(GFP_KERNEL))) {
Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -37,7 +37,7 @@
 #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
 
 int __sysctl_vsyscall __section_sysctl_vsyscall = 1;
-seqlock_t __xtime_lock __section_xtime_lock = SEQLOCK_UNLOCKED;
+__section_xtime_lock DEFINE_SEQLOCK(__xtime_lock);
 
 #include <asm/unistd.h>
 
Index: linux/block/ll_rw_blk.c
===================================================================
--- linux.orig/block/ll_rw_blk.c
+++ linux/block/ll_rw_blk.c
@@ -2529,6 +2529,7 @@ int blk_execute_rq(request_queue_t *q, s
 	char sense[SCSI_SENSE_BUFFERSIZE];
 	int err = 0;
 
+	init_completion(&wait);
 	/*
 	 * we need an extra reference to the request, so we can look at
 	 * it after io completion
Index: linux/drivers/char/random.c
===================================================================
--- linux.orig/drivers/char/random.c
+++ linux/drivers/char/random.c
@@ -417,7 +417,7 @@ static struct entropy_store input_pool =
 	.poolinfo = &poolinfo_table[0],
 	.name = "input",
 	.limit = 1,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&input_pool.lock),
 	.pool = input_pool_data
 };
 
@@ -426,7 +426,7 @@ static struct entropy_store blocking_poo
 	.name = "blocking",
 	.limit = 1,
 	.pull = &input_pool,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&blocking_pool.lock),
 	.pool = blocking_pool_data
 };
 
@@ -434,7 +434,7 @@ static struct entropy_store nonblocking_
 	.poolinfo = &poolinfo_table[1],
 	.name = "nonblocking",
 	.pull = &input_pool,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&nonblocking_pool.lock),
 	.pool = nonblocking_pool_data
 };
 
Index: linux/drivers/ide/ide-io.c
===================================================================
--- linux.orig/drivers/ide/ide-io.c
+++ linux/drivers/ide/ide-io.c
@@ -1700,6 +1700,8 @@ int ide_do_drive_cmd (ide_drive_t *drive
 	int where = ELEVATOR_INSERT_BACK, err;
 	int must_wait = (action == ide_wait || action == ide_head_wait);
 
+	init_completion(&wait);
+
 	rq->errors = 0;
 	rq->rq_status = RQ_ACTIVE;
 
Index: linux/drivers/scsi/libata-core.c
===================================================================
--- linux.orig/drivers/scsi/libata-core.c
+++ linux/drivers/scsi/libata-core.c
@@ -994,6 +994,8 @@ unsigned ata_exec_internal(struct ata_de
 	unsigned int err_mask;
 	int rc;
 
+	init_completion(&wait);
+
 	spin_lock_irqsave(&ap->host_set->lock, flags);
 
 	/* no internal command while frozen */
Index: linux/drivers/spi/spi.c
===================================================================
--- linux.orig/drivers/spi/spi.c
+++ linux/drivers/spi/spi.c
@@ -512,6 +512,7 @@ int spi_sync(struct spi_device *spi, str
 	DECLARE_COMPLETION(done);
 	int status;
 
+	init_completion(&done);
 	message->complete = spi_complete;
 	message->context = &done;
 	status = spi_async(spi, message);
Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c
+++ linux/fs/dcache.c
@@ -39,7 +39,7 @@ int sysctl_vfs_cache_pressure __read_mos
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
  __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
-static seqlock_t rename_lock __cacheline_aligned_in_smp = SEQLOCK_UNLOCKED;
+static __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
 EXPORT_SYMBOL(dcache_lock);
 
Index: linux/include/linux/idr.h
===================================================================
--- linux.orig/include/linux/idr.h
+++ linux/include/linux/idr.h
@@ -66,7 +66,7 @@ struct idr {
 	.id_free	= NULL,					\
 	.layers 	= 0,					\
 	.id_free_cnt	= 0,					\
-	.lock		= SPIN_LOCK_UNLOCKED,			\
+	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),	\
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h
+++ linux/include/linux/init_task.h
@@ -22,7 +22,7 @@
 	.count		= ATOMIC_INIT(1), 		\
 	.fdt		= &init_files.fdtab, 		\
 	.fdtab		= INIT_FDTABLE,			\
-	.file_lock	= SPIN_LOCK_UNLOCKED, 		\
+	.file_lock	= __SPIN_LOCK_UNLOCKED(init_task.file_lock), \
 	.next_fd	= 0, 				\
 	.close_on_exec_init = { { 0, } }, 		\
 	.open_fds_init	= { { 0, } }, 			\
@@ -37,7 +37,7 @@
 	.user_id	= 0,				\
 	.next		= NULL,				\
 	.wait		= __WAIT_QUEUE_HEAD_INITIALIZER(name.wait), \
-	.ctx_lock	= SPIN_LOCK_UNLOCKED,		\
+	.ctx_lock	= __SPIN_LOCK_UNLOCKED(name.ctx_lock), \
 	.reqs_active	= 0U,				\
 	.max_reqs	= ~0U,				\
 }
@@ -49,7 +49,7 @@
 	.mm_users	= ATOMIC_INIT(2), 			\
 	.mm_count	= ATOMIC_INIT(1), 			\
 	.mmap_sem	= __RWSEM_INITIALIZER(name.mmap_sem),	\
-	.page_table_lock =  SPIN_LOCK_UNLOCKED, 		\
+	.page_table_lock =  __SPIN_LOCK_UNLOCKED(name.page_table_lock),	\
 	.mmlist		= LIST_HEAD_INIT(name.mmlist),		\
 	.cpu_vm_mask	= CPU_MASK_ALL,				\
 }
@@ -78,7 +78,7 @@ extern struct nsproxy init_nsproxy;
 #define INIT_SIGHAND(sighand) {						\
 	.count		= ATOMIC_INIT(1), 				\
 	.action		= { { { .sa_handler = NULL, } }, },		\
-	.siglock	= SPIN_LOCK_UNLOCKED, 				\
+	.siglock	= __SPIN_LOCK_UNLOCKED(sighand.siglock),	\
 }
 
 extern struct group_info init_groups;
@@ -129,7 +129,7 @@ extern struct group_info init_groups;
 		.list = LIST_HEAD_INIT(tsk.pending.list),		\
 		.signal = {{0}}},					\
 	.blocked	= {{0}},					\
-	.alloc_lock	= SPIN_LOCK_UNLOCKED,				\
+	.alloc_lock	= __SPIN_LOCK_UNLOCKED(tsk.alloc_lock),		\
 	.journal_info	= NULL,						\
 	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
 	.fs_excl	= ATOMIC_INIT(0),				\
Index: linux/include/linux/notifier.h
===================================================================
--- linux.orig/include/linux/notifier.h
+++ linux/include/linux/notifier.h
@@ -65,7 +65,7 @@ struct raw_notifier_head {
 	} while (0)
 
 #define ATOMIC_NOTIFIER_INIT(name) {				\
-		.lock = SPIN_LOCK_UNLOCKED,			\
+		.lock = __SPIN_LOCK_UNLOCKED(name.lock),	\
 		.head = NULL }
 #define BLOCKING_NOTIFIER_INIT(name) {				\
 		.rwsem = __RWSEM_INITIALIZER((name).rwsem),	\
Index: linux/include/linux/seqlock.h
===================================================================
--- linux.orig/include/linux/seqlock.h
+++ linux/include/linux/seqlock.h
@@ -38,9 +38,17 @@ typedef struct {
  * These macros triggered gcc-3.x compile-time problems.  We think these are
  * OK now.  Be cautious.
  */
-#define SEQLOCK_UNLOCKED { 0, SPIN_LOCK_UNLOCKED }
-#define seqlock_init(x)	do { *(x) = (seqlock_t) SEQLOCK_UNLOCKED; } while (0)
+#define __SEQLOCK_UNLOCKED(lockname) \
+		 { 0, __SPIN_LOCK_UNLOCKED(lockname) }
 
+#define SEQLOCK_UNLOCKED \
+		 __SEQLOCK_UNLOCKED(old_style_seqlock_init)
+
+#define seqlock_init(x) \
+		do { *(x) = (seqlock_t) __SEQLOCK_UNLOCKED(x); } while (0)
+
+#define DEFINE_SEQLOCK(x) \
+		seqlock_t x = __SEQLOCK_UNLOCKED(x)
 
 /* Lock out other writers and update the count.
  * Acts like a normal spin_lock/unlock.
Index: linux/include/linux/spinlock_types.h
===================================================================
--- linux.orig/include/linux/spinlock_types.h
+++ linux/include/linux/spinlock_types.h
@@ -44,24 +44,27 @@ typedef struct {
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
 #ifdef CONFIG_DEBUG_SPINLOCK
-# define SPIN_LOCK_UNLOCKED						\
+# define __SPIN_LOCK_UNLOCKED(lockname)					\
 	(spinlock_t)	{	.raw_lock = __RAW_SPIN_LOCK_UNLOCKED,	\
 				.magic = SPINLOCK_MAGIC,		\
 				.owner = SPINLOCK_OWNER_INIT,		\
 				.owner_cpu = -1 }
-#define RW_LOCK_UNLOCKED						\
+#define __RW_LOCK_UNLOCKED(lockname)					\
 	(rwlock_t)	{	.raw_lock = __RAW_RW_LOCK_UNLOCKED,	\
 				.magic = RWLOCK_MAGIC,			\
 				.owner = SPINLOCK_OWNER_INIT,		\
 				.owner_cpu = -1 }
 #else
-# define SPIN_LOCK_UNLOCKED \
+# define __SPIN_LOCK_UNLOCKED(lockname) \
 	(spinlock_t)	{	.raw_lock = __RAW_SPIN_LOCK_UNLOCKED }
-#define RW_LOCK_UNLOCKED \
+#define __RW_LOCK_UNLOCKED(lockname) \
 	(rwlock_t)	{	.raw_lock = __RAW_RW_LOCK_UNLOCKED }
 #endif
 
-#define DEFINE_SPINLOCK(x)	spinlock_t x = SPIN_LOCK_UNLOCKED
-#define DEFINE_RWLOCK(x)	rwlock_t x = RW_LOCK_UNLOCKED
+#define SPIN_LOCK_UNLOCKED	__SPIN_LOCK_UNLOCKED(old_style_spin_init)
+#define RW_LOCK_UNLOCKED	__RW_LOCK_UNLOCKED(old_style_rw_init)
+
+#define DEFINE_SPINLOCK(x)	spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
+#define DEFINE_RWLOCK(x)	rwlock_t x = __RW_LOCK_UNLOCKED(x)
 
 #endif /* __LINUX_SPINLOCK_TYPES_H */
Index: linux/include/linux/wait.h
===================================================================
--- linux.orig/include/linux/wait.h
+++ linux/include/linux/wait.h
@@ -68,7 +68,7 @@ struct task_struct;
 	wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)
 
 #define __WAIT_QUEUE_HEAD_INITIALIZER(name) {				\
-	.lock		= SPIN_LOCK_UNLOCKED,				\
+	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),		\
 	.task_list	= { &(name).task_list, &(name).task_list } }
 
 #define DECLARE_WAIT_QUEUE_HEAD(name) \
Index: linux/kernel/kmod.c
===================================================================
--- linux.orig/kernel/kmod.c
+++ linux/kernel/kmod.c
@@ -246,6 +246,8 @@ int call_usermodehelper_keys(char *path,
 	};
 	DECLARE_WORK(work, __call_usermodehelper, &sub_info);
 
+	init_completion(&done);
+
 	if (!khelper_wq)
 		return -EBUSY;
 
Index: linux/kernel/rcupdate.c
===================================================================
--- linux.orig/kernel/rcupdate.c
+++ linux/kernel/rcupdate.c
@@ -53,13 +53,13 @@
 static struct rcu_ctrlblk rcu_ctrlblk = {
 	.cur = -300,
 	.completed = -300,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&rcu_ctrlblk.lock),
 	.cpumask = CPU_MASK_NONE,
 };
 static struct rcu_ctrlblk rcu_bh_ctrlblk = {
 	.cur = -300,
 	.completed = -300,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&rcu_bh_ctrlblk.lock),
 	.cpumask = CPU_MASK_NONE,
 };
 
Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -1142,7 +1142,7 @@ unsigned long wall_jiffies = INITIAL_JIF
  * playing with xtime and avenrun.
  */
 #ifndef ARCH_HAVE_XTIME_LOCK
-seqlock_t xtime_lock __cacheline_aligned_in_smp = SEQLOCK_UNLOCKED;
+__cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock);
 
 EXPORT_SYMBOL(xtime_lock);
 #endif
Index: linux/mm/swap_state.c
===================================================================
--- linux.orig/mm/swap_state.c
+++ linux/mm/swap_state.c
@@ -39,7 +39,7 @@ static struct backing_dev_info swap_back
 
 struct address_space swapper_space = {
 	.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC|__GFP_NOWARN),
-	.tree_lock	= RW_LOCK_UNLOCKED,
+	.tree_lock	= __RW_LOCK_UNLOCKED(swapper_space.tree_lock),
 	.a_ops		= &swap_aops,
 	.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
 	.backing_dev_info = &swap_backing_dev_info,
Index: linux/net/ipv4/tcp_ipv4.c
===================================================================
--- linux.orig/net/ipv4/tcp_ipv4.c
+++ linux/net/ipv4/tcp_ipv4.c
@@ -90,7 +90,7 @@ static struct socket *tcp_socket;
 void tcp_v4_send_check(struct sock *sk, int len, struct sk_buff *skb);
 
 struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
-	.lhash_lock	= RW_LOCK_UNLOCKED,
+	.lhash_lock	= __RW_LOCK_UNLOCKED(tcp_hashinfo.lhash_lock),
 	.lhash_users	= ATOMIC_INIT(0),
 	.lhash_wait	= __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
Index: linux/net/ipv4/tcp_minisocks.c
===================================================================
--- linux.orig/net/ipv4/tcp_minisocks.c
+++ linux/net/ipv4/tcp_minisocks.c
@@ -41,7 +41,7 @@ int sysctl_tcp_abort_on_overflow;
 struct inet_timewait_death_row tcp_death_row = {
 	.sysctl_max_tw_buckets = NR_FILE * 2,
 	.period		= TCP_TIMEWAIT_LEN / INET_TWDR_TWKILL_SLOTS,
-	.death_lock	= SPIN_LOCK_UNLOCKED,
+	.death_lock	= __SPIN_LOCK_UNLOCKED(tcp_death_row.death_lock),
 	.hashinfo	= &tcp_hashinfo,
 	.tw_timer	= TIMER_INITIALIZER(inet_twdr_hangman, 0,
 					    (unsigned long)&tcp_death_row),
Index: linux/net/ipv4/xfrm4_policy.c
===================================================================
--- linux.orig/net/ipv4/xfrm4_policy.c
+++ linux/net/ipv4/xfrm4_policy.c
@@ -17,7 +17,7 @@
 static struct dst_ops xfrm4_dst_ops;
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo;
 
-static struct xfrm_type_map xfrm4_type_map = { .lock = RW_LOCK_UNLOCKED };
+static struct xfrm_type_map xfrm4_type_map = { .lock = __RW_LOCK_UNLOCKED(xfrm4_type_map.lock) };
 
 static int xfrm4_dst_lookup(struct xfrm_dst **dst, struct flowi *fl)
 {
@@ -299,7 +299,7 @@ static struct dst_ops xfrm4_dst_ops = {
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
 	.family = 		AF_INET,
-	.lock = 		RW_LOCK_UNLOCKED,
+	.lock = 		__RW_LOCK_UNLOCKED(xfrm4_policy_afinfo.lock),
 	.type_map = 		&xfrm4_type_map,
 	.dst_ops =		&xfrm4_dst_ops,
 	.dst_lookup =		xfrm4_dst_lookup,


* [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (9 preceding siblings ...)
  2006-05-29 21:23 ` [patch 10/61] lock validator: locking init debugging improvement Ingo Molnar
@ 2006-05-29 21:23 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:24 ` [patch 12/61] lock validator: beautify x86_64 stacktraces Ingo Molnar
                   ` (62 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

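init_rwsem() has no return value. This is not a problem if init_rwsem()
is a function, but it is a problem if it is a do { ... } while (0) macro
(which lockdep introduces): such a macro is a statement, so it cannot be
used inside the comma expression that mrinit() was built from.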
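
A minimal userspace sketch of the problem, assuming simplified stand-in
types (struct rwsem_model, struct mrlock_model and the *_model macros are
hypothetical and are not kernel code). It shows that a statement-like
macro can be wrapped in another do { ... } while (0) macro, whereas the
old comma-expression form would no longer compile:

```c
#include <assert.h>

/* Hypothetical stand-ins for struct rw_semaphore and mrlock_t. */
struct rwsem_model { int count; };
struct mrlock_model { int mr_writer; struct rwsem_model mr_lock; };

/* Statement-like initializer, modelled on a do { } while (0) init_rwsem(). */
#define init_rwsem_model(sem)	do { (sem)->count = 0; } while (0)

/*
 * The old form would not compile with the macro above, because a
 * do { } while (0) statement cannot be an operand of the comma operator:
 *
 *	#define mrinit_model(mrp) \
 *		( (mrp)->mr_writer = 0, init_rwsem_model(&(mrp)->mr_lock) )
 */
#define mrinit_model(mrp) \
	do { (mrp)->mr_writer = 0; init_rwsem_model(&(mrp)->mr_lock); } while (0)

/* Returns 1 if both fields were reset by the macro. */
static int mrinit_demo(void)
{
	struct mrlock_model m = { .mr_writer = 1, .mr_lock = { .count = 7 } };

	/* The macro expands to one statement, so if/else needs no braces. */
	if (m.mr_writer)
		mrinit_model(&m);
	else
		m.mr_writer = -1;

	assert(m.mr_writer == 0);
	assert(m.mr_lock.count == 0);
	return m.mr_writer == 0 && m.mr_lock.count == 0;
}
```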

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/xfs/linux-2.6/mrlock.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/xfs/linux-2.6/mrlock.h
===================================================================
--- linux.orig/fs/xfs/linux-2.6/mrlock.h
+++ linux/fs/xfs/linux-2.6/mrlock.h
@@ -28,7 +28,7 @@ typedef struct {
 } mrlock_t;
 
 #define mrinit(mrp, name)	\
-	( (mrp)->mr_writer = 0, init_rwsem(&(mrp)->mr_lock) )
+	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
 #define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
 #define mrfree(mrp)		do { } while (0)
 #define mraccess(mrp)		mraccessf(mrp, 0)


* [patch 12/61] lock validator: beautify x86_64 stacktraces
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (10 preceding siblings ...)
  2006-05-29 21:23 ` [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:24 ` [patch 13/61] lock validator: x86_64: document stack frame internals Ingo Molnar
                   ` (61 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

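Beautify x86_64 stacktraces to be more readable: print one entry per
line, in the form " [<address>] symbol+0xoffset/0xsize", and make
printk_address() return void, since the column counting that used its
return value is gone.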
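
As an illustration of the new per-entry format, here is a userspace
sketch that uses snprintf() in place of printk(). format_trace_entry()
and the sample symbol and module names are hypothetical; only the format
strings are taken from the patch below:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper: formats one backtrace entry with the
 * format strings of the reworked printk_address(). The caller supplies
 * the symbol data that kallsyms_lookup() would return in the kernel.
 */
static int format_trace_entry(char *buf, size_t len, unsigned long address,
			      const char *modname, const char *symname,
			      unsigned long offset, unsigned long symsize)
{
	const char *delim = ":";

	if (!symname)		/* no symbol found: raw address only */
		return snprintf(buf, len, " [<%016lx>]", address);
	if (!modname)		/* core kernel symbol: no ":module:" prefix */
		modname = delim = "";
	return snprintf(buf, len, " [<%016lx>] %s%s%s%s+0x%lx/0x%lx",
			address, delim, modname, delim, symname,
			offset, symsize);
}

/* Formats two sample entries and checks them; returns 1 on success. */
static int format_trace_entry_demo(void)
{
	char buf[128];

	format_trace_entry(buf, sizeof(buf), 0x80123456UL, NULL,
			   "sample_func", 0x1c, 0x90);
	assert(!strcmp(buf, " [<0000000080123456>] sample_func+0x1c/0x90"));

	format_trace_entry(buf, sizeof(buf), 0x88001000UL, "sample_mod",
			   "sample_func", 0x10, 0x80);
	assert(!strcmp(buf, " [<0000000088001000>] "
			    ":sample_mod:sample_func+0x10/0x80"));
	return 1;
}
```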

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/kernel/traps.c  |   55 ++++++++++++++++++++------------------------
 include/asm-x86_64/kdebug.h |    2 -
 2 files changed, 27 insertions(+), 30 deletions(-)

Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -108,28 +108,30 @@ static inline void preempt_conditional_c
 static int kstack_depth_to_print = 10;
 
 #ifdef CONFIG_KALLSYMS
-#include <linux/kallsyms.h> 
-int printk_address(unsigned long address)
-{ 
+# include <linux/kallsyms.h>
+void printk_address(unsigned long address)
+{
 	unsigned long offset = 0, symsize;
 	const char *symname;
 	char *modname;
-	char *delim = ":"; 
+	char *delim = ":";
 	char namebuf[128];
 
-	symname = kallsyms_lookup(address, &symsize, &offset, &modname, namebuf); 
-	if (!symname) 
-		return printk("[<%016lx>]", address);
-	if (!modname) 
+	symname = kallsyms_lookup(address, &symsize, &offset, &modname, namebuf);
+	if (!symname) {
+		printk(" [<%016lx>]", address);
+		return;
+	}
+	if (!modname)
 		modname = delim = ""; 		
-        return printk("<%016lx>{%s%s%s%s%+ld}",
-		      address, delim, modname, delim, symname, offset); 
-} 
+	printk(" [<%016lx>] %s%s%s%s+0x%lx/0x%lx",
+		address, delim, modname, delim, symname, offset, symsize);
+}
 #else
-int printk_address(unsigned long address)
-{ 
-	return printk("[<%016lx>]", address);
-} 
+void printk_address(unsigned long address)
+{
+	printk(" [<%016lx>]", address);
+}
 #endif
 
 static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
@@ -200,21 +202,14 @@ void show_trace(unsigned long *stack)
 {
 	const unsigned cpu = safe_smp_processor_id();
 	unsigned long *irqstack_end = (unsigned long *)cpu_pda(cpu)->irqstackptr;
-	int i;
 	unsigned used = 0;
 
-	printk("\nCall Trace:");
+	printk("\nCall Trace:\n");
 
 #define HANDLE_STACK(cond) \
 	do while (cond) { \
 		unsigned long addr = *stack++; \
 		if (kernel_text_address(addr)) { \
-			if (i > 50) { \
-				printk("\n       "); \
-				i = 0; \
-			} \
-			else \
-				i += printk(" "); \
 			/* \
 			 * If the address is either in the text segment of the \
 			 * kernel, or in the region which contains vmalloc'ed \
@@ -223,20 +218,21 @@ void show_trace(unsigned long *stack)
 			 * down the cause of the crash will be able to figure \
 			 * out the call path that was taken. \
 			 */ \
-			i += printk_address(addr); \
+			printk_address(addr); \
+			printk("\n"); \
 		} \
 	} while (0)
 
-	for(i = 11; ; ) {
+	for ( ; ; ) {
 		const char *id;
 		unsigned long *estack_end;
 		estack_end = in_exception_stack(cpu, (unsigned long)stack,
 						&used, &id);
 
 		if (estack_end) {
-			i += printk(" <%s>", id);
+			printk(" <%s>", id);
 			HANDLE_STACK (stack < estack_end);
-			i += printk(" <EOE>");
+			printk(" <EOE>");
 			stack = (unsigned long *) estack_end[-2];
 			continue;
 		}
@@ -246,11 +242,11 @@ void show_trace(unsigned long *stack)
 				(IRQSTACKSIZE - 64) / sizeof(*irqstack);
 
 			if (stack >= irqstack && stack < irqstack_end) {
-				i += printk(" <IRQ>");
+				printk(" <IRQ>");
 				HANDLE_STACK (stack < irqstack_end);
 				stack = (unsigned long *) (irqstack_end[-1]);
 				irqstack_end = NULL;
-				i += printk(" <EOI>");
+				printk(" <EOI>");
 				continue;
 			}
 		}
@@ -259,6 +255,7 @@ void show_trace(unsigned long *stack)
 
 	HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);
 #undef HANDLE_STACK
+
 	printk("\n");
 }
 
Index: linux/include/asm-x86_64/kdebug.h
===================================================================
--- linux.orig/include/asm-x86_64/kdebug.h
+++ linux/include/asm-x86_64/kdebug.h
@@ -49,7 +49,7 @@ static inline int notify_die(enum die_va
 	return atomic_notifier_call_chain(&die_chain, val, &args);
 } 
 
-extern int printk_address(unsigned long address);
+extern void printk_address(unsigned long address);
 extern void die(const char *,struct pt_regs *,long);
 extern void __die(const char *,struct pt_regs *,long);
 extern void show_registers(struct pt_regs *regs);


* [patch 13/61] lock validator: x86_64: document stack frame internals
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (11 preceding siblings ...)
  2006-05-29 21:24 ` [patch 12/61] lock validator: beautify x86_64 stacktraces Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-29 21:24 ` [patch 14/61] lock validator: stacktrace Ingo Molnar
                   ` (60 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Document the stack frame nesting internals some more, and make
in_exception_stack() non-static.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/kernel/traps.c |   64 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -134,8 +134,9 @@ void printk_address(unsigned long addres
 }
 #endif
 
-static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
-					unsigned *usedp, const char **idp)
+unsigned long *
+in_exception_stack(unsigned cpu, unsigned long stack, unsigned *usedp,
+		   const char **idp)
 {
 	static char ids[][8] = {
 		[DEBUG_STACK - 1] = "#DB",
@@ -149,10 +150,22 @@ static unsigned long *in_exception_stack
 	};
 	unsigned k;
 
+	/*
+	 * Iterate over all exception stacks, and figure out whether
+	 * 'stack' is in one of them:
+	 */
 	for (k = 0; k < N_EXCEPTION_STACKS; k++) {
 		unsigned long end;
 
+		/*
+		 * set 'end' to the end of the exception stack.
+		 */
 		switch (k + 1) {
+		/*
+		 * TODO: this block is not needed I think, because
+		 * setup64.c:cpu_init() sets up t->ist[DEBUG_STACK]
+		 * properly too.
+		 */
 #if DEBUG_STKSZ > EXCEPTION_STKSZ
 		case DEBUG_STACK:
 			end = cpu_pda(cpu)->debugstack + DEBUG_STKSZ;
@@ -162,19 +175,43 @@ static unsigned long *in_exception_stack
 			end = per_cpu(init_tss, cpu).ist[k];
 			break;
 		}
+		/*
+		 * Is 'stack' above this exception frame's end?
+		 * If yes then skip to the next frame.
+		 */
 		if (stack >= end)
 			continue;
+		/*
+		 * Is 'stack' above this exception frame's start address?
+		 * If yes then we found the right frame.
+		 */
 		if (stack >= end - EXCEPTION_STKSZ) {
+			/*
+			 * Make sure we only iterate through an exception
+			 * stack once. If it comes up for the second time
+			 * then there's something wrong going on - just
+			 * break out and return NULL:
+			 */
 			if (*usedp & (1U << k))
 				break;
 			*usedp |= 1U << k;
 			*idp = ids[k];
 			return (unsigned long *)end;
 		}
+		/*
+		 * If this is a debug stack, and if it has a larger size than
+		 * the usual exception stacks, then 'stack' might still
+		 * be within the lower portion of the debug stack:
+		 */
 #if DEBUG_STKSZ > EXCEPTION_STKSZ
 		if (k == DEBUG_STACK - 1 && stack >= end - DEBUG_STKSZ) {
 			unsigned j = N_EXCEPTION_STACKS - 1;
 
+			/*
+			 * Black magic. A large debug stack is composed of
+			 * multiple exception stack entries, which we
+			 * iterate through now. Don't look:
+			 */
 			do {
 				++j;
 				end -= EXCEPTION_STKSZ;
@@ -206,6 +243,11 @@ void show_trace(unsigned long *stack)
 
 	printk("\nCall Trace:\n");
 
+	/*
+	 * Print function call entries within a stack. 'cond' is the
+	 * "end of stackframe" condition, that the 'stack++'
+	 * iteration will eventually trigger.
+	 */
 #define HANDLE_STACK(cond) \
 	do while (cond) { \
 		unsigned long addr = *stack++; \
@@ -223,6 +265,11 @@ void show_trace(unsigned long *stack)
 		} \
 	} while (0)
 
+	/*
+	 * Print function call entries in all stacks, starting at the
+	 * current stack address. If the stacks consist of nested
+	 * exceptions, each stack links to the next one to walk:
+	 */
 	for ( ; ; ) {
 		const char *id;
 		unsigned long *estack_end;
@@ -233,6 +280,11 @@ void show_trace(unsigned long *stack)
 			printk(" <%s>", id);
 			HANDLE_STACK (stack < estack_end);
 			printk(" <EOE>");
+			/*
+			 * We link to the next stack via the
+			 * second-to-last pointer (index -2 to end) in the
+			 * exception stack:
+			 */
 			stack = (unsigned long *) estack_end[-2];
 			continue;
 		}
@@ -244,6 +296,11 @@ void show_trace(unsigned long *stack)
 			if (stack >= irqstack && stack < irqstack_end) {
 				printk(" <IRQ>");
 				HANDLE_STACK (stack < irqstack_end);
+				/*
+				 * We link to the next stack (which would be
+				 * the process stack normally) via the last
+				 * pointer (index -1 to end) in the IRQ stack:
+				 */
 				stack = (unsigned long *) (irqstack_end[-1]);
 				irqstack_end = NULL;
 				printk(" <EOI>");
@@ -253,6 +310,9 @@ void show_trace(unsigned long *stack)
 		break;
 	}
 
+	/*
+	 * This prints the process stack:
+	 */
 	HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);
 #undef HANDLE_STACK
 


* [patch 14/61] lock validator: stacktrace
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (12 preceding siblings ...)
  2006-05-29 21:24 ` [patch 13/61] lock validator: x86_64: document stack frame internals Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-29 21:24 ` [patch 15/61] lock validator: x86_64: use stacktrace to generate backtraces Ingo Molnar
                   ` (59 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Add a framework to generate and save stacktraces quickly, without
printing anything to the console.
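
A userspace sketch of how the trace buffer is filled, under a simplified
model: struct stack_trace is copied from include/linux/stacktrace.h in
this patch, while model_save_context() and the sample addresses are
hypothetical stand-ins for the arch-specific stack walk. The first
'skip' entries of a context are dropped, the rest are stored until
max_entries is reached, and ULONG_MAX separates contexts:

```c
#include <assert.h>
#include <limits.h>

struct stack_trace {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
};

/*
 * Hypothetical model of one context walk: 'addrs' stands in for the
 * return addresses that the real code finds on the stack.
 */
static void model_save_context(struct stack_trace *trace, unsigned int skip,
			       const unsigned long *addrs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (trace->nr_entries >= trace->max_entries)
			break;			/* buffer is full */
		if (skip) {
			skip--;			/* drop the innermost callers */
			continue;
		}
		trace->entries[trace->nr_entries++] = addrs[i];
	}
}

/* Saves two sample contexts into a 4-entry buffer; returns 1 on success. */
static int stack_trace_demo(void)
{
	static const unsigned long irq_ctx[] = { 0x10, 0x20, 0x30 };
	static const unsigned long task_ctx[] = { 0x40, 0x50 };
	unsigned long entries[4];
	struct stack_trace trace = {
		.nr_entries = 0, .max_entries = 4, .entries = entries,
	};

	model_save_context(&trace, 1, irq_ctx, 3);
	/* Context separator, stored only if there is room left. */
	if (trace.nr_entries < trace.max_entries)
		trace.entries[trace.nr_entries++] = ULONG_MAX;
	model_save_context(&trace, 0, task_ctx, 2);

	assert(trace.nr_entries == 4);
	assert(entries[0] == 0x20 && entries[1] == 0x30);
	assert(entries[2] == ULONG_MAX && entries[3] == 0x40);
	return 1;
}
```

Unlike this model, the kernel code in the patch stores an entry first
and checks max_entries afterwards, relying on the WARN_ON() that
max_entries is nonzero.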

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/Makefile       |    2 
 arch/i386/kernel/stacktrace.c   |   98 +++++++++++++++++
 arch/x86_64/kernel/Makefile     |    2 
 arch/x86_64/kernel/stacktrace.c |  219 ++++++++++++++++++++++++++++++++++++++++
 include/linux/stacktrace.h      |   15 ++
 kernel/Makefile                 |    2 
 kernel/stacktrace.c             |   26 ++++
 7 files changed, 361 insertions(+), 3 deletions(-)

Index: linux/arch/i386/kernel/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -4,7 +4,7 @@
 
 extra-y := head.o init_task.o vmlinux.lds
 
-obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o \
+obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o stacktrace.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o bootflag.o \
 		quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
Index: linux/arch/i386/kernel/stacktrace.c
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/stacktrace.c
@@ -0,0 +1,98 @@
+/*
+ * arch/i386/kernel/stacktrace.c
+ *
+ * Stack trace management functions
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ */
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+{
+	return	p > (void *)tinfo &&
+		p < (void *)tinfo + THREAD_SIZE - 3;
+}
+
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer:
+ */
+static inline unsigned long
+save_context_stack(struct stack_trace *trace, unsigned int skip,
+		   struct thread_info *tinfo, unsigned long *stack,
+		   unsigned long ebp)
+{
+	unsigned long addr;
+
+#ifdef CONFIG_FRAME_POINTER
+	while (valid_stack_ptr(tinfo, (void *)ebp)) {
+		addr = *(unsigned long *)(ebp + 4);
+		if (!skip)
+			trace->entries[trace->nr_entries++] = addr;
+		else
+			skip--;
+		if (trace->nr_entries >= trace->max_entries)
+			break;
+		/*
+		 * break out of recursive entries (such as
+		 * end_of_stack_stop_unwind_function):
+		 */
+		if (ebp == *(unsigned long *)ebp)
+			break;
+
+		ebp = *(unsigned long *)ebp;
+	}
+#else
+	while (valid_stack_ptr(tinfo, stack)) {
+		addr = *stack++;
+		if (__kernel_text_address(addr)) {
+			if (!skip)
+				trace->entries[trace->nr_entries++] = addr;
+			else
+				skip--;
+			if (trace->nr_entries >= trace->max_entries)
+				break;
+		}
+	}
+#endif
+
+	return ebp;
+}
+
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer.
+ * If all_contexts is set, all contexts (hardirq, softirq and process)
+ * are saved. If not set then only the current context is saved.
+ */
+void save_stack_trace(struct stack_trace *trace,
+		      struct task_struct *task, int all_contexts,
+		      unsigned int skip)
+{
+	unsigned long ebp;
+	unsigned long *stack = &ebp;
+
+	WARN_ON(trace->nr_entries || !trace->max_entries);
+
+	if (!task || task == current) {
+		/* Grab ebp right from our regs: */
+		asm ("movl %%ebp, %0" : "=r" (ebp));
+	} else {
+		/* ebp is the last reg pushed by switch_to(): */
+		ebp = *(unsigned long *) task->thread.esp;
+	}
+
+	while (1) {
+		struct thread_info *context = (struct thread_info *)
+				((unsigned long)stack & (~(THREAD_SIZE - 1)));
+
+		ebp = save_context_stack(trace, skip, context, stack, ebp);
+		stack = (unsigned long *)context->previous_esp;
+		if (!all_contexts || !stack ||
+				trace->nr_entries >= trace->max_entries)
+			break;
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+		if (trace->nr_entries >= trace->max_entries)
+			break;
+	}
+}
+
Index: linux/arch/x86_64/kernel/Makefile
===================================================================
--- linux.orig/arch/x86_64/kernel/Makefile
+++ linux/arch/x86_64/kernel/Makefile
@@ -4,7 +4,7 @@
 
 extra-y 	:= head.o head64.o init_task.o vmlinux.lds
 EXTRA_AFLAGS	:= -traditional
-obj-y	:= process.o signal.o entry.o traps.o irq.o \
+obj-y	:= process.o signal.o entry.o traps.o irq.o stacktrace.o \
 		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
 		x8664_ksyms.o i387.o syscall.o vsyscall.o \
 		setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
Index: linux/arch/x86_64/kernel/stacktrace.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/kernel/stacktrace.c
@@ -0,0 +1,219 @@
+/*
+ * arch/x86_64/kernel/stacktrace.c
+ *
+ * Stack trace management functions
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ */
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+
+#include <asm/smp.h>
+
+static inline int
+in_range(unsigned long start, unsigned long addr, unsigned long end)
+{
+	return addr >= start && addr <= end;
+}
+
+static unsigned long
+get_stack_end(struct task_struct *task, unsigned long stack)
+{
+	unsigned long stack_start, stack_end, flags;
+	int i, cpu;
+
+	/*
+	 * The most common case is that we are in the task stack:
+	 */
+	stack_start = (unsigned long)task->thread_info;
+	stack_end = stack_start + THREAD_SIZE;
+
+	if (in_range(stack_start, stack, stack_end))
+		return stack_end;
+
+	/*
+	 * We are in an interrupt if irqstackptr is set:
+	 */
+	raw_local_irq_save(flags);
+	cpu = safe_smp_processor_id();
+	stack_end = (unsigned long)cpu_pda(cpu)->irqstackptr;
+
+	if (stack_end) {
+		stack_start = stack_end & ~(IRQSTACKSIZE-1);
+		if (in_range(stack_start, stack, stack_end))
+			goto out_restore;
+		/*
+		 * We get here if we are in an IRQ context but we
+		 * are also in an exception stack.
+		 */
+	}
+
+	/*
+	 * Iterate over all exception stacks, and figure out whether
+	 * 'stack' is in one of them:
+	 */
+	for (i = 0; i < N_EXCEPTION_STACKS; i++) {
+		/*
+		 * set 'end' to the end of the exception stack.
+		 */
+		stack_end = per_cpu(init_tss, cpu).ist[i];
+		stack_start = stack_end - EXCEPTION_STKSZ;
+
+		/*
+		 * Is 'stack' above this exception frame's end?
+		 * If yes then skip to the next frame.
+		 */
+		if (stack >= stack_end)
+			continue;
+		/*
+		 * Is 'stack' above this exception frame's start address?
+		 * If yes then we found the right frame.
+		 */
+		if (stack >= stack_start)
+			goto out_restore;
+
+		/*
+		 * If this is a debug stack, and if it has a larger size than
+		 * the usual exception stacks, then 'stack' might still
+		 * be within the lower portion of the debug stack:
+		 */
+#if DEBUG_STKSZ > EXCEPTION_STKSZ
+		if (i == DEBUG_STACK - 1 && stack >= stack_end - DEBUG_STKSZ) {
+			/*
+			 * Black magic. A large debug stack is composed of
+			 * multiple exception stack entries, which we
+			 * iterate through now. Don't look:
+			 */
+			do {
+				stack_end -= EXCEPTION_STKSZ;
+				stack_start -= EXCEPTION_STKSZ;
+			} while (stack < stack_start);
+
+			goto out_restore;
+		}
+#endif
+	}
+	/*
+	 * Ok, 'stack' is not pointing to any of the system stacks.
+	 */
+	stack_end = 0;
+
+out_restore:
+	raw_local_irq_restore(flags);
+
+	return stack_end;
+}
+
+
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer:
+ */
+static inline unsigned long
+save_context_stack(struct stack_trace *trace, unsigned int skip,
+		   unsigned long stack, unsigned long stack_end)
+{
+	unsigned long addr, prev_stack = 0;
+
+#ifdef CONFIG_FRAME_POINTER
+	while (in_range(prev_stack, (unsigned long)stack, stack_end)) {
+		pr_debug("stack:          %p\n", (void *)stack);
+		addr = (unsigned long)(((unsigned long *)stack)[1]);
+		pr_debug("addr:           %p\n", (void *)addr);
+		if (!skip)
+			trace->entries[trace->nr_entries++] = addr-1;
+		else
+			skip--;
+		if (trace->nr_entries >= trace->max_entries)
+			break;
+		if (!addr)
+			return 0;
+		/*
+		 * Stack frames must go forwards (otherwise a loop could
+		 * happen if the stackframe is corrupted), so we move
+		 * prev_stack forwards:
+		 */
+		prev_stack = stack;
+		stack = (unsigned long)(((unsigned long *)stack)[0]);
+	}
+	pr_debug("invalid:        %p\n", (void *)stack);
+#else
+	while (stack < stack_end) {
+		addr = ((unsigned long *)stack)[0];
+		stack += sizeof(long);
+		if (__kernel_text_address(addr)) {
+			if (!skip)
+				trace->entries[trace->nr_entries++] = addr-1;
+			else
+				skip--;
+			if (trace->nr_entries >= trace->max_entries)
+				break;
+		}
+	}
+#endif
+	return stack;
+}
+
+#define MAX_STACKS 10
+
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer.
+ * If all_contexts is set, all contexts (hardirq, softirq and process)
+ * are saved. If not set then only the current context is saved.
+ */
+void save_stack_trace(struct stack_trace *trace,
+		      struct task_struct *task, int all_contexts,
+		      unsigned int skip)
+{
+	unsigned long stack = (unsigned long)&stack, stacks_done[MAX_STACKS];
+	int i, nr_stacks = 0;
+
+	WARN_ON(trace->nr_entries || !trace->max_entries);
+
+	if (!task)
+		task = current;
+
+	pr_debug("task: %p, ti: %p\n", task, task->thread_info);
+
+	if (!task || task == current) {
+		/* Grab rbp right from our regs: */
+		asm ("mov %%rbp, %0" : "=r" (stack));
+		pr_debug("rbp:            %p\n", (void *)stack);
+	} else {
+		/* rbp is the last reg pushed by switch_to(): */
+		stack = task->thread.rsp;
+		pr_debug("other task rsp: %p\n", (void *)stack);
+		stack = (unsigned long)(((unsigned long *)stack)[0]);
+		pr_debug("other task rbp: %p\n", (void *)stack);
+	}
+
+	while (1) {
+		unsigned long stack_end = get_stack_end(task, stack);
+
+		pr_debug("stack:          %p\n", (void *)stack);
+		pr_debug("stack end:      %p\n", (void *)stack_end);
+
+		/*
+		 * Invalid stack address?
+		 */
+		if (!stack_end)
+			return;
+		/*
+		 * Were we in this stack already? (recursion)
+		 */
+		for (i = 0; i < nr_stacks; i++)
+			if (stacks_done[i] == stack_end)
+				return;
+		stacks_done[nr_stacks] = stack_end;
+
+		stack = save_context_stack(trace, skip, stack, stack_end);
+		if (!all_contexts || !stack ||
+				trace->nr_entries >= trace->max_entries)
+			return;
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+		if (trace->nr_entries >= trace->max_entries)
+			return;
+		if (++nr_stacks >= MAX_STACKS)
+			return;
+	}
+}
+
Index: linux/include/linux/stacktrace.h
===================================================================
--- /dev/null
+++ linux/include/linux/stacktrace.h
@@ -0,0 +1,15 @@
+#ifndef __LINUX_STACKTRACE_H
+#define __LINUX_STACKTRACE_H
+
+struct stack_trace {
+	unsigned int nr_entries, max_entries;
+	unsigned long *entries;
+};
+
+extern void save_stack_trace(struct stack_trace *trace,
+			     struct task_struct *task, int all_contexts,
+			     unsigned int skip);
+
+extern void print_stack_trace(struct stack_trace *trace, int spaces);
+
+#endif
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o nsproxy.o
+	    hrtimer.o nsproxy.o stacktrace.o
 
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/stacktrace.c
===================================================================
--- /dev/null
+++ linux/kernel/stacktrace.c
@@ -0,0 +1,26 @@
+/*
+ * kernel/stacktrace.c
+ *
+ * Stack trace management functions
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ */
+#include <linux/sched.h>
+#include <linux/kallsyms.h>
+#include <linux/stacktrace.h>
+
+void print_stack_trace(struct stack_trace *trace, int spaces)
+{
+	int i, j;
+
+	for (i = 0; i < trace->nr_entries; i++) {
+		unsigned long ip = trace->entries[i];
+
+		for (j = 0; j < spaces + 1; j++)
+			printk(" ");
+
+		printk("[<%08lx>]", ip);
+		print_symbol(" %s\n", ip);
+	}
+}
+


* [patch 15/61] lock validator: x86_64: use stacktrace to generate backtraces
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (13 preceding siblings ...)
  2006-05-29 21:24 ` [patch 14/61] lock validator: stacktrace Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:33   ` Andrew Morton
  2006-05-29 21:24 ` [patch 16/61] lock validator: fown locking workaround Ingo Molnar
                   ` (58 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Switch x86_64 to the stacktrace infrastructure for generating backtrace
printouts when CONFIG_FRAME_POINTER=y. (This patch will go away once the
dwarf2 stackframe parser in -mm goes upstream.)
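
For readers unfamiliar with frame-pointer unwinding, here is a minimal
userspace sketch of the kind of walk save_context_stack() does in the
CONFIG_FRAME_POINTER case. It is a model, not the kernel code: the "stack"
is a plain array, a frame pointer is an index into it, and walk_frames()
is a hypothetical name. Only the struct stack_trace layout is taken from
the stacktrace patch.

```c
#include <stddef.h>

struct stack_trace {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
};

/*
 * Model layout (an assumption of this sketch): stack[fp] holds the index
 * of the caller's frame, stack[fp + 1] holds the return address.
 * The walk stops on a zero return address, when the buffer is full, or
 * when the next frame does not move forwards (corrupted chain).
 */
static unsigned int walk_frames(struct stack_trace *trace,
				const unsigned long *stack, size_t nr_words,
				size_t fp, unsigned int skip)
{
	size_t prev = 0;
	int first = 1;

	while (fp + 1 < nr_words && (first || fp > prev)) {
		unsigned long addr = stack[fp + 1];

		if (!addr)
			break;
		if (skip)
			skip--;
		else if (trace->nr_entries < trace->max_entries)
			trace->entries[trace->nr_entries++] = addr;
		if (trace->nr_entries >= trace->max_entries)
			break;
		first = 0;
		prev = fp;
		fp = stack[fp];
	}
	return trace->nr_entries;
}
```

The forward-progress check (fp > prev) plays the same role as the
prev_stack test in the real walker: a frame chain that points backwards
ends the walk instead of looping.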

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/kernel/traps.c |   35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -235,7 +235,31 @@ in_exception_stack(unsigned cpu, unsigne
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-void show_trace(unsigned long *stack)
+#ifdef CONFIG_FRAME_POINTER
+
+#include <linux/stacktrace.h>
+
+#define MAX_TRACE_ENTRIES 64
+
+static void __show_trace(struct task_struct *task, unsigned long *stack)
+{
+	unsigned long entries[MAX_TRACE_ENTRIES];
+	struct stack_trace trace;
+
+	trace.nr_entries = 0;
+	trace.max_entries = MAX_TRACE_ENTRIES;
+	trace.entries = entries;
+
+	save_stack_trace(&trace, task, 1, 0);
+
+	pr_debug("got %u/%u entries.\n", trace.nr_entries, trace.max_entries);
+
+	print_stack_trace(&trace, 4);
+}
+
+#else
+
+static void __show_trace(struct task_struct *task, unsigned long *stack)
 {
 	const unsigned cpu = safe_smp_processor_id();
 	unsigned long *irqstack_end = (unsigned long *)cpu_pda(cpu)->irqstackptr;
@@ -319,6 +343,13 @@ void show_trace(unsigned long *stack)
 	printk("\n");
 }
 
+#endif
+
+void show_trace(unsigned long *stack)
+{
+	__show_trace(current, stack);
+}
+
 void show_stack(struct task_struct *tsk, unsigned long * rsp)
 {
 	unsigned long *stack;
@@ -353,7 +384,7 @@ void show_stack(struct task_struct *tsk,
 		printk("%016lx ", *stack++);
 		touch_nmi_watchdog();
 	}
-	show_trace((unsigned long *)rsp);
+	__show_trace(tsk, (unsigned long *)rsp);
 }
 
 /*


* [patch 16/61] lock validator: fown locking workaround
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (14 preceding siblings ...)
  2006-05-29 21:24 ` [patch 15/61] lock validator: x86_64: use stacktrace to generate backtraces Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:34   ` Andrew Morton
  2006-05-29 21:24 ` [patch 17/61] lock validator: sk_callback_lock workaround Ingo Molnar
                   ` (57 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Temporary workaround for the lock validator: make all uses of
f_owner.lock irq-safe. (The real solution will be to express to
the lock validator that f_owner.lock rules are to be generated
per-filesystem.)
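
To illustrate why mixed plain and irq-disabling users of one lock class
look suspicious, here is a small userspace model. It assumes the
validator complains when a class has been taken both in hardirq context
and with hardirqs enabled. The names (lock_class_model, model_acquire,
the MODEL_* bits) are made up for this sketch and are not the
validator's data structures.

```c
#include <stdio.h>

/* Hypothetical usage bits for one lock class (not lockdep's real flags). */
enum {
	MODEL_USED_IN_HARDIRQ   = 1 << 0,	/* taken from hardirq context */
	MODEL_TAKEN_HARDIRQS_ON = 1 << 1,	/* taken with hardirqs enabled */
};

struct lock_class_model {
	const char *name;
	unsigned int usage;
};

/*
 * Record one acquisition. Returns 0 while the class is consistent and
 * -1 once it has been seen both in hardirq context and with hardirqs on.
 */
static int model_acquire(struct lock_class_model *class,
			 int in_hardirq, int hardirqs_enabled)
{
	if (in_hardirq)
		class->usage |= MODEL_USED_IN_HARDIRQ;
	else if (hardirqs_enabled)
		class->usage |= MODEL_TAKEN_HARDIRQS_ON;

	if ((class->usage & MODEL_USED_IN_HARDIRQ) &&
	    (class->usage & MODEL_TAKEN_HARDIRQS_ON)) {
		fprintf(stderr, "inconsistent irq usage: %s\n", class->name);
		return -1;
	}
	return 0;
}
```

In this model, a read_lock_irqsave() style acquisition corresponds to
hardirqs_enabled == 0 and never sets the second bit, which is the effect
the workaround is after.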

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/cifs/file.c |   18 +++++++++---------
 fs/fcntl.c     |   11 +++++++----
 2 files changed, 16 insertions(+), 13 deletions(-)

Index: linux/fs/cifs/file.c
===================================================================
--- linux.orig/fs/cifs/file.c
+++ linux/fs/cifs/file.c
@@ -108,7 +108,7 @@ static inline int cifs_open_inode_helper
 			 &pCifsInode->openFileList);
 	}
 	write_unlock(&GlobalSMBSeslock);
-	write_unlock(&file->f_owner.lock);
+	write_unlock_irq(&file->f_owner.lock);
 	if (pCifsInode->clientCanCacheRead) {
 		/* we have the inode open somewhere else
 		   no need to discard cache data */
@@ -280,7 +280,7 @@ int cifs_open(struct inode *inode, struc
 		goto out;
 	}
 	pCifsFile = cifs_init_private(file->private_data, inode, file, netfid);
-	write_lock(&file->f_owner.lock);
+	write_lock_irq(&file->f_owner.lock);
 	write_lock(&GlobalSMBSeslock);
 	list_add(&pCifsFile->tlist, &pTcon->openFileList);
 
@@ -291,7 +291,7 @@ int cifs_open(struct inode *inode, struc
 					    &oplock, buf, full_path, xid);
 	} else {
 		write_unlock(&GlobalSMBSeslock);
-		write_unlock(&file->f_owner.lock);
+		write_unlock_irq(&file->f_owner.lock);
 	}
 
 	if (oplock & CIFS_CREATE_ACTION) {           
@@ -470,7 +470,7 @@ int cifs_close(struct inode *inode, stru
 	pTcon = cifs_sb->tcon;
 	if (pSMBFile) {
 		pSMBFile->closePend = TRUE;
-		write_lock(&file->f_owner.lock);
+		write_lock_irq(&file->f_owner.lock);
 		if (pTcon) {
 			/* no sense reconnecting to close a file that is
 			   already closed */
@@ -485,23 +485,23 @@ int cifs_close(struct inode *inode, stru
 					the struct would be in each open file,
 					but this should give enough time to 
 					clear the socket */
-					write_unlock(&file->f_owner.lock);
+					write_unlock_irq(&file->f_owner.lock);
 					cERROR(1,("close with pending writes"));
 					msleep(timeout);
-					write_lock(&file->f_owner.lock);
+					write_lock_irq(&file->f_owner.lock);
 					timeout *= 4;
 				} 
-				write_unlock(&file->f_owner.lock);
+				write_unlock_irq(&file->f_owner.lock);
 				rc = CIFSSMBClose(xid, pTcon,
 						  pSMBFile->netfid);
-				write_lock(&file->f_owner.lock);
+				write_lock_irq(&file->f_owner.lock);
 			}
 		}
 		write_lock(&GlobalSMBSeslock);
 		list_del(&pSMBFile->flist);
 		list_del(&pSMBFile->tlist);
 		write_unlock(&GlobalSMBSeslock);
-		write_unlock(&file->f_owner.lock);
+		write_unlock_irq(&file->f_owner.lock);
 		kfree(pSMBFile->search_resume_name);
 		kfree(file->private_data);
 		file->private_data = NULL;
Index: linux/fs/fcntl.c
===================================================================
--- linux.orig/fs/fcntl.c
+++ linux/fs/fcntl.c
@@ -470,9 +470,10 @@ static void send_sigio_to_task(struct ta
 void send_sigio(struct fown_struct *fown, int fd, int band)
 {
 	struct task_struct *p;
+	unsigned long flags;
 	int pid;
 	
-	read_lock(&fown->lock);
+	read_lock_irqsave(&fown->lock, flags);
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
@@ -490,7 +491,7 @@ void send_sigio(struct fown_struct *fown
 	}
 	read_unlock(&tasklist_lock);
  out_unlock_fown:
-	read_unlock(&fown->lock);
+	read_unlock_irqrestore(&fown->lock, flags);
 }
 
 static void send_sigurg_to_task(struct task_struct *p,
@@ -503,9 +504,10 @@ static void send_sigurg_to_task(struct t
 int send_sigurg(struct fown_struct *fown)
 {
 	struct task_struct *p;
+	unsigned long flags;
 	int pid, ret = 0;
 	
-	read_lock(&fown->lock);
+	read_lock_irqsave(&fown->lock, flags);
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
@@ -525,7 +527,8 @@ int send_sigurg(struct fown_struct *fown
 	}
 	read_unlock(&tasklist_lock);
  out_unlock_fown:
-	read_unlock(&fown->lock);
+	read_unlock_irqrestore(&fown->lock, flags);
+
 	return ret;
 }
 


* [patch 17/61] lock validator: sk_callback_lock workaround
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (15 preceding siblings ...)
  2006-05-29 21:24 ` [patch 16/61] lock validator: fown locking workaround Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:34   ` Andrew Morton
  2006-05-29 21:24 ` [patch 18/61] lock validator: irqtrace: core Ingo Molnar
                   ` (56 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Temporary workaround for the lock validator: make all uses of
sk_callback_lock softirq-safe. (The real solution will be to
express to the lock validator that sk_callback_lock rules are
to be generated per-address-family.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 net/core/sock.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux/net/core/sock.c
===================================================================
--- linux.orig/net/core/sock.c
+++ linux/net/core/sock.c
@@ -934,9 +934,9 @@ int sock_i_uid(struct sock *sk)
 {
 	int uid;
 
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 	uid = sk->sk_socket ? SOCK_INODE(sk->sk_socket)->i_uid : 0;
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 	return uid;
 }
 
@@ -944,9 +944,9 @@ unsigned long sock_i_ino(struct sock *sk
 {
 	unsigned long ino;
 
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 	ino = sk->sk_socket ? SOCK_INODE(sk->sk_socket)->i_ino : 0;
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 	return ino;
 }
 
@@ -1306,33 +1306,33 @@ ssize_t sock_no_sendpage(struct socket *
 
 static void sock_def_wakeup(struct sock *sk)
 {
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
 		wake_up_interruptible_all(sk->sk_sleep);
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 }
 
 static void sock_def_error_report(struct sock *sk)
 {
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
 		wake_up_interruptible(sk->sk_sleep);
 	sk_wake_async(sk,0,POLL_ERR); 
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 }
 
 static void sock_def_readable(struct sock *sk, int len)
 {
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
 		wake_up_interruptible(sk->sk_sleep);
 	sk_wake_async(sk,1,POLL_IN);
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 }
 
 static void sock_def_write_space(struct sock *sk)
 {
-	read_lock(&sk->sk_callback_lock);
+	read_lock_bh(&sk->sk_callback_lock);
 
 	/* Do not wake up a writer until he can make "significant"
 	 * progress.  --DaveM
@@ -1346,7 +1346,7 @@ static void sock_def_write_space(struct 
 			sk_wake_async(sk, 2, POLL_OUT);
 	}
 
-	read_unlock(&sk->sk_callback_lock);
+	read_unlock_bh(&sk->sk_callback_lock);
 }
 
 static void sock_def_destruct(struct sock *sk)


* [patch 18/61] lock validator: irqtrace: core
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (16 preceding siblings ...)
  2006-05-29 21:24 ` [patch 17/61] lock validator: sk_callback_lock workaround Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:34   ` Andrew Morton
  2006-05-29 21:24 ` [patch 19/61] lock validator: irqtrace: cleanup: include/asm-i386/irqflags.h Ingo Molnar
                   ` (55 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Add accurate hard-IRQ-flags state tracing. This allows us to attach
extra functionality to IRQ flags on/off events (such as trace-on/off).
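
As a rough userspace model of the idea: the entry.S hunks in this patch
place TRACE_IRQS_OFF right after cli and TRACE_IRQS_ON right before sti,
so the hook always runs with interrupts off. The sketch below mimics that
ordering with a fake flags word. All model_* names and the
transition-only event counting are assumptions of the sketch; only the
IF bit position (1<<9) is taken from raw_irqs_disabled_flags().

```c
#define MODEL_EFLAGS_IF	(1UL << 9)	/* bit tested by raw_irqs_disabled_flags() */

static unsigned long model_eflags = MODEL_EFLAGS_IF;	/* pretend CPU flags */
static int traced_hardirqs_enabled = 1;			/* what the tracer believes */
static unsigned int model_on_events, model_off_events;

/* Hook for an irqs-off event; only a real on->off transition is counted. */
static void model_trace_hardirqs_off(void)
{
	if (traced_hardirqs_enabled) {
		traced_hardirqs_enabled = 0;
		model_off_events++;
	}
}

/* Hook for an irqs-on event; only a real off->on transition is counted. */
static void model_trace_hardirqs_on(void)
{
	if (!traced_hardirqs_enabled) {
		traced_hardirqs_enabled = 1;
		model_on_events++;
	}
}

static void model_local_irq_disable(void)
{
	model_eflags &= ~MODEL_EFLAGS_IF;	/* "cli" */
	model_trace_hardirqs_off();		/* hook runs with irqs already off */
}

static void model_local_irq_enable(void)
{
	model_trace_hardirqs_on();		/* hook runs while irqs are still off */
	model_eflags |= MODEL_EFLAGS_IF;	/* "sti" */
}

static int model_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & MODEL_EFLAGS_IF);
}
```

Keeping the raw flag manipulation separate from the hooks mirrors the
split between the raw_local_irq_*() macros and the wrapped
local_irq_*() APIs described in the new irqflags.h headers.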

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/entry.S       |   25 ++++++-
 arch/i386/kernel/irq.c         |    6 +
 arch/x86_64/ia32/ia32entry.S   |   19 +++++
 arch/x86_64/kernel/entry.S     |   54 +++++++++++++++-
 arch/x86_64/kernel/irq.c       |    4 -
 include/asm-i386/irqflags.h    |   56 ++++++++++++++++
 include/asm-i386/spinlock.h    |    5 +
 include/asm-i386/system.h      |   20 -----
 include/asm-powerpc/irqflags.h |   31 +++++++++
 include/asm-x86_64/irqflags.h  |   54 ++++++++++++++++
 include/asm-x86_64/system.h    |   38 -----------
 include/linux/hardirq.h        |   13 +++
 include/linux/init_task.h      |    1 
 include/linux/interrupt.h      |   11 +--
 include/linux/sched.h          |   15 ++++
 include/linux/trace_irqflags.h |   87 ++++++++++++++++++++++++++
 kernel/fork.c                  |   20 +++++
 kernel/sched.c                 |    4 -
 kernel/softirq.c               |  137 +++++++++++++++++++++++++++++++++++------
 lib/locking-selftest.c         |    3 
 20 files changed, 513 insertions(+), 90 deletions(-)

Index: linux/arch/i386/kernel/entry.S
===================================================================
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -43,6 +43,7 @@
 #include <linux/config.h>
 #include <linux/linkage.h>
 #include <asm/thread_info.h>
+#include <asm/irqflags.h>
 #include <asm/errno.h>
 #include <asm/segment.h>
 #include <asm/smp.h>
@@ -76,7 +77,7 @@ NT_MASK		= 0x00004000
 VM_MASK		= 0x00020000
 
 #ifdef CONFIG_PREEMPT
-#define preempt_stop		cli
+#define preempt_stop		cli; TRACE_IRQS_OFF
 #else
 #define preempt_stop
 #define resume_kernel		restore_nocheck
@@ -186,6 +187,10 @@ need_resched:
 ENTRY(sysenter_entry)
 	movl TSS_sysenter_esp0(%esp),%esp
 sysenter_past_esp:
+	/*
+	 * No need to follow this irqs on/off section: the syscall
+	 * disabled irqs and here we enable them straight after entry:
+	 */
 	sti
 	pushl $(__USER_DS)
 	pushl %ebp
@@ -217,6 +222,7 @@ sysenter_past_esp:
 	call *sys_call_table(,%eax,4)
 	movl %eax,EAX(%esp)
 	cli
+	TRACE_IRQS_OFF
 	movl TI_flags(%ebp), %ecx
 	testw $_TIF_ALLWORK_MASK, %cx
 	jne syscall_exit_work
@@ -224,6 +230,7 @@ sysenter_past_esp:
 	movl EIP(%esp), %edx
 	movl OLDESP(%esp), %ecx
 	xorl %ebp,%ebp
+	TRACE_IRQS_ON
 	sti
 	sysexit
 
@@ -250,6 +257,7 @@ syscall_exit:
 	cli				# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
 					# between sampling and the iret
+	TRACE_IRQS_OFF
 	movl TI_flags(%ebp), %ecx
 	testw $_TIF_ALLWORK_MASK, %cx	# current->work
 	jne syscall_exit_work
@@ -265,11 +273,14 @@ restore_all:
 	cmpl $((4 << 8) | 3), %eax
 	je ldt_ss			# returning to user-space with LDT SS
 restore_nocheck:
+	TRACE_IRQS_ON
+restore_nocheck_notrace:
 	RESTORE_REGS
 	addl $4, %esp
 1:	iret
 .section .fixup,"ax"
 iret_exc:
+	TRACE_IRQS_ON
 	sti
 	pushl $0			# no error code
 	pushl $do_iret_error
@@ -293,10 +304,12 @@ ldt_ss:
 	 * dosemu and wine happy. */
 	subl $8, %esp		# reserve space for switch16 pointer
 	cli
+	TRACE_IRQS_OFF
 	movl %esp, %eax
 	/* Set up the 16bit stack frame with switch32 pointer on top,
 	 * and a switch16 pointer on top of the current frame. */
 	call setup_x86_bogus_stack
+	TRACE_IRQS_ON
 	RESTORE_REGS
 	lss 20+4(%esp), %esp	# switch to 16bit stack
 1:	iret
@@ -315,6 +328,7 @@ work_resched:
 	cli				# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
 					# between sampling and the iret
+	TRACE_IRQS_OFF
 	movl TI_flags(%ebp), %ecx
 	andl $_TIF_WORK_MASK, %ecx	# is there any work to be done other
 					# than syscall tracing?
@@ -364,6 +378,7 @@ syscall_trace_entry:
 syscall_exit_work:
 	testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
 	jz work_pending
+	TRACE_IRQS_ON
 	sti				# could let do_syscall_trace() call
 					# schedule() instead
 	movl %esp, %eax
@@ -425,9 +440,14 @@ ENTRY(irq_entries_start)
 vector=vector+1
 .endr
 
+/*
+ * the CPU automatically disables interrupts when executing an IRQ vector,
+ * so IRQ-flags tracing has to follow that:
+ */
 	ALIGN
 common_interrupt:
 	SAVE_ALL
+	TRACE_IRQS_OFF
 	movl %esp,%eax
 	call do_IRQ
 	jmp ret_from_intr
@@ -436,6 +456,7 @@ common_interrupt:
 ENTRY(name)				\
 	pushl $~(nr);			\
 	SAVE_ALL			\
+	TRACE_IRQS_OFF			\
 	movl %esp,%eax;			\
 	call smp_/**/name;		\
 	jmp ret_from_intr;
@@ -565,7 +586,7 @@ nmi_stack_correct:
 	xorl %edx,%edx		# zero error code
 	movl %esp,%eax		# pt_regs pointer
 	call do_nmi
-	jmp restore_all
+	jmp restore_nocheck_notrace
 
 nmi_stack_fixup:
 	FIX_STACK(12,nmi_stack_correct, 1)
Index: linux/arch/i386/kernel/irq.c
===================================================================
--- linux.orig/arch/i386/kernel/irq.c
+++ linux/arch/i386/kernel/irq.c
@@ -147,7 +147,7 @@ void irq_ctx_init(int cpu)
 	irqctx->tinfo.task              = NULL;
 	irqctx->tinfo.exec_domain       = NULL;
 	irqctx->tinfo.cpu               = cpu;
-	irqctx->tinfo.preempt_count     = SOFTIRQ_OFFSET;
+	irqctx->tinfo.preempt_count     = 0;
 	irqctx->tinfo.addr_limit        = MAKE_MM_SEG(0);
 
 	softirq_ctx[cpu] = irqctx;
@@ -192,6 +192,10 @@ asmlinkage void do_softirq(void)
 			: "0"(isp)
 			: "memory", "cc", "edx", "ecx", "eax"
 		);
+		/*
+		 * Shouldn't happen, we returned above if in_interrupt():
+		 */
+		WARN_ON_ONCE(softirq_count());
 	}
 
 	local_irq_restore(flags);
Index: linux/arch/x86_64/ia32/ia32entry.S
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32entry.S
+++ linux/arch/x86_64/ia32/ia32entry.S
@@ -13,6 +13,7 @@
 #include <asm/thread_info.h>	
 #include <asm/segment.h>
 #include <asm/vsyscall32.h>
+#include <asm/irqflags.h>
 #include <linux/linkage.h>
 
 #define IA32_NR_syscalls ((ia32_syscall_end - ia32_sys_call_table)/8)
@@ -75,6 +76,10 @@ ENTRY(ia32_sysenter_target)
 	swapgs
 	movq	%gs:pda_kernelstack, %rsp
 	addq	$(PDA_STACKOFFSET),%rsp	
+	/*
+	 * No need to follow this irqs on/off section: the syscall
+	 * disabled irqs and here we enable them straight after entry:
+	 */
 	sti	
  	movl	%ebp,%ebp		/* zero extension */
 	pushq	$__USER32_DS
@@ -118,6 +123,7 @@ sysenter_do_call:	
 	movq	%rax,RAX-ARGOFFSET(%rsp)
 	GET_THREAD_INFO(%r10)
 	cli
+	TRACE_IRQS_OFF
 	testl	$_TIF_ALLWORK_MASK,threadinfo_flags(%r10)
 	jnz	int_ret_from_sys_call
 	andl    $~TS_COMPAT,threadinfo_status(%r10)
@@ -132,6 +138,7 @@ sysenter_do_call:	
 	CFI_REGISTER rsp,rcx
 	movl	$VSYSCALL32_SYSEXIT,%edx	/* User %eip */
 	CFI_REGISTER rip,rdx
+	TRACE_IRQS_ON
 	swapgs
 	sti		/* sti only takes effect after the next instruction */
 	/* sysexit */
@@ -186,6 +193,10 @@ ENTRY(ia32_cstar_target)
 	movl	%esp,%r8d
 	CFI_REGISTER	rsp,r8
 	movq	%gs:pda_kernelstack,%rsp
+	/*
+	 * No need to follow this irqs on/off section: the syscall
+	 * disabled irqs and here we enable them straight after entry:
+	 */
 	sti
 	SAVE_ARGS 8,1,1
 	movl 	%eax,%eax	/* zero extension */
@@ -220,6 +231,7 @@ cstar_do_call:	
 	movq %rax,RAX-ARGOFFSET(%rsp)
 	GET_THREAD_INFO(%r10)
 	cli
+	TRACE_IRQS_OFF
 	testl $_TIF_ALLWORK_MASK,threadinfo_flags(%r10)
 	jnz  int_ret_from_sys_call
 	andl $~TS_COMPAT,threadinfo_status(%r10)
@@ -228,6 +240,7 @@ cstar_do_call:	
 	CFI_REGISTER rip,rcx
 	movl EFLAGS-ARGOFFSET(%rsp),%r11d	
 	/*CFI_REGISTER rflags,r11*/
+	TRACE_IRQS_ON
 	movl RSP-ARGOFFSET(%rsp),%esp
 	CFI_RESTORE rsp
 	swapgs
@@ -286,7 +299,11 @@ ENTRY(ia32_syscall)
 	/*CFI_REL_OFFSET	rflags,EFLAGS-RIP*/
 	/*CFI_REL_OFFSET	cs,CS-RIP*/
 	CFI_REL_OFFSET	rip,RIP-RIP
-	swapgs	
+	swapgs
+	/*
+	 * No need to follow this irqs on/off section: the syscall
+	 * disabled irqs and here we enable them straight after entry:
+	 */
 	sti
 	movl %eax,%eax
 	pushq %rax
Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -42,13 +42,14 @@
 #include <asm/thread_info.h>
 #include <asm/hw_irq.h>
 #include <asm/page.h>
+#include <asm/irqflags.h>
 
 	.code64
 
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif	
-	
+
 /*
  * C code is not supposed to know about undefined top of stack. Every time 
  * a C function with an pt_regs argument is called from the SYSCALL based 
@@ -195,6 +196,10 @@ ENTRY(system_call)
 	swapgs
 	movq	%rsp,%gs:pda_oldrsp 
 	movq	%gs:pda_kernelstack,%rsp
+	/*
+	 * No need to follow this irqs off/on section - it's straight
+	 * and short:
+	 */
 	sti					
 	SAVE_ARGS 8,1
 	movq  %rax,ORIG_RAX-ARGOFFSET(%rsp) 
@@ -220,10 +225,15 @@ ret_from_sys_call:
 sysret_check:		
 	GET_THREAD_INFO(%rcx)
 	cli
+	TRACE_IRQS_OFF
 	movl threadinfo_flags(%rcx),%edx
 	andl %edi,%edx
 	CFI_REMEMBER_STATE
 	jnz  sysret_careful 
+	/*
+	 * sysretq will re-enable interrupts:
+	 */
+	TRACE_IRQS_ON
 	movq RIP-ARGOFFSET(%rsp),%rcx
 	CFI_REGISTER	rip,rcx
 	RESTORE_ARGS 0,-ARG_SKIP,1
@@ -238,6 +248,7 @@ sysret_careful:
 	CFI_RESTORE_STATE
 	bt $TIF_NEED_RESCHED,%edx
 	jnc sysret_signal
+	TRACE_IRQS_ON
 	sti
 	pushq %rdi
 	CFI_ADJUST_CFA_OFFSET 8
@@ -248,6 +259,7 @@ sysret_careful:
 
 	/* Handle a signal */ 
 sysret_signal:
+	TRACE_IRQS_ON
 	sti
 	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
 	jz    1f
@@ -262,6 +274,7 @@ sysret_signal:
 	/* Use IRET because user could have changed frame. This
 	   works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
 	cli
+	TRACE_IRQS_OFF
 	jmp int_with_check
 	
 badsys:
@@ -315,6 +328,7 @@ ENTRY(int_ret_from_sys_call)
 	CFI_REL_OFFSET	r10,R10-ARGOFFSET
 	CFI_REL_OFFSET	r11,R11-ARGOFFSET
 	cli
+	TRACE_IRQS_OFF
 	testl $3,CS-ARGOFFSET(%rsp)
 	je retint_restore_args
 	movl $_TIF_ALLWORK_MASK,%edi
@@ -333,6 +347,7 @@ int_with_check:
 int_careful:
 	bt $TIF_NEED_RESCHED,%edx
 	jnc  int_very_careful
+	TRACE_IRQS_ON
 	sti
 	pushq %rdi
 	CFI_ADJUST_CFA_OFFSET 8
@@ -340,10 +355,12 @@ int_careful:
 	popq %rdi
 	CFI_ADJUST_CFA_OFFSET -8
 	cli
+	TRACE_IRQS_OFF
 	jmp int_with_check
 
 	/* handle signals and tracing -- both require a full stack frame */
 int_very_careful:
+	TRACE_IRQS_ON
 	sti
 	SAVE_REST
 	/* Check for syscall exit trace */	
@@ -357,6 +374,7 @@ int_very_careful:
 	CFI_ADJUST_CFA_OFFSET -8
 	andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi
 	cli
+	TRACE_IRQS_OFF
 	jmp int_restore_rest
 	
 int_signal:
@@ -369,6 +387,7 @@ int_signal:
 int_restore_rest:
 	RESTORE_REST
 	cli
+	TRACE_IRQS_OFF
 	jmp int_with_check
 	CFI_ENDPROC
 END(int_ret_from_sys_call)
@@ -501,6 +520,11 @@ END(stub_rt_sigreturn)
 #ifndef CONFIG_DEBUG_INFO
 	CFI_ADJUST_CFA_OFFSET	8
 #endif
+	/*
+	 * We entered an interrupt context - irqs are off:
+	 */
+	TRACE_IRQS_OFF
+
 	call \func
 	.endm
 
@@ -514,6 +538,7 @@ ret_from_intr:
 	CFI_ADJUST_CFA_OFFSET	-8
 #endif
 	cli	
+	TRACE_IRQS_OFF
 	decl %gs:pda_irqcount
 #ifdef CONFIG_DEBUG_INFO
 	movq RBP(%rdi),%rbp
@@ -538,9 +563,21 @@ retint_check:
 	CFI_REMEMBER_STATE
 	jnz  retint_careful
 retint_swapgs:	 	
+	/*
+	 * The iretq will re-enable interrupts:
+	 */
+	cli
+	TRACE_IRQS_ON
 	swapgs 
+	jmp restore_args
+
 retint_restore_args:				
 	cli
+	/*
+	 * The iretq will re-enable interrupts:
+	 */
+	TRACE_IRQS_ON
+restore_args:
 	RESTORE_ARGS 0,8,0						
 iret_label:	
 	iretq
@@ -553,6 +590,7 @@ iret_label:	
 	/* running with kernel gs */
 bad_iret:
 	movq $11,%rdi	/* SIGSEGV */
+	TRACE_IRQS_ON
 	sti
 	jmp do_exit			
 	.previous	
@@ -562,6 +600,7 @@ retint_careful:
 	CFI_RESTORE_STATE
 	bt    $TIF_NEED_RESCHED,%edx
 	jnc   retint_signal
+	TRACE_IRQS_ON
 	sti
 	pushq %rdi
 	CFI_ADJUST_CFA_OFFSET	8
@@ -570,11 +609,13 @@ retint_careful:
 	CFI_ADJUST_CFA_OFFSET	-8
 	GET_THREAD_INFO(%rcx)
 	cli
+	TRACE_IRQS_OFF
 	jmp retint_check
 	
 retint_signal:
 	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
 	jz    retint_swapgs
+	TRACE_IRQS_ON
 	sti
 	SAVE_REST
 	movq $-1,ORIG_RAX(%rsp) 			
@@ -583,6 +624,7 @@ retint_signal:
 	call do_notify_resume
 	RESTORE_REST
 	cli
+	TRACE_IRQS_OFF
 	movl $_TIF_NEED_RESCHED,%edi
 	GET_THREAD_INFO(%rcx)
 	jmp retint_check
@@ -714,6 +756,7 @@ END(spurious_interrupt)
 	addq	$EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
 	.endif
 	cli
+	TRACE_IRQS_OFF
 	.endm
 	
 /*
@@ -771,6 +814,7 @@ error_exit:		
 	movl %ebx,%eax		
 	RESTORE_REST
 	cli
+	TRACE_IRQS_OFF
 	GET_THREAD_INFO(%rcx)	
 	testl %eax,%eax
 	jne  retint_kernel
@@ -778,6 +822,10 @@ error_exit:		
 	movl  $_TIF_WORK_MASK,%edi
 	andl  %edi,%edx
 	jnz  retint_careful
+	/*
+	 * The iret will restore flags:
+	 */
+	TRACE_IRQS_ON
 	swapgs 
 	RESTORE_ARGS 0,8,0						
 	jmp iret_label
@@ -980,16 +1028,20 @@ paranoid_userspace:	
 	testl $_TIF_NEED_RESCHED,%ebx
 	jnz paranoid_schedule
 	movl %ebx,%edx			/* arg3: thread flags */
+	TRACE_IRQS_ON
 	sti
 	xorl %esi,%esi 			/* arg2: oldset */
 	movq %rsp,%rdi 			/* arg1: &pt_regs */
 	call do_notify_resume
 	cli
+	TRACE_IRQS_OFF
 	jmp paranoid_userspace
 paranoid_schedule:
+	TRACE_IRQS_ON
 	sti
 	call schedule
 	cli
+	TRACE_IRQS_OFF
 	jmp paranoid_userspace
 	CFI_ENDPROC
 END(nmi)
Index: linux/arch/x86_64/kernel/irq.c
===================================================================
--- linux.orig/arch/x86_64/kernel/irq.c
+++ linux/arch/x86_64/kernel/irq.c
@@ -145,8 +145,10 @@ asmlinkage void do_softirq(void)
  	local_irq_save(flags);
  	pending = local_softirq_pending();
  	/* Switch to interrupt stack */
- 	if (pending)
+ 	if (pending) {
 		call_softirq();
+		WARN_ON_ONCE(softirq_count());
+	}
  	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(do_softirq);
Index: linux/include/asm-i386/irqflags.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/irqflags.h
@@ -0,0 +1,56 @@
+/*
+ * include/asm-i386/irqflags.h
+ *
+ * IRQ flags handling
+ *
+ * This file gets included from lowlevel asm headers too, to provide
+ * wrapped versions of the local_irq_*() APIs, based on the
+ * raw_local_irq_*() macros from the lowlevel headers.
+ */
+#ifndef _ASM_IRQFLAGS_H
+#define _ASM_IRQFLAGS_H
+
+#define raw_local_save_flags(x)	do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
+#define raw_local_irq_restore(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
+#define raw_local_irq_disable()	__asm__ __volatile__("cli": : :"memory")
+#define raw_local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
+/* used in the idle loop; sti takes one instruction cycle to complete */
+#define raw_safe_halt()		__asm__ __volatile__("sti; hlt": : :"memory")
+/* used when interrupts are already enabled or to shutdown the processor */
+#define halt()			__asm__ __volatile__("hlt": : :"memory")
+
+#define raw_irqs_disabled_flags(flags)	(!((flags) & (1<<9)))
+
+/* For spinlocks etc */
+#define raw_local_irq_save(x)	__asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
+
+/*
+ * Do the CPU's IRQ-state tracing from assembly code. We call a
+ * C function, so save all the C-clobbered registers:
+ */
+#ifdef CONFIG_TRACE_IRQFLAGS
+
+# define TRACE_IRQS_ON				\
+	pushl %eax;				\
+	pushl %ecx;				\
+	pushl %edx;				\
+	call trace_hardirqs_on;			\
+	popl %edx;				\
+	popl %ecx;				\
+	popl %eax;
+
+# define TRACE_IRQS_OFF				\
+	pushl %eax;				\
+	pushl %ecx;				\
+	pushl %edx;				\
+	call trace_hardirqs_off;		\
+	popl %edx;				\
+	popl %ecx;				\
+	popl %eax;
+
+#else
+# define TRACE_IRQS_ON
+# define TRACE_IRQS_OFF
+#endif
+
+#endif
Index: linux/include/asm-i386/spinlock.h
===================================================================
--- linux.orig/include/asm-i386/spinlock.h
+++ linux/include/asm-i386/spinlock.h
@@ -31,6 +31,11 @@
 	"jmp 1b\n" \
 	"3:\n\t"
 
+/*
+ * NOTE: there's an irqs-on section here, which normally would have to be
+ * irq-traced, but on CONFIG_TRACE_IRQFLAGS we never use
+ * __raw_spin_lock_string_flags().
+ */
 #define __raw_spin_lock_string_flags \
 	"\n1:\t" \
 	"lock ; decb %0\n\t" \
Index: linux/include/asm-i386/system.h
===================================================================
--- linux.orig/include/asm-i386/system.h
+++ linux/include/asm-i386/system.h
@@ -456,25 +456,7 @@ static inline unsigned long long __cmpxc
 
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 
-/* interrupt control.. */
-#define local_save_flags(x)	do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
-#define local_irq_restore(x) 	do { typecheck(unsigned long,x); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
-#define local_irq_disable() 	__asm__ __volatile__("cli": : :"memory")
-#define local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
-/* used in the idle loop; sti takes one instruction cycle to complete */
-#define safe_halt()		__asm__ __volatile__("sti; hlt": : :"memory")
-/* used when interrupts are already enabled or to shutdown the processor */
-#define halt()			__asm__ __volatile__("hlt": : :"memory")
-
-#define irqs_disabled()			\
-({					\
-	unsigned long flags;		\
-	local_save_flags(flags);	\
-	!(flags & (1<<9));		\
-})
-
-/* For spinlocks etc */
-#define local_irq_save(x)	__asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
+#include <linux/trace_irqflags.h>
 
 /*
  * disable hlt during certain critical i/o operations
Index: linux/include/asm-powerpc/irqflags.h
===================================================================
--- /dev/null
+++ linux/include/asm-powerpc/irqflags.h
@@ -0,0 +1,31 @@
+/*
+ * include/asm-powerpc/irqflags.h
+ *
+ * IRQ flags handling
+ *
+ * This file gets included from lowlevel asm headers too, to provide
+ * wrapped versions of the local_irq_*() APIs, based on the
+ * raw_local_irq_*() macros from the lowlevel headers.
+ */
+#ifndef _ASM_IRQFLAGS_H
+#define _ASM_IRQFLAGS_H
+
+/*
+ * Get definitions for raw_local_save_flags(x), etc.
+ */
+#include <asm-powerpc/hw_irq.h>
+
+/*
+ * Do the CPU's IRQ-state tracing from assembly code. We call a
+ * C function, so save all the C-clobbered registers:
+ */
+#ifdef CONFIG_TRACE_IRQFLAGS
+
+#error No support on PowerPC yet for CONFIG_TRACE_IRQFLAGS
+
+#else
+# define TRACE_IRQS_ON
+# define TRACE_IRQS_OFF
+#endif
+
+#endif
Index: linux/include/asm-x86_64/irqflags.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/irqflags.h
@@ -0,0 +1,54 @@
+/*
+ * include/asm-x86_64/irqflags.h
+ *
+ * IRQ flags handling
+ *
+ * This file gets included from lowlevel asm headers too, to provide
+ * wrapped versions of the local_irq_*() APIs, based on the
+ * raw_local_irq_*() macros from the lowlevel headers.
+ */
+#ifndef _ASM_IRQFLAGS_H
+#define _ASM_IRQFLAGS_H
+
+/* interrupt control.. */
+#define raw_local_save_flags(x)	do { warn_if_not_ulong(x); __asm__ __volatile__("# save_flags \n\t pushfq ; popq %q0":"=g" (x): /* no input */ :"memory"); } while (0)
+#define raw_local_irq_restore(x) 	__asm__ __volatile__("# restore_flags \n\t pushq %0 ; popfq": /* no output */ :"g" (x):"memory", "cc")
+
+#ifdef CONFIG_X86_VSMP
+/* Interrupt control for VSMP  architecture */
+#define raw_local_irq_disable()	do { unsigned long flags; raw_local_save_flags(flags); raw_local_irq_restore((flags & ~(1 << 9)) | (1 << 18)); } while (0)
+#define raw_local_irq_enable()	do { unsigned long flags; raw_local_save_flags(flags); raw_local_irq_restore((flags | (1 << 9)) & ~(1 << 18)); } while (0)
+
+#define raw_irqs_disabled_flags(flags)	\
+({						\
+	(flags & (1<<18)) || !(flags & (1<<9));	\
+})
+
+/* For spinlocks etc */
+#define raw_local_irq_save(x)	do { raw_local_save_flags(x); raw_local_irq_restore((x & ~(1 << 9)) | (1 << 18)); } while (0)
+#else  /* CONFIG_X86_VSMP */
+#define raw_local_irq_disable() 	__asm__ __volatile__("cli": : :"memory")
+#define raw_local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
+
+#define raw_irqs_disabled_flags(flags)	\
+({						\
+	!(flags & (1<<9));			\
+})
+
+/* For spinlocks etc */
+#define raw_local_irq_save(x) 	do { warn_if_not_ulong(x); __asm__ __volatile__("# raw_local_irq_save \n\t pushfq ; popq %0 ; cli":"=g" (x): /* no input */ :"memory"); } while (0)
+#endif
+
+#define raw_irqs_disabled()			\
+({						\
+	unsigned long flags;			\
+	raw_local_save_flags(flags);		\
+	raw_irqs_disabled_flags(flags);		\
+})
+
+/* used in the idle loop; sti takes one instruction cycle to complete */
+#define raw_safe_halt()	__asm__ __volatile__("sti; hlt": : :"memory")
+/* used when interrupts are already enabled or to shutdown the processor */
+#define halt()			__asm__ __volatile__("hlt": : :"memory")
+
+#endif
Index: linux/include/asm-x86_64/system.h
===================================================================
--- linux.orig/include/asm-x86_64/system.h
+++ linux/include/asm-x86_64/system.h
@@ -244,43 +244,7 @@ static inline unsigned long __cmpxchg(vo
 
 #define warn_if_not_ulong(x) do { unsigned long foo; (void) (&(x) == &foo); } while (0)
 
-/* interrupt control.. */
-#define local_save_flags(x)	do { warn_if_not_ulong(x); __asm__ __volatile__("# save_flags \n\t pushfq ; popq %q0":"=g" (x): /* no input */ :"memory"); } while (0)
-#define local_irq_restore(x) 	__asm__ __volatile__("# restore_flags \n\t pushq %0 ; popfq": /* no output */ :"g" (x):"memory", "cc")
-
-#ifdef CONFIG_X86_VSMP
-/* Interrupt control for VSMP  architecture */
-#define local_irq_disable()	do { unsigned long flags; local_save_flags(flags); local_irq_restore((flags & ~(1 << 9)) | (1 << 18)); } while (0)
-#define local_irq_enable()	do { unsigned long flags; local_save_flags(flags); local_irq_restore((flags | (1 << 9)) & ~(1 << 18)); } while (0)
-
-#define irqs_disabled()					\
-({							\
-	unsigned long flags;				\
-	local_save_flags(flags);			\
-	(flags & (1<<18)) || !(flags & (1<<9));		\
-})
-
-/* For spinlocks etc */
-#define local_irq_save(x)	do { local_save_flags(x); local_irq_restore((x & ~(1 << 9)) | (1 << 18)); } while (0)
-#else  /* CONFIG_X86_VSMP */
-#define local_irq_disable() 	__asm__ __volatile__("cli": : :"memory")
-#define local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
-
-#define irqs_disabled()			\
-({					\
-	unsigned long flags;		\
-	local_save_flags(flags);	\
-	!(flags & (1<<9));		\
-})
-
-/* For spinlocks etc */
-#define local_irq_save(x) 	do { warn_if_not_ulong(x); __asm__ __volatile__("# local_irq_save \n\t pushfq ; popq %0 ; cli":"=g" (x): /* no input */ :"memory"); } while (0)
-#endif
-
-/* used in the idle loop; sti takes one instruction cycle to complete */
-#define safe_halt()		__asm__ __volatile__("sti; hlt": : :"memory")
-/* used when interrupts are already enabled or to shutdown the processor */
-#define halt()			__asm__ __volatile__("hlt": : :"memory")
+#include <linux/trace_irqflags.h>
 
 void cpu_idle_wait(void);
 
Index: linux/include/linux/hardirq.h
===================================================================
--- linux.orig/include/linux/hardirq.h
+++ linux/include/linux/hardirq.h
@@ -87,7 +87,11 @@ extern void synchronize_irq(unsigned int
 #endif
 
 #define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+#define nmi_exit()					\
+	do {						\
+		sub_preempt_count(HARDIRQ_OFFSET);	\
+		trace_hardirq_exit();			\
+	} while (0)
 
 struct task_struct;
 
@@ -97,10 +101,17 @@ static inline void account_system_vtime(
 }
 #endif
 
+/*
+ * It is safe to do non-atomic ops on ->hardirq_context,
+ * because NMI handlers may not preempt and the ops are
+ * always balanced, so the interrupted value of ->hardirq_context
+ * will always be restored.
+ */
 #define irq_enter()					\
 	do {						\
 		account_system_vtime(current);		\
 		add_preempt_count(HARDIRQ_OFFSET);	\
+		trace_hardirq_enter();			\
 	} while (0)
 
 extern void irq_exit(void);
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h
+++ linux/include/linux/init_task.h
@@ -133,6 +133,7 @@ extern struct group_info init_groups;
 	.journal_info	= NULL,						\
 	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
 	.fs_excl	= ATOMIC_INIT(0),				\
+ 	INIT_TRACE_IRQFLAGS						\
 }
 
 
Index: linux/include/linux/interrupt.h
===================================================================
--- linux.orig/include/linux/interrupt.h
+++ linux/include/linux/interrupt.h
@@ -10,6 +10,7 @@
 #include <linux/irqreturn.h>
 #include <linux/hardirq.h>
 #include <linux/sched.h>
+#include <linux/trace_irqflags.h>
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
@@ -72,13 +73,11 @@ static inline void __deprecated save_and
 #define save_and_cli(x)	save_and_cli(&x)
 #endif /* CONFIG_SMP */
 
-/* SoftIRQ primitives.  */
-#define local_bh_disable() \
-		do { add_preempt_count(SOFTIRQ_OFFSET); barrier(); } while (0)
-#define __local_bh_enable() \
-		do { barrier(); sub_preempt_count(SOFTIRQ_OFFSET); } while (0)
-
+extern void local_bh_disable(void);
+extern void __local_bh_enable(void);
+extern void _local_bh_enable(void);
 extern void local_bh_enable(void);
+extern void local_bh_enable_ip(unsigned long ip);
 
 /* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
    frequency threaded job scheduling. For almost all the purposes
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -916,6 +916,21 @@ struct task_struct {
 	/* mutex deadlock detection */
 	struct mutex_waiter *blocked_on;
 #endif
+#ifdef CONFIG_TRACE_IRQFLAGS
+	unsigned int irq_events;
+	int hardirqs_enabled;
+	unsigned long hardirq_enable_ip;
+	unsigned int hardirq_enable_event;
+	unsigned long hardirq_disable_ip;
+	unsigned int hardirq_disable_event;
+	int softirqs_enabled;
+	unsigned long softirq_disable_ip;
+	unsigned int softirq_disable_event;
+	unsigned long softirq_enable_ip;
+	unsigned int softirq_enable_event;
+	int hardirq_context;
+	int softirq_context;
+#endif
 
 /* journalling filesystem info */
 	void *journal_info;
Index: linux/include/linux/trace_irqflags.h
===================================================================
--- /dev/null
+++ linux/include/linux/trace_irqflags.h
@@ -0,0 +1,87 @@
+/*
+ * include/linux/trace_irqflags.h
+ *
+ * IRQ flags tracing: follow the state of the hardirq and softirq flags and
+ * provide callbacks for transitions between ON and OFF states.
+ *
+ * This file gets included from lowlevel asm headers too, to provide
+ * wrapped versions of the local_irq_*() APIs, based on the
+ * raw_local_irq_*() macros from the lowlevel headers.
+ */
+#ifndef _LINUX_TRACE_IRQFLAGS_H
+#define _LINUX_TRACE_IRQFLAGS_H
+
+#include <asm/irqflags.h>
+
+/*
+ * The local_irq_*() APIs are equal to the raw_local_irq*()
+ * if !TRACE_IRQFLAGS.
+ */
+#ifdef CONFIG_TRACE_IRQFLAGS
+  extern void trace_hardirqs_on(void);
+  extern void trace_hardirqs_off(void);
+  extern void trace_softirqs_on(unsigned long ip);
+  extern void trace_softirqs_off(unsigned long ip);
+# define trace_hardirq_context(p)	((p)->hardirq_context)
+# define trace_softirq_context(p)	((p)->softirq_context)
+# define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
+# define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
+# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
+# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
+# define trace_softirq_enter()	do { current->softirq_context++; } while (0)
+# define trace_softirq_exit()	do { current->softirq_context--; } while (0)
+# define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
+
+#else
+# define trace_hardirqs_on()		do { } while (0)
+# define trace_hardirqs_off()		do { } while (0)
+# define trace_softirqs_on(ip)		do { } while (0)
+# define trace_softirqs_off(ip)		do { } while (0)
+# define trace_hardirq_context(p)	0
+# define trace_softirq_context(p)	0
+# define trace_hardirqs_enabled(p)	0
+# define trace_softirqs_enabled(p)	0
+# define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_exit()		do { } while (0)
+# define trace_softirq_enter()		do { } while (0)
+# define trace_softirq_exit()		do { } while (0)
+# define INIT_TRACE_IRQFLAGS
+#endif
+
+#define local_irq_enable() \
+	do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
+#define local_irq_disable() \
+	do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
+#define local_irq_save(flags) \
+	do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0)
+
+#define local_irq_restore(flags)				\
+	do {							\
+		if (raw_irqs_disabled_flags(flags)) {		\
+			raw_local_irq_restore(flags);		\
+			trace_hardirqs_off();			\
+		} else {					\
+			trace_hardirqs_on();			\
+			raw_local_irq_restore(flags);		\
+		}						\
+	} while (0)
+
+#define safe_halt()						\
+	do {							\
+		trace_hardirqs_on();				\
+		raw_safe_halt();				\
+	} while (0)
+
+#define local_save_flags(flags)		raw_local_save_flags(flags)
+
+#define irqs_disabled()						\
+({								\
+	unsigned long flags;					\
+								\
+	raw_local_save_flags(flags);				\
+	raw_irqs_disabled_flags(flags);				\
+})
+
+#define irqs_disabled_flags(flags)	raw_irqs_disabled_flags(flags)
+
+#endif
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -970,6 +970,10 @@ static task_t *copy_process(unsigned lon
 	if (!p)
 		goto fork_out;
 
+#ifdef CONFIG_TRACE_IRQFLAGS
+	DEBUG_WARN_ON(!p->hardirqs_enabled);
+	DEBUG_WARN_ON(!p->softirqs_enabled);
+#endif
 	retval = -EAGAIN;
 	if (atomic_read(&p->user->processes) >=
 			p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
@@ -1051,7 +1055,21 @@ static task_t *copy_process(unsigned lon
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
-
+#ifdef CONFIG_TRACE_IRQFLAGS
+	p->irq_events = 0;
+	p->hardirqs_enabled = 0;
+	p->hardirq_enable_ip = 0;
+	p->hardirq_enable_event = 0;
+	p->hardirq_disable_ip = _THIS_IP_;
+	p->hardirq_disable_event = 0;
+	p->softirqs_enabled = 1;
+	p->softirq_enable_ip = _THIS_IP_;
+	p->softirq_enable_event = 0;
+	p->softirq_disable_ip = 0;
+	p->softirq_disable_event = 0;
+	p->hardirq_context = 0;
+	p->softirq_context = 0;
+#endif
 	p->tgid = p->pid;
 	if (clone_flags & CLONE_THREAD)
 		p->tgid = current->tgid;
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4481,7 +4481,9 @@ int __sched cond_resched_softirq(void)
 	BUG_ON(!in_softirq());
 
 	if (need_resched()) {
-		__local_bh_enable();
+		raw_local_irq_disable();
+		_local_bh_enable();
+		raw_local_irq_enable();
 		__cond_resched();
 		local_bh_disable();
 		return 1;
Index: linux/kernel/softirq.c
===================================================================
--- linux.orig/kernel/softirq.c
+++ linux/kernel/softirq.c
@@ -62,6 +62,119 @@ static inline void wakeup_softirqd(void)
 }
 
 /*
+ * This one is for softirq.c-internal use,
+ * where hardirqs are disabled legitimately:
+ */
+static void __local_bh_disable(unsigned long ip)
+{
+	unsigned long flags;
+
+	WARN_ON_ONCE(in_irq());
+
+	raw_local_irq_save(flags);
+	add_preempt_count(SOFTIRQ_OFFSET);
+	/*
+	 * Were softirqs turned off above:
+	 */
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		trace_softirqs_off(ip);
+	raw_local_irq_restore(flags);
+}
+
+void local_bh_disable(void)
+{
+	WARN_ON_ONCE(irqs_disabled());
+	__local_bh_disable((unsigned long)__builtin_return_address(0));
+}
+
+EXPORT_SYMBOL(local_bh_disable);
+
+void __local_bh_enable(void)
+{
+	WARN_ON_ONCE(in_irq());
+
+	/*
+	 * softirqs should never be enabled by __local_bh_enable(),
+	 * it always nests inside local_bh_enable() sections:
+	 */
+	WARN_ON_ONCE(softirq_count() == SOFTIRQ_OFFSET);
+
+	sub_preempt_count(SOFTIRQ_OFFSET);
+}
+
+EXPORT_SYMBOL(__local_bh_enable);
+
+/*
+ * Special-case - softirqs can safely be enabled in
+ * cond_resched_softirq(), or by __do_softirq(),
+ * without processing still-pending softirqs:
+ */
+void _local_bh_enable(void)
+{
+	WARN_ON_ONCE(in_irq());
+	WARN_ON_ONCE(!irqs_disabled());
+
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		trace_softirqs_on((unsigned long)__builtin_return_address(0));
+	sub_preempt_count(SOFTIRQ_OFFSET);
+}
+
+void local_bh_enable(void)
+{
+	unsigned long flags;
+
+	WARN_ON_ONCE(in_irq());
+	WARN_ON_ONCE(irqs_disabled());
+
+	local_irq_save(flags);
+	/*
+	 * Are softirqs going to be turned on now:
+	 */
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		trace_softirqs_on((unsigned long)__builtin_return_address(0));
+	/*
+	 * Keep preemption disabled until we are done with
+	 * softirq processing:
+ 	 */
+ 	sub_preempt_count(SOFTIRQ_OFFSET - 1);
+
+	if (unlikely(!in_interrupt() && local_softirq_pending()))
+		do_softirq();
+
+	dec_preempt_count();
+	local_irq_restore(flags);
+	preempt_check_resched();
+}
+EXPORT_SYMBOL(local_bh_enable);
+
+void local_bh_enable_ip(unsigned long ip)
+{
+	unsigned long flags;
+
+	WARN_ON_ONCE(in_irq());
+
+	local_irq_save(flags);
+	/*
+	 * Are softirqs going to be turned on now:
+	 */
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		trace_softirqs_on(ip);
+	/*
+	 * Keep preemption disabled until we are done with
+	 * softirq processing:
+ 	 */
+ 	sub_preempt_count(SOFTIRQ_OFFSET - 1);
+
+	if (unlikely(!in_interrupt() && local_softirq_pending()))
+		do_softirq();
+
+	dec_preempt_count();
+	local_irq_restore(flags);
+	preempt_check_resched();
+}
+EXPORT_SYMBOL(local_bh_enable_ip);
+
+/*
  * We restart softirq processing MAX_SOFTIRQ_RESTART times,
  * and we fall back to softirqd after that.
  *
@@ -80,8 +193,9 @@ asmlinkage void __do_softirq(void)
 	int cpu;
 
 	pending = local_softirq_pending();
+	__local_bh_disable((unsigned long)__builtin_return_address(0));
+	trace_softirq_enter();
 
-	local_bh_disable();
 	cpu = smp_processor_id();
 restart:
 	/* Reset the pending bitmask before enabling irqs */
@@ -109,7 +223,8 @@ restart:
 	if (pending)
 		wakeup_softirqd();
 
-	__local_bh_enable();
+	trace_softirq_exit();
+	_local_bh_enable();
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ
@@ -136,23 +251,6 @@ EXPORT_SYMBOL(do_softirq);
 
 #endif
 
-void local_bh_enable(void)
-{
-	WARN_ON(irqs_disabled());
-	/*
-	 * Keep preemption disabled until we are done with
-	 * softirq processing:
- 	 */
- 	sub_preempt_count(SOFTIRQ_OFFSET - 1);
-
-	if (unlikely(!in_interrupt() && local_softirq_pending()))
-		do_softirq();
-
-	dec_preempt_count();
-	preempt_check_resched();
-}
-EXPORT_SYMBOL(local_bh_enable);
-
 #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
 # define invoke_softirq()	__do_softirq()
 #else
@@ -165,6 +263,7 @@ EXPORT_SYMBOL(local_bh_enable);
 void irq_exit(void)
 {
 	account_system_vtime(current);
+	trace_hardirq_exit();
 	sub_preempt_count(IRQ_EXIT_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
Index: linux/lib/locking-selftest.c
===================================================================
--- linux.orig/lib/locking-selftest.c
+++ linux/lib/locking-selftest.c
@@ -19,6 +19,7 @@
 #include <linux/kallsyms.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
+#include <linux/trace_irqflags.h>
 
 /*
  * Change this to 1 if you want to see the failure printouts:
@@ -157,9 +158,11 @@ static void init_shared_types(void)
 #define SOFTIRQ_ENTER()				\
 		local_bh_disable();		\
 		local_irq_disable();		\
+		trace_softirq_enter();		\
 		WARN_ON(!in_softirq());
 
 #define SOFTIRQ_EXIT()				\
+		trace_softirq_exit();		\
 		local_irq_enable();		\
 		local_bh_enable();
 


* [patch 19/61] lock validator: irqtrace: cleanup: include/asm-i386/irqflags.h
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (17 preceding siblings ...)
  2006-05-29 21:24 ` [patch 18/61] lock validator: irqtrace: core Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-29 21:24 ` [patch 20/61] lock validator: irqtrace: cleanup: include/asm-x86_64/irqflags.h Ingo Molnar
                   ` (54 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Clean up the i386 irqflags.h file:

 - macro => inline function transformation
 - simplifications
 - style fixes
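
As an illustration of the macro => inline function transformation (a
user-space sketch, not part of the patch): the helper returns the flags
by value, and a thin wrapper macro keeps the "pass the variable by
name" call style that existing callers use. model_eflags and all
model_ names are hypothetical stand-ins.

```c
#include <assert.h>

#define MODEL_IF_BIT	(1UL << 9)	/* EFLAGS.IF, as tested in the patch */

/* Hypothetical stand-in for the EFLAGS register. */
static unsigned long model_eflags = MODEL_IF_BIT;

/* Typed helper: returns the flags by value, like __raw_local_save_flags(). */
static inline unsigned long model_raw_local_save_flags(void)
{
	return model_eflags;
}

/* Thin wrapper keeps the existing by-name call style. */
#define model_local_save_flags(flags) \
	do { (flags) = model_raw_local_save_flags(); } while (0)

static inline void model_raw_local_irq_disable(void)
{
	model_eflags &= ~MODEL_IF_BIT;
}

/* Save-then-disable, composed from the two helpers above. */
static inline unsigned long model_raw_local_irq_save(void)
{
	unsigned long flags = model_raw_local_save_flags();

	model_raw_local_irq_disable();

	return flags;
}

static inline int model_raw_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & MODEL_IF_BIT);
}

/* Usage: both call styles see the same flags value. */
static int model_demo(void)
{
	unsigned long a, b;

	model_local_save_flags(a);		/* by name, via the wrapper */
	b = model_raw_local_irq_save();		/* by value, then disable */
	assert(a == b);

	/* Nonzero once the modelled IF bit has been cleared. */
	return model_raw_irqs_disabled_flags(model_raw_local_save_flags());
}
```

With the function form the compiler checks the argument and return
types, so the explicit typecheck() calls of the macro versions are no
longer needed.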

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/asm-i386/irqflags.h |   95 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 83 insertions(+), 12 deletions(-)

Index: linux/include/asm-i386/irqflags.h
===================================================================
--- linux.orig/include/asm-i386/irqflags.h
+++ linux/include/asm-i386/irqflags.h
@@ -5,24 +5,95 @@
  *
  * This file gets included from lowlevel asm headers too, to provide
  * wrapped versions of the local_irq_*() APIs, based on the
- * raw_local_irq_*() macros from the lowlevel headers.
+ * raw_local_irq_*() functions from the lowlevel headers.
  */
 #ifndef _ASM_IRQFLAGS_H
 #define _ASM_IRQFLAGS_H
 
-#define raw_local_save_flags(x)	do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
-#define raw_local_irq_restore(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
-#define raw_local_irq_disable()	__asm__ __volatile__("cli": : :"memory")
-#define raw_local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
-/* used in the idle loop; sti takes one instruction cycle to complete */
-#define raw_safe_halt()		__asm__ __volatile__("sti; hlt": : :"memory")
-/* used when interrupts are already enabled or to shutdown the processor */
-#define halt()			__asm__ __volatile__("hlt": : :"memory")
+#ifndef __ASSEMBLY__
 
-#define raw_irqs_disabled_flags(flags)	(!((flags) & (1<<9)))
+static inline unsigned long __raw_local_save_flags(void)
+{
+	unsigned long flags;
+
+	__asm__ __volatile__(
+		"pushfl ; popl %0"
+		: "=g" (flags)
+		: /* no input */
+	);
+
+	return flags;
+}
+
+#define raw_local_save_flags(flags) \
+		do { (flags) = __raw_local_save_flags(); } while (0)
+
+static inline void raw_local_irq_restore(unsigned long flags)
+{
+	__asm__ __volatile__(
+		"pushl %0 ; popfl"
+		: /* no output */
+		:"g" (flags)
+		:"memory", "cc"
+	);
+}
+
+static inline void raw_local_irq_disable(void)
+{
+	__asm__ __volatile__("cli" : : : "memory");
+}
+
+static inline void raw_local_irq_enable(void)
+{
+	__asm__ __volatile__("sti" : : : "memory");
+}
 
-/* For spinlocks etc */
-#define raw_local_irq_save(x)	__asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
+/*
+ * Used in the idle loop; sti takes one instruction cycle
+ * to complete:
+ */
+static inline void raw_safe_halt(void)
+{
+	__asm__ __volatile__("sti; hlt" : : : "memory");
+}
+
+/*
+ * Used when interrupts are already enabled or to
+ * shutdown the processor:
+ */
+static inline void halt(void)
+{
+	__asm__ __volatile__("hlt": : :"memory");
+}
+
+static inline int raw_irqs_disabled_flags(unsigned long flags)
+{
+	return !(flags & (1 << 9));
+}
+
+static inline int raw_irqs_disabled(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	return raw_irqs_disabled_flags(flags);
+}
+
+/*
+ * For spinlocks, etc:
+ */
+static inline unsigned long __raw_local_irq_save(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	raw_local_irq_disable();
+
+	return flags;
+}
+
+#define raw_local_irq_save(flags) \
+		do { (flags) = __raw_local_irq_save(); } while (0)
+
+#endif /* __ASSEMBLY__ */
 
 /*
  * Do the CPU's IRQ-state tracing from assembly code. We call a


* [patch 20/61] lock validator: irqtrace: cleanup: include/asm-x86_64/irqflags.h
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (18 preceding siblings ...)
  2006-05-29 21:24 ` [patch 19/61] lock validator: irqtrace: cleanup: include/asm-i386/irqflags.h Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-29 21:24 ` [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API Ingo Molnar
                   ` (53 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

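Clean up the x86-64 irqflags.h file:

 - macro => inline function transformation
 - simplifications
 - style fixes
 - add trace_hardirqs_on/off thunks to arch/x86_64/lib/thunk.S and
   TRACE_IRQS_ON/OFF macros for use from assembly code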
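
One concrete benefit of the macro => inline function transformation:
the old non-VSMP raw_irqs_disabled_flags() macro expanded its argument
without parentheses, so a compound argument could be parsed differently
than intended. A user-space sketch of the hazard follows; the old_,
new_ and forms_ names are made up for the illustration.

```c
#include <assert.h>

/* Same test as the old macro body: the argument is not parenthesized. */
#define old_irqs_disabled_flags(flags)	(!(flags & (1 << 9)))

/* Same test as the new inline function. */
static inline int new_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & (1 << 9));
}

/*
 * Returns 1 if the two forms disagree for the argument "a | b".
 * The macro expands to !(a | b & (1 << 9)), which C parses as
 * !(a | (b & (1 << 9))) because & binds tighter than |.
 */
static int forms_disagree(unsigned long a, unsigned long b)
{
	int from_macro = old_irqs_disabled_flags(a | b);
	int from_func = new_irqs_disabled_flags(a | b);

	return from_macro != from_func;
}

static void demo(void)
{
	/* IF clear, only bit 0 set: interrupts are disabled. */
	assert(new_irqs_disabled_flags(0x1UL | 0x0UL) == 1);
	assert(forms_disagree(0x1UL, 0x0UL) == 1);	/* macro says "enabled" */
	assert(forms_disagree(0x200UL, 0x0UL) == 0);	/* simple cases agree */
}
```

A compiler may warn about the missing parentheses in the macro
expansion; that warning is the point of the example.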

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/lib/thunk.S       |    5 +
 include/asm-x86_64/irqflags.h |  159 ++++++++++++++++++++++++++++++++----------
 2 files changed, 128 insertions(+), 36 deletions(-)

Index: linux/arch/x86_64/lib/thunk.S
===================================================================
--- linux.orig/arch/x86_64/lib/thunk.S
+++ linux/arch/x86_64/lib/thunk.S
@@ -47,6 +47,11 @@
 	thunk_retrax __down_failed_interruptible,__down_interruptible
 	thunk_retrax __down_failed_trylock,__down_trylock
 	thunk __up_wakeup,__up
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	thunk trace_hardirqs_on_thunk,trace_hardirqs_on
+	thunk trace_hardirqs_off_thunk,trace_hardirqs_off
+#endif
 	
 	/* SAVE_ARGS below is used only for the .cfi directives it contains. */
 	CFI_STARTPROC
Index: linux/include/asm-x86_64/irqflags.h
===================================================================
--- linux.orig/include/asm-x86_64/irqflags.h
+++ linux/include/asm-x86_64/irqflags.h
@@ -5,50 +5,137 @@
  *
  * This file gets included from lowlevel asm headers too, to provide
  * wrapped versions of the local_irq_*() APIs, based on the
- * raw_local_irq_*() macros from the lowlevel headers.
+ * raw_local_irq_*() functions from the lowlevel headers.
  */
 #ifndef _ASM_IRQFLAGS_H
 #define _ASM_IRQFLAGS_H
 
-/* interrupt control.. */
-#define raw_local_save_flags(x)	do { warn_if_not_ulong(x); __asm__ __volatile__("# save_flags \n\t pushfq ; popq %q0":"=g" (x): /* no input */ :"memory"); } while (0)
-#define raw_local_irq_restore(x) 	__asm__ __volatile__("# restore_flags \n\t pushq %0 ; popfq": /* no output */ :"g" (x):"memory", "cc")
+#ifndef __ASSEMBLY__
+/*
+ * Interrupt control:
+ */
+
+static inline unsigned long __raw_local_save_flags(void)
+{
+	unsigned long flags;
+
+	__asm__ __volatile__(
+		"# __raw_save_flags\n\t"
+		"pushfq ; popq %q0"
+		: "=g" (flags)
+		: /* no input */
+		: "memory"
+	);
+
+	return flags;
+}
+
+#define raw_local_save_flags(flags) \
+		do { (flags) = __raw_local_save_flags(); } while (0)
+
+static inline void raw_local_irq_restore(unsigned long flags)
+{
+	__asm__ __volatile__(
+		"pushq %0 ; popfq"
+		: /* no output */
+		:"g" (flags)
+		:"memory", "cc"
+	);
+}
 
 #ifdef CONFIG_X86_VSMP
-/* Interrupt control for VSMP  architecture */
-#define raw_local_irq_disable()	do { unsigned long flags; raw_local_save_flags(flags); raw_local_irq_restore((flags & ~(1 << 9)) | (1 << 18)); } while (0)
-#define raw_local_irq_enable()	do { unsigned long flags; raw_local_save_flags(flags); raw_local_irq_restore((flags | (1 << 9)) & ~(1 << 18)); } while (0)
-
-#define raw_irqs_disabled_flags(flags)	\
-({						\
-	(flags & (1<<18)) || !(flags & (1<<9));	\
-})
-
-/* For spinlocks etc */
-#define raw_local_irq_save(x)	do { raw_local_save_flags(x); raw_local_irq_restore((x & ~(1 << 9)) | (1 << 18)); } while (0)
-#else  /* CONFIG_X86_VSMP */
-#define raw_local_irq_disable() 	__asm__ __volatile__("cli": : :"memory")
-#define raw_local_irq_enable()	__asm__ __volatile__("sti": : :"memory")
-
-#define raw_irqs_disabled_flags(flags)	\
-({						\
-	!(flags & (1<<9));			\
-})
 
-/* For spinlocks etc */
-#define raw_local_irq_save(x) 	do { warn_if_not_ulong(x); __asm__ __volatile__("# raw_local_irq_save \n\t pushfq ; popq %0 ; cli":"=g" (x): /* no input */ :"memory"); } while (0)
+/*
+ * Interrupt control for the VSMP architecture:
+ */
+
+static inline void raw_local_irq_disable(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	raw_local_irq_restore((flags & ~(1 << 9)) | (1 << 18));
+}
+
+static inline void raw_local_irq_enable(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	raw_local_irq_restore((flags | (1 << 9)) & ~(1 << 18));
+}
+
+static inline int raw_irqs_disabled_flags(unsigned long flags)
+{
+	return !(flags & (1<<9)) || (flags & (1 << 18));
+}
+
+#else /* CONFIG_X86_VSMP */
+
+static inline void raw_local_irq_disable(void)
+{
+	__asm__ __volatile__("cli" : : : "memory");
+}
+
+static inline void raw_local_irq_enable(void)
+{
+	__asm__ __volatile__("sti" : : : "memory");
+}
+
+static inline int raw_irqs_disabled_flags(unsigned long flags)
+{
+	return !(flags & (1 << 9));
+}
+
 #endif
 
-#define raw_irqs_disabled()			\
-({						\
-	unsigned long flags;			\
-	raw_local_save_flags(flags);		\
-	raw_irqs_disabled_flags(flags);		\
-})
-
-/* used in the idle loop; sti takes one instruction cycle to complete */
-#define raw_safe_halt()	__asm__ __volatile__("sti; hlt": : :"memory")
-/* used when interrupts are already enabled or to shutdown the processor */
-#define halt()			__asm__ __volatile__("hlt": : :"memory")
+/*
+ * For spinlocks, etc.:
+ */
+
+static inline unsigned long __raw_local_irq_save(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	raw_local_irq_disable();
+
+	return flags;
+}
+
+#define raw_local_irq_save(flags) \
+		do { (flags) = __raw_local_irq_save(); } while (0)
+
+static inline int raw_irqs_disabled(void)
+{
+	unsigned long flags = __raw_local_save_flags();
+
+	return raw_irqs_disabled_flags(flags);
+}
+
+/*
+ * Used in the idle loop; sti takes one instruction cycle
+ * to complete:
+ */
+static inline void raw_safe_halt(void)
+{
+	__asm__ __volatile__("sti; hlt" : : : "memory");
+}
+
+/*
+ * Used when interrupts are already enabled or to
+ * shutdown the processor:
+ */
+static inline void halt(void)
+{
+	__asm__ __volatile__("hlt": : :"memory");
+}
+
+#else /* __ASSEMBLY__: */
+# ifdef CONFIG_TRACE_IRQFLAGS
+#  define TRACE_IRQS_ON		call trace_hardirqs_on_thunk
+#  define TRACE_IRQS_OFF	call trace_hardirqs_off_thunk
+# else
+#  define TRACE_IRQS_ON
+#  define TRACE_IRQS_OFF
+# endif
+#endif
 
 #endif
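
The VSMP variants above never execute cli/sti; they rewrite the saved
flags word and restore it. The bit arithmetic can be checked in user
space. This is only a sketch with hypothetical vsmp_ names: bit 9 is
EFLAGS.IF and bit 18 is EFLAGS.AC on x86, and the functions operate on
a plain value instead of the real flags register.

```c
#include <assert.h>

#define VSMP_IF	(1UL << 9)	/* interrupt flag */
#define VSMP_AC	(1UL << 18)	/* second bit used by the VSMP code */

/* Flags word that raw_local_irq_disable() would restore. */
static inline unsigned long vsmp_disable(unsigned long flags)
{
	return (flags & ~VSMP_IF) | VSMP_AC;
}

/* Flags word that raw_local_irq_enable() would restore. */
static inline unsigned long vsmp_enable(unsigned long flags)
{
	return (flags | VSMP_IF) & ~VSMP_AC;
}

/* Same predicate as the VSMP raw_irqs_disabled_flags(). */
static inline int vsmp_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & VSMP_IF) || (flags & VSMP_AC);
}

static void vsmp_demo(void)
{
	unsigned long on = VSMP_IF | 0x2UL;	/* IF set plus an unrelated bit */
	unsigned long off = vsmp_disable(on);

	assert(off == (VSMP_AC | 0x2UL));	/* unrelated bit is preserved */
	assert(vsmp_irqs_disabled_flags(off));
	assert(!vsmp_irqs_disabled_flags(on));
	assert(vsmp_enable(off) == on);		/* enable undoes disable */

	/* Either condition alone counts as "disabled". */
	assert(vsmp_irqs_disabled_flags(VSMP_IF | VSMP_AC));
}
```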


* [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API.
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (19 preceding siblings ...)
  2006-05-29 21:24 ` [patch 20/61] lock validator: irqtrace: cleanup: include/asm-x86_64/irqflags.h Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:34   ` Andrew Morton
  2006-05-29 21:24 ` [patch 22/61] lock validator: add per_cpu_offset() Ingo Molnar
                   ` (52 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Introduce the local_irq_enable_in_hardirq() API. It is currently
aliased to local_irq_enable(), so it has no functional effect.

This API will be used by lockdep, but even without lockdep it
better documents the places in the kernel where a hardirq
context enables hardirqs.
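
For illustration, the alias described above could look like the
following user-space sketch. It is based only on the description in
this changelog: the one-line #define is an assumed shape, and the
model_ names are made up.

```c
#include <assert.h>

static int model_irqs_on;	/* hypothetical stand-in for EFLAGS.IF */

static void model_local_irq_enable(void)
{
	model_irqs_on = 1;
}

/* Assumed shape of the alias: same effect, more descriptive name. */
#define model_local_irq_enable_in_hardirq()	model_local_irq_enable()

/* Hypothetical hardirq handler that lets other IRQs in while it works. */
static int model_irq_handler(void)
{
	model_irqs_on = 0;			/* handlers start with irqs off */
	model_local_irq_enable_in_hardirq();	/* the intent is now visible */
	assert(model_irqs_on);

	return model_irqs_on;
}
```

A separate name costs nothing at runtime but gives a validator one
place to change behaviour later without touching the callers.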

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/nmi.c         |    3 ++-
 arch/x86_64/kernel/nmi.c       |    3 ++-
 drivers/ide/ide-io.c           |    6 +++---
 drivers/ide/ide-taskfile.c     |    2 +-
 include/linux/ide.h            |    2 +-
 include/linux/trace_irqflags.h |    2 ++
 kernel/irq/handle.c            |    2 +-
 7 files changed, 12 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/i386/kernel/nmi.c
+++ linux/arch/i386/kernel/nmi.c
@@ -188,7 +188,8 @@ static __cpuinit inline int nmi_known_cp
 static __init void nmi_cpu_busy(void *data)
 {
 	volatile int *endflag = data;
-	local_irq_enable();
+
+	local_irq_enable_in_hardirq();
 	/* Intentionally don't use cpu_relax here. This is
 	   to make sure that the performance counter really ticks,
 	   even if there is a simulator or similar that catches the
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -186,7 +186,8 @@ void nmi_watchdog_default(void)
 static __init void nmi_cpu_busy(void *data)
 {
 	volatile int *endflag = data;
-	local_irq_enable();
+
+	local_irq_enable_in_hardirq();
 	/* Intentionally don't use cpu_relax here. This is
 	   to make sure that the performance counter really ticks,
 	   even if there is a simulator or similar that catches the
Index: linux/drivers/ide/ide-io.c
===================================================================
--- linux.orig/drivers/ide/ide-io.c
+++ linux/drivers/ide/ide-io.c
@@ -689,7 +689,7 @@ static ide_startstop_t drive_cmd_intr (i
 	u8 stat = hwif->INB(IDE_STATUS_REG);
 	int retries = 10;
 
-	local_irq_enable();
+	local_irq_enable_in_hardirq();
 	if ((stat & DRQ_STAT) && args && args[3]) {
 		u8 io_32bit = drive->io_32bit;
 		drive->io_32bit = 0;
@@ -1273,7 +1273,7 @@ static void ide_do_request (ide_hwgroup_
 		if (masked_irq != IDE_NO_IRQ && hwif->irq != masked_irq)
 			disable_irq_nosync(hwif->irq);
 		spin_unlock(&ide_lock);
-		local_irq_enable();
+		local_irq_enable_in_hardirq();
 			/* allow other IRQs while we start this request */
 		startstop = start_request(drive, rq);
 		spin_lock_irq(&ide_lock);
@@ -1622,7 +1622,7 @@ irqreturn_t ide_intr (int irq, void *dev
 	spin_unlock(&ide_lock);
 
 	if (drive->unmask)
-		local_irq_enable();
+		local_irq_enable_in_hardirq();
 	/* service this interrupt, may set handler for next interrupt */
 	startstop = handler(drive);
 	spin_lock_irq(&ide_lock);
Index: linux/drivers/ide/ide-taskfile.c
===================================================================
--- linux.orig/drivers/ide/ide-taskfile.c
+++ linux/drivers/ide/ide-taskfile.c
@@ -223,7 +223,7 @@ ide_startstop_t task_no_data_intr (ide_d
 	ide_hwif_t *hwif	= HWIF(drive);
 	u8 stat;
 
-	local_irq_enable();
+	local_irq_enable_in_hardirq();
 	if (!OK_STAT(stat = hwif->INB(IDE_STATUS_REG),READY_STAT,BAD_STAT)) {
 		return ide_error(drive, "task_no_data_intr", stat);
 		/* calls ide_end_drive_cmd */
Index: linux/include/linux/ide.h
===================================================================
--- linux.orig/include/linux/ide.h
+++ linux/include/linux/ide.h
@@ -1361,7 +1361,7 @@ extern struct semaphore ide_cfg_sem;
  * ide_drive_t->hwif: constant, no locking
  */
 
-#define local_irq_set(flags)	do { local_save_flags((flags)); local_irq_enable(); } while (0)
+#define local_irq_set(flags)	do { local_save_flags((flags)); local_irq_enable_in_hardirq(); } while (0)
 
 extern struct bus_type ide_bus_type;
 
Index: linux/include/linux/trace_irqflags.h
===================================================================
--- linux.orig/include/linux/trace_irqflags.h
+++ linux/include/linux/trace_irqflags.h
@@ -66,6 +66,8 @@
 		}						\
 	} while (0)
 
+#define local_irq_enable_in_hardirq()	local_irq_enable()
+
 #define safe_halt()						\
 	do {							\
 		trace_hardirqs_on();				\
Index: linux/kernel/irq/handle.c
===================================================================
--- linux.orig/kernel/irq/handle.c
+++ linux/kernel/irq/handle.c
@@ -83,7 +83,7 @@ fastcall irqreturn_t handle_IRQ_event(un
 	unsigned int status = 0;
 
 	if (!(action->flags & SA_INTERRUPT))
-		local_irq_enable();
+		local_irq_enable_in_hardirq();
 
 	do {
 		ret = action->handler(irq, action->dev_id, regs);

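A minimal user-space sketch of the aliasing pattern described in the
changelog above, for readers who want to see the shape of the macro in
isolation. Every sketch_* name and the SKETCH_LOCKDEP switch are
hypothetical stand-ins, not kernel symbols; the boolean merely models
the CPU interrupt flag. Without the switch the annotated call behaves
exactly like the plain enable, which is the "no functional effects"
case of this patch; with the switch it becomes a no-op, which is what
patch 23/61 later does under CONFIG_LOCKDEP.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool sketch_irqs_enabled;

static void sketch_local_irq_disable(void) { sketch_irqs_enabled = false; }
static void sketch_local_irq_enable(void)  { sketch_irqs_enabled = true; }

/*
 * Same shape as the macro in the patch: a plain alias by default,
 * a no-op when the (hypothetical) SKETCH_LOCKDEP switch is defined.
 */
#ifdef SKETCH_LOCKDEP
# define sketch_local_irq_enable_in_hardirq()	do { } while (0)
#else
# define sketch_local_irq_enable_in_hardirq()	sketch_local_irq_enable()
#endif

/*
 * Models a hardirq handler that re-enables interrupts while it runs.
 * Returns the interrupt-flag state seen by the handler body.
 */
static bool sketch_handler_runs_with_irqs_on(void)
{
	sketch_local_irq_disable();		/* hardirq entry: irqs off */
	sketch_local_irq_enable_in_hardirq();	/* the annotated call site */
	return sketch_irqs_enabled;
}
```

Built without -DSKETCH_LOCKDEP the helper reports interrupts as enabled;
built with it, the handler keeps running with interrupts off.
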

* [patch 22/61] lock validator: add per_cpu_offset()
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (20 preceding siblings ...)
  2006-05-29 21:24 ` [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API Ingo Molnar
@ 2006-05-29 21:24 ` Ingo Molnar
  2006-05-30  1:34   ` Andrew Morton
  2006-05-29 21:25 ` [patch 23/61] lock validator: core Ingo Molnar
                   ` (51 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Add the generic per_cpu_offset() method (used by the lock validator).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/asm-generic/percpu.h |    2 ++
 include/asm-x86_64/percpu.h  |    2 ++
 2 files changed, 4 insertions(+)

Index: linux/include/asm-generic/percpu.h
===================================================================
--- linux.orig/include/asm-generic/percpu.h
+++ linux/include/asm-generic/percpu.h
@@ -7,6 +7,8 @@
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
 
+#define per_cpu_offset(x) (__per_cpu_offset[x])
+
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
Index: linux/include/asm-x86_64/percpu.h
===================================================================
--- linux.orig/include/asm-x86_64/percpu.h
+++ linux/include/asm-x86_64/percpu.h
@@ -14,6 +14,8 @@
 #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
 #define __my_cpu_offset() read_pda(data_offset)
 
+#define per_cpu_offset(x) (__per_cpu_offset(x))
+
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name) \
     __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

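A small user-space sketch of what a per-CPU offset accessor is for,
assuming the usual model where every CPU owns one copy of the per-CPU
data and a copy is reached by adding that CPU's byte offset to a base
address. All sketch_* names, the area layout and the CPU count are
hypothetical; only the accessor macro mirrors the array form added to
asm-generic/percpu.h above.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical per-CPU area: one copy per CPU, laid out back to back. */
struct sketch_percpu_area {
	int counter;
	long pad[7];
};

static struct sketch_percpu_area sketch_areas[SKETCH_NR_CPUS];

/* Byte offset of each CPU's copy relative to CPU 0's copy. */
static unsigned long sketch_offsets[SKETCH_NR_CPUS];

/* Same shape as the generic accessor: hide the array behind a macro. */
#define sketch_per_cpu_offset(x) (sketch_offsets[x])

static void sketch_setup_offsets(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_offsets[cpu] = (unsigned long)
			((char *)&sketch_areas[cpu] - (char *)&sketch_areas[0]);
}

/* Address of the given CPU's counter: base address plus that CPU's offset. */
static int *sketch_counter_on(int cpu)
{
	return (int *)((char *)&sketch_areas[0].counter +
		       sketch_per_cpu_offset(cpu));
}
```

Callers only ever see sketch_per_cpu_offset(cpu), so an architecture
that keeps the offset somewhere else (as x86_64 does in its PDA) can
swap the macro body without touching the users.
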

* [patch 23/61] lock validator: core
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (21 preceding siblings ...)
  2006-05-29 21:24 ` [patch 22/61] lock validator: add per_cpu_offset() Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 24/61] lock validator: procfs Ingo Molnar
                   ` (50 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

lock validator core changes. Not enabled yet.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/init_task.h      |    1 
 include/linux/lockdep.h        |  280 ++++
 include/linux/sched.h          |   12 
 include/linux/trace_irqflags.h |   13 
 init/main.c                    |   16 
 kernel/Makefile                |    1 
 kernel/fork.c                  |    5 
 kernel/irq/manage.c            |    6 
 kernel/lockdep.c               | 2633 +++++++++++++++++++++++++++++++++++++++++
 kernel/lockdep_internals.h     |   93 +
 kernel/module.c                |    3 
 lib/Kconfig.debug              |    2 
 lib/locking-selftest.c         |    4 
 13 files changed, 3064 insertions(+), 5 deletions(-)

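For orientation before the diff: the core patch caches lock dependency
chains under a 64-bit key that is folded step by step as locks are
taken (see iterate_chain_key() in kernel/lockdep.c and the
prev_chain_key comment in struct held_lock). The following user-space
sketch shows that folding. It assumes a key width of 11 bits, since
MAX_LOCKDEP_KEYS_BITS is defined in kernel/lockdep_internals.h and not
visible here, and it feeds in small integer ids as a stand-in for
whatever per-lock-type value the real code mixes in. All sketch_* names
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration; the real constant is MAX_LOCKDEP_KEYS_BITS. */
#define SKETCH_KEYS_BITS 11

/*
 * Same arithmetic as the iterate_chain_key() macro: rotate the running
 * key left by half the key width, then XOR in the next lock's value.
 */
static uint64_t sketch_iterate_chain_key(uint64_t key1, uint64_t key2)
{
	return (key1 << (SKETCH_KEYS_BITS / 2)) ^
	       (key1 >> (64 - SKETCH_KEYS_BITS / 2)) ^
	       key2;
}

/* Fold an acquisition sequence into one chain key, starting from zero. */
static uint64_t sketch_chain_key(const uint64_t *ids, size_t n)
{
	uint64_t key = 0;

	for (size_t i = 0; i < n; i++)
		key = sketch_iterate_chain_key(key, ids[i]);
	return key;
}
```

Because the running key is rotated before each XOR, taking the same two
locks in the opposite order yields a different key, so the two orderings
occupy separate cache entries.
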
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h
+++ linux/include/linux/init_task.h
@@ -134,6 +134,7 @@ extern struct group_info init_groups;
 	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
 	.fs_excl	= ATOMIC_INIT(0),				\
  	INIT_TRACE_IRQFLAGS						\
+ 	INIT_LOCKDEP							\
 }
 
 
Index: linux/include/linux/lockdep.h
===================================================================
--- /dev/null
+++ linux/include/linux/lockdep.h
@@ -0,0 +1,280 @@
+/*
+ * Runtime locking correctness validator
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * see Documentation/lockdep-design.txt for more details.
+ */
+#ifndef __LINUX_LOCKDEP_H
+#define __LINUX_LOCKDEP_H
+
+#include <linux/linkage.h>
+#include <linux/list.h>
+#include <linux/debug_locks.h>
+#include <linux/stacktrace.h>
+
+#ifdef CONFIG_LOCKDEP
+
+/*
+ * Lock-type usage-state bits:
+ */
+enum lock_usage_bit
+{
+	LOCK_USED = 0,
+	LOCK_USED_IN_HARDIRQ,
+	LOCK_USED_IN_SOFTIRQ,
+	LOCK_ENABLED_SOFTIRQS,
+	LOCK_ENABLED_HARDIRQS,
+	LOCK_USED_IN_HARDIRQ_READ,
+	LOCK_USED_IN_SOFTIRQ_READ,
+	LOCK_ENABLED_SOFTIRQS_READ,
+	LOCK_ENABLED_HARDIRQS_READ,
+	LOCK_USAGE_STATES
+};
+
+/*
+ * Usage-state bitmasks:
+ */
+#define LOCKF_USED			(1 << LOCK_USED)
+#define LOCKF_USED_IN_HARDIRQ		(1 << LOCK_USED_IN_HARDIRQ)
+#define LOCKF_USED_IN_SOFTIRQ		(1 << LOCK_USED_IN_SOFTIRQ)
+#define LOCKF_ENABLED_HARDIRQS		(1 << LOCK_ENABLED_HARDIRQS)
+#define LOCKF_ENABLED_SOFTIRQS		(1 << LOCK_ENABLED_SOFTIRQS)
+
+#define LOCKF_ENABLED_IRQS (LOCKF_ENABLED_HARDIRQS | LOCKF_ENABLED_SOFTIRQS)
+#define LOCKF_USED_IN_IRQ (LOCKF_USED_IN_HARDIRQ | LOCKF_USED_IN_SOFTIRQ)
+
+#define LOCKF_USED_IN_HARDIRQ_READ	(1 << LOCK_USED_IN_HARDIRQ_READ)
+#define LOCKF_USED_IN_SOFTIRQ_READ	(1 << LOCK_USED_IN_SOFTIRQ_READ)
+#define LOCKF_ENABLED_HARDIRQS_READ	(1 << LOCK_ENABLED_HARDIRQS_READ)
+#define LOCKF_ENABLED_SOFTIRQS_READ	(1 << LOCK_ENABLED_SOFTIRQS_READ)
+
+#define LOCKF_ENABLED_IRQS_READ \
+		(LOCKF_ENABLED_HARDIRQS_READ | LOCKF_ENABLED_SOFTIRQS_READ)
+#define LOCKF_USED_IN_IRQ_READ \
+		(LOCKF_USED_IN_HARDIRQ_READ | LOCKF_USED_IN_SOFTIRQ_READ)
+
+#define MAX_LOCKDEP_SUBTYPES		8UL
+
+/*
+ * Lock-types are keyed via unique addresses, by embedding the
+ * locktype-key into the kernel (or module) .data section. (For
+ * static locks we use the lock address itself as the key.)
+ */
+struct lockdep_subtype_key {
+	char __one_byte;
+} __attribute__ ((__packed__));
+
+struct lockdep_type_key {
+	struct lockdep_subtype_key	subkeys[MAX_LOCKDEP_SUBTYPES];
+};
+
+/*
+ * The lock-type itself:
+ */
+struct lock_type {
+	/*
+	 * type-hash:
+	 */
+	struct list_head		hash_entry;
+
+	/*
+	 * global list of all lock-types:
+	 */
+	struct list_head		lock_entry;
+
+	struct lockdep_subtype_key	*key;
+	unsigned int			subtype;
+
+	/*
+	 * IRQ/softirq usage tracking bits:
+	 */
+	unsigned long			usage_mask;
+	struct stack_trace		usage_traces[LOCK_USAGE_STATES];
+
+	/*
+	 * These fields represent a directed graph of lock dependencies,
+	 * to every node we attach a list of "forward" and a list of
+	 * "backward" graph nodes.
+	 */
+	struct list_head		locks_after, locks_before;
+
+	/*
+	 * Generation counter, when doing certain types of graph walking,
+	 * to ensure that we check one node only once:
+	 */
+	unsigned int			version;
+
+	/*
+	 * Statistics counter:
+	 */
+	unsigned long			ops;
+
+	const char			*name;
+	int				name_version;
+};
+
+/*
+ * Map the lock object (the lock instance) to the lock-type object.
+ * This is embedded into specific lock instances:
+ */
+struct lockdep_map {
+	struct lockdep_type_key		*key;
+	struct lock_type		*type[MAX_LOCKDEP_SUBTYPES];
+	const char			*name;
+};
+
+/*
+ * Every lock has a list of other locks that were taken after it.
+ * We only grow the list, never remove from it:
+ */
+struct lock_list {
+	struct list_head		entry;
+	struct lock_type		*type;
+	struct stack_trace		trace;
+};
+
+/*
+ * We record lock dependency chains, so that we can cache them:
+ */
+struct lock_chain {
+	struct list_head		entry;
+	u64				chain_key;
+};
+
+struct held_lock {
+	/*
+	 * One-way hash of the dependency chain up to this point. We
+	 * hash the hashes step by step as the dependency chain grows.
+	 *
+	 * We use it for dependency-caching and we skip detection
+	 * passes and dependency-updates if there is a cache-hit, so
+	 * it is absolutely critical for 100% coverage of the validator
+	 * to have a unique key value for every unique dependency path
+	 * that can occur in the system, to make a unique hash value
+	 * as likely as possible - hence the 64-bit width.
+	 *
+	 * The task struct holds the current hash value (initialized
+	 * with zero), here we store the previous hash value:
+	 */
+	u64				prev_chain_key;
+	struct lock_type		*type;
+	unsigned long			acquire_ip;
+	struct lockdep_map		*instance;
+
+	/*
+	 * The lock-stack is unified in that the lock chains of interrupt
+	 * contexts nest on top of process context chains, but we 'separate'
+	 * the hashes by starting with 0 if we cross into an interrupt
+	 * context, and we also do not add cross-context lock
+	 * dependencies - the lock usage graph walking covers that area
+	 * anyway, and we'd just unnecessarily increase the number of
+	 * dependencies otherwise. [Note: hardirq and softirq contexts
+	 * are separated from each other too.]
+	 *
+	 * The following field is used to detect when we cross into an
+	 * interrupt context:
+	 */
+	int				irq_context;
+	int				trylock;
+	int				read;
+	int				hardirqs_off;
+};
+
+/*
+ * Initialization, self-test and debugging-output methods:
+ */
+extern void lockdep_init(void);
+extern void lockdep_info(void);
+extern void lockdep_reset(void);
+extern void lockdep_reset_lock(struct lockdep_map *lock);
+extern void lockdep_free_key_range(void *start, unsigned long size);
+
+extern void print_lock_types(void);
+extern void lockdep_print_held_locks(struct task_struct *task);
+
+/*
+ * These methods are used by specific locking variants (spinlocks,
+ * rwlocks, mutexes and rwsems) to pass init/acquire/release events
+ * to lockdep:
+ */
+
+extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lockdep_type_key *key);
+
+extern void lockdep_acquire(struct lockdep_map *lock, unsigned int subtype,
+			    int trylock, int read, unsigned long ip);
+
+extern void lockdep_release(struct lockdep_map *lock, int nested,
+			    unsigned long ip);
+
+# define INIT_LOCKDEP				.lockdep_recursion = 0,
+
+extern void early_boot_irqs_off(void);
+extern void early_boot_irqs_on(void);
+
+#else /* LOCKDEP */
+# define lockdep_init()				do { } while (0)
+# define lockdep_info()				do { } while (0)
+# define print_lock_types()			do { } while (0)
+# define lockdep_print_held_locks(task)		do { (void)(task); } while (0)
+# define lockdep_init_map(lock, name, key)	do { } while (0)
+# define INIT_LOCKDEP
+# define lockdep_reset()		do { debug_locks = 1; } while (0)
+# define lockdep_free_key_range(start, size)	do { } while (0)
+# define early_boot_irqs_off()			do { } while (0)
+# define early_boot_irqs_on()			do { } while (0)
+/*
+ * The type key takes no space if lockdep is disabled:
+ */
+struct lockdep_type_key { };
+#endif /* !LOCKDEP */
+
+/*
+ * For trivial one-depth nesting of a lock-type, the following
+ * global define can be used. (Subsystems with multiple levels
+ * of nesting should define their own lock-nesting subtypes.)
+ */
+#define SINGLE_DEPTH_NESTING			1
+
+/*
+ * Map the dependency ops to NOP or to real lockdep ops, depending
+ * on the per lock-type debug mode:
+ */
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+# define spin_acquire(l, s, t, i)		lockdep_acquire(l, s, t, 0, i)
+# define spin_release(l, n, i)			lockdep_release(l, n, i)
+#else
+# define spin_acquire(l, s, t, i)		do { } while (0)
+# define spin_release(l, n, i)			do { } while (0)
+#endif
+
+#ifdef CONFIG_PROVE_RW_LOCKING
+# define rwlock_acquire(l, s, t, i)		lockdep_acquire(l, s, t, 0, i)
+# define rwlock_acquire_read(l, s, t, i)	lockdep_acquire(l, s, t, 1, i)
+# define rwlock_release(l, n, i)		lockdep_release(l, n, i)
+#else
+# define rwlock_acquire(l, s, t, i)		do { } while (0)
+# define rwlock_acquire_read(l, s, t, i)	do { } while (0)
+# define rwlock_release(l, n, i)		do { } while (0)
+#endif
+
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+# define mutex_acquire(l, s, t, i)		lockdep_acquire(l, s, t, 0, i)
+# define mutex_release(l, n, i)			lockdep_release(l, n, i)
+#else
+# define mutex_acquire(l, s, t, i)		do { } while (0)
+# define mutex_release(l, n, i)			do { } while (0)
+#endif
+
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+# define rwsem_acquire(l, s, t, i)		lockdep_acquire(l, s, t, 0, i)
+# define rwsem_acquire_read(l, s, t, i)		lockdep_acquire(l, s, t, -1, i)
+# define rwsem_release(l, n, i)			lockdep_release(l, n, i)
+#else
+# define rwsem_acquire(l, s, t, i)		do { } while (0)
+# define rwsem_acquire_read(l, s, t, i)		do { } while (0)
+# define rwsem_release(l, n, i)			do { } while (0)
+#endif
+
+#endif /* __LINUX_LOCKDEP_H */
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -931,6 +931,13 @@ struct task_struct {
 	int hardirq_context;
 	int softirq_context;
 #endif
+#ifdef CONFIG_LOCKDEP
+# define MAX_LOCK_DEPTH 30UL
+	u64 curr_chain_key;
+	int lockdep_depth;
+	struct held_lock held_locks[MAX_LOCK_DEPTH];
+#endif
+	unsigned int lockdep_recursion;
 
 /* journalling filesystem info */
 	void *journal_info;
@@ -1350,6 +1357,11 @@ static inline void task_lock(struct task
 	spin_lock(&p->alloc_lock);
 }
 
+static inline void task_lock_free(struct task_struct *p)
+{
+	spin_lock_nested(&p->alloc_lock, SINGLE_DEPTH_NESTING);
+}
+
 static inline void task_unlock(struct task_struct *p)
 {
 	spin_unlock(&p->alloc_lock);
Index: linux/include/linux/trace_irqflags.h
===================================================================
--- linux.orig/include/linux/trace_irqflags.h
+++ linux/include/linux/trace_irqflags.h
@@ -66,7 +66,18 @@
 		}						\
 	} while (0)
 
-#define local_irq_enable_in_hardirq()	local_irq_enable()
+/*
+ * On lockdep we don't want to enable hardirqs in hardirq
+ * context. NOTE: in theory this might break fragile code
+ * that relies on hardirq delivery - in practice we don't
+ * seem to have such places left. So the only effect should
+ * be slightly increased irqs-off latencies.
+ */
+#ifdef CONFIG_LOCKDEP
+# define local_irq_enable_in_hardirq()	do { } while (0)
+#else
+# define local_irq_enable_in_hardirq()	local_irq_enable()
+#endif
 
 #define safe_halt()						\
 	do {							\
Index: linux/init/main.c
===================================================================
--- linux.orig/init/main.c
+++ linux/init/main.c
@@ -54,6 +54,7 @@
 #include <linux/root_dev.h>
 #include <linux/buffer_head.h>
 #include <linux/debug_locks.h>
+#include <linux/lockdep.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -80,6 +81,7 @@
 
 static int init(void *);
 
+extern void early_init_irq_lock_type(void);
 extern void init_IRQ(void);
 extern void fork_init(unsigned long);
 extern void mca_init(void);
@@ -461,6 +463,17 @@ asmlinkage void __init start_kernel(void
 {
 	char * command_line;
 	extern struct kernel_param __start___param[], __stop___param[];
+
+	/*
+	 * Need to run as early as possible, to initialize the
+	 * lockdep hash:
+	 */
+	lockdep_init();
+
+	local_irq_disable();
+	early_boot_irqs_off();
+	early_init_irq_lock_type();
+
 /*
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
@@ -512,8 +525,11 @@ asmlinkage void __init start_kernel(void
 	if (panic_later)
 		panic(panic_later, panic_param);
 	profile_init();
+	early_boot_irqs_on();
 	local_irq_enable();
 
+	lockdep_info();
+
 	/*
 	 * Need to run this when irqs are enabled, because it wants
 	 * to self-test [hard/soft]-irqs on/off lock inversion bugs
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -12,6 +12,7 @@ obj-y     = sched.o fork.o exec_domain.o
 
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
+obj-$(CONFIG_LOCKDEP) += lockdep.o
 obj-$(CONFIG_FUTEX) += futex.o
 ifeq ($(CONFIG_COMPAT),y)
 obj-$(CONFIG_FUTEX) += futex_compat.o
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -1049,6 +1049,11 @@ static task_t *copy_process(unsigned lon
  	}
 	mpol_fix_fork_child_flag(p);
 #endif
+#ifdef CONFIG_LOCKDEP
+	p->lockdep_depth = 0; /* no locks held yet */
+	p->curr_chain_key = 0;
+	p->lockdep_recursion = 0;
+#endif
 
 	rt_mutex_init_task(p);
 
Index: linux/kernel/irq/manage.c
===================================================================
--- linux.orig/kernel/irq/manage.c
+++ linux/kernel/irq/manage.c
@@ -406,6 +406,12 @@ int request_irq(unsigned int irq,
 		   immediately, so let's make sure....
 		   We do this before actually registering it, to make sure that a 'real'
 		   IRQ doesn't run in parallel with our fake. */
+#ifdef CONFIG_LOCKDEP
+		/*
+		 * Lockdep wants atomic interrupt handlers:
+		 */
+		irqflags |= SA_INTERRUPT;
+#endif
 		if (irqflags & SA_INTERRUPT) {
 			unsigned long flags;
 
Index: linux/kernel/lockdep.c
===================================================================
--- /dev/null
+++ linux/kernel/lockdep.c
@@ -0,0 +1,2633 @@
+/*
+ * kernel/lockdep.c
+ *
+ * Runtime locking correctness validator
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * this code maps all the lock dependencies as they occur in a live kernel
+ * and will warn about the following types of locking bugs:
+ *
+ * - lock inversion scenarios
+ * - circular lock dependencies
+ * - hardirq/softirq safe/unsafe locking bugs
+ *
+ * Bugs are reported even if the current locking scenario does not cause
+ * any deadlock at this point.
+ *
+ * I.e. if anytime in the past two locks were taken in a different order,
+ * even if it happened for another task, even if those were different
+ * locks (but of the same type as this lock), this code will detect it.
+ *
+ * Thanks to Arjan van de Ven for coming up with the initial idea of
+ * mapping lock dependencies at runtime.
+ */
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/interrupt.h>
+#include <linux/stacktrace.h>
+#include <linux/debug_locks.h>
+#include <linux/trace_irqflags.h>
+
+#include <asm/sections.h>
+
+#include "lockdep_internals.h"
+
+/*
+ * hash_lock: protects the lockdep hashes and type/list/hash allocators.
+ *
+ * This is one of the rare exceptions where it's justified
+ * to use a raw spinlock - we really don't want the spinlock
+ * code to recurse back into the lockdep code.
+ */
+static raw_spinlock_t hash_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
+
+static int lockdep_initialized;
+
+unsigned long nr_list_entries;
+static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
+
+/*
+ * Allocate a lockdep entry. (assumes hash_lock held; on failure
+ * it drops hash_lock and returns NULL)
+ */
+static struct lock_list *alloc_list_entry(void)
+{
+	if (nr_list_entries >= MAX_LOCKDEP_ENTRIES) {
+		__raw_spin_unlock(&hash_lock);
+		debug_locks_off();
+		printk("BUG: MAX_LOCKDEP_ENTRIES too low!\n");
+		printk("turning off the locking correctness validator.\n");
+		return NULL;
+	}
+	return list_entries + nr_list_entries++;
+}
+
+/*
+ * All data structures here are protected by the global debug_lock.
+ *
+ * Mutex key structs only get allocated, once during bootup, and never
+ * get freed - this significantly simplifies the debugging code.
+ */
+unsigned long nr_lock_types;
+static struct lock_type lock_types[MAX_LOCKDEP_KEYS];
+
+/*
+ * We keep a global list of all lock types. The list only grows,
+ * never shrinks. The list is only accessed with the lockdep
+ * spinlock lock held.
+ */
+LIST_HEAD(all_lock_types);
+
+/*
+ * The lockdep types are in a hash-table as well, for fast lookup:
+ */
+#define TYPEHASH_BITS		(MAX_LOCKDEP_KEYS_BITS - 1)
+#define TYPEHASH_SIZE		(1UL << TYPEHASH_BITS)
+#define TYPEHASH_MASK		(TYPEHASH_SIZE - 1)
+#define __typehashfn(key)	((((unsigned long)key >> TYPEHASH_BITS) + (unsigned long)key) & TYPEHASH_MASK)
+#define typehashentry(key)	(typehash_table + __typehashfn((key)))
+
+static struct list_head typehash_table[TYPEHASH_SIZE];
+
+unsigned long nr_lock_chains;
+static struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS];
+
+/*
+ * We put the lock dependency chains into a hash-table as well, to cache
+ * their existence:
+ */
+#define CHAINHASH_BITS		(MAX_LOCKDEP_CHAINS_BITS-1)
+#define CHAINHASH_SIZE		(1UL << CHAINHASH_BITS)
+#define CHAINHASH_MASK		(CHAINHASH_SIZE - 1)
+#define __chainhashfn(chain) \
+		(((chain >> CHAINHASH_BITS) + chain) & CHAINHASH_MASK)
+#define chainhashentry(chain)	(chainhash_table + __chainhashfn((chain)))
+
+static struct list_head chainhash_table[CHAINHASH_SIZE];
+
+/*
+ * The hash key of the lock dependency chains is a hash itself too:
+ * it's a hash of all locks taken up to that lock, including that lock.
+ * It's a 64-bit hash, because it's important for the keys to be
+ * unique.
+ */
+#define iterate_chain_key(key1, key2) \
+	(((key1) << MAX_LOCKDEP_KEYS_BITS/2) ^ \
+	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS/2)) ^ \
+	(key2))
+
+/*
+ * Debugging switches:
+ */
+#define LOCKDEP_OFF		0
+
+#define VERBOSE			0
+
+#if VERBOSE
+# define HARDIRQ_VERBOSE	1
+# define SOFTIRQ_VERBOSE	1
+#else
+# define HARDIRQ_VERBOSE	0
+# define SOFTIRQ_VERBOSE	0
+#endif
+
+#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
+/*
+ * Quick filtering for interesting events:
+ */
+static int type_filter(struct lock_type *type)
+{
+	if (type->name_version == 2 &&
+			!strcmp(type->name, "xfrm_state_afinfo_lock"))
+		return 1;
+	if ((type->name_version == 2 || type->name_version == 4) &&
+			!strcmp(type->name, "&mc->mca_lock"))
+		return 1;
+	return 0;
+}
+#endif
+
+static int verbose(struct lock_type *type)
+{
+#if VERBOSE
+	return type_filter(type);
+#endif
+	return 0;
+}
+
+static int hardirq_verbose(struct lock_type *type)
+{
+#if HARDIRQ_VERBOSE
+	return type_filter(type);
+#endif
+	return 0;
+}
+
+static int softirq_verbose(struct lock_type *type)
+{
+#if SOFTIRQ_VERBOSE
+	return type_filter(type);
+#endif
+	return 0;
+}
+
+/*
+ * Stack-trace: tightly packed array of stack backtrace
+ * addresses. Protected by the hash_lock.
+ */
+unsigned long nr_stack_trace_entries;
+static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES];
+
+static int save_trace(struct stack_trace *trace)
+{
+	trace->nr_entries = 0;
+	trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+	trace->entries = stack_trace + nr_stack_trace_entries;
+
+	save_stack_trace(trace, NULL, 0, 3);
+
+	trace->max_entries = trace->nr_entries;
+
+	nr_stack_trace_entries += trace->nr_entries;
+	if (DEBUG_WARN_ON(nr_stack_trace_entries > MAX_STACK_TRACE_ENTRIES))
+		return 0;
+
+	if (nr_stack_trace_entries == MAX_STACK_TRACE_ENTRIES) {
+		__raw_spin_unlock(&hash_lock);
+		if (debug_locks_off()) {
+			printk("BUG: MAX_STACK_TRACE_ENTRIES too low!\n");
+			printk("turning off the locking correctness validator.\n");
+			dump_stack();
+		}
+		return 0;
+	}
+
+	return 1;
+}
+
+unsigned int nr_hardirq_chains;
+unsigned int nr_softirq_chains;
+unsigned int nr_process_chains;
+unsigned int max_lockdep_depth;
+unsigned int max_recursion_depth;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+/*
+ * We cannot printk in early bootup code. Not even early_printk()
+ * might work. So we mark any initialization errors and printk
+ * about it later on, in lockdep_info().
+ */
+int lockdep_init_error;
+
+/*
+ * Various lockdep statistics:
+ */
+atomic_t chain_lookup_hits;
+atomic_t chain_lookup_misses;
+atomic_t hardirqs_on_events;
+atomic_t hardirqs_off_events;
+atomic_t redundant_hardirqs_on;
+atomic_t redundant_hardirqs_off;
+atomic_t softirqs_on_events;
+atomic_t softirqs_off_events;
+atomic_t redundant_softirqs_on;
+atomic_t redundant_softirqs_off;
+atomic_t nr_unused_locks;
+atomic_t nr_hardirq_safe_locks;
+atomic_t nr_softirq_safe_locks;
+atomic_t nr_hardirq_unsafe_locks;
+atomic_t nr_softirq_unsafe_locks;
+atomic_t nr_hardirq_read_safe_locks;
+atomic_t nr_softirq_read_safe_locks;
+atomic_t nr_hardirq_read_unsafe_locks;
+atomic_t nr_softirq_read_unsafe_locks;
+atomic_t nr_cyclic_checks;
+atomic_t nr_cyclic_check_recursions;
+atomic_t nr_find_usage_forwards_checks;
+atomic_t nr_find_usage_forwards_recursions;
+atomic_t nr_find_usage_backwards_checks;
+atomic_t nr_find_usage_backwards_recursions;
+# define debug_atomic_inc(ptr)		atomic_inc(ptr)
+# define debug_atomic_dec(ptr)		atomic_dec(ptr)
+# define debug_atomic_read(ptr)		atomic_read(ptr)
+#else
+# define debug_atomic_inc(ptr)		do { } while (0)
+# define debug_atomic_dec(ptr)		do { } while (0)
+# define debug_atomic_read(ptr)		0
+#endif
+
+/*
+ * Locking printouts:
+ */
+
+static const char *usage_str[] =
+{
+	[LOCK_USED] =			"initial-use ",
+	[LOCK_USED_IN_HARDIRQ] =	"in-hardirq-W",
+	[LOCK_USED_IN_SOFTIRQ] =	"in-softirq-W",
+	[LOCK_ENABLED_SOFTIRQS] =	"softirq-on-W",
+	[LOCK_ENABLED_HARDIRQS] =	"hardirq-on-W",
+	[LOCK_USED_IN_HARDIRQ_READ] =	"in-hardirq-R",
+	[LOCK_USED_IN_SOFTIRQ_READ] =	"in-softirq-R",
+	[LOCK_ENABLED_SOFTIRQS_READ] =	"softirq-on-R",
+	[LOCK_ENABLED_HARDIRQS_READ] =	"hardirq-on-R",
+};
+
+static void printk_sym(unsigned long ip)
+{
+	printk(" [<%08lx>]", ip);
+	print_symbol(" %s\n", ip);
+}
+
+const char * __get_key_name(struct lockdep_subtype_key *key, char *str)
+{
+	unsigned long offs, size;
+	char *modname;
+
+	return kallsyms_lookup((unsigned long)key, &size, &offs, &modname, str);
+}
+
+void
+get_usage_chars(struct lock_type *type, char *c1, char *c2, char *c3, char *c4)
+{
+	*c1 = '.', *c2 = '.', *c3 = '.', *c4 = '.';
+
+	if (type->usage_mask & LOCKF_USED_IN_HARDIRQ)
+		*c1 = '+';
+	else
+		if (type->usage_mask & LOCKF_ENABLED_HARDIRQS)
+			*c1 = '-';
+
+	if (type->usage_mask & LOCKF_USED_IN_SOFTIRQ)
+		*c2 = '+';
+	else
+		if (type->usage_mask & LOCKF_ENABLED_SOFTIRQS)
+			*c2 = '-';
+
+	if (type->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+		*c3 = '-';
+	if (type->usage_mask & LOCKF_USED_IN_HARDIRQ_READ) {
+		*c3 = '+';
+		if (type->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+			*c3 = '?';
+	}
+
+	if (type->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+		*c4 = '-';
+	if (type->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ) {
+		*c4 = '+';
+		if (type->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+			*c4 = '?';
+	}
+}
+
+static void print_lock_name(struct lock_type *type)
+{
+	char str[128], c1, c2, c3, c4;
+	const char *name;
+
+	get_usage_chars(type, &c1, &c2, &c3, &c4);
+
+	name = type->name;
+	if (!name) {
+		name = __get_key_name(type->key, str);
+		printk(" (%s", name);
+	} else {
+		printk(" (%s", name);
+		if (type->name_version > 1)
+			printk("#%d", type->name_version);
+		if (type->subtype)
+			printk("/%d", type->subtype);
+	}
+	printk("){%c%c%c%c}", c1, c2, c3, c4);
+}
+
+static void print_lock_name_field(struct lock_type *type)
+{
+	const char *name;
+	char str[128];
+
+	name = type->name;
+	if (!name) {
+		name = __get_key_name(type->key, str);
+		printk("%30s", name);
+	} else {
+		printk("%30s", name);
+		if (type->name_version > 1)
+			printk("#%d", type->name_version);
+		if (type->subtype)
+			printk("/%d", type->subtype);
+	}
+}
+
+static void print_lockdep_cache(struct lockdep_map *lock)
+{
+	const char *name;
+	char str[128];
+
+	name = lock->name;
+	if (!name)
+		name = __get_key_name(lock->key->subkeys, str);
+
+	printk("%s", name);
+}
+
+static void print_lock(struct held_lock *hlock)
+{
+	print_lock_name(hlock->type);
+	printk(", at:");
+	printk_sym(hlock->acquire_ip);
+}
+
+void lockdep_print_held_locks(struct task_struct *curr)
+{
+	int i;
+
+	if (!curr->lockdep_depth) {
+		printk("no locks held by %s/%d.\n", curr->comm, curr->pid);
+		return;
+	}
+	printk("%d locks held by %s/%d:\n",
+		curr->lockdep_depth, curr->comm, curr->pid);
+
+	for (i = 0; i < curr->lockdep_depth; i++) {
+		printk(" #%d: ", i);
+		print_lock(curr->held_locks + i);
+	}
+}
+/*
+ * Helper to print a nice hierarchy of lock dependencies:
+ */
+static void print_spaces(int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++)
+		printk("  ");
+}
+
+void print_lock_type_header(struct lock_type *type, int depth)
+{
+	int bit;
+
+	print_spaces(depth);
+	printk("->");
+	print_lock_name(type);
+	printk(" ops: %lu", type->ops);
+	printk(" {\n");
+
+	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
+		if (type->usage_mask & (1 << bit)) {
+			int len = depth;
+
+			print_spaces(depth);
+			len += printk("   %s", usage_str[bit]);
+			len += printk(" at:\n");
+			print_stack_trace(type->usage_traces + bit, len);
+		}
+	}
+	print_spaces(depth);
+	printk(" }\n");
+
+	print_spaces(depth);
+	printk(" ... key      at:");
+	printk_sym((unsigned long)type->key);
+}
+
+/*
+ * printk all lock dependencies starting at <type>:
+ */
+static void print_lock_dependencies(struct lock_type *type, int depth)
+{
+	struct lock_list *entry;
+
+	if (DEBUG_WARN_ON(depth >= 20))
+		return;
+
+	print_lock_type_header(type, depth);
+
+	list_for_each_entry(entry, &type->locks_after, entry) {
+		DEBUG_WARN_ON(!entry->type);
+		print_lock_dependencies(entry->type, depth + 1);
+
+		print_spaces(depth);
+		printk(" ... acquired at:\n");
+		print_stack_trace(&entry->trace, 2);
+		printk("\n");
+	}
+}
+
+/*
+ * printk all locks that are taken after this lock:
+ */
+static void print_flat_dependencies(struct lock_type *type)
+{
+	struct lock_list *entry;
+	int nr = 0;
+
+	printk(" {\n");
+	list_for_each_entry(entry, &type->locks_after, entry) {
+		nr++;
+		DEBUG_WARN_ON(!entry->type);
+		printk("    -> ");
+		print_lock_name_field(entry->type);
+		if (entry->type->subtype)
+			printk("/%d", entry->type->subtype);
+		print_stack_trace(&entry->trace, 2);
+	}
+	printk(" } [%d]", nr);
+}
+
+void print_lock_type(struct lock_type *type)
+{
+	print_lock_type_header(type, 0);
+	if (!list_empty(&type->locks_after))
+		print_flat_dependencies(type);
+	printk("\n");
+}
+
+void print_lock_types(void)
+{
+	struct list_head *head;
+	struct lock_type *type;
+	int i, nr;
+
+	printk("lock types:\n");
+
+	for (i = 0; i < TYPEHASH_SIZE; i++) {
+		head = typehash_table + i;
+		if (list_empty(head))
+			continue;
+		printk("\nhash-list at %d:\n", i);
+		nr = 0;
+		list_for_each_entry(type, head, hash_entry) {
+			printk("\n");
+			print_lock_type(type);
+			nr++;
+		}
+	}
+}
+
+/*
+ * Add a new dependency to the tail of the list at <head>:
+ */
+static int add_lock_to_list(struct lock_type *type, struct lock_type *this,
+			    struct list_head *head, unsigned long ip)
+{
+	struct lock_list *entry;
+	/*
+	 * Lock not present yet - get a new dependency struct and
+	 * add it to the list:
+	 */
+	entry = alloc_list_entry();
+	if (!entry)
+		return 0;
+
+	entry->type = this;
+	save_trace(&entry->trace);
+
+	/*
+	 * Since we never remove from the dependency list, the list can
+	 * be walked locklessly by other CPUs; only allocation must be
+	 * protected by the spinlock. But this also means we must make
+	 * new entries visible only once writes to the entry become
+	 * visible - hence the RCU op:
+	 */
+	list_add_tail_rcu(&entry->entry, head);
+
+	return 1;
+}
+
+/*
+ * Recursive, forwards-direction lock-dependency checking, used for
+ * both noncyclic checking and for hardirq-unsafe/softirq-unsafe
+ * checking.
+ *
+ * (to keep the stackframe of the recursive functions small we
+ *  use these global variables, and we also mark various helper
+ *  functions as noinline.)
+ */
+static struct held_lock *check_source, *check_target;
+
+/*
+ * Print a dependency chain entry (this is only done when a deadlock
+ * has been detected):
+ */
+static noinline int
+print_circular_bug_entry(struct lock_list *target, unsigned int depth)
+{
+	if (debug_locks_silent)
+		return 0;
+	printk("\n-> #%u", depth);
+	print_lock_name(target->type);
+	printk(":\n");
+	print_stack_trace(&target->trace, 6);
+
+	return 0;
+}
+
+/*
+ * When a circular dependency is detected, print the
+ * header first:
+ */
+static noinline int
+print_circular_bug_header(struct lock_list *entry, unsigned int depth)
+{
+	struct task_struct *curr = current;
+
+	__raw_spin_unlock(&hash_lock);
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n=====================================================\n");
+	printk(  "[ BUG: possible circular locking deadlock detected! ]\n");
+	printk(  "-----------------------------------------------------\n");
+	printk("%s/%d is trying to acquire lock:\n",
+		curr->comm, curr->pid);
+	print_lock(check_source);
+	printk("\nbut task is already holding lock:\n");
+	print_lock(check_target);
+	printk("\nwhich lock already depends on the new lock,\n");
+	printk("which could lead to circular deadlocks!\n");
+	printk("\nthe existing dependency chain (in reverse order) is:\n");
+
+	print_circular_bug_entry(entry, depth);
+
+	return 0;
+}
+
+static noinline int print_circular_bug_tail(void)
+{
+	struct task_struct *curr = current;
+	struct lock_list this;
+
+	if (debug_locks_silent)
+		return 0;
+
+	this.type = check_source->type;
+	save_trace(&this.trace);
+	print_circular_bug_entry(&this, 0);
+
+	printk("\nother info that might help us debug this:\n\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+static noinline int print_infinite_recursion_bug(void)
+{
+	__raw_spin_unlock(&hash_lock);
+	DEBUG_WARN_ON(1);
+
+	return 0;
+}
+
+/*
+ * Prove that the dependency graph starting at <source> can not
+ * lead to <check_target>. Print an error and return 0 if it does.
+ */
+static noinline int
+check_noncircular(struct lock_type *source, unsigned int depth)
+{
+	struct lock_list *entry;
+
+	debug_atomic_inc(&nr_cyclic_check_recursions);
+	if (depth > max_recursion_depth)
+		max_recursion_depth = depth;
+	if (depth >= 20)
+		return print_infinite_recursion_bug();
+	/*
+	 * Check this lock's dependency list:
+	 */
+	list_for_each_entry(entry, &source->locks_after, entry) {
+		if (entry->type == check_target->type)
+			return print_circular_bug_header(entry, depth+1);
+		debug_atomic_inc(&nr_cyclic_checks);
+		if (!check_noncircular(entry->type, depth+1))
+			return print_circular_bug_entry(entry, depth+1);
+	}
+	return 1;
+}
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+
+/*
+ * Forwards and backwards subgraph searching, for the purposes of
+ * proving that two subgraphs can be connected by a new dependency
+ * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ */
+static enum lock_usage_bit find_usage_bit;
+static struct lock_type *forwards_match, *backwards_match;
+
+/*
+ * Find a node in the forwards-direction dependency sub-graph starting
+ * at <source> that matches <find_usage_bit>.
+ *
+ * Return 2 if such a node exists in the subgraph, and put that node
+ * into <forwards_match>.
+ *
+ * Return 1 otherwise and keep <forwards_match> unchanged.
+ * Return 0 on error.
+ */
+static noinline int
+find_usage_forwards(struct lock_type *source, unsigned int depth)
+{
+	struct lock_list *entry;
+	int ret;
+
+	if (depth > max_recursion_depth)
+		max_recursion_depth = depth;
+	if (depth >= 20)
+		return print_infinite_recursion_bug();
+
+	debug_atomic_inc(&nr_find_usage_forwards_checks);
+	if (source->usage_mask & (1 << find_usage_bit)) {
+		forwards_match = source;
+		return 2;
+	}
+
+	/*
+	 * Check this lock's dependency list:
+	 */
+	list_for_each_entry(entry, &source->locks_after, entry) {
+		debug_atomic_inc(&nr_find_usage_forwards_recursions);
+		ret = find_usage_forwards(entry->type, depth+1);
+		if (ret == 2 || ret == 0)
+			return ret;
+	}
+	return 1;
+}
+
+/*
+ * Find a node in the backwards-direction dependency sub-graph starting
+ * at <source> that matches <find_usage_bit>.
+ *
+ * Return 2 if such a node exists in the subgraph, and put that node
+ * into <backwards_match>.
+ *
+ * Return 1 otherwise and keep <backwards_match> unchanged.
+ * Return 0 on error.
+ */
+static noinline int
+find_usage_backwards(struct lock_type *source, unsigned int depth)
+{
+	struct lock_list *entry;
+	int ret;
+
+	if (depth > max_recursion_depth)
+		max_recursion_depth = depth;
+	if (depth >= 20)
+		return print_infinite_recursion_bug();
+
+	debug_atomic_inc(&nr_find_usage_backwards_checks);
+	if (source->usage_mask & (1 << find_usage_bit)) {
+		backwards_match = source;
+		return 2;
+	}
+
+	/*
+	 * Check this lock's dependency list:
+	 */
+	list_for_each_entry(entry, &source->locks_before, entry) {
+		debug_atomic_inc(&nr_find_usage_backwards_recursions);
+		ret = find_usage_backwards(entry->type, depth+1);
+		if (ret == 2 || ret == 0)
+			return ret;
+	}
+	return 1;
+}
+
+static int
+print_bad_irq_dependency(struct task_struct *curr,
+			 struct held_lock *prev,
+			 struct held_lock *next,
+			 enum lock_usage_bit bit1,
+			 enum lock_usage_bit bit2,
+			 const char *irqtype)
+{
+	__raw_spin_unlock(&hash_lock);
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n======================================================\n");
+	printk(  "[ BUG: %s-safe -> %s-unsafe lock order detected! ]\n",
+		irqtype, irqtype);
+	printk(  "------------------------------------------------------\n");
+	printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
+		curr->comm, curr->pid,
+		curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
+		curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
+		curr->hardirqs_enabled,
+		curr->softirqs_enabled);
+	print_lock(next);
+
+	printk("\nand this task is already holding:\n");
+	print_lock(prev);
+	printk("which would create a new lock dependency:\n");
+	print_lock_name(prev->type);
+	printk(" ->");
+	print_lock_name(next->type);
+	printk("\n");
+
+	printk("\nbut this new dependency connects a %s-irq-safe lock:\n",
+		irqtype);
+	print_lock_name(backwards_match);
+	printk("\n... which became %s-irq-safe at:\n", irqtype);
+
+	print_stack_trace(backwards_match->usage_traces + bit1, 1);
+
+	printk("\nto a %s-irq-unsafe lock:\n", irqtype);
+	print_lock_name(forwards_match);
+	printk("\n... which became %s-irq-unsafe at:\n", irqtype);
+	printk("...");
+
+	print_stack_trace(forwards_match->usage_traces + bit2, 1);
+
+	printk("\nwhich could potentially lead to deadlocks!\n");
+
+	printk("\nother info that might help us debug this:\n\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nthe %s-irq-safe lock's dependencies:\n", irqtype);
+	print_lock_dependencies(backwards_match, 0);
+
+	printk("\nthe %s-irq-unsafe lock's dependencies:\n", irqtype);
+	print_lock_dependencies(forwards_match, 0);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+static int
+check_usage(struct task_struct *curr, struct held_lock *prev,
+	    struct held_lock *next, enum lock_usage_bit bit_backwards,
+	    enum lock_usage_bit bit_forwards, const char *irqtype)
+{
+	int ret;
+
+	find_usage_bit = bit_backwards;
+	/* fills in <backwards_match> */
+	ret = find_usage_backwards(prev->type, 0);
+	if (!ret || ret == 1)
+		return ret;
+
+	find_usage_bit = bit_forwards;
+	ret = find_usage_forwards(next->type, 0);
+	if (!ret || ret == 1)
+		return ret;
+	/* ret == 2 */
+	return print_bad_irq_dependency(curr, prev, next,
+			bit_backwards, bit_forwards, irqtype);
+}
+
+#endif
+
+static int
+print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
+		   struct held_lock *next)
+{
+	debug_locks_off();
+	__raw_spin_unlock(&hash_lock);
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n====================================\n");
+	printk(  "[ BUG: possible deadlock detected! ]\n");
+	printk(  "------------------------------------\n");
+	printk("%s/%d is trying to acquire lock:\n",
+		curr->comm, curr->pid);
+	print_lock(next);
+	printk("\nbut task is already holding lock:\n");
+	print_lock(prev);
+	printk("\nwhich could potentially lead to deadlocks!\n");
+
+	printk("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Check whether we are holding such a type already.
+ *
+ * (Note that this has to be done separately, because the graph cannot
+ * detect such types of deadlocks.)
+ *
+ * Returns: 0 on deadlock detected, 1 on OK, 2 on recursive read
+ */
+static int
+check_deadlock(struct task_struct *curr, struct held_lock *next,
+	       struct lockdep_map *next_instance, int read)
+{
+	struct held_lock *prev;
+	int i;
+
+	for (i = 0; i < curr->lockdep_depth; i++) {
+		prev = curr->held_locks + i;
+		if (prev->type != next->type)
+			continue;
+		/*
+		 * Allow read-after-read recursion of the same
+		 * lock instance (i.e. read_lock(lock)+read_lock(lock)):
+		 */
+		if ((read > 0) && prev->read &&
+				(prev->instance == next_instance))
+			return 2;
+		return print_deadlock_bug(curr, prev, next);
+	}
+	return 1;
+}
+
+/*
+ * There was a chain-cache miss, and we are about to add a new dependency
+ * to a previous lock. We recursively validate the following rules:
+ *
+ *  - would the adding of the <prev> -> <next> dependency create a
+ *    circular dependency in the graph? [== circular deadlock]
+ *
+ *  - does the new prev->next dependency connect any hardirq-safe lock
+ *    (in the full backwards-subgraph starting at <prev>) with any
+ *    hardirq-unsafe lock (in the full forwards-subgraph starting at
+ *    <next>)? [== illegal lock inversion with hardirq contexts]
+ *
+ *  - does the new prev->next dependency connect any softirq-safe lock
+ *    (in the full backwards-subgraph starting at <prev>) with any
+ *    softirq-unsafe lock (in the full forwards-subgraph starting at
+ *    <next>)? [== illegal lock inversion with softirq contexts]
+ *
+ * any of these scenarios could lead to a deadlock.
+ *
+ * Then if all the validations pass, we add the forwards and backwards
+ * dependency.
+ */
+static int
+check_prev_add(struct task_struct *curr, struct held_lock *prev,
+	       struct held_lock *next)
+{
+	struct lock_list *entry;
+	int ret;
+
+	/*
+	 * Prove that the new <prev> -> <next> dependency would not
+	 * create a circular dependency in the graph. (We do this by
+	 * forward-recursing into the graph starting at <next>, and
+	 * checking whether we can reach <prev>.)
+	 *
+	 * We are using global variables to control the recursion, to
+	 * keep the stackframe size of the recursive functions low:
+	 */
+	check_source = next;
+	check_target = prev;
+	if (!(check_noncircular(next->type, 0)))
+		return print_circular_bug_tail();
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	/*
+	 * Prove that the new dependency does not connect a hardirq-safe
+	 * lock with a hardirq-unsafe lock - to achieve this we search
+	 * the backwards-subgraph starting at <prev>, and the
+	 * forwards-subgraph starting at <next>:
+	 */
+	if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ,
+					LOCK_ENABLED_HARDIRQS, "hard"))
+		return 0;
+
+	/*
+	 * Prove that the new dependency does not connect a hardirq-safe-read
+	 * lock with a hardirq-unsafe lock - to achieve this we search
+	 * the backwards-subgraph starting at <prev>, and the
+	 * forwards-subgraph starting at <next>:
+	 */
+	if (!check_usage(curr, prev, next, LOCK_USED_IN_HARDIRQ_READ,
+					LOCK_ENABLED_HARDIRQS, "hard-read"))
+		return 0;
+
+	/*
+	 * Prove that the new dependency does not connect a softirq-safe
+	 * lock with a softirq-unsafe lock - to achieve this we search
+	 * the backwards-subgraph starting at <prev>, and the
+	 * forwards-subgraph starting at <next>:
+	 */
+	if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ,
+					LOCK_ENABLED_SOFTIRQS, "soft"))
+		return 0;
+	/*
+	 * Prove that the new dependency does not connect a softirq-safe-read
+	 * lock with a softirq-unsafe lock - to achieve this we search
+	 * the backwards-subgraph starting at <prev>, and the
+	 * forwards-subgraph starting at <next>:
+	 */
+	if (!check_usage(curr, prev, next, LOCK_USED_IN_SOFTIRQ_READ,
+					LOCK_ENABLED_SOFTIRQS, "soft-read"))
+		return 0;
+#endif
+	/*
+	 * For recursive read-locks we do all the dependency checks,
+	 * but we don't store read-triggered dependencies (only
+	 * write-triggered dependencies). This ensures that only the
+	 * write-side dependencies matter, and that if for example a
+	 * write-lock never takes any other locks, then the reads are
+	 * equivalent to a NOP.
+	 */
+	if (next->read == 1 || prev->read == 1)
+		return 1;
+	/*
+	 * Is the <prev> -> <next> dependency already present?
+	 *
+	 * (this may occur even though this is a new chain: consider
+	 *  e.g. the L1 -> L2 -> L3 -> L4 and the L5 -> L1 -> L2 -> L3
+	 *  chains - the second one will be new, but L1 already has
+	 *  L2 added to its dependency list, due to the first chain.)
+	 */
+	list_for_each_entry(entry, &prev->type->locks_after, entry) {
+		if (entry->type == next->type)
+			return 2;
+	}
+
+	/*
+	 * Ok, all validations passed, add the new lock
+	 * to the previous lock's dependency list:
+	 */
+	ret = add_lock_to_list(prev->type, next->type,
+			       &prev->type->locks_after, next->acquire_ip);
+	if (!ret)
+		return 0;
+	/*
+	 * Return value of 2 signals 'dependency already added',
+	 * in that case we don't have to add the backlink either.
+	 */
+	if (ret == 2)
+		return 2;
+	ret = add_lock_to_list(next->type, prev->type,
+			       &next->type->locks_before, next->acquire_ip);
+
+	/*
+	 * Debugging printouts:
+	 */
+	if (verbose(prev->type) || verbose(next->type)) {
+		__raw_spin_unlock(&hash_lock);
+		print_lock_name_field(prev->type);
+		printk(" => ");
+		print_lock_name_field(next->type);
+		printk("\n");
+		dump_stack();
+		__raw_spin_lock(&hash_lock);
+	}
+	return 1;
+}
+
+/*
+ * Add the dependency to all directly-previous locks that are 'relevant'.
+ * The ones that are relevant are (in increasing distance from curr):
+ * all consecutive trylock entries and the final non-trylock entry - or
+ * the end of this context's lock-chain - whichever comes first.
+ */
+static int
+check_prevs_add(struct task_struct *curr, struct held_lock *next)
+{
+	int depth = curr->lockdep_depth;
+	struct held_lock *hlock;
+
+	/*
+	 * Debugging checks.
+	 *
+	 * Depth must not be zero for a non-head lock:
+	 */
+	if (!depth)
+		goto out_bug;
+	/*
+	 * At least two relevant locks must exist for this
+	 * to be a non-head lock:
+	 */
+	if (curr->held_locks[depth].irq_context !=
+			curr->held_locks[depth-1].irq_context)
+		goto out_bug;
+
+	for (;;) {
+		hlock = curr->held_locks + depth-1;
+		/*
+		 * Only non-recursive-read entries get new dependencies
+		 * added:
+		 */
+		if (hlock->read != 2) {
+			check_prev_add(curr, hlock, next);
+			/*
+			 * Stop after the first non-trylock entry,
+			 * as non-trylock entries have added their
+			 * own direct dependencies already, so this
+			 * lock is connected to them indirectly:
+			 */
+			if (!hlock->trylock)
+				break;
+		}
+		depth--;
+		/*
+		 * End of lock-stack?
+		 */
+		if (!depth)
+			break;
+		/*
+		 * Stop the search if we cross into another context:
+		 */
+		if (curr->held_locks[depth].irq_context !=
+				curr->held_locks[depth-1].irq_context)
+			break;
+	}
+	return 1;
+out_bug:
+	__raw_spin_unlock(&hash_lock);
+	DEBUG_WARN_ON(1);
+
+	return 0;
+}
+
+
+/*
+ * Is this the address of a static object:
+ */
+static int static_obj(void *obj)
+{
+	unsigned long start = (unsigned long) &_stext,
+		      end   = (unsigned long) &_end,
+		      addr  = (unsigned long) obj;
+	int i;
+
+	/*
+	 * static variable?
+	 */
+	if ((addr >= start) && (addr < end))
+		return 1;
+
+#ifdef CONFIG_SMP
+	/*
+	 * percpu var?
+	 */
+	for_each_possible_cpu(i) {
+		start = (unsigned long) &__per_cpu_start + per_cpu_offset(i);
+		end   = (unsigned long) &__per_cpu_end   + per_cpu_offset(i);
+
+		if ((addr >= start) && (addr < end))
+			return 1;
+	}
+#endif
+
+	/*
+	 * module var?
+	 */
+	return __module_address(addr);
+}
+
+/*
+ * To make lock name printouts unique, we calculate a unique
+ * type->name_version generation counter:
+ */
+int count_matching_names(struct lock_type *new_type)
+{
+	struct lock_type *type;
+	int count = 0;
+
+	if (!new_type->name)
+		return 0;
+
+	list_for_each_entry(type, &all_lock_types, lock_entry) {
+		if (new_type->key - new_type->subtype == type->key)
+			return type->name_version;
+		if (!strcmp(type->name, new_type->name))
+			count = max(count, type->name_version);
+	}
+
+	return count + 1;
+}
+
+extern void __error_too_big_MAX_LOCKDEP_SUBTYPES(void);
+
+/*
+ * Register a lock's type in the hash-table, if the type is not present
+ * yet. Otherwise we look it up. We cache the result in the lock object
+ * itself, so actual lookup of the hash should be once per lock object.
+ */
+static inline struct lock_type *
+register_lock_type(struct lockdep_map *lock, unsigned int subtype)
+{
+	struct lockdep_subtype_key *key;
+	struct list_head *hash_head;
+	struct lock_type *type;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+	/*
+	 * If the architecture calls into lockdep before initializing
+	 * the hashes then we'll warn about it later. (we cannot printk
+	 * right now)
+	 */
+	if (unlikely(!lockdep_initialized)) {
+		lockdep_init();
+		lockdep_init_error = 1;
+	}
+#endif
+
+	/*
+	 * Static locks do not have their type-keys yet - for them the key
+	 * is the lock object itself:
+	 */
+	if (unlikely(!lock->key))
+		lock->key = (void *)lock;
+
+	/*
+	 * Debug-check: all keys must be persistent!
+	 */
+	if (DEBUG_WARN_ON(!static_obj(lock->key))) {
+		debug_locks_off();
+		printk("BUG: trying to register non-static key!\n");
+		printk("turning off the locking correctness validator.\n");
+		dump_stack();
+		return NULL;
+	}
+
+	/*
+	 * NOTE: the type-key must be unique. For dynamic locks, a static
+	 * lockdep_type_key variable is passed in through the mutex_init()
+	 * (or spin_lock_init()) call - which acts as the key. For static
+	 * locks we use the lock object itself as the key.
+	 */
+	if (sizeof(struct lockdep_type_key) > sizeof(struct lock_type))
+		__error_too_big_MAX_LOCKDEP_SUBTYPES();
+
+	key = lock->key->subkeys + subtype;
+
+	hash_head = typehashentry(key);
+
+	/*
+	 * We can walk the hash lockfree, because the hash only
+	 * grows, and we are careful when adding entries to the end:
+	 */
+	list_for_each_entry(type, hash_head, hash_entry)
+		if (type->key == key)
+			goto out_set;
+
+	__raw_spin_lock(&hash_lock);
+	/*
+	 * We have to do the hash-walk again, to avoid races
+	 * with another CPU:
+	 */
+	list_for_each_entry(type, hash_head, hash_entry)
+		if (type->key == key)
+			goto out_unlock_set;
+	/*
+	 * Allocate a new key from the static array, and add it to
+	 * the hash:
+	 */
+	if (nr_lock_types >= MAX_LOCKDEP_KEYS) {
+		__raw_spin_unlock(&hash_lock);
+		debug_locks_off();
+		printk("BUG: MAX_LOCKDEP_KEYS too low!\n");
+		printk("turning off the locking correctness validator.\n");
+		return NULL;
+	}
+	type = lock_types + nr_lock_types++;
+	debug_atomic_inc(&nr_unused_locks);
+	type->key = key;
+	type->name = lock->name;
+	type->subtype = subtype;
+	INIT_LIST_HEAD(&type->lock_entry);
+	INIT_LIST_HEAD(&type->locks_before);
+	INIT_LIST_HEAD(&type->locks_after);
+	type->name_version = count_matching_names(type);
+	/*
+	 * We use RCU's safe list-add method to make
+	 * parallel walking of the hash-list safe:
+	 */
+	list_add_tail_rcu(&type->hash_entry, hash_head);
+
+	if (verbose(type)) {
+		__raw_spin_unlock(&hash_lock);
+		printk("new type %p: %s", type->key, type->name);
+		if (type->name_version > 1)
+			printk("#%d", type->name_version);
+		printk("\n");
+		dump_stack();
+		__raw_spin_lock(&hash_lock);
+	}
+out_unlock_set:
+	__raw_spin_unlock(&hash_lock);
+
+out_set:
+	lock->type[subtype] = type;
+
+	DEBUG_WARN_ON(type->subtype != subtype);
+
+	return type;
+}
+
+/*
+ * Look up a dependency chain. If the key is not present yet then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ */
+static inline int lookup_chain_cache(u64 chain_key)
+{
+	struct list_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	DEBUG_WARN_ON(!irqs_disabled());
+	/*
+	 * We can walk it lock-free, because entries only get added
+	 * to the hash:
+	 */
+	list_for_each_entry(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+cache_hit:
+			debug_atomic_inc(&chain_lookup_hits);
+			/*
+			 * In the debugging case, force redundant checking
+			 * by returning 1:
+			 */
+#ifdef CONFIG_DEBUG_LOCKDEP
+			__raw_spin_lock(&hash_lock);
+			return 1;
+#endif
+			return 0;
+		}
+	}
+	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+	__raw_spin_lock(&hash_lock);
+	/*
+	 * We have to walk the chain again locked - to avoid duplicates:
+	 */
+	list_for_each_entry(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+			__raw_spin_unlock(&hash_lock);
+			goto cache_hit;
+		}
+	}
+	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+		__raw_spin_unlock(&hash_lock);
+		debug_locks_off();
+		printk("BUG: MAX_LOCKDEP_CHAINS too low!\n");
+		printk("turning off the locking correctness validator.\n");
+		return 0;
+	}
+	chain = lock_chains + nr_lock_chains++;
+	chain->chain_key = chain_key;
+	list_add_tail_rcu(&chain->entry, hash_head);
+	debug_atomic_inc(&chain_lookup_misses);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (current->hardirq_context)
+		nr_hardirq_chains++;
+	else {
+		if (current->softirq_context)
+			nr_softirq_chains++;
+		else
+			nr_process_chains++;
+	}
+#else
+	nr_process_chains++;
+#endif
+
+	return 1;
+}
+
+/*
+ * We are building curr_chain_key incrementally, so double-check
+ * it from scratch, to make sure that it's done correctly:
+ */
+static void check_chain_key(struct task_struct *curr)
+{
+#ifdef CONFIG_DEBUG_LOCKDEP
+	struct held_lock *hlock, *prev_hlock = NULL;
+	unsigned int i, id;
+	u64 chain_key = 0;
+
+	for (i = 0; i < curr->lockdep_depth; i++) {
+		hlock = curr->held_locks + i;
+		if (chain_key != hlock->prev_chain_key) {
+			debug_locks_off();
+			printk("hm#1, depth: %u [%u], %016Lx != %016Lx\n",
+				curr->lockdep_depth, i, chain_key,
+				hlock->prev_chain_key);
+			WARN_ON(1);
+			return;
+		}
+		id = hlock->type - lock_types;
+		DEBUG_WARN_ON(id >= MAX_LOCKDEP_KEYS);
+		if (prev_hlock && (prev_hlock->irq_context !=
+							hlock->irq_context))
+			chain_key = 0;
+		chain_key = iterate_chain_key(chain_key, id);
+		prev_hlock = hlock;
+	}
+	if (chain_key != curr->curr_chain_key) {
+		debug_locks_off();
+		printk("hm#2, depth: %u [%u], %016Lx != %016Lx\n",
+			curr->lockdep_depth, i, chain_key,
+			curr->curr_chain_key);
+		WARN_ON(1);
+	}
+#endif
+}
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+
+/*
+ * print irq inversion bug:
+ */
+static int
+print_irq_inversion_bug(struct task_struct *curr, struct lock_type *other,
+			struct held_lock *this, int forwards,
+			const char *irqtype)
+{
+	__raw_spin_unlock(&hash_lock);
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n==================================================\n");
+	printk(  "[ BUG: possible irq lock inversion bug detected! ]\n");
+	printk(  "--------------------------------------------------\n");
+	printk("%s/%d just changed the state of lock:\n",
+		curr->comm, curr->pid);
+	print_lock(this);
+	if (forwards)
+		printk("but this lock took another, %s-irq-unsafe lock in the past:\n", irqtype);
+	else
+		printk("but this lock was taken by another, %s-irq-safe lock in the past:\n", irqtype);
+	print_lock_name(other);
+	printk("\n\nand interrupts could create inverse lock ordering between them,\n");
+
+	printk("which could potentially lead to deadlocks!\n");
+
+	printk("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nthe first lock's dependencies:\n");
+	print_lock_dependencies(this->type, 0);
+
+	printk("\nthe second lock's dependencies:\n");
+	print_lock_dependencies(other, 0);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Prove that in the forwards-direction subgraph starting at <this>
+ * there is no lock matching <bit>:
+ */
+static int
+check_usage_forwards(struct task_struct *curr, struct held_lock *this,
+		     enum lock_usage_bit bit, const char *irqtype)
+{
+	int ret;
+
+	find_usage_bit = bit;
+	/* fills in <forwards_match> */
+	ret = find_usage_forwards(this->type, 0);
+	if (!ret || ret == 1)
+		return ret;
+
+	return print_irq_inversion_bug(curr, forwards_match, this, 1, irqtype);
+}
+
+/*
+ * Prove that in the backwards-direction subgraph starting at <this>
+ * there is no lock matching <bit>:
+ */
+static int
+check_usage_backwards(struct task_struct *curr, struct held_lock *this,
+		      enum lock_usage_bit bit, const char *irqtype)
+{
+	int ret;
+
+	find_usage_bit = bit;
+	/* fills in <backwards_match> */
+	ret = find_usage_backwards(this->type, 0);
+	if (!ret || ret == 1)
+		return ret;
+
+	return print_irq_inversion_bug(curr, backwards_match, this, 0, irqtype);
+}
+
+static inline void print_irqtrace_events(struct task_struct *curr)
+{
+	printk("irq event stamp: %u\n", curr->irq_events);
+	printk("hardirqs last  enabled at (%u): [<%08lx>]",
+		curr->hardirq_enable_event, curr->hardirq_enable_ip);
+	print_symbol(" %s\n", curr->hardirq_enable_ip);
+	printk("hardirqs last disabled at (%u): [<%08lx>]",
+		curr->hardirq_disable_event, curr->hardirq_disable_ip);
+	print_symbol(" %s\n", curr->hardirq_disable_ip);
+	printk("softirqs last  enabled at (%u): [<%08lx>]",
+		curr->softirq_enable_event, curr->softirq_enable_ip);
+	print_symbol(" %s\n", curr->softirq_enable_ip);
+	printk("softirqs last disabled at (%u): [<%08lx>]",
+		curr->softirq_disable_event, curr->softirq_disable_ip);
+	print_symbol(" %s\n", curr->softirq_disable_ip);
+}
+
+#else
+static inline void print_irqtrace_events(struct task_struct *curr)
+{
+}
+#endif
+
+static int
+print_usage_bug(struct task_struct *curr, struct held_lock *this,
+		enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
+{
+	__raw_spin_unlock(&hash_lock);
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n============================\n");
+	printk(  "[ BUG: illegal lock usage! ]\n");
+	printk(  "----------------------------\n");
+
+	printk("illegal {%s} -> {%s} usage.\n",
+		usage_str[prev_bit], usage_str[new_bit]);
+
+	printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
+		curr->comm, curr->pid,
+		trace_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
+		trace_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
+		trace_hardirqs_enabled(curr),
+		trace_softirqs_enabled(curr));
+	print_lock(this);
+
+	printk("{%s} state was registered at:\n", usage_str[prev_bit]);
+	print_stack_trace(this->type->usage_traces + prev_bit, 1);
+
+	print_irqtrace_events(curr);
+	printk("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Print out an error if an invalid bit is set:
+ */
+static inline int
+valid_state(struct task_struct *curr, struct held_lock *this,
+	    enum lock_usage_bit new_bit, enum lock_usage_bit bad_bit)
+{
+	if (unlikely(this->type->usage_mask & (1 << bad_bit)))
+		return print_usage_bug(curr, this, bad_bit, new_bit);
+	return 1;
+}
+
+#define STRICT_READ_CHECKS	1
+
+/*
+ * Mark a lock with a usage bit, and validate the state transition:
+ */
+static int mark_lock(struct task_struct *curr, struct held_lock *this,
+		     enum lock_usage_bit new_bit, unsigned long ip)
+{
+	unsigned int new_mask = 1 << new_bit, ret = 1;
+
+	/*
+	 * If already set then do not dirty the cacheline,
+	 * nor do any checks:
+	 */
+	if (likely(this->type->usage_mask & new_mask))
+		return 1;
+
+	__raw_spin_lock(&hash_lock);
+	/*
+	 * Make sure we didn't race:
+	 */
+	if (unlikely(this->type->usage_mask & new_mask)) {
+		__raw_spin_unlock(&hash_lock);
+		return 1;
+	}
+
+	this->type->usage_mask |= new_mask;
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (new_bit == LOCK_ENABLED_HARDIRQS ||
+			new_bit == LOCK_ENABLED_HARDIRQS_READ)
+		ip = curr->hardirq_enable_ip;
+	else if (new_bit == LOCK_ENABLED_SOFTIRQS ||
+			new_bit == LOCK_ENABLED_SOFTIRQS_READ)
+		ip = curr->softirq_enable_ip;
+#endif
+	if (!save_trace(this->type->usage_traces + new_bit))
+		return 0;
+
+	switch (new_bit) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+	case LOCK_USED_IN_HARDIRQ:
+		if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
+			return 0;
+		if (!valid_state(curr, this, new_bit,
+				 LOCK_ENABLED_HARDIRQS_READ))
+			return 0;
+		/*
+		 * just marked it hardirq-safe, check that this lock
+		 * took no hardirq-unsafe lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+					  LOCK_ENABLED_HARDIRQS, "hard"))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it hardirq-safe, check that this lock
+		 * took no hardirq-unsafe-read lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+				LOCK_ENABLED_HARDIRQS_READ, "hard-read"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_hardirq_safe_locks);
+		if (hardirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_USED_IN_SOFTIRQ:
+		if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
+			return 0;
+		if (!valid_state(curr, this, new_bit,
+				 LOCK_ENABLED_SOFTIRQS_READ))
+			return 0;
+		/*
+		 * just marked it softirq-safe, check that this lock
+		 * took no softirq-unsafe lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+					  LOCK_ENABLED_SOFTIRQS, "soft"))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it softirq-safe, check that this lock
+		 * took no softirq-unsafe-read lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+				LOCK_ENABLED_SOFTIRQS_READ, "soft-read"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_softirq_safe_locks);
+		if (softirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_USED_IN_HARDIRQ_READ:
+		if (!valid_state(curr, this, new_bit, LOCK_ENABLED_HARDIRQS))
+			return 0;
+		/*
+		 * just marked it hardirq-read-safe, check that this lock
+		 * took no hardirq-unsafe lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+					  LOCK_ENABLED_HARDIRQS, "hard"))
+			return 0;
+		debug_atomic_inc(&nr_hardirq_read_safe_locks);
+		if (hardirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_USED_IN_SOFTIRQ_READ:
+		if (!valid_state(curr, this, new_bit, LOCK_ENABLED_SOFTIRQS))
+			return 0;
+		/*
+		 * just marked it softirq-read-safe, check that this lock
+		 * took no softirq-unsafe lock in the past:
+		 */
+		if (!check_usage_forwards(curr, this,
+					  LOCK_ENABLED_SOFTIRQS, "soft"))
+			return 0;
+		debug_atomic_inc(&nr_softirq_read_safe_locks);
+		if (softirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_ENABLED_HARDIRQS:
+		if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
+			return 0;
+		if (!valid_state(curr, this, new_bit,
+				 LOCK_USED_IN_HARDIRQ_READ))
+			return 0;
+		/*
+		 * just marked it hardirq-unsafe, check that no hardirq-safe
+		 * lock in the system ever took it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+					   LOCK_USED_IN_HARDIRQ, "hard"))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it hardirq-unsafe, check that no
+		 * hardirq-safe-read lock in the system ever took
+		 * it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+				   LOCK_USED_IN_HARDIRQ_READ, "hard-read"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_hardirq_unsafe_locks);
+		if (hardirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_ENABLED_SOFTIRQS:
+		if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
+			return 0;
+		if (!valid_state(curr, this, new_bit,
+				 LOCK_USED_IN_SOFTIRQ_READ))
+			return 0;
+		/*
+		 * just marked it softirq-unsafe, check that no softirq-safe
+		 * lock in the system ever took it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+					   LOCK_USED_IN_SOFTIRQ, "soft"))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it softirq-unsafe, check that no
+		 * softirq-safe-read lock in the system ever took
+		 * it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+				   LOCK_USED_IN_SOFTIRQ_READ, "soft-read"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_softirq_unsafe_locks);
+		if (softirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_ENABLED_HARDIRQS_READ:
+		if (!valid_state(curr, this, new_bit, LOCK_USED_IN_HARDIRQ))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it hardirq-read-unsafe, check that no
+		 * hardirq-safe lock in the system ever took it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+					   LOCK_USED_IN_HARDIRQ, "hard"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_hardirq_read_unsafe_locks);
+		if (hardirq_verbose(this->type))
+			ret = 2;
+		break;
+	case LOCK_ENABLED_SOFTIRQS_READ:
+		if (!valid_state(curr, this, new_bit, LOCK_USED_IN_SOFTIRQ))
+			return 0;
+#if STRICT_READ_CHECKS
+		/*
+		 * just marked it softirq-read-unsafe, check that no
+		 * softirq-safe lock in the system ever took it in the past:
+		 */
+		if (!check_usage_backwards(curr, this,
+					   LOCK_USED_IN_SOFTIRQ, "soft"))
+			return 0;
+#endif
+		debug_atomic_inc(&nr_softirq_read_unsafe_locks);
+		if (softirq_verbose(this->type))
+			ret = 2;
+		break;
+#endif
+	case LOCK_USED:
+		/*
+		 * Add it to the global list of types:
+		 */
+		list_add_tail_rcu(&this->type->lock_entry, &all_lock_types);
+		debug_atomic_dec(&nr_unused_locks);
+		break;
+	default:
+		debug_locks_off();
+		WARN_ON(1);
+		return 0;
+	}
+
+	__raw_spin_unlock(&hash_lock);
+
+	/*
+	 * We must printk outside of the hash_lock:
+	 */
+	if (ret == 2) {
+		printk("\nmarked lock as {%s}:\n", usage_str[new_bit]);
+		print_lock(this);
+		print_irqtrace_events(curr);
+		dump_stack();
+	}
+
+	return ret;
+}
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+/*
+ * Mark all held locks with a usage bit:
+ */
+static int
+mark_held_locks(struct task_struct *curr, int hardirq, unsigned long ip)
+{
+	enum lock_usage_bit usage_bit;
+	struct held_lock *hlock;
+	int i;
+
+	for (i = 0; i < curr->lockdep_depth; i++) {
+		hlock = curr->held_locks + i;
+
+		if (hardirq) {
+			if (hlock->read)
+				usage_bit = LOCK_ENABLED_HARDIRQS_READ;
+			else
+				usage_bit = LOCK_ENABLED_HARDIRQS;
+		} else {
+			if (hlock->read)
+				usage_bit = LOCK_ENABLED_SOFTIRQS_READ;
+			else
+				usage_bit = LOCK_ENABLED_SOFTIRQS;
+		}
+		if (!mark_lock(curr, hlock, usage_bit, ip))
+			return 0;
+	}
+
+	return 1;
+}
+
+/*
+ * Debugging helper: via this flag we know that we are in
+ * 'early bootup code', and will warn about any invalid irqs-on event:
+ */
+static int early_boot_irqs_enabled;
+
+void early_boot_irqs_off(void)
+{
+	early_boot_irqs_enabled = 0;
+}
+
+void early_boot_irqs_on(void)
+{
+	early_boot_irqs_enabled = 1;
+}
+
+/*
+ * Hardirqs will be enabled:
+ */
+void trace_hardirqs_on(void)
+{
+	struct task_struct *curr = current;
+	unsigned long ip;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (DEBUG_WARN_ON(unlikely(!early_boot_irqs_enabled)))
+		return;
+
+	if (unlikely(curr->hardirqs_enabled)) {
+		debug_atomic_inc(&redundant_hardirqs_on);
+		return;
+	}
+	/* we'll do an OFF -> ON transition: */
+	curr->hardirqs_enabled = 1;
+	ip = (unsigned long) __builtin_return_address(0);
+
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return;
+	if (DEBUG_WARN_ON(current->hardirq_context))
+		return;
+	/*
+	 * We are going to turn hardirqs on, so set the
+	 * usage bit for all held locks:
+	 */
+	if (!mark_held_locks(curr, 1, ip))
+		return;
+	/*
+	 * If we have softirqs enabled, then set the usage
+	 * bit for all held locks. (disabled hardirqs prevented
+	 * this bit from being set before)
+	 */
+	if (curr->softirqs_enabled)
+		if (!mark_held_locks(curr, 0, ip))
+			return;
+
+	curr->hardirq_enable_ip = ip;
+	curr->hardirq_enable_event = ++curr->irq_events;
+	debug_atomic_inc(&hardirqs_on_events);
+}
+
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+/*
+ * Hardirqs were disabled:
+ */
+void trace_hardirqs_off(void)
+{
+	struct task_struct *curr = current;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return;
+
+	if (curr->hardirqs_enabled) {
+		/*
+		 * We have done an ON -> OFF transition:
+		 */
+		curr->hardirqs_enabled = 0;
+		curr->hardirq_disable_ip = _RET_IP_;
+		curr->hardirq_disable_event = ++curr->irq_events;
+		debug_atomic_inc(&hardirqs_off_events);
+	} else
+		debug_atomic_inc(&redundant_hardirqs_off);
+}
+
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+/*
+ * Softirqs will be enabled:
+ */
+void trace_softirqs_on(unsigned long ip)
+{
+	struct task_struct *curr = current;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return;
+
+	if (curr->softirqs_enabled) {
+		debug_atomic_inc(&redundant_softirqs_on);
+		return;
+	}
+
+	/*
+	 * We'll do an OFF -> ON transition:
+	 */
+	curr->softirqs_enabled = 1;
+	curr->softirq_enable_ip = ip;
+	curr->softirq_enable_event = ++curr->irq_events;
+	debug_atomic_inc(&softirqs_on_events);
+	/*
+	 * We are going to turn softirqs on, so set the
+	 * usage bit for all held locks, if hardirqs are
+	 * enabled too:
+	 */
+	if (curr->hardirqs_enabled)
+		mark_held_locks(curr, 0, ip);
+}
+
+/*
+ * Softirqs were disabled:
+ */
+void trace_softirqs_off(unsigned long ip)
+{
+	struct task_struct *curr = current;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return;
+
+	if (curr->softirqs_enabled) {
+		/*
+		 * We have done an ON -> OFF transition:
+		 */
+		curr->softirqs_enabled = 0;
+		curr->softirq_disable_ip = ip;
+		curr->softirq_disable_event = ++curr->irq_events;
+		debug_atomic_inc(&softirqs_off_events);
+		DEBUG_WARN_ON(!softirq_count());
+	} else
+		debug_atomic_inc(&redundant_softirqs_off);
+}
+
+#endif
+
+/*
+ * Initialize a lock instance's lock-type mapping info:
+ */
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+		      struct lockdep_type_key *key)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (DEBUG_WARN_ON(!key))
+		return;
+
+	/*
+	 * Sanity check, the lock-type key must be persistent:
+	 */
+	if (!static_obj(key)) {
+		printk("BUG: key %p not in .data!\n", key);
+		DEBUG_WARN_ON(1);
+		return;
+	}
+	lock->name = name;
+	lock->key = key;
+	memset(lock->type, 0, sizeof(lock->type[0])*MAX_LOCKDEP_SUBTYPES);
+}
+
+EXPORT_SYMBOL_GPL(lockdep_init_map);
+
+/*
+ * This gets called for every mutex_lock*()/spin_lock*() operation.
+ * We maintain the dependency maps and validate the locking attempt:
+ */
+static int __lockdep_acquire(struct lockdep_map *lock, unsigned int subtype,
+			     int trylock, int read, int hardirqs_off,
+			     unsigned long ip)
+{
+	struct task_struct *curr = current;
+	struct held_lock *hlock;
+	struct lock_type *type;
+	unsigned int depth, id;
+	int chain_head = 0;
+	u64 chain_key;
+
+	if (unlikely(!debug_locks))
+		return 0;
+
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (unlikely(subtype >= MAX_LOCKDEP_SUBTYPES)) {
+		debug_locks_off();
+		printk("BUG: MAX_LOCKDEP_SUBTYPES too low!\n");
+		printk("turning off the locking correctness validator.\n");
+		return 0;
+	}
+
+	type = lock->type[subtype];
+	/* not cached yet? */
+	if (unlikely(!type)) {
+		type = register_lock_type(lock, subtype);
+		if (!type)
+			return 0;
+	}
+	debug_atomic_inc((atomic_t *)&type->ops);
+
+	/*
+	 * Add the lock to the list of currently held locks.
+	 * (we don't increase the depth just yet, not until the
+	 * dependency checks are done)
+	 */
+	depth = curr->lockdep_depth;
+	if (DEBUG_WARN_ON(depth >= MAX_LOCK_DEPTH))
+		return 0;
+
+	hlock = curr->held_locks + depth;
+
+	hlock->type = type;
+	hlock->acquire_ip = ip;
+	hlock->instance = lock;
+	hlock->trylock = trylock;
+	hlock->read = read;
+	hlock->hardirqs_off = hardirqs_off;
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	/*
+	 * If non-trylock use in a hardirq or softirq context, then
+	 * mark the lock as used in these contexts:
+	 */
+	if (!trylock) {
+		if (read) {
+			if (curr->hardirq_context)
+				if (!mark_lock(curr, hlock,
+						LOCK_USED_IN_HARDIRQ_READ, ip))
+					return 0;
+			if (curr->softirq_context)
+				if (!mark_lock(curr, hlock,
+						LOCK_USED_IN_SOFTIRQ_READ, ip))
+					return 0;
+		} else {
+			if (curr->hardirq_context)
+				if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ, ip))
+					return 0;
+			if (curr->softirq_context)
+				if (!mark_lock(curr, hlock, LOCK_USED_IN_SOFTIRQ, ip))
+					return 0;
+		}
+	}
+	if (!hardirqs_off) {
+		if (read) {
+			if (!mark_lock(curr, hlock,
+					LOCK_ENABLED_HARDIRQS_READ, ip))
+				return 0;
+			if (curr->softirqs_enabled)
+				if (!mark_lock(curr, hlock,
+						LOCK_ENABLED_SOFTIRQS_READ, ip))
+					return 0;
+		} else {
+			if (!mark_lock(curr, hlock,
+					LOCK_ENABLED_HARDIRQS, ip))
+				return 0;
+			if (curr->softirqs_enabled)
+				if (!mark_lock(curr, hlock,
+						LOCK_ENABLED_SOFTIRQS, ip))
+					return 0;
+		}
+	}
+#endif
+	/* mark it as used: */
+	if (!mark_lock(curr, hlock, LOCK_USED, ip))
+		return 0;
+	/*
+	 * Calculate the chain hash: it's the combined hash of all the
+	 * lock keys along the dependency chain. We save the hash value
+	 * at every step so that we can get the current hash easily
+	 * after unlock. The chain hash is then used to cache dependency
+	 * results.
+	 *
+	 * The 'key ID' (the type's index in lock_types[]) is the most
+	 * compact value to drive the hash, so we use it, not type->key.
+	 */
+	id = type - lock_types;
+	if (DEBUG_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+		return 0;
+
+	chain_key = curr->curr_chain_key;
+	if (!depth) {
+		if (DEBUG_WARN_ON(chain_key != 0))
+			return 0;
+		chain_head = 1;
+	}
+
+	hlock->prev_chain_key = chain_key;
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	/*
+	 * Keep track of points where we cross into an interrupt context:
+	 */
+	hlock->irq_context = 2*(curr->hardirq_context ? 1 : 0) +
+				curr->softirq_context;
+	if (depth) {
+		struct held_lock *prev_hlock;
+
+		prev_hlock = curr->held_locks + depth-1;
+		/*
+		 * If we cross into another context, reset the
+		 * hash key (this also prevents the checking and the
+		 * adding of the dependency to 'prev'):
+		 */
+		if (prev_hlock->irq_context != hlock->irq_context) {
+			chain_key = 0;
+			chain_head = 1;
+		}
+	}
+#endif
+	chain_key = iterate_chain_key(chain_key, id);
+	curr->curr_chain_key = chain_key;
+
+	/*
+	 * Trylock needs to maintain the stack of held locks, but it
+	 * does not add new dependencies, because trylock can be done
+	 * in any order.
+	 *
+	 * We look up the chain_key and do the O(N^2) check and update of
+	 * the dependencies only if this is a new dependency chain.
+	 * (If lookup_chain_cache() returns with 1 it acquires
+	 * hash_lock for us)
+	 */
+	if (!trylock && lookup_chain_cache(chain_key)) {
+		/*
+		 * Check whether last held lock:
+		 *
+		 * - is irq-safe, if this lock is irq-unsafe
+		 * - is softirq-safe, if this lock is softirq-unsafe
+		 *
+		 * And check whether the new lock's dependency graph
+		 * could lead back to the previous lock.
+		 *
+		 * Any of these scenarios could lead to a deadlock. All
+		 * validations must pass before the dependency is added.
+		 */
+		int ret = check_deadlock(curr, hlock, lock, read);
+
+		if (!ret)
+			return 0;
+		/*
+		 * Mark recursive read, as we jump over it when
+		 * building dependencies (just like we jump over
+		 * trylock entries):
+		 */
+		if (ret == 2)
+			hlock->read = 2;
+		/*
+		 * Add dependency only if this lock is not the head
+		 * of the chain, and if it's not a secondary read-lock:
+		 */
+		if (!chain_head && ret != 2)
+			if (!check_prevs_add(curr, hlock))
+				return 0;
+		__raw_spin_unlock(&hash_lock);
+	}
+	curr->lockdep_depth++;
+	check_chain_key(curr);
+	if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
+		debug_locks_off();
+		printk("BUG: MAX_LOCK_DEPTH too low!\n");
+		printk("turning off the locking correctness validator.\n");
+		return 0;
+	}
+	if (unlikely(curr->lockdep_depth > max_lockdep_depth))
+		max_lockdep_depth = curr->lockdep_depth;
+
+	return 1;
+}
+
+static int
+print_unlock_order_bug(struct task_struct *curr, struct lockdep_map *lock,
+		       struct held_lock *hlock, unsigned long ip)
+{
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n======================================\n");
+	printk(  "[ BUG: bad unlock ordering detected! ]\n");
+	printk(  "--------------------------------------\n");
+	printk("%s/%d is trying to release lock (",
+		curr->comm, curr->pid);
+	print_lockdep_cache(lock);
+	printk(") at:\n");
+	printk_sym(ip);
+	printk("but the next lock to release is:\n");
+	print_lock(hlock);
+	printk("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+static int
+print_unlock_inbalance_bug(struct task_struct *curr, struct lockdep_map *lock,
+			   unsigned long ip)
+{
+	debug_locks_off();
+	if (debug_locks_silent)
+		return 0;
+
+	printk("\n=====================================\n");
+	printk(  "[ BUG: bad unlock balance detected! ]\n");
+	printk(  "-------------------------------------\n");
+	printk("%s/%d is trying to release lock (",
+		curr->comm, curr->pid);
+	print_lockdep_cache(lock);
+	printk(") at:\n");
+	printk_sym(ip);
+	printk("but there are no more locks to release!\n");
+	printk("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Common debugging checks for both nested and non-nested unlock:
+ */
+static int check_unlock(struct task_struct *curr, struct lockdep_map *lock,
+			unsigned long ip)
+{
+	if (unlikely(!debug_locks))
+		return 0;
+	if (DEBUG_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (curr->lockdep_depth <= 0)
+		return print_unlock_inbalance_bug(curr, lock, ip);
+
+	return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks - this gets
+ * called on mutex_unlock()/spin_unlock*() (or on a failed
+ * mutex_lock_interruptible()). This is done for unlocks that nest
+ * perfectly. (i.e. the current top of the lock-stack is unlocked)
+ */
+static int lockdep_release_nested(struct task_struct *curr,
+				  struct lockdep_map *lock, unsigned long ip)
+{
+	struct held_lock *hlock;
+	unsigned int depth;
+
+	/*
+	 * Pop off the top of the lock stack:
+	 */
+	depth = --curr->lockdep_depth;
+	hlock = curr->held_locks + depth;
+
+	if (hlock->instance != lock)
+		return print_unlock_order_bug(curr, lock, hlock, ip);
+
+	if (DEBUG_WARN_ON(!depth && (hlock->prev_chain_key != 0)))
+		return 0;
+
+	curr->curr_chain_key = hlock->prev_chain_key;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+	hlock->prev_chain_key = 0;
+	hlock->type = NULL;
+	hlock->acquire_ip = 0;
+	hlock->irq_context = 0;
+#endif
+	return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks in a
+ * potentially non-nested (out of order) manner. This is a
+ * relatively rare operation, as all the unlock APIs default
+ * to nested mode (which uses lockdep_release_nested()):
+ */
+static int
+lockdep_release_non_nested(struct task_struct *curr,
+			   struct lockdep_map *lock, unsigned long ip)
+{
+	struct held_lock *hlock, *prev_hlock;
+	unsigned int depth;
+	int i;
+
+	/*
+	 * Check whether the lock exists in the current stack
+	 * of held locks:
+	 */
+	depth = curr->lockdep_depth;
+	if (DEBUG_WARN_ON(!depth))
+		return 0;
+
+	prev_hlock = NULL;
+	for (i = depth-1; i >= 0; i--) {
+		hlock = curr->held_locks + i;
+		/*
+		 * We must not cross into another context:
+		 */
+		if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+			break;
+		if (hlock->instance == lock)
+			goto found_it;
+		prev_hlock = hlock;
+	}
+	return print_unlock_inbalance_bug(curr, lock, ip);
+
+found_it:
+	/*
+	 * We have the right lock to unlock, 'hlock' points to it.
+	 * Now we remove it from the stack, and add back the other
+	 * entries (if any), recalculating the hash along the way:
+	 */
+	curr->lockdep_depth = i;
+	curr->curr_chain_key = hlock->prev_chain_key;
+
+	for (i++; i < depth; i++) {
+		hlock = curr->held_locks + i;
+		if (!__lockdep_acquire(hlock->instance,
+			hlock->type->subtype, hlock->trylock,
+				hlock->read, hlock->hardirqs_off,
+				hlock->acquire_ip))
+			return 0;
+	}
+
+	if (DEBUG_WARN_ON(curr->lockdep_depth != depth - 1))
+		return 0;
+	return 1;
+}
+
+/*
+ * Remove the lock from the list of currently held locks - this gets
+ * called on mutex_unlock()/spin_unlock*() (or on a failed
+ * mutex_lock_interruptible()). Unlocks that nest perfectly take the
+ * nested path; 'nested == 0' selects the out-of-order variant.
+ */
+static void __lockdep_release(struct lockdep_map *lock, int nested,
+			      unsigned long ip)
+{
+	struct task_struct *curr = current;
+
+	if (!check_unlock(curr, lock, ip))
+		return;
+
+	if (nested) {
+		if (!lockdep_release_nested(curr, lock, ip))
+			return;
+	} else {
+		if (!lockdep_release_non_nested(curr, lock, ip))
+			return;
+	}
+
+	check_chain_key(curr);
+}
+
+/*
+ * Check whether we follow the irq-flags state precisely:
+ */
+static void check_flags(unsigned long flags)
+{
+#if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
+	if (!debug_locks)
+		return;
+
+	if (irqs_disabled_flags(flags))
+		DEBUG_WARN_ON(current->hardirqs_enabled);
+	else
+		DEBUG_WARN_ON(!current->hardirqs_enabled);
+
+	/*
+	 * We don't accurately track softirq state in e.g.
+	 * hardirq contexts (such as on 4KSTACKS), so only
+	 * check if not in hardirq contexts:
+	 */
+	if (!hardirq_count()) {
+		if (softirq_count())
+			DEBUG_WARN_ON(current->softirqs_enabled);
+		else
+			DEBUG_WARN_ON(!current->softirqs_enabled);
+	}
+
+	if (!debug_locks)
+		print_irqtrace_events(current);
+#endif
+}
+
+/*
+ * We are not always called with irqs disabled - do that here,
+ * and also avoid lockdep recursion:
+ */
+void lockdep_acquire(struct lockdep_map *lock, unsigned int subtype,
+		     int trylock, int read, unsigned long ip)
+{
+	unsigned long flags;
+
+	if (LOCKDEP_OFF)
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+
+	if (unlikely(current->lockdep_recursion))
+		goto out;
+	current->lockdep_recursion = 1;
+	__lockdep_acquire(lock, subtype, trylock, read, irqs_disabled_flags(flags), ip);
+	current->lockdep_recursion = 0;
+out:
+	raw_local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(lockdep_acquire);
+
+void lockdep_release(struct lockdep_map *lock, int nested, unsigned long ip)
+{
+	unsigned long flags;
+
+	if (LOCKDEP_OFF)
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+	if (unlikely(current->lockdep_recursion))
+		goto out;
+	current->lockdep_recursion = 1;
+	__lockdep_release(lock, nested, ip);
+	current->lockdep_recursion = 0;
+out:
+	raw_local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(lockdep_release);
+
+/*
+ * Used by the testsuite, sanitize the validator state
+ * after a simulated failure:
+ */
+
+void lockdep_reset(void)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	current->curr_chain_key = 0;
+	current->lockdep_depth = 0;
+	current->lockdep_recursion = 0;
+	memset(current->held_locks, 0, MAX_LOCK_DEPTH*sizeof(struct held_lock));
+	nr_hardirq_chains = 0;
+	nr_softirq_chains = 0;
+	nr_process_chains = 0;
+	debug_locks = 1;
+	raw_local_irq_restore(flags);
+}
+
+static void zap_type(struct lock_type *type)
+{
+	int i;
+
+	/*
+	 * Remove all dependencies this lock is
+	 * involved in:
+	 */
+	for (i = 0; i < nr_list_entries; i++) {
+		if (list_entries[i].type == type)
+			list_del_rcu(&list_entries[i].entry);
+	}
+	/*
+	 * Unhash the type and remove it from the all_lock_types list:
+	 */
+	list_del_rcu(&type->hash_entry);
+	list_del_rcu(&type->lock_entry);
+
+}
+
+static inline int within(void *addr, void *start, unsigned long size)
+{
+	return addr >= start && addr < start + size;
+}
+
+void lockdep_free_key_range(void *start, unsigned long size)
+{
+	struct lock_type *type, *next;
+	struct list_head *head;
+	unsigned long flags;
+	int i;
+
+	raw_local_irq_save(flags);
+	__raw_spin_lock(&hash_lock);
+
+	/*
+	 * Unhash all types that were created by this module:
+	 */
+	for (i = 0; i < TYPEHASH_SIZE; i++) {
+		head = typehash_table + i;
+		if (list_empty(head))
+			continue;
+		list_for_each_entry_safe(type, next, head, hash_entry)
+			if (within(type->key, start, size))
+				zap_type(type);
+	}
+
+	__raw_spin_unlock(&hash_lock);
+	raw_local_irq_restore(flags);
+}
+
+void lockdep_reset_lock(struct lockdep_map *lock)
+{
+	struct lock_type *type, *next, *entry;
+	struct list_head *head;
+	unsigned long flags;
+	int i, j;
+
+	raw_local_irq_save(flags);
+	__raw_spin_lock(&hash_lock);
+
+	/*
+	 * Remove all types this lock has:
+	 */
+	for (i = 0; i < TYPEHASH_SIZE; i++) {
+		head = typehash_table + i;
+		if (list_empty(head))
+			continue;
+		list_for_each_entry_safe(type, next, head, hash_entry) {
+			for (j = 0; j < MAX_LOCKDEP_SUBTYPES; j++) {
+				entry = lock->type[j];
+				if (type == entry) {
+					zap_type(type);
+					lock->type[j] = NULL;
+					break;
+				}
+			}
+		}
+	}
+
+	/*
+	 * Debug check: in the end all mapped types should
+	 * be gone.
+	 */
+	for (j = 0; j < MAX_LOCKDEP_SUBTYPES; j++) {
+		entry = lock->type[j];
+		if (!entry)
+			continue;
+		__raw_spin_unlock(&hash_lock);
+		DEBUG_WARN_ON(1);
+		raw_local_irq_restore(flags);
+		return;
+	}
+
+	__raw_spin_unlock(&hash_lock);
+	raw_local_irq_restore(flags);
+}
+
+void __init lockdep_init(void)
+{
+	int i;
+
+	/*
+	 * Some architectures have their own start_kernel()
+	 * code which calls lockdep_init(), while we also
+	 * call lockdep_init() from the start_kernel() itself,
+	 * and we want to initialize the hashes only once:
+	 */
+	if (lockdep_initialized)
+		return;
+
+	for (i = 0; i < TYPEHASH_SIZE; i++)
+		INIT_LIST_HEAD(typehash_table + i);
+
+	for (i = 0; i < CHAINHASH_SIZE; i++)
+		INIT_LIST_HEAD(chainhash_table + i);
+
+	lockdep_initialized = 1;
+}
+
+void __init lockdep_info(void)
+{
+	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
+
+	printk("... MAX_LOCKDEP_SUBTYPES:    %lu\n", MAX_LOCKDEP_SUBTYPES);
+	printk("... MAX_LOCK_DEPTH:          %lu\n", MAX_LOCK_DEPTH);
+	printk("... MAX_LOCKDEP_KEYS:        %lu\n", MAX_LOCKDEP_KEYS);
+	printk("... TYPEHASH_SIZE:           %lu\n", TYPEHASH_SIZE);
+	printk("... MAX_LOCKDEP_ENTRIES:     %lu\n", MAX_LOCKDEP_ENTRIES);
+	printk("... MAX_LOCKDEP_CHAINS:      %lu\n", MAX_LOCKDEP_CHAINS);
+	printk("... CHAINHASH_SIZE:          %lu\n", CHAINHASH_SIZE);
+
+	printk(" memory used by lock dependency info: %lu kB\n",
+		(sizeof(struct lock_type) * MAX_LOCKDEP_KEYS +
+		sizeof(struct list_head) * TYPEHASH_SIZE +
+		sizeof(struct lock_list) * MAX_LOCKDEP_ENTRIES +
+		sizeof(struct lock_chain) * MAX_LOCKDEP_CHAINS +
+		sizeof(struct list_head) * CHAINHASH_SIZE) / 1024);
+
+	printk(" per task-struct memory footprint: %lu bytes\n",
+		sizeof(struct held_lock) * MAX_LOCK_DEPTH);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+	if (lockdep_init_error)
+		printk("WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?\n");
+#endif
+}
+
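
For readers following the switch in mark_lock() above, here is a minimal user-space sketch of the state-transition rule it enforces. It assumes a reduced set of four usage bits; the names usage_bit, conflicting_bit() and mark_usage() are hypothetical and do not appear in the patch. Unlike mark_lock(), the sketch checks before setting the bit, and it leaves out the dependency-graph walks done by check_usage_forwards() and check_usage_backwards().

```c
#include <assert.h>

/* Hypothetical, reduced set of usage bits (the patch also has _READ variants). */
enum usage_bit {
	USED_IN_HARDIRQ,
	USED_IN_SOFTIRQ,
	ENABLED_HARDIRQS,
	ENABLED_SOFTIRQS,
};

/* A lock used in an irq context must never be held with that irq enabled. */
static enum usage_bit conflicting_bit(enum usage_bit bit)
{
	switch (bit) {
	case USED_IN_HARDIRQ:	return ENABLED_HARDIRQS;
	case USED_IN_SOFTIRQ:	return ENABLED_SOFTIRQS;
	case ENABLED_HARDIRQS:	return USED_IN_HARDIRQ;
	default:		return USED_IN_SOFTIRQ;
	}
}

/*
 * Returns 1 if the bit is now set (or was set already), 0 if it
 * conflicts with a bit recorded earlier. On conflict the mask is
 * left unchanged, which is a simplification of this sketch.
 */
static int mark_usage(unsigned int *usage_mask, enum usage_bit new_bit)
{
	unsigned int new_mask = 1U << new_bit;

	/* Fast path: nothing to validate if the bit is already set. */
	if (*usage_mask & new_mask)
		return 1;
	if (*usage_mask & (1U << conflicting_bit(new_bit)))
		return 0;
	*usage_mask |= new_mask;
	return 1;
}
```

A lock type first seen in hardirq context and later held with hardirqs enabled makes mark_usage() return 0 on the second call, which is the point where the validator would print its report.
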
Index: linux/kernel/lockdep_internals.h
===================================================================
--- /dev/null
+++ linux/kernel/lockdep_internals.h
@@ -0,0 +1,93 @@
+/*
+ * kernel/lockdep_internals.h
+ *
+ * Runtime locking correctness validator
+ *
+ * lockdep subsystem internal functions and variables.
+ */
+
+/*
+ * MAX_LOCKDEP_ENTRIES is the maximum number of lock dependencies
+ * we track.
+ *
+ * We use the per-lock dependency maps in two ways: we grow them by
+ * adding every to-be-taken lock to each currently held lock's own
+ * dependency table (if it's not there yet), and we check them for
+ * lock order conflicts and deadlocks.
+ */
+#define MAX_LOCKDEP_ENTRIES	8192UL
+
+#define MAX_LOCKDEP_KEYS_BITS	11
+#define MAX_LOCKDEP_KEYS	(1UL << MAX_LOCKDEP_KEYS_BITS)
+
+#define MAX_LOCKDEP_CHAINS_BITS	13
+#define MAX_LOCKDEP_CHAINS	(1UL << MAX_LOCKDEP_CHAINS_BITS)
+
+/*
+ * Stack-trace: tightly packed array of stack backtrace
+ * addresses. Protected by the hash_lock.
+ */
+#define MAX_STACK_TRACE_ENTRIES	131072UL
+
+extern struct list_head all_lock_types;
+
+extern void
+get_usage_chars(struct lock_type *type, char *c1, char *c2, char *c3, char *c4);
+
+extern const char * __get_key_name(struct lockdep_subtype_key *key, char *str);
+
+extern unsigned long nr_lock_types;
+extern unsigned long nr_list_entries;
+extern unsigned long nr_lock_chains;
+extern unsigned long nr_stack_trace_entries;
+
+extern unsigned int nr_hardirq_chains;
+extern unsigned int nr_softirq_chains;
+extern unsigned int nr_process_chains;
+extern unsigned int max_lockdep_depth;
+extern unsigned int max_recursion_depth;
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+/*
+ * We cannot printk in early bootup code. Even early_printk()
+ * might not work. So we mark any initialization errors and printk
+ * about them later on, in lockdep_info().
+ */
+extern int lockdep_init_error;
+
+/*
+ * Various lockdep statistics:
+ */
+extern atomic_t chain_lookup_hits;
+extern atomic_t chain_lookup_misses;
+extern atomic_t hardirqs_on_events;
+extern atomic_t hardirqs_off_events;
+extern atomic_t redundant_hardirqs_on;
+extern atomic_t redundant_hardirqs_off;
+extern atomic_t softirqs_on_events;
+extern atomic_t softirqs_off_events;
+extern atomic_t redundant_softirqs_on;
+extern atomic_t redundant_softirqs_off;
+extern atomic_t nr_unused_locks;
+extern atomic_t nr_hardirq_safe_locks;
+extern atomic_t nr_softirq_safe_locks;
+extern atomic_t nr_hardirq_unsafe_locks;
+extern atomic_t nr_softirq_unsafe_locks;
+extern atomic_t nr_hardirq_read_safe_locks;
+extern atomic_t nr_softirq_read_safe_locks;
+extern atomic_t nr_hardirq_read_unsafe_locks;
+extern atomic_t nr_softirq_read_unsafe_locks;
+extern atomic_t nr_cyclic_checks;
+extern atomic_t nr_cyclic_check_recursions;
+extern atomic_t nr_find_usage_forwards_checks;
+extern atomic_t nr_find_usage_forwards_recursions;
+extern atomic_t nr_find_usage_backwards_checks;
+extern atomic_t nr_find_usage_backwards_recursions;
+# define debug_atomic_inc(ptr)		atomic_inc(ptr)
+# define debug_atomic_dec(ptr)		atomic_dec(ptr)
+# define debug_atomic_read(ptr)		atomic_read(ptr)
+#else
+# define debug_atomic_inc(ptr)		do { } while (0)
+# define debug_atomic_dec(ptr)		do { } while (0)
+# define debug_atomic_read(ptr)		0
+#endif
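
The debug_atomic_*() wrappers at the end of this header let the statistics counters vanish when CONFIG_DEBUG_LOCKDEP is off. A minimal user-space sketch of the same pattern follows. It assumes C11 <stdatomic.h> in place of the kernel's atomic_t, and DEBUG_STATS, debug_stat_inc(), debug_stat_read() and note_redundant_event() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEBUG_STATS 1	/* hypothetical stand-in for CONFIG_DEBUG_LOCKDEP */

#if DEBUG_STATS
# define debug_stat_inc(ptr)	atomic_fetch_add((ptr), 1)
# define debug_stat_read(ptr)	atomic_load(ptr)
#else
/* The disabled variants never evaluate their argument. */
# define debug_stat_inc(ptr)	do { } while (0)
# define debug_stat_read(ptr)	0
#endif

/* Zero-initialized because it has static storage duration. */
static atomic_int redundant_events;

static void note_redundant_event(void)
{
	debug_stat_inc(&redundant_events);
}
```

With DEBUG_STATS set to 0 the call sites still compile, but no counter is touched and the read macro yields a constant 0.
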
Index: linux/kernel/module.c
===================================================================
--- linux.orig/kernel/module.c
+++ linux/kernel/module.c
@@ -1151,6 +1151,9 @@ static void free_module(struct module *m
 	if (mod->percpu)
 		percpu_modfree(mod->percpu);
 
+	/* Free lock-types: */
+	lockdep_free_key_range(mod->module_core, mod->core_size);
+
 	/* Finally, free the core (containing the module structure) */
 	module_free(mod, mod->module_core);
 }
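
The hunk above hands the module's core area to lockdep_free_key_range(), which zaps every lock type whose key lies inside that range, using the within() helper from lockdep.c. A small user-space sketch of that half-open range test follows. It assumes char pointers instead of the void pointer arithmetic that the kernel relies on as a GNU extension, and the module_core array is a hypothetical stand-in for a module's memory.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range test: start <= addr < start + size. */
static int within(const void *addr, const void *start, size_t size)
{
	const char *a = addr, *s = start;

	return a >= s && a < s + size;
}

/* Hypothetical stand-in for a module's core area holding static lock keys. */
static char module_core[64];
```

The first byte past the area is outside the range, so a key placed directly after the module's memory would not be zapped.
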
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -57,7 +57,7 @@ config DEBUG_KERNEL
 config LOG_BUF_SHIFT
 	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)" if DEBUG_KERNEL
 	range 12 21
-	default 17 if S390
+	default 17 if S390 || LOCKDEP
 	default 16 if X86_NUMAQ || IA64
 	default 15 if SMP
 	default 14
Index: linux/lib/locking-selftest.c
===================================================================
--- linux.orig/lib/locking-selftest.c
+++ linux/lib/locking-selftest.c
@@ -15,6 +15,7 @@
 #include <linux/sched.h>
 #include <linux/delay.h>
 #include <linux/module.h>
+#include <linux/lockdep.h>
 #include <linux/spinlock.h>
 #include <linux/kallsyms.h>
 #include <linux/interrupt.h>
@@ -872,9 +873,6 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_
 #include "locking-selftest-softirq.h"
 // GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft)
 
-#define lockdep_reset()
-#define lockdep_reset_lock(x)
-
 #ifdef CONFIG_PROVE_SPIN_LOCKING
 # define I_SPINLOCK(x)	lockdep_reset_lock(&lock_##x.dep_map)
 #else
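
Looking back at the core patch: __lockdep_acquire() saves the previous chain key in each held_lock, lockdep_release_nested() restores it on a pop, and lockdep_release_non_nested() cuts the stack at the released entry and replays the entries above it. The sketch below models just that bookkeeping in user space. It assumes a hypothetical mixing function mix_key() (not the patch's iterate_chain_key()), a hypothetical MAX_DEPTH, and simplified struct held and struct task types; irq contexts, trylock and read handling are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 8	/* hypothetical; the patch uses MAX_LOCK_DEPTH */

struct held {
	unsigned int id;		/* compact lock-type id */
	uint64_t prev_chain_key;	/* key before this lock was taken */
};

struct task {
	struct held held[MAX_DEPTH];
	unsigned int depth;
	uint64_t chain_key;
};

/* Hypothetical mixing step; not the patch's iterate_chain_key(). */
static uint64_t mix_key(uint64_t key, unsigned int id)
{
	return (key ^ (id + 1)) * 0x9E3779B97F4A7C15ULL;
}

static int acquire(struct task *t, unsigned int id)
{
	struct held *h;

	if (t->depth >= MAX_DEPTH)
		return 0;
	h = &t->held[t->depth++];
	h->id = id;
	h->prev_chain_key = t->chain_key;
	t->chain_key = mix_key(t->chain_key, id);
	return 1;
}

/* Nested release: pop the top entry and restore the saved key. */
static int release_nested(struct task *t, unsigned int id)
{
	if (!t->depth || t->held[t->depth - 1].id != id)
		return 0;
	t->chain_key = t->held[--t->depth].prev_chain_key;
	return 1;
}

/* Non-nested release: cut the stack at the entry, then replay the rest. */
static int release_non_nested(struct task *t, unsigned int id)
{
	unsigned int depth = t->depth, i = depth;

	while (i > 0 && t->held[i - 1].id != id)
		i--;
	if (!i)
		return 0;	/* the lock is not held */
	i--;			/* index of the entry to remove */

	t->depth = i;
	t->chain_key = t->held[i].prev_chain_key;
	for (i++; i < depth; i++)
		if (!acquire(t, t->held[i].id))
			return 0;
	return 1;
}
```

After taking locks 1, 2 and 3 and releasing 2 out of order, the key equals the one obtained by taking only 1 and then 3, so later chain-cache lookups see a consistent state.
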

* [patch 24/61] lock validator: procfs
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (22 preceding siblings ...)
  2006-05-29 21:25 ` [patch 23/61] lock validator: core Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 25/61] lock validator: design docs Ingo Molnar
                   ` (49 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Add /proc/lockdep and /proc/lockdep_stats support to the lock validator.
(FIXME: these files should go into debugfs)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/Makefile       |    3 
 kernel/lockdep_proc.c |  345 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 348 insertions(+)
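
A note on count_forward_deps() and count_backward_deps() in the patch below: as written, they recurse over the dependency lists without remembering visited types, so a type reachable over two paths is counted once per path. The user-space sketch below illustrates that behaviour. It assumes a fixed-size array in place of the kernel's list_head lists, and struct type_node, MAX_AFTER and count_forward() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AFTER 4	/* hypothetical fixed fan-out for the sketch */

struct type_node {
	struct type_node *after[MAX_AFTER];	/* direct forward dependencies */
	size_t nr_after;
};

/* Counts the node itself plus everything reachable, once per path. */
static unsigned long count_forward(const struct type_node *n)
{
	unsigned long ret = 1;
	size_t i;

	for (i = 0; i < n->nr_after; i++)
		ret += count_forward(n->after[i]);
	return ret;
}
```

For a diamond-shaped graph (a before b and c, both before d) the sketch reports 5 for a rather than 4, because d is reached through both b and c.
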

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -13,6 +13,9 @@ obj-y     = sched.o fork.o exec_domain.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o
+ifeq ($(CONFIG_PROC_FS),y)
+obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
+endif
 obj-$(CONFIG_FUTEX) += futex.o
 ifeq ($(CONFIG_COMPAT),y)
 obj-$(CONFIG_FUTEX) += futex_compat.o
Index: linux/kernel/lockdep_proc.c
===================================================================
--- /dev/null
+++ linux/kernel/lockdep_proc.c
@@ -0,0 +1,345 @@
+/*
+ * kernel/lockdep_proc.c
+ *
+ * Runtime locking correctness validator
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * Code for /proc/lockdep and /proc/lockdep_stats:
+ *
+ */
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/debug_locks.h>
+
+#include "lockdep_internals.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct lock_type *type = v;
+
+	(*pos)++;
+
+	if (type->lock_entry.next != &all_lock_types)
+		type = list_entry(type->lock_entry.next, struct lock_type,
+				  lock_entry);
+	else
+		type = NULL;
+	m->private = type;
+
+	return type;
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+	struct lock_type *type = m->private;
+
+	if (&type->lock_entry == all_lock_types.next)
+		seq_printf(m, "all lock types:\n");
+
+	return type;
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static unsigned long count_forward_deps(struct lock_type *type)
+{
+	struct lock_list *entry;
+	unsigned long ret = 1;
+
+	/*
+	 * Recurse this type's dependency list:
+	 */
+	list_for_each_entry(entry, &type->locks_after, entry)
+		ret += count_forward_deps(entry->type);
+
+	return ret;
+}
+
+static unsigned long count_backward_deps(struct lock_type *type)
+{
+	struct lock_list *entry;
+	unsigned long ret = 1;
+
+	/*
+	 * Recurse this type's dependency list:
+	 */
+	list_for_each_entry(entry, &type->locks_before, entry)
+		ret += count_backward_deps(entry->type);
+
+	return ret;
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+	unsigned long nr_forward_deps, nr_backward_deps;
+	struct lock_type *type = m->private;
+	char str[128], c1, c2, c3, c4;
+	const char *name;
+
+	seq_printf(m, "%p", type->key);
+#ifdef CONFIG_DEBUG_LOCKDEP
+	seq_printf(m, " OPS:%8ld", type->ops);
+#endif
+	nr_forward_deps = count_forward_deps(type);
+	seq_printf(m, " FD:%5ld", nr_forward_deps);
+
+	nr_backward_deps = count_backward_deps(type);
+	seq_printf(m, " BD:%5ld", nr_backward_deps);
+
+	get_usage_chars(type, &c1, &c2, &c3, &c4);
+	seq_printf(m, " %c%c%c%c", c1, c2, c3, c4);
+
+	name = type->name;
+	if (!name) {
+		name = __get_key_name(type->key, str);
+		seq_printf(m, ": %s", name);
+	} else {
+		seq_printf(m, ": %s", name);
+		if (type->name_version > 1)
+			seq_printf(m, "#%d", type->name_version);
+		if (type->subtype)
+			seq_printf(m, "/%d", type->subtype);
+	}
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
+static struct seq_operations lockdep_ops = {
+	.start	= l_start,
+	.next	= l_next,
+	.stop	= l_stop,
+	.show	= l_show,
+};
+
+static int lockdep_open(struct inode *inode, struct file *file)
+{
+	int res = seq_open(file, &lockdep_ops);
+	if (!res) {
+		struct seq_file *m = file->private_data;
+
+		if (!list_empty(&all_lock_types))
+			m->private = list_entry(all_lock_types.next,
+					struct lock_type, lock_entry);
+		else
+			m->private = NULL;
+	}
+	return res;
+}
+
+static struct file_operations proc_lockdep_operations = {
+	.open		= lockdep_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
+static void lockdep_stats_debug_show(struct seq_file *m)
+{
+#ifdef CONFIG_DEBUG_LOCKDEP
+	unsigned int hi1 = debug_atomic_read(&hardirqs_on_events),
+		     hi2 = debug_atomic_read(&hardirqs_off_events),
+		     hr1 = debug_atomic_read(&redundant_hardirqs_on),
+		     hr2 = debug_atomic_read(&redundant_hardirqs_off),
+		     si1 = debug_atomic_read(&softirqs_on_events),
+		     si2 = debug_atomic_read(&softirqs_off_events),
+		     sr1 = debug_atomic_read(&redundant_softirqs_on),
+		     sr2 = debug_atomic_read(&redundant_softirqs_off);
+
+	seq_printf(m, " chain lookup misses:           %11u\n",
+		debug_atomic_read(&chain_lookup_misses));
+	seq_printf(m, " chain lookup hits:             %11u\n",
+		debug_atomic_read(&chain_lookup_hits));
+	seq_printf(m, " cyclic checks:                 %11u\n",
+		debug_atomic_read(&nr_cyclic_checks));
+	seq_printf(m, " cyclic-check recursions:       %11u\n",
+		debug_atomic_read(&nr_cyclic_check_recursions));
+	seq_printf(m, " find-mask forwards checks:     %11u\n",
+		debug_atomic_read(&nr_find_usage_forwards_checks));
+	seq_printf(m, " find-mask forwards recursions: %11u\n",
+		debug_atomic_read(&nr_find_usage_forwards_recursions));
+	seq_printf(m, " find-mask backwards checks:    %11u\n",
+		debug_atomic_read(&nr_find_usage_backwards_checks));
+	seq_printf(m, " find-mask backwards recursions:%11u\n",
+		debug_atomic_read(&nr_find_usage_backwards_recursions));
+
+	seq_printf(m, " hardirq on events:             %11u\n", hi1);
+	seq_printf(m, " hardirq off events:            %11u\n", hi2);
+	seq_printf(m, " redundant hardirq ons:         %11u\n", hr1);
+	seq_printf(m, " redundant hardirq offs:        %11u\n", hr2);
+	seq_printf(m, " softirq on events:             %11u\n", si1);
+	seq_printf(m, " softirq off events:            %11u\n", si2);
+	seq_printf(m, " redundant softirq ons:         %11u\n", sr1);
+	seq_printf(m, " redundant softirq offs:        %11u\n", sr2);
+#endif
+}
+
+static int lockdep_stats_show(struct seq_file *m, void *v)
+{
+	struct lock_type *type;
+	unsigned long nr_unused = 0, nr_uncategorized = 0,
+		      nr_irq_safe = 0, nr_irq_unsafe = 0,
+		      nr_softirq_safe = 0, nr_softirq_unsafe = 0,
+		      nr_hardirq_safe = 0, nr_hardirq_unsafe = 0,
+		      nr_irq_read_safe = 0, nr_irq_read_unsafe = 0,
+		      nr_softirq_read_safe = 0, nr_softirq_read_unsafe = 0,
+		      nr_hardirq_read_safe = 0, nr_hardirq_read_unsafe = 0,
+		      sum_forward_deps = 0, factor = 0;
+
+	list_for_each_entry(type, &all_lock_types, lock_entry) {
+
+		if (type->usage_mask == 0)
+			nr_unused++;
+		if (type->usage_mask == LOCKF_USED)
+			nr_uncategorized++;
+		if (type->usage_mask & LOCKF_USED_IN_IRQ)
+			nr_irq_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_IRQS)
+			nr_irq_unsafe++;
+		if (type->usage_mask & LOCKF_USED_IN_SOFTIRQ)
+			nr_softirq_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_SOFTIRQS)
+			nr_softirq_unsafe++;
+		if (type->usage_mask & LOCKF_USED_IN_HARDIRQ)
+			nr_hardirq_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_HARDIRQS)
+			nr_hardirq_unsafe++;
+		if (type->usage_mask & LOCKF_USED_IN_IRQ_READ)
+			nr_irq_read_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_IRQS_READ)
+			nr_irq_read_unsafe++;
+		if (type->usage_mask & LOCKF_USED_IN_SOFTIRQ_READ)
+			nr_softirq_read_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_SOFTIRQS_READ)
+			nr_softirq_read_unsafe++;
+		if (type->usage_mask & LOCKF_USED_IN_HARDIRQ_READ)
+			nr_hardirq_read_safe++;
+		if (type->usage_mask & LOCKF_ENABLED_HARDIRQS_READ)
+			nr_hardirq_read_unsafe++;
+
+		sum_forward_deps += count_forward_deps(type);
+	}
+#ifdef CONFIG_DEBUG_LOCKDEP
+	DEBUG_WARN_ON(debug_atomic_read(&nr_unused_locks) != nr_unused);
+#endif
+	seq_printf(m, " lock-types:                    %11lu [max: %lu]\n",
+			nr_lock_types, MAX_LOCKDEP_KEYS);
+	seq_printf(m, " direct dependencies:           %11lu [max: %lu]\n",
+			nr_list_entries, MAX_LOCKDEP_ENTRIES);
+	seq_printf(m, " indirect dependencies:         %11lu\n",
+			sum_forward_deps);
+
+	/*
+	 * Total number of dependencies:
+	 *
+	 * All irq-safe locks may nest inside irq-unsafe locks,
+	 * plus all the other known dependencies:
+	 */
+	seq_printf(m, " all direct dependencies:       %11lu\n",
+			nr_irq_unsafe * nr_irq_safe +
+			nr_hardirq_unsafe * nr_hardirq_safe +
+			nr_list_entries);
+
+	/*
+	 * Estimated factor between direct and indirect
+	 * dependencies:
+	 */
+	if (nr_list_entries)
+		factor = sum_forward_deps / nr_list_entries;
+
+	seq_printf(m, " dependency chains:             %11lu [max: %lu]\n",
+			nr_lock_chains, MAX_LOCKDEP_CHAINS);
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+	seq_printf(m, " in-hardirq chains:             %11u\n",
+			nr_hardirq_chains);
+	seq_printf(m, " in-softirq chains:             %11u\n",
+			nr_softirq_chains);
+#endif
+	seq_printf(m, " in-process chains:             %11u\n",
+			nr_process_chains);
+	seq_printf(m, " stack-trace entries:           %11lu [max: %lu]\n",
+			nr_stack_trace_entries, MAX_STACK_TRACE_ENTRIES);
+	seq_printf(m, " combined max dependencies:     %11u\n",
+			(nr_hardirq_chains + 1) *
+			(nr_softirq_chains + 1) *
+			(nr_process_chains + 1)
+	);
+	seq_printf(m, " hardirq-safe locks:            %11lu\n",
+			nr_hardirq_safe);
+	seq_printf(m, " hardirq-unsafe locks:          %11lu\n",
+			nr_hardirq_unsafe);
+	seq_printf(m, " softirq-safe locks:            %11lu\n",
+			nr_softirq_safe);
+	seq_printf(m, " softirq-unsafe locks:          %11lu\n",
+			nr_softirq_unsafe);
+	seq_printf(m, " irq-safe locks:                %11lu\n",
+			nr_irq_safe);
+	seq_printf(m, " irq-unsafe locks:              %11lu\n",
+			nr_irq_unsafe);
+
+	seq_printf(m, " hardirq-read-safe locks:       %11lu\n",
+			nr_hardirq_read_safe);
+	seq_printf(m, " hardirq-read-unsafe locks:     %11lu\n",
+			nr_hardirq_read_unsafe);
+	seq_printf(m, " softirq-read-safe locks:       %11lu\n",
+			nr_softirq_read_safe);
+	seq_printf(m, " softirq-read-unsafe locks:     %11lu\n",
+			nr_softirq_read_unsafe);
+	seq_printf(m, " irq-read-safe locks:           %11lu\n",
+			nr_irq_read_safe);
+	seq_printf(m, " irq-read-unsafe locks:         %11lu\n",
+			nr_irq_read_unsafe);
+
+	seq_printf(m, " uncategorized locks:           %11lu\n",
+			nr_uncategorized);
+	seq_printf(m, " unused locks:                  %11lu\n",
+			nr_unused);
+	seq_printf(m, " max locking depth:             %11u\n",
+			max_lockdep_depth);
+	seq_printf(m, " max recursion depth:           %11u\n",
+			max_recursion_depth);
+	lockdep_stats_debug_show(m);
+	seq_printf(m, " debug_locks:                   %11u\n",
+			debug_locks);
+
+	return 0;
+}
+
+static int lockdep_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, lockdep_stats_show, NULL);
+}
+
+static struct file_operations proc_lockdep_stats_operations = {
+	.open		= lockdep_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
+static int __init lockdep_proc_init(void)
+{
+	struct proc_dir_entry *entry;
+
+	entry = create_proc_entry("lockdep", S_IRUSR, NULL);
+	if (entry)
+		entry->proc_fops = &proc_lockdep_operations;
+
+	entry = create_proc_entry("lockdep_stats", S_IRUSR, NULL);
+	if (entry)
+		entry->proc_fops = &proc_lockdep_stats_operations;
+
+	return 0;
+}
+
+__initcall(lockdep_proc_init);
+

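A note on reading the FD:/BD: columns: count_forward_deps() above adds 1 for
the type itself and then recurses into every entry of locks_after without
remembering which types it has already visited. The following is a minimal
userspace sketch of that counting shape, not kernel code; struct toy_type,
toy_add_dep() and MAX_AFTER are hypothetical stand-ins for struct lock_type
and its locks_after list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AFTER 4	/* hypothetical cap for the sketch, not from the patch */

/* Simplified stand-in for struct lock_type: only the forward edges. */
struct toy_type {
	const char *name;
	struct toy_type *locks_after[MAX_AFTER];
	size_t nr_after;
};

/* Record that 'next' was taken while 'prev' was held. */
static void toy_add_dep(struct toy_type *prev, struct toy_type *next)
{
	assert(prev->nr_after < MAX_AFTER);
	prev->locks_after[prev->nr_after++] = next;
}

/* Same shape as count_forward_deps(): 1 for this node plus every subtree. */
static unsigned long toy_count_forward_deps(const struct toy_type *type)
{
	unsigned long ret = 1;
	size_t i;

	for (i = 0; i < type->nr_after; i++)
		ret += toy_count_forward_deps(type->locks_after[i]);

	return ret;
}
```

With a diamond-shaped graph (A -> B, A -> C, B -> D, C -> D) this returns 5
for A although only four types are involved, because D is reached along two
paths. Under that reading the FD/BD values are path counts rather than counts
of distinct types.
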
^ permalink raw reply	[flat|nested] 319+ messages in thread

* [patch 25/61] lock validator: design docs
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (23 preceding siblings ...)
  2006-05-29 21:25 ` [patch 24/61] lock validator: procfs Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-30  9:07   ` Nikita Danilov
  2006-05-29 21:25 ` [patch 26/61] lock validator: prove rwsem locking correctness Ingo Molnar
                   ` (48 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

lock validator design documentation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/lockdep-design.txt |  224 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

Index: linux/Documentation/lockdep-design.txt
===================================================================
--- /dev/null
+++ linux/Documentation/lockdep-design.txt
@@ -0,0 +1,224 @@
+Runtime locking correctness validator
+=====================================
+
+started by Ingo Molnar <mingo@redhat.com>
+additions by Arjan van de Ven <arjan@linux.intel.com>
+
+Lock-type
+---------
+
+The basic object the validator operates upon is the 'type' or 'class' of
+locks.
+
+A class of locks is a group of locks that are logically the same with
+respect to locking rules, even if the locks may have multiple (possibly
+tens of thousands of) instantiations. For example a lock in the inode
+struct is one class, while each inode has its own instantiation of that
+lock class.
+
+The validator tracks the 'state' of lock-types, and it tracks
+dependencies between different lock-types. The validator maintains a
+rolling proof that the state and the dependencies are correct.
+
+Unlike a lock instantiation, the lock-type itself never goes away: when
+a lock-type is used for the first time after bootup it gets registered,
+and all subsequent uses of that lock-type will be attached to this
+lock-type.
+
+State
+-----
+
+The validator records lock-type usage history in 5 separate state bits:
+
+- 'ever held in hardirq context'                    [ == hardirq-safe   ]
+- 'ever held in softirq context'                    [ == softirq-safe   ]
+- 'ever held with hardirqs enabled'                 [ == hardirq-unsafe ]
+- 'ever held with softirqs and hardirqs enabled'    [ == softirq-unsafe ]
+
+- 'ever used'                                       [ == !unused        ]
+
+Single-lock state rules:
+------------------------
+
+A softirq-unsafe lock-type is automatically hardirq-unsafe as well. The
+following states are exclusive, and only one of them is allowed to be
+set for any lock-type:
+
+ <hardirq-safe> and <hardirq-unsafe>
+ <softirq-safe> and <softirq-unsafe>
+
+The validator detects and reports lock usage that violates these
+single-lock state rules.
+
+Multi-lock dependency rules:
+----------------------------
+
+The same lock-type must not be acquired twice, because this could lead
+to lock recursion deadlocks.
+
+Furthermore, two locks may not be taken in different order:
+
+ <L1> -> <L2>
+ <L2> -> <L1>
+
+because this could lead to lock inversion deadlocks. (The validator
+finds such dependencies at arbitrary complexity, i.e. there can be any
+other locking sequence between the acquire-lock operations; the
+validator will still track all dependencies between locks.)
+
+Furthermore, the following usage based lock dependencies are not allowed
+between any two lock-types:
+
+   <hardirq-safe>   ->  <hardirq-unsafe>
+   <softirq-safe>   ->  <softirq-unsafe>
+
+The first rule comes from the fact that a hardirq-safe lock could be
+taken by a hardirq context, interrupting a hardirq-unsafe lock - and
+thus could result in a lock inversion deadlock. Likewise, a softirq-safe
+lock could be taken by a softirq context, interrupting a softirq-unsafe
+lock.
+
+The above rules are enforced for any locking sequence that occurs in the
+kernel: when acquiring a new lock, the validator checks whether there is
+any rule violation between the new lock and any of the held locks.
+
+When a lock-type changes its state, the following aspects of the above
+dependency rules are enforced:
+
+- if a new hardirq-safe lock is discovered, we check whether it
+  took any hardirq-unsafe lock in the past.
+
+- if a new softirq-safe lock is discovered, we check whether it took
+  any softirq-unsafe lock in the past.
+
+- if a new hardirq-unsafe lock is discovered, we check whether any
+  hardirq-safe lock took it in the past.
+
+- if a new softirq-unsafe lock is discovered, we check whether any
+  softirq-safe lock took it in the past.
+
+(Again, we do these checks too on the basis that an interrupt context
+could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
+could lead to a lock inversion deadlock - even if that lock scenario did
+not trigger in practice yet.)
+
+Exception 1: Nested data types leading to nested locking
+--------------------------------------------------------
+
+There are a few cases where the Linux kernel acquires more than one
+instance of the same lock-type. Such cases typically happen when there
+is some sort of hierarchy within objects of the same type. In these
+cases there is an inherent "natural" ordering between the two objects
+(defined by the properties of the hierarchy), and the kernel grabs the
+locks in this fixed order on each of the objects.
+
+An example of such an object hierarchy that results in "nested locking"
+is that of a "whole disk" block-dev object and a "partition" block-dev
+object; the partition is "part of" the whole device and as long as one
+always takes the whole disk lock as a higher lock than the partition
+lock, the lock ordering is fully correct. The validator does not
+automatically detect this natural ordering, as the locking rule behind
+the ordering is not static.
+
+In order to teach the validator about this correct usage model, new
+versions of the various locking primitives were added that allow you to
+specify a "nesting level". An example call, for the block device mutex,
+looks like this:
+
+enum bdev_bd_mutex_lock_type
+{
+       BD_MUTEX_NORMAL,
+       BD_MUTEX_WHOLE,
+       BD_MUTEX_PARTITION
+};
+
+ mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+
+In this case the locking is done on a bdev object that is known to be a
+partition.
+
+The validator treats a lock that is taken in such a nested fashion as a
+separate (sub)class for the purposes of validation.
+
+Note: When changing code to use the _nested() primitives, be careful and
+check really thoroughly that the hierarchy is correctly mapped; otherwise
+you can get false positives or false negatives.
+
+Exception 2: Out of order unlocking
+-----------------------------------
+
+In the Linux kernel, locks are released in the reverse of the order in
+which they were taken, with a few exceptions. The validator is optimized
+for the common case, and in fact treats an "out of order" unlock as a
+locking bug. (The rationale is that the code is doing something rare,
+which can be a sign of a bug.)
+
+There are some cases where releasing the locks out of order is
+unavoidable and dictated by the algorithm that is being implemented.
+Therefore, the validator can be told about this, using a special
+unlocking variant of the primitives. An example call looks like this:
+
+ spin_unlock_non_nested(&target->d_lock);
+
+Here the d_lock is released by the VFS in a different order than it was
+taken, as required by the d_move() algorithm.
+
+Note: the _non_nested() primitives are more expensive than the "normal"
+primitives, and in almost all cases it's trivial to use the natural
+unlock order. Doing so has benefits beyond the validator as well, so it
+is strongly suggested to make sure that unlocking always happens in the
+natural order whenever reasonable, rather than blindly changing code to
+use the _non_nested() variants.
+
+Proof of 100% correctness:
+--------------------------
+
+The validator achieves perfect, mathematical 'closure' (proof of locking
+correctness) in the sense that for every simple, standalone single-task
+locking sequence that occurred at least once during the lifetime of the
+kernel, the validator proves with 100% certainty that no combination
+and timing of these locking sequences can cause any type of
+lock related deadlock. [*]
+
+I.e. complex multi-CPU and multi-task locking scenarios do not have to
+occur in practice to prove a deadlock: only the simple 'component'
+locking chains have to occur at least once (anytime, in any
+task/context) for the validator to be able to prove correctness. (For
+example, complex deadlocks that would normally need more than 3 CPUs and
+a very unlikely constellation of tasks, irq-contexts and timings to
+occur, can be detected on a plain, lightly loaded single-CPU system as
+well!)
+
+This radically decreases the complexity of locking related QA of the
+kernel: what has to be done during QA is to trigger as many "simple"
+single-task locking dependencies in the kernel as possible, at least
+once, to prove locking correctness - instead of having to trigger every
+possible combination of locking interaction between CPUs, combined with
+every possible hardirq and softirq nesting scenario (which is impossible
+to do in practice).
+
+[*] assuming that the validator itself is 100% correct, and no other
+    part of the system corrupts the state of the validator in any way.
+    We also assume that all NMI/SMM paths [which could interrupt
+    even hardirq-disabled codepaths] are correct and do not interfere
+    with the validator. We also assume that the 64-bit 'chain hash'
+    value is unique for every lock-chain in the system. Also, lock
+    recursion must not be higher than 20.
+
+Performance:
+------------
+
+The above rules require _massive_ amounts of runtime checking. If we did
+that for every lock taken and for every irqs-enable event, it would
+render the system practically unusably slow. The complexity of checking
+is O(N^2), so even with just a few hundred lock-types we'd have to do
+tens of thousands of checks for every event.
+
+This problem is solved by checking any given 'locking scenario' (unique
+sequence of locks taken after each other) only once. A simple stack of
+held locks is maintained, and a lightweight 64-bit hash value is
+calculated that is unique for every lock chain. When the chain is
+validated for the first time, the hash value is put into a hash table,
+which can be checked in a lockfree manner. If the locking chain occurs
+again later on, the hash table tells us that we don't have to validate
+the chain again.

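The "validate each chain only once" idea from the Performance section can be
sketched in userspace C as follows. This is an illustration under stated
assumptions, not the validator's implementation: the toy_* names, the table
size, the FNV-1a style hash and the linear-probing table are all hypothetical,
and the sketch is single-threaded, whereas the document describes a lockfree
lookup. Like the footnote above, it treats equal hash values as equal chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEPTH	8	/* hypothetical limits for the sketch */
#define TOY_TABLE_SIZE	64

struct toy_task {
	unsigned int held[TOY_MAX_DEPTH];	/* stack of held lock-type ids */
	size_t depth;
};

static uint64_t toy_seen[TOY_TABLE_SIZE];	/* hashes of validated chains */
static bool toy_used[TOY_TABLE_SIZE];
static unsigned long toy_full_validations;

/* FNV-1a style mixing; stands in for whatever hash the validator uses. */
static uint64_t toy_chain_hash(const struct toy_task *t)
{
	uint64_t h = 14695981039346656037ULL;
	size_t i;

	for (i = 0; i < t->depth; i++) {
		h ^= t->held[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Returns true if the chain was already known, otherwise records it. */
static bool toy_chain_lookup(uint64_t hash)
{
	size_t slot = hash % TOY_TABLE_SIZE;
	size_t probes;

	for (probes = 0; probes < TOY_TABLE_SIZE; probes++) {
		if (!toy_used[slot]) {
			toy_used[slot] = true;
			toy_seen[slot] = hash;
			return false;
		}
		if (toy_seen[slot] == hash)
			return true;
		slot = (slot + 1) % TOY_TABLE_SIZE;
	}
	return false;	/* table full: treat the chain as unseen */
}

/* Push a lock; run the (imaginary) expensive checks only for new chains. */
static void toy_lock(struct toy_task *t, unsigned int type_id)
{
	assert(t->depth < TOY_MAX_DEPTH);
	t->held[t->depth++] = type_id;

	if (!toy_chain_lookup(toy_chain_hash(t)))
		toy_full_validations++;	/* full validation would happen here */
}

/* Pop the most recently taken lock (natural unlock order only). */
static void toy_unlock(struct toy_task *t)
{
	assert(t->depth > 0);
	t->depth--;
}
```

Taking locks 1 then 2 validates two chains ([1] and [1,2]); repeating the same
sequence any number of times adds no further validations, while taking 2 then
1 introduces two new chains ([2] and [2,1]).
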
^ permalink raw reply	[flat|nested] 319+ messages in thread

* [patch 26/61] lock validator: prove rwsem locking correctness
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (24 preceding siblings ...)
  2006-05-29 21:25 ` [patch 25/61] lock validator: design docs Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 27/61] lock validator: prove spinlock/rwlock " Ingo Molnar
                   ` (47 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

add CONFIG_PROVE_RWSEM_LOCKING, which uses the lock validator framework
to prove rwsem locking correctness.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/asm-i386/rwsem.h       |   38 +++++++++++++++++++--------
 include/linux/rwsem-spinlock.h |   23 +++++++++++++++-
 include/linux/rwsem.h          |   56 +++++++++++++++++++++++++++++++++++++++++
 lib/rwsem-spinlock.c           |   15 ++++++++--
 lib/rwsem.c                    |   19 +++++++++++++
 5 files changed, 135 insertions(+), 16 deletions(-)

Index: linux/include/asm-i386/rwsem.h
===================================================================
--- linux.orig/include/asm-i386/rwsem.h
+++ linux/include/asm-i386/rwsem.h
@@ -40,6 +40,7 @@
 
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <linux/lockdep.h>
 
 struct rwsem_waiter;
 
@@ -64,6 +65,9 @@ struct rw_semaphore {
 #if RWSEM_DEBUG
 	int			debug;
 #endif
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+	struct lockdep_map dep_map;
+#endif
 };
 
 /*
@@ -75,22 +79,29 @@ struct rw_semaphore {
 #define __RWSEM_DEBUG_INIT	/* */
 #endif
 
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+#else
+# define __RWSEM_DEP_MAP_INIT(lockname)
+#endif
+
+
 #define __RWSEM_INITIALIZER(name) \
 { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) \
-	__RWSEM_DEBUG_INIT }
+	__RWSEM_DEBUG_INIT __RWSEM_DEP_MAP_INIT(name) }
 
 #define DECLARE_RWSEM(name) \
 	struct rw_semaphore name = __RWSEM_INITIALIZER(name)
 
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
-	sem->count = RWSEM_UNLOCKED_VALUE;
-	spin_lock_init(&sem->wait_lock);
-	INIT_LIST_HEAD(&sem->wait_list);
-#if RWSEM_DEBUG
-	sem->debug = 0;
-#endif
-}
+extern void __init_rwsem(struct rw_semaphore *sem, const char *name,
+			 struct lockdep_type_key *key);
+
+#define init_rwsem(sem)						\
+do {								\
+	static struct lockdep_type_key __key;			\
+								\
+	__init_rwsem((sem), #sem, &__key);			\
+} while (0)
 
 /*
  * lock for reading
@@ -143,7 +154,7 @@ LOCK_PREFIX	"  cmpxchgl  %2,%0\n\t"
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline void __down_write_nested(struct rw_semaphore *sem, int subtype)
 {
 	int tmp;
 
@@ -167,6 +178,11 @@ LOCK_PREFIX	"  xadd      %%edx,(%%eax)\n
 		: "memory", "cc");
 }
 
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	__down_write_nested(sem, 0);
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
Index: linux/include/linux/rwsem-spinlock.h
===================================================================
--- linux.orig/include/linux/rwsem-spinlock.h
+++ linux/include/linux/rwsem-spinlock.h
@@ -35,6 +35,9 @@ struct rw_semaphore {
 #if RWSEM_DEBUG
 	int			debug;
 #endif
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+	struct lockdep_map dep_map;
+#endif
 };
 
 /*
@@ -46,16 +49,32 @@ struct rw_semaphore {
 #define __RWSEM_DEBUG_INIT	/* */
 #endif
 
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+#else
+# define __RWSEM_DEP_MAP_INIT(lockname)
+#endif
+
 #define __RWSEM_INITIALIZER(name) \
-{ 0, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) __RWSEM_DEBUG_INIT }
+{ 0, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) __RWSEM_DEBUG_INIT __RWSEM_DEP_MAP_INIT(name) }
 
 #define DECLARE_RWSEM(name) \
 	struct rw_semaphore name = __RWSEM_INITIALIZER(name)
 
-extern void FASTCALL(init_rwsem(struct rw_semaphore *sem));
+extern void __init_rwsem(struct rw_semaphore *sem, const char *name,
+			 struct lockdep_type_key *key);
+
+#define init_rwsem(sem)						\
+do {								\
+	static struct lockdep_type_key __key;			\
+								\
+	__init_rwsem((sem), #sem, &__key);			\
+} while (0)
+
 extern void FASTCALL(__down_read(struct rw_semaphore *sem));
 extern int FASTCALL(__down_read_trylock(struct rw_semaphore *sem));
 extern void FASTCALL(__down_write(struct rw_semaphore *sem));
+extern void FASTCALL(__down_write_nested(struct rw_semaphore *sem, int subtype));
 extern int FASTCALL(__down_write_trylock(struct rw_semaphore *sem));
 extern void FASTCALL(__up_read(struct rw_semaphore *sem));
 extern void FASTCALL(__up_write(struct rw_semaphore *sem));
Index: linux/include/linux/rwsem.h
===================================================================
--- linux.orig/include/linux/rwsem.h
+++ linux/include/linux/rwsem.h
@@ -40,6 +40,20 @@ extern void FASTCALL(rwsemtrace(struct r
 static inline void down_read(struct rw_semaphore *sem)
 {
 	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, 0, 0, _THIS_IP_);
+
+	rwsemtrace(sem,"Entering down_read");
+	__down_read(sem);
+	rwsemtrace(sem,"Leaving down_read");
+}
+
+/*
+ * Take a lock that a task other than the owner will release:
+ */
+static inline void down_read_non_owner(struct rw_semaphore *sem)
+{
+	might_sleep();
+
 	rwsemtrace(sem,"Entering down_read");
 	__down_read(sem);
 	rwsemtrace(sem,"Leaving down_read");
@@ -53,6 +67,8 @@ static inline int down_read_trylock(stru
 	int ret;
 	rwsemtrace(sem,"Entering down_read_trylock");
 	ret = __down_read_trylock(sem);
+	if (ret == 1)
+		rwsem_acquire_read(&sem->dep_map, 0, 1, _THIS_IP_);
 	rwsemtrace(sem,"Leaving down_read_trylock");
 	return ret;
 }
@@ -63,12 +79,28 @@ static inline int down_read_trylock(stru
 static inline void down_write(struct rw_semaphore *sem)
 {
 	might_sleep();
+	rwsem_acquire(&sem->dep_map, 0, 0, _THIS_IP_);
+
 	rwsemtrace(sem,"Entering down_write");
 	__down_write(sem);
 	rwsemtrace(sem,"Leaving down_write");
 }
 
 /*
+ * lock for writing, with a lockdep subtype (nesting level)
+ */
+static inline void down_write_nested(struct rw_semaphore *sem, int subtype)
+{
+	might_sleep();
+	rwsem_acquire(&sem->dep_map, subtype, 0, _THIS_IP_);
+
+	rwsemtrace(sem,"Entering down_write_nested");
+	__down_write_nested(sem, subtype);
+	rwsemtrace(sem,"Leaving down_write_nested");
+}
+
+
+/*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
 static inline int down_write_trylock(struct rw_semaphore *sem)
@@ -76,6 +108,8 @@ static inline int down_write_trylock(str
 	int ret;
 	rwsemtrace(sem,"Entering down_write_trylock");
 	ret = __down_write_trylock(sem);
+	if (ret == 1)
+		rwsem_acquire(&sem->dep_map, 0, 1, _THIS_IP_);
 	rwsemtrace(sem,"Leaving down_write_trylock");
 	return ret;
 }
@@ -85,16 +119,34 @@ static inline int down_write_trylock(str
  */
 static inline void up_read(struct rw_semaphore *sem)
 {
+	rwsem_release(&sem->dep_map, 1, _THIS_IP_);
+
 	rwsemtrace(sem,"Entering up_read");
 	__up_read(sem);
 	rwsemtrace(sem,"Leaving up_read");
 }
 
+static inline void up_read_non_nested(struct rw_semaphore *sem)
+{
+	rwsem_release(&sem->dep_map, 0, _THIS_IP_);
+	__up_read(sem);
+}
+
+/*
+ * Release a lock that was taken by a different task (non-owner):
+ */
+static inline void up_read_non_owner(struct rw_semaphore *sem)
+{
+	__up_read(sem);
+}
+
 /*
  * release a write lock
  */
 static inline void up_write(struct rw_semaphore *sem)
 {
+	rwsem_release(&sem->dep_map, 1, _THIS_IP_);
+
 	rwsemtrace(sem,"Entering up_write");
 	__up_write(sem);
 	rwsemtrace(sem,"Leaving up_write");
@@ -105,6 +157,10 @@ static inline void up_write(struct rw_se
  */
 static inline void downgrade_write(struct rw_semaphore *sem)
 {
+	/*
+	 * lockdep: a downgraded write will live on as a write
+	 * dependency.
+	 */
 	rwsemtrace(sem,"Entering downgrade_write");
 	__downgrade_write(sem);
 	rwsemtrace(sem,"Leaving downgrade_write");
Index: linux/lib/rwsem-spinlock.c
===================================================================
--- linux.orig/lib/rwsem-spinlock.c
+++ linux/lib/rwsem-spinlock.c
@@ -30,7 +30,8 @@ void rwsemtrace(struct rw_semaphore *sem
 /*
  * initialise the semaphore
  */
-void fastcall init_rwsem(struct rw_semaphore *sem)
+void __init_rwsem(struct rw_semaphore *sem, const char *name,
+		  struct lockdep_type_key *key)
 {
 	sem->activity = 0;
 	spin_lock_init(&sem->wait_lock);
@@ -38,6 +39,9 @@ void fastcall init_rwsem(struct rw_semap
 #if RWSEM_DEBUG
 	sem->debug = 0;
 #endif
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+	lockdep_init_map(&sem->dep_map, name, key);
+#endif
 }
 
 /*
@@ -204,7 +208,7 @@ int fastcall __down_read_trylock(struct 
  * get a write lock on the semaphore
  * - we increment the waiting count anyway to indicate an exclusive lock
  */
-void fastcall __sched __down_write(struct rw_semaphore *sem)
+void fastcall __sched __down_write_nested(struct rw_semaphore *sem, int subtype)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
@@ -247,6 +251,11 @@ void fastcall __sched __down_write(struc
 	rwsemtrace(sem, "Leaving __down_write");
 }
 
+void fastcall __sched __down_write(struct rw_semaphore *sem)
+{
+	__down_write_nested(sem, 0);
+}
+
 /*
  * trylock for writing -- returns 1 if successful, 0 if contention
  */
@@ -331,7 +340,7 @@ void fastcall __downgrade_write(struct r
 	rwsemtrace(sem, "Leaving __downgrade_write");
 }
 
-EXPORT_SYMBOL(init_rwsem);
+EXPORT_SYMBOL(__init_rwsem);
 EXPORT_SYMBOL(__down_read);
 EXPORT_SYMBOL(__down_read_trylock);
 EXPORT_SYMBOL(__down_write);
Index: linux/lib/rwsem.c
===================================================================
--- linux.orig/lib/rwsem.c
+++ linux/lib/rwsem.c
@@ -8,6 +8,25 @@
 #include <linux/init.h>
 #include <linux/module.h>
 
+/*
+ * Initialize an rwsem:
+ */
+void __init_rwsem(struct rw_semaphore *sem, const char *name,
+		  struct lockdep_type_key *key)
+{
+	sem->count = RWSEM_UNLOCKED_VALUE;
+	spin_lock_init(&sem->wait_lock);
+	INIT_LIST_HEAD(&sem->wait_list);
+#if RWSEM_DEBUG
+	sem->debug = 0;
+#endif
+#ifdef CONFIG_PROVE_RWSEM_LOCKING
+	lockdep_init_map(&sem->dep_map, name, key);
+#endif
+}
+
+EXPORT_SYMBOL(__init_rwsem);
+
 struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;


* [patch 27/61] lock validator: prove spinlock/rwlock locking correctness
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (25 preceding siblings ...)
  2006-05-29 21:25 ` [patch 26/61] lock validator: prove rwsem locking correctness Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:25 ` [patch 28/61] lock validator: prove mutex " Ingo Molnar
                   ` (46 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

add CONFIG_PROVE_SPIN_LOCKING and CONFIG_PROVE_RW_LOCKING, which use
the lock validator framework to prove spinlock and rwlock locking
correctness.
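
For readers following the spin_lock_init() changes in the diff below, here
is a minimal userspace sketch of the "one static key per init site" idiom
that the new macros rely on. It is an illustration only: every demo_* name
is hypothetical and not part of the patch, and the "key" here is just an
address used as an identity, which is an assumption about how the validator
tells lock types apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real API. */
struct demo_key { char dummy; };

struct demo_lock {
	const char *name;
	const struct demo_key *key;	/* identity of the lock "type" */
};

static void demo_lock_init_map(struct demo_lock *lock, const char *name,
			       const struct demo_key *key)
{
	lock->name = name;
	lock->key = key;
}

/* One static key per textual init site, mirroring spin_lock_init(). */
#define demo_lock_init(lock)					\
do {								\
	static struct demo_key key_;				\
								\
	demo_lock_init_map((lock), #lock, &key_);		\
} while (0)

/* The lock's own address as key, mirroring spin_lock_init_static(). */
#define demo_lock_init_static(lock)				\
	demo_lock_init_map((lock), #lock,			\
			   (const struct demo_key *)(lock))

static struct demo_lock a, b, table[4], split[4];

static void demo_setup(void)
{
	size_t i;

	demo_lock_init(&a);			/* init site 1 */
	demo_lock_init(&b);			/* init site 2 */
	for (i = 0; i < 4; i++)
		demo_lock_init(&table[i]);	/* one site, one shared key */
	for (i = 0; i < 4; i++)
		demo_lock_init_static(&split[i]); /* one key per lock */
}

static void demo_run(void)
{
	demo_setup();
	assert(a.key != b.key);			/* different sites */
	assert(table[0].key == table[3].key);	/* same site in a loop */
	assert(split[0].key != split[1].key);	/* split per lock */
}
```

Calling demo_run() from a main() exercises all three cases. The point of
the sketch is that locks initialized in a loop share one key unless the
_static or _key style of variant is used to split them.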

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/asm-i386/spinlock.h       |    2 
 include/linux/spinlock.h          |   96 ++++++++++++++++++++++-----
 include/linux/spinlock_api_smp.h  |    4 +
 include/linux/spinlock_api_up.h   |    3 
 include/linux/spinlock_types.h    |   32 ++++++++-
 include/linux/spinlock_types_up.h |   10 ++
 include/linux/spinlock_up.h       |    4 -
 kernel/Makefile                   |    2 
 kernel/sched.c                    |   10 ++
 kernel/spinlock.c                 |  131 +++++++++++++++++++++++++++++++++++---
 lib/kernel_lock.c                 |    7 +-
 net/ipv4/route.c                  |    4 -
 12 files changed, 269 insertions(+), 36 deletions(-)

Index: linux/include/asm-i386/spinlock.h
===================================================================
--- linux.orig/include/asm-i386/spinlock.h
+++ linux/include/asm-i386/spinlock.h
@@ -68,6 +68,7 @@ static inline void __raw_spin_lock(raw_s
 		"=m" (lock->slock) : : "memory");
 }
 
+#ifndef CONFIG_PROVE_SPIN_LOCKING
 static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long flags)
 {
 	alternative_smp(
@@ -75,6 +76,7 @@ static inline void __raw_spin_lock_flags
 		__raw_spin_lock_string_up,
 		"=m" (lock->slock) : "r" (flags) : "memory");
 }
+#endif
 
 static inline int __raw_spin_trylock(raw_spinlock_t *lock)
 {
Index: linux/include/linux/spinlock.h
===================================================================
--- linux.orig/include/linux/spinlock.h
+++ linux/include/linux/spinlock.h
@@ -82,14 +82,64 @@ extern int __lockfunc generic__raw_read_
 /*
  * Pull the __raw*() functions/declarations (UP-nondebug doesnt need them):
  */
-#if defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
 # include <asm/spinlock.h>
 #else
 # include <linux/spinlock_up.h>
 #endif
 
-#define spin_lock_init(lock)	do { *(lock) = SPIN_LOCK_UNLOCKED; } while (0)
-#define rwlock_init(lock)	do { *(lock) = RW_LOCK_UNLOCKED; } while (0)
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PROVE_SPIN_LOCKING)
+  extern void __spin_lock_init(spinlock_t *lock, const char *name,
+			       struct lockdep_type_key *key);
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lockdep_type_key __key;			\
+								\
+	__spin_lock_init((lock), #lock, &__key);		\
+} while (0)
+
+/*
+ * If for example an array of static locks is initialized
+ * via spin_lock_init(), this API variant can be used to
+ * split the lock-types of them:
+ */
+# define spin_lock_init_static(lock)				\
+	__spin_lock_init((lock), #lock,				\
+			 (struct lockdep_type_key *)(lock))	\
+
+/*
+ * Type splitting can also be done for dynamic locks, if for
+ * example there are per-CPU dynamically allocated locks:
+ */
+# define spin_lock_init_key(lock, key)				\
+	__spin_lock_init((lock), #lock, key)
+
+#else
+# define spin_lock_init(lock)					\
+	do { *(lock) = SPIN_LOCK_UNLOCKED; } while (0)
+# define spin_lock_init_static(lock) 				\
+	spin_lock_init(lock)
+# define spin_lock_init_key(lock, key)				\
+	do { spin_lock_init(lock); (void)(key); } while (0)
+#endif
+
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PROVE_RW_LOCKING)
+  extern void __rwlock_init(rwlock_t *lock, const char *name,
+			    struct lockdep_type_key *key);
+# define rwlock_init(lock)					\
+do {								\
+	static struct lockdep_type_key __key;			\
+								\
+	__rwlock_init((lock), #lock, &__key);			\
+} while (0)
+# define rwlock_init_key(lock, key)				\
+	__rwlock_init((lock), #lock, key)
+#else
+# define rwlock_init(lock)					\
+	do { *(lock) = RW_LOCK_UNLOCKED; } while (0)
+# define rwlock_init_key(lock, key)				\
+	do { rwlock_init(lock); (void)(key); } while (0)
+#endif
 
 #define spin_is_locked(lock)	__raw_spin_is_locked(&(lock)->raw_lock)
 
@@ -102,7 +152,9 @@ extern int __lockfunc generic__raw_read_
 /*
  * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
  */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 # include <linux/spinlock_api_smp.h>
 #else
 # include <linux/spinlock_api_up.h>
@@ -113,7 +165,6 @@ extern int __lockfunc generic__raw_read_
 #define _raw_spin_lock_flags(lock, flags) _raw_spin_lock(lock)
  extern int _raw_spin_trylock(spinlock_t *lock);
  extern void _raw_spin_unlock(spinlock_t *lock);
-
  extern void _raw_read_lock(rwlock_t *lock);
  extern int _raw_read_trylock(rwlock_t *lock);
  extern void _raw_read_unlock(rwlock_t *lock);
@@ -121,17 +172,17 @@ extern int __lockfunc generic__raw_read_
  extern int _raw_write_trylock(rwlock_t *lock);
  extern void _raw_write_unlock(rwlock_t *lock);
 #else
-# define _raw_spin_unlock(lock)		__raw_spin_unlock(&(lock)->raw_lock)
-# define _raw_spin_trylock(lock)	__raw_spin_trylock(&(lock)->raw_lock)
 # define _raw_spin_lock(lock)		__raw_spin_lock(&(lock)->raw_lock)
 # define _raw_spin_lock_flags(lock, flags) \
 		__raw_spin_lock_flags(&(lock)->raw_lock, *(flags))
+# define _raw_spin_trylock(lock)	__raw_spin_trylock(&(lock)->raw_lock)
+# define _raw_spin_unlock(lock)		__raw_spin_unlock(&(lock)->raw_lock)
 # define _raw_read_lock(rwlock)		__raw_read_lock(&(rwlock)->raw_lock)
-# define _raw_write_lock(rwlock)	__raw_write_lock(&(rwlock)->raw_lock)
-# define _raw_read_unlock(rwlock)	__raw_read_unlock(&(rwlock)->raw_lock)
-# define _raw_write_unlock(rwlock)	__raw_write_unlock(&(rwlock)->raw_lock)
 # define _raw_read_trylock(rwlock)	__raw_read_trylock(&(rwlock)->raw_lock)
+# define _raw_read_unlock(rwlock)	__raw_read_unlock(&(rwlock)->raw_lock)
+# define _raw_write_lock(rwlock)	__raw_write_lock(&(rwlock)->raw_lock)
 # define _raw_write_trylock(rwlock)	__raw_write_trylock(&(rwlock)->raw_lock)
+# define _raw_write_unlock(rwlock)	__raw_write_unlock(&(rwlock)->raw_lock)
 #endif
 
 #define read_can_lock(rwlock)		__raw_read_can_lock(&(rwlock)->raw_lock)
@@ -147,10 +198,14 @@ extern int __lockfunc generic__raw_read_
 #define write_trylock(lock)		__cond_lock(_write_trylock(lock))
 
 #define spin_lock(lock)			_spin_lock(lock)
+#define spin_lock_nested(lock, subtype) \
+					_spin_lock_nested(lock, subtype)
 #define write_lock(lock)		_write_lock(lock)
 #define read_lock(lock)			_read_lock(lock)
 
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 #define spin_lock_irqsave(lock, flags)	flags = _spin_lock_irqsave(lock)
 #define read_lock_irqsave(lock, flags)	flags = _read_lock_irqsave(lock)
 #define write_lock_irqsave(lock, flags)	flags = _write_lock_irqsave(lock)
@@ -172,21 +227,24 @@ extern int __lockfunc generic__raw_read_
 /*
  * We inline the unlock functions in the nondebug case:
  */
-#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP)
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
+	!defined(CONFIG_SMP) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 # define spin_unlock(lock)		_spin_unlock(lock)
+# define spin_unlock_non_nested(lock)	_spin_unlock_non_nested(lock)
 # define read_unlock(lock)		_read_unlock(lock)
+# define read_unlock_non_nested(lock)	_read_unlock_non_nested(lock)
 # define write_unlock(lock)		_write_unlock(lock)
-#else
-# define spin_unlock(lock)		__raw_spin_unlock(&(lock)->raw_lock)
-# define read_unlock(lock)		__raw_read_unlock(&(lock)->raw_lock)
-# define write_unlock(lock)		__raw_write_unlock(&(lock)->raw_lock)
-#endif
-
-#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP)
 # define spin_unlock_irq(lock)		_spin_unlock_irq(lock)
 # define read_unlock_irq(lock)		_read_unlock_irq(lock)
 # define write_unlock_irq(lock)		_write_unlock_irq(lock)
 #else
+# define spin_unlock(lock)		__raw_spin_unlock(&(lock)->raw_lock)
+# define spin_unlock_non_nested(lock)	__raw_spin_unlock(&(lock)->raw_lock)
+# define read_unlock(lock)		__raw_read_unlock(&(lock)->raw_lock)
+# define read_unlock_non_nested(lock)	__raw_read_unlock(&(lock)->raw_lock)
+# define write_unlock(lock)		__raw_write_unlock(&(lock)->raw_lock)
 # define spin_unlock_irq(lock) \
     do { __raw_spin_unlock(&(lock)->raw_lock); local_irq_enable(); } while (0)
 # define read_unlock_irq(lock) \
Index: linux/include/linux/spinlock_api_smp.h
===================================================================
--- linux.orig/include/linux/spinlock_api_smp.h
+++ linux/include/linux/spinlock_api_smp.h
@@ -20,6 +20,8 @@ int in_lock_functions(unsigned long addr
 #define assert_spin_locked(x)	BUG_ON(!spin_is_locked(x))
 
 void __lockfunc _spin_lock(spinlock_t *lock)		__acquires(spinlock_t);
+void __lockfunc _spin_lock_nested(spinlock_t *lock, int subtype)
+							__acquires(spinlock_t);
 void __lockfunc _read_lock(rwlock_t *lock)		__acquires(rwlock_t);
 void __lockfunc _write_lock(rwlock_t *lock)		__acquires(rwlock_t);
 void __lockfunc _spin_lock_bh(spinlock_t *lock)		__acquires(spinlock_t);
@@ -39,7 +41,9 @@ int __lockfunc _read_trylock(rwlock_t *l
 int __lockfunc _write_trylock(rwlock_t *lock);
 int __lockfunc _spin_trylock_bh(spinlock_t *lock);
 void __lockfunc _spin_unlock(spinlock_t *lock)		__releases(spinlock_t);
+void __lockfunc _spin_unlock_non_nested(spinlock_t *lock) __releases(spinlock_t);
 void __lockfunc _read_unlock(rwlock_t *lock)		__releases(rwlock_t);
+void __lockfunc _read_unlock_non_nested(rwlock_t *lock)	__releases(rwlock_t);
 void __lockfunc _write_unlock(rwlock_t *lock)		__releases(rwlock_t);
 void __lockfunc _spin_unlock_bh(spinlock_t *lock)	__releases(spinlock_t);
 void __lockfunc _read_unlock_bh(rwlock_t *lock)		__releases(rwlock_t);
Index: linux/include/linux/spinlock_api_up.h
===================================================================
--- linux.orig/include/linux/spinlock_api_up.h
+++ linux/include/linux/spinlock_api_up.h
@@ -49,6 +49,7 @@
   do { local_irq_restore(flags); __UNLOCK(lock); } while (0)
 
 #define _spin_lock(lock)			__LOCK(lock)
+#define _spin_lock_nested(lock, subtype)	__LOCK(lock)
 #define _read_lock(lock)			__LOCK(lock)
 #define _write_lock(lock)			__LOCK(lock)
 #define _spin_lock_bh(lock)			__LOCK_BH(lock)
@@ -65,7 +66,9 @@
 #define _write_trylock(lock)			({ __LOCK(lock); 1; })
 #define _spin_trylock_bh(lock)			({ __LOCK_BH(lock); 1; })
 #define _spin_unlock(lock)			__UNLOCK(lock)
+#define _spin_unlock_non_nested(lock)		__UNLOCK(lock)
 #define _read_unlock(lock)			__UNLOCK(lock)
+#define _read_unlock_non_nested(lock)		__UNLOCK(lock)
 #define _write_unlock(lock)			__UNLOCK(lock)
 #define _spin_unlock_bh(lock)			__UNLOCK_BH(lock)
 #define _write_unlock_bh(lock)			__UNLOCK_BH(lock)
Index: linux/include/linux/spinlock_types.h
===================================================================
--- linux.orig/include/linux/spinlock_types.h
+++ linux/include/linux/spinlock_types.h
@@ -9,6 +9,8 @@
  * Released under the General Public License (GPL).
  */
 
+#include <linux/lockdep.h>
+
 #if defined(CONFIG_SMP)
 # include <asm/spinlock_types.h>
 #else
@@ -24,6 +26,9 @@ typedef struct {
 	unsigned int magic, owner_cpu;
 	void *owner;
 #endif
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+	struct lockdep_map dep_map;
+#endif
 } spinlock_t;
 
 #define SPINLOCK_MAGIC		0xdead4ead
@@ -37,28 +42,47 @@ typedef struct {
 	unsigned int magic, owner_cpu;
 	void *owner;
 #endif
+#ifdef CONFIG_PROVE_RW_LOCKING
+	struct lockdep_map dep_map;
+#endif
 } rwlock_t;
 
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+# define SPIN_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+#else
+# define SPIN_DEP_MAP_INIT(lockname)
+#endif
+
+#ifdef CONFIG_PROVE_RW_LOCKING
+# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+#else
+# define RW_DEP_MAP_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_SPINLOCK
 # define __SPIN_LOCK_UNLOCKED(lockname)					\
 	(spinlock_t)	{	.raw_lock = __RAW_SPIN_LOCK_UNLOCKED,	\
 				.magic = SPINLOCK_MAGIC,		\
 				.owner = SPINLOCK_OWNER_INIT,		\
-				.owner_cpu = -1 }
+				.owner_cpu = -1,			\
+				SPIN_DEP_MAP_INIT(lockname) }
 #define __RW_LOCK_UNLOCKED(lockname)					\
 	(rwlock_t)	{	.raw_lock = __RAW_RW_LOCK_UNLOCKED,	\
 				.magic = RWLOCK_MAGIC,			\
 				.owner = SPINLOCK_OWNER_INIT,		\
-				.owner_cpu = -1 }
+				.owner_cpu = -1,			\
+				RW_DEP_MAP_INIT(lockname) }
 #else
 # define __SPIN_LOCK_UNLOCKED(lockname) \
-	(spinlock_t)	{	.raw_lock = __RAW_SPIN_LOCK_UNLOCKED }
+	(spinlock_t)	{	.raw_lock = __RAW_SPIN_LOCK_UNLOCKED,	\
+				SPIN_DEP_MAP_INIT(lockname) }
 #define __RW_LOCK_UNLOCKED(lockname) \
-	(rwlock_t)	{	.raw_lock = __RAW_RW_LOCK_UNLOCKED }
+	(rwlock_t)	{	.raw_lock = __RAW_RW_LOCK_UNLOCKED,	\
+				RW_DEP_MAP_INIT(lockname) }
 #endif
 
 #define SPIN_LOCK_UNLOCKED	__SPIN_LOCK_UNLOCKED(old_style_spin_init)
Index: linux/include/linux/spinlock_types_up.h
===================================================================
--- linux.orig/include/linux/spinlock_types_up.h
+++ linux/include/linux/spinlock_types_up.h
@@ -12,10 +12,15 @@
  * Released under the General Public License (GPL).
  */
 
-#ifdef CONFIG_DEBUG_SPINLOCK
+#if defined(CONFIG_DEBUG_SPINLOCK) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 
 typedef struct {
 	volatile unsigned int slock;
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+	struct lockdep_map dep_map;
+#endif
 } raw_spinlock_t;
 
 #define __RAW_SPIN_LOCK_UNLOCKED { 1 }
@@ -30,6 +35,9 @@ typedef struct { } raw_spinlock_t;
 
 typedef struct {
 	/* no debug version on UP */
+#ifdef CONFIG_PROVE_RW_LOCKING
+	struct lockdep_map dep_map;
+#endif
 } raw_rwlock_t;
 
 #define __RAW_RW_LOCK_UNLOCKED { }
Index: linux/include/linux/spinlock_up.h
===================================================================
--- linux.orig/include/linux/spinlock_up.h
+++ linux/include/linux/spinlock_up.h
@@ -17,7 +17,9 @@
  * No atomicity anywhere, we are on UP.
  */
 
-#ifdef CONFIG_DEBUG_SPINLOCK
+#if defined(CONFIG_DEBUG_SPINLOCK) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 
 #define __raw_spin_is_locked(x)		((x)->slock == 0)
 
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -26,6 +26,8 @@ obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
 obj-$(CONFIG_SMP) += cpu.o spinlock.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
+obj-$(CONFIG_PROVE_SPIN_LOCKING) += spinlock.o
+obj-$(CONFIG_PROVE_RW_LOCKING) += spinlock.o
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += module.o
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -312,6 +312,13 @@ static inline void finish_lock_switch(ru
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = current;
 #endif
+	/*
+	 * If we are tracking spinlock dependencies then we have to
+	 * fix up the runqueue lock - which gets 'carried over' from
+	 * prev into current:
+	 */
+	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
+
 	spin_unlock_irq(&rq->lock);
 }
 
@@ -1839,6 +1846,7 @@ task_t * context_switch(runqueue_t *rq, 
 		WARN_ON(rq->prev_mm);
 		rq->prev_mm = oldmm;
 	}
+	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 
 	/* Here we just switch the register state and the stack. */
 	switch_to(prev, next, prev);
@@ -4406,6 +4414,7 @@ asmlinkage long sys_sched_yield(void)
 	 * no need to preempt or enable interrupts:
 	 */
 	__release(rq->lock);
+	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 	_raw_spin_unlock(&rq->lock);
 	preempt_enable_no_resched();
 
@@ -4465,6 +4474,7 @@ int cond_resched_lock(spinlock_t *lock)
 		spin_lock(lock);
 	}
 	if (need_resched()) {
+		spin_release(&lock->dep_map, 1, _THIS_IP_);
 		_raw_spin_unlock(lock);
 		preempt_enable_no_resched();
 		__cond_resched();
Index: linux/kernel/spinlock.c
===================================================================
--- linux.orig/kernel/spinlock.c
+++ linux/kernel/spinlock.c
@@ -14,8 +14,47 @@
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/debug_locks.h>
 #include <linux/module.h>
 
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PROVE_SPIN_LOCKING)
+void __spin_lock_init(spinlock_t *lock, const char *name,
+		      struct lockdep_type_key *key)
+{
+	lock->raw_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
+#ifdef CONFIG_DEBUG_SPINLOCK
+	lock->magic = SPINLOCK_MAGIC;
+	lock->owner = SPINLOCK_OWNER_INIT;
+	lock->owner_cpu = -1;
+#endif
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+	lockdep_init_map(&lock->dep_map, name, key);
+#endif
+}
+
+EXPORT_SYMBOL(__spin_lock_init);
+
+#endif
+
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PROVE_RW_LOCKING)
+
+void __rwlock_init(rwlock_t *lock, const char *name,
+		   struct lockdep_type_key *key)
+{
+	lock->raw_lock = (raw_rwlock_t) __RAW_RW_LOCK_UNLOCKED;
+#ifdef CONFIG_DEBUG_SPINLOCK
+	lock->magic = RWLOCK_MAGIC;
+	lock->owner = SPINLOCK_OWNER_INIT;
+	lock->owner_cpu = -1;
+#endif
+#ifdef CONFIG_PROVE_RW_LOCKING
+	lockdep_init_map(&lock->dep_map, name, key);
+#endif
+}
+
+EXPORT_SYMBOL(__rwlock_init);
+
+#endif
 /*
  * Generic declaration of the raw read_trylock() function,
  * architectures are supposed to optimize this:
@@ -30,8 +69,10 @@ EXPORT_SYMBOL(generic__raw_read_trylock)
 int __lockfunc _spin_trylock(spinlock_t *lock)
 {
 	preempt_disable();
-	if (_raw_spin_trylock(lock))
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 		return 1;
+	}
 	
 	preempt_enable();
 	return 0;
@@ -41,8 +82,10 @@ EXPORT_SYMBOL(_spin_trylock);
 int __lockfunc _read_trylock(rwlock_t *lock)
 {
 	preempt_disable();
-	if (_raw_read_trylock(lock))
+	if (_raw_read_trylock(lock)) {
+		rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
 		return 1;
+	}
 
 	preempt_enable();
 	return 0;
@@ -52,19 +95,29 @@ EXPORT_SYMBOL(_read_trylock);
 int __lockfunc _write_trylock(rwlock_t *lock)
 {
 	preempt_disable();
-	if (_raw_write_trylock(lock))
+	if (_raw_write_trylock(lock)) {
+		rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 		return 1;
+	}
 
 	preempt_enable();
 	return 0;
 }
 EXPORT_SYMBOL(_write_trylock);
 
-#if !defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP)
+/*
+ * If lockdep is enabled then we use the non-preemption spin-ops
+ * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
+ * not re-enabled during lock-acquire (which the preempt-spin-ops do):
+ */
+#if !defined(CONFIG_PREEMPT) || !defined(CONFIG_SMP) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 
 void __lockfunc _read_lock(rwlock_t *lock)
 {
 	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_read_lock(lock);
 }
 EXPORT_SYMBOL(_read_lock);
@@ -75,7 +128,17 @@ unsigned long __lockfunc _spin_lock_irqs
 
 	local_irq_save(flags);
 	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	/*
+	 * On lockdep we don't want the hand-coded irq-enable of
+	 * _raw_spin_lock_flags() code, because lockdep assumes
+	 * that interrupts are not re-enabled during lock-acquire:
+	 */
+#ifdef CONFIG_PROVE_SPIN_LOCKING
+	_raw_spin_lock(lock);
+#else
 	_raw_spin_lock_flags(lock, &flags);
+#endif
 	return flags;
 }
 EXPORT_SYMBOL(_spin_lock_irqsave);
@@ -84,6 +147,7 @@ void __lockfunc _spin_lock_irq(spinlock_
 {
 	local_irq_disable();
 	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_spin_lock(lock);
 }
 EXPORT_SYMBOL(_spin_lock_irq);
@@ -92,6 +156,7 @@ void __lockfunc _spin_lock_bh(spinlock_t
 {
 	local_bh_disable();
 	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_spin_lock(lock);
 }
 EXPORT_SYMBOL(_spin_lock_bh);
@@ -102,6 +167,7 @@ unsigned long __lockfunc _read_lock_irqs
 
 	local_irq_save(flags);
 	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_read_lock(lock);
 	return flags;
 }
@@ -111,6 +177,7 @@ void __lockfunc _read_lock_irq(rwlock_t 
 {
 	local_irq_disable();
 	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_read_lock(lock);
 }
 EXPORT_SYMBOL(_read_lock_irq);
@@ -119,6 +186,7 @@ void __lockfunc _read_lock_bh(rwlock_t *
 {
 	local_bh_disable();
 	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_read_lock(lock);
 }
 EXPORT_SYMBOL(_read_lock_bh);
@@ -129,6 +197,7 @@ unsigned long __lockfunc _write_lock_irq
 
 	local_irq_save(flags);
 	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_write_lock(lock);
 	return flags;
 }
@@ -138,6 +207,7 @@ void __lockfunc _write_lock_irq(rwlock_t
 {
 	local_irq_disable();
 	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_write_lock(lock);
 }
 EXPORT_SYMBOL(_write_lock_irq);
@@ -146,6 +216,7 @@ void __lockfunc _write_lock_bh(rwlock_t 
 {
 	local_bh_disable();
 	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_write_lock(lock);
 }
 EXPORT_SYMBOL(_write_lock_bh);
@@ -153,6 +224,7 @@ EXPORT_SYMBOL(_write_lock_bh);
 void __lockfunc _spin_lock(spinlock_t *lock)
 {
 	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_spin_lock(lock);
 }
 
@@ -161,6 +233,7 @@ EXPORT_SYMBOL(_spin_lock);
 void __lockfunc _write_lock(rwlock_t *lock)
 {
 	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	_raw_write_lock(lock);
 }
 
@@ -256,15 +329,35 @@ BUILD_LOCK_OPS(write, rwlock);
 
 #endif /* CONFIG_PREEMPT */
 
+void __lockfunc _spin_lock_nested(spinlock_t *lock, int subtype)
+{
+	preempt_disable();
+	spin_acquire(&lock->dep_map, subtype, 0, _RET_IP_);
+	_raw_spin_lock(lock);
+}
+
+EXPORT_SYMBOL(_spin_lock_nested);
+
 void __lockfunc _spin_unlock(spinlock_t *lock)
 {
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_spin_unlock(lock);
 	preempt_enable();
 }
 EXPORT_SYMBOL(_spin_unlock);
 
+void __lockfunc _spin_unlock_non_nested(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 0, _RET_IP_);
+	_raw_spin_unlock(lock);
+	preempt_enable();
+}
+EXPORT_SYMBOL(_spin_unlock_non_nested);
+
+
 void __lockfunc _write_unlock(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_write_unlock(lock);
 	preempt_enable();
 }
@@ -272,13 +365,23 @@ EXPORT_SYMBOL(_write_unlock);
 
 void __lockfunc _read_unlock(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_read_unlock(lock);
 	preempt_enable();
 }
 EXPORT_SYMBOL(_read_unlock);
 
+void __lockfunc _read_unlock_non_nested(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 0, _RET_IP_);
+	_raw_read_unlock(lock);
+	preempt_enable();
+}
+EXPORT_SYMBOL(_read_unlock_non_nested);
+
 void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 {
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_spin_unlock(lock);
 	local_irq_restore(flags);
 	preempt_enable();
@@ -287,6 +390,7 @@ EXPORT_SYMBOL(_spin_unlock_irqrestore);
 
 void __lockfunc _spin_unlock_irq(spinlock_t *lock)
 {
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_spin_unlock(lock);
 	local_irq_enable();
 	preempt_enable();
@@ -295,14 +399,16 @@ EXPORT_SYMBOL(_spin_unlock_irq);
 
 void __lockfunc _spin_unlock_bh(spinlock_t *lock)
 {
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_spin_unlock(lock);
 	preempt_enable_no_resched();
-	local_bh_enable();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
 }
 EXPORT_SYMBOL(_spin_unlock_bh);
 
 void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_read_unlock(lock);
 	local_irq_restore(flags);
 	preempt_enable();
@@ -311,6 +417,7 @@ EXPORT_SYMBOL(_read_unlock_irqrestore);
 
 void __lockfunc _read_unlock_irq(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_read_unlock(lock);
 	local_irq_enable();
 	preempt_enable();
@@ -319,14 +426,16 @@ EXPORT_SYMBOL(_read_unlock_irq);
 
 void __lockfunc _read_unlock_bh(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_read_unlock(lock);
 	preempt_enable_no_resched();
-	local_bh_enable();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
 }
 EXPORT_SYMBOL(_read_unlock_bh);
 
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_write_unlock(lock);
 	local_irq_restore(flags);
 	preempt_enable();
@@ -335,6 +444,7 @@ EXPORT_SYMBOL(_write_unlock_irqrestore);
 
 void __lockfunc _write_unlock_irq(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_write_unlock(lock);
 	local_irq_enable();
 	preempt_enable();
@@ -343,9 +453,10 @@ EXPORT_SYMBOL(_write_unlock_irq);
 
 void __lockfunc _write_unlock_bh(rwlock_t *lock)
 {
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	_raw_write_unlock(lock);
 	preempt_enable_no_resched();
-	local_bh_enable();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
 }
 EXPORT_SYMBOL(_write_unlock_bh);
 
@@ -353,11 +464,13 @@ int __lockfunc _spin_trylock_bh(spinlock
 {
 	local_bh_disable();
 	preempt_disable();
-	if (_raw_spin_trylock(lock))
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 		return 1;
+	}
 
 	preempt_enable_no_resched();
-	local_bh_enable();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
 	return 0;
 }
 EXPORT_SYMBOL(_spin_trylock_bh);
Index: linux/lib/kernel_lock.c
===================================================================
--- linux.orig/lib/kernel_lock.c
+++ linux/lib/kernel_lock.c
@@ -177,7 +177,12 @@ static inline void __lock_kernel(void)
 
 static inline void __unlock_kernel(void)
 {
-	spin_unlock(&kernel_flag);
+	/*
+	 * the BKL is not covered by lockdep, so we open-code the
+	 * unlocking sequence (and thus avoid the dep-chain ops):
+	 */
+	_raw_spin_unlock(&kernel_flag);
+	preempt_enable();
 }
 
 /*
Index: linux/net/ipv4/route.c
===================================================================
--- linux.orig/net/ipv4/route.c
+++ linux/net/ipv4/route.c
@@ -206,7 +206,9 @@ __u8 ip_tos2prio[16] = {
 struct rt_hash_bucket {
 	struct rtable	*chain;
 };
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
+	defined(CONFIG_PROVE_SPIN_LOCKING) || \
+	defined(CONFIG_PROVE_RW_LOCKING)
 /*
  * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks
  * The size of this table is a power of two and depends on the number of CPUS.


* [patch 28/61] lock validator: prove mutex locking correctness
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (26 preceding siblings ...)
  2006-05-29 21:25 ` [patch 27/61] lock validator: prove spinlock/rwlock " Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 29/61] lock validator: print all lock-types on SysRq-D Ingo Molnar
                   ` (45 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

add CONFIG_PROVE_MUTEX_LOCKING, which uses the lock validator framework
to prove mutex locking correctness.
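
A minimal userspace sketch of why a nested locking variant such as
mutex_lock_nested() takes a subtype argument. It assumes (this is not
stated in the description above) that the validator tracks held locks per
lock type and reports a second acquisition of an already-held type unless
the caller marks it with a different subtype. All demo_* names are
hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HELD 8

/* Hypothetical stand-in for a lock type; identity is its address. */
struct demo_type { char dummy; };

struct demo_held {
	const struct demo_type *type;
	unsigned int subtype;
};

static struct demo_held demo_stack[DEMO_MAX_HELD];
static size_t demo_depth;

/*
 * Record an acquisition. Returns 0 on success, or -1 if the same
 * (type, subtype) pair is already held or the table is full.
 */
static int demo_acquire(const struct demo_type *type, unsigned int subtype)
{
	size_t i;

	for (i = 0; i < demo_depth; i++)
		if (demo_stack[i].type == type &&
		    demo_stack[i].subtype == subtype)
			return -1;
	if (demo_depth == DEMO_MAX_HELD)
		return -1;
	demo_stack[demo_depth].type = type;
	demo_stack[demo_depth].subtype = subtype;
	demo_depth++;
	return 0;
}

/* Forget all held locks (enough for this sketch). */
static void demo_release_all(void)
{
	demo_depth = 0;
}

static struct demo_type demo_dir_type;

static void demo_run(void)
{
	/* parent and child are two locks of the same type */
	assert(demo_acquire(&demo_dir_type, 0) == 0);	/* parent */
	assert(demo_acquire(&demo_dir_type, 0) == -1);	/* looks recursive */
	assert(demo_acquire(&demo_dir_type, 1) == 0);	/* child, subtype 1 */
	demo_release_all();
}
```

Calling demo_run() from a main() walks through the parent/child case. The
real validator does far more than this (it checks dependencies between
different lock types as well); the sketch only covers the same-type case
that a subtype is meant to disambiguate.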

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/mutex-debug.h |    8 +++++++-
 include/linux/mutex.h       |   34 +++++++++++++++++++++++++++++++---
 kernel/mutex-debug.c        |    8 ++++++++
 kernel/mutex-lockdep.h      |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/mutex.c              |   28 ++++++++++++++++++++++------
 kernel/mutex.h              |    3 +--
 6 files changed, 109 insertions(+), 12 deletions(-)

Index: linux/include/linux/mutex-debug.h
===================================================================
--- linux.orig/include/linux/mutex-debug.h
+++ linux/include/linux/mutex-debug.h
@@ -2,6 +2,7 @@
 #define __LINUX_MUTEX_DEBUG_H
 
 #include <linux/linkage.h>
+#include <linux/lockdep.h>
 
 /*
  * Mutexes - debugging helpers:
@@ -10,7 +11,12 @@
 #define __DEBUG_MUTEX_INITIALIZER(lockname)				\
 	, .magic = &lockname
 
-#define mutex_init(sem)		__mutex_init(sem, __FILE__":"#sem)
+#define mutex_init(mutex)						\
+do {									\
+	static struct lockdep_type_key __key;				\
+									\
+	__mutex_init((mutex), #mutex, &__key);				\
+} while (0)
 
 extern void FASTCALL(mutex_destroy(struct mutex *lock));
 
Index: linux/include/linux/mutex.h
===================================================================
--- linux.orig/include/linux/mutex.h
+++ linux/include/linux/mutex.h
@@ -13,6 +13,7 @@
 #include <linux/list.h>
 #include <linux/spinlock_types.h>
 #include <linux/linkage.h>
+#include <linux/lockdep.h>
 
 #include <asm/atomic.h>
 
@@ -53,6 +54,9 @@ struct mutex {
 	const char 		*name;
 	void			*magic;
 #endif
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+	struct lockdep_map	dep_map;
+#endif
 };
 
 /*
@@ -72,20 +76,36 @@ struct mutex_waiter {
 # include <linux/mutex-debug.h>
 #else
 # define __DEBUG_MUTEX_INITIALIZER(lockname)
-# define mutex_init(mutex)			__mutex_init(mutex, NULL)
+# define mutex_init(mutex) \
+do {							\
+	static struct lockdep_type_key __key;		\
+							\
+	__mutex_init((mutex), NULL, &__key);		\
+} while (0)
 # define mutex_destroy(mutex)				do { } while (0)
 #endif
 
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
+		, .dep_map = { .name = #lockname }
+#else
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
+#endif
+
 #define __MUTEX_INITIALIZER(lockname) \
 		{ .count = ATOMIC_INIT(1) \
 		, .wait_lock = SPIN_LOCK_UNLOCKED \
 		, .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
-		__DEBUG_MUTEX_INITIALIZER(lockname) }
+		__DEBUG_MUTEX_INITIALIZER(lockname) \
+		__DEP_MAP_MUTEX_INITIALIZER(lockname) }
 
 #define DEFINE_MUTEX(mutexname) \
 	struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
 
-extern void fastcall __mutex_init(struct mutex *lock, const char *name);
+extern void __mutex_init(struct mutex *lock, const char *name,
+			 struct lockdep_type_key *key);
+
+#define mutex_init_key(mutex, name, key) __mutex_init((mutex), name, key)
 
 /***
  * mutex_is_locked - is the mutex locked
@@ -104,11 +124,19 @@ static inline int fastcall mutex_is_lock
  */
 extern void fastcall mutex_lock(struct mutex *lock);
 extern int fastcall mutex_lock_interruptible(struct mutex *lock);
+
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+extern void mutex_lock_nested(struct mutex *lock, unsigned int subtype);
+#else
+# define mutex_lock_nested(lock, subtype) mutex_lock(lock)
+#endif
+
 /*
  * NOTE: mutex_trylock() follows the spin_trylock() convention,
  *       not the down_trylock() convention!
  */
 extern int fastcall mutex_trylock(struct mutex *lock);
 extern void fastcall mutex_unlock(struct mutex *lock);
+extern void fastcall mutex_unlock_non_nested(struct mutex *lock);
 
 #endif
Index: linux/kernel/mutex-debug.c
===================================================================
--- linux.orig/kernel/mutex-debug.c
+++ linux/kernel/mutex-debug.c
@@ -100,6 +100,14 @@ static int check_deadlock(struct mutex *
 		return 0;
 
 	task = ti->task;
+	/*
+	 * In the PROVE_MUTEX_LOCKING case we are tracking all held
+	 * locks already, which allows us to optimize this:
+	 */
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+	if (!task->lockdep_depth)
+		return 0;
+#endif
 	lockblk = NULL;
 	if (task->blocked_on)
 		lockblk = task->blocked_on->lock;
Index: linux/kernel/mutex-lockdep.h
===================================================================
--- /dev/null
+++ linux/kernel/mutex-lockdep.h
@@ -0,0 +1,40 @@
+/*
+ * Mutexes: blocking mutual exclusion locks
+ *
+ * started by Ingo Molnar:
+ *
+ *  Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * This file contains mutex debugging related internal prototypes, for the
+ * !CONFIG_DEBUG_MUTEXES && CONFIG_PROVE_MUTEX_LOCKING case. Most of
+ * them are NOPs:
+ */
+
+#define spin_lock_mutex(lock, flags)			\
+	do {						\
+		local_irq_save(flags);			\
+		__raw_spin_lock(&(lock)->raw_lock);	\
+	} while (0)
+
+#define spin_unlock_mutex(lock, flags)			\
+	do {						\
+		__raw_spin_unlock(&(lock)->raw_lock);	\
+		local_irq_restore(flags);		\
+	} while (0)
+
+#define mutex_remove_waiter(lock, waiter, ti) \
+		__list_del((waiter)->list.prev, (waiter)->list.next)
+
+#define debug_mutex_set_owner(lock, new_owner)		do { } while (0)
+#define debug_mutex_clear_owner(lock)			do { } while (0)
+#define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
+#define debug_mutex_free_waiter(waiter)			do { } while (0)
+#define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
+#define debug_mutex_unlock(lock)			do { } while (0)
+#define debug_mutex_init(lock, name)			do { } while (0)
+
+static inline void
+debug_mutex_lock_common(struct mutex *lock,
+			struct mutex_waiter *waiter)
+{
+}
Index: linux/kernel/mutex.c
===================================================================
--- linux.orig/kernel/mutex.c
+++ linux/kernel/mutex.c
@@ -27,8 +27,13 @@
 # include "mutex-debug.h"
 # include <asm-generic/mutex-null.h>
 #else
-# include "mutex.h"
-# include <asm/mutex.h>
+# ifdef CONFIG_PROVE_MUTEX_LOCKING
+#  include "mutex-lockdep.h"
+#  include <asm-generic/mutex-null.h>
+# else
+#  include "mutex.h"
+#  include <asm/mutex.h>
+# endif
 #endif
 
 /***
@@ -39,13 +44,18 @@
  *
  * It is not allowed to initialize an already locked mutex.
  */
-__always_inline void fastcall __mutex_init(struct mutex *lock, const char *name)
+void
+__mutex_init(struct mutex *lock, const char *name, struct lockdep_type_key *key)
 {
 	atomic_set(&lock->count, 1);
 	spin_lock_init(&lock->wait_lock);
 	INIT_LIST_HEAD(&lock->wait_list);
 
 	debug_mutex_init(lock, name);
+
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
+	lockdep_init_map(&lock->dep_map, name, key);
+#endif
 }
 
 EXPORT_SYMBOL(__mutex_init);
@@ -146,6 +156,7 @@ __mutex_lock_common(struct mutex *lock, 
 	spin_lock_mutex(&lock->wait_lock, flags);
 
 	debug_mutex_lock_common(lock, &waiter);
+	mutex_acquire(&lock->dep_map, subtype, 0, _RET_IP_);
 	debug_mutex_add_waiter(lock, &waiter, task->thread_info);
 
 	/* add waiting tasks to the end of the waitqueue (FIFO): */
@@ -173,6 +184,7 @@ __mutex_lock_common(struct mutex *lock, 
 		if (unlikely(state == TASK_INTERRUPTIBLE &&
 						signal_pending(task))) {
 			mutex_remove_waiter(lock, &waiter, task->thread_info);
+			mutex_release(&lock->dep_map, 1, _RET_IP_);
 			spin_unlock_mutex(&lock->wait_lock, flags);
 
 			debug_mutex_free_waiter(&waiter);
@@ -198,7 +210,9 @@ __mutex_lock_common(struct mutex *lock, 
 
 	debug_mutex_free_waiter(&waiter);
 
+#ifdef CONFIG_DEBUG_MUTEXES
 	DEBUG_WARN_ON(lock->owner != task->thread_info);
+#endif
 
 	return 0;
 }
@@ -211,7 +225,7 @@ __mutex_lock_slowpath(atomic_t *lock_cou
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0);
 }
 
-#ifdef CONFIG_DEBUG_MUTEXES
+#ifdef CONFIG_PROVE_MUTEX_LOCKING
 void __sched
 mutex_lock_nested(struct mutex *lock, unsigned int subtype)
 {
@@ -232,6 +246,7 @@ __mutex_unlock_common_slowpath(atomic_t 
 	unsigned long flags;
 
 	spin_lock_mutex(&lock->wait_lock, flags);
+	mutex_release(&lock->dep_map, nested, _RET_IP_);
 	debug_mutex_unlock(lock);
 
 	/*
@@ -322,9 +337,10 @@ static inline int __mutex_trylock_slowpa
 	spin_lock_mutex(&lock->wait_lock, flags);
 
 	prev = atomic_xchg(&lock->count, -1);
-	if (likely(prev == 1))
+	if (likely(prev == 1)) {
 		debug_mutex_set_owner(lock, current_thread_info());
-
+		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+	}
 	/* Set it back to 0 if there are no waiters: */
 	if (likely(list_empty(&lock->wait_list)))
 		atomic_set(&lock->count, 0);
Index: linux/kernel/mutex.h
===================================================================
--- linux.orig/kernel/mutex.h
+++ linux/kernel/mutex.h
@@ -16,14 +16,13 @@
 #define mutex_remove_waiter(lock, waiter, ti) \
 		__list_del((waiter)->list.prev, (waiter)->list.next)
 
+#undef DEBUG_WARN_ON
 #define DEBUG_WARN_ON(c)				do { } while (0)
 #define debug_mutex_set_owner(lock, new_owner)		do { } while (0)
 #define debug_mutex_clear_owner(lock)			do { } while (0)
 #define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
 #define debug_mutex_free_waiter(waiter)			do { } while (0)
 #define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
-#define mutex_acquire(lock, subtype, trylock)	do { } while (0)
-#define mutex_release(lock, nested)		do { } while (0)
 #define debug_mutex_unlock(lock)			do { } while (0)
 #define debug_mutex_init(lock, name)			do { } while (0)
 


* [patch 29/61] lock validator: print all lock-types on SysRq-D
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (27 preceding siblings ...)
  2006-05-29 21:25 ` [patch 28/61] lock validator: prove mutex " Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 30/61] lock validator: x86_64 early init Ingo Molnar
                   ` (44 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Print all lock-types on SysRq-D, in addition to the held locks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/char/sysrq.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/drivers/char/sysrq.c
===================================================================
--- linux.orig/drivers/char/sysrq.c
+++ linux/drivers/char/sysrq.c
@@ -148,12 +148,14 @@ static struct sysrq_key_op sysrq_mountro
 	.enable_mask	= SYSRQ_ENABLE_REMOUNT,
 };
 
-#ifdef CONFIG_DEBUG_MUTEXES
+#ifdef CONFIG_LOCKDEP
 static void sysrq_handle_showlocks(int key, struct pt_regs *pt_regs,
 				struct tty_struct *tty)
 {
 	debug_show_all_locks();
+	print_lock_types();
 }
+
 static struct sysrq_key_op sysrq_showlocks_op = {
 	.handler	= sysrq_handle_showlocks,
 	.help_msg	= "show-all-locks(D)",


* [patch 30/61] lock validator: x86_64 early init
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (28 preceding siblings ...)
  2006-05-29 21:25 ` [patch 29/61] lock validator: print all lock-types on SysRq-D Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 31/61] lock validator: SMP alternatives workaround Ingo Molnar
                   ` (43 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

x86_64 uses spinlocks very early - earlier than start_kernel().
So call lockdep_init() from the arch setup code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/kernel/head64.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux/arch/x86_64/kernel/head64.c
===================================================================
--- linux.orig/arch/x86_64/kernel/head64.c
+++ linux/arch/x86_64/kernel/head64.c
@@ -85,6 +85,11 @@ void __init x86_64_start_kernel(char * r
 	clear_bss();
 
 	/*
+	 * This must be called really, really early:
+	 */
+	lockdep_init();
+
+	/*
 	 * switch to init_level4_pgt from boot_level4_pgt
 	 */
 	memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));


* [patch 31/61] lock validator: SMP alternatives workaround
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (29 preceding siblings ...)
  2006-05-29 21:25 ` [patch 30/61] lock validator: x86_64 early init Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 32/61] lock validator: do not recurse in printk() Ingo Molnar
                   ` (42 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Disable SMP alternatives fixups (the patching-in of NOPs on 1-CPU
systems) if the lock validator is enabled: a binutils
section-handling bug causes corrupted instructions when
UP instructions are patched in.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/alternative.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -301,6 +301,16 @@ void alternatives_smp_switch(int smp)
 	struct smp_alt_module *mod;
 	unsigned long flags;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * A not yet fixed binutils section handling bug prevents
+	 * alternatives-replacement from working reliably, so turn
+	 * it off:
+	 */
+	printk("lockdep: not fixing up alternatives.\n");
+	return;
+#endif
+
 	if (no_replacement || smp_alt_once)
 		return;
 	BUG_ON(!smp && (num_online_cpus() > 1));


* [patch 32/61] lock validator: do not recurse in printk()
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (30 preceding siblings ...)
  2006-05-29 21:25 ` [patch 31/61] lock validator: SMP alternatives workaround Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:25 ` [patch 33/61] lock validator: disable NMI watchdog if CONFIG_LOCKDEP Ingo Molnar
                   ` (41 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Make printk() calls from within the lock validation code safer by
using the lockdep-recursion counter.
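
The recursion-counter idea can be sketched in userspace C. This is only an
illustration, not the kernel code: validator_recursion, hook_runs,
validator_hook() and log_line() are hypothetical names, and the sketch
assumes that the validator skips its checks while the counter is non-zero
(that check is not part of this patch).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for current->lockdep_recursion (per-task in the kernel). */
static int validator_recursion;
/* Counts how often the toy validator actually ran its checks. */
static int hook_runs;

static void log_line(const char *msg);

/* Toy validator hook, imagined to run on every lock acquisition. */
static void validator_hook(void)
{
	if (validator_recursion)	/* inside the logger: skip validation */
		return;
	hook_runs++;
	log_line("validator: checked a lock");	/* the validator may log */
}

/* Toy logger: keep the counter raised for the whole locked region. */
static void log_line(const char *msg)
{
	validator_recursion++;
	validator_hook();	/* models taking the log-buffer lock */
	fprintf(stderr, "%s\n", msg);
	validator_recursion--;
	assert(validator_recursion >= 0);
}
```

Without the counter, the hook and the logger would call each other without
bound; with it, the nested hook invocation returns immediately.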

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/printk.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -516,7 +516,9 @@ asmlinkage int vprintk(const char *fmt, 
 		zap_locks();
 
 	/* This stops the holder of console_sem just where we want him */
-	spin_lock_irqsave(&logbuf_lock, flags);
+	local_irq_save(flags);
+	current->lockdep_recursion++;
+	spin_lock(&logbuf_lock);
 	printk_cpu = smp_processor_id();
 
 	/* Emit the output into the temporary buffer */
@@ -586,7 +588,7 @@ asmlinkage int vprintk(const char *fmt, 
 		 */
 		console_locked = 1;
 		printk_cpu = UINT_MAX;
-		spin_unlock_irqrestore(&logbuf_lock, flags);
+		spin_unlock(&logbuf_lock);
 
 		/*
 		 * Console drivers may assume that per-cpu resources have
@@ -602,6 +604,8 @@ asmlinkage int vprintk(const char *fmt, 
 			console_locked = 0;
 			up(&console_sem);
 		}
+		current->lockdep_recursion--;
+		local_irq_restore(flags);
 	} else {
 		/*
 		 * Someone else owns the drivers.  We drop the spinlock, which
@@ -609,7 +613,9 @@ asmlinkage int vprintk(const char *fmt, 
 		 * console drivers with the output which we just produced.
 		 */
 		printk_cpu = UINT_MAX;
-		spin_unlock_irqrestore(&logbuf_lock, flags);
+		spin_unlock(&logbuf_lock);
+		current->lockdep_recursion--;
+		local_irq_restore(flags);
 	}
 
 	preempt_enable();
@@ -783,7 +789,13 @@ void release_console_sem(void)
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
 	if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
-		wake_up_interruptible(&log_wait);
+		/*
+		 * If we printk from within the lock dependency code,
+		 * from within the scheduler code, then do not lock
+		 * up due to self-recursion:
+		 */
+		if (current->lockdep_recursion <= 1)
+			wake_up_interruptible(&log_wait);
 }
 EXPORT_SYMBOL(release_console_sem);
 


* [patch 33/61] lock validator: disable NMI watchdog if CONFIG_LOCKDEP
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (31 preceding siblings ...)
  2006-05-29 21:25 ` [patch 32/61] lock validator: do not recurse in printk() Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 22:49   ` Keith Owens
  2006-05-29 21:25 ` [patch 34/61] lock validator: special locking: bdev Ingo Molnar
                   ` (40 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

The NMI watchdog uses spinlocks (notifier chains, etc.),
so it's not lockdep-safe at the moment.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86_64/kernel/nmi.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -205,6 +205,18 @@ int __init check_nmi_watchdog (void)
 	int *counts;
 	int cpu;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * The NMI watchdog uses spinlocks (notifier chains, etc.),
+	 * so it's not lockdep-safe:
+	 */
+	nmi_watchdog = 0;
+	for_each_online_cpu(cpu)
+		per_cpu(nmi_watchdog_ctlblk.enabled, cpu) = 0;
+
+	printk("lockdep: disabled NMI watchdog.\n");
+	return 0;
+#endif
 	if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
 		return 0;
 


* [patch 34/61] lock validator: special locking: bdev
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (32 preceding siblings ...)
  2006-05-29 21:25 ` [patch 33/61] lock validator: disable NMI watchdog if CONFIG_LOCKDEP Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:25 ` [patch 35/61] lock validator: special locking: direct-IO Ingo Molnar
                   ` (39 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) locking code. Has no
effect on non-lockdep kernels.
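
Why a nesting subtype helps can be shown with a toy model in userspace C.
It is a sketch under stated assumptions, not the validator's algorithm:
toy_acquire(), toy_release(), held_stack and bd_mutex_key are hypothetical
names, and the model simply treats each (lock-type, subtype) pair as a
separate entry, so only a repeated pair counts as recursive locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8

/* Same three values as the enum added to include/linux/fs.h by this patch. */
enum bd_mutex_subtype { BD_MUTEX_NORMAL, BD_MUTEX_WHOLE, BD_MUTEX_PARTITION };

struct held {
	const void *class_key;	/* identifies the lock-type */
	unsigned int subtype;	/* nesting level within that type */
};

static struct held held_stack[MAX_HELD];
static size_t held_depth;

/* Hypothetical key standing for the bd_mutex lock-type. */
static int bd_mutex_key;

/* Returns 0 if accepted, -1 if it would be flagged (or the stack is full). */
static int toy_acquire(const void *class_key, unsigned int subtype)
{
	size_t i;

	for (i = 0; i < held_depth; i++)
		if (held_stack[i].class_key == class_key &&
		    held_stack[i].subtype == subtype)
			return -1;	/* same type and subtype already held */
	if (held_depth == MAX_HELD)
		return -1;
	held_stack[held_depth].class_key = class_key;
	held_stack[held_depth].subtype = subtype;
	held_depth++;
	return 0;
}

/* Drops the most recently acquired entry. */
static void toy_release(void)
{
	assert(held_depth > 0);
	held_depth--;
}
```

In this model, taking bd_mutex as BD_MUTEX_NORMAL and then again as
BD_MUTEX_WHOLE is accepted, while taking it twice as BD_MUTEX_NORMAL is
rejected.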

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/md/md.c    |    6 +--
 fs/block_dev.c     |  105 ++++++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/fs.h |   17 ++++++++
 3 files changed, 112 insertions(+), 16 deletions(-)

Index: linux/drivers/md/md.c
===================================================================
--- linux.orig/drivers/md/md.c
+++ linux/drivers/md/md.c
@@ -1394,7 +1394,7 @@ static int lock_rdev(mdk_rdev_t *rdev, d
 	struct block_device *bdev;
 	char b[BDEVNAME_SIZE];
 
-	bdev = open_by_devnum(dev, FMODE_READ|FMODE_WRITE);
+	bdev = open_partition_by_devnum(dev, FMODE_READ|FMODE_WRITE);
 	if (IS_ERR(bdev)) {
 		printk(KERN_ERR "md: could not open %s.\n",
 			__bdevname(dev, b));
@@ -1404,7 +1404,7 @@ static int lock_rdev(mdk_rdev_t *rdev, d
 	if (err) {
 		printk(KERN_ERR "md: could not bd_claim %s.\n",
 			bdevname(bdev, b));
-		blkdev_put(bdev);
+		blkdev_put_partition(bdev);
 		return err;
 	}
 	rdev->bdev = bdev;
@@ -1418,7 +1418,7 @@ static void unlock_rdev(mdk_rdev_t *rdev
 	if (!bdev)
 		MD_BUG();
 	bd_release(bdev);
-	blkdev_put(bdev);
+	blkdev_put_partition(bdev);
 }
 
 void md_autodetect_dev(dev_t dev);
Index: linux/fs/block_dev.c
===================================================================
--- linux.orig/fs/block_dev.c
+++ linux/fs/block_dev.c
@@ -746,7 +746,7 @@ static int bd_claim_by_kobject(struct bl
 	if (!bo)
 		return -ENOMEM;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_PARTITION);
 	res = bd_claim(bdev, holder);
 	if (res || !add_bd_holder(bdev, bo))
 		free_bd_holder(bo);
@@ -771,7 +771,7 @@ static void bd_release_from_kobject(stru
 	if (!kobj)
 		return;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_PARTITION);
 	bd_release(bdev);
 	if ((bo = del_bd_holder(bdev, kobj)))
 		free_bd_holder(bo);
@@ -829,6 +829,22 @@ struct block_device *open_by_devnum(dev_
 
 EXPORT_SYMBOL(open_by_devnum);
 
+static int
+blkdev_get_partition(struct block_device *bdev, mode_t mode, unsigned flags);
+
+struct block_device *open_partition_by_devnum(dev_t dev, unsigned mode)
+{
+	struct block_device *bdev = bdget(dev);
+	int err = -ENOMEM;
+	int flags = mode & FMODE_WRITE ? O_RDWR : O_RDONLY;
+	if (bdev)
+		err = blkdev_get_partition(bdev, mode, flags);
+	return err ? ERR_PTR(err) : bdev;
+}
+
+EXPORT_SYMBOL(open_partition_by_devnum);
+
+
 /*
  * This routine checks whether a removable media has been changed,
  * and invalidates all buffer-cache-entries in that case. This
@@ -875,7 +891,11 @@ void bd_set_size(struct block_device *bd
 }
 EXPORT_SYMBOL(bd_set_size);
 
-static int do_open(struct block_device *bdev, struct file *file)
+static int
+blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);
+
+static int
+do_open(struct block_device *bdev, struct file *file, unsigned int subtype)
 {
 	struct module *owner = NULL;
 	struct gendisk *disk;
@@ -892,7 +912,8 @@ static int do_open(struct block_device *
 	}
 	owner = disk->fops->owner;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, subtype);
+
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
 		bdev->bd_contains = bdev;
@@ -917,13 +938,17 @@ static int do_open(struct block_device *
 			struct block_device *whole;
 			whole = bdget_disk(disk, 0);
 			ret = -ENOMEM;
+			/*
+			 * We must not recurse deeper than 1:
+			 */
+			WARN_ON(subtype != 0);
 			if (!whole)
 				goto out_first;
-			ret = blkdev_get(whole, file->f_mode, file->f_flags);
+			ret = blkdev_get_whole(whole, file->f_mode, file->f_flags);
 			if (ret)
 				goto out_first;
 			bdev->bd_contains = whole;
-			mutex_lock(&whole->bd_mutex);
+			mutex_lock_nested(&whole->bd_mutex, BD_MUTEX_WHOLE);
 			whole->bd_part_count++;
 			p = disk->part[part - 1];
 			bdev->bd_inode->i_data.backing_dev_info =
@@ -951,7 +976,8 @@ static int do_open(struct block_device *
 			if (bdev->bd_invalidated)
 				rescan_partitions(bdev->bd_disk, bdev);
 		} else {
-			mutex_lock(&bdev->bd_contains->bd_mutex);
+			mutex_lock_nested(&bdev->bd_contains->bd_mutex,
+					  BD_MUTEX_PARTITION);
 			bdev->bd_contains->bd_part_count++;
 			mutex_unlock(&bdev->bd_contains->bd_mutex);
 		}
@@ -992,11 +1018,49 @@ int blkdev_get(struct block_device *bdev
 	fake_file.f_dentry = &fake_dentry;
 	fake_dentry.d_inode = bdev->bd_inode;
 
-	return do_open(bdev, &fake_file);
+	return do_open(bdev, &fake_file, BD_MUTEX_NORMAL);
 }
 
 EXPORT_SYMBOL(blkdev_get);
 
+static int
+blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags)
+{
+	/*
+	 * This crockload is due to bad choice of ->open() type.
+	 * It will go away.
+	 * For now, block device ->open() routine must _not_
+	 * examine anything in 'inode' argument except ->i_rdev.
+	 */
+	struct file fake_file = {};
+	struct dentry fake_dentry = {};
+	fake_file.f_mode = mode;
+	fake_file.f_flags = flags;
+	fake_file.f_dentry = &fake_dentry;
+	fake_dentry.d_inode = bdev->bd_inode;
+
+	return do_open(bdev, &fake_file, BD_MUTEX_WHOLE);
+}
+
+static int
+blkdev_get_partition(struct block_device *bdev, mode_t mode, unsigned flags)
+{
+	/*
+	 * This crockload is due to bad choice of ->open() type.
+	 * It will go away.
+	 * For now, block device ->open() routine must _not_
+	 * examine anything in 'inode' argument except ->i_rdev.
+	 */
+	struct file fake_file = {};
+	struct dentry fake_dentry = {};
+	fake_file.f_mode = mode;
+	fake_file.f_flags = flags;
+	fake_file.f_dentry = &fake_dentry;
+	fake_dentry.d_inode = bdev->bd_inode;
+
+	return do_open(bdev, &fake_file, BD_MUTEX_PARTITION);
+}
+
 static int blkdev_open(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev;
@@ -1012,7 +1076,7 @@ static int blkdev_open(struct inode * in
 
 	bdev = bd_acquire(inode);
 
-	res = do_open(bdev, filp);
+	res = do_open(bdev, filp, BD_MUTEX_NORMAL);
 	if (res)
 		return res;
 
@@ -1026,13 +1090,13 @@ static int blkdev_open(struct inode * in
 	return res;
 }
 
-int blkdev_put(struct block_device *bdev)
+static int __blkdev_put(struct block_device *bdev, unsigned int subtype)
 {
 	int ret = 0;
 	struct inode *bd_inode = bdev->bd_inode;
 	struct gendisk *disk = bdev->bd_disk;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, subtype);
 	lock_kernel();
 	if (!--bdev->bd_openers) {
 		sync_blockdev(bdev);
@@ -1042,7 +1106,9 @@ int blkdev_put(struct block_device *bdev
 		if (disk->fops->release)
 			ret = disk->fops->release(bd_inode, NULL);
 	} else {
-		mutex_lock(&bdev->bd_contains->bd_mutex);
+		WARN_ON(subtype != 0);
+		mutex_lock_nested(&bdev->bd_contains->bd_mutex,
+				  BD_MUTEX_PARTITION);
 		bdev->bd_contains->bd_part_count--;
 		mutex_unlock(&bdev->bd_contains->bd_mutex);
 	}
@@ -1059,7 +1125,8 @@ int blkdev_put(struct block_device *bdev
 		bdev->bd_disk = NULL;
 		bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
 		if (bdev != bdev->bd_contains) {
-			blkdev_put(bdev->bd_contains);
+			WARN_ON(subtype != 0);
+			__blkdev_put(bdev->bd_contains, BD_MUTEX_WHOLE);
 		}
 		bdev->bd_contains = NULL;
 	}
@@ -1069,8 +1136,20 @@ int blkdev_put(struct block_device *bdev
 	return ret;
 }
 
+int blkdev_put(struct block_device *bdev)
+{
+	return __blkdev_put(bdev, BD_MUTEX_NORMAL);
+}
+
 EXPORT_SYMBOL(blkdev_put);
 
+int blkdev_put_partition(struct block_device *bdev)
+{
+	return __blkdev_put(bdev, BD_MUTEX_PARTITION);
+}
+
+EXPORT_SYMBOL(blkdev_put_partition);
+
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(filp->f_mapping->host);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -436,6 +436,21 @@ struct block_device {
 };
 
 /*
+ * bdev->bd_mutex nesting types for the LOCKDEP validator:
+ *
+ * 0: normal
+ * 1: 'whole'
+ * 2: 'partition'
+ */
+enum bdev_bd_mutex_lock_type
+{
+	BD_MUTEX_NORMAL,
+	BD_MUTEX_WHOLE,
+	BD_MUTEX_PARTITION
+};
+
+
+/*
  * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
  * radix trees
  */
@@ -1404,6 +1419,7 @@ extern void bd_set_size(struct block_dev
 extern void bd_forget(struct inode *inode);
 extern void bdput(struct block_device *);
 extern struct block_device *open_by_devnum(dev_t, unsigned);
+extern struct block_device *open_partition_by_devnum(dev_t, unsigned);
 extern const struct file_operations def_blk_fops;
 extern const struct address_space_operations def_blk_aops;
 extern const struct file_operations def_chr_fops;
@@ -1414,6 +1430,7 @@ extern int blkdev_ioctl(struct inode *, 
 extern long compat_blkdev_ioctl(struct file *, unsigned, unsigned long);
 extern int blkdev_get(struct block_device *, mode_t, unsigned);
 extern int blkdev_put(struct block_device *);
+extern int blkdev_put_partition(struct block_device *);
 extern int bd_claim(struct block_device *, void *);
 extern void bd_release(struct block_device *);
 #ifdef CONFIG_SYSFS


* [patch 35/61] lock validator: special locking: direct-IO
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (33 preceding siblings ...)
  2006-05-29 21:25 ` [patch 34/61] lock validator: special locking: bdev Ingo Molnar
@ 2006-05-29 21:25 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 36/61] lock validator: special locking: serial Ingo Molnar
                   ` (38 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (rwsem-in-irq) locking code. Has no
effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/direct-io.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux/fs/direct-io.c
===================================================================
--- linux.orig/fs/direct-io.c
+++ linux/fs/direct-io.c
@@ -220,7 +220,8 @@ static void dio_complete(struct dio *dio
 	if (dio->end_io && dio->result)
 		dio->end_io(dio->iocb, offset, bytes, dio->map_bh.b_private);
 	if (dio->lock_type == DIO_LOCKING)
-		up_read(&dio->inode->i_alloc_sem);
+		/* lockdep: non-owner release */
+		up_read_non_owner(&dio->inode->i_alloc_sem);
 }
 
 /*
@@ -1261,7 +1262,8 @@ __blockdev_direct_IO(int rw, struct kioc
 		}
 
 		if (dio_lock_type == DIO_LOCKING)
-			down_read(&inode->i_alloc_sem);
+			/* lockdep: a non-owner will release it */
+			down_read_non_owner(&inode->i_alloc_sem);
 	}
 
 	/*


* [patch 36/61] lock validator: special locking: serial
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (34 preceding siblings ...)
  2006-05-29 21:25 ` [patch 35/61] lock validator: special locking: direct-IO Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:26 ` [patch 37/61] lock validator: special locking: dcache Ingo Molnar
                   ` (37 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (dual-initialized) locking code.
Has no effect on non-lockdep kernels.
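
The effect of a shared key can be sketched in userspace C. The sketch
assumes that the plain init macro declares one static key per call site,
as the mutex_init() macro earlier in this series does; toy_key, toy_lock,
toy_lock_init() and the init_*() helpers are hypothetical names, and only
port_lock_key mirrors a name from the patch.

```c
#include <assert.h>

struct toy_key { char dummy; };
struct toy_lock { const struct toy_key *class_key; };

/* One static key per expansion site, so each site yields its own lock-type. */
#define toy_lock_init(lock)				\
do {							\
	static struct toy_key site_key;			\
							\
	(lock)->class_key = &site_key;			\
} while (0)

/* The caller supplies the key, so several sites can share one lock-type. */
#define toy_lock_init_key(lock, key)			\
do {							\
	(lock)->class_key = (key);			\
} while (0)

static struct toy_key port_lock_key;

static void init_early(struct toy_lock *l) { toy_lock_init(l); }
static void init_late(struct toy_lock *l) { toy_lock_init(l); }

static void init_early_keyed(struct toy_lock *l)
{
	assert(l);
	toy_lock_init_key(l, &port_lock_key);
}

static void init_late_keyed(struct toy_lock *l)
{
	assert(l);
	toy_lock_init_key(l, &port_lock_key);
}
```

With the plain macro the two init sites end up with different keys; with
the keyed variant both sites report the same key, which is what the patch
wants for port->lock.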

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/serial/serial_core.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux/drivers/serial/serial_core.c
===================================================================
--- linux.orig/drivers/serial/serial_core.c
+++ linux/drivers/serial/serial_core.c
@@ -1849,6 +1849,12 @@ static const struct baud_rates baud_rate
 	{      0, B38400  }
 };
 
+/*
+ * lockdep: port->lock is initialized in two places, but we
+ *          want only one lock-type:
+ */
+static struct lockdep_type_key port_lock_key;
+
 /**
  *	uart_set_options - setup the serial console parameters
  *	@port: pointer to the serial ports uart_port structure
@@ -1869,7 +1875,7 @@ uart_set_options(struct uart_port *port,
 	 * Ensure that the serial console lock is initialised
 	 * early.
 	 */
-	spin_lock_init(&port->lock);
+	spin_lock_init_key(&port->lock, &port_lock_key);
 
 	memset(&termios, 0, sizeof(struct termios));
 
@@ -2255,7 +2261,7 @@ int uart_add_one_port(struct uart_driver
 	 * initialised.
 	 */
 	if (!(uart_console(port) && (port->cons->flags & CON_ENABLED)))
-		spin_lock_init(&port->lock);
+		spin_lock_init_key(&port->lock, &port_lock_key);
 
 	uart_configure_port(drv, state, port);
 


* [patch 37/61] lock validator: special locking: dcache
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (35 preceding siblings ...)
  2006-05-29 21:26 ` [patch 36/61] lock validator: special locking: serial Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:26 ` [patch 38/61] lock validator: special locking: i_mutex Ingo Molnar
                   ` (36 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) locking code. Has no
effect on non-lockdep kernels.
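
The d_move() pattern annotated here (take the lower-addressed lock first,
then mark the second acquisition as nested) can be sketched in userspace C.
This is an illustration only: toy_dentry, lock_step and plan_pair_locking()
are hypothetical names, and the sketch only computes the locking plan
instead of taking real locks.

```c
#include <assert.h>
#include <stdint.h>

enum toy_d_lock_type { TOY_D_LOCK_NORMAL, TOY_D_LOCK_NESTED };

struct toy_dentry { int d_lock; };

struct lock_step {
	const struct toy_dentry *who;	/* whose d_lock to take */
	enum toy_d_lock_type subtype;	/* annotation for the validator */
};

/*
 * Order two distinct dentries by address, so that every caller locks
 * them in the same order; the second lock gets the "nested" subtype.
 */
static void plan_pair_locking(const struct toy_dentry *a,
			      const struct toy_dentry *b,
			      struct lock_step plan[2])
{
	const struct toy_dentry *first = a, *second = b;

	assert(a != b);
	if ((uintptr_t)b < (uintptr_t)a) {
		first = b;
		second = a;
	}
	plan[0].who = first;
	plan[0].subtype = TOY_D_LOCK_NORMAL;
	plan[1].who = second;
	plan[1].subtype = TOY_D_LOCK_NESTED;
}
```

The address ordering is what makes the double acquisition deadlock-free;
the nested subtype only tells the validator that taking the same lock-type
twice is intentional here.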

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/dcache.c            |    6 +++---
 include/linux/dcache.h |   12 ++++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c
+++ linux/fs/dcache.c
@@ -1380,10 +1380,10 @@ void d_move(struct dentry * dentry, stru
 	 */
 	if (target < dentry) {
 		spin_lock(&target->d_lock);
-		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	} else {
 		spin_lock(&dentry->d_lock);
-		spin_lock(&target->d_lock);
+		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
 	}
 
 	/* Move the dentry to the target hash queue, if on different bucket */
@@ -1420,7 +1420,7 @@ already_unhashed:
 	}
 
 	list_add(&dentry->d_u.d_child, &dentry->d_parent->d_subdirs);
-	spin_unlock(&target->d_lock);
+	spin_unlock_non_nested(&target->d_lock);
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
Index: linux/include/linux/dcache.h
===================================================================
--- linux.orig/include/linux/dcache.h
+++ linux/include/linux/dcache.h
@@ -114,6 +114,18 @@ struct dentry {
 	unsigned char d_iname[DNAME_INLINE_LEN_MIN];	/* small names */
 };
 
+/*
+ * dentry->d_lock spinlock nesting types:
+ *
+ * 0: normal
+ * 1: nested
+ */
+enum dentry_d_lock_type
+{
+	DENTRY_D_LOCK_NORMAL,
+	DENTRY_D_LOCK_NESTED
+};
+
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
 	int (*d_hash) (struct dentry *, struct qstr *);


* [patch 38/61] lock validator: special locking: i_mutex
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (36 preceding siblings ...)
  2006-05-29 21:26 ` [patch 37/61] lock validator: special locking: dcache Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-30 20:53   ` Steven Rostedt
  2006-05-29 21:26 ` [patch 39/61] lock validator: special locking: s_lock Ingo Molnar
                   ` (35 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

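Teach the lock validator about special (recursive) i_mutex locking: a
parent directory's i_mutex and a child's i_mutex can be held at the same
time, so they are annotated with separate nesting types. Has no effect
on non-lockdep kernels.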

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/usb/core/inode.c |    2 +-
 fs/namei.c               |   24 ++++++++++++------------
 include/linux/fs.h       |   14 ++++++++++++++
 3 files changed, 27 insertions(+), 13 deletions(-)

Index: linux/drivers/usb/core/inode.c
===================================================================
--- linux.orig/drivers/usb/core/inode.c
+++ linux/drivers/usb/core/inode.c
@@ -201,7 +201,7 @@ static void update_sb(struct super_block
 	if (!root)
 		return;
 
-	mutex_lock(&root->d_inode->i_mutex);
+	mutex_lock_nested(&root->d_inode->i_mutex, I_MUTEX_PARENT);
 
 	list_for_each_entry(bus, &root->d_subdirs, d_u.d_child) {
 		if (bus->d_inode) {
Index: linux/fs/namei.c
===================================================================
--- linux.orig/fs/namei.c
+++ linux/fs/namei.c
@@ -1422,7 +1422,7 @@ struct dentry *lock_rename(struct dentry
 	struct dentry *p;
 
 	if (p1 == p2) {
-		mutex_lock(&p1->d_inode->i_mutex);
+		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
 		return NULL;
 	}
 
@@ -1430,30 +1430,30 @@ struct dentry *lock_rename(struct dentry
 
 	for (p = p1; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p2) {
-			mutex_lock(&p2->d_inode->i_mutex);
-			mutex_lock(&p1->d_inode->i_mutex);
+			mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_PARENT);
+			mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_CHILD);
 			return p;
 		}
 	}
 
 	for (p = p2; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p1) {
-			mutex_lock(&p1->d_inode->i_mutex);
-			mutex_lock(&p2->d_inode->i_mutex);
+			mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
+			mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
 			return p;
 		}
 	}
 
-	mutex_lock(&p1->d_inode->i_mutex);
-	mutex_lock(&p2->d_inode->i_mutex);
+	mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
+	mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
 	return NULL;
 }
 
 void unlock_rename(struct dentry *p1, struct dentry *p2)
 {
-	mutex_unlock(&p1->d_inode->i_mutex);
+	mutex_unlock_non_nested(&p1->d_inode->i_mutex);
 	if (p1 != p2) {
-		mutex_unlock(&p2->d_inode->i_mutex);
+		mutex_unlock_non_nested(&p2->d_inode->i_mutex);
 		mutex_unlock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
 	}
 }
@@ -1750,7 +1750,7 @@ struct dentry *lookup_create(struct name
 {
 	struct dentry *dentry = ERR_PTR(-EEXIST);
 
-	mutex_lock(&nd->dentry->d_inode->i_mutex);
+	mutex_lock_nested(&nd->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	/*
 	 * Yucky last component or no last component at all?
 	 * (foo/., foo/.., /////)
@@ -2007,7 +2007,7 @@ static long do_rmdir(int dfd, const char
 			error = -EBUSY;
 			goto exit1;
 	}
-	mutex_lock(&nd.dentry->d_inode->i_mutex);
+	mutex_lock_nested(&nd.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
@@ -2081,7 +2081,7 @@ static long do_unlinkat(int dfd, const c
 	error = -EISDIR;
 	if (nd.last_type != LAST_NORM)
 		goto exit1;
-	mutex_lock(&nd.dentry->d_inode->i_mutex);
+	mutex_lock_nested(&nd.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -558,6 +558,20 @@ struct inode {
 };
 
 /*
+ * inode->i_mutex nesting types for the LOCKDEP validator:
+ *
+ * 0: the object of the current VFS operation
+ * 1: parent
+ * 2: child/target
+ */
+enum inode_i_mutex_lock_type
+{
+	I_MUTEX_NORMAL,
+	I_MUTEX_PARENT,
+	I_MUTEX_CHILD
+};
+
+/*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic
  * with respect to the local cpu (unlike with preempt disabled),
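
A minimal sketch of the ordering that lock_rename() annotates, assuming
a toy directory tree in which the root is its own parent (as with
d_parent). The toy_* names are hypothetical; the sketch only computes
which directory would be locked first (the "parent" role) and which
second (the "child" role), and takes no locks:

```c
#include <stddef.h>

/* Hypothetical toy directory: the root points to itself. */
struct toy_dir {
	struct toy_dir *parent;
};

struct toy_order {
	struct toy_dir *first;	/* locked first, in the "parent" role */
	struct toy_dir *second;	/* locked second, in the "child" role;
				 * NULL when both arguments are the same */
};

/* Is 'anc' a proper ancestor of 'd'? Walks up until the root. */
static int toy_is_ancestor(const struct toy_dir *anc, const struct toy_dir *d)
{
	const struct toy_dir *p;

	for (p = d; p->parent != p; p = p->parent) {
		if (p->parent == anc)
			return 1;
	}
	return 0;
}

/*
 * Same decision order as the patched lock_rename(): identical
 * directories need one lock; if p2 is an ancestor of p1 it goes first;
 * otherwise p1 goes first.
 */
static struct toy_order toy_rename_order(struct toy_dir *p1,
					 struct toy_dir *p2)
{
	struct toy_order o = { p1, NULL };

	if (p1 == p2)
		return o;
	if (toy_is_ancestor(p2, p1)) {
		o.first = p2;
		o.second = p1;
	} else {
		o.first = p1;
		o.second = p2;
	}
	return o;
}
```

The ancestor is always locked first, so two renames on the same pair of
directories agree on the order regardless of argument order.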

* [patch 39/61] lock validator: special locking: s_lock
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (37 preceding siblings ...)
  2006-05-29 21:26 ` [patch 38/61] lock validator: special locking: i_mutex Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 40/61] lock validator: special locking: futex Ingo Molnar
                   ` (34 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

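Teach the lock validator about special (per-filesystem) s_lock locking:
the locking rules for s_lock are up to the filesystem, so each
filesystem type gets its own lock-type key. Has no effect on
non-lockdep kernels.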

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/super.c         |   13 +++++++++----
 include/linux/fs.h |    1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -54,7 +54,7 @@ DEFINE_SPINLOCK(sb_lock);
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(void)
+static struct super_block *alloc_super(struct file_system_type *type)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static struct super_operations default_op;
@@ -72,7 +72,12 @@ static struct super_block *alloc_super(v
 		INIT_HLIST_HEAD(&s->s_anon);
 		INIT_LIST_HEAD(&s->s_inodes);
 		init_rwsem(&s->s_umount);
-		mutex_init(&s->s_lock);
+		/*
+		 * The locking rules for s_lock are up to the
+		 * filesystem. For example ext3fs has different
+		 * lock ordering than usbfs:
+		 */
+		mutex_init_key(&s->s_lock, type->name, &type->s_lock_key);
 		down_write(&s->s_umount);
 		s->s_count = S_BIAS;
 		atomic_set(&s->s_active, 1);
@@ -297,7 +302,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super();
+		s = alloc_super(type);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -696,7 +701,7 @@ struct super_block *get_sb_bdev(struct f
 	 */
 	mutex_lock(&bdev->bd_mount_mutex);
 	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
-	mutex_unlock(&bdev->bd_mount_mutex);
+	mutex_unlock_non_nested(&bdev->bd_mount_mutex);
 	if (IS_ERR(s))
 		goto out;
 
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1307,6 +1307,7 @@ struct file_system_type {
 	struct module *owner;
 	struct file_system_type * next;
 	struct list_head fs_supers;
+	struct lockdep_type_key s_lock_key;
 };
 
 struct super_block *get_sb_bdev(struct file_system_type *fs_type,

* [patch 40/61] lock validator: special locking: futex
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (38 preceding siblings ...)
  2006-05-29 21:26 ` [patch 39/61] lock validator: special locking: s_lock Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 41/61] lock validator: special locking: genirq Ingo Molnar
                   ` (33 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

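Teach the lock validator about special (recursive) futex locking: the
address-ordered locking of two hash buckets moves into a
double_lock_hb() helper that marks the second lock as nested. Has no
effect on non-lockdep kernels.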

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/futex.c |   44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -604,6 +604,22 @@ static int unlock_futex_pi(u32 __user *u
 }
 
 /*
+ * Express the locking dependencies for lockdep:
+ */
+static inline void
+double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
+{
+	if (hb1 <= hb2) {
+		spin_lock(&hb1->lock);
+		if (hb1 < hb2)
+			spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING);
+	} else { /* hb1 > hb2 */
+		spin_lock(&hb2->lock);
+		spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING);
+	}
+}
+
+/*
  * Wake up all waiters hashed on the physical page that is mapped
  * to this virtual address:
  */
@@ -669,19 +685,15 @@ retryfull:
 	hb2 = hash_futex(&key2);
 
 retry:
-	if (hb1 < hb2)
-		spin_lock(&hb1->lock);
-	spin_lock(&hb2->lock);
-	if (hb1 > hb2)
-		spin_lock(&hb1->lock);
+	double_lock_hb(hb1, hb2);
 
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
 		u32 dummy;
 
-		spin_unlock(&hb1->lock);
+		spin_unlock_non_nested(&hb1->lock);
 		if (hb1 != hb2)
-			spin_unlock(&hb2->lock);
+			spin_unlock_non_nested(&hb2->lock);
 
 #ifndef CONFIG_MMU
 		/*
@@ -748,9 +760,9 @@ retry:
 		ret += op_ret;
 	}
 
-	spin_unlock(&hb1->lock);
+	spin_unlock_non_nested(&hb1->lock);
 	if (hb1 != hb2)
-		spin_unlock(&hb2->lock);
+		spin_unlock_non_nested(&hb2->lock);
 out:
 	up_read(&current->mm->mmap_sem);
 	return ret;
@@ -782,11 +794,7 @@ static int futex_requeue(u32 __user *uad
 	hb1 = hash_futex(&key1);
 	hb2 = hash_futex(&key2);
 
-	if (hb1 < hb2)
-		spin_lock(&hb1->lock);
-	spin_lock(&hb2->lock);
-	if (hb1 > hb2)
-		spin_lock(&hb1->lock);
+	double_lock_hb(hb1, hb2);
 
 	if (likely(cmpval != NULL)) {
 		u32 curval;
@@ -794,9 +802,9 @@ static int futex_requeue(u32 __user *uad
 		ret = get_futex_value_locked(&curval, uaddr1);
 
 		if (unlikely(ret)) {
-			spin_unlock(&hb1->lock);
+			spin_unlock_non_nested(&hb1->lock);
 			if (hb1 != hb2)
-				spin_unlock(&hb2->lock);
+				spin_unlock_non_nested(&hb2->lock);
 
 			/*
 			 * If we would have faulted, release mmap_sem, fault
@@ -842,9 +850,9 @@ static int futex_requeue(u32 __user *uad
 	}
 
 out_unlock:
-	spin_unlock(&hb1->lock);
+	spin_unlock_non_nested(&hb1->lock);
 	if (hb1 != hb2)
-		spin_unlock(&hb2->lock);
+		spin_unlock_non_nested(&hb2->lock);
 
 	/* drop_key_refs() must be called outside the spinlocks. */
 	while (--drop_count >= 0)
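
A minimal user-space sketch of the address-ordered double locking that
double_lock_hb() expresses, assuming a toy spinlock built on C11
atomic_flag. The toy_* names are hypothetical and carry no lockdep
annotation; the sketch only shows the ordering and the same-lock case:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical toy spinlock, standing in for a hash-bucket lock. */
struct toy_spinlock {
	atomic_flag held;
};

static void toy_lock(struct toy_spinlock *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;
}

static void toy_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int toy_trylock(struct toy_spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

/*
 * Take both locks, lower address first. If both pointers refer to the
 * same lock, take it only once.
 */
static void toy_double_lock(struct toy_spinlock *a, struct toy_spinlock *b)
{
	if ((uintptr_t)a <= (uintptr_t)b) {
		toy_lock(a);
		if ((uintptr_t)a < (uintptr_t)b)
			toy_lock(b);
	} else {
		toy_lock(b);
		toy_lock(a);
	}
}

/* Release in the same shape as the patched futex code. */
static void toy_double_unlock(struct toy_spinlock *a, struct toy_spinlock *b)
{
	toy_unlock(a);
	if (a != b)
		toy_unlock(b);
}
```

Because every caller takes the lower-addressed lock first, two callers
passing the same pair in opposite argument order cannot deadlock against
each other.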

* [patch 41/61] lock validator: special locking: genirq
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (39 preceding siblings ...)
  2006-05-29 21:26 ` [patch 40/61] lock validator: special locking: futex Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 42/61] lock validator: special locking: kgdb Ingo Molnar
                   ` (32 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) genirq locking: all
irq_desc locks are handled as a single lock-type. Has no effect on
non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
---
 kernel/irq/handle.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux/kernel/irq/handle.c
===================================================================
--- linux.orig/kernel/irq/handle.c
+++ linux/kernel/irq/handle.c
@@ -11,6 +11,7 @@
 #include <linux/random.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/kallsyms.h>
 
 #include "internals.h"
 
@@ -193,3 +194,15 @@ out:
 	return 1;
 }
 
+/*
+ * lockdep: we want to handle all irq_desc locks as a single lock-type:
+ */
+static struct lockdep_type_key irq_desc_lock_type;
+
+void early_init_irq_lock_type(void)
+{
+	int i;
+
+	for (i = 0; i < NR_IRQS; i++)
+		spin_lock_init_key(&irq_desc[i].lock, &irq_desc_lock_type);
+}

* [patch 42/61] lock validator: special locking: kgdb
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (40 preceding siblings ...)
  2006-05-29 21:26 ` [patch 41/61] lock validator: special locking: genirq Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 43/61] lock validator: special locking: completions Ingo Molnar
                   ` (31 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive, non-ordered) kgdb
locking. Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/kgdb.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/kernel/kgdb.c
===================================================================
--- linux.orig/kernel/kgdb.c
+++ linux/kernel/kgdb.c
@@ -1539,7 +1539,7 @@ int kgdb_handle_exception(int ex_vector,
 
 	if (!debugger_step || !kgdb_contthread) {
 		for (i = 0; i < NR_CPUS; i++)
-			spin_unlock(&slavecpulocks[i]);
+			spin_unlock_non_nested(&slavecpulocks[i]);
 		/* Wait till all the processors have quit
 		 * from the debugger. */
 		for (i = 0; i < NR_CPUS; i++) {
@@ -1622,7 +1622,7 @@ static void __init kgdb_internal_init(vo
 
 	/* Initialize our spinlocks. */
 	for (i = 0; i < NR_CPUS; i++)
-		spin_lock_init(&slavecpulocks[i]);
+		spin_lock_init_static(&slavecpulocks[i]);
 
 	for (i = 0; i < MAX_BREAKPOINTS; i++)
 		kgdb_break[i].state = bp_none;

* [patch 43/61] lock validator: special locking: completions
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (41 preceding siblings ...)
  2006-05-29 21:26 ` [patch 42/61] lock validator: special locking: kgdb Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 44/61] lock validator: special locking: waitqueues Ingo Molnar
                   ` (30 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (multi-initialized) completion
locking: init_completion() moves out of line into kernel/sched.c. Has
no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/completion.h |    6 +-----
 kernel/sched.c             |    8 ++++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

Index: linux/include/linux/completion.h
===================================================================
--- linux.orig/include/linux/completion.h
+++ linux/include/linux/completion.h
@@ -21,11 +21,7 @@ struct completion {
 #define DECLARE_COMPLETION(work) \
 	struct completion work = COMPLETION_INITIALIZER(work)
 
-static inline void init_completion(struct completion *x)
-{
-	x->done = 0;
-	init_waitqueue_head(&x->wait);
-}
+extern void init_completion(struct completion *x);
 
 extern void FASTCALL(wait_for_completion(struct completion *));
 extern int FASTCALL(wait_for_completion_interruptible(struct completion *x));
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3569,6 +3569,14 @@ __wake_up_sync(wait_queue_head_t *q, uns
 }
 EXPORT_SYMBOL_GPL(__wake_up_sync);	/* For internal use only */
 
+void init_completion(struct completion *x)
+{
+	x->done = 0;
+	__init_waitqueue_head(&x->wait);
+}
+
+EXPORT_SYMBOL(init_completion);
+
 void fastcall complete(struct completion *x)
 {
 	unsigned long flags;

* [patch 44/61] lock validator: special locking: waitqueues
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (42 preceding siblings ...)
  2006-05-29 21:26 ` [patch 43/61] lock validator: special locking: completions Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 45/61] lock validator: special locking: mm Ingo Molnar
                   ` (29 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Map special (multi-initialized) waitqueue locking to the lock validator:
all waitqueue locks share one lock-type key. Has no effect on
non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/wait.h |   11 +++++++++--
 kernel/wait.c        |    9 +++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

Index: linux/include/linux/wait.h
===================================================================
--- linux.orig/include/linux/wait.h
+++ linux/include/linux/wait.h
@@ -77,12 +77,19 @@ struct task_struct;
 #define __WAIT_BIT_KEY_INITIALIZER(word, bit)				\
 	{ .flags = word, .bit_nr = bit, }
 
-static inline void init_waitqueue_head(wait_queue_head_t *q)
+/*
+ * lockdep: we want one lock-type for all waitqueue locks.
+ */
+extern struct lockdep_type_key waitqueue_lock_key;
+
+static inline void __init_waitqueue_head(wait_queue_head_t *q)
 {
-	spin_lock_init(&q->lock);
+	spin_lock_init_key(&q->lock, &waitqueue_lock_key);
 	INIT_LIST_HEAD(&q->task_list);
 }
 
+extern void init_waitqueue_head(wait_queue_head_t *q);
+
 static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
 {
 	q->flags = 0;
Index: linux/kernel/wait.c
===================================================================
--- linux.orig/kernel/wait.c
+++ linux/kernel/wait.c
@@ -11,6 +11,15 @@
 #include <linux/wait.h>
 #include <linux/hash.h>
 
+struct lockdep_type_key waitqueue_lock_key;
+
+void init_waitqueue_head(wait_queue_head_t *q)
+{
+	__init_waitqueue_head(q);
+}
+
+EXPORT_SYMBOL(init_waitqueue_head);
+
 void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 {
 	unsigned long flags;

* [patch 45/61] lock validator: special locking: mm
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (43 preceding siblings ...)
  2006-05-29 21:26 ` [patch 44/61] lock validator: special locking: waitqueues Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 46/61] lock validator: special locking: slab Ingo Molnar
                   ` (28 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) page-table lock
nesting in mm/memory.c and mm/mremap.c. Has no effect on non-lockdep
kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 mm/memory.c |    2 +-
 mm/mremap.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c
+++ linux/mm/memory.c
@@ -509,7 +509,7 @@ again:
 		return -ENOMEM;
 	src_pte = pte_offset_map_nested(src_pmd, addr);
 	src_ptl = pte_lockptr(src_mm, src_pmd);
-	spin_lock(src_ptl);
+	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 
 	do {
 		/*
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c
+++ linux/mm/mremap.c
@@ -97,7 +97,7 @@ static void move_ptes(struct vm_area_str
  	new_pte = pte_offset_map_nested(new_pmd, new_addr);
 	new_ptl = pte_lockptr(mm, new_pmd);
 	if (new_ptl != old_ptl)
-		spin_lock(new_ptl);
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 
 	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
 				   new_pte++, new_addr += PAGE_SIZE) {

* [patch 46/61] lock validator: special locking: slab
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (44 preceding siblings ...)
  2006-05-29 21:26 ` [patch 45/61] lock validator: special locking: mm Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:26 ` [patch 47/61] lock validator: special locking: skb_queue_head_init() Ingo Molnar
                   ` (27 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) slab locking. Has no
effect on non-lockdep kernels.

Fix initialize-locks-via-memcpy assumptions.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 mm/slab.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 11 deletions(-)

Index: linux/mm/slab.c
===================================================================
--- linux.orig/mm/slab.c
+++ linux/mm/slab.c
@@ -1026,7 +1026,8 @@ static void drain_alien_cache(struct kme
 	}
 }
 
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
+static inline int cache_free_alien(struct kmem_cache *cachep, void *objp,
+				   int nesting)
 {
 	struct slab *slabp = virt_to_slab(objp);
 	int nodeid = slabp->nodeid;
@@ -1044,7 +1045,7 @@ static inline int cache_free_alien(struc
 	STATS_INC_NODEFREES(cachep);
 	if (l3->alien && l3->alien[nodeid]) {
 		alien = l3->alien[nodeid];
-		spin_lock(&alien->lock);
+		spin_lock_nested(&alien->lock, nesting);
 		if (unlikely(alien->avail == alien->limit)) {
 			STATS_INC_ACOVERFLOW(cachep);
 			__drain_alien_cache(cachep, alien, nodeid);
@@ -1073,7 +1074,8 @@ static inline void free_alien_cache(stru
 {
 }
 
-static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
+static inline int cache_free_alien(struct kmem_cache *cachep, void *objp,
+				   int nesting)
 {
 	return 0;
 }
@@ -1278,6 +1280,11 @@ static void init_list(struct kmem_cache 
 
 	local_irq_disable();
 	memcpy(ptr, list, sizeof(struct kmem_list3));
+	/*
+	 * Do not assume that spinlocks can be initialized via memcpy:
+	 */
+	spin_lock_init(&ptr->list_lock);
+
 	MAKE_ALL_LISTS(cachep, ptr, nodeid);
 	cachep->nodelists[nodeid] = ptr;
 	local_irq_enable();
@@ -1408,7 +1415,7 @@ void __init kmem_cache_init(void)
 	}
 	/* 4) Replace the bootstrap head arrays */
 	{
-		void *ptr;
+		struct array_cache *ptr;
 
 		ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
 
@@ -1416,6 +1423,11 @@ void __init kmem_cache_init(void)
 		BUG_ON(cpu_cache_get(&cache_cache) != &initarray_cache.cache);
 		memcpy(ptr, cpu_cache_get(&cache_cache),
 		       sizeof(struct arraycache_init));
+		/*
+		 * Do not assume that spinlocks can be initialized via memcpy:
+		 */
+		spin_lock_init(&ptr->lock);
+
 		cache_cache.array[smp_processor_id()] = ptr;
 		local_irq_enable();
 
@@ -1426,6 +1438,11 @@ void __init kmem_cache_init(void)
 		       != &initarray_generic.cache);
 		memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
 		       sizeof(struct arraycache_init));
+		/*
+		 * Do not assume that spinlocks can be initialized via memcpy:
+		 */
+		spin_lock_init(&ptr->lock);
+
 		malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
 		    ptr;
 		local_irq_enable();
@@ -1753,6 +1770,8 @@ static void slab_destroy_objs(struct kme
 }
 #endif
 
+static void __cache_free(struct kmem_cache *cachep, void *objp, int nesting);
+
 /**
  * slab_destroy - destroy and release all objects in a slab
  * @cachep: cache pointer being destroyed
@@ -1776,8 +1795,17 @@ static void slab_destroy(struct kmem_cac
 		call_rcu(&slab_rcu->head, kmem_rcu_free);
 	} else {
 		kmem_freepages(cachep, addr);
-		if (OFF_SLAB(cachep))
-			kmem_cache_free(cachep->slabp_cache, slabp);
+		if (OFF_SLAB(cachep)) {
+			unsigned long flags;
+
+			/*
+			 * lockdep: we may nest inside an already held
+			 * ac->lock, so pass in a nesting flag:
+			 */
+			local_irq_save(flags);
+			__cache_free(cachep->slabp_cache, slabp, 1);
+			local_irq_restore(flags);
+		}
 	}
 }
 
@@ -3062,7 +3090,16 @@ static void free_block(struct kmem_cache
 		if (slabp->inuse == 0) {
 			if (l3->free_objects > l3->free_limit) {
 				l3->free_objects -= cachep->num;
+				/*
+				 * It is safe to drop the lock. The slab is
+				 * no longer linked to the cache. cachep
+				 * cannot disappear - we are using it and
+				 * all destruction of caches must be
+				 * serialized properly by the user.
+				 */
+				spin_unlock(&l3->list_lock);
 				slab_destroy(cachep, slabp);
+				spin_lock(&l3->list_lock);
 			} else {
 				list_add(&slabp->list, &l3->slabs_free);
 			}
@@ -3088,7 +3125,7 @@ static void cache_flusharray(struct kmem
 #endif
 	check_irq_off();
 	l3 = cachep->nodelists[node];
-	spin_lock(&l3->list_lock);
+	spin_lock_nested(&l3->list_lock, SINGLE_DEPTH_NESTING);
 	if (l3->shared) {
 		struct array_cache *shared_array = l3->shared;
 		int max = shared_array->limit - shared_array->avail;
@@ -3131,14 +3168,14 @@ free_done:
  * Release an obj back to its cache. If the obj has a constructed state, it must
  * be in this state _before_ it is released.  Called with disabled ints.
  */
-static inline void __cache_free(struct kmem_cache *cachep, void *objp)
+static void __cache_free(struct kmem_cache *cachep, void *objp, int nesting)
 {
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
 	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
-	if (cache_free_alien(cachep, objp))
+	if (cache_free_alien(cachep, objp, nesting))
 		return;
 
 	if (likely(ac->avail < ac->limit)) {
@@ -3393,7 +3430,7 @@ void kmem_cache_free(struct kmem_cache *
 	BUG_ON(virt_to_cache(objp) != cachep);
 
 	local_irq_save(flags);
-	__cache_free(cachep, objp);
+	__cache_free(cachep, objp, 0);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(kmem_cache_free);
@@ -3418,7 +3455,7 @@ void kfree(const void *objp)
 	kfree_debugcheck(objp);
 	c = virt_to_cache(objp);
 	debug_check_no_locks_freed(objp, obj_size(c));
-	__cache_free(c, (void *)objp);
+	__cache_free(c, (void *)objp, 0);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(kfree);
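
A minimal sketch of the "do not initialize locks via memcpy" fix above,
assuming a toy lock whose single flag stands in for whatever state a
real lock carries. The toy_* names are hypothetical and this is not the
slab code itself:

```c
#include <string.h>

/* Hypothetical toy lock: one flag stands in for the real lock state. */
struct toy_lock {
	int locked;
};

static void toy_lock_init(struct toy_lock *l)
{
	l->locked = 0;
}

/* Toy stand-in for a structure that embeds a lock next to its data. */
struct toy_list3 {
	struct toy_lock list_lock;
	unsigned long free_objects;
};

/*
 * Copy the bootstrap structure into its final home. The data is taken
 * over as-is, but the embedded lock is re-initialized explicitly rather
 * than trusting the copied bytes.
 */
static void toy_init_list(struct toy_list3 *dst, const struct toy_list3 *src)
{
	memcpy(dst, src, sizeof(*dst));
	/* Do not assume that the lock can be initialized via memcpy: */
	toy_lock_init(&dst->list_lock);
}
```

The copied data survives, while the destination's lock starts from a
known initial state no matter what the source lock looked like at copy
time.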

* [patch 47/61] lock validator: special locking: skb_queue_head_init()
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (45 preceding siblings ...)
  2006-05-29 21:26 ` [patch 46/61] lock validator: special locking: slab Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:26 ` [patch 48/61] lock validator: special locking: timer.c Ingo Molnar
                   ` (26 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (multi-initialized) skb queue
locking: skb_queue_head_init() moves out of line into net/core/skbuff.c.
Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/skbuff.h |    7 +------
 net/core/skbuff.c      |    9 +++++++++
 2 files changed, 10 insertions(+), 6 deletions(-)

Index: linux/include/linux/skbuff.h
===================================================================
--- linux.orig/include/linux/skbuff.h
+++ linux/include/linux/skbuff.h
@@ -584,12 +584,7 @@ static inline __u32 skb_queue_len(const 
 	return list_->qlen;
 }
 
-static inline void skb_queue_head_init(struct sk_buff_head *list)
-{
-	spin_lock_init(&list->lock);
-	list->prev = list->next = (struct sk_buff *)list;
-	list->qlen = 0;
-}
+extern void skb_queue_head_init(struct sk_buff_head *list);
 
 /*
  *	Insert an sk_buff at the start of a list.
Index: linux/net/core/skbuff.c
===================================================================
--- linux.orig/net/core/skbuff.c
+++ linux/net/core/skbuff.c
@@ -71,6 +71,15 @@
 static kmem_cache_t *skbuff_head_cache __read_mostly;
 static kmem_cache_t *skbuff_fclone_cache __read_mostly;
 
+void skb_queue_head_init(struct sk_buff_head *list)
+{
+	spin_lock_init(&list->lock);
+	list->prev = list->next = (struct sk_buff *)list;
+	list->qlen = 0;
+}
+
+EXPORT_SYMBOL(skb_queue_head_init);
+
 /*
  *	Keep out-of-line to prevent kernel bloat.
  *	__builtin_return_address is not used because it is not always

* [patch 48/61] lock validator: special locking: timer.c
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (46 preceding siblings ...)
  2006-05-29 21:26 ` [patch 47/61] lock validator: special locking: skb_queue_head_init() Ingo Molnar
@ 2006-05-29 21:26 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 49/61] lock validator: special locking: sched.c Ingo Molnar
                   ` (25 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) timer locking: each
per-CPU timer base is tracked as a separate lock-type with its own key.
Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/timer.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -1496,6 +1496,13 @@ asmlinkage long sys_sysinfo(struct sysin
 	return 0;
 }
 
+/*
+ * lockdep: we want to track each per-CPU base as a separate lock-type,
+ * but timer-bases are kmalloc()-ed, so we need to attach separate
+ * keys to them:
+ */
+static struct lockdep_type_key base_lock_keys[NR_CPUS];
+
 static int __devinit init_timers_cpu(int cpu)
 {
 	int j;
@@ -1530,7 +1537,7 @@ static int __devinit init_timers_cpu(int
 		base = per_cpu(tvec_bases, cpu);
 	}
 
-	spin_lock_init(&base->lock);
+	spin_lock_init_key(&base->lock, base_lock_keys + cpu);
 	for (j = 0; j < TVN_SIZE; j++) {
 		INIT_LIST_HEAD(base->tv5.vec + j);
 		INIT_LIST_HEAD(base->tv4.vec + j);
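
A minimal sketch of the two key-assignment policies used in this series,
assuming a toy model in which a lock's type is simply the address of the
key it was initialized with. The toy_* names and sizes are hypothetical
and are not lockdep's real API:

```c
/* Hypothetical toy key: only its address matters. */
struct toy_key {
	char dummy;
};

struct toy_lock {
	const struct toy_key *key;
};

static void toy_lock_init_key(struct toy_lock *l, const struct toy_key *k)
{
	l->key = k;
}

/* Two locks are the same lock-type when they share a key. */
static int toy_same_type(const struct toy_lock *a, const struct toy_lock *b)
{
	return a->key == b->key;
}

#define TOY_NR_IRQS 4
#define TOY_NR_CPUS 2

/* One key for every irq descriptor lock (the genirq policy). */
static struct toy_key toy_irq_desc_key;
/* One key per CPU (the timer-base policy). */
static struct toy_key toy_base_keys[TOY_NR_CPUS];

static struct toy_lock toy_irq_locks[TOY_NR_IRQS];
static struct toy_lock toy_base_locks[TOY_NR_CPUS];

static void toy_init_locks(void)
{
	int i;

	for (i = 0; i < TOY_NR_IRQS; i++)
		toy_lock_init_key(&toy_irq_locks[i], &toy_irq_desc_key);
	for (i = 0; i < TOY_NR_CPUS; i++)
		toy_lock_init_key(&toy_base_locks[i], toy_base_keys + i);
}
```

Sharing one key folds many lock instances into a single type, as the
genirq patch does for irq_desc locks. An array of keys indexed by CPU
keeps dynamically allocated instances apart, as the timer patch does for
per-CPU bases.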

* [patch 49/61] lock validator: special locking: sched.c
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (47 preceding siblings ...)
  2006-05-29 21:26 ` [patch 48/61] lock validator: special locking: timer.c Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 50/61] lock validator: special locking: hrtimer.c Ingo Molnar
                   ` (24 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

Teach the lock validator about special (recursive) runqueue locking in
sched.c. Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/sched.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1963,7 +1963,7 @@ static void double_rq_unlock(runqueue_t 
 	__releases(rq1->lock)
 	__releases(rq2->lock)
 {
-	spin_unlock(&rq1->lock);
+	spin_unlock_non_nested(&rq1->lock);
 	if (rq1 != rq2)
 		spin_unlock(&rq2->lock);
 	else
@@ -1980,7 +1980,7 @@ static void double_lock_balance(runqueue
 {
 	if (unlikely(!spin_trylock(&busiest->lock))) {
 		if (busiest->cpu < this_rq->cpu) {
-			spin_unlock(&this_rq->lock);
+			spin_unlock_non_nested(&this_rq->lock);
 			spin_lock(&busiest->lock);
 			spin_lock(&this_rq->lock);
 		} else
@@ -2602,7 +2602,7 @@ static int load_balance_newidle(int this
 		nr_moved = move_tasks(this_rq, this_cpu, busiest,
 					minus_1_or_zero(busiest->nr_running),
 					imbalance, sd, NEWLY_IDLE, NULL);
-		spin_unlock(&busiest->lock);
+		spin_unlock_non_nested(&busiest->lock);
 	}
 
 	if (!nr_moved) {
@@ -2687,7 +2687,7 @@ static void active_load_balance(runqueue
 	else
 		schedstat_inc(sd, alb_failed);
 out:
-	spin_unlock(&target_rq->lock);
+	spin_unlock_non_nested(&target_rq->lock);
 }
 
 /*
@@ -3032,7 +3032,7 @@ static void wake_sleeping_dependent(int 
 	}
 
 	for_each_cpu_mask(i, sibling_map)
-		spin_unlock(&cpu_rq(i)->lock);
+		spin_unlock_non_nested(&cpu_rq(i)->lock);
 	/*
 	 * We exit with this_cpu's rq still held and IRQs
 	 * still disabled:
@@ -3068,7 +3068,7 @@ static int dependent_sleeper(int this_cp
 	 * The same locking rules and details apply as for
 	 * wake_sleeping_dependent():
 	 */
-	spin_unlock(&this_rq->lock);
+	spin_unlock_non_nested(&this_rq->lock);
 	sibling_map = sd->span;
 	for_each_cpu_mask(i, sibling_map)
 		spin_lock(&cpu_rq(i)->lock);
@@ -3146,7 +3146,7 @@ check_smt_task:
 	}
 out_unlock:
 	for_each_cpu_mask(i, sibling_map)
-		spin_unlock(&cpu_rq(i)->lock);
+		spin_unlock_non_nested(&cpu_rq(i)->lock);
 	return ret;
 }
 #else
@@ -6680,7 +6680,7 @@ void __init sched_init(void)
 		prio_array_t *array;
 
 		rq = cpu_rq(i);
-		spin_lock_init(&rq->lock);
+		spin_lock_init_static(&rq->lock);
 		rq->nr_running = 0;
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;

* [patch 50/61] lock validator: special locking: hrtimer.c
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (48 preceding siblings ...)
  2006-05-29 21:27 ` [patch 49/61] lock validator: special locking: sched.c Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-30  1:35   ` Andrew Morton
  2006-05-29 21:27 ` [patch 51/61] lock validator: special locking: sock_lock_init() Ingo Molnar
                   ` (23 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (recursive) locking code. Has
no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/hrtimer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -786,7 +786,7 @@ static void __devinit init_hrtimers_cpu(
 	int i;
 
 	for (i = 0; i < MAX_HRTIMER_BASES; i++, base++)
-		spin_lock_init(&base->lock);
+		spin_lock_init_static(&base->lock);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU

* [patch 51/61] lock validator: special locking: sock_lock_init()
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (49 preceding siblings ...)
  2006-05-29 21:27 ` [patch 50/61] lock validator: special locking: hrtimer.c Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
  2006-05-29 21:27 ` [patch 52/61] lock validator: special locking: af_unix Ingo Molnar
                   ` (22 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (multi-initialized,
per-address-family) locking code. Has no effect on non-lockdep kernels.
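
For illustration only (not part of the patch): a minimal userspace
sketch of the "one key per address family" idea. It assumes, as a
simplification, that a lock's type is identified purely by the address
of the key it was initialized with; af_toy_* names and AF_TOY_MAX are
hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AF_TOY_MAX 32	/* assumption: stand-in for AF_MAX */

/* Only the address of a key matters, not its contents. */
struct af_toy_key {
	char dummy;
};

struct af_toy_lock {
	const struct af_toy_key *key;
};

struct af_toy_sock {
	int family;
	struct af_toy_lock slock;
};

/* One key per address family, mirroring af_family_keys[] in the patch. */
static struct af_toy_key af_toy_keys[AF_TOY_MAX];

static void af_toy_lock_init_key(struct af_toy_lock *lock,
				 const struct af_toy_key *key)
{
	lock->key = key;
}

static void af_toy_sock_lock_init(struct af_toy_sock *sk)
{
	assert(sk->family >= 0 && sk->family < AF_TOY_MAX);
	af_toy_lock_init_key(&sk->slock, af_toy_keys + sk->family);
}

/* Two locks belong to the same type iff they share a key. */
static bool af_toy_same_type(const struct af_toy_lock *a,
			     const struct af_toy_lock *b)
{
	return a->key == b->key;
}
```

With this scheme, sockets of the same family share locking rules while
sockets of different families are tracked separately, even though all
of them are initialized by the same function.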

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/net/sock.h |    6 ------
 net/core/sock.c    |   27 +++++++++++++++++++++++----
 2 files changed, 23 insertions(+), 10 deletions(-)

Index: linux/include/net/sock.h
===================================================================
--- linux.orig/include/net/sock.h
+++ linux/include/net/sock.h
@@ -81,12 +81,6 @@ typedef struct {
 	wait_queue_head_t	wq;
 } socket_lock_t;
 
-#define sock_lock_init(__sk) \
-do {	spin_lock_init(&((__sk)->sk_lock.slock)); \
-	(__sk)->sk_lock.owner = NULL; \
-	init_waitqueue_head(&((__sk)->sk_lock.wq)); \
-} while(0)
-
 struct sock;
 struct proto;
 
Index: linux/net/core/sock.c
===================================================================
--- linux.orig/net/core/sock.c
+++ linux/net/core/sock.c
@@ -739,6 +739,27 @@ lenout:
   	return 0;
 }
 
+/*
+ * Each address family might have different locking rules, so we have
+ * one slock key per address family:
+ */
+static struct lockdep_type_key af_family_keys[AF_MAX];
+
+static void noinline sock_lock_init(struct sock *sk)
+{
+	spin_lock_init_key(&sk->sk_lock.slock, af_family_keys + sk->sk_family);
+	sk->sk_lock.owner = NULL;
+	init_waitqueue_head(&sk->sk_lock.wq);
+}
+
+static struct lockdep_type_key af_callback_keys[AF_MAX];
+
+static void noinline sock_rwlock_init(struct sock *sk)
+{
+	rwlock_init(&sk->sk_dst_lock);
+	rwlock_init_key(&sk->sk_callback_lock, af_callback_keys + sk->sk_family);
+}
+
 /**
  *	sk_alloc - All socket objects are allocated here
  *	@family: protocol family
@@ -833,8 +854,7 @@ struct sock *sk_clone(const struct sock 
 		skb_queue_head_init(&newsk->sk_receive_queue);
 		skb_queue_head_init(&newsk->sk_write_queue);
 
-		rwlock_init(&newsk->sk_dst_lock);
-		rwlock_init(&newsk->sk_callback_lock);
+		sock_rwlock_init(newsk);
 
 		newsk->sk_dst_cache	= NULL;
 		newsk->sk_wmem_queued	= 0;
@@ -1404,8 +1424,7 @@ void sock_init_data(struct socket *sock,
 	} else
 		sk->sk_sleep	=	NULL;
 
-	rwlock_init(&sk->sk_dst_lock);
-	rwlock_init(&sk->sk_callback_lock);
+	sock_rwlock_init(sk);
 
 	sk->sk_state_change	=	sock_def_wakeup;
 	sk->sk_data_ready	=	sock_def_readable;

* [patch 52/61] lock validator: special locking: af_unix
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (50 preceding siblings ...)
  2006-05-29 21:27 ` [patch 51/61] lock validator: special locking: sock_lock_init() Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
  2006-05-29 21:27 ` [patch 53/61] lock validator: special locking: bh_lock_sock() Ingo Molnar
                   ` (21 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (recursive) locking code. Has
no effect on non-lockdep kernels.

(includes a workaround for sk_receive_queue.lock, which is currently
treated globally by the lock validator, but which should be switched to
per-address-family locking rules.)
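
For illustration only (not part of the patch): a minimal userspace
sketch of why a _nested annotation helps when two locks of the same
type are held at once. It assumes a toy validator that tracks held
locks as (key, subclass) pairs and flags a second acquisition of an
identical pair as a possible self-deadlock. The sub_toy_* names are
hypothetical, and the value 1 for the single-depth subclass is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define SUB_TOY_MAX_HELD	8
#define SUB_TOY_SINGLE_DEPTH	1	/* assumed value of SINGLE_DEPTH_NESTING */

struct sub_toy_held {
	const void *key;	/* identifies the lock type */
	int subclass;		/* 0 unless a _nested variant is used */
};

static struct sub_toy_held sub_toy_stack[SUB_TOY_MAX_HELD];
static int sub_toy_depth;

/*
 * Record an acquisition. Returns false, without recording anything,
 * if a lock with the same key and subclass is already held, because
 * the toy validator cannot tell that apart from recursive locking.
 */
static bool sub_toy_acquire(const void *key, int subclass)
{
	int i;

	for (i = 0; i < sub_toy_depth; i++) {
		if (sub_toy_stack[i].key == key &&
		    sub_toy_stack[i].subclass == subclass)
			return false;
	}
	assert(sub_toy_depth < SUB_TOY_MAX_HELD);
	sub_toy_stack[sub_toy_depth].key = key;
	sub_toy_stack[sub_toy_depth].subclass = subclass;
	sub_toy_depth++;
	return true;
}
```

The annotation only tells the checker that the nesting is intentional;
the caller remains responsible for making sure the two locks are really
taken in a safe order.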

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/net/af_unix.h |    3 +++
 net/unix/af_unix.c    |   10 +++++-----
 net/unix/garbage.c    |    8 ++++----
 3 files changed, 12 insertions(+), 9 deletions(-)

Index: linux/include/net/af_unix.h
===================================================================
--- linux.orig/include/net/af_unix.h
+++ linux/include/net/af_unix.h
@@ -61,6 +61,9 @@ struct unix_skb_parms {
 #define unix_state_rlock(s)	spin_lock(&unix_sk(s)->lock)
 #define unix_state_runlock(s)	spin_unlock(&unix_sk(s)->lock)
 #define unix_state_wlock(s)	spin_lock(&unix_sk(s)->lock)
+#define unix_state_wlock_nested(s) \
+				spin_lock_nested(&unix_sk(s)->lock, \
+				SINGLE_DEPTH_NESTING)
 #define unix_state_wunlock(s)	spin_unlock(&unix_sk(s)->lock)
 
 #ifdef __KERNEL__
Index: linux/net/unix/af_unix.c
===================================================================
--- linux.orig/net/unix/af_unix.c
+++ linux/net/unix/af_unix.c
@@ -1022,7 +1022,7 @@ restart:
 		goto out_unlock;
 	}
 
-	unix_state_wlock(sk);
+	unix_state_wlock_nested(sk);
 
 	if (sk->sk_state != st) {
 		unix_state_wunlock(sk);
@@ -1073,12 +1073,12 @@ restart:
 	unix_state_wunlock(sk);
 
 	/* take ten and and send info to listening sock */
-	spin_lock(&other->sk_receive_queue.lock);
+	spin_lock_bh(&other->sk_receive_queue.lock);
 	__skb_queue_tail(&other->sk_receive_queue, skb);
 	/* Undo artificially decreased inflight after embrion
 	 * is installed to listening socket. */
 	atomic_inc(&newu->inflight);
-	spin_unlock(&other->sk_receive_queue.lock);
+	spin_unlock_bh(&other->sk_receive_queue.lock);
 	unix_state_runlock(other);
 	other->sk_data_ready(other, 0);
 	sock_put(other);
@@ -1843,7 +1843,7 @@ static int unix_ioctl(struct socket *soc
 				break;
 			}
 
-			spin_lock(&sk->sk_receive_queue.lock);
+			spin_lock_bh(&sk->sk_receive_queue.lock);
 			if (sk->sk_type == SOCK_STREAM ||
 			    sk->sk_type == SOCK_SEQPACKET) {
 				skb_queue_walk(&sk->sk_receive_queue, skb)
@@ -1853,7 +1853,7 @@ static int unix_ioctl(struct socket *soc
 				if (skb)
 					amount=skb->len;
 			}
-			spin_unlock(&sk->sk_receive_queue.lock);
+			spin_unlock_bh(&sk->sk_receive_queue.lock);
 			err = put_user(amount, (int __user *)arg);
 			break;
 		}
Index: linux/net/unix/garbage.c
===================================================================
--- linux.orig/net/unix/garbage.c
+++ linux/net/unix/garbage.c
@@ -235,7 +235,7 @@ void unix_gc(void)
 		struct sock *x = pop_stack();
 		struct sock *sk;
 
-		spin_lock(&x->sk_receive_queue.lock);
+		spin_lock_bh(&x->sk_receive_queue.lock);
 		skb = skb_peek(&x->sk_receive_queue);
 		
 		/*
@@ -270,7 +270,7 @@ void unix_gc(void)
 				maybe_unmark_and_push(skb->sk);
 			skb=skb->next;
 		}
-		spin_unlock(&x->sk_receive_queue.lock);
+		spin_unlock_bh(&x->sk_receive_queue.lock);
 		sock_put(x);
 	}
 
@@ -283,7 +283,7 @@ void unix_gc(void)
 		if (u->gc_tree == GC_ORPHAN) {
 			struct sk_buff *nextsk;
 
-			spin_lock(&s->sk_receive_queue.lock);
+			spin_lock_bh(&s->sk_receive_queue.lock);
 			skb = skb_peek(&s->sk_receive_queue);
 			while (skb &&
 			       skb != (struct sk_buff *)&s->sk_receive_queue) {
@@ -298,7 +298,7 @@ void unix_gc(void)
 				}
 				skb = nextsk;
 			}
-			spin_unlock(&s->sk_receive_queue.lock);
+			spin_unlock_bh(&s->sk_receive_queue.lock);
 		}
 		u->gc_tree = GC_ORPHAN;
 	}

* [patch 53/61] lock validator: special locking: bh_lock_sock()
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (51 preceding siblings ...)
  2006-05-29 21:27 ` [patch 52/61] lock validator: special locking: af_unix Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 54/61] lock validator: special locking: mmap_sem Ingo Molnar
                   ` (20 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (recursive) locking code. Has
no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/net/sock.h  |    3 +++
 net/ipv4/tcp_ipv4.c |    2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux/include/net/sock.h
===================================================================
--- linux.orig/include/net/sock.h
+++ linux/include/net/sock.h
@@ -743,6 +743,9 @@ extern void FASTCALL(release_sock(struct
 
 /* BH context may only use the following locking interface. */
 #define bh_lock_sock(__sk)	spin_lock(&((__sk)->sk_lock.slock))
+#define bh_lock_sock_nested(__sk) \
+				spin_lock_nested(&((__sk)->sk_lock.slock), \
+				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
 extern struct sock		*sk_alloc(int family,
Index: linux/net/ipv4/tcp_ipv4.c
===================================================================
--- linux.orig/net/ipv4/tcp_ipv4.c
+++ linux/net/ipv4/tcp_ipv4.c
@@ -1088,7 +1088,7 @@ process:
 
 	skb->dev = NULL;
 
-	bh_lock_sock(sk);
+	bh_lock_sock_nested(sk);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
 		if (!tcp_prequeue(sk, skb))

* [patch 54/61] lock validator: special locking: mmap_sem
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (52 preceding siblings ...)
  2006-05-29 21:27 ` [patch 53/61] lock validator: special locking: bh_lock_sock() Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 55/61] lock validator: special locking: sb->s_umount Ingo Molnar
                   ` (19 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (recursive) locking code. Has
no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/exit.c |    2 +-
 kernel/fork.c |    5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -582,7 +582,7 @@ static void exit_mm(struct task_struct *
 	/* more a memory barrier than a real lock */
 	task_lock(tsk);
 	tsk->mm = NULL;
-	up_read(&mm->mmap_sem);
+	up_read_non_nested(&mm->mmap_sem);
 	enter_lazy_tlb(mm, current);
 	task_unlock(tsk);
 	mmput(mm);
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -196,7 +196,10 @@ static inline int dup_mmap(struct mm_str
 
 	down_write(&oldmm->mmap_sem);
 	flush_cache_mm(oldmm);
-	down_write(&mm->mmap_sem);
+	/*
+	 * Not linked in yet - no deadlock potential:
+	 */
+	down_write_nested(&mm->mmap_sem, 1);
 
 	mm->locked_vm = 0;
 	mm->mmap = NULL;

* [patch 55/61] lock validator: special locking: sb->s_umount
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (53 preceding siblings ...)
  2006-05-29 21:27 ` [patch 54/61] lock validator: special locking: mmap_sem Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
  2006-05-29 21:27 ` [patch 56/61] lock validator: special locking: jbd Ingo Molnar
                   ` (18 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

workaround for special sb->s_umount locking rule.

s_umount gets held across a series of lock drops and re-acquisitions
in prune_one_dentry(), so i moved the up_read() ahead of the
prune_one_dentry() call, at the risk of introducing a umount race.
FIXME.

i think a better fix would be to do the unlocks as _non_nested in
prune_one_dentry(), and to do the up_read() here as
an up_read_non_nested() as well?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/dcache.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c
+++ linux/fs/dcache.c
@@ -470,8 +470,9 @@ static void prune_dcache(int count, stru
 		s_umount = &dentry->d_sb->s_umount;
 		if (down_read_trylock(s_umount)) {
 			if (dentry->d_sb->s_root != NULL) {
-				prune_one_dentry(dentry);
+// lockdep hack: do this better!
 				up_read(s_umount);
+				prune_one_dentry(dentry);
 				continue;
 			}
 			up_read(s_umount);

* [patch 56/61] lock validator: special locking: jbd
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (54 preceding siblings ...)
  2006-05-29 21:27 ` [patch 55/61] lock validator: special locking: sb->s_umount Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 57/61] lock validator: special locking: posix-timers Ingo Molnar
                   ` (17 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (non-nested) unlocking code.
Has no effect on non-lockdep kernels.
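
For illustration only (not part of the patch): a minimal userspace
sketch of the difference between a nested and a non-nested release. It
assumes a toy validator that keeps held locks on a stack and, by
default, expects them to be released in reverse order of acquisition;
the held_toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HELD_TOY_MAX 8

static const void *held_toy_stack[HELD_TOY_MAX];
static int held_toy_count;

static void held_toy_acquire(const void *lock)
{
	assert(held_toy_count < HELD_TOY_MAX);
	held_toy_stack[held_toy_count++] = lock;
}

/* Strict release: succeeds only for the most recently acquired lock. */
static bool held_toy_release(const void *lock)
{
	if (held_toy_count == 0 ||
	    held_toy_stack[held_toy_count - 1] != lock)
		return false;
	held_toy_count--;
	return true;
}

/* Non-nested release: remove the lock from anywhere in the stack. */
static bool held_toy_release_non_nested(const void *lock)
{
	int i;

	for (i = held_toy_count - 1; i >= 0; i--) {
		if (held_toy_stack[i] != lock)
			continue;
		/* Close the gap left by the removed entry. */
		memmove(&held_toy_stack[i], &held_toy_stack[i + 1],
			(held_toy_count - i - 1) * sizeof(held_toy_stack[0]));
		held_toy_count--;
		return true;
	}
	return false;
}
```

Code that takes lock A, then lock B, and then drops A while still
holding B fails the strict check in this model but passes the
non-nested one.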

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/jbd/checkpoint.c |    2 +-
 fs/jbd/commit.c     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/fs/jbd/checkpoint.c
===================================================================
--- linux.orig/fs/jbd/checkpoint.c
+++ linux/fs/jbd/checkpoint.c
@@ -135,7 +135,7 @@ void __log_wait_for_space(journal_t *jou
 			log_do_checkpoint(journal);
 			spin_lock(&journal->j_state_lock);
 		}
-		mutex_unlock(&journal->j_checkpoint_mutex);
+		mutex_unlock_non_nested(&journal->j_checkpoint_mutex);
 	}
 }
 
Index: linux/fs/jbd/commit.c
===================================================================
--- linux.orig/fs/jbd/commit.c
+++ linux/fs/jbd/commit.c
@@ -838,7 +838,7 @@ restart_loop:
 	J_ASSERT(commit_transaction == journal->j_committing_transaction);
 	journal->j_commit_sequence = commit_transaction->t_tid;
 	journal->j_committing_transaction = NULL;
-	spin_unlock(&journal->j_state_lock);
+	spin_unlock_non_nested(&journal->j_state_lock);
 
 	if (commit_transaction->t_checkpoint_list == NULL) {
 		__journal_drop_transaction(journal, commit_transaction);

* [patch 57/61] lock validator: special locking: posix-timers
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (55 preceding siblings ...)
  2006-05-29 21:27 ` [patch 56/61] lock validator: special locking: jbd Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 58/61] lock validator: special locking: sch_generic.c Ingo Molnar
                   ` (16 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (non-nested) unlocking code.
Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/posix-timers.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/posix-timers.c
===================================================================
--- linux.orig/kernel/posix-timers.c
+++ linux/kernel/posix-timers.c
@@ -576,7 +576,7 @@ static struct k_itimer * lock_timer(time
 	timr = (struct k_itimer *) idr_find(&posix_timers_id, (int) timer_id);
 	if (timr) {
 		spin_lock(&timr->it_lock);
-		spin_unlock(&idr_lock);
+		spin_unlock_non_nested(&idr_lock);
 
 		if ((timr->it_id != timer_id) || !(timr->it_process) ||
 				timr->it_process->tgid != current->tgid) {

* [patch 58/61] lock validator: special locking: sch_generic.c
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (56 preceding siblings ...)
  2006-05-29 21:27 ` [patch 57/61] lock validator: special locking: posix-timers Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:27 ` [patch 59/61] lock validator: special locking: xfrm Ingo Molnar
                   ` (15 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (non-nested) unlocking code.
Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 net/sched/sch_generic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/net/sched/sch_generic.c
===================================================================
--- linux.orig/net/sched/sch_generic.c
+++ linux/net/sched/sch_generic.c
@@ -132,7 +132,7 @@ int qdisc_restart(struct net_device *dev
 		
 		{
 			/* And release queue */
-			spin_unlock(&dev->queue_lock);
+			spin_unlock_non_nested(&dev->queue_lock);
 
 			if (!netif_queue_stopped(dev)) {
 				int ret;

* [patch 59/61] lock validator: special locking: xfrm
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (57 preceding siblings ...)
  2006-05-29 21:27 ` [patch 58/61] lock validator: special locking: sch_generic.c Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
  2006-05-29 21:27 ` [patch 60/61] lock validator: special locking: sound/core/seq/seq_ports.c Ingo Molnar
                   ` (14 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (non-nested) unlocking code.
Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 net/xfrm/xfrm_policy.c |    2 +-
 net/xfrm/xfrm_state.c  |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/net/xfrm/xfrm_policy.c
===================================================================
--- linux.orig/net/xfrm/xfrm_policy.c
+++ linux/net/xfrm/xfrm_policy.c
@@ -1308,7 +1308,7 @@ static struct xfrm_policy_afinfo *xfrm_p
 	afinfo = xfrm_policy_afinfo[family];
 	if (likely(afinfo != NULL))
 		read_lock(&afinfo->lock);
-	read_unlock(&xfrm_policy_afinfo_lock);
+	read_unlock_non_nested(&xfrm_policy_afinfo_lock);
 	return afinfo;
 }
 
Index: linux/net/xfrm/xfrm_state.c
===================================================================
--- linux.orig/net/xfrm/xfrm_state.c
+++ linux/net/xfrm/xfrm_state.c
@@ -1105,7 +1105,7 @@ static struct xfrm_state_afinfo *xfrm_st
 	afinfo = xfrm_state_afinfo[family];
 	if (likely(afinfo != NULL))
 		read_lock(&afinfo->lock);
-	read_unlock(&xfrm_state_afinfo_lock);
+	read_unlock_non_nested(&xfrm_state_afinfo_lock);
 	return afinfo;
 }
 

* [patch 60/61] lock validator: special locking: sound/core/seq/seq_ports.c
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (58 preceding siblings ...)
  2006-05-29 21:27 ` [patch 59/61] lock validator: special locking: xfrm Ingo Molnar
@ 2006-05-29 21:27 ` Ingo Molnar
  2006-05-29 21:28 ` [patch 61/61] lock validator: enable lock validator in Kconfig Ingo Molnar
                   ` (13 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

teach the lock validator about special (recursive) locking code. Has
no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 sound/core/seq/seq_ports.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/sound/core/seq/seq_ports.c
===================================================================
--- linux.orig/sound/core/seq/seq_ports.c
+++ linux/sound/core/seq/seq_ports.c
@@ -518,7 +518,7 @@ int snd_seq_port_connect(struct snd_seq_
 	atomic_set(&subs->ref_count, 2);
 
 	down_write(&src->list_mutex);
-	down_write(&dest->list_mutex);
+	down_write_nested(&dest->list_mutex, SINGLE_DEPTH_NESTING);
 
 	exclusive = info->flags & SNDRV_SEQ_PORT_SUBS_EXCLUSIVE ? 1 : 0;
 	err = -EBUSY;
@@ -591,7 +591,7 @@ int snd_seq_port_disconnect(struct snd_s
 	unsigned long flags;
 
 	down_write(&src->list_mutex);
-	down_write(&dest->list_mutex);
+	down_write_nested(&dest->list_mutex, SINGLE_DEPTH_NESTING);
 
 	/* look for the connection */
 	list_for_each(p, &src->list_head) {

* [patch 61/61] lock validator: enable lock validator in Kconfig
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (59 preceding siblings ...)
  2006-05-29 21:27 ` [patch 60/61] lock validator: special locking: sound/core/seq/seq_ports.c Ingo Molnar
@ 2006-05-29 21:28 ` Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
  2006-05-30 13:33   ` Roman Zippel
  2006-05-29 22:28 ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
                   ` (12 subsequent siblings)
  73 siblings, 2 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 21:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, Andrew Morton

From: Ingo Molnar <mingo@elte.hu>

offer the following lock validation options:

 CONFIG_PROVE_SPIN_LOCKING
 CONFIG_PROVE_RW_LOCKING
 CONFIG_PROVE_MUTEX_LOCKING
 CONFIG_PROVE_RWSEM_LOCKING
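
For illustration only (not part of the patch): a minimal userspace
sketch of how a deadlock can be reported before it actually occurs, as
the help texts below describe. It assumes a toy model in which every
observed "inner lock taken while outer lock held" pair becomes an edge
in a graph of lock types, and a new edge is rejected if it would close
a cycle. The dep_toy_* names are hypothetical and the real validator
tracks considerably more state (irq contexts, for example).

```c
#include <assert.h>
#include <stdbool.h>

#define DEP_TOY_MAX_TYPES 16

/* dep_toy_edge[a][b]: type b was once taken while type a was held. */
static bool dep_toy_edge[DEP_TOY_MAX_TYPES][DEP_TOY_MAX_TYPES];

/* Depth-first search: is 'to' reachable from 'from' via known edges? */
static bool dep_toy_reachable(int from, int to, bool *seen)
{
	int next;

	if (from == to)
		return true;
	seen[from] = true;
	for (next = 0; next < DEP_TOY_MAX_TYPES; next++) {
		if (dep_toy_edge[from][next] && !seen[next] &&
		    dep_toy_reachable(next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record that 'inner' was taken while 'outer' was held. Returns
 * false, without recording the edge, if 'outer' is already reachable
 * from 'inner': the new rule would make a deadlock possible.
 */
static bool dep_toy_add(int outer, int inner)
{
	bool seen[DEP_TOY_MAX_TYPES] = { false };

	assert(outer >= 0 && outer < DEP_TOY_MAX_TYPES);
	assert(inner >= 0 && inner < DEP_TOY_MAX_TYPES);
	if (dep_toy_reachable(inner, outer, seen))
		return false;
	dep_toy_edge[outer][inner] = true;
	return true;
}
```

In this model, observing 0 -> 1 and 1 -> 2 on separate occasions is
enough to reject a later 2 -> 0, although no two CPUs ever had to
collide for the problem to be noticed.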

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 lib/Kconfig.debug |  167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -184,6 +184,173 @@ config DEBUG_SPINLOCK
 	  best used in conjunction with the NMI watchdog so that spinlock
 	  deadlocks are also debuggable.
 
+config PROVE_SPIN_LOCKING
+	bool "Prove spin-locking correctness"
+	default y
+	help
+	 This feature enables the kernel to prove that all spinlock
+	 locking that occurs in the kernel runtime is mathematically
+	 correct: that under no circumstance could an arbitrary (and
+	 not yet triggered) combination of observed spinlock locking
+	 sequences (on an arbitrary number of CPUs, running an
+	 arbitrary number of tasks and interrupt contexts) cause a
+	 deadlock.
+
+	 In short, this feature enables the kernel to report spinlock
+	 deadlocks before they actually occur.
+
+	 The proof does not depend on how hard and complex a
+	 deadlock scenario would be to trigger: how many
+	 participant CPUs, tasks and irq-contexts would be needed
+	 for it to trigger. The proof also does not depend on
+	 timing: if a race and a resulting deadlock is possible
+	 theoretically (no matter how unlikely the race scenario
+	 is), it will be proven so and will immediately be
+	 reported by the kernel (once the event is observed that
+	 makes the deadlock theoretically possible).
+
+	 If a deadlock is impossible (i.e. the locking rules, as
+	 observed by the kernel, are mathematically correct), the
+	 kernel reports nothing.
+
+	 NOTE: this feature can also be enabled for rwlocks, mutexes
+	 and rwsems - in which case all dependencies between these
+	 different locking variants are observed and mapped too, and
+	 the proof of observed correctness is also maintained for an
+	 arbitrary combination of these separate locking variants.
+
+	 For more details, see Documentation/locking-correctness.txt.
+
+config PROVE_RW_LOCKING
+	bool "Prove rw-locking correctness"
+	default y
+	help
+	 This feature enables the kernel to prove that all rwlock
+	 locking that occurs in the kernel runtime is mathematically
+	 correct: that under no circumstance could an arbitrary (and
+	 not yet triggered) combination of observed rwlock locking
+	 sequences (on an arbitrary number of CPUs, running an
+	 arbitrary number of tasks and interrupt contexts) cause a
+	 deadlock.
+
+	 In short, this feature enables the kernel to report rwlock
+	 deadlocks before they actually occur.
+
+	 The proof does not depend on how hard and complex a
+	 deadlock scenario would be to trigger: how many
+	 participant CPUs, tasks and irq-contexts would be needed
+	 for it to trigger. The proof also does not depend on
+	 timing: if a race and a resulting deadlock is possible
+	 theoretically (no matter how unlikely the race scenario
+	 is), it will be proven so and will immediately be
+	 reported by the kernel (once the event is observed that
+	 makes the deadlock theoretically possible).
+
+	 If a deadlock is impossible (i.e. the locking rules, as
+	 observed by the kernel, are mathematically correct), the
+	 kernel reports nothing.
+
+	 NOTE: this feature can also be enabled for spinlocks, mutexes
+	 and rwsems - in which case all dependencies between these
+	 different locking variants are observed and mapped too, and
+	 the proof of observed correctness is also maintained for an
+	 arbitrary combination of these separate locking variants.
+
+	 For more details, see Documentation/locking-correctness.txt.
+
+config PROVE_MUTEX_LOCKING
+	bool "Prove mutex-locking correctness"
+	default y
+	help
+	 This feature enables the kernel to prove that all mutex
+	 locking that occurs in the kernel runtime is mathematically
+	 correct: that under no circumstance could an arbitrary (and
+	 not yet triggered) combination of observed mutex locking
+	 sequences (on an arbitrary number of CPUs, running an
+	 arbitrary number of tasks and interrupt contexts) cause a
+	 deadlock.
+
+	 In short, this feature enables the kernel to report mutex
+	 deadlocks before they actually occur.
+
+	 The proof does not depend on how hard and complex a
+	 deadlock scenario would be to trigger: how many
+	 participant CPUs, tasks and irq-contexts would be needed
+	 for it to trigger. The proof also does not depend on
+	 timing: if a race and a resulting deadlock is possible
+	 theoretically (no matter how unlikely the race scenario
+	 is), it will be proven so and will immediately be
+	 reported by the kernel (once the event is observed that
+	 makes the deadlock theoretically possible).
+
+	 If a deadlock is impossible (i.e. the locking rules, as
+	 observed by the kernel, are mathematically correct), the
+	 kernel reports nothing.
+
+	 NOTE: this feature can also be enabled for spinlocks, rwlocks
+	 and rwsems - in which case all dependencies between these
+	 different locking variants are observed and mapped too, and
+	 the proof of observed correctness is also maintained for an
+	 arbitrary combination of these separate locking variants.
+
+	 For more details, see Documentation/locking-correctness.txt.
+
+config PROVE_RWSEM_LOCKING
+	bool "Prove rwsem-locking correctness"
+	default y
+	help
+	 This feature enables the kernel to prove that all rwsem
+	 locking that occurs in the kernel runtime is mathematically
+	 correct: that under no circumstance could an arbitrary (and
+	 not yet triggered) combination of observed rwsem locking
+	 sequences (on an arbitrary number of CPUs, running an
+	 arbitrary number of tasks and interrupt contexts) cause a
+	 deadlock.
+
+	 In short, this feature enables the kernel to report rwsem
+	 deadlocks before they actually occur.
+
+	 The proof does not depend on how hard and complex a
+	 deadlock scenario would be to trigger: how many
+	 participant CPUs, tasks and irq-contexts would be needed
+	 for it to trigger. The proof also does not depend on
+	 timing: if a race and a resulting deadlock is possible
+	 theoretically (no matter how unlikely the race scenario
+	 is), it will be proven so and will immediately be
+	 reported by the kernel (once the event is observed that
+	 makes the deadlock theoretically possible).
+
+	 If a deadlock is impossible (i.e. the locking rules, as
+	 observed by the kernel, are mathematically correct), the
+	 kernel reports nothing.
+
+	 NOTE: this feature can also be enabled for spinlocks, rwlocks
+	 and mutexes - in which case all dependencies between these
+	 different locking variants are observed and mapped too, and
+	 the proof of observed correctness is also maintained for an
+	 arbitrary combination of these separate locking variants.
+
+	 For more details, see Documentation/locking-correctness.txt.
+
+config LOCKDEP
+	bool
+	default y
+	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING
+
+config DEBUG_LOCKDEP
+	bool "Lock dependency engine debugging"
+	depends on LOCKDEP
+	default y
+	help
+	  If you say Y here, the lock dependency engine will do
+	  additional runtime checks to debug itself, at the price
+	  of more runtime overhead.
+
+config TRACE_IRQFLAGS
+	bool
+	default y
+	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING
+
 config DEBUG_SPINLOCK_SLEEP
 	bool "Sleep-inside-spinlock checking"
 	depends on DEBUG_KERNEL

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (60 preceding siblings ...)
  2006-05-29 21:28 ` [patch 61/61] lock validator: enable lock validator in Kconfig Ingo Molnar
@ 2006-05-29 22:28 ` Michal Piotrowski
  2006-05-29 22:41   ` Ingo Molnar
  2006-05-30  5:20   ` Arjan van de Ven
  2006-05-30  1:35 ` Andrew Morton
                   ` (11 subsequent siblings)
  73 siblings, 2 replies; 319+ messages in thread
From: Michal Piotrowski @ 2006-05-29 22:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton, Dave Jones

On 29/05/06, Ingo Molnar <mingo@elte.hu> wrote:
> We are pleased to announce the first release of the "lock dependency
> correctness validator" kernel debugging feature, which can be downloaded
> from:
>
>   http://redhat.com/~mingo/lockdep-patches/
>
[snip]

I get this while loading cpufreq modules

=====================================================
[ BUG: possible circular locking deadlock detected! ]
-----------------------------------------------------
modprobe/1942 is trying to acquire lock:
 (&anon_vma->lock){--..}, at: [<c10609cf>] anon_vma_link+0x1d/0xc9

but task is already holding lock:
 (&mm->mmap_sem/1){--..}, at: [<c101e5a0>] copy_process+0xbc6/0x1519

which lock already depends on the new lock,
which could lead to circular deadlocks!

the existing dependency chain (in reverse order) is:

-> #1 (cpucontrol){--..}:
       [<c10394be>] lockdep_acquire+0x69/0x82
       [<c11ed759>] __mutex_lock_slowpath+0xd0/0x347
       [<c11ed9ec>] mutex_lock+0x1c/0x1f
       [<c103dda5>] __lock_cpu_hotplug+0x36/0x56
       [<c103ddde>] lock_cpu_hotplug+0xa/0xc
       [<c1199e06>] __cpufreq_driver_target+0x15/0x50
       [<c119a1c2>] cpufreq_governor_performance+0x1a/0x20
       [<c1198b0a>] __cpufreq_governor+0xa0/0x1a9
       [<c1198ce2>] __cpufreq_set_policy+0xcf/0x100
       [<c11991c6>] cpufreq_set_policy+0x2d/0x6f
       [<c1199cae>] cpufreq_add_dev+0x34f/0x492
       [<c114b8c8>] sysdev_driver_register+0x58/0x9b
       [<c119a036>] cpufreq_register_driver+0x80/0xf4
       [<fd97b02a>] ct_get_next+0x17/0x3f [ip_conntrack]
       [<c10410e1>] sys_init_module+0xa6/0x230
       [<c11ef9ab>] sysenter_past_esp+0x54/0x8d

-> #0 (&anon_vma->lock){--..}:
       [<c10394be>] lockdep_acquire+0x69/0x82
       [<c11ed759>] __mutex_lock_slowpath+0xd0/0x347
       [<c11ed9ec>] mutex_lock+0x1c/0x1f
       [<c11990eb>] cpufreq_update_policy+0x34/0xd8
       [<fd9ad50b>] cpufreq_stat_cpu_callback+0x1b/0x7c [cpufreq_stats]
       [<fd9b007d>] cpufreq_stats_init+0x7d/0x9b [cpufreq_stats]
       [<c10410e1>] sys_init_module+0xa6/0x230
       [<c11ef9ab>] sysenter_past_esp+0x54/0x8d

other info that might help us debug this:

1 locks held by modprobe/1942:
  #0:  (cpucontrol){--..}, at: [<c11ed9ec>] mutex_lock+0x1c/0x1f

stack backtrace:
 <c1003f36> show_trace+0xd/0xf  <c1004449> dump_stack+0x17/0x19
 <c103863e> print_circular_bug_tail+0x59/0x64  <c1038e91>
__lockdep_acquire+0x848/0xa39
 <c10394be> lockdep_acquire+0x69/0x82  <c11ed759>
__mutex_lock_slowpath+0xd0/0x347
 <c11ed9ec> mutex_lock+0x1c/0x1f  <c11990eb> cpufreq_update_policy+0x34/0xd8
 <fd9ad50b> cpufreq_stat_cpu_callback+0x1b/0x7c [cpufreq_stats]
<fd9b007d> cpufreq_stats_init+0x7d/0x9b [cpufreq_stats]
 <c10410e1> sys_init_module+0xa6/0x230  <c11ef9ab> sysenter_past_esp+0x54/0x8d

Here is dmesg http://www.stardust.webpages.pl/files/lockdep/2.6.17-rc4-mm3-lockdep1/lockdep-dmesg3

Here is config
http://www.stardust.webpages.pl/files/lockdep/2.6.17-rc4-mm3-lockdep1/lockdep-config2

BTW I still must revert lockdep-serial.patch - it doesn't compile on
my gcc 4.1.1

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 22:28 ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
@ 2006-05-29 22:41   ` Ingo Molnar
  2006-05-29 23:09     ` Dave Jones
  2006-05-30  5:20   ` Arjan van de Ven
  1 sibling, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-29 22:41 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: linux-kernel, Arjan van de Ven, Andrew Morton, Dave Jones


* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:

> On 29/05/06, Ingo Molnar <mingo@elte.hu> wrote:
> >We are pleased to announce the first release of the "lock dependency
> >correctness validator" kernel debugging feature, which can be downloaded
> >from:
> >
> >  http://redhat.com/~mingo/lockdep-patches/
> >
> [snip]
> 
> I get this while loading cpufreq modules
> 
> =====================================================
> [ BUG: possible circular locking deadlock detected! ]
> -----------------------------------------------------
> modprobe/1942 is trying to acquire lock:
> (&anon_vma->lock){--..}, at: [<c10609cf>] anon_vma_link+0x1d/0xc9
> 
> but task is already holding lock:
> (&mm->mmap_sem/1){--..}, at: [<c101e5a0>] copy_process+0xbc6/0x1519
> 
> which lock already depends on the new lock,
> which could lead to circular deadlocks!

hm, this one could perhaps be a real bug. Dave: lockdep complains about 
having observed:

	anon_vma->lock  =>   mm->mmap_sem
	mm->mmap_sem    =>   anon_vma->lock

locking sequences, in the cpufreq code. Is there some special runtime 
behavior that still makes this safe, or is it a real bug?
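
The A => B / B => A pattern described above can be sketched in userspace C.
This is only an illustration of the idea, not lockdep's actual algorithm or
data structures: lock_graph, add_dependency() and MAX_LOCKS are names
invented for this sketch, and locks are represented as small integers.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8	/* hypothetical limit, for the sketch only */

/* dep[a][b] is true once "a was held while b was acquired" was observed. */
struct lock_graph {
	bool dep[MAX_LOCKS][MAX_LOCKS];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record the ordering "held => acquired". Returns false, and records
 * nothing, if the new edge would close a cycle in the graph.
 */
static bool add_dependency(struct lock_graph *g, int held, int acquired)
{
	bool seen[MAX_LOCKS] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->dep[held][acquired] = true;
	return true;
}
```

With lock 0 standing in for one lock and lock 1 for the other, recording
0 => 1 succeeds and a later 1 => 0 is rejected, even if the two orderings
were seen at different times and never actually raced.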

> stack backtrace:
> <c1003f36> show_trace+0xd/0xf  <c1004449> dump_stack+0x17/0x19
> <c103863e> print_circular_bug_tail+0x59/0x64  <c1038e91>
> __lockdep_acquire+0x848/0xa39
> <c10394be> lockdep_acquire+0x69/0x82  <c11ed759>
> __mutex_lock_slowpath+0xd0/0x347

there's one small detail to improve future lockdep printouts: please set 
CONFIG_STACK_BACKTRACE_COLS=1, so that the backtrace is more readable. 
(i'll change the code to force that when CONFIG_LOCKDEP is enabled)

> BTW I still must revert lockdep-serial.patch - it doesn't compile on 
> my gcc 4.1.1

ok, will check this.

	Ingo

* Re: [patch 33/61] lock validator: disable NMI watchdog if CONFIG_LOCKDEP 
  2006-05-29 21:25 ` [patch 33/61] lock validator: disable NMI watchdog if CONFIG_LOCKDEP Ingo Molnar
@ 2006-05-29 22:49   ` Keith Owens
  0 siblings, 0 replies; 319+ messages in thread
From: Keith Owens @ 2006-05-29 22:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

Ingo Molnar (on Mon, 29 May 2006 23:25:50 +0200) wrote:
>From: Ingo Molnar <mingo@elte.hu>
>
>The NMI watchdog uses spinlocks (notifier chains, etc.),
>so it's not lockdep-safe at the moment.

Fixed in 2.6.17-rc1.  notify_die() uses atomic_notifier_call_chain()
which uses RCU, not spinlocks.


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 22:41   ` Ingo Molnar
@ 2006-05-29 23:09     ` Dave Jones
  2006-05-30  5:45       ` Arjan van de Ven
  2006-05-30  5:52       ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
  0 siblings, 2 replies; 319+ messages in thread
From: Dave Jones @ 2006-05-29 23:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Piotrowski, linux-kernel, Arjan van de Ven, Andrew Morton

On Tue, May 30, 2006 at 12:41:08AM +0200, Ingo Molnar wrote:

 > > =====================================================
 > > [ BUG: possible circular locking deadlock detected! ]
 > > -----------------------------------------------------
 > > modprobe/1942 is trying to acquire lock:
 > > (&anon_vma->lock){--..}, at: [<c10609cf>] anon_vma_link+0x1d/0xc9
 > > 
 > > but task is already holding lock:
 > > (&mm->mmap_sem/1){--..}, at: [<c101e5a0>] copy_process+0xbc6/0x1519
 > > 
 > > which lock already depends on the new lock,
 > > which could lead to circular deadlocks!
 > 
 > hm, this one could perhaps be a real bug. Dave: lockdep complains about 
 > having observed:
 > 
 > 	anon_vma->lock  =>   mm->mmap_sem
 > 	mm->mmap_sem    =>   anon_vma->lock
 > 
 > locking sequences, in the cpufreq code. Is there some special runtime 
 > behavior that still makes this safe, or is it a real bug?

I'm feeling a bit overwhelmed by the voluminous output of this checker.
Especially as (directly at least) cpufreq doesn't touch vma's, or mmap's.

The first stack trace it shows has us down in the bowels of cpu hotplug,
where we're taking the cpucontrol sem.  The second stack trace shows
us in cpufreq_update_policy taking a per-cpu data->lock semaphore.

Now, I notice this is modprobe triggering this, and this *looks* like
we're loading two modules simultaneously (the first trace is from a
scaling driver like powernow-k8 or the like, whilst the second trace
is from cpufreq-stats).  

How on earth did we get into this situation? Module loading is supposed
to be serialised on the module_mutex, no?

It's been a while since a debug patch has sent me in search of paracetamol ;)

		Dave

-- 
http://www.codemonkey.org.uk

* Re: [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup
  2006-05-30  1:33   ` Andrew Morton
@ 2006-05-30  1:32     ` Nathan Scott
  0 siblings, 0 replies; 319+ messages in thread
From: Nathan Scott @ 2006-05-30  1:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Mon, May 29, 2006 at 06:33:41PM -0700, Andrew Morton wrote:
> I'll queue this for mainline, via the XFS tree.

Thanks Andrew, it's merged in our tree now.

-- 
Nathan

* Re: [patch 01/61] lock validator: floppy.c irq-release fix
  2006-05-29 21:22 ` [patch 01/61] lock validator: floppy.c irq-release fix Ingo Molnar
@ 2006-05-30  1:32   ` Andrew Morton
  0 siblings, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:22:56 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> floppy.c does a lot of irq-unsafe work within floppy_release_irq_and_dma():
> free_irq(), release_region() ... so when executing in irq context, push
> the whole function into keventd.

I seem to remember having issues with this - of the "not yet adequate"
type.  But I forget what they were.  Perhaps we have enough
flush_scheduled_work()s in there now.

We're glad to see you reassuming floppy.c maintenance.

* Re: [patch 02/61] lock validator: forcedeth.c fix
  2006-05-29 21:23 ` [patch 02/61] lock validator: forcedeth.c fix Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-05-31  5:40     ` Manfred Spraul
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, Ayaz Abdulla, Manfred Spraul

On Mon, 29 May 2006 23:23:13 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> nv_do_nic_poll() is called from timer softirqs, which has interrupts
> enabled, but np->lock might also be taken by some other interrupt
> context.

But the driver does disable_irq(), so I'd say this was a false-positive.

And afaict this is not a timer handler - it's a poll_controller handler
(although maybe that gets called from a timer handler somewhere?)

That being said, doing disable_irq() from a poll_controller handler is
downright scary.

Anyway, I'll tentatively mark this as a lockdep workaround, not a bugfix.

* Re: [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup
  2006-05-29 21:23 ` [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-05-30 10:51     ` Takashi Iwai
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, Jaroslav Kysela, Takashi Iwai

On Mon, 29 May 2006 23:23:19 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> move the __attribute outside of the DEFINE_SPINLOCK() section.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  sound/oss/emu10k1/midi.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/sound/oss/emu10k1/midi.c
> ===================================================================
> --- linux.orig/sound/oss/emu10k1/midi.c
> +++ linux/sound/oss/emu10k1/midi.c
> @@ -45,7 +45,7 @@
>  #include "../sound_config.h"
>  #endif
>  
> -static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> +static __attribute((unused)) DEFINE_SPINLOCK(midi_spinlock);
>  
>  static void init_midi_hdr(struct midi_hdr *midihdr)
>  {

I'll tag this as for-mainline-via-alsa.

* Re: [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-05-29 21:23 ` [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond) Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-05-30 17:38     ` Steven Rostedt
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:23:28 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> add WARN_ON_ONCE(cond) to print once-per-bootup messages.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  include/asm-generic/bug.h |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> Index: linux/include/asm-generic/bug.h
> ===================================================================
> --- linux.orig/include/asm-generic/bug.h
> +++ linux/include/asm-generic/bug.h
> @@ -44,4 +44,17 @@
>  # define WARN_ON_SMP(x)			do { } while (0)
>  #endif
>  
> +#define WARN_ON_ONCE(condition)				\
> +({							\
> +	static int __warn_once = 1;			\
> +	int __ret = 0;					\
> +							\
> +	if (unlikely(__warn_once && (condition))) {	\
> +		__warn_once = 0;			\
> +		WARN_ON(1);				\
> +		__ret = 1;				\
> +	}						\
> +	__ret;						\
> +})
> +
>  #endif

I'll queue this for mainline inclusion.
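
For anyone who wants to poke at the once-only behaviour outside the kernel,
here is a rough userspace sketch of the quoted macro. It assumes GCC or
clang (statement expressions are a GNU extension), and FAKE_WARN(),
warnings_printed and poke() are made up for the example; the real macro
calls WARN_ON(1).

```c
#include <stdio.h>

static int warnings_printed;	/* hypothetical counter, illustration only */

/* Userspace stand-in for the kernel's WARN_ON(1). */
#define FAKE_WARN()							\
	do {								\
		warnings_printed++;					\
		fprintf(stderr, "warning at %s:%d\n",			\
			__FILE__, __LINE__);				\
	} while (0)

/* Same shape as the quoted macro: one static flag per expansion site. */
#define WARN_ON_ONCE_SKETCH(condition)					\
({									\
	static int warn_once_ = 1;					\
	int ret_ = 0;							\
									\
	if (warn_once_ && (condition)) {				\
		warn_once_ = 0;						\
		FAKE_WARN();						\
		ret_ = 1;						\
	}								\
	ret_;								\
})

/* Returns how many of 'n' calls reported the condition as newly hit. */
static int poke(int n)
{
	int hits = 0;

	for (int i = 0; i < n; i++)
		hits += WARN_ON_ONCE_SKETCH(i >= 0);
	return hits;
}
```

Note one property that follows from the quoted code: after the first hit the
condition is no longer evaluated and the macro yields 0, so the return value
means "warned just now", not "condition is true".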

* Re: [patch 06/61] lock validator: add __module_address() method
  2006-05-29 21:23 ` [patch 06/61] lock validator: add __module_address() method Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-05-30 17:45     ` Steven Rostedt
  2006-06-23  8:38     ` Ingo Molnar
  0 siblings, 2 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:23:33 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> +/*
> + * Is this a valid module address? We don't grab the lock.
> + */
> +int __module_address(unsigned long addr)
> +{
> +	struct module *mod;
> +
> +	list_for_each_entry(mod, &modules, list)
> +		if (within(addr, mod->module_core, mod->core_size))
> +			return 1;
> +	return 0;
> +}

Returns a boolean.

>  /* Is this a valid kernel address?  We don't grab the lock: we are oopsing. */
>  struct module *__module_text_address(unsigned long addr)

But this returns a module*.

I'd suggest that __module_address() should do the same thing, from an API neatness
POV.  Although perhaps that's not very useful if we didn't take a ref on the returned
object (but module_text_address() doesn't either).

Also, the name's a bit misleading - it sounds like it returns the address
of a module or something.  __module_any_address() would be better, perhaps?

Also, how come this doesn't need modlist_lock()?
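
A sketch of the pointer-returning variant suggested above, written as plain
userspace C so it can be compiled on its own. struct module_sketch, its
fields and module_any_address_sketch() are hypothetical stand-ins, not the
kernel's struct module or module list, and locking is left out entirely.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct module. */
struct module_sketch {
	const char *name;
	unsigned long core_start;
	unsigned long core_size;
	struct module_sketch *next;
};

/* True if addr lies in [start, start + size). */
static int within(unsigned long addr, unsigned long start, unsigned long size)
{
	return addr >= start && addr - start < size;
}

/*
 * Returns the module whose core region contains addr, or NULL.
 * Callers that only want a yes/no answer can still test the result.
 */
static struct module_sketch *
module_any_address_sketch(struct module_sketch *head, unsigned long addr)
{
	for (struct module_sketch *mod = head; mod; mod = mod->next)
		if (within(addr, mod->core_start, mod->core_size))
			return mod;
	return NULL;
}
```

The pointer form costs nothing over the boolean one and matches the shape of
__module_text_address(); whether the returned object is safe to use without
a reference is the open question raised above.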


* Re: [patch 07/61] lock validator: better lock debugging
  2006-05-29 21:23 ` [patch 07/61] lock validator: better lock debugging Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-06-23 10:25     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:23:37 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> --- /dev/null
> +++ linux/include/linux/debug_locks.h
> @@ -0,0 +1,62 @@
> +#ifndef __LINUX_DEBUG_LOCKING_H
> +#define __LINUX_DEBUG_LOCKING_H
> +
> +extern int debug_locks;
> +extern int debug_locks_silent;
> +
> +/*
> + * Generic 'turn off all lock debugging' function:
> + */
> +extern int debug_locks_off(void);
> +
> +/*
> + * In the debug case we carry the caller's instruction pointer into
> + * other functions, but we dont want the function argument overhead
> + * in the nondebug case - hence these macros:
> + */
> +#define _RET_IP_		(unsigned long)__builtin_return_address(0)
> +#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })
> +
> +#define DEBUG_WARN_ON(c)						\
> +({									\
> +	int __ret = 0;							\
> +									\
> +	if (unlikely(c)) {						\
> +		if (debug_locks_off())					\
> +			WARN_ON(1);					\
> +		__ret = 1;						\
> +	}								\
> +	__ret;								\
> +})

Either the name of this thing is too generic, or we _make_ it generic, in
which case it's in the wrong header file.

> +#ifdef CONFIG_SMP
> +# define SMP_DEBUG_WARN_ON(c)			DEBUG_WARN_ON(c)
> +#else
> +# define SMP_DEBUG_WARN_ON(c)			do { } while (0)
> +#endif

Probably ditto.



* Re: [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup
  2006-05-29 21:23 ` [patch 11/61] lock validator: lockdep: small xfs init_rwsem() cleanup Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  2006-05-30  1:32     ` Nathan Scott
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, Nathan Scott

On Mon, 29 May 2006 23:23:59 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> init_rwsem() has no return value. This is not a problem if init_rwsem()
> is a function, but it's a problem if it's a do { ... } while (0) macro.
> (which lockdep introduces)
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  fs/xfs/linux-2.6/mrlock.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/fs/xfs/linux-2.6/mrlock.h
> ===================================================================
> --- linux.orig/fs/xfs/linux-2.6/mrlock.h
> +++ linux/fs/xfs/linux-2.6/mrlock.h
> @@ -28,7 +28,7 @@ typedef struct {
>  } mrlock_t;
>  
>  #define mrinit(mrp, name)	\
> -	( (mrp)->mr_writer = 0, init_rwsem(&(mrp)->mr_lock) )
> +	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
>  #define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
>  #define mrfree(mrp)		do { } while (0)
>  #define mraccess(mrp)		mraccessf(mrp, 0)

I'll queue this for mainline, via the XFS tree.
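
The reason the comma form has to go can be shown with a small standalone
sketch. The types and the *_sketch macros below are invented for the
example; only the shape of mrinit() is taken from the quoted patch.

```c
struct rwsem_sketch {
	int count;
};

/* Statement-style initializer, as lockdep makes init_rwsem(): no value. */
#define init_rwsem_sketch(sem)	do { (sem)->count = 0; } while (0)

struct mrlock_sketch {
	int mr_writer;
	struct rwsem_sketch mr_lock;
};

/*
 * A comma expression such as
 *	( (mrp)->mr_writer = 0, init_rwsem_sketch(&(mrp)->mr_lock) )
 * would not compile here: "do { ... } while (0)" is a statement, and a
 * statement cannot be an operand of the comma operator.
 */
#define mrinit_sketch(mrp)						\
	do {								\
		(mrp)->mr_writer = 0;					\
		init_rwsem_sketch(&(mrp)->mr_lock);			\
	} while (0)

/* The do/while form still behaves as one statement in an unbraced if/else. */
static void mrlock_setup(struct mrlock_sketch *mrp, int reinit)
{
	if (reinit)
		mrinit_sketch(mrp);
	else
		mrp->mr_writer = -1;
}
```

The trade-off is that the macro can no longer be used where an expression is
required, which is harmless as long as no caller consumed its value.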

* Re: [patch 12/61] lock validator: beautify x86_64 stacktraces
  2006-05-29 21:24 ` [patch 12/61] lock validator: beautify x86_64 stacktraces Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  0 siblings, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:05 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> beautify x86_64 stacktraces to be more readable.

One reject fixed due to the backtrace changes in Andi's tree.

I'll get all this compiling, but we'll need to review and test the end
result please, make sure that it all landed OK.


* Re: [patch 15/61] lock validator: x86_64: use stacktrace to generate backtraces
  2006-05-29 21:24 ` [patch 15/61] lock validator: x86_64: use stacktrace to generate backtraces Ingo Molnar
@ 2006-05-30  1:33   ` Andrew Morton
  0 siblings, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:19 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> this switches x86_64 to use the stacktrace infrastructure when generating
> backtrace printouts, if CONFIG_FRAME_POINTER=y. (This patch will go away
> once the dwarf2 stackframe parser in -mm goes upstream.)

yup, I dropped it.

* Re: [patch 16/61] lock validator: fown locking workaround
  2006-05-29 21:24 ` [patch 16/61] lock validator: fown locking workaround Ingo Molnar
@ 2006-05-30  1:34   ` Andrew Morton
  2006-06-23  9:10     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:23 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> temporary workaround for the lock validator: make all uses of
> f_owner.lock irq-safe. (The real solution will be to express to
> the lock validator that f_owner.lock rules are to be generated
> per-filesystem.)

This description forgot to tell us what problem is being worked around.

This patch is a bit of a show-stopper.  How hard-n-bad is the real fix?

* Re: [patch 17/61] lock validator: sk_callback_lock workaround
  2006-05-29 21:24 ` [patch 17/61] lock validator: sk_callback_lock workaround Ingo Molnar
@ 2006-05-30  1:34   ` Andrew Morton
  2006-06-23  9:19     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:27 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> temporary workaround for the lock validator: make all uses of
> sk_callback_lock softirq-safe. (The real solution will be to
> express to the lock validator that sk_callback_lock rules are
> to be generated per-address-family.)

Ditto.  What's the actual problem being worked around here, and how's the
real fix shaping up?



* Re: [patch 18/61] lock validator: irqtrace: core
  2006-05-29 21:24 ` [patch 18/61] lock validator: irqtrace: core Ingo Molnar
@ 2006-05-30  1:34   ` Andrew Morton
  2006-06-23 10:42     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:32 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> accurate hard-IRQ-flags state tracing. This allows us to attach
> extra functionality to IRQ flags on/off events (such as trace-on/off).

That's a fairly skimpy description of some fairly substantial new
infrastructure.


* Re: [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API.
  2006-05-29 21:24 ` [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API Ingo Molnar
@ 2006-05-30  1:34   ` Andrew Morton
  2006-06-23  9:28     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:24:52 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> introduce local_irq_enable_in_hardirq() API. It is currently
> aliased to local_irq_enable(), hence has no functional effects.
> 
> This API will be used by lockdep, but even without lockdep
> this will better document places in the kernel where a hardirq
> context enables hardirqs.

If we expect people to use this then we'd best whack a comment over it.

Also, trace_irqflags.h doesn't seem an appropriate place for it to live.

I trust all the affected files are including trace_irqflags.h by some
means.  Hopefully a _reliable_ means.  No doubt I'm about to find out ;)

* Re: [patch 22/61] lock validator:  add per_cpu_offset()
  2006-05-29 21:24 ` [patch 22/61] lock validator: add per_cpu_offset() Ingo Molnar
@ 2006-05-30  1:34   ` Andrew Morton
  2006-06-23  9:30     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, arjan, Luck, Tony, Benjamin Herrenschmidt,
	Paul Mackerras, Martin Schwidefsky, David S. Miller

On Mon, 29 May 2006 23:24:57 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> add the per_cpu_offset() generic method. (used by the lock validator)
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  include/asm-generic/percpu.h |    2 ++
>  include/asm-x86_64/percpu.h  |    2 ++
>  2 files changed, 4 insertions(+)
> 
> Index: linux/include/asm-generic/percpu.h
> ===================================================================
> --- linux.orig/include/asm-generic/percpu.h
> +++ linux/include/asm-generic/percpu.h
> @@ -7,6 +7,8 @@
>  
>  extern unsigned long __per_cpu_offset[NR_CPUS];
>  
> +#define per_cpu_offset(x) (__per_cpu_offset[x])
> +
>  /* Separate out the type, so (int[3], foo) works. */
>  #define DEFINE_PER_CPU(type, name) \
>      __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
> Index: linux/include/asm-x86_64/percpu.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/percpu.h
> +++ linux/include/asm-x86_64/percpu.h
> @@ -14,6 +14,8 @@
>  #define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
>  #define __my_cpu_offset() read_pda(data_offset)
>  
> +#define per_cpu_offset(x) (__per_cpu_offset(x))
> +
>  /* Separate out the type, so (int[3], foo) works. */
>  #define DEFINE_PER_CPU(type, name) \
>      __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

I can tell just looking at it that it'll break various builds. I assume that
things still happen to compile because you're presently using it in code
which those architectures don't presently compile.

But introducing a "generic" function invites others to start using it.  And
they will, and they'll ship code which "works" but is broken, because they
only tested it on x86 and x86_64.

I'll queue the needed fixups - please check it.

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (61 preceding siblings ...)
  2006-05-29 22:28 ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
@ 2006-05-30  1:35 ` Andrew Morton
  2006-06-23  9:41   ` Ingo Molnar
  2006-05-30  4:52 ` Mike Galbraith
                   ` (10 subsequent siblings)
  73 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:21:09 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> We are pleased to announce the first release of the "lock dependency 
> correctness validator" kernel debugging feature

What are the runtime speed and space costs of enabling this?

* Re: [patch 27/61] lock validator: prove spinlock/rwlock locking correctness
  2006-05-29 21:25 ` [patch 27/61] lock validator: prove spinlock/rwlock " Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-06-23 10:44     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:25:23 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> +# define spin_lock_init_key(lock, key)				\
> +	__spin_lock_init((lock), #lock, key)

erk.  This adds a whole new layer of obfuscation on top of the existing
spinlock header files.  You already need to run the preprocessor and
disassembler to even work out which flavour you're presently using.

Ho hum.
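
To make the extra layer a little less opaque: the `#lock` in the quoted
macro is ordinary preprocessor stringification, so the text of the argument
becomes the lock's name. A tiny userspace sketch, with invented names
(named_lock_sketch, named_lock_init, lock_init_key_sketch) and no claim to
match what __spin_lock_init() really stores:

```c
#include <string.h>

/* Hypothetical lock that only remembers its name and key. */
struct named_lock_sketch {
	const char *name;
	const void *key;
};

static void named_lock_init(struct named_lock_sketch *l,
			    const char *name, const void *key)
{
	l->name = name;
	l->key = key;
}

/* '#lock' turns the argument's source text into a string literal. */
#define lock_init_key_sketch(lock, key)				\
	named_lock_init((lock), #lock, (key))
```

Calling lock_init_key_sketch(&l, &k) stores the string "&l", which is
consistent with names such as "&anon_vma->lock" appearing in the validator
report earlier in this thread.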

* Re: [patch 34/61] lock validator: special locking: bdev
  2006-05-29 21:25 ` [patch 34/61] lock validator: special locking: bdev Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-05-30  5:13     ` Arjan van de Ven
                       ` (2 more replies)
  0 siblings, 3 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:25:54 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (recursive) locking code to the lock validator. Has no
> effect on non-lockdep kernels.
> 

There's no description here of the problem which is being worked around. 
This leaves everyone in the dark.

> +static int
> +blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags)
> +{
> +	/*
> +	 * This crockload is due to bad choice of ->open() type.
> +	 * It will go away.
> +	 * For now, block device ->open() routine must _not_
> +	 * examine anything in 'inode' argument except ->i_rdev.
> +	 */
> +	struct file fake_file = {};
> +	struct dentry fake_dentry = {};
> +	fake_file.f_mode = mode;
> +	fake_file.f_flags = flags;
> +	fake_file.f_dentry = &fake_dentry;
> +	fake_dentry.d_inode = bdev->bd_inode;
> +
> +	return do_open(bdev, &fake_file, BD_MUTEX_WHOLE);
> +}

"crock" is a decent description ;)

How long will this live, and what will the fix look like?

(This is all a bit of a pain - carrying these patches in -mm will require
some effort, and they're not ready to go yet, which will lengthen the pain
arbitrarily).


* Re: [patch 36/61] lock validator: special locking: serial
  2006-05-29 21:26 ` [patch 36/61] lock validator: special locking: serial Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-06-23  9:49     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, Russell King

On Mon, 29 May 2006 23:26:04 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (dual-initialized) locking code to the lock validator.
> Has no effect on non-lockdep kernels.
> 

This isn't an adequate description of the problem which this patch is
solving, IMO.

I _assume_ the validator is using the instruction pointer of the
spin_lock_init() site (or the file-n-line) as the lock's identifier.  Or
something?

> 
> Index: linux/drivers/serial/serial_core.c
> ===================================================================
> --- linux.orig/drivers/serial/serial_core.c
> +++ linux/drivers/serial/serial_core.c
> @@ -1849,6 +1849,12 @@ static const struct baud_rates baud_rate
>  	{      0, B38400  }
>  };
>  
> +/*
> + * lockdep: port->lock is initialized in two places, but we
> + *          want only one lock-type:
> + */
> +static struct lockdep_type_key port_lock_key;
> +
>  /**
>   *	uart_set_options - setup the serial console parameters
>   *	@port: pointer to the serial ports uart_port structure
> @@ -1869,7 +1875,7 @@ uart_set_options(struct uart_port *port,
>  	 * Ensure that the serial console lock is initialised
>  	 * early.
>  	 */
> -	spin_lock_init(&port->lock);
> +	spin_lock_init_key(&port->lock, &port_lock_key);
>  
>  	memset(&termios, 0, sizeof(struct termios));
>  
> @@ -2255,7 +2261,7 @@ int uart_add_one_port(struct uart_driver
>  	 * initialised.
>  	 */
>  	if (!(uart_console(port) && (port->cons->flags & CON_ENABLED)))
> -		spin_lock_init(&port->lock);
> +		spin_lock_init_key(&port->lock, &port_lock_key);
>  
>  	uart_configure_port(drv, state, port);
>  

Is there a cleaner way of doing this?

Perhaps write a new helper function which initialises the spinlock, call
that?  Rather than open-coding lockdep stuff?
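
One possible shape for such a helper, as a userspace sketch. The types
below are stand-ins for the kernel ones, and uart_port_lock_init() is a
hypothetical name, not an existing serial_core function:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the real kernel types are not modelled here. */
struct lock_key { int dummy; };
struct spinlock { const struct lock_key *key; };
struct uart_port { struct spinlock lock; };

/* One key shared by every port->lock, whichever site initialises it. */
static struct lock_key port_lock_key;

/* Hypothetical helper: the only place that knows about the key. */
void uart_port_lock_init(struct uart_port *port)
{
	assert(port != NULL);
	port->lock.key = &port_lock_key;
}
```

Both uart_set_options() and uart_add_one_port() would then call the
helper, and the key stays private to one function.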


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 37/61] lock validator: special locking: dcache
  2006-05-29 21:26 ` [patch 37/61] lock validator: special locking: dcache Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-05-30 20:51     ` Steven Rostedt
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:26:08 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (recursive) locking code to the lock validator. Has no
> effect on non-lockdep kernels.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  fs/dcache.c            |    6 +++---
>  include/linux/dcache.h |   12 ++++++++++++
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> Index: linux/fs/dcache.c
> ===================================================================
> --- linux.orig/fs/dcache.c
> +++ linux/fs/dcache.c
> @@ -1380,10 +1380,10 @@ void d_move(struct dentry * dentry, stru
>  	 */
>  	if (target < dentry) {
>  		spin_lock(&target->d_lock);
> -		spin_lock(&dentry->d_lock);
> +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>  	} else {
>  		spin_lock(&dentry->d_lock);
> -		spin_lock(&target->d_lock);
> +		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
>  	}
>  
>  	/* Move the dentry to the target hash queue, if on different bucket */
> @@ -1420,7 +1420,7 @@ already_unhashed:
>  	}
>  
>  	list_add(&dentry->d_u.d_child, &dentry->d_parent->d_subdirs);
> -	spin_unlock(&target->d_lock);
> +	spin_unlock_non_nested(&target->d_lock);
>  	fsnotify_d_move(dentry);
>  	spin_unlock(&dentry->d_lock);
>  	write_sequnlock(&rename_lock);
> Index: linux/include/linux/dcache.h
> ===================================================================
> --- linux.orig/include/linux/dcache.h
> +++ linux/include/linux/dcache.h
> @@ -114,6 +114,18 @@ struct dentry {
>  	unsigned char d_iname[DNAME_INLINE_LEN_MIN];	/* small names */
>  };
>  
> +/*
> + * dentry->d_lock spinlock nesting types:
> + *
> + * 0: normal
> + * 1: nested
> + */
> +enum dentry_d_lock_type
> +{
> +	DENTRY_D_LOCK_NORMAL,
> +	DENTRY_D_LOCK_NESTED
> +};
> +
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, struct nameidata *);
>  	int (*d_hash) (struct dentry *, struct qstr *);

DENTRY_D_LOCK_NORMAL isn't used anywhere.


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 46/61] lock validator: special locking: slab
  2006-05-29 21:26 ` [patch 46/61] lock validator: special locking: slab Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-06-23  9:54     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:26:49 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> +		/*
> +		 * Do not assume that spinlocks can be initialized via memcpy:
> +		 */

I'd view that as something which should be fixed in mainline.

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-05-29 21:27 ` [patch 50/61] lock validator: special locking: hrtimer.c Ingo Molnar
@ 2006-05-30  1:35   ` Andrew Morton
  2006-06-23 10:04     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:27:09 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (recursive) locking code to the lock validator. Has no
> effect on non-lockdep kernels.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  kernel/hrtimer.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/kernel/hrtimer.c
> ===================================================================
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -786,7 +786,7 @@ static void __devinit init_hrtimers_cpu(
>  	int i;
>  
>  	for (i = 0; i < MAX_HRTIMER_BASES; i++, base++)
> -		spin_lock_init(&base->lock);
> +		spin_lock_init_static(&base->lock);
>  }
>  

Perhaps the validator core's implementation of spin_lock_init() could look
at the address and work out if it's within the static storage sections.
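
A rough sketch of that address check, assuming the validator is handed
the bounds of the static storage sections. The section_range type and
the addr_is_static() name are made up for illustration; a real kernel
would presumably take the bounds from its linker-provided symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one static storage section [start, end). */
struct section_range {
	uintptr_t start;
	uintptr_t end;
};

/* True if addr lies inside any of the n given sections. */
bool addr_is_static(const void *addr, const struct section_range *secs, int n)
{
	uintptr_t a = (uintptr_t)addr;
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (a >= secs[i].start && a < secs[i].end)
			return true;
	}
	return false;
}
```

With a check like this inside spin_lock_init(), callers such as
init_hrtimers_cpu() would not need a separate _static variant.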


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 51/61] lock validator: special locking: sock_lock_init()
  2006-05-29 21:27 ` [patch 51/61] lock validator: special locking: sock_lock_init() Ingo Molnar
@ 2006-05-30  1:36   ` Andrew Morton
  2006-06-23 10:06     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, David S. Miller

On Mon, 29 May 2006 23:27:14 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (multi-initialized, per-address-family) locking code to the
> lock validator. Has no effect on non-lockdep kernels.
> 
> Index: linux/include/net/sock.h
> ===================================================================
> --- linux.orig/include/net/sock.h
> +++ linux/include/net/sock.h
> @@ -81,12 +81,6 @@ typedef struct {
>  	wait_queue_head_t	wq;
>  } socket_lock_t;
>  
> -#define sock_lock_init(__sk) \
> -do {	spin_lock_init(&((__sk)->sk_lock.slock)); \
> -	(__sk)->sk_lock.owner = NULL; \
> -	init_waitqueue_head(&((__sk)->sk_lock.wq)); \
> -} while(0)
> -
>  struct sock;
>  struct proto;
>  
> Index: linux/net/core/sock.c
> ===================================================================
> --- linux.orig/net/core/sock.c
> +++ linux/net/core/sock.c
> @@ -739,6 +739,27 @@ lenout:
>    	return 0;
>  }
>  
> +/*
> + * Each address family might have different locking rules, so we have
> + * one slock key per address family:
> + */
> +static struct lockdep_type_key af_family_keys[AF_MAX];
> +
> +static void noinline sock_lock_init(struct sock *sk)
> +{
> +	spin_lock_init_key(&sk->sk_lock.slock, af_family_keys + sk->sk_family);
> +	sk->sk_lock.owner = NULL;
> +	init_waitqueue_head(&sk->sk_lock.wq);
> +}

OK, no code outside net/core/sock.c uses sock_lock_init().

Hopefully the same is true of out-of-tree code...

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 52/61] lock validator: special locking: af_unix
  2006-05-29 21:27 ` [patch 52/61] lock validator: special locking: af_unix Ingo Molnar
@ 2006-05-30  1:36   ` Andrew Morton
  2006-06-23 10:07     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, David S. Miller

On Mon, 29 May 2006 23:27:19 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (recursive) locking code to the lock validator. Has no
> effect on non-lockdep kernels.
> 
> (includes workaround for sk_receive_queue.lock, which is currently
> treated globally by the lock validator, but which will be switched to
> per-address-family locking rules.)
> 
> ...
>
>  
> -			spin_lock(&sk->sk_receive_queue.lock);
> +			spin_lock_bh(&sk->sk_receive_queue.lock);

Again, a bit of a show-stopper.  Will the real fix be far off?

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 55/61] lock validator: special locking: sb->s_umount
  2006-05-29 21:27 ` [patch 55/61] lock validator: special locking: sb->s_umount Ingo Molnar
@ 2006-05-30  1:36   ` Andrew Morton
  2006-06-23 10:55     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:27:32 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> workaround for special sb->s_umount locking rule.
> 
> s_umount gets held across a series of lock dropping and releasing
> in prune_one_dentry(), so i changed the order, at the risk of
> introducing a umount race. FIXME.
> 
> i think a better fix would be to do the unlocks as _non_nested in
> prune_one_dentry(), and to do the up_read() here as
> an up_read_non_nested() as well?
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  fs/dcache.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux/fs/dcache.c
> ===================================================================
> --- linux.orig/fs/dcache.c
> +++ linux/fs/dcache.c
> @@ -470,8 +470,9 @@ static void prune_dcache(int count, stru
>  		s_umount = &dentry->d_sb->s_umount;
>  		if (down_read_trylock(s_umount)) {
>  			if (dentry->d_sb->s_root != NULL) {
> -				prune_one_dentry(dentry);
> +// lockdep hack: do this better!
>  				up_read(s_umount);
> +				prune_one_dentry(dentry);
>  				continue;

argh, you broke my kernel!

I'll whack some ifdefs in here so it's only known-broken if CONFIG_LOCKDEP.

Again, we'd need the real fix here.

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 59/61] lock validator: special locking: xfrm
  2006-05-29 21:27 ` [patch 59/61] lock validator: special locking: xfrm Ingo Molnar
@ 2006-05-30  1:36   ` Andrew Morton
  0 siblings, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, David S. Miller, Patrick McHardy

On Mon, 29 May 2006 23:27:51 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> teach special (non-nested) unlocking code to the lock validator. Has no
> effect on non-lockdep kernels.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  net/xfrm/xfrm_policy.c |    2 +-
>  net/xfrm/xfrm_state.c  |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux/net/xfrm/xfrm_policy.c
> ===================================================================
> --- linux.orig/net/xfrm/xfrm_policy.c
> +++ linux/net/xfrm/xfrm_policy.c
> @@ -1308,7 +1308,7 @@ static struct xfrm_policy_afinfo *xfrm_p
>  	afinfo = xfrm_policy_afinfo[family];
>  	if (likely(afinfo != NULL))
>  		read_lock(&afinfo->lock);
> -	read_unlock(&xfrm_policy_afinfo_lock);
> +	read_unlock_non_nested(&xfrm_policy_afinfo_lock);
>  	return afinfo;
>  }
>  
> Index: linux/net/xfrm/xfrm_state.c
> ===================================================================
> --- linux.orig/net/xfrm/xfrm_state.c
> +++ linux/net/xfrm/xfrm_state.c
> @@ -1105,7 +1105,7 @@ static struct xfrm_state_afinfo *xfrm_st
>  	afinfo = xfrm_state_afinfo[family];
>  	if (likely(afinfo != NULL))
>  		read_lock(&afinfo->lock);
> -	read_unlock(&xfrm_state_afinfo_lock);
> +	read_unlock_non_nested(&xfrm_state_afinfo_lock);
>  	return afinfo;
>  }
>  

I got a bunch of rejects here due to changes in git-net.patch.  Please
verify the result.  It could well be wrong (the changes in there are odd).


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 61/61] lock validator: enable lock validator in Kconfig
  2006-05-29 21:28 ` [patch 61/61] lock validator: enable lock validator in Kconfig Ingo Molnar
@ 2006-05-30  1:36   ` Andrew Morton
  2006-05-30 13:33   ` Roman Zippel
  1 sibling, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-05-30  1:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Mon, 29 May 2006 23:28:12 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> offer the following lock validation options:
> 
>  CONFIG_PROVE_SPIN_LOCKING
>  CONFIG_PROVE_RW_LOCKING
>  CONFIG_PROVE_MUTEX_LOCKING
>  CONFIG_PROVE_RWSEM_LOCKING
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  lib/Kconfig.debug |  167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 167 insertions(+)
> 
> Index: linux/lib/Kconfig.debug
> ===================================================================
> --- linux.orig/lib/Kconfig.debug
> +++ linux/lib/Kconfig.debug
> @@ -184,6 +184,173 @@ config DEBUG_SPINLOCK
>  	  best used in conjunction with the NMI watchdog so that spinlock
>  	  deadlocks are also debuggable.
>  
> +config PROVE_SPIN_LOCKING
> +	bool "Prove spin-locking correctness"
> +	default y

err, I think I'll be sticking a `depends on X86' in there, thanks very
much.  I'd prefer that you be the first to test it ;)


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (62 preceding siblings ...)
  2006-05-30  1:35 ` Andrew Morton
@ 2006-05-30  4:52 ` Mike Galbraith
  2006-05-30  6:20   ` Arjan van de Ven
                     ` (2 more replies)
  2006-05-30  9:14 ` Benoit Boissinot
                   ` (9 subsequent siblings)
  73 siblings, 3 replies; 319+ messages in thread
From: Mike Galbraith @ 2006-05-30  4:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

On Mon, 2006-05-29 at 23:21 +0200, Ingo Molnar wrote:
> The easiest way to try lockdep on a testbox is to apply the combo patch 
> to 2.6.17-rc4-mm3. The patch order is:
> 
>   http://kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.17-rc4.tar.bz2
>   http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc4/2.6.17-rc4-mm3/2.6.17-rc4-mm3.bz2
>   http://redhat.com/~mingo/lockdep-patches/lockdep-combo.patch
> 
> do 'make oldconfig' and accept all the defaults for new config options - 
> reboot into the kernel and if everything goes well it should boot up 
> fine and you should have /proc/lockdep and /proc/lockdep_stats files.

Darn.  It said all tests passed, then oopsed.

(have .config all gzipped up if you want it)

	-Mike

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
b103a872
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
last sysfs file:
Modules linked in:
CPU:    0
EIP:    0060:[<b103a872>]    Not tainted VLI
EFLAGS: 00010083   (2.6.17-rc4-mm3-smp #157)
EIP is at count_matching_names+0x5b/0xa2
eax: b15074a8   ebx: 00000000   ecx: b165c430   edx: b165b320
esi: 00000000   edi: b1410423   ebp: dfe20e74   esp: dfe20e68
ds: 007b   es: 007b   ss: 0068
Process idle (pid: 1, threadinfo=dfe20000 task=effc1470)
Stack: 000139b0 b165c430 00000000 dfe20ec8 b103d442 b1797a6c b1797a64 effc1470
       b1797a64 00000004 b1797a50 00000000 b15074a8 effc1470 dfe20ef8 b106da88
       b169d0a8 b1797a64 dfe20f52 0000000a b106dec7 00000282 dfe20000 00000000
Call Trace:
 <b1003d73> show_stack_log_lvl+0x9e/0xc3  <b1003f80> show_registers+0x1ac/0x237
 <b100413d> die+0x132/0x2fb  <b101a083> do_page_fault+0x5cf/0x656
 <b10038a7> error_code+0x4f/0x54  <b103d442> __lockdep_acquire+0xa6f/0xc32
 <b103d9f8> lockdep_acquire+0x61/0x77  <b13d27f3> _spin_lock+0x2e/0x42
 <b102b03a> register_sysctl_table+0x4e/0xaa  <b15a463a> sched_init_smp+0x411/0x41e
 <b100035d> init+0xbd/0x2c6  <b1001005> kernel_thread_helper+0x5/0xb
Code: 92 50 b1 74 5d 8b 41 10 2b 41 14 31 db 39 42 10 75 0d eb 53 8b 41 10 2b 41 14 3b 42 10 74 48 8b b2 a0 00 00 00 8b b9 a0 00 00 00 <ac> ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 75 0b 8b

1151            list_for_each_entry(type, &all_lock_types, lock_entry) {
1152                    if (new_type->key - new_type->subtype == type->key)
1153                            return type->name_version;
1154                    if (!strcmp(type->name, new_type->name))  <--kaboom
1155                            count = max(count, type->name_version);
1156            }

EIP: [<b103a872>] count_matching_names+0x5b/0xa2 SS:ESP 0068:dfe20e68
 Kernel panic - not syncing: Attempted to kill init!
 BUG: warning at arch/i386/kernel/smp.c:537/smp_call_function()
 <b1003dd2> show_trace+0xd/0xf  <b10044c0> dump_stack+0x17/0x19
 <b10129ff> smp_call_function+0x11d/0x122  <b1012a22> smp_send_stop+0x1e/0x31
 <b1022f4b> panic+0x60/0x1d5  <b10267fa> do_exit+0x613/0x94f
 <b1004306> do_trap+0x0/0x9e  <b101a083> do_page_fault+0x5cf/0x656
 <b10038a7> error_code+0x4f/0x54  <b103d442> __lockdep_acquire+0xa6f/0xc32
 <b103d9f8> lockdep_acquire+0x61/0x77  <b13d27f3> _spin_lock+0x2e/0x42
 <b102b03a> register_sysctl_table+0x4e/0xaa  <b15a463a> sched_init_smp+0x411/0x41e
 <b100035d> init+0xbd/0x2c6  <b1001005> kernel_thread_helper+0x5/0xb
BUG: NMI Watchdog detected LOCKUP on CPU1, eip b103cc64, registers:
Modules linked in:
CPU:    1
EIP:    0060:[<b103cc64>]    Not tainted VLI
EFLAGS: 00000086   (2.6.17-rc4-mm3-smp #157)
EIP is at __lockdep_acquire+0x291/0xc32
eax: 00000000   ebx: 000001d7   ecx: b16bf938   edx: 00000000
esi: 00000000   edi: b16bf938   ebp: effc4ea4   esp: effc4e58
ds: 007b   es: 007b   ss: 0068
Process idle (pid: 0, threadinfo=effc4000 task=effc0a50)
Stack: b101d4ce 00000000 effc0fb8 000001d7 effc0a50 b16bf938 00000000 b29b38c8
       effc0a50 effc0fb8 00000001 00000000 00000005 00000000 00000000 00000000
       00000096 effc4000 00000000 effc4ecc b103d9f8 00000000 00000001 b101d4ce
Call Trace:
 <b1003d73> show_stack_log_lvl+0x9e/0xc3  <b1003f80> show_registers+0x1ac/0x237
 <b10050d9> die_nmi+0x93/0xeb  <b1015af1> nmi_watchdog_tick+0xff/0x20e
 <b1004542> do_nmi+0x80/0x249  <b1003912> nmi_stack_correct+0x1d/0x22
 <b103d9f8> lockdep_acquire+0x61/0x77  <b13d27f3> _spin_lock+0x2e/0x42
 <b101d4ce> scheduler_tick+0xd0/0x381  <b102d47e> update_process_times+0x42/0x61
 <b1014f9f> smp_apic_timer_interrupt+0x67/0x78  <b10037ba> apic_timer_interrupt+0x2a/0x30
 <b1001e5b> cpu_idle+0x71/0xb8  <b1013c6e> start_secondary+0x3e5/0x46b
 <00000000> _stext+0x4efffd68/0x8  <effc4fb4> 0xeffc4fb4
Code: 18 01 90 39 c7 0f 84 2e 02 00 00 8b 50 0c 31 f2 8b 40 08 31 d8 09 c2 75 e2 f0 ff 05 08 8a 61 b1 f0 fe 0d e4 92 50 b1 79 0d f3 90 <80> 3d e4 92 50 b1 00 7e f5 eb ea 8b 55 d4 8b b2 64 05 00 00 85
console shuts up ...



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 34/61] lock validator: special locking: bdev
  2006-05-30  1:35   ` Andrew Morton
@ 2006-05-30  5:13     ` Arjan van de Ven
  2006-05-30  9:58     ` Al Viro
  2006-05-30 10:45     ` Arjan van de Ven
  2 siblings, 0 replies; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel

On Mon, 2006-05-29 at 18:35 -0700, Andrew Morton wrote:
> On Mon, 29 May 2006 23:25:54 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > 
> > teach special (recursive) locking code to the lock validator. Has no
> > effect on non-lockdep kernels.
> > 
> 
> There's no description here of the problem which is being worked around. 
> This leaves everyone in the dark.

it's not really a workaround, it's a "separate the uses" thing. The real
problem is an inherent hierarchy between "disk" and "partition": lots of
code assumes you can first take the disk mutex, and then the
partition mutex, and never deadlock. This patch basically separates the
"get me the disk" versus "get me the partition" uses.


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 22:28 ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
  2006-05-29 22:41   ` Ingo Molnar
@ 2006-05-30  5:20   ` Arjan van de Ven
  1 sibling, 0 replies; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30  5:20 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Ingo Molnar, linux-kernel, Andrew Morton, Dave Jones

On Tue, 2006-05-30 at 00:28 +0200, Michal Piotrowski wrote:
> On 29/05/06, Ingo Molnar <mingo@elte.hu> wrote:
> > We are pleased to announce the first release of the "lock dependency
> > correctness validator" kernel debugging feature, which can be downloaded
> > from:
> >
> >   http://redhat.com/~mingo/lockdep-patches/
> >
> [snip]
> 
> I get this while loading cpufreq modules

can you enable CONFIG_KALLSYMS_ALL ? that will give a more accurate
debug output...


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 23:09     ` Dave Jones
@ 2006-05-30  5:45       ` Arjan van de Ven
  2006-05-30  6:07         ` Michal Piotrowski
                           ` (2 more replies)
  2006-05-30  5:52       ` [patch 00/61] ANNOUNCE: lock validator -V1 Michal Piotrowski
  1 sibling, 3 replies; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30  5:45 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, linux-kernel, Michal Piotrowski, Ingo Molnar


> I'm feeling a bit overwhelmed by the voluminous output of this checker.
> Especially as (directly at least) cpufreq doesn't touch vma's, or mmap's.

the reporter doesn't have CONFIG_KALLSYMS_ALL enabled, which sometimes
gives misleading backtraces (should lockdep just enable KALLSYMS_ALL
to get more useful bugreports?)

the problem is this, there are 2 scenarios in this bug:

One
---
store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
__cpufreq_set_policy calls __cpufreq_governor
__cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
__cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)


Two
---
cpufreq_stats_init calls lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
cpufreq_stat_cpu_callback calls cpufreq_update_policy
cpufreq_update_policy takes the policy->lock


so this looks like a real honest AB-BA deadlock to me...
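
The two call chains above take the same pair of locks in opposite
orders. A minimal userspace sketch of how such an inversion can be
caught follows. It is not lockdep's implementation: it only tracks
direct pairs (no transitive chains), and POLICY_LOCK and HOTPLUG_LOCK
are hypothetical labels for policy->lock and the hotplug lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for illustration; not kernel identifiers. */
enum { POLICY_LOCK, HOTPLUG_LOCK, NR_CLASSES };

/* order[a][b] is true once "b acquired while a is held" was observed. */
static bool order[NR_CLASSES][NR_CLASSES];

/*
 * Record that 'next' is acquired while 'held' is held.
 * Returns false if the reverse order was recorded earlier (AB-BA).
 */
bool record_order(int held, int next)
{
	assert(held >= 0 && held < NR_CLASSES);
	assert(next >= 0 && next < NR_CLASSES);

	if (order[next][held])
		return false;
	order[held][next] = true;
	return true;
}
```

Recording policy->lock => hotplug lock (scenario One) succeeds; a later
hotplug lock => policy->lock (scenario Two) is rejected because the
reverse edge is already known, without either task ever having to block.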



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 23:09     ` Dave Jones
  2006-05-30  5:45       ` Arjan van de Ven
@ 2006-05-30  5:52       ` Michal Piotrowski
  1 sibling, 0 replies; 319+ messages in thread
From: Michal Piotrowski @ 2006-05-30  5:52 UTC (permalink / raw)
  To: Dave Jones, Ingo Molnar, Michal Piotrowski, linux-kernel,
	Arjan van de Ven, Andrew Morton

Hi,

On 30/05/06, Dave Jones <davej@redhat.com> wrote:
> On Tue, May 30, 2006 at 12:41:08AM +0200, Ingo Molnar wrote:
>
>  > > =====================================================
>  > > [ BUG: possible circular locking deadlock detected! ]
>  > > -----------------------------------------------------
>  > > modprobe/1942 is trying to acquire lock:
>  > > (&anon_vma->lock){--..}, at: [<c10609cf>] anon_vma_link+0x1d/0xc9
>  > >
>  > > but task is already holding lock:
>  > > (&mm->mmap_sem/1){--..}, at: [<c101e5a0>] copy_process+0xbc6/0x1519
>  > >
>  > > which lock already depends on the new lock,
>  > > which could lead to circular deadlocks!
>  >
>  > hm, this one could perhaps be a real bug. Dave: lockdep complains about
>  > having observed:
>  >
>  >      anon_vma->lock  =>   mm->mmap_sem
>  >      mm->mmap_sem    =>   anon_vma->lock
>  >
>  > locking sequences, in the cpufreq code. Is there some special runtime
>  > behavior that still makes this safe, or is it a real bug?
>
> I'm feeling a bit overwhelmed by the voluminous output of this checker.
> Especially as (directly at least) cpufreq doesn't touch vma's, or mmap's.
>
> The first stack trace it shows has us down in the bowels of cpu hotplug,
> where we're taking the cpucontrol sem.  The second stack trace shows
> us in cpufreq_update_policy taking a per-cpu data->lock semaphore.
>
> Now, I notice this is modprobe triggering this, and this *looks* like
> we're loading two modules simultaneously (the first trace is from a
> scaling driver like powernow-k8 or the like, whilst the second trace
> is from cpufreq-stats).

/etc/init.d/cpuspeed starts very early
$ ls /etc/rc5.d/ | grep cpu
S06cpuspeed

I have this in my /etc/rc.local
modprobe -i cpufreq_conservative
modprobe -i cpufreq_ondemand
modprobe -i cpufreq_powersave
modprobe -i cpufreq_stats
modprobe -i cpufreq_userspace
modprobe -i freq_table

>
> How on earth did we get into this situation?

Just before gdm starts, while /etc/rc.local is processed.

> module loading is supposed
> to be serialised on the module_mutex no ?
>
> It's been a while since a debug patch has sent me in search of paracetamol ;)
>
>                 Dave

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  5:45       ` Arjan van de Ven
@ 2006-05-30  6:07         ` Michal Piotrowski
  2006-05-30 14:10         ` Dave Jones
  2006-05-30 20:54         ` [patch, -rc5-mm1] lock validator: select KALLSYMS_ALL Ingo Molnar
  2 siblings, 0 replies; 319+ messages in thread
From: Michal Piotrowski @ 2006-05-30  6:07 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Dave Jones, Andrew Morton, linux-kernel, Ingo Molnar

Hi,

On 30/05/06, Arjan van de Ven <arjan@infradead.org> wrote:
>
> > I'm feeling a bit overwhelmed by the voluminous output of this checker.
> > Especially as (directly at least) cpufreq doesn't touch vma's, or mmap's.
>
> the reporter doesn't have CONFIG_KALLSYMS_ALL enabled, which sometimes
> gives misleading backtraces (should lockdep just enable KALLSYMS_ALL
> to get more useful bugreports?)

Here is the bug with CONFIG_KALLSYMS_ALL enabled.

=====================================================
[ BUG: possible circular locking deadlock detected! ]
-----------------------------------------------------
modprobe/1950 is trying to acquire lock:
 (&sighand->siglock){.+..}, at: [<c102b632>] do_notify_parent+0x12b/0x1b9

but task is already holding lock:
 (tasklist_lock){..-±}, at: [<c1023473>] do_exit+0x608/0xa43

which lock already depends on the new lock,
which could lead to circular deadlocks!

the existing dependency chain (in reverse order) is:

-> #1 (cpucontrol){--..}:
       [<c10394be>] lockdep_acquire+0x69/0x82
       [<c11ed729>] __mutex_lock_slowpath+0xd0/0x347
       [<c11ed9bc>] mutex_lock+0x1c/0x1f
       [<c103dda5>] __lock_cpu_hotplug+0x36/0x56
       [<c103ddde>] lock_cpu_hotplug+0xa/0xc
       [<c1199dd6>] __cpufreq_driver_target+0x15/0x50
       [<c119a192>] cpufreq_governor_performance+0x1a/0x20
       [<c1198ada>] __cpufreq_governor+0xa0/0x1a9
       [<c1198cb2>] __cpufreq_set_policy+0xcf/0x100
       [<c1199196>] cpufreq_set_policy+0x2d/0x6f
       [<c1199c7e>] cpufreq_add_dev+0x34f/0x492
       [<c114b898>] sysdev_driver_register+0x58/0x9b
       [<c119a006>] cpufreq_register_driver+0x80/0xf4
       [<fd91402a>] ipt_local_out_hook+0x2a/0x65 [iptable_filter]
       [<c10410e1>] sys_init_module+0xa6/0x230
       [<c11ef97b>] sysenter_past_esp+0x54/0x8d

-> #0 (&sighand->siglock){.+..}:
       [<c10394be>] lockdep_acquire+0x69/0x82
       [<c11ed729>] __mutex_lock_slowpath+0xd0/0x347
       [<c11ed9bc>] mutex_lock+0x1c/0x1f
       [<c11990bb>] cpufreq_update_policy+0x34/0xd8
       [<fd9a350b>] cpufreq_stat_cpu_callback+0x1b/0x7c [cpufreq_stats]
       [<fd9a607d>] cpufreq_stats_init+0x7d/0x9b [cpufreq_stats]
       [<c10410e1>] sys_init_module+0xa6/0x230
       [<c11ef97b>] sysenter_past_esp+0x54/0x8d

other info that might help us debug this:

1 locks held by modprobe/1950:
 #0:  (cpucontrol){--..}, at: [<c11ed9bc>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c1003ed6>] show_trace+0xd/0xf
 [<c10043e9>] dump_stack+0x17/0x19
 [<c103863e>] print_circular_bug_tail+0x59/0x64
 [<c1038e91>] __lockdep_acquire+0x848/0xa39
 [<c10394be>] lockdep_acquire+0x69/0x82
 [<c11ed729>] __mutex_lock_slowpath+0xd0/0x347
 [<c11ed9bc>] mutex_lock+0x1c/0x1f
 [<c11990bb>] cpufreq_update_policy+0x34/0xd8
 [<fd9a350b>] cpufreq_stat_cpu_callback+0x1b/0x7c [cpufreq_stats]
 [<fd9a607d>] cpufreq_stats_init+0x7d/0x9b [cpufreq_stats]
 [<c10410e1>] sys_init_module+0xa6/0x230
 [<c11ef97b>] sysenter_past_esp+0x54/0x8d


>
> the problem is this, there are 2 scenarios in this bug:
>
> One
> ---
> store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
> __cpufreq_set_policy calls __cpufreq_governor
> __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
> __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
>
>
> Two
> ---
> cpufreq_stats_init calls lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
> cpufreq_stat_cpu_callback calls cpufreq_update_policy
> cpufreq_update_policy takes the policy->lock
>
>
> so this looks like a real honest AB-BA deadlock to me...

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  4:52 ` Mike Galbraith
@ 2006-05-30  6:20   ` Arjan van de Ven
  2006-05-30  6:35   ` Arjan van de Ven
  2006-05-30  6:37   ` Ingo Molnar
  2 siblings, 0 replies; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30  6:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, linux-kernel, Andrew Morton

On Tue, 2006-05-30 at 06:52 +0200, Mike Galbraith wrote:
> On Mon, 2006-05-29 at 23:21 +0200, Ingo Molnar wrote:
> > The easiest way to try lockdep on a testbox is to apply the combo patch 
> > to 2.6.17-rc4-mm3. The patch order is:
> > 
> >   http://kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.17-rc4.tar.bz2
> >   http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc4/2.6.17-rc4-mm3/2.6.17-rc4-mm3.bz2
> >   http://redhat.com/~mingo/lockdep-patches/lockdep-combo.patch
> > 
> > do 'make oldconfig' and accept all the defaults for new config options - 
> > reboot into the kernel and if everything goes well it should boot up 
> > fine and you should have /proc/lockdep and /proc/lockdep_stats files.
> 
> Darn.  It said all tests passed, then oopsed.
> 
> (have .config all gzipped up if you want it)


yes please get me/Ingo the .config; something odd is going on


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  4:52 ` Mike Galbraith
  2006-05-30  6:20   ` Arjan van de Ven
@ 2006-05-30  6:35   ` Arjan van de Ven
  2006-05-30  7:47     ` Ingo Molnar
  2006-05-30  6:37   ` Ingo Molnar
  2 siblings, 1 reply; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30  6:35 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, linux-kernel, Andrew Morton

On Tue, 2006-05-30 at 06:52 +0200, Mike Galbraith wrote:
> On Mon, 2006-05-29 at 23:21 +0200, Ingo Molnar wrote:
> > The easiest way to try lockdep on a testbox is to apply the combo patch 
> > to 2.6.17-rc4-mm3. The patch order is:
> > 
> >   http://kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.17-rc4.tar.bz2
> >   http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc4/2.6.17-rc4-mm3/2.6.17-rc4-mm3.bz2
> >   http://redhat.com/~mingo/lockdep-patches/lockdep-combo.patch
> > 
> > do 'make oldconfig' and accept all the defaults for new config options - 
> > reboot into the kernel and if everything goes well it should boot up 
> > fine and you should have /proc/lockdep and /proc/lockdep_stats files.
> 
> Darn.  It said all tests passed, then oopsed.


does this fix it?


type->name can be NULL legitimately; all places but one check for this
already. Fix this off-by-one.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

--- linux-2.6.17-rc4-mm3-lockdep/kernel/lockdep.c.org	2006-05-30 08:32:52.000000000 +0200
+++ linux-2.6.17-rc4-mm3-lockdep/kernel/lockdep.c	2006-05-30 08:33:09.000000000 +0200
@@ -1151,7 +1151,7 @@ int count_matching_names(struct lock_typ
 	list_for_each_entry(type, &all_lock_types, lock_entry) {
 		if (new_type->key - new_type->subtype == type->key)
 			return type->name_version;
-		if (!strcmp(type->name, new_type->name))
+		if (type->name && !strcmp(type->name, new_type->name))
 			count = max(count, type->name_version);
 	}
 




* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  4:52 ` Mike Galbraith
  2006-05-30  6:20   ` Arjan van de Ven
  2006-05-30  6:35   ` Arjan van de Ven
@ 2006-05-30  6:37   ` Ingo Molnar
  2006-05-30  9:25     ` Mike Galbraith
  2 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30  6:37 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton


* Mike Galbraith <efault@gmx.de> wrote:

> Darn.  It said all tests passed, then oopsed.
> 
> (have .config all gzipped up if you want it)

yeah, please.

> EIP:    0060:[<b103a872>]    Not tainted VLI
> EFLAGS: 00010083   (2.6.17-rc4-mm3-smp #157)
> EIP is at count_matching_names+0x5b/0xa2

> 1151            list_for_each_entry(type, &all_lock_types, lock_entry) {
> 1152                    if (new_type->key - new_type->subtype == type->key)
> 1153                            return type->name_version;
> 1154                    if (!strcmp(type->name, new_type->name))  <--kaboom
> 1155                            count = max(count, type->name_version);

hm, while most code (except the one above) is prepared for type->name 
being NULL, it should not be NULL. Maybe an uninitialized lock slipped 
through? Please try the patch below - it both protects against 
type->name being NULL in this place, and will warn if it finds a NULL 
lockname.

	Ingo

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -1151,7 +1151,7 @@ int count_matching_names(struct lock_typ
 	list_for_each_entry(type, &all_lock_types, lock_entry) {
 		if (new_type->key - new_type->subtype == type->key)
 			return type->name_version;
-		if (!strcmp(type->name, new_type->name))
+		if (type->name && !strcmp(type->name, new_type->name))
 			count = max(count, type->name_version);
 	}
 
@@ -1974,7 +1974,8 @@ void lockdep_init_map(struct lockdep_map
 
 	if (DEBUG_WARN_ON(!key))
 		return;
-
+	if (DEBUG_WARN_ON(!name))
+		return;
 	/*
 	 * Sanity check, the lock-type key must be persistent:
 	 */
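
For readers outside the kernel tree, the guarded comparison can be
tried in isolation. This is only a user-space sketch under stated
assumptions: struct lock_type_sketch and max_matching_version() are
made-up stand-ins, the list is replaced by a plain array, and the key
comparison from count_matching_names() is left out.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the lock type record; only the fields used here. */
struct lock_type_sketch {
	const char *name;	/* may be NULL if a lock was never named */
	int name_version;
};

/*
 * Return the highest name_version among entries whose name equals
 * new_name. Entries with a NULL name are skipped instead of being
 * passed to strcmp(), which must not be called with NULL.
 */
static int max_matching_version(const struct lock_type_sketch *types,
				size_t n, const char *new_name)
{
	int count = 0;
	size_t i;

	if (!new_name)
		return 0;
	for (i = 0; i < n; i++) {
		if (types[i].name && !strcmp(types[i].name, new_name) &&
		    types[i].name_version > count)
			count = types[i].name_version;
	}
	return count;
}
```

The short-circuit && is what makes the check safe: strcmp() is only
reached when the left-hand operand has already ruled out a NULL name.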


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  6:35   ` Arjan van de Ven
@ 2006-05-30  7:47     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30  7:47 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Mike Galbraith, linux-kernel, Andrew Morton


* Arjan van de Ven <arjan@infradead.org> wrote:

> > Darn.  It said all tests passed, then oopsed.
>
> does this fix it?
> 
> type->name can be NULL legitimately; all places but one check for this 
> already. Fix this off-by-one.

that used to be the case, but shouldn't happen anymore - with current 
lockdep code we always pass some string to the lock init code. (that's 
what lock-init-improvement.patch achieves in essence.) Worst-case the 
string should be "old_style_spin_init" or "old_style_rw_init".

So Mike please try the other patch i sent - it also adds a debugging 
check so that we can see where that NULL name comes from. It could be 
something benign like me forgetting to pass in a string somewhere in the 
initialization macros, but it could also be something more nasty like an 
initialize-by-memset assumption.

	Ingo


* Re: [patch 25/61] lock validator: design docs
  2006-05-29 21:25 ` [patch 25/61] lock validator: design docs Ingo Molnar
@ 2006-05-30  9:07   ` Nikita Danilov
  0 siblings, 0 replies; 319+ messages in thread
From: Nikita Danilov @ 2006-05-30  9:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Arjan van de Ven, Andrew Morton, Linux Kernel Mailing List

Ingo Molnar writes:
 > From: Ingo Molnar <mingo@elte.hu>

[...]

 > +
 > +enum bdev_bd_mutex_lock_type
 > +{
 > +       BD_MUTEX_NORMAL,
 > +       BD_MUTEX_WHOLE,
 > +       BD_MUTEX_PARTITION
 > +};

In some situations a well-defined and finite set of "nesting levels"
does not exist. For example, one may have a tree with per-node locking,
where algorithms acquire multiple node locks left-to-right in tree
order. Reiser4 does this.

Can nested locking restrictions be weakened for certain lock types?
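
To make the tree case concrete, here is a minimal user-space sketch of
the ordering rule itself. Everything in it is hypothetical: struct
node_sketch, its pos field (the node's left-to-right position) and
acquired_in_tree_order() are illustration only and are not taken from
Reiser4 or from the validator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: 'pos' is its left-to-right position in tree order. */
struct node_sketch {
	unsigned long pos;
};

/*
 * Check that a batch of node locks is requested in strictly increasing
 * tree order. With an unbounded number of nodes there is no fixed enum
 * of nesting levels, but the ordering rule can still be checked.
 */
static bool acquired_in_tree_order(const struct node_sketch *const *nodes,
				   size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (nodes[i - 1]->pos >= nodes[i]->pos)
			return false;
	}
	return true;
}
```

The point of the sketch is that the rule is a comparison between two
instances of the same lock type, rather than a small set of named
levels known at compile time.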

Nikita.


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (63 preceding siblings ...)
  2006-05-30  4:52 ` Mike Galbraith
@ 2006-05-30  9:14 ` Benoit Boissinot
  2006-05-30 10:26   ` Arjan van de Ven
  2006-06-01 14:42   ` [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix Frederik Deweerdt
  2007-02-13 14:20 ` [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support Ingo Molnar
                   ` (8 subsequent siblings)
  73 siblings, 2 replies; 319+ messages in thread
From: Benoit Boissinot @ 2006-05-30  9:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Arjan van de Ven, Andrew Morton, yi.zhu, jketreno

On 5/29/06, Ingo Molnar <mingo@elte.hu> wrote:
> We are pleased to announce the first release of the "lock dependency
> correctness validator" kernel debugging feature, which can be downloaded
> from:
>
>   http://redhat.com/~mingo/lockdep-patches/
> [snip]

I get this right after ipw2200 is loaded (it is quite verbose, I
probably shouldn't post everything...)

ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
ipw2200: Detected geography ZZD (13 802.11bg channels, 0 802.11a channels)

======================================================
[ BUG: hard-safe -> hard-unsafe lock order detected! ]
------------------------------------------------------
default.hotplug/3212 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
 (nl_table_lock){-.-±}, at: [<c0301efa>] netlink_broadcast+0x7a/0x360

and this task is already holding:
 (&priv->lock){++..}, at: [<e1cfe588>] ipw_irq_tasklet+0x18/0x500 [ipw2200]
which would create a new lock dependency:
 (&priv->lock){++..} -> (nl_table_lock){-.-±}

but this new dependency connects a hard-irq-safe lock:
 (&priv->lock){++..}
... which became hard-irq-safe at:
  [<c01395da>] lockdep_acquire+0x7a/0xa0
  [<c0352583>] _spin_lock+0x23/0x30
  [<e1cfdbc1>] ipw_isr+0x21/0xd0 [ipw2200]
  [<c01466e3>] handle_IRQ_event+0x33/0x80
  [<c01467e4>] __do_IRQ+0xb4/0x120
  [<c01057c0>] do_IRQ+0x70/0xc0

to a hard-irq-unsafe lock:
 (nl_table_lock){-.-±}
... which became hard-irq-unsafe at:
...  [<c01395da>] lockdep_acquire+0x7a/0xa0
  [<c03520da>] _write_lock_bh+0x2a/0x30
  [<c03017d2>] netlink_table_grab+0x12/0xe0
  [<c0301bcb>] netlink_insert+0x2b/0x180
  [<c030307c>] netlink_kernel_create+0xac/0x140
  [<c048f29a>] rtnetlink_init+0x6a/0xc0
  [<c048f6b9>] netlink_proto_init+0x169/0x180
  [<c010029f>] _stext+0x7f/0x250
  [<c0101005>] kernel_thread_helper+0x5/0xb

which could potentially lead to deadlocks!

other info that might help us debug this:

1 locks held by default.hotplug/3212:
 #0:  (&priv->lock){++..}, at: [<e1cfe588>] ipw_irq_tasklet+0x18/0x500 [ipw2200]

the hard-irq-safe lock's dependencies:
-> (&priv->lock){++..} ops: 102 {
   initial-use  at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                                       [<e1cf6a0c>] ipw_load+0x1fc/0xc90 [ipw2200]
                                       [<e1cf74e8>] ipw_up+0x48/0x520 [ipw2200]
                                       [<e1cfda87>] ipw_net_init+0x27/0x50 [ipw2200]
                                       [<c02eeef1>] register_netdevice+0xd1/0x410
                                       [<c02f0609>] register_netdev+0x59/0x70
                                       [<e1cfe4d6>] ipw_pci_probe+0x806/0x8a0 [ipw2200]
                                       [<c023481e>] pci_device_probe+0x5e/0x80
                                       [<c02a86e4>] driver_probe_device+0x44/0xc0
                                       [<c02a888b>] __driver_attach+0x9b/0xa0
                                       [<c02a8039>] bus_for_each_dev+0x49/0x70
                                       [<c02a8629>] driver_attach+0x19/0x20
                                       [<c02a7c64>] bus_add_driver+0x74/0x140
                                       [<c02a8b06>] driver_register+0x56/0x90
                                       [<c0234a10>] __pci_register_driver+0x50/0x70
                                       [<e18b302e>] 0xe18b302e
                                       [<c014034d>] sys_init_module+0xcd/0x1630
                                       [<c035273b>] sysenter_past_esp+0x54/0x8d
   in-hardirq-W at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c0352583>] _spin_lock+0x23/0x30
                                       [<e1cfdbc1>] ipw_isr+0x21/0xd0 [ipw2200]
                                       [<c01466e3>] handle_IRQ_event+0x33/0x80
                                       [<c01467e4>] __do_IRQ+0xb4/0x120
                                       [<c01057c0>] do_IRQ+0x70/0xc0
   in-softirq-W at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                                       [<e1cfe588>] ipw_irq_tasklet+0x18/0x500 [ipw2200]
                                       [<c0121ea0>] tasklet_action+0x40/0x90
                                       [<c01223b4>] __do_softirq+0x54/0xc0
                                       [<c01056bb>] do_softirq+0x5b/0xf0
 }
 ... key      at: [<e1d0b438>] __key.27363+0x0/0xffff38f6 [ipw2200]
  -> (&q->lock){++..} ops: 33353 {
     initial-use  at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c0352509>] _spin_lock_irq+0x29/0x40
                      [<c034f084>] wait_for_completion+0x24/0x150
                      [<c013160e>] keventd_create_kthread+0x2e/0x70
                      [<c01315d6>] kthread_create+0xe6/0xf0
                      [<c0121b75>] cpu_callback+0x95/0x110
                      [<c0481194>] spawn_ksoftirqd+0x14/0x30
                      [<c010023c>] _stext+0x1c/0x250
                      [<c0101005>] kernel_thread_helper+0x5/0xb
     in-hardirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c011794b>] __wake_up+0x1b/0x50
                      [<c012dcdd>] __queue_work+0x4d/0x70
                      [<c012ddaf>] queue_work+0x6f/0x80
                      [<c0269588>] acpi_os_execute+0xcd/0xe9
                      [<c026eea1>] acpi_ev_gpe_dispatch+0xbc/0x122
                      [<c026f106>] acpi_ev_gpe_detect+0x99/0xe0
                      [<c026d90b>] acpi_ev_sci_xrupt_handler+0x15/0x1d
                      [<c0268c55>] acpi_irq+0xe/0x18
                      [<c01466e3>] handle_IRQ_event+0x33/0x80
                      [<c01467e4>] __do_IRQ+0xb4/0x120
                      [<c01057c0>] do_IRQ+0x70/0xc0
     in-softirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c011786b>] complete+0x1b/0x60
                      [<c012ef0b>] wakeme_after_rcu+0xb/0x10
                      [<c012f0c9>] __rcu_process_callbacks+0x69/0x1c0
                      [<c012f232>] rcu_process_callbacks+0x12/0x30
                      [<c0121ea0>] tasklet_action+0x40/0x90
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
   }
   ... key      at: [<c04d47c8>] 0xc04d47c8
    -> (&rq->lock){++..} ops: 68824 {
       initial-use  at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                       [<c0117bcc>] init_idle+0x4c/0x80
                       [<c0480ad8>] sched_init+0xa8/0xb0
                       [<c0473558>] start_kernel+0x58/0x330
                       [<c0100199>] 0xc0100199
       in-hardirq-W at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c0352583>] _spin_lock+0x23/0x30
                       [<c0117cc7>] scheduler_tick+0xc7/0x310
                       [<c01270ee>] update_process_times+0x3e/0x70
                       [<c0106c21>] timer_interrupt+0x41/0xa0
                       [<c01466e3>] handle_IRQ_event+0x33/0x80
                       [<c01467e4>] __do_IRQ+0xb4/0x120
                       [<c01057c0>] do_IRQ+0x70/0xc0
       in-softirq-W at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c0352583>] _spin_lock+0x23/0x30
                       [<c01183e0>] try_to_wake_up+0x30/0x170
                       [<c011854f>] wake_up_process+0xf/0x20
                       [<c0122413>] __do_softirq+0xb3/0xc0
                       [<c01056bb>] do_softirq+0x5b/0xf0
     }
     ... key      at: [<c04c1400>] 0xc04c1400
   ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c0352583>] _spin_lock+0x23/0x30
   [<c01183e0>] try_to_wake_up+0x30/0x170
   [<c011852b>] default_wake_function+0xb/0x10
   [<c01172d9>] __wake_up_common+0x39/0x70
   [<c011788d>] complete+0x3d/0x60
   [<c01316d4>] kthread+0x84/0xbc
   [<c0101005>] kernel_thread_helper+0x5/0xb

 ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<c011794b>] __wake_up+0x1b/0x50
   [<e1cf6a2e>] ipw_load+0x21e/0xc90 [ipw2200]
   [<e1cf74e8>] ipw_up+0x48/0x520 [ipw2200]
   [<e1cfda87>] ipw_net_init+0x27/0x50 [ipw2200]
   [<c02eeef1>] register_netdevice+0xd1/0x410
   [<c02f0609>] register_netdev+0x59/0x70
   [<e1cfe4d6>] ipw_pci_probe+0x806/0x8a0 [ipw2200]
   [<c023481e>] pci_device_probe+0x5e/0x80
   [<c02a86e4>] driver_probe_device+0x44/0xc0
   [<c02a888b>] __driver_attach+0x9b/0xa0
   [<c02a8039>] bus_for_each_dev+0x49/0x70
   [<c02a8629>] driver_attach+0x19/0x20
   [<c02a7c64>] bus_add_driver+0x74/0x140
   [<c02a8b06>] driver_register+0x56/0x90
   [<c0234a10>] __pci_register_driver+0x50/0x70
   [<e18b302e>] 0xe18b302e
   [<c014034d>] sys_init_module+0xcd/0x1630
   [<c035273b>] sysenter_past_esp+0x54/0x8d

  -> (&rxq->lock){.+..} ops: 40 {
     initial-use  at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<e1cf66d0>] ipw_rx_queue_replenish+0x20/0x120 [ipw2200]
                      [<e1cf72e0>] ipw_load+0xad0/0xc90 [ipw2200]
                      [<e1cf74e8>] ipw_up+0x48/0x520 [ipw2200]
                      [<e1cfda87>] ipw_net_init+0x27/0x50 [ipw2200]
                      [<c02eeef1>] register_netdevice+0xd1/0x410
                      [<c02f0609>] register_netdev+0x59/0x70
                      [<e1cfe4d6>] ipw_pci_probe+0x806/0x8a0 [ipw2200]
                      [<c023481e>] pci_device_probe+0x5e/0x80
                      [<c02a86e4>] driver_probe_device+0x44/0xc0
                      [<c02a888b>] __driver_attach+0x9b/0xa0
                      [<c02a8039>] bus_for_each_dev+0x49/0x70
                      [<c02a8629>] driver_attach+0x19/0x20
                      [<c02a7c64>] bus_add_driver+0x74/0x140
                      [<c02a8b06>] driver_register+0x56/0x90
                      [<c0234a10>] __pci_register_driver+0x50/0x70
                      [<e18b302e>] 0xe18b302e
                      [<c014034d>] sys_init_module+0xcd/0x1630
                      [<c035273b>] sysenter_past_esp+0x54/0x8d
     in-softirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<e1cf25bf>] ipw_rx_queue_restock+0x1f/0x120 [ipw2200]
                      [<e1cf80d1>] ipw_rx+0x631/0x1bb0 [ipw2200]
                      [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
                      [<c0121ea0>] tasklet_action+0x40/0x90
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
   }
   ... key      at: [<e1d0b440>] __key.23915+0x0/0xffff38ee [ipw2200]
    -> (&parent->list_lock){.+..} ops: 17457 {
       initial-use  at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c0352583>] _spin_lock+0x23/0x30
                       [<c0166437>] cache_alloc_refill+0x87/0x650
                       [<c0166bae>] kmem_cache_zalloc+0xbe/0xd0
                       [<c01672d4>] kmem_cache_create+0x154/0x540
                       [<c0483ad9>] kmem_cache_init+0x179/0x3d0
                       [<c0473638>] start_kernel+0x138/0x330
                       [<c0100199>] 0xc0100199
       in-softirq-W at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c0352583>] _spin_lock+0x23/0x30
                       [<c0166073>] free_block+0x183/0x190
                       [<c0165bdf>] __cache_free+0x9f/0x120
                       [<c0165da8>] kmem_cache_free+0x88/0xb0
                       [<c0119e21>] free_task+0x21/0x30
                       [<c011b955>] __put_task_struct+0x95/0x156
                       [<c011db12>] delayed_put_task_struct+0x32/0x60
                       [<c012f0c9>] __rcu_process_callbacks+0x69/0x1c0
                       [<c012f232>] rcu_process_callbacks+0x12/0x30
                       [<c0121ea0>] tasklet_action+0x40/0x90
                       [<c01223b4>] __do_softirq+0x54/0xc0
                       [<c01056bb>] do_softirq+0x5b/0xf0
     }
     ... key      at: [<c060d00c>] 0xc060d00c
   ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c0352583>] _spin_lock+0x23/0x30
   [<c0166437>] cache_alloc_refill+0x87/0x650
   [<c0166ab8>] __kmalloc+0xb8/0xf0
   [<c02eb3cb>] __alloc_skb+0x4b/0x100
   [<e1cf6769>] ipw_rx_queue_replenish+0xb9/0x120 [ipw2200]
   [<e1cf72e0>] ipw_load+0xad0/0xc90 [ipw2200]
   [<e1cf74e8>] ipw_up+0x48/0x520 [ipw2200]
   [<e1cfda87>] ipw_net_init+0x27/0x50 [ipw2200]
   [<c02eeef1>] register_netdevice+0xd1/0x410
   [<c02f0609>] register_netdev+0x59/0x70
   [<e1cfe4d6>] ipw_pci_probe+0x806/0x8a0 [ipw2200]
   [<c023481e>] pci_device_probe+0x5e/0x80
   [<c02a86e4>] driver_probe_device+0x44/0xc0
   [<c02a888b>] __driver_attach+0x9b/0xa0
   [<c02a8039>] bus_for_each_dev+0x49/0x70
   [<c02a8629>] driver_attach+0x19/0x20
   [<c02a7c64>] bus_add_driver+0x74/0x140
   [<c02a8b06>] driver_register+0x56/0x90
   [<c0234a10>] __pci_register_driver+0x50/0x70
   [<e18b302e>] 0xe18b302e
   [<c014034d>] sys_init_module+0xcd/0x1630
   [<c035273b>] sysenter_past_esp+0x54/0x8d

 ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<e1cf25bf>] ipw_rx_queue_restock+0x1f/0x120 [ipw2200]
   [<e1cf80d1>] ipw_rx+0x631/0x1bb0 [ipw2200]
   [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
   [<c0121ea0>] tasklet_action+0x40/0x90
   [<c01223b4>] __do_softirq+0x54/0xc0
   [<c01056bb>] do_softirq+0x5b/0xf0

  -> (&ieee->lock){.+..} ops: 15 {
     initial-use  at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<e1c9d0cf>] ieee80211_process_probe_response+0x1ff/0x790 [ieee80211]
                      [<e1c9d70f>] ieee80211_rx_mgt+0xaf/0x340 [ieee80211]
                      [<e1cf8219>] ipw_rx+0x779/0x1bb0 [ipw2200]
                      [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
                      [<c0121ea0>] tasklet_action+0x40/0x90
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
     in-softirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<e1c9d0cf>] ieee80211_process_probe_response+0x1ff/0x790 [ieee80211]
                      [<e1c9d70f>] ieee80211_rx_mgt+0xaf/0x340 [ieee80211]
                      [<e1cf8219>] ipw_rx+0x779/0x1bb0 [ipw2200]
                      [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
                      [<c0121ea0>] tasklet_action+0x40/0x90
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
   }
   ... key      at: [<e1ca2781>] __key.22782+0x0/0xffffdc00 [ieee80211]
 ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<e1c9d0cf>] ieee80211_process_probe_response+0x1ff/0x790 [ieee80211]
   [<e1c9d70f>] ieee80211_rx_mgt+0xaf/0x340 [ieee80211]
   [<e1cf8219>] ipw_rx+0x779/0x1bb0 [ipw2200]
   [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
   [<c0121ea0>] tasklet_action+0x40/0x90
   [<c01223b4>] __do_softirq+0x54/0xc0
   [<c01056bb>] do_softirq+0x5b/0xf0

  -> (&cwq->lock){++..} ops: 3739 {
     initial-use  at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c012dca8>] __queue_work+0x18/0x70
                      [<c012ddaf>] queue_work+0x6f/0x80
                      [<c012d949>] call_usermodehelper_keys+0x139/0x160
                      [<c0219a2a>] kobject_uevent+0x7a/0x4a0
                      [<c0219753>] kobject_register+0x43/0x50
                      [<c02a7687>] sysdev_register+0x67/0x100
                      [<c02aa950>] register_cpu+0x30/0x70
                      [<c0108f7a>] arch_register_cpu+0x2a/0x30
                      [<c047850a>] topology_init+0xa/0x10
                      [<c010029f>] _stext+0x7f/0x250
                      [<c0101005>] kernel_thread_helper+0x5/0xb
     in-hardirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c012dca8>] __queue_work+0x18/0x70
                      [<c012ddaf>] queue_work+0x6f/0x80
                      [<c0269588>] acpi_os_execute+0xcd/0xe9
                      [<c026eea1>] acpi_ev_gpe_dispatch+0xbc/0x122
                      [<c026f106>] acpi_ev_gpe_detect+0x99/0xe0
                      [<c026d90b>] acpi_ev_sci_xrupt_handler+0x15/0x1d
                      [<c0268c55>] acpi_irq+0xe/0x18
                      [<c01466e3>] handle_IRQ_event+0x33/0x80
                      [<c01467e4>] __do_IRQ+0xb4/0x120
                      [<c01057c0>] do_IRQ+0x70/0xc0
     in-softirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c012dca8>] __queue_work+0x18/0x70
                      [<c012dd30>] delayed_work_timer_fn+0x30/0x40
                      [<c012633e>] run_timer_softirq+0x12e/0x180
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
   }
   ... key      at: [<c04d4334>] 0xc04d4334
    -> (&q->lock){++..} ops: 33353 {
       initial-use  at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c0352509>] _spin_lock_irq+0x29/0x40
                       [<c034f084>] wait_for_completion+0x24/0x150
                       [<c013160e>] keventd_create_kthread+0x2e/0x70
                       [<c01315d6>] kthread_create+0xe6/0xf0
                       [<c0121b75>] cpu_callback+0x95/0x110
                       [<c0481194>] spawn_ksoftirqd+0x14/0x30
                       [<c010023c>] _stext+0x1c/0x250
                       [<c0101005>] kernel_thread_helper+0x5/0xb
       in-hardirq-W at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                       [<c011794b>] __wake_up+0x1b/0x50
                       [<c012dcdd>] __queue_work+0x4d/0x70
                       [<c012ddaf>] queue_work+0x6f/0x80
                       [<c0269588>] acpi_os_execute+0xcd/0xe9
                       [<c026eea1>] acpi_ev_gpe_dispatch+0xbc/0x122
                       [<c026f106>] acpi_ev_gpe_detect+0x99/0xe0
                       [<c026d90b>] acpi_ev_sci_xrupt_handler+0x15/0x1d
                       [<c0268c55>] acpi_irq+0xe/0x18
                       [<c01466e3>] handle_IRQ_event+0x33/0x80
                       [<c01467e4>] __do_IRQ+0xb4/0x120
                       [<c01057c0>] do_IRQ+0x70/0xc0
       in-softirq-W at:
                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                       [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                       [<c011786b>] complete+0x1b/0x60
                       [<c012ef0b>] wakeme_after_rcu+0xb/0x10
                       [<c012f0c9>] __rcu_process_callbacks+0x69/0x1c0
                       [<c012f232>] rcu_process_callbacks+0x12/0x30
                       [<c0121ea0>] tasklet_action+0x40/0x90
                       [<c01223b4>] __do_softirq+0x54/0xc0
                       [<c01056bb>] do_softirq+0x5b/0xf0
     }
     ... key      at: [<c04d47c8>] 0xc04d47c8
      -> (&rq->lock){++..} ops: 68824 {
         initial-use  at:
                        [<c01395da>] lockdep_acquire+0x7a/0xa0
                        [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                        [<c0117bcc>] init_idle+0x4c/0x80
                        [<c0480ad8>] sched_init+0xa8/0xb0
                        [<c0473558>] start_kernel+0x58/0x330
                        [<c0100199>] 0xc0100199
         in-hardirq-W at:
                        [<c01395da>] lockdep_acquire+0x7a/0xa0
                        [<c0352583>] _spin_lock+0x23/0x30
                        [<c0117cc7>] scheduler_tick+0xc7/0x310
                        [<c01270ee>] update_process_times+0x3e/0x70
                        [<c0106c21>] timer_interrupt+0x41/0xa0
                        [<c01466e3>] handle_IRQ_event+0x33/0x80
                        [<c01467e4>] __do_IRQ+0xb4/0x120
                        [<c01057c0>] do_IRQ+0x70/0xc0
         in-softirq-W at:
                        [<c01395da>] lockdep_acquire+0x7a/0xa0
                        [<c0352583>] _spin_lock+0x23/0x30
                        [<c01183e0>] try_to_wake_up+0x30/0x170
                        [<c011854f>] wake_up_process+0xf/0x20
                        [<c0122413>] __do_softirq+0xb3/0xc0
                        [<c01056bb>] do_softirq+0x5b/0xf0
       }
       ... key      at: [<c04c1400>] 0xc04c1400
     ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c0352583>] _spin_lock+0x23/0x30
   [<c01183e0>] try_to_wake_up+0x30/0x170
   [<c011852b>] default_wake_function+0xb/0x10
   [<c01172d9>] __wake_up_common+0x39/0x70
   [<c011788d>] complete+0x3d/0x60
   [<c01316d4>] kthread+0x84/0xbc
   [<c0101005>] kernel_thread_helper+0x5/0xb

   ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<c011794b>] __wake_up+0x1b/0x50
   [<c012dcdd>] __queue_work+0x4d/0x70
   [<c012ddaf>] queue_work+0x6f/0x80
   [<c012d949>] call_usermodehelper_keys+0x139/0x160
   [<c0219a2a>] kobject_uevent+0x7a/0x4a0
   [<c0219753>] kobject_register+0x43/0x50
   [<c02a7687>] sysdev_register+0x67/0x100
   [<c02aa950>] register_cpu+0x30/0x70
   [<c0108f7a>] arch_register_cpu+0x2a/0x30
   [<c047850a>] topology_init+0xa/0x10
   [<c010029f>] _stext+0x7f/0x250
   [<c0101005>] kernel_thread_helper+0x5/0xb

 ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<c012dca8>] __queue_work+0x18/0x70
   [<c012ddaf>] queue_work+0x6f/0x80
   [<e1cf267e>] ipw_rx_queue_restock+0xde/0x120 [ipw2200]
   [<e1cf80d1>] ipw_rx+0x631/0x1bb0 [ipw2200]
   [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
   [<c0121ea0>] tasklet_action+0x40/0x90
   [<c01223b4>] __do_softirq+0x54/0xc0
   [<c01056bb>] do_softirq+0x5b/0xf0

  -> (&base->lock){++..} ops: 8140 {
     initial-use  at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c0126e4a>] lock_timer_base+0x3a/0x60
                      [<c0126f17>] __mod_timer+0x37/0xc0
                      [<c0127036>] mod_timer+0x36/0x50
                      [<c048a2e5>] con_init+0x1b5/0x200
                      [<c0489802>] console_init+0x32/0x40
                      [<c04735ea>] start_kernel+0xea/0x330
                      [<c0100199>] 0xc0100199
     in-hardirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c03524c0>] _spin_lock_irqsave+0x30/0x50
                      [<c0126e4a>] lock_timer_base+0x3a/0x60
                      [<c0126e9c>] del_timer+0x2c/0x70
                      [<c02bc619>] ide_intr+0x69/0x1f0
                      [<c01466e3>] handle_IRQ_event+0x33/0x80
                      [<c01467e4>] __do_IRQ+0xb4/0x120
                      [<c01057c0>] do_IRQ+0x70/0xc0
     in-softirq-W at:
                      [<c01395da>] lockdep_acquire+0x7a/0xa0
                      [<c0352509>] _spin_lock_irq+0x29/0x40
                      [<c0126239>] run_timer_softirq+0x29/0x180
                      [<c01223b4>] __do_softirq+0x54/0xc0
                      [<c01056bb>] do_softirq+0x5b/0xf0
   }
   ... key      at: [<c04d3af8>] 0xc04d3af8
 ... acquired at:
   [<c01395da>] lockdep_acquire+0x7a/0xa0
   [<c03524c0>] _spin_lock_irqsave+0x30/0x50
   [<c0126e4a>] lock_timer_base+0x3a/0x60
   [<c0126e9c>] del_timer+0x2c/0x70
   [<e1cf83d9>] ipw_rx+0x939/0x1bb0 [ipw2200]
   [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
   [<c0121ea0>] tasklet_action+0x40/0x90
   [<c01223b4>] __do_softirq+0x54/0xc0
   [<c01056bb>] do_softirq+0x5b/0xf0


the hard-irq-unsafe lock's dependencies:
-> (nl_table_lock){-.-±} ops: 1585 {
   initial-use  at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c03520da>] _write_lock_bh+0x2a/0x30
                                       [<c03017d2>] netlink_table_grab+0x12/0xe0
                                       [<c0301bcb>] netlink_insert+0x2b/0x180
                                       [<c030307c>] netlink_kernel_create+0xac/0x140
                                       [<c048f29a>] rtnetlink_init+0x6a/0xc0
                                       [<c048f6b9>] netlink_proto_init+0x169/0x180
                                       [<c010029f>] _stext+0x7f/0x250
                                       [<c0101005>] kernel_thread_helper+0x5/0xb
   hardirq-on-W at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c03520da>] _write_lock_bh+0x2a/0x30
                                       [<c03017d2>] netlink_table_grab+0x12/0xe0
                                       [<c0301bcb>] netlink_insert+0x2b/0x180
                                       [<c030307c>] netlink_kernel_create+0xac/0x140
                                       [<c048f29a>] rtnetlink_init+0x6a/0xc0
                                       [<c048f6b9>] netlink_proto_init+0x169/0x180
                                       [<c010029f>] _stext+0x7f/0x250
                                       [<c0101005>] kernel_thread_helper+0x5/0xb
   in-softirq-R at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c0352130>] _read_lock+0x20/0x30
                                       [<c0301efa>] netlink_broadcast+0x7a/0x360
                                       [<c02fb6a4>] wireless_send_event+0x304/0x340
                                       [<e1cf8e11>] ipw_rx+0x1371/0x1bb0 [ipw2200]
                                       [<e1cfe6ac>] ipw_irq_tasklet+0x13c/0x500 [ipw2200]
                                       [<c0121ea0>] tasklet_action+0x40/0x90
                                       [<c01223b4>] __do_softirq+0x54/0xc0
                                       [<c01056bb>] do_softirq+0x5b/0xf0
   softirq-on-R at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c0352130>] _read_lock+0x20/0x30
                                       [<c0301efa>] netlink_broadcast+0x7a/0x360
                                       [<c02199f0>] kobject_uevent+0x40/0x4a0
                                       [<c0219753>] kobject_register+0x43/0x50
                                       [<c02a7687>] sysdev_register+0x67/0x100
                                       [<c02aa950>] register_cpu+0x30/0x70
                                       [<c0108f7a>] arch_register_cpu+0x2a/0x30
                                       [<c047850a>] topology_init+0xa/0x10
                                       [<c010029f>] _stext+0x7f/0x250
                                       [<c0101005>] kernel_thread_helper+0x5/0xb
   hardirq-on-R at:
                                       [<c01395da>] lockdep_acquire+0x7a/0xa0
                                       [<c0352130>] _read_lock+0x20/0x30
                                       [<c0301efa>] netlink_broadcast+0x7a/0x360
                                       [<c02199f0>] kobject_uevent+0x40/0x4a0
                                       [<c0219753>] kobject_register+0x43/0x50
                                       [<c02a7687>] sysdev_register+0x67/0x100
                                       [<c02aa950>] register_cpu+0x30/0x70
                                       [<c0108f7a>] arch_register_cpu+0x2a/0x30
                                       [<c047850a>] topology_init+0xa/0x10
                                       [<c010029f>] _stext+0x7f/0x250
                                       [<c0101005>] kernel_thread_helper+0x5/0xb
 }
 ... key      at: [<c0438908>] 0xc0438908

stack backtrace:
 <c010402d> show_trace+0xd/0x10  <c0104687> dump_stack+0x17/0x20
 <c0137fe3> check_usage+0x263/0x270  <c0138f06> __lockdep_acquire+0xb96/0xd40
 <c01395da> lockdep_acquire+0x7a/0xa0  <c0352130> _read_lock+0x20/0x30
 <c0301efa> netlink_broadcast+0x7a/0x360  <c02fb6a4> wireless_send_event+0x304/0x340
 <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200]  <e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200]
 <c0121ea0> tasklet_action+0x40/0x90  <c01223b4> __do_softirq+0x54/0xc0
 <c01056bb> do_softirq+0x5b/0xf0 
 =======================
 <c0122455> irq_exit+0x35/0x40  <c01057c7> do_IRQ+0x77/0xc0
 <c0103949> common_interrupt+0x25/0x2c 

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  6:37   ` Ingo Molnar
@ 2006-05-30  9:25     ` Mike Galbraith
  2006-05-30 10:57       ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Mike Galbraith @ 2006-05-30  9:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

On Tue, 2006-05-30 at 08:37 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > Darn.  It said all tests passed, then oopsed.
> > 
> > (have .config all gzipped up if you want it)
> 
> yeah, please.

(sent off list)

> > EIP:    0060:[<b103a872>]    Not tainted VLI
> > EFLAGS: 00010083   (2.6.17-rc4-mm3-smp #157)
> > EIP is at count_matching_names+0x5b/0xa2
> 
> > 1151            list_for_each_entry(type, &all_lock_types, lock_entry) {
> > 1152                    if (new_type->key - new_type->subtype == type->key)
> > 1153                            return type->name_version;
> > 1154                    if (!strcmp(type->name, new_type->name))  <--kaboom
> > 1155                            count = max(count, type->name_version);
> 
> hm, while most code (except the one above) is prepared for type->name 
> being NULL, it should not be NULL. Maybe an uninitialized lock slipped 
> through? Please try the patch below - it both protects against 
> type->name being NULL in this place, and will warn if it finds a NULL 
> lockname.
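
A minimal userspace sketch of the NULL guard described above. This is not the patch referred to in the thread: struct sketch_lock_type and count_matching_names_sketch() are hypothetical names, a plain array stands in for the kernel list, the key comparison is left out, and the function simply returns the highest name_version seen for a matching name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lock type record walked in the loop above. */
struct sketch_lock_type {
	const char *name;	/* may be NULL if an unnamed lock slipped through */
	int name_version;
};

/*
 * Return the highest name_version among entries whose name equals
 * new_name.  Entries with a NULL name are skipped instead of being
 * handed to strcmp(), which is where the oops above happened.
 */
int count_matching_names_sketch(const struct sketch_lock_type *types,
				size_t n, const char *new_name)
{
	int count = 0;
	size_t i;

	assert(types != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!types[i].name || !new_name)
			continue;	/* nothing to compare against */
		if (strcmp(types[i].name, new_name) == 0 &&
		    types[i].name_version > count)
			count = types[i].name_version;
	}
	return count;
}
```

With one unnamed entry in the array the walk completes instead of crashing; a real fix would presumably also warn about the unnamed lock, as the message above says.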

Got the warning.  It failed testing, but booted.

Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBTYPES:    8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... TYPEHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      8192
... CHAINHASH_SIZE:          4096
 memory used by lock dependency info: 696 kB
 per task-struct memory footprint: 1080 bytes
------------------------
| Locking API testsuite:
----------------------------------------------------------------------------
                                 | spin |wlock |rlock |mutex | wsem | rsem |
  --------------------------------------------------------------------------
BUG: warning at kernel/lockdep.c:1977/lockdep_init_map()
 <b1003dd2> show_trace+0xd/0xf  <b10044c0> dump_stack+0x17/0x19
 <b103badf> lockdep_init_map+0x10a/0x10f  <b10398d7> __mutex_init+0x3b/0x44
 <b11d4601> init_type_X+0x37/0x4d  <b11d4638> init_shared_types+0x21/0xaa
 <b11dcca3> locking_selftest+0x76/0x1889  <b1597657> start_kernel+0x1e7/0x400
 <b1000210> 0xb1000210 
                     A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-B-A deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
             A-B-B-C-C-A deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
             A-B-C-A-B-C deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
         A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
         A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
         A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |FAILED|  ok  |  ok  |  ok  |
                    double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
  --------------------------------------------------------------------------
              recursive read-lock:             |FAILED|             |  ok  |
  --------------------------------------------------------------------------
                non-nested unlock:FAILED|FAILED|FAILED|FAILED|
  ------------------------------------------------------------
     hard-irqs-on + irq-safe-A/12:  ok  |  ok  |FAILED|
     soft-irqs-on + irq-safe-A/12:  ok  |  ok  |FAILED|
     hard-irqs-on + irq-safe-A/21:  ok  |  ok  |FAILED|
     soft-irqs-on + irq-safe-A/21:  ok  |  ok  |FAILED|
       sirq-safe-A => hirqs-on/12:  ok  |  ok  |FAILED|
       sirq-safe-A => hirqs-on/21:  ok  |  ok  |FAILED|
         hard-safe-A + irqs-on/12:  ok  |  ok  |FAILED|
         soft-safe-A + irqs-on/12:  ok  |  ok  |FAILED|
         hard-safe-A + irqs-on/21:  ok  |  ok  |FAILED|
         soft-safe-A + irqs-on/21:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/123:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/123:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/132:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/132:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/213:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/213:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/231:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/231:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/312:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/312:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #1/321:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #1/321:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/123:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/123:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/132:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/132:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/213:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/213:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/231:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/231:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/312:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/312:  ok  |  ok  |FAILED|
    hard-safe-A + unsafe-B #2/321:  ok  |  ok  |FAILED|
    soft-safe-A + unsafe-B #2/321:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/123:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/123:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/132:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/132:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/213:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/213:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/231:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/231:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/312:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/312:  ok  |  ok  |FAILED|
      hard-irq lock-inversion/321:  ok  |  ok  |FAILED|
      soft-irq lock-inversion/321:  ok  |  ok  |FAILED|
      hard-irq read-recursion/123:FAILED|
      soft-irq read-recursion/123:FAILED|
      hard-irq read-recursion/132:FAILED|
      soft-irq read-recursion/132:FAILED|
      hard-irq read-recursion/213:FAILED|
      soft-irq read-recursion/213:FAILED|
      hard-irq read-recursion/231:FAILED|
      soft-irq read-recursion/231:FAILED|
      hard-irq read-recursion/312:FAILED|
      soft-irq read-recursion/312:FAILED|
      hard-irq read-recursion/321:FAILED|
      soft-irq read-recursion/321:FAILED|
-----------------------------------------------------------------
BUG:  69 unexpected failures (out of 210) - debugging disabled! |
-----------------------------------------------------------------



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 34/61] lock validator: special locking: bdev
  2006-05-30  1:35   ` Andrew Morton
  2006-05-30  5:13     ` Arjan van de Ven
@ 2006-05-30  9:58     ` Al Viro
  2006-05-30 10:45     ` Arjan van de Ven
  2 siblings, 0 replies; 319+ messages in thread
From: Al Viro @ 2006-05-30  9:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Mon, May 29, 2006 at 06:35:23PM -0700, Andrew Morton wrote:
> > +	 * For now, block device ->open() routine must _not_
> > +	 * examine anything in 'inode' argument except ->i_rdev.
> > +	 */
> > +	struct file fake_file = {};
> > +	struct dentry fake_dentry = {};
> > +	fake_file.f_mode = mode;
> > +	fake_file.f_flags = flags;
> > +	fake_file.f_dentry = &fake_dentry;
> > +	fake_dentry.d_inode = bdev->bd_inode;
> > +
> > +	return do_open(bdev, &fake_file, BD_MUTEX_WHOLE);
> > +}
> 
> "crock" is a decent description ;)
> 
> How long will this live, and what will the fix look like?

The comment there is a bit deceptive.  

The real problem is with the stuff ->open() uses.  Short version of the
story:
	* everything uses inode->i_bdev.  Since we always pass an inode
allocated in block_dev.c along with bdev and its ->i_bdev points to that
bdev (i.e. at the constant offset from inode), it doesn't matter whether
we pass struct inode or struct block_device.
	* many things use file->f_mode.  Nobody modifies it.
	* some things use file->f_flags.  Used flags: O_EXCL and O_NDELAY.
Nobody modifies it.
	* one (and only one) weird driver uses something else.  That FPOS
is floppy.c and it needs more detailed description.

floppy.c is _weird_.  In addition to normally used stuff, it checks for
opener having write permissions on file->f_dentry->d_inode.  Then it
modifies file->private_data to store that information and uses it as
permission check in ->ioctl().

The rationale for that crock is a big load of bullshit.  It goes like this:
	We have privileged ioctls and can't allow them unless you have
write permissions.
	We can't ask to just open() the damn thing for write and let these
be done as usual (and check file->f_mode & FMODE_WRITE) because we might want
them on drive that has no disk in it or a write-protected one.  Opening it
for write would try to check for disk being writable and screw itself.
	Passing O_NDELAY would avoid that problem by skipping the checks
for disk being writable, present, etc., but we can't use that.  Reasons
why we can't?  We don't need no stinkin' reasons!

IOW, *all* of that could be avoided if floppy.c
	* checked FMODE_WRITE for ability to do privileged ioctls
	* had those who want to issue such ioctls on drive that might have
no disk in it pass O_NDELAY|O_WRONLY (or O_NDELAY|O_RDWR) when they open
the fscker.  Note that userland code always could have done that -
passing O_NDELAY|O_RDWR will do the right thing with any kernel.

That FPOS is the main reason why we pass struct file * there at all *and*
care to have ->f_dentry->d_inode in it (normally that wouldn't be even
looked at).  Again, my preferred solution would be to pass 4-bit flags and
either inode or block_device.  Flags being FMODE_READ, FMODE_WRITE,
O_EXCL and O_NDELAY.

The problem is moronic semantics for ioctl access control in floppy.c,
even though the sane API is _already_ present and always has been.  In
the very same floppy_open()...
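
A rough sketch of the two halves of that suggestion, in userspace C. It is an illustration only: the SKETCH_FMODE_* constants are hypothetical stand-ins for the kernel's FMODE_READ/FMODE_WRITE bits, and open_for_privileged_ioctl() and may_do_privileged_ioctl() are made-up helper names, not functions from floppy.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK	/* fallback where only the POSIX name exists */
#endif

/* Hypothetical stand-ins for the kernel's f_mode bits. */
#define SKETCH_FMODE_READ	0x1u
#define SKETCH_FMODE_WRITE	0x2u

/*
 * Userland side: ask for write access, and pass O_NDELAY so the open
 * does not insist on a present, writable disk.
 */
int open_for_privileged_ioctl(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDWR | O_NDELAY);
}

/*
 * Driver side: allow privileged ioctls only when the file was opened
 * for writing, instead of inspecting the inode's permissions.
 */
bool may_do_privileged_ioctl(unsigned int f_mode)
{
	return (f_mode & SKETCH_FMODE_WRITE) != 0;
}
```

The point of the design is that the permission decision needs nothing beyond the mode bits, so ->open() would no longer have to look at ->f_dentry->d_inode.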

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  9:14 ` Benoit Boissinot
@ 2006-05-30 10:26   ` Arjan van de Ven
  2006-05-30 11:42     ` Benoit Boissinot
  2006-06-01 14:42   ` [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix Frederik Deweerdt
  1 sibling, 1 reply; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30 10:26 UTC (permalink / raw)
  To: Benoit Boissinot
  Cc: jketreno, yi.zhu, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, 2006-05-30 at 11:14 +0200, Benoit Boissinot wrote:
> On 5/29/06, Ingo Molnar <mingo@elte.hu> wrote:
> > We are pleased to announce the first release of the "lock dependency
> > correctness validator" kernel debugging feature, which can be downloaded
> > from:
> >
> >   http://redhat.com/~mingo/lockdep-patches/
> > [snip]
> 
> I get this right after ipw2200 is loaded (it is quite verbose, I
> probably shouldn't post everything...)
> 
> ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
> ipw2200: Detected geography ZZD (13 802.11bg channels, 0 802.11a channels)


>  <c0301efa> netlink_broadcast+0x7a/0x360  

this isn't allowed to be called from IRQ context, because it takes
nl_table_lock for read, but that is taken as
        write_lock_bh(&nl_table_lock);
in 
	static void netlink_table_grab(void)
so without disabling interrupts; which would thus deadlock if this
read_lock-from-irq would hit.

>  <c02fb6a4> wireless_send_event+0x304/0x340
>  <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200] 
>  <e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200]
>  <c0121ea0> tasklet_action+0x40/0x90  

but it's more complex than that, since we ARE in BH context.
The complexity comes from us holding &priv->lock, which is 
used in hard irq context.

so the deadlock is like this:


cpu 0: user context                                     cpu 1: softirq context
   netlink_table_grab takes nl_table_lock as            take priv->lock in ipw_irq_tasklet
   write_lock_bh, but leaves irqs enabled


   hardirq comes in and the isr tries to take           in ipw_rx, call wireless_send_event which
   priv->lock but has to wait on cpu 1                  tries to take nl_table_lock for read
                                                        but has to wait for cpu0

and... kaboom kabang deadlock :)
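
The scenario above boils down to one rule: a lock that is ever taken in hardirq context must not be held while acquiring a lock that is ever taken with hardirqs enabled. A minimal userspace sketch of that check follows. It is not the lockdep code; struct sketch_lock, the two usage bits and the helper names are all hypothetical, and only a single direct dependency is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical usage bits, loosely modelled on the states in the report. */
enum {
	SKETCH_USED_IN_HARDIRQ	= 1 << 0,	/* taken inside a hardirq handler */
	SKETCH_ENABLED_HARDIRQS	= 1 << 1,	/* taken while hardirqs were on */
};

struct sketch_lock {
	const char *name;
	unsigned int usage;
};

/* Remember one more way in which this lock has been used. */
void mark_usage(struct sketch_lock *lock, unsigned int bit)
{
	assert(lock != NULL);
	lock->usage |= bit;
}

/*
 * Return false if taking 'next' while holding 'held' would create a
 * hardirq-safe -> hardirq-unsafe dependency, true otherwise.
 */
bool dependency_ok(const struct sketch_lock *held,
		   const struct sketch_lock *next)
{
	bool held_is_irq_safe, next_is_irq_unsafe;

	assert(held != NULL && next != NULL);
	held_is_irq_safe = (held->usage & SKETCH_USED_IN_HARDIRQ) != 0;
	next_is_irq_unsafe = (next->usage & SKETCH_ENABLED_HARDIRQS) != 0;
	return !(held_is_irq_safe && next_is_irq_unsafe);
}

/* Replay the ipw2200 case: priv->lock is held, nl_table_lock is wanted. */
bool ipw2200_scenario_ok(void)
{
	struct sketch_lock priv_lock = { "priv->lock", 0 };
	struct sketch_lock nl_table_lock = { "nl_table_lock", 0 };

	mark_usage(&priv_lock, SKETCH_USED_IN_HARDIRQ);		/* the isr takes it */
	mark_usage(&nl_table_lock, SKETCH_ENABLED_HARDIRQS);	/* write_lock_bh */
	return dependency_ok(&priv_lock, &nl_table_lock);
}
```

ipw2200_scenario_ok() returns false, matching the report: the dependency is rejected as soon as both usages are known, without the two CPUs ever having to hit the race.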



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 34/61] lock validator: special locking: bdev
  2006-05-30  1:35   ` Andrew Morton
  2006-05-30  5:13     ` Arjan van de Ven
  2006-05-30  9:58     ` Al Viro
@ 2006-05-30 10:45     ` Arjan van de Ven
  2 siblings, 0 replies; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30 10:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel

On Mon, 2006-05-29 at 18:35 -0700, Andrew Morton wrote:
> On Mon, 29 May 2006 23:25:54 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > 
> > teach special (recursive) locking code to the lock validator. Has no
> > effect on non-lockdep kernels.
> > 
> 
> There's no description here of the problem which is being worked around. 
> This leaves everyone in the dark.
> 
> > +static int
> > +blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags)
> > +{
> > +	/*
> > +	 * This crockload is due to bad choice of ->open() type.
> > +	 * It will go away.
> > +	 * For now, block device ->open() routine must _not_
> > +	 * examine anything in 'inode' argument except ->i_rdev.
> > +	 */
> > +	struct file fake_file = {};
> > +	struct dentry fake_dentry = {};
> > +	fake_file.f_mode = mode;
> > +	fake_file.f_flags = flags;
> > +	fake_file.f_dentry = &fake_dentry;
> > +	fake_dentry.d_inode = bdev->bd_inode;
> > +
> > +	return do_open(bdev, &fake_file, BD_MUTEX_WHOLE);
> > +}
> 
> "crock" is a decent description ;)
> 
> How long will this live, and what will the fix look like?

this btw is not new crock; the only new thing is the BD_MUTEX_WHOLE :)


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup
  2006-05-30  1:33   ` Andrew Morton
@ 2006-05-30 10:51     ` Takashi Iwai
  2006-05-30 11:03       ` Alexey Dobriyan
  0 siblings, 1 reply; 319+ messages in thread
From: Takashi Iwai @ 2006-05-30 10:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan, Jaroslav Kysela

At Mon, 29 May 2006 18:33:17 -0700,
Andrew Morton wrote:
> 
> On Mon, 29 May 2006 23:23:19 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > move the __attribute outside of the DEFINE_SPINLOCK() section.
> > 
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > ---
> >  sound/oss/emu10k1/midi.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Index: linux/sound/oss/emu10k1/midi.c
> > ===================================================================
> > --- linux.orig/sound/oss/emu10k1/midi.c
> > +++ linux/sound/oss/emu10k1/midi.c
> > @@ -45,7 +45,7 @@
> >  #include "../sound_config.h"
> >  #endif
> >  
> > -static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> > +static __attribute((unused)) DEFINE_SPINLOCK(midi_spinlock);
> >  
> >  static void init_midi_hdr(struct midi_hdr *midihdr)
> >  {
> 
> I'll tag this as for-mainline-via-alsa.

Acked-by: Takashi Iwai <tiwai@suse.de>


It's OSS stuff, so feel free to push it from your side ;)


thanks,

Takashi

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  9:25     ` Mike Galbraith
@ 2006-05-30 10:57       ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 10:57 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton


* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2006-05-30 at 08:37 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > Darn.  It said all tests passed, then oopsed.
> > > 
> > > (have .config all gzipped up if you want it)
> > 
> > yeah, please.
> 
> (sent off list)

thanks, i managed to reproduce the warning with your .config - i'm 
debugging the problem now.

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 03/61] lock validator: sound/oss/emu10k1/midi.c cleanup
  2006-05-30 10:51     ` Takashi Iwai
@ 2006-05-30 11:03       ` Alexey Dobriyan
  0 siblings, 0 replies; 319+ messages in thread
From: Alexey Dobriyan @ 2006-05-30 11:03 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Andrew Morton, Ingo Molnar, linux-kernel, arjan, Jaroslav Kysela

On Tue, May 30, 2006 at 12:51:53PM +0200, Takashi Iwai wrote:
> At Mon, 29 May 2006 18:33:17 -0700,
> Andrew Morton wrote:
> > 
> > On Mon, 29 May 2006 23:23:19 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > move the __attribute outside of the DEFINE_SPINLOCK() section.
> > > 
> > > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > > ---
> > >  sound/oss/emu10k1/midi.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > Index: linux/sound/oss/emu10k1/midi.c
> > > ===================================================================
> > > --- linux.orig/sound/oss/emu10k1/midi.c
> > > +++ linux/sound/oss/emu10k1/midi.c
> > > @@ -45,7 +45,7 @@
> > >  #include "../sound_config.h"
> > >  #endif
> > >  
> > > -static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> > > +static __attribute((unused)) DEFINE_SPINLOCK(midi_spinlock);
> > >  
> > >  static void init_midi_hdr(struct midi_hdr *midihdr)
> > >  {
> > 
> > I'll tag this as for-mainline-via-alsa.
> 
> Acked-by: Takashi Iwai <tiwai@suse.de>
> 
> 
> It's OSS stuff, so feel free to push it from your side ;)

Why is it marked unused when in fact it's used?

[PATCH] Mark midi_spinlock as used

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

--- a/sound/oss/emu10k1/midi.c
+++ b/sound/oss/emu10k1/midi.c
@@ -45,7 +45,7 @@
 #include "../sound_config.h"
 #endif
 
-static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
+static DEFINE_SPINLOCK(midi_spinlock);
 
 static void init_midi_hdr(struct midi_hdr *midihdr)
 {
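
For context on where the attribute can sit: the thread does not say why it had to move out of the macro argument. One plausible reason, and this is an assumption, is a definition macro that uses its argument more than once, for example also as a string. The sketch below uses a hypothetical SKETCH_DEFINE_SPINLOCK() macro, not the kernel's DEFINE_SPINLOCK(), to show that keeping the argument a bare identifier works with the attribute placed in front.

```c
#include <assert.h>
#include <string.h>

struct sketch_spinlock {
	int locked;
	const char *name;	/* debugging name derived from the variable name */
};

/* Hypothetical macro that uses its argument twice: declarator and string. */
#define SKETCH_DEFINE_SPINLOCK(x) \
	struct sketch_spinlock x = { 0, #x }

/* The attribute goes before the macro, so the argument stays an identifier. */
static __attribute__((unused)) SKETCH_DEFINE_SPINLOCK(midi_spinlock);

/* Accessor showing that the stringified name is the plain identifier. */
const char *midi_spinlock_name(void)
{
	assert(midi_spinlock.name != NULL);
	return midi_spinlock.name;
}
```

Had the attribute been part of the argument, the stringified name in this sketch would have carried the attribute text along with it.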


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 10:26   ` Arjan van de Ven
@ 2006-05-30 11:42     ` Benoit Boissinot
  2006-05-30 12:13       ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Benoit Boissinot @ 2006-05-30 11:42 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: jketreno, yi.zhu, Andrew Morton, Ingo Molnar, linux-kernel

On Tue, May 30, 2006 at 12:26:27PM +0200, Arjan van de Ven wrote:
> On Tue, 2006-05-30 at 11:14 +0200, Benoit Boissinot wrote:
> > On 5/29/06, Ingo Molnar <mingo@elte.hu> wrote:
> > > We are pleased to announce the first release of the "lock dependency
> > > correctness validator" kernel debugging feature, which can be downloaded
> > > from:
> > >
> > >   http://redhat.com/~mingo/lockdep-patches/
> > > [snip]
> > 
> > I get this right after ipw2200 is loaded (it is quite verbose, I
> > probably shouldn't post everything...)
> > 
> > ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
> > ipw2200: Detected geography ZZD (13 802.11bg channels, 0 802.11a channels)
> 
> 
> >  <c0301efa> netlink_broadcast+0x7a/0x360  
> 
> this isn't allowed to be called from IRQ context, because it takes
> nl_table_lock for read, but that is taken as
>         write_lock_bh(&nl_table_lock);
> in 
> 	static void netlink_table_grab(void)
> so without disabling interrupts; which would thus deadlock if this
> read_lock-from-irq would hit.
> 
> >  <c02fb6a4> wireless_send_event+0x304/0x340
> >  <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200] 
> >  <e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200]
> >  <c0121ea0> tasklet_action+0x40/0x90  
> 
> but it's more complex than that, since we ARE in BH context.
> The complexity comes from us holding &priv->lock, which is 
> used in hard irq context.

It is probably related, but I got this in my log too:

BUG: warning at kernel/softirq.c:86/local_bh_disable()
 <c010402d> show_trace+0xd/0x10  <c0104687> dump_stack+0x17/0x20
 <c0121fdc> local_bh_disable+0x5c/0x70  <c03520f1> _read_lock_bh+0x11/0x30
 <c02e8dce> sock_def_readable+0x1e/0x80  <c0302130> netlink_broadcast+0x2b0/0x360
 <c02fb6a4> wireless_send_event+0x304/0x340  <e1cf8e11> ipw_rx+0x1371/0x1bb0 [ipw2200]
 <e1cfe6ac> ipw_irq_tasklet+0x13c/0x500 [ipw2200] <c0121ea0> tasklet_action+0x40/0x90
 <c01223b4> __do_softirq+0x54/0xc0  <c01056bb> do_softirq+0x5b/0xf0
 =======================
 <c0122455> irq_exit+0x35/0x40  <c01057c7> do_IRQ+0x77/0xc0
 <c0103949> common_interrupt+0x25/0x2c 

> 
> so the deadlock is like this:
> 
> 
> cpu 0: user context                                     cpu 1: softirq context
>    netlink_table_grab takes nl_table_lock as            take priv->lock in ipw_irq_tasklet
>    write_lock_bh, but leaves irqs enabled
> 
> 
>    hardirq comes in and the isr tries to take           in ipw_rx, call wireless_send_event which
>    priv->lock but has to wait on cpu 1                  tries to take nl_table_lock for read
>                                                         but has to wait for cpu0
> 
> and... kaboom kabang deadlock :)
> 
> 

-- 
powered by bash/screen/(urxvt/fvwm|linux-console)/gentoo/gnu/linux OS

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 11:42     ` Benoit Boissinot
@ 2006-05-30 12:13       ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 12:13 UTC (permalink / raw)
  To: Benoit Boissinot
  Cc: Arjan van de Ven, jketreno, yi.zhu, Andrew Morton, linux-kernel


* Benoit Boissinot <benoit.boissinot@ens-lyon.org> wrote:

> It is probably related, but I got this in my log too:
> 
> BUG: warning at kernel/softirq.c:86/local_bh_disable()

this one is harmless, you can ignore it. (already sent a patch to remove 
the WARN_ON)

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 61/61] lock validator: enable lock validator in Kconfig
  2006-05-29 21:28 ` [patch 61/61] lock validator: enable lock validator in Kconfig Ingo Molnar
  2006-05-30  1:36   ` Andrew Morton
@ 2006-05-30 13:33   ` Roman Zippel
  2006-06-23 11:01     ` Ingo Molnar
  1 sibling, 1 reply; 319+ messages in thread
From: Roman Zippel @ 2006-05-30 13:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

Hi,

On Mon, 29 May 2006, Ingo Molnar wrote:

> Index: linux/lib/Kconfig.debug
> ===================================================================
> --- linux.orig/lib/Kconfig.debug
> +++ linux/lib/Kconfig.debug
> @@ -184,6 +184,173 @@ config DEBUG_SPINLOCK
>  	  best used in conjunction with the NMI watchdog so that spinlock
>  	  deadlocks are also debuggable.
>  
> +config PROVE_SPIN_LOCKING
> +	bool "Prove spin-locking correctness"
> +	default y

Could you please keep all the defaults in a separate -mm-only patch, so 
it doesn't get merged?
A number of dependencies on DEBUG_KERNEL are also missing, which
completely breaks the debugging menu.

> +config LOCKDEP
> +	bool
> +	default y
> +	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING

This can be written shorter as:

config LOCKDEP
	def_bool PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING

bye, Roman

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  5:45       ` Arjan van de Ven
  2006-05-30  6:07         ` Michal Piotrowski
@ 2006-05-30 14:10         ` Dave Jones
  2006-05-30 14:19           ` Arjan van de Ven
  2006-05-30 20:54         ` [patch, -rc5-mm1] lock validator: select KALLSYMS_ALL Ingo Molnar
  2 siblings, 1 reply; 319+ messages in thread
From: Dave Jones @ 2006-05-30 14:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-kernel, Michal Piotrowski, Ingo Molnar

On Tue, May 30, 2006 at 07:45:47AM +0200, Arjan van de Ven wrote:

 > One
 > ---
 > store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
 > __cpufreq_set_policy calls __cpufreq_governor
 > __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
 > __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
 > 
 > 
 > Two
 > ---
 > cpufreq_stats_init calls lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
 > cpufreq_stat_cpu_callback calls cpufreq_update_policy
 > cpufreq_update_policy takes the policy->lock
 > 
 > 
 > so this looks like a real honest AB-BA deadlock to me...

This looks a little clearer this morning.  I missed the fact that sys_init_module
isn't completely serialised, only the loading part. ->init routines can and will be
called in parallel.
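
The two quoted call chains take the same pair of locks in opposite order, which is all an AB-BA report needs. A minimal sketch of such an order check, assuming (as the quoted analysis does) that both chains end up on the same policy lock. The names are hypothetical, and only direct inversions between two locks are caught, not longer cycles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two locks in the quoted chains. */
enum sketch_lock_id { POLICY_LOCK, HOTPLUG_LOCK, NR_SKETCH_LOCKS };

/* order_seen[a][b] becomes true once some path took b while holding a. */
static bool order_seen[NR_SKETCH_LOCKS][NR_SKETCH_LOCKS];

/*
 * Record that 'taken' was acquired while 'held' was held.  Returns
 * false if the opposite order was recorded earlier (AB-BA inversion).
 */
bool note_lock_order(enum sketch_lock_id held, enum sketch_lock_id taken)
{
	assert(held < NR_SKETCH_LOCKS && taken < NR_SKETCH_LOCKS);

	if (order_seen[taken][held])
		return false;		/* reverse order already known */
	order_seen[held][taken] = true;
	return true;
}
```

Chain "One" would be recorded as note_lock_order(POLICY_LOCK, HOTPLUG_LOCK) and accepted; chain "Two" as note_lock_order(HOTPLUG_LOCK, POLICY_LOCK), which is then rejected even if the two ->init routines never actually race.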

I don't see where cpufreq_update_policy takes policy->lock though.
In my tree it just takes the per-cpu data->lock.

Time for more wake-up juice? or am I missing something obvious again?

		Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 14:10         ` Dave Jones
@ 2006-05-30 14:19           ` Arjan van de Ven
  2006-05-30 14:58             ` Dave Jones
  0 siblings, 1 reply; 319+ messages in thread
From: Arjan van de Ven @ 2006-05-30 14:19 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, linux-kernel, Michal Piotrowski, Ingo Molnar

On Tue, 2006-05-30 at 10:10 -0400, Dave Jones wrote:
> On Tue, May 30, 2006 at 07:45:47AM +0200, Arjan van de Ven wrote:
> 
>  > One
>  > ---
>  > store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
>  > __cpufreq_set_policy calls __cpufreq_governor
>  > __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
>  > __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
>  > 
>  > 
>  > Two
>  > ---
>  > cpufreq_stats_init calls lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
>  > cpufreq_stat_cpu_callback calls cpufreq_update_policy
>  > cpufreq_update_policy takes the policy->lock
>  > 
>  > 
>  > so this looks like a real honest AB-BA deadlock to me...
> 
> This looks a little clearer this morning.  I missed the fact that sys_init_module
> isn't completely serialised, only the loading part. ->init routines can and will be
> called in parallel.
> 
> I don't see where cpufreq_update_policy takes policy->lock though.
> In my tree it just takes the per-cpu data->lock.

isn't that basically the same lock?



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 14:19           ` Arjan van de Ven
@ 2006-05-30 14:58             ` Dave Jones
  2006-05-30 17:11               ` Dominik Brodowski
  0 siblings, 1 reply; 319+ messages in thread
From: Dave Jones @ 2006-05-30 14:58 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-kernel, Michal Piotrowski, Ingo Molnar, linux

On Tue, May 30, 2006 at 04:19:22PM +0200, Arjan van de Ven wrote:

 > >  > One
 > >  > ---
 > >  > store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
 > >  > __cpufreq_set_policy calls __cpufreq_governor
 > >  > __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
 > >  > __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
 > >  > 
 > >  > 
 > >  > Two
 > >  > ---
 > >  > cpufreq_stats_init lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
 > >  > cpufreq_stat_cpu_callback calls cpufreq_update_policy
 > >  > cpufreq_update_policy takes the policy->lock
 > >  > 
 > >  > 
 > >  > so this looks like a real honest AB-BA deadlock to me...
 > > 
 > > This looks a little clearer this morning.  I missed the fact that sys_init_module
 > > isn't completely serialised, only the loading part. ->init routines can and will be
 > > called in parallel.
 > > 
 > > I don't see where cpufreq_update_policy takes policy->lock though.
 > > In my tree it just takes the per-cpu data->lock.
 > 
 > isn't that basically the same lock?

Ugh, I've completely forgotten how this stuff fits together.

Dominik, any clues ?

		Dave

-- 
http://www.codemonkey.org.uk


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 14:58             ` Dave Jones
@ 2006-05-30 17:11               ` Dominik Brodowski
  2006-05-30 19:02                 ` Dave Jones
  2006-05-30 19:39                 ` Dave Jones
  0 siblings, 2 replies; 319+ messages in thread
From: Dominik Brodowski @ 2006-05-30 17:11 UTC (permalink / raw)
  To: Dave Jones, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou

Hi,

On Tue, May 30, 2006 at 10:58:52AM -0400, Dave Jones wrote:
> On Tue, May 30, 2006 at 04:19:22PM +0200, Arjan van de Ven wrote:
> 
>  > >  > One
>  > >  > ---
>  > >  > store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
>  > >  > __cpufreq_set_policy calls __cpufreq_governor
>  > >  > __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
>  > >  > __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
>  > >  > 
>  > >  > 
>  > >  > Two
>  > >  > ---
>  > >  > cpufreq_stats_init lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
>  > >  > cpufreq_stat_cpu_callback calls cpufreq_update_policy
>  > >  > cpufreq_update_policy takes the policy->lock
>  > >  > 
>  > >  > 
>  > >  > so this looks like a real honest AB-BA deadlock to me...
>  > > 
>  > > This looks a little clearer this morning.  I missed the fact that sys_init_module
>  > > isn't completely serialised, only the loading part. ->init routines can and will be
>  > > called in parallel.
>  > > 
>  > > I don't see where cpufreq_update_policy takes policy->lock though.
>  > > In my tree it just takes the per-cpu data->lock.
>  > 
>  > isn't that basically the same lock?
> 
> Ugh, I've completely forgotten how this stuff fits together.
> 
> Dominik, any clues ?

That's indeed a possible deadlock situation -- what's the
cpufreq_update_policy() call needed for in cpufreq_stat_cpu_callback anyway?
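
For illustration only (a userspace sketch, not kernel code): the inversion
in the two call chains quoted above can be modelled by recording which lock
is taken while another one is held, and rejecting the reverse order. The
lock_id enum, the order_seen table and record_order() are hypothetical
names made up for this sketch; the lock validator itself is far more
general than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two locks discussed above. */
enum lock_id { POLICY_LOCK, HOTPLUG_LOCK, NR_LOCKS };

/* order_seen[a][b] is true once some path took lock b while holding lock a. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/*
 * Record that 'inner' was acquired while 'outer' was held.
 * Returns false if the opposite order was recorded earlier (AB-BA).
 */
bool record_order(enum lock_id outer, enum lock_id inner)
{
	assert(outer != inner && outer < NR_LOCKS && inner < NR_LOCKS);

	if (order_seen[inner][outer])
		return false;

	order_seen[outer][inner] = true;
	return true;
}
```

Recording policy->lock followed by the hotplug lock (chain "One") is
accepted; recording the hotplug lock followed by policy->lock afterwards
(chain "Two") is rejected as an inversion.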

	Dominik


* Re: [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-05-30  1:33   ` Andrew Morton
@ 2006-05-30 17:38     ` Steven Rostedt
  2006-06-03 18:09       ` Steven Rostedt
  0 siblings, 1 reply; 319+ messages in thread
From: Steven Rostedt @ 2006-05-30 17:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Mon, 2006-05-29 at 18:33 -0700, Andrew Morton wrote:
> On Mon, 29 May 2006 23:23:28 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > add WARN_ON_ONCE(cond) to print once-per-bootup messages.
> > 
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > ---
> >  include/asm-generic/bug.h |   13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > Index: linux/include/asm-generic/bug.h
> > ===================================================================
> > --- linux.orig/include/asm-generic/bug.h
> > +++ linux/include/asm-generic/bug.h
> > @@ -44,4 +44,17 @@
> >  # define WARN_ON_SMP(x)			do { } while (0)
> >  #endif
> >  
> > +#define WARN_ON_ONCE(condition)				\
> > +({							\
> > +	static int __warn_once = 1;			\
> > +	int __ret = 0;					\
> > +							\
> > +	if (unlikely(__warn_once && (condition))) {	\

Since __warn_once is likely to be true, and the condition is likely to
be false, wouldn't it be better to switch this around to:

  if (unlikely((condition) && __warn_once)) {

That way the && short-circuits on the (usually false) condition before
having to load the static variable.

__warn_once can only become false after the unlikely condition has
triggered once.
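
A rough userspace sketch of the reordered test, in case it helps. This is
not the macro from the patch: it uses a plain function with one file-scope
flag instead of a per-call-site static, and read_flag(), flag_reads and
warnings are hypothetical helpers that only exist to make the evaluation
order observable.

```c
#include <assert.h>
#include <stdbool.h>

static bool warn_once = true;  /* plays the role of __warn_once */
static int flag_reads;         /* how often the flag was consulted */
static int warnings;           /* how many warnings were emitted */

/* Reads the once-flag and counts the access. */
bool read_flag(void)
{
	flag_reads++;
	return warn_once;
}

/* Returns true only the first time 'condition' is true. */
bool warn_on_once(bool condition)
{
	/* && evaluates left to right and stops at the first false operand. */
	if (condition && read_flag()) {
		assert(warnings == 0);  /* the flag allows at most one report */
		warn_once = false;
		warnings++;
		return true;
	}
	return false;
}
```

With the condition tested first, a call with a false condition never
touches the flag at all; the flag is only read on the rare calls where the
condition is true.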

-- Steve

> > +		__warn_once = 0;			\
> > +		WARN_ON(1);				\
> > +		__ret = 1;				\
> > +	}						\
> > +	__ret;						\
> > +})
> > +
> >  #endif
> 
> I'll queue this for mainline inclusion.




* Re: [patch 06/61] lock validator: add __module_address() method
  2006-05-30  1:33   ` Andrew Morton
@ 2006-05-30 17:45     ` Steven Rostedt
  2006-06-23  8:38     ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Steven Rostedt @ 2006-05-30 17:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Mon, 2006-05-29 at 18:33 -0700, Andrew Morton wrote:

> 
> I'd suggest that __module_address() should do the same thing, from an API neatness
> POV.  Although perhaps that's not very useful if we didn't take a ref on the returned
> object (but module_text_address() doesn't either).
> 
> Also, the name's a bit misleading - it sounds like it returns the address
> of a module or something.  __module_any_address() would be better, perhaps?

How about __valid_module_address()  so that it describes exactly what it
is doing. Or __module_address_valid().

-- Steve

> 
> Also, how come this doesn't need modlist_lock()?




* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 17:11               ` Dominik Brodowski
@ 2006-05-30 19:02                 ` Dave Jones
  2006-05-30 19:25                   ` Roland Dreier
  2006-05-30 19:39                 ` Dave Jones
  1 sibling, 1 reply; 319+ messages in thread
From: Dave Jones @ 2006-05-30 19:02 UTC (permalink / raw)
  To: Dominik Brodowski, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou

On Tue, May 30, 2006 at 07:11:18PM +0200, Dominik Brodowski wrote:

 > That's indeed a possible deadlock situation -- what's the
 > cpufreq_update_policy() call needed for in cpufreq_stat_cpu_callback anyway?

I was hoping you could enlighten me :)
I started picking through history with gitk, but my tk install uses
fonts that make my eyes bleed.  My kingdom for a 'git annotate'..

		Dave
-- 
http://www.codemonkey.org.uk


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 19:02                 ` Dave Jones
@ 2006-05-30 19:25                   ` Roland Dreier
  2006-05-30 19:34                     ` Dave Jones
  2006-05-30 20:41                     ` Ingo Molnar
  0 siblings, 2 replies; 319+ messages in thread
From: Roland Dreier @ 2006-05-30 19:25 UTC (permalink / raw)
  To: Dave Jones
  Cc: Dominik Brodowski, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou

    Dave> I was hoping you could enlighten me :) I started picking
    Dave> through history with gitk, but my tk install uses fonts that
    Dave> make my eyes bleed.  My kingdom for a 'git annotate'..

Heh -- try "git annotate" or "git blame".  I think you need git 1.3.x
for that... details of where to send your kingdom forthcoming...

 - R.


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 19:25                   ` Roland Dreier
@ 2006-05-30 19:34                     ` Dave Jones
  2006-05-30 20:41                     ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Dave Jones @ 2006-05-30 19:34 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Dominik Brodowski, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou

On Tue, May 30, 2006 at 12:25:29PM -0700, Roland Dreier wrote:
 >     Dave> I was hoping you could enlighten me :) I started picking
 >     Dave> through history with gitk, but my tk install uses fonts that
 >     Dave> make my eyes bleed.  My kingdom for a 'git annotate'..
 > 
 > Heh -- try "git annotate" or "git blame".  I think you need git 1.3.x
 > for that... details of where to send your kingdom forthcoming...

How on earth did I miss that?  Thanks for the pointer.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 17:11               ` Dominik Brodowski
  2006-05-30 19:02                 ` Dave Jones
@ 2006-05-30 19:39                 ` Dave Jones
  2006-05-30 19:53                   ` Ashok Raj
  1 sibling, 1 reply; 319+ messages in thread
From: Dave Jones @ 2006-05-30 19:39 UTC (permalink / raw)
  To: Dominik Brodowski, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou, ashok.raj

On Tue, May 30, 2006 at 07:11:18PM +0200, Dominik Brodowski wrote:
 
 > On Tue, May 30, 2006 at 10:58:52AM -0400, Dave Jones wrote:
 > > On Tue, May 30, 2006 at 04:19:22PM +0200, Arjan van de Ven wrote:
 > > 
 > >  > >  > One
 > >  > >  > ---
 > >  > >  > store_scaling_governor takes policy->lock and then calls __cpufreq_set_policy
 > >  > >  > __cpufreq_set_policy calls __cpufreq_governor
 > >  > >  > __cpufreq_governor  calls __cpufreq_driver_target via cpufreq_governor_performance
 > >  > >  > __cpufreq_driver_target calls lock_cpu_hotplug() (which takes the hotplug lock)
 > >  > >  > 
 > >  > >  > 
 > >  > >  > Two
 > >  > >  > ---
 > >  > >  > cpufreq_stats_init lock_cpu_hotplug() and then calls cpufreq_stat_cpu_callback
 > >  > >  > cpufreq_stat_cpu_callback calls cpufreq_update_policy
 > >  > >  > cpufreq_update_policy takes the policy->lock
 > >  > >  > 
 > >  > >  > 
 > >  > >  > so this looks like a real honest AB-BA deadlock to me...
 > >  > > 
 > >  > > This looks a little clearer this morning.  I missed the fact that sys_init_module
 > >  > > isn't completely serialised, only the loading part. ->init routines can and will be
 > >  > > called in parallel.
 > >  > > 
 > >  > > I don't see where cpufreq_update_policy takes policy->lock though.
 > >  > > In my tree it just takes the per-cpu data->lock.
 > >  > 
 > >  > isn't that basically the same lock?
 > > 
 > > Ugh, I've completely forgotten how this stuff fits together.
 > > 
 > > Dominik, any clues ?
 > 
 > That's indeed a possible deadlock situation -- what's the
 > cpufreq_update_policy() call needed for in cpufreq_stat_cpu_callback anyway?

Oh wow. Reading the commit message of this change rings alarm bells.

change c32b6b8e524d2c337767d312814484d9289550cf has this to say..

    [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers
    
    cpufreq entries in sysfs should only be populated when CPU is online state.
     When we either boot with maxcpus=x and then boot the other cpus by echoing
    to sysfs online file, these entries should be created and destroyed when
    CPU_DEAD is notified.  Same treatement as cache entries under sysfs.
    
    We place the processor in the lowest frequency, so hw managed P-State
    transitions can still work on the other threads to save power.
    
    Primary goal was to just make these directories appear/disapper dynamically.
    
    There is one in this patch i had to do, which i really dont like myself but
    probably best if someone handling the cpufreq infrastructure could give
    this code right treatment if this is not acceptable.  I guess its probably
    good for the first cut.
    
    - Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt.
      The locking was smack in the middle of the notification path, when the
      hotplug is already holding the lock. I tried another solution to avoid this
      so avoid taking locks if we know we are from notification path. The solution
      was getting very ugly and i decided this was probably good for this iteration
      until someone who understands cpufreq could do a better job than me.

So, that last part pretty much highlights that we knew about this problem, and meant to
come back and fix it later. Surprise surprise, no one came back and fixed it.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 19:39                 ` Dave Jones
@ 2006-05-30 19:53                   ` Ashok Raj
  2006-06-01  5:50                     ` Nathan Lynch
  0 siblings, 1 reply; 319+ messages in thread
From: Ashok Raj @ 2006-05-30 19:53 UTC (permalink / raw)
  To: Dave Jones
  Cc: Dominik Brodowski, Arjan van de Ven, Andrew Morton, linux-kernel,
	Michal Piotrowski, Ingo Molnar, nanhai.zou, ashok.raj

On Tue, May 30, 2006 at 03:39:47PM -0400, Dave Jones wrote:

> So, that last part pretty much highlights that we knew about this problem, and meant to
> come back and fix it later. Surprise surprise, no one came back and fixed it.
> 

There was another iteration after this, and currently we keep track of
the owner in lock_cpu_hotplug()->__lock_cpu_hotplug(). So if we are in
the same thread context we don't acquire the lock again.

    if (lock_cpu_hotplug_owner != current) {
        if (interruptible)
            ret = down_interruptible(&cpucontrol);
        else
            down(&cpucontrol);
    }


The lock and unlock paths keep track of the depth as well, so we know when to release.

We didn't hear any better suggestions (from the cpufreq folks), so we left it in
that state (at least the same thread doesn't try to take the lock twice, which
resulted in deadlocks earlier).
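
As a rough illustration of the owner-plus-depth idea (a single-threaded
userspace model, not the actual kernel code): struct task, struct
recursive_lock, rlock_lock() and rlock_unlock() are hypothetical names for
this sketch, and the places where the real code would call down() and up()
on cpucontrol are only marked with comments.

```c
#include <assert.h>
#include <stddef.h>

struct task { int id; };  /* hypothetical stand-in for a task */

struct recursive_lock {
	const struct task *owner;  /* NULL while unlocked */
	int depth;                 /* nesting level of the current owner */
	int acquisitions;          /* times the underlying lock was really taken */
};

/* Take the lock, or just bump the depth if 'cur' already owns it. */
void rlock_lock(struct recursive_lock *l, const struct task *cur)
{
	if (l->owner != cur) {
		/* Real code would block here, e.g. down(&cpucontrol). */
		assert(l->owner == NULL);
		l->owner = cur;
		l->acquisitions++;
	}
	l->depth++;
}

/* Drop one nesting level; release only when the depth reaches zero. */
void rlock_unlock(struct recursive_lock *l, const struct task *cur)
{
	assert(l->owner == cur && l->depth > 0);
	if (--l->depth == 0)
		l->owner = NULL;  /* real code would up(&cpucontrol) here */
}
```

Two nested rlock_lock() calls from the same task take the underlying lock
once, and only the second rlock_unlock() releases it.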

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 19:25                   ` Roland Dreier
  2006-05-30 19:34                     ` Dave Jones
@ 2006-05-30 20:41                     ` Ingo Molnar
  2006-05-30 20:44                       ` Ingo Molnar
  2006-05-30 21:58                       ` Paolo Ciarrocchi
  1 sibling, 2 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 20:41 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Dave Jones, Dominik Brodowski, Arjan van de Ven, Andrew Morton,
	linux-kernel, Michal Piotrowski, nanhai.zou


* Roland Dreier <rdreier@cisco.com> wrote:

>     Dave> I was hoping you could enlighten me :) I started picking
>     Dave> through history with gitk, but my tk install uses fonts that
>     Dave> make my eyes bleed.  My kingdom for a 'git annotate'..
> 
> Heh -- try "git annotate" or "git blame".  I think you need git 1.3.x 
> for that... details of where to send your kingdom forthcoming...

i use qgit, which is GTK based and thus uses the native desktop fonts.

	Ingo


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 20:41                     ` Ingo Molnar
@ 2006-05-30 20:44                       ` Ingo Molnar
  2006-05-30 21:58                       ` Paolo Ciarrocchi
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 20:44 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Dave Jones, Dominik Brodowski, Arjan van de Ven, Andrew Morton,
	linux-kernel, Michal Piotrowski, nanhai.zou


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Roland Dreier <rdreier@cisco.com> wrote:
> 
> >     Dave> I was hoping you could enlighten me :) I started picking
> >     Dave> through history with gitk, but my tk install uses fonts that
> >     Dave> make my eyes bleed.  My kingdom for a 'git annotate'..
> > 
> > Heh -- try "git annotate" or "git blame".  I think you need git 1.3.x 
> > for that... details of where to send your kingdom forthcoming...
> 
> i use qgit, which is GTK based and thus uses the native desktop fonts.

and qgit annotates source files in the background while you are viewing 
them, and then you can click on lines to jump to the last commit that 
touched them. It doesn't need the latest git; qgit always did this (by itself).

	Ingo


* Re: [patch 37/61] lock validator: special locking: dcache
  2006-05-30  1:35   ` Andrew Morton
@ 2006-05-30 20:51     ` Steven Rostedt
  2006-05-30 21:01       ` Ingo Molnar
  2006-06-23  9:51       ` Ingo Molnar
  0 siblings, 2 replies; 319+ messages in thread
From: Steven Rostedt @ 2006-05-30 20:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Mon, 2006-05-29 at 18:35 -0700, Andrew Morton wrote:

> > Index: linux/fs/dcache.c
> > ===================================================================
> > --- linux.orig/fs/dcache.c
> > +++ linux/fs/dcache.c
> > @@ -1380,10 +1380,10 @@ void d_move(struct dentry * dentry, stru
> >  	 */
> >  	if (target < dentry) {
> >  		spin_lock(&target->d_lock);
> > -		spin_lock(&dentry->d_lock);
> > +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> >  	} else {
> >  		spin_lock(&dentry->d_lock);
> > -		spin_lock(&target->d_lock);
> > +		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
> >  	}
> > 
>  

[...]

> > +/*
> > + * dentry->d_lock spinlock nesting types:
> > + *
> > + * 0: normal
> > + * 1: nested
> > + */
> > +enum dentry_d_lock_type
> > +{
> > +	DENTRY_D_LOCK_NORMAL,
> > +	DENTRY_D_LOCK_NESTED
> > +};
> > +
> >  struct dentry_operations {
> >  	int (*d_revalidate)(struct dentry *, struct nameidata *);
> >  	int (*d_hash) (struct dentry *, struct qstr *);
> 
> DENTRY_D_LOCK_NORMAL isn't used anywhere.
> 

I guess it is implied with the normal spin_lock.  Since 
  spin_lock(&target->d_lock) and
  spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NORMAL)
are equivalent. (DENTRY_D_LOCK_NORMAL == 0)

Probably this deserves a comment.

-- Steve




* Re: [patch 38/61] lock validator: special locking: i_mutex
  2006-05-29 21:26 ` [patch 38/61] lock validator: special locking: i_mutex Ingo Molnar
@ 2006-05-30 20:53   ` Steven Rostedt
  2006-05-30 21:06     ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Steven Rostedt @ 2006-05-30 20:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

On Mon, 2006-05-29 at 23:26 +0200, Ingo Molnar wrote:
> + * inode->i_mutex nesting types for the LOCKDEP validator:
> + *
> + * 0: the object of the current VFS operation
> + * 1: parent
> + * 2: child/target
> + */
> +enum inode_i_mutex_lock_type
> +{
> +       I_MUTEX_NORMAL,
> +       I_MUTEX_PARENT,
> +       I_MUTEX_CHILD
> +};
> +
> +/* 

I guess we can say the same about I_MUTEX_NORMAL.

-- Steve




* [patch, -rc5-mm1] lock validator: select KALLSYMS_ALL
  2006-05-30  5:45       ` Arjan van de Ven
  2006-05-30  6:07         ` Michal Piotrowski
  2006-05-30 14:10         ` Dave Jones
@ 2006-05-30 20:54         ` Ingo Molnar
  2 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 20:54 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Dave Jones, Andrew Morton, linux-kernel, Michal Piotrowski


* Arjan van de Ven <arjan@infradead.org> wrote:

> the reporter doesn't have CONFIG_KALLSYMS_ALL enabled which gives 
> sometimes misleading backtraces (should lockdep just enable 
> KALLSYMS_ALL to get more useful bugreports?)

agreed - the patch below does that.

-----------------------
Subject: lock validator: select KALLSYMS_ALL
From: Ingo Molnar <mingo@elte.hu>

all the kernel symbol printouts make a lot more sense if KALLSYMS_ALL
is enabled too - force it on if lockdep is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 lib/Kconfig.debug |    1 +
 1 file changed, 1 insertion(+)

Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -342,6 +342,7 @@ config LOCKDEP
 	default y
 	select FRAME_POINTER
 	select KALLSYMS
+	select KALLSYMS_ALL
 	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING
 
 config DEBUG_LOCKDEP


* Re: [patch 37/61] lock validator: special locking: dcache
  2006-05-30 20:51     ` Steven Rostedt
@ 2006-05-30 21:01       ` Ingo Molnar
  2006-06-23  9:51       ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 21:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, linux-kernel, arjan


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > +enum dentry_d_lock_type
> > > +{
> > > +	DENTRY_D_LOCK_NORMAL,
> > > +	DENTRY_D_LOCK_NESTED
> > > +};
> > > +
> > >  struct dentry_operations {
> > >  	int (*d_revalidate)(struct dentry *, struct nameidata *);
> > >  	int (*d_hash) (struct dentry *, struct qstr *);
> > 
> > DENTRY_D_LOCK_NORMAL isn't used anywhere.
> 
> I guess it is implied with the normal spin_lock.  Since 
>   spin_lock(&target->d_lock) and
>   spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NORMAL)
> are equivalent. (DENTRY_D_LOCK_NORMAL == 0)

correct. This is the case for all the subtype enum definitions: 0 means 
normal spinlock [rwlock, rwsem, mutex] API use.
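
A small userspace sketch of that convention, for illustration only:
fake_lock, fake_lock_nested(), fake_lock_plain() and lock_pair() are
hypothetical stand-ins, not the kernel API. The plain lock call is simply
the nested call with subtype 0, and lock_pair() takes two locks of the same
type in address order, marking the second one as nested the way the
d_move() hunk does.

```c
#include <assert.h>
#include <stdint.h>

enum dentry_d_lock_type {
	DENTRY_D_LOCK_NORMAL,  /* 0: what a plain lock call implies */
	DENTRY_D_LOCK_NESTED
};

/* Hypothetical lock that only remembers how it was taken. */
struct fake_lock {
	int held;
	int subtype;
};

void fake_lock_nested(struct fake_lock *l, int subtype)
{
	assert(!l->held);
	l->held = 1;
	l->subtype = subtype;
}

/* The plain variant is just the nested variant with subtype 0. */
void fake_lock_plain(struct fake_lock *l)
{
	fake_lock_nested(l, DENTRY_D_LOCK_NORMAL);
}

/* Lock two instances of one lock type, lower address first. */
void lock_pair(struct fake_lock *a, struct fake_lock *b)
{
	struct fake_lock *first = (uintptr_t)a < (uintptr_t)b ? a : b;
	struct fake_lock *second = first == a ? b : a;

	assert(a != b);
	fake_lock_plain(first);
	fake_lock_nested(second, DENTRY_D_LOCK_NESTED);
}
```

Whichever order the two locks are passed in, the one at the lower address
ends up with subtype 0 and the other one with DENTRY_D_LOCK_NESTED.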

	Ingo


* Re: [patch 38/61] lock validator: special locking: i_mutex
  2006-05-30 20:53   ` Steven Rostedt
@ 2006-05-30 21:06     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-30 21:06 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2006-05-29 at 23:26 +0200, Ingo Molnar wrote:
> > + * inode->i_mutex nesting types for the LOCKDEP validator:
> > + *
> > + * 0: the object of the current VFS operation
> > + * 1: parent
> > + * 2: child/target
> > + */
> > +enum inode_i_mutex_lock_type
> > +{
> > +       I_MUTEX_NORMAL,
> > +       I_MUTEX_PARENT,
> > +       I_MUTEX_CHILD
> > +};
> > +
> > +/* 
> 
> I guess we can say the same about I_MUTEX_NORMAL.

yeah. Subtypes start from 1, as 0 is the basic type.

Lock types are keyed via static kernel addresses. This means that we can 
use the lock address (for DEFINE_SPINLOCK) or the static key embedded in 
spin_lock_init() as a key in 99% of the cases. The key [struct 
lockdep_type_key, see include/linux/lockdep.h] occupies enough bytes (of 
kernel static virtual memory) so that the keys remain automatically 
unique. Right now MAX_LOCKDEP_SUBTYPES is 8, so the keys take at most 8 
bytes. (To save some memory there's another detail: for static locks 
(DEFINE_SPINLOCK ones) we use the lock address itself as the key.)
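
To make the "enough bytes" point concrete, here is a minimal userspace
sketch. The layout of struct type_key and the subtype_key() helper are
assumptions for illustration only, not the definition in
include/linux/lockdep.h: one byte is reserved per subtype, so every
(lock type, subtype) pair maps to its own static address.

```c
#include <assert.h>

#define MAX_LOCKDEP_SUBTYPES 8  /* the limit mentioned above */

/* Assumed layout: one byte per subtype, so a key spans 8 addresses. */
struct type_key {
	char subkeys[MAX_LOCKDEP_SUBTYPES];
};

/* Address used to identify subtype 'subtype' of the lock type 'key'. */
const void *subtype_key(const struct type_key *key, unsigned int subtype)
{
	assert(subtype < MAX_LOCKDEP_SUBTYPES);
	return &key->subkeys[subtype];
}
```

Subtype 0 maps to the address of the key itself, and the last subtype of
one key still lies below the first address of the next key, so no two
types can share a key.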

	Ingo


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 20:41                     ` Ingo Molnar
  2006-05-30 20:44                       ` Ingo Molnar
@ 2006-05-30 21:58                       ` Paolo Ciarrocchi
  2006-05-31  8:40                         ` Ingo Molnar
  1 sibling, 1 reply; 319+ messages in thread
From: Paolo Ciarrocchi @ 2006-05-30 21:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roland Dreier, Dave Jones, Dominik Brodowski, Arjan van de Ven,
	Andrew Morton, linux-kernel, Michal Piotrowski, nanhai.zou

On 5/30/06, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Roland Dreier <rdreier@cisco.com> wrote:
>
> >     Dave> I was hoping you could enlighten me :) I started picking
> >     Dave> through history with gitk, but my tk install uses fonts that
> >     Dave> make my eyes bleed.  My kingdom for a 'git annotate'..
> >
> > Heh -- try "git annotate" or "git blame".  I think you need git 1.3.x
> > for that... details of where to send your kingdom forthcoming...
>
> i use qgit, which is GTK based and thus uses the native desktop fonts.

GTK? A typo, I suppose.
QGit is a git GUI viewer built on Qt/C++ (that I hope will be added to
the git.git tree soon).

Ciao,

-- 
Paolo
http://paolociarrocchi.googlepages.com


* Re: [patch 02/61] lock validator: forcedeth.c fix
  2006-05-30  1:33   ` Andrew Morton
@ 2006-05-31  5:40     ` Manfred Spraul
  0 siblings, 0 replies; 319+ messages in thread
From: Manfred Spraul @ 2006-05-31  5:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan, Ayaz Abdulla

Andrew Morton wrote:

>On Mon, 29 May 2006 23:23:13 +0200
>Ingo Molnar <mingo@elte.hu> wrote:
>
>  
>
>>nv_do_nic_poll() is called from timer softirqs, which has interrupts
>>enabled, but np->lock might also be taken by some other interrupt
>>context.
>>    
>>
>
>But the driver does disable_irq(), so I'd say this was a false-positive.
>
>And afaict this is not a timer handler - it's a poll_controller handler
>(although maybe that get called from timer handler somewhere?)
>
>  
>
It's both a timer handler and a poll_controller handler:
- if the interrupt handler causes a system overload (gig e without irq 
mitigation...), then the nic disables the irq on the device and waits 
one tick and handles the interrupts from a timer. This is nv_do_nic_poll().

- nv_do_nic_poll is also called from the poll_controller handler.

I'll try to remove the disable_irq() calls from the poll_controller 
handler, but probably not before the week-end.

--
    Manfred


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 21:58                       ` Paolo Ciarrocchi
@ 2006-05-31  8:40                         ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-05-31  8:40 UTC (permalink / raw)
  To: Paolo Ciarrocchi
  Cc: Roland Dreier, Dave Jones, Dominik Brodowski, Arjan van de Ven,
	Andrew Morton, linux-kernel, Michal Piotrowski, nanhai.zou


* Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> wrote:

> GTK? A typo, I suppose.

brainfart, sorry :)

> QGit is a git GUI viewer built on Qt/C++ (that I hope will be added to 
> the git.git tree soon).

yeah.

	Ingo


* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30 19:53                   ` Ashok Raj
@ 2006-06-01  5:50                     ` Nathan Lynch
  0 siblings, 0 replies; 319+ messages in thread
From: Nathan Lynch @ 2006-06-01  5:50 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Dave Jones, Dominik Brodowski, Arjan van de Ven, Andrew Morton,
	linux-kernel, Michal Piotrowski, Ingo Molnar, nanhai.zou,
	Zwane Mwaikambo

Ashok Raj wrote:
> On Tue, May 30, 2006 at 03:39:47PM -0400, Dave Jones wrote:
> 
> > So, that last part pretty much highlights that we knew about this problem, and meant to
> > come back and fix it later. Surprise surprise, no one came back and fixed it.
> > 
> 
> There was another iteration after this, and currently we keep track of
> the owner in lock_cpu_hotplug()->__lock_cpu_hotplug(). So if we are in
> the same thread context we don't acquire the lock again.
> 
>     if (lock_cpu_hotplug_owner != current) {
>         if (interruptible)
>             ret = down_interruptible(&cpucontrol);
>         else
>             down(&cpucontrol);
>     }
> 
> 
> The lock and unlock paths keep track of the depth as well, so we know when to release.

Can we please kill this recursive locking hack in the cpu hotplug code
in 2.6.18/soon?  It's papering over the real problem, and I worry that
if it's allowed to sit there, other users will start to take
"advantage" of it.  Perhaps, at the very least, cpufreq could be made
to handle this itself instead of polluting the core code...


> We didn't hear any better suggestions (from the cpufreq folks), so we left it in
> that state (at least the same thread doesn't try to take the lock twice, which
> resulted in deadlocks earlier).

Fix (and document!) the ordering of lock acquisitions in cpufreq?


* [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix
  2006-05-30  9:14 ` Benoit Boissinot
  2006-05-30 10:26   ` Arjan van de Ven
@ 2006-06-01 14:42   ` Frederik Deweerdt
  2006-06-02  3:10     ` Zhu Yi
  1 sibling, 1 reply; 319+ messages in thread
From: Frederik Deweerdt @ 2006-06-01 14:42 UTC (permalink / raw)
  To: Benoit Boissinot
  Cc: linux-kernel, Ingo Molnar, Arjan van de Ven, Andrew Morton,
	yi.zhu, jketreno

On Tue, May 30, 2006 at 11:14:15AM +0200, Benoit Boissinot wrote:
> On 5/29/06, Ingo Molnar <mingo@elte.hu> wrote:
> >We are pleased to announce the first release of the "lock dependency
> >correctness validator" kernel debugging feature, which can be downloaded
> >from:
> >
> >  http://redhat.com/~mingo/lockdep-patches/
> >[snip]
> 
> I get this right after ipw2200 is loaded (it is quite verbose, I
> probably shouldn't post everything...)
> 
This got rid of the oops for me, is it the right fix?

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
--- /usr/src/linux/net/netlink/af_netlink.c	2006-05-24 14:58:38.000000000 +0200
+++ net/netlink/af_netlink.c	2006-06-01 16:36:51.000000000 +0200
@@ -157,7 +157,7 @@ static void netlink_sock_destruct(struct
 
 static void netlink_table_grab(void)
 {
-	write_lock_bh(&nl_table_lock);
+	write_lock_irq(&nl_table_lock);
 
 	if (atomic_read(&nl_table_users)) {
 		DECLARE_WAITQUEUE(wait, current);
@@ -167,9 +167,9 @@ static void netlink_table_grab(void)
 			set_current_state(TASK_UNINTERRUPTIBLE);
 			if (atomic_read(&nl_table_users) == 0)
 				break;
-			write_unlock_bh(&nl_table_lock);
+			write_unlock_irq(&nl_table_lock);
 			schedule();
-			write_lock_bh(&nl_table_lock);
+			write_lock_irq(&nl_table_lock);
 		}
 
 		__set_current_state(TASK_RUNNING);
@@ -179,7 +179,7 @@ static void netlink_table_grab(void)
 
 static __inline__ void netlink_table_ungrab(void)
 {
-	write_unlock_bh(&nl_table_lock);
+	write_unlock_irq(&nl_table_lock);
 	wake_up(&nl_table_wait);
 }
 



* Re: [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix
  2006-06-01 14:42   ` [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix Frederik Deweerdt
@ 2006-06-02  3:10     ` Zhu Yi
  2006-06-02  9:53       ` Frederik Deweerdt
  0 siblings, 1 reply; 319+ messages in thread
From: Zhu Yi @ 2006-06-02  3:10 UTC (permalink / raw)
  To: Frederik Deweerdt
  Cc: Benoit Boissinot, linux-kernel, Ingo Molnar, Arjan van de Ven,
	Andrew Morton, jketreno

[-- Attachment #1: Type: text/plain, Size: 227 bytes --]

On Thu, 2006-06-01 at 16:42 +0200, Frederik Deweerdt wrote:
> This got rid of the oops for me, is it the right fix?

I don't think netlink will contend with hardirqs. Can you test with this
fix for ipw2200 driver?

Thanks,
-yi

[-- Attachment #2: ipw2200-lockdep-fix.patch --]
[-- Type: text/x-patch, Size: 1141 bytes --]

diff -urp a/drivers/net/wireless/ipw2200.c b/drivers/net/wireless/ipw2200.c
--- a/drivers/net/wireless/ipw2200.c	2006-04-01 09:47:24.000000000 +0800
+++ b/drivers/net/wireless/ipw2200.c	2006-06-01 14:32:00.000000000 +0800
@@ -11058,11 +11058,9 @@ static irqreturn_t ipw_isr(int irq, void
 	if (!priv)
 		return IRQ_NONE;
 
-	spin_lock(&priv->lock);
-
 	if (!(priv->status & STATUS_INT_ENABLED)) {
 		/* Shared IRQ */
-		goto none;
+		return IRQ_NONE;
 	}
 
 	inta = ipw_read32(priv, IPW_INTA_RW);
@@ -11071,12 +11069,12 @@ static irqreturn_t ipw_isr(int irq, void
 	if (inta == 0xFFFFFFFF) {
 		/* Hardware disappeared */
 		IPW_WARNING("IRQ INTA == 0xFFFFFFFF\n");
-		goto none;
+		return IRQ_NONE;
 	}
 
 	if (!(inta & (IPW_INTA_MASK_ALL & inta_mask))) {
 		/* Shared interrupt */
-		goto none;
+		return IRQ_NONE;
 	}
 
 	/* tell the device to stop sending interrupts */
@@ -11091,12 +11089,7 @@ static irqreturn_t ipw_isr(int irq, void
 
 	tasklet_schedule(&priv->irq_tasklet);
 
-	spin_unlock(&priv->lock);
-
 	return IRQ_HANDLED;
-      none:
-	spin_unlock(&priv->lock);
-	return IRQ_NONE;
 }
 
 static void ipw_rf_kill(void *adapter)

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix
  2006-06-02  3:10     ` Zhu Yi
@ 2006-06-02  9:53       ` Frederik Deweerdt
  2006-06-05  3:40         ` Zhu Yi
  0 siblings, 1 reply; 319+ messages in thread
From: Frederik Deweerdt @ 2006-06-02  9:53 UTC (permalink / raw)
  To: Zhu Yi
  Cc: Benoit Boissinot, linux-kernel, Ingo Molnar, Arjan van de Ven,
	Andrew Morton, jketreno

On Fri, Jun 02, 2006 at 11:10:10AM +0800, Zhu Yi wrote:
> On Thu, 2006-06-01 at 16:42 +0200, Frederik Deweerdt wrote:
> > This got rid of the oops for me, is it the right fix?
> 
> I don't think netlink will contend with hardirqs. Can you test with this
> fix for ipw2200 driver?
> 
It does work, thanks. But doesn't this add a possibility of missing 
some interrupts?
	cpu0				cpu1
        ====				====
in isr				in tasklet

				ipw_enable_interrupts
				|->priv->status |= STATUS_INT_ENABLED;

ipw_disable_interrupts
|->priv->status &= ~STATUS_INT_ENABLED;
|->ipw_write32(priv, IPW_INTA_MASK_R, ~IPW_INTA_MASK_ALL);

				|->ipw_write32(priv, IPW_INTA_MASK_R, IPW_INTA_MASK_ALL);
				/* This is possible due to priv->lock no longer being taken
				   in isr */

=>interrupt from ipw2200
in new isr
if (!(priv->status & STATUS_INT_ENABLED))
	return IRQ_NONE; /* we wrongfully return here because priv->status
                            does not reflect the register's value */


Not sure this is really important at all, just curious.
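
For anyone who wants to poke at the interleaving above, here is a rough
userspace sketch, not driver code. The struct, the constants and the
step helpers are all hypothetical stand-ins; each enable/disable helper
is split into its two stores so the ordering can be replayed by hand,
and the early-return checks of the real helpers are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the values are illustrative, not the driver's. */
#define STATUS_INT_ENABLED	0x1u
#define MASK_ALL_ON		0xffffffffu	/* models IPW_INTA_MASK_ALL */
#define MASK_ALL_OFF		0x00000000u	/* models ~IPW_INTA_MASK_ALL */

struct fake_priv {
	uint32_t status;	/* software view: priv->status */
	uint32_t mask_reg;	/* hardware view: IPW_INTA_MASK_R */
};

/* Each helper is split in two so the stores can be interleaved by hand. */
static void enable_step1(struct fake_priv *p)  { p->status |= STATUS_INT_ENABLED; }
static void enable_step2(struct fake_priv *p)  { p->mask_reg = MASK_ALL_ON; }
static void disable_step1(struct fake_priv *p) { p->status &= ~STATUS_INT_ENABLED; }
static void disable_step2(struct fake_priv *p) { p->mask_reg = MASK_ALL_OFF; }

/* 1 if the flag says "disabled" while the register says "enabled". */
static int out_of_sync(const struct fake_priv *p)
{
	return !(p->status & STATUS_INT_ENABLED) && p->mask_reg == MASK_ALL_ON;
}

/* Replays the unlocked ordering from the diagram. */
int replay_unlocked(struct fake_priv *p)
{
	enable_step1(p);	/* cpu1, tasklet */
	disable_step1(p);	/* cpu0, isr */
	disable_step2(p);	/* cpu0, isr */
	enable_step2(p);	/* cpu1, tasklet */
	return out_of_sync(p);
}

/* Same four stores, but each helper runs to completion (as under a lock). */
int replay_locked(struct fake_priv *p)
{
	enable_step1(p);
	enable_step2(p);
	disable_step1(p);
	disable_step2(p);
	return out_of_sync(p);
}
```

With a zeroed fake_priv, replay_unlocked() ends with the flag and the
register disagreeing, which is the state in which the isr would return
IRQ_NONE; replay_locked() does not.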

Thanks,
Frederik

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-05-30 17:38     ` Steven Rostedt
@ 2006-06-03 18:09       ` Steven Rostedt
  2006-06-04  9:18         ` Arjan van de Ven
  0 siblings, 1 reply; 319+ messages in thread
From: Steven Rostedt @ 2006-06-03 18:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, arjan

On Tue, 2006-05-30 at 13:38 -0400, Steven Rostedt wrote:
> On Mon, 2006-05-29 at 18:33 -0700, Andrew Morton wrote:
> > On Mon, 29 May 2006 23:23:28 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > add WARN_ON_ONCE(cond) to print once-per-bootup messages.
> > > 
> > > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > > ---
> > >  include/asm-generic/bug.h |   13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > Index: linux/include/asm-generic/bug.h
> > > ===================================================================
> > > --- linux.orig/include/asm-generic/bug.h
> > > +++ linux/include/asm-generic/bug.h
> > > @@ -44,4 +44,17 @@
> > >  # define WARN_ON_SMP(x)			do { } while (0)
> > >  #endif
> > >  
> > > +#define WARN_ON_ONCE(condition)				\
> > > +({							\
> > > +	static int __warn_once = 1;			\
> > > +	int __ret = 0;					\
> > > +							\
> > > +	if (unlikely(__warn_once && (condition))) {	\
> 
> Since __warn_once is likely to be true, and the condition is likely to
> be false, wouldn't it be better to switch this around to:
> 
>   if (unlikely((condition) && __warn_once)) {
> 
> So the && will fall out before having to check a global variable.
> 
> Only after the unlikely condition would the __warn_once be false.

Hi Ingo,

Not sure if you missed this request or didn't think it mattered.  But I
just tried out the difference between the two to see what gcc would do
to a simple function compiling with -O2.

Here's my code:

----- with the current WARN_ON_ONCE ----

#define unlikely(x) __builtin_expect(!!(x), 0)

#define WARN_ON_ONCE(condition)                         \
({                                                      \
        static int __warn_once = 1;                     \
        int __ret = 0;                                  \
                                                        \
        if (__warn_once && unlikely((condition))) {     \
                __warn_once = 0;                        \
                WARN_ON(1);                             \
                __ret = 1;                              \
        }                                               \
        __ret;                                          \
})

int warn (int x)
{
        WARN_ON_ONCE(x==1);
        return x+1;
}


----- with the version I suggest. ----

#define unlikely(x) __builtin_expect(!!(x), 0)

#define WARN_ON_ONCE(condition)                         \
({                                                      \
        static int __warn_once = 1;                     \
        int __ret = 0;                                  \
                                                        \
        if (unlikely((condition)) && __warn_once) {     \
                __warn_once = 0;                        \
                WARN_ON(1);                             \
                __ret = 1;                              \
        }                                               \
        __ret;                                          \
})

int warn(int x)
{
        WARN_ON_ONCE(x==1);
        return x+1;
}

-------


Compiling these two I get this:


current warn.o:

00000000 <warn>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 04                sub    $0x4,%esp
   7:   a1 00 00 00 00          mov    0x0,%eax
   c:   8b 5d 08                mov    0x8(%ebp),%ebx

# here we test the __warn_once first and if it is not zero
# it jumps to warn+0x20 to do the condition test
   f:   85 c0                   test   %eax,%eax
  11:   75 0d                   jne    20 <warn+0x20>
  13:   5a                      pop    %edx
  14:   8d 43 01                lea    0x1(%ebx),%eax
  17:   5b                      pop    %ebx
  18:   5d                      pop    %ebp
  19:   c3                      ret
  1a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  20:   83 fb 01                cmp    $0x1,%ebx
  23:   75 ee                   jne    13 <warn+0x13>
  25:   31 c9                   xor    %ecx,%ecx
  27:   89 0d 00 00 00 00       mov    %ecx,0x0
  2d:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
  34:   e8 fc ff ff ff          call   35 <warn+0x35>
  39:   eb d8                   jmp    13 <warn+0x13>
Disassembly of section .data:


My suggested change of doing the condition first:

00000000 <warn>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 04                sub    $0x4,%esp
   7:   8b 5d 08                mov    0x8(%ebp),%ebx

# here we test the condition first, and if the
# unlikely condition is true, then we jump to test
# the __warn_once.
   a:   83 fb 01                cmp    $0x1,%ebx
   d:   74 07                   je     16 <warn+0x16>
   f:   5a                      pop    %edx
  10:   8d 43 01                lea    0x1(%ebx),%eax
  13:   5b                      pop    %ebx
  14:   5d                      pop    %ebp
  15:   c3                      ret
  16:   a1 00 00 00 00          mov    0x0,%eax
  1b:   85 c0                   test   %eax,%eax
  1d:   74 f0                   je     f <warn+0xf>
  1f:   31 c9                   xor    %ecx,%ecx
  21:   89 0d 00 00 00 00       mov    %ecx,0x0
  27:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
  2e:   e8 fc ff ff ff          call   2f <warn+0x2f>
  33:   eb da                   jmp    f <warn+0xf>
Disassembly of section .data:


As you can see, because the whole thing is unlikely, the first condition
is expected to fail.  With the current WARN_ON logic, that means that
the __warn_once is expected to fail, but that's not the case.  So on a
normal system where the WARN_ON_ONCE condition would never happen, you
are always branching.   So simply reversing the order to test the
condition before testing the __warn_once variable should improve cache
performance.

Below is my recommended patch.

-- Steve

Index: linux-2.6.17-rc5-mm2/include/asm-generic/bug.h
===================================================================
--- linux-2.6.17-rc5-mm2.orig/include/asm-generic/bug.h	2006-06-03 14:01:22.000000000 -0400
+++ linux-2.6.17-rc5-mm2/include/asm-generic/bug.h	2006-06-03 14:01:50.000000000 -0400
@@ -43,7 +43,7 @@
 	static int __warn_once = 1;			\
 	int __ret = 0;					\
 							\
-	if (unlikely(__warn_once && (condition))) {	\
+	if (unlikely((condition) && __warn_once)) {	\
 		__warn_once = 0;			\
 		WARN_ON(1);				\
 		__ret = 1;				\



^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-06-03 18:09       ` Steven Rostedt
@ 2006-06-04  9:18         ` Arjan van de Ven
  2006-06-04 13:43           ` Steven Rostedt
  0 siblings, 1 reply; 319+ messages in thread
From: Arjan van de Ven @ 2006-06-04  9:18 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, Ingo Molnar, linux-kernel

On Sat, 2006-06-03 at 14:09 -0400, Steven Rostedt wrote:

> 
> As you can see, because the whole thing is unlikely, the first condition
> is expected to fail.  With the current WARN_ON logic, that means that
> the __warn_once is expected to fail, but that's not the case.  So on a
> normal system where the WARN_ON_ONCE condition would never happen, you
> are always branching. 

which is no cost since it's consistent for the branch predictor

>   So simply reversing the order to test the
> condition before testing the __warn_once variable should improve cache
> performance.
> -	if (unlikely(__warn_once && (condition))) {	\
> +	if (unlikely((condition) && __warn_once)) {	\
>  		__warn_once = 0;			\

I disagree with this; "condition" can be a relatively complex thing,
such as a function call. doing the cheaper (and consistent!) test first
will be better. __warn_once will be branch predicted correctly ALWAYS,
except the exact ONE time you hit the backtrace. So it's really
really cheap to test, and if the WARN_ON_ONCE is triggering a lot after
the first time, you now would have a flapping first condition (which
means lots of branch mispredicts) while the original code has a perfect
one-check-predicted-exit scenario.
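
To make the trade-off concrete, here is a small userspace sketch that
counts how often the condition is evaluated under each ordering. It
assumes GCC/Clang statement expressions, as the kernel macro does;
costly_cond(), the two WARN_ONCE_* macros and the run_* helpers are
hypothetical names, and the WARN_ON(1) call is replaced by a return
value so that nothing is printed.

```c
/* Userspace model; needs GCC/Clang statement expressions. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int cond_evals;	/* how many times the condition ran */

/* Hypothetical "relatively complex" condition that counts its own calls. */
static int costly_cond(int x)
{
	cond_evals++;
	return x == 1;
}

/* Flag first, as in the original patch. */
#define WARN_ONCE_FLAG_FIRST(condition)				\
({								\
	static int __warn_once = 1;				\
	int __ret = 0;						\
								\
	if (unlikely(__warn_once && (condition))) {		\
		__warn_once = 0;				\
		__ret = 1;	/* WARN_ON(1) would go here */	\
	}							\
	__ret;							\
})

/* Condition first, as in the proposed change. */
#define WARN_ONCE_COND_FIRST(condition)				\
({								\
	static int __warn_once = 1;				\
	int __ret = 0;						\
								\
	if (unlikely((condition) && __warn_once)) {		\
		__warn_once = 0;				\
		__ret = 1;	/* WARN_ON(1) would go here */	\
	}							\
	__ret;							\
})

/* Feed n inputs through one call site; returns the number of warnings. */
int run_flag_first(const int *in, int n, int *evals)
{
	int i, warned = 0;

	cond_evals = 0;
	for (i = 0; i < n; i++)
		warned += WARN_ONCE_FLAG_FIRST(costly_cond(in[i]));
	*evals = cond_evals;
	return warned;
}

int run_cond_first(const int *in, int n, int *evals)
{
	int i, warned = 0;

	cond_evals = 0;
	for (i = 0; i < n; i++)
		warned += WARN_ONCE_COND_FIRST(costly_cond(in[i]));
	*evals = cond_evals;
	return warned;
}
```

Until the first hit both orderings evaluate the condition on every
pass. After the first hit the flag-first version stops evaluating it,
while the condition-first version keeps doing so. Each run_* helper owns
a single static flag, so it is only meaningful to call each one once per
process.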




^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 05/61] lock validator: introduce WARN_ON_ONCE(cond)
  2006-06-04  9:18         ` Arjan van de Ven
@ 2006-06-04 13:43           ` Steven Rostedt
  0 siblings, 0 replies; 319+ messages in thread
From: Steven Rostedt @ 2006-06-04 13:43 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, Ingo Molnar, linux-kernel

On Sun, 2006-06-04 at 11:18 +0200, Arjan van de Ven wrote:
> On Sat, 2006-06-03 at 14:09 -0400, Steven Rostedt wrote:
> 
> > 
> > As you can see, because the whole thing is unlikely, the first condition
> > is expected to fail.  With the current WARN_ON logic, that means that
> > the __warn_once is expected to fail, but that's not the case.  So on a
> > normal system where the WARN_ON_ONCE condition would never happen, you
> > are always branching. 
> 
> which is no cost since it's consistent for the branch predictor
> 
> >   So simply reversing the order to test the
> > condition before testing the __warn_once variable should improve cache
> > performance.
> > -	if (unlikely(__warn_once && (condition))) {	\
> > +	if (unlikely((condition) && __warn_once)) {	\
> >  		__warn_once = 0;			\
> 
> I disagree with this; "condition" can be a relatively complex thing,
> such as a function call. doing the cheaper (and consistent!) test first
> will be better. 

Wrong!  It's not better, because it is pretty much ALWAYS TRUE!  So even
if you have branch prediction you will call the condition regardless!

> __warn_once will be branch predicted correctly ALWAYS,
> except the exact ONE time you hit the backtrace. So it's really
> really cheap to test, and if the WARN_ON_ONCE is triggering a lot after
> the first time, you now would have a flapping first condition (which
> means lots of branch mispredicts) while the original code has a perfect
> one-check-predicted-exit scenario.

Who cares?  If the WARN_ON_ONCE _is_ triggered a bunch of times, that
means the kernel is broken.  The WARN_ON is about checking for validity,
and the condition should never trigger on a proper setup.  The ONCE part
is to keep the users logs from getting full and killing performance with
printk. And even so.  If you have 100 instances of WARN_ON_ONCE in the
kernel, only one at a time would probably trigger, so you save on the
other 99.  Your idea is to optimize the broken kernel while punishing
the working one.

The analysis wasn't only about the code, but also about the use of
WARN_ON_ONCE.  The condition should _not_ be too complex and slow since
the WARN_ON_ONCE is just a check, and not something that should slow the
system down too much.

One other thing that wasn't mentioned.  The __warn_once variable is
global and not set up as read_mostly (which maybe it should be).  As it
stands, it can be placed in the same cache line as some global variable
that is modified a lot, so every time you test __warn_once you need to
do a cache-coherency round trip with other CPUs, bringing performance down
further.
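
As a userspace analogue of that placement problem (this is not the
kernel's read_mostly mechanism), the sketch below compares a layout
where the flag sits next to a hot counter with one where the flag is
forced onto its own line. The 64-byte line size, the struct names and
same_line() are assumptions made for illustration.

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: 64-byte cache lines */

/* Layout A: the flag sits right next to a frequently written counter. */
struct packed_layout {
	long hot_counter;
	int warn_once;
};

/* Layout B: the flag is pushed onto its own cache line. */
struct separated_layout {
	long hot_counter;
	int warn_once __attribute__((aligned(CACHE_LINE)));
};

/*
 * Returns 1 if two byte offsets fall in the same cache line, taking
 * the offsets relative to a line-aligned start of the struct.
 */
int same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}
```

Comparing the offsetof() values of the two members shows that layout A
shares a line and layout B does not, at the price of padding the struct
out to two lines.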

-- Steve


^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch mm1-rc2] lock validator: netlink.c netlink_table_grab fix
  2006-06-02  9:53       ` Frederik Deweerdt
@ 2006-06-05  3:40         ` Zhu Yi
  0 siblings, 0 replies; 319+ messages in thread
From: Zhu Yi @ 2006-06-05  3:40 UTC (permalink / raw)
  To: Frederik Deweerdt
  Cc: Benoit Boissinot, linux-kernel, Ingo Molnar, Arjan van de Ven,
	Andrew Morton, jketreno

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

On Fri, 2006-06-02 at 09:53 +0000, Frederik Deweerdt wrote:
> On Fri, Jun 02, 2006 at 11:10:10AM +0800, Zhu Yi wrote:
> > On Thu, 2006-06-01 at 16:42 +0200, Frederik Deweerdt wrote:
> > > This got rid of the oops for me, is it the right fix?
> > 
> > I don't think netlink will contend with hardirqs. Can you test with this
> > fix for ipw2200 driver?
> > 
> It does work, thanks. But doesn't this add a possibility of missing 
> some interrupts?
> 	cpu0				cpu1
>         ====				====
> in isr				in tasklet
> 
> 				ipw_enable_interrupts
> 				|->priv->status |= STATUS_INT_ENABLED;

This is unlikely. cpu0 should not receive ipw2200 interrupt since the
interrupt is disabled until HERE (see below).

> ipw_disable_interrupts
> |->priv->status &= ~STATUS_INT_ENABLED;
> |->ipw_write32(priv, IPW_INTA_MASK_R, ~IPW_INTA_MASK_ALL);
> 
> 				|->ipw_write32(priv, IPW_INTA_MASK_R, IPW_INTA_MASK_ALL);
> 				/* This is possible due to priv->lock no longer being taken
> 				   in isr */

				HERE


Well, this is not 100% guaranteed if the card fires two consecutive
interrupts. Though unlikely, it's better to protect early than to see
some "weird" bugs one day. I propose the attached patch. If you can help
test it, that would be appreciated (somehow I cannot see the lockdep
warning on my box).

Thanks,
-yi

[-- Attachment #2: lock_irq.patch --]
[-- Type: text/x-patch, Size: 3972 bytes --]

diff -urp a/drivers/net/wireless/ipw2200.c b/drivers/net/wireless/ipw2200.c
--- a/drivers/net/wireless/ipw2200.c	2006-04-01 09:47:24.000000000 +0800
+++ b/drivers/net/wireless/ipw2200.c	2006-06-05 11:32:18.000000000 +0800
@@ -542,7 +542,7 @@ static inline void ipw_clear_bit(struct 
 	ipw_write32(priv, reg, ipw_read32(priv, reg) & ~mask);
 }
 
-static inline void ipw_enable_interrupts(struct ipw_priv *priv)
+static inline void __ipw_enable_interrupts(struct ipw_priv *priv)
 {
 	if (priv->status & STATUS_INT_ENABLED)
 		return;
@@ -550,7 +550,7 @@ static inline void ipw_enable_interrupts
 	ipw_write32(priv, IPW_INTA_MASK_R, IPW_INTA_MASK_ALL);
 }
 
-static inline void ipw_disable_interrupts(struct ipw_priv *priv)
+static inline void __ipw_disable_interrupts(struct ipw_priv *priv)
 {
 	if (!(priv->status & STATUS_INT_ENABLED))
 		return;
@@ -558,6 +558,20 @@ static inline void ipw_disable_interrupt
 	ipw_write32(priv, IPW_INTA_MASK_R, ~IPW_INTA_MASK_ALL);
 }
 
+static inline void ipw_enable_interrupts(struct ipw_priv *priv)
+{
+	spin_lock_irqsave(&priv->irq_lock, priv->lock_flags);
+	__ipw_enable_interrupts(priv);
+	spin_unlock_irqrestore(&priv->irq_lock, priv->lock_flags);
+}
+
+static inline void ipw_disable_interrupts(struct ipw_priv *priv)
+{
+	spin_lock_irqsave(&priv->irq_lock, priv->lock_flags);
+	__ipw_disable_interrupts(priv);
+	spin_unlock_irqrestore(&priv->irq_lock, priv->lock_flags);
+}
+
 #ifdef CONFIG_IPW2200_DEBUG
 static char *ipw_error_desc(u32 val)
 {
@@ -1959,7 +1973,7 @@ static void ipw_irq_tasklet(struct ipw_p
 	unsigned long flags;
 	int rc = 0;
 
-	spin_lock_irqsave(&priv->lock, flags);
+	spin_lock_irqsave(&priv->irq_lock, flags);
 
 	inta = ipw_read32(priv, IPW_INTA_RW);
 	inta_mask = ipw_read32(priv, IPW_INTA_MASK_R);
@@ -1968,6 +1982,10 @@ static void ipw_irq_tasklet(struct ipw_p
 	/* Add any cached INTA values that need to be handled */
 	inta |= priv->isr_inta;
 
+	spin_unlock_irqrestore(&priv->irq_lock, flags);
+
+	spin_lock_irqsave(&priv->lock, flags);
+
 	/* handle all the justifications for the interrupt */
 	if (inta & IPW_INTA_BIT_RX_TRANSFER) {
 		ipw_rx(priv);
@@ -2096,10 +2114,10 @@ static void ipw_irq_tasklet(struct ipw_p
 		IPW_ERROR("Unhandled INTA bits 0x%08x\n", inta & ~handled);
 	}
 
+	spin_unlock_irqrestore(&priv->lock, flags);
+
 	/* enable all interrupts */
 	ipw_enable_interrupts(priv);
-
-	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
 #define IPW_CMD(x) case IPW_CMD_ ## x : return #x
@@ -11058,7 +11076,7 @@ static irqreturn_t ipw_isr(int irq, void
 	if (!priv)
 		return IRQ_NONE;
 
-	spin_lock(&priv->lock);
+	spin_lock(&priv->irq_lock);
 
 	if (!(priv->status & STATUS_INT_ENABLED)) {
 		/* Shared IRQ */
@@ -11080,7 +11098,7 @@ static irqreturn_t ipw_isr(int irq, void
 	}
 
 	/* tell the device to stop sending interrupts */
-	ipw_disable_interrupts(priv);
+	__ipw_disable_interrupts(priv);
 
 	/* ack current interrupts */
 	inta &= (IPW_INTA_MASK_ALL & inta_mask);
@@ -11091,11 +11109,11 @@ static irqreturn_t ipw_isr(int irq, void
 
 	tasklet_schedule(&priv->irq_tasklet);
 
-	spin_unlock(&priv->lock);
+	spin_unlock(&priv->irq_lock);
 
 	return IRQ_HANDLED;
       none:
-	spin_unlock(&priv->lock);
+	spin_unlock(&priv->irq_lock);
 	return IRQ_NONE;
 }
 
@@ -12185,6 +12203,7 @@ static int ipw_pci_probe(struct pci_dev 
 #ifdef CONFIG_IPW2200_DEBUG
 	ipw_debug_level = debug;
 #endif
+	spin_lock_init(&priv->irq_lock);
 	spin_lock_init(&priv->lock);
 	for (i = 0; i < IPW_IBSS_MAC_HASH_SIZE; i++)
 		INIT_LIST_HEAD(&priv->ibss_mac_hash[i]);
diff -urp a/drivers/net/wireless/ipw2200.h b/drivers/net/wireless/ipw2200.h
--- a/drivers/net/wireless/ipw2200.h	2006-04-01 09:47:24.000000000 +0800
+++ b/drivers/net/wireless/ipw2200.h	2006-06-05 11:32:18.000000000 +0800
@@ -1181,6 +1181,8 @@ struct ipw_priv {
 	struct ieee80211_device *ieee;
 
 	spinlock_t lock;
+	spinlock_t irq_lock;
+	unsigned long lock_flags;
 	struct mutex mutex;
 
 	/* basic pci-network driver stuff */

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 06/61] lock validator: add __module_address() method
  2006-05-30  1:33   ` Andrew Morton
  2006-05-30 17:45     ` Steven Rostedt
@ 2006-06-23  8:38     ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  8:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:23:33 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > +/*
> > + * Is this a valid module address? We don't grab the lock.
> > + */
> > +int __module_address(unsigned long addr)
> > +{
> > +	struct module *mod;
> > +
> > +	list_for_each_entry(mod, &modules, list)
> > +		if (within(addr, mod->module_core, mod->core_size))
> > +			return 1;
> > +	return 0;
> > +}
> 
> Returns a boolean.
> 
> >  /* Is this a valid kernel address?  We don't grab the lock: we are oopsing. */
> >  struct module *__module_text_address(unsigned long addr)
> 
> But this returns a module*.
> 
> I'd suggest that __module_address() should do the same thing, from an 
> API neatness POV.  Although perhaps that's not very useful if we 
> didn't take a ref on the returned object (but module_text_address() 
> doesn't either).
> 
> Also, the name's a bit misleading - it sounds like it returns the 
> address of a module or something.  __module_any_address() would be 
> better, perhaps?

yeah. I changed this to __is_module_address().

> Also, how come this doesn't need modlist_lock()?

indeed. I originally avoided taking that lock due to recursion worries - 
but in fact we use this only in sections that initialize a lock - hence 
no recursion problems.

i fixed this and renamed the function to is_module_address() :)
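
A minimal userspace sketch of the two API shapes discussed above: a
lookup that returns the owning module, and a boolean wrapper on top of
it. struct fake_module, the array-based table and both function
signatures are hypothetical stand-ins rather than the kernel's own; in
the kernel the walk would be over the modules list, under whatever lock
protects it.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct module. */
struct fake_module {
	const char *name;
	unsigned long core_start;	/* models mod->module_core */
	unsigned long core_size;	/* models mod->core_size */
};

/* Half-open range check: start <= addr < start + size. */
static int within(unsigned long addr, unsigned long start, unsigned long size)
{
	return addr >= start && addr - start < size;
}

/* Returns the module whose core range contains addr, or NULL. */
const struct fake_module *find_module_by_address(const struct fake_module *mods,
						 size_t n, unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (within(addr, mods[i].core_start, mods[i].core_size))
			return &mods[i];
	return NULL;
}

/* Boolean form, built on the pointer-returning lookup. */
int is_module_address(const struct fake_module *mods, size_t n,
		      unsigned long addr)
{
	return find_module_by_address(mods, n, addr) != NULL;
}
```

The pointer-returning form is a superset of the boolean one, which is
the API-neatness argument; whether the returned pointer is safe to use
still depends on the caller's locking.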

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 16/61] lock validator: fown locking workaround
  2006-05-30  1:34   ` Andrew Morton
@ 2006-06-23  9:10     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:24:23 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > temporary workaround for the lock validator: make all uses of 
> > f_owner.lock irq-safe. (The real solution will be to express to the 
> > lock validator that f_owner.lock rules are to be generated 
> > per-filesystem.)
> 
> This description forgot to tell us what problem is being worked 
> around.

f_owner locking rules are per-filesystem: some of them have this lock 
irq-safe [because they use it in irq-context generated SIGIOs], some of 
them have it irq-unsafe [because they don't generate SIGIOs in irq 
context]. The lock validator meshes them together and produces a false 
positive. The workaround changed all uses of f_owner.lock to be 
irq-safe.

> This patch is a bit of a show-stopper.  How hard-n-bad is the real 
> fix?

the real fix would be to correctly map the 'key' of the f_owner.lock to 
the filesystem. I.e. to embed a "lockdep_type_key s_fown_key" in 
'struct file_system_type', and to use that key when initializing 
f_owner.lock.

the practical problem is that the initialization site of f_owner.lock 
does not know about which filesystem this file will belong to.
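
A rough userspace model of the per-filesystem key idea, assuming a
lock's validator "type" is identified by the address of its key. Every
name here (lock_key, fake_rwlock, fake_fs_type, fake_file and the two
functions) is hypothetical; only s_fown_key comes from the description
above. The init helper takes the filesystem as an argument, which is
exactly the piece of information the real initialization site lacks.

```c
/* Hypothetical model: a lock's "type" is the address of its key. */
struct lock_key {
	char dummy;	/* only the address matters */
};

struct fake_rwlock {
	const struct lock_key *key;
};

struct fake_fs_type {
	const char *name;
	struct lock_key s_fown_key;	/* one key per filesystem type */
};

struct fake_file {
	const struct fake_fs_type *fs;
	struct fake_rwlock f_owner_lock;
};

/* Has to run at a point where the owning filesystem is already known. */
void file_init_owner_lock(struct fake_file *f, const struct fake_fs_type *fs)
{
	f->fs = fs;
	f->f_owner_lock.key = &fs->s_fown_key;
}

/* 1 if the validator would treat both locks as the same type. */
int same_lock_type(const struct fake_rwlock *a, const struct fake_rwlock *b)
{
	return a->key == b->key;
}
```

Files on the same filesystem type then share one set of rules, while
files on different types are validated separately, so an irq-safe user
and an irq-unsafe user would no longer be meshed together.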

there might be another way though: the only non-core user of f_owner.lock 
is CIFS, and that use of f_owner.lock seems unnecessary - it does not 
change any fowner state, and its justification for taking that lock 
seems rather vague as well:

 *  GlobalSMBSesLock protects:
 *      list operations on tcp and SMB session lists and tCon lists
 *  f_owner.lock protects certain per file struct operations

maybe CIFS or VFS people could comment?

that way you could remove the following patch from -mm:

   lock-validator-fown-locking-workaround.patch

and add the patch below. (the fcntl.c portion of the above patch is 
meanwhile moot)

	Ingo

--------------------------------------
Subject: CIFS: remove f_owner.lock use
From: Ingo Molnar <mingo@elte.hu>

CIFS takes/releases f_owner.lock - why? It does not change anything
in the fowner state. Remove this locking.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/cifs/file.c |    9 ---------
 1 file changed, 9 deletions(-)

Index: linux/fs/cifs/file.c
===================================================================
--- linux.orig/fs/cifs/file.c
+++ linux/fs/cifs/file.c
@@ -110,7 +110,6 @@ static inline int cifs_open_inode_helper
 			 &pCifsInode->openFileList);
 	}
 	write_unlock(&GlobalSMBSeslock);
-	write_unlock(&file->f_owner.lock);
 	if (pCifsInode->clientCanCacheRead) {
 		/* we have the inode open somewhere else
 		   no need to discard cache data */
@@ -287,7 +286,6 @@ int cifs_open(struct inode *inode, struc
 		goto out;
 	}
 	pCifsFile = cifs_init_private(file->private_data, inode, file, netfid);
-	write_lock(&file->f_owner.lock);
 	write_lock(&GlobalSMBSeslock);
 	list_add(&pCifsFile->tlist, &pTcon->openFileList);
 
@@ -298,7 +296,6 @@ int cifs_open(struct inode *inode, struc
 					    &oplock, buf, full_path, xid);
 	} else {
 		write_unlock(&GlobalSMBSeslock);
-		write_unlock(&file->f_owner.lock);
 	}
 
 	if (oplock & CIFS_CREATE_ACTION) {           
@@ -477,7 +474,6 @@ int cifs_close(struct inode *inode, stru
 	pTcon = cifs_sb->tcon;
 	if (pSMBFile) {
 		pSMBFile->closePend = TRUE;
-		write_lock(&file->f_owner.lock);
 		if (pTcon) {
 			/* no sense reconnecting to close a file that is
 			   already closed */
@@ -492,23 +488,18 @@ int cifs_close(struct inode *inode, stru
 					the struct would be in each open file,
 					but this should give enough time to 
 					clear the socket */
-					write_unlock(&file->f_owner.lock);
 					cERROR(1,("close with pending writes"));
 					msleep(timeout);
-					write_lock(&file->f_owner.lock);
 					timeout *= 4;
 				} 
-				write_unlock(&file->f_owner.lock);
 				rc = CIFSSMBClose(xid, pTcon,
 						  pSMBFile->netfid);
-				write_lock(&file->f_owner.lock);
 			}
 		}
 		write_lock(&GlobalSMBSeslock);
 		list_del(&pSMBFile->flist);
 		list_del(&pSMBFile->tlist);
 		write_unlock(&GlobalSMBSeslock);
-		write_unlock(&file->f_owner.lock);
 		kfree(pSMBFile->search_resume_name);
 		kfree(file->private_data);
 		file->private_data = NULL;

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 17/61] lock validator: sk_callback_lock workaround
  2006-05-30  1:34   ` Andrew Morton
@ 2006-06-23  9:19     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:24:27 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > temporary workaround for the lock validator: make all uses of 
> > sk_callback_lock softirq-safe. (The real solution will be to express 
> > to the lock validator that sk_callback_lock rules are to be 
> > generated per-address-family.)
> 
> Ditto.  What's the actual problem being worked around here, and how's 
> the real fix shaping up?

this patch should be moot meanwhile. Earlier versions of the lock 
validator produced false positives for certain read-locking constructs.

i have undone the patch:

  lock-validator-sk_callback_lock-workaround.patch

and there don't seem to be any false positives popping up. Please don't 
remove it from -mm yet, i'll test this some more and will do the removal 
in the lock validator queue refactoring, ok?

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API.
  2006-05-30  1:34   ` Andrew Morton
@ 2006-06-23  9:28     ` Ingo Molnar
  2006-06-23  9:52       ` Andrew Morton
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:24:52 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > introduce local_irq_enable_in_hardirq() API. It is currently
> > aliased to local_irq_enable(), hence has no functional effects.
> > 
> > This API will be used by lockdep, but even without lockdep
> > this will better document places in the kernel where a hardirq
> > context enables hardirqs.
> 
> If we expect people to use this then we'd best whack a comment over 
> it.

ok, i've improved the comment in trace_irqflags.h.

> Also, trace_irqflags.h doesn't seem an appropriate place for it to 
> live.

seems like the most practical place for it. Previously we had no central 
include file for irq-flags APIs (they used to be included from 
asm/system.h and other random per-arch places) - trace_irqflags.h has 
become the central file now. Should i rename it to irqflags.h perhaps, 
to not tie it to tracing? We have some deprecated irq-flags ops in 
interrupt.h, maybe this all belongs there. (although i think it's 
cleaner to have linux/include/irqflags.h and include it from 
interrupt.h)

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 22/61] lock validator:  add per_cpu_offset()
  2006-05-30  1:34   ` Andrew Morton
@ 2006-06-23  9:30     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, arjan, Luck, Tony, Benjamin Herrenschmidt,
	Paul Mackerras, Martin Schwidefsky, David S. Miller


* Andrew Morton <akpm@osdl.org> wrote:

> > +#define per_cpu_offset(x) (__per_cpu_offset(x))
> > +
> >  /* Separate out the type, so (int[3], foo) works. */
> >  #define DEFINE_PER_CPU(type, name) \
> >      __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
> 
> I can tell just looking at it that it'll break various builds.  I assume 
> that things still happen to compile because you're presently using it 
> in code which those architectures don't presently compile.
> 
> But introducing a "generic" function invites others to start using it.  
> And they will, and they'll ship code which "works" but is broken, 
> because they only tested it on x86 and x86_64.
> 
> I'll queue the needed fixups - please check it.

[belated reply] They look good.

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/61] ANNOUNCE: lock validator -V1
  2006-05-30  1:35 ` Andrew Morton
@ 2006-06-23  9:41   ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > We are pleased to announce the first release of the "lock dependency 
> > correctness validator" kernel debugging feature
> 
> What are the runtime speed and space costs of enabling this?

The RAM space costs are estimated in the bootup info printout:

 ... MAX_LOCKDEP_SUBTYPES:    8
 ... MAX_LOCK_DEPTH:          30
 ... MAX_LOCKDEP_KEYS:        2048
 ... TYPEHASH_SIZE:           1024
 ... MAX_LOCKDEP_ENTRIES:     8192
 ... MAX_LOCKDEP_CHAINS:      8192
 ... CHAINHASH_SIZE:          4096
  memory used by lock dependency info: 696 kB
  per task-struct memory footprint: 1200 bytes

Plus every lock now embeds the lock_map structure, which is 10 pointers. 
That is the biggest direct dynamic RAM cost.
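
As back-of-the-envelope arithmetic for that per-lock cost, the sketch
below only multiplies the "10 pointers" figure by a pointer size and a
lock count. The function names are made up, padding is ignored, and any
lock count passed in is the caller's guess rather than a measurement.

```c
#include <stddef.h>

#define LOCK_MAP_POINTERS 10	/* figure quoted above */

/* Extra bytes embedded in one lock for a given pointer size. */
size_t lock_map_bytes(size_t pointer_size)
{
	return LOCK_MAP_POINTERS * pointer_size;
}

/* Total embedded overhead for nr_locks live locks. */
size_t total_lock_map_bytes(size_t nr_locks, size_t pointer_size)
{
	return nr_locks * lock_map_bytes(pointer_size);
}
```

That works out to 40 bytes per lock with 4-byte pointers and 80 bytes
per lock with 8-byte pointers.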

There are also a few embedded keys in .data but they are small.

The .text overhead mostly comes from the subsystem itself - which is 
around 20K of .text. The callbacks are not inlined most of the time - 
there are about 200 of them right now, which should be another +1-2K of 
.text cost.

The runtime cycle cost is significant if CONFIG_DEBUG_LOCKDEP [lock 
validator self-consistency checks] is enabled - then we take a global 
lock from every lock operation which kills scalability.

If DEBUG_LOCKDEP is disabled then it's OK - smaller than DEBUG_SLAB. In 
this case we have the lock-stack maintenance overhead, the irq-trace 
callbacks and a lockless hash-lookup per lock operation. All of that 
overhead is O(1) and lockless so it shouldn't change fundamental 
characteristics anywhere.

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 36/61] lock validator: special locking: serial
  2006-05-30  1:35   ` Andrew Morton
@ 2006-06-23  9:49     ` Ingo Molnar
  2006-06-23 10:04       ` Andrew Morton
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan, Russell King


* Andrew Morton <akpm@osdl.org> wrote:

> > +/*
> > + * lockdep: port->lock is initialized in two places, but we
> > + *          want only one lock-type:
> > + */
> > +static struct lockdep_type_key port_lock_key;
> > +
> >  /**
> >   *	uart_set_options - setup the serial console parameters
> >   *	@port: pointer to the serial ports uart_port structure
> > @@ -1869,7 +1875,7 @@ uart_set_options(struct uart_port *port,
> >  	 * Ensure that the serial console lock is initialised
> >  	 * early.
> >  	 */
> > -	spin_lock_init(&port->lock);
> > +	spin_lock_init_key(&port->lock, &port_lock_key);
> >  
> >  	memset(&termios, 0, sizeof(struct termios));
> >  
> > @@ -2255,7 +2261,7 @@ int uart_add_one_port(struct uart_driver
> >  	 * initialised.
> >  	 */
> >  	if (!(uart_console(port) && (port->cons->flags & CON_ENABLED)))
> > -		spin_lock_init(&port->lock);
> > +		spin_lock_init_key(&port->lock, &port_lock_key);
> >  
> >  	uart_configure_port(drv, state, port);
> >  
> 
> Is there a cleaner way of doing this?
> 
> Perhaps write a new helper function which initialises the spinlock, 
> call that?  Rather than open-coding lockdep stuff?

yes, we can do that too - but that would affect non-lockdep kernels 
too.

Also, the initialization of the 'port' seems a bit twisted here, already 
initialized and not-yet-initialized ports can be passed in to 
uart_add_one_port(). So i did not want to touch the structure of the 
code - hence the open-coded solution.

	Ingo


* Re: [patch 37/61] lock validator: special locking: dcache
  2006-05-30 20:51     ` Steven Rostedt
  2006-05-30 21:01       ` Ingo Molnar
@ 2006-06-23  9:51       ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, linux-kernel, arjan


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2006-05-29 at 18:35 -0700, Andrew Morton wrote:

> > DENTRY_D_LOCK_NORMAL isn't used anywhere.
> 
> I guess it is implied with the normal spin_lock.  Since 
>   spin_lock(&target->d_lock) and
>   spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NORMAL)
> are equivalent. (DENTRY_D_LOCK_NORMAL == 0)
> 
> Probably this deserves a comment.

i have added a comment to dcache.h explaining this better.

	Ingo


* Re: [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API.
  2006-06-23  9:28     ` Ingo Molnar
@ 2006-06-23  9:52       ` Andrew Morton
  2006-06-23 10:20         ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-06-23  9:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Fri, 23 Jun 2006 11:28:52 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > On Mon, 29 May 2006 23:24:52 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > introduce local_irq_enable_in_hardirq() API. It is currently
> > > aliased to local_irq_enable(), hence has no functional effects.
> > > 
> > > This API will be used by lockdep, but even without lockdep
> > > this will better document places in the kernel where a hardirq
> > > context enables hardirqs.
> > 
> > If we expect people to use this then we'd best whack a comment over 
> > it.
> 
> ok, i've improved the comment in trace_irqflags.h.
> 
> > Also, trace_irqflags.h doesn't seem an appropriate place for it to 
> > live.
> 
> seems like the most practical place for it. Previously we had no central 
> include file for irq-flags APIs (they used to be included from 
> asm/system.h and other random per-arch places) - trace_irqflags.h has 
> become the central file now. Should i rename it to irqflags.h perhaps, 
> to not tie it to tracing? We have some deprecated irq-flags ops in 
> interrupt.h, maybe this all belongs there. (although i think it's 
> cleaner to have linux/include/irqflags.h and include it from 
> interrupt.h)
> 

Yes, irqflags.h is nice.


* Re: [patch 46/61] lock validator: special locking: slab
  2006-05-30  1:35   ` Andrew Morton
@ 2006-06-23  9:54     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23  9:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:26:49 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > +		/*
> > +		 * Do not assume that spinlocks can be initialized via memcpy:
> > +		 */
> 
> I'd view that as something which should be fixed in mainline.

yeah. I got bitten by this (read: pulled hair for hours) when converting 
the slab spinlocks to rtmutexes in the -rt tree.

	Ingo


* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-05-30  1:35   ` Andrew Morton
@ 2006-06-23 10:04     ` Ingo Molnar
  2006-06-23 10:38       ` Andrew Morton
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> >  	for (i = 0; i < MAX_HRTIMER_BASES; i++, base++)
> > -		spin_lock_init(&base->lock);
> > +		spin_lock_init_static(&base->lock);
> >  }
> >  
> 
> Perhaps the validator core's implementation of spin_lock_init() could 
> look at the address and work out if it's within the static storage 
> sections.

yeah, but there are two cases: places where we want to 'unify' array 
locks into a single type, and places where we want to treat them 
separately. The case where we 'unify' is the more common one: locks 
embedded into hash-tables for example. So i went for annotating the ones 
that are rarer. There are 2 right now: scheduler, hrtimers, with the 
hrtimers one going away in the high-res-timers implementation. (we 
unified the hrtimers locks into a per-CPU lock) (there's also a kgdb 
annotation for -mm)

perhaps the naming should be clearer? I had it named 
spin_lock_init_standalone() originally, then cleaned it up to be 
spin_lock_init_static(). Maybe the original name is better?

	Ingo


* Re: [patch 36/61] lock validator: special locking: serial
  2006-06-23  9:49     ` Ingo Molnar
@ 2006-06-23 10:04       ` Andrew Morton
  2006-06-23 10:18         ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-06-23 10:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, rmk

On Fri, 23 Jun 2006 11:49:41 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > > +/*
> > > + * lockdep: port->lock is initialized in two places, but we
> > > + *          want only one lock-type:
> > > + */
> > > +static struct lockdep_type_key port_lock_key;
> > > +
> > >  /**
> > >   *	uart_set_options - setup the serial console parameters
> > >   *	@port: pointer to the serial ports uart_port structure
> > > @@ -1869,7 +1875,7 @@ uart_set_options(struct uart_port *port,
> > >  	 * Ensure that the serial console lock is initialised
> > >  	 * early.
> > >  	 */
> > > -	spin_lock_init(&port->lock);
> > > +	spin_lock_init_key(&port->lock, &port_lock_key);
> > >  
> > >  	memset(&termios, 0, sizeof(struct termios));
> > >  
> > > @@ -2255,7 +2261,7 @@ int uart_add_one_port(struct uart_driver
> > >  	 * initialised.
> > >  	 */
> > >  	if (!(uart_console(port) && (port->cons->flags & CON_ENABLED)))
> > > -		spin_lock_init(&port->lock);
> > > +		spin_lock_init_key(&port->lock, &port_lock_key);
> > >  
> > >  	uart_configure_port(drv, state, port);
> > >  
> > 
> > Is there a cleaner way of doing this?
> > 
> > Perhaps write a new helper function which initialises the spinlock, 
> > call that?  Rather than open-coding lockdep stuff?
> 
> yes, we can do that too - but that would affect non-lockdep kernels 
> too.
> 
> Also, the initialization of the 'port' seems a bit twisted here, already 
> initialized and not-yet-initialized ports can be passed in to 
> uart_add_one_port(). So i did not want to touch the structure of the 
> code - hence the open-coded solution.
> 

btw, I was looking at this change:

diff -puN drivers/scsi/libata-core.c~lock-validator-locking-init-debugging-improvement drivers/scsi/libata-core.c
--- a/drivers/scsi/libata-core.c~lock-validator-locking-init-debugging-improvement
+++ a/drivers/scsi/libata-core.c
@@ -1003,6 +1003,7 @@ unsigned ata_exec_internal(struct ata_de
 	unsigned int err_mask;
 	int rc;
 
+	init_completion(&wait);
 	spin_lock_irqsave(ap->lock, flags);
 
 	/* no internal command while frozen */

That local was already initialised with DECLARE_COMPLETION().  Am surprised
that an init_completion() also was needed?



* Re: [patch 51/61] lock validator: special locking: sock_lock_init()
  2006-05-30  1:36   ` Andrew Morton
@ 2006-06-23 10:06     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan, David S. Miller


* Andrew Morton <akpm@osdl.org> wrote:

> > +/*
> > + * Each address family might have different locking rules, so we have
> > + * one slock key per address family:
> > + */
> > +static struct lockdep_type_key af_family_keys[AF_MAX];
> > +
> > +static void noinline sock_lock_init(struct sock *sk)
> > +{
> > +	spin_lock_init_key(&sk->sk_lock.slock, af_family_keys + sk->sk_family);
> > +	sk->sk_lock.owner = NULL;
> > +	init_waitqueue_head(&sk->sk_lock.wq);
> > +}
> 
> OK, no code outside net/core/sock.c uses sock_lock_init().

yeah.

> Hopefully the same is true of out-of-tree code...

it won't go unnoticed even if it does: we'll get a nonfatal lockdep 
message and fix it up. I don't expect out-of-tree code to mess with 
sk_lock.slock though ...

	Ingo


* Re: [patch 52/61] lock validator: special locking: af_unix
  2006-05-30  1:36   ` Andrew Morton
@ 2006-06-23 10:07     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan, David S. Miller


* Andrew Morton <akpm@osdl.org> wrote:

> > -			spin_lock(&sk->sk_receive_queue.lock);
> > +			spin_lock_bh(&sk->sk_receive_queue.lock);
> 
> Again, a bit of a show-stopper.  Will the real fix be far off?

ok, this should be solved in recent -mm, via:

 lock-validator-special-locking-af_unix-undo-af_unix-_bh-locking-changes-and-split-lock-type.patch

	Ingo


* Re: [patch 36/61] lock validator: special locking: serial
  2006-06-23 10:04       ` Andrew Morton
@ 2006-06-23 10:18         ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan, rmk


* Andrew Morton <akpm@osdl.org> wrote:

> btw, I was looking at this change:

> @@ -1003,6 +1003,7 @@ unsigned ata_exec_internal(struct ata_de
>  	unsigned int err_mask;
>  	int rc;
>  
> +	init_completion(&wait);
>  	spin_lock_irqsave(ap->lock, flags);
>  
>  	/* no internal command while frozen */
> 
> That local was already initialised with DECLARE_COMPLETION().  Am 
> surprised that an init_completion() also was needed?

That's a fundamental problem of DECLARE_COMPLETION() done on the kernel 
stack - it does build-time initialization with no opportunity to inject 
any runtime logic. (which lockdep would need. Maybe i missed some clever 
way to add a runtime callback into the initialization? [*])

Btw., there is no danger from missing the initialization of a wait 
structure: lockdep will detect "uninitialized" on-stack locks and will 
complain about it and turn itself off. [this happened a few times during 
development - that's how those init_completion() calls got added]

But at a minimum these initializations need to become lockdep-specific 
key-reinits - otherwise there will be impact to non-lockdep kernels too.

	Ingo

[*] the only solution i can see is to introduce 
DECLARE_COMPLETION_ONSTACK(), which could call a function with &wait 
passed in, where that function would return with a structure. The macro 
magic would resolve to something like:

  struct completion wait = lockdep_init_completion(&wait);

and thus the structure would be initialized. But this method cannot be 
used for static scope uses of DECLARE_COMPLETION, because it's not a 
constant initializer. So we'd definitely have to make a distinction in 
terms of _ONSTACK(). Is there really no compiler feature that could help 
us out here?
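
A userspace sketch of the self-referential initializer described in the footnote above, assuming nothing beyond standard C. The struct layout, the hook name and the macro name are hypothetical stand-ins for illustration; they are not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct completion (hypothetical layout). */
struct completion_sketch {
	unsigned int done;
	const void *lockdep_addr;	/* address seen by the runtime hook */
};

/*
 * Runtime hook: receives the address of the object being declared and
 * returns its initial value by copy. This is the point where a
 * validator could register the on-stack object.
 */
static struct completion_sketch
init_completion_sketch(struct completion_sketch *self)
{
	struct completion_sketch c = { .done = 0, .lockdep_addr = self };

	return c;
}

/* Only usable at block scope: the initializer is a function call. */
#define DECLARE_COMPLETION_ONSTACK_SKETCH(name) \
	struct completion_sketch name = init_completion_sketch(&name)

/* Returns 1 if the hook saw the address of the on-stack object. */
static int onstack_init_works(void)
{
	DECLARE_COMPLETION_ONSTACK_SKETCH(wait);

	return wait.done == 0 && wait.lockdep_addr == &wait;
}
```

This works because a C identifier is already in scope inside its own initializer, so &wait can be passed to the hook. Using the same macro at file scope would fail to compile, since a function call is not a constant initializer, which matches the static-scope limitation noted above.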


* Re: [patch 21/61] lock validator: lockdep: add local_irq_enable_in_hardirq() API.
  2006-06-23  9:52       ` Andrew Morton
@ 2006-06-23 10:20         ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > > Also, trace_irqflags.h doesn't seem an appropriate place for it to 
> > > live.
> > 
> > seems like the most practical place for it. Previously we had no 
> > central include file for irq-flags APIs (they used to be included 
> > from asm/system.h and other random per-arch places) - 
> > trace_irqflags.h has become the central file now. Should i rename it 
> > to irqflags.h perhaps, to not tie it to tracing? We have some 
> > deprecated irq-flags ops in interrupt.h, maybe this all belongs 
> > there. (although i think it's cleaner to have 
> > linux/include/irqflags.h and include it from interrupt.h)
> > 
> 
> Yes, irqflags.h is nice.

ok, done.

	Ingo


* Re: [patch 07/61] lock validator: better lock debugging
  2006-05-30  1:33   ` Andrew Morton
@ 2006-06-23 10:25     ` Ingo Molnar
  2006-06-23 11:06       ` Andrew Morton
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > +#define DEBUG_WARN_ON(c)						\
> > +({									\
> > +	int __ret = 0;							\
> > +									\
> > +	if (unlikely(c)) {						\
> > +		if (debug_locks_off())					\
> > +			WARN_ON(1);					\
> > +		__ret = 1;						\
> > +	}								\
> > +	__ret;								\
> > +})
> 
> Either the name of this thing is too generic, or we _make_ it generic, 
> in which case it's in the wrong header file.

this op is intended to be used only by the lock debugging 
infrastructure. So it should be renamed - but i fail to find a good name 
for it. (it's used quite frequently within the lock debugging code, at 
60+ places) Maybe INTERNAL_WARN_ON()? [that makes it sound special 
enough.] DEBUG_LOCKS_WARN_ON() might work too.
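
A userspace sketch of how such a lock-debugging-only warning helper behaves. It assumes that debug_locks_off() returns nonzero only for the first caller that turns debugging off, and it replaces WARN_ON(1) with an fprintf() plus a counter; all names carrying a _sketch or _SKETCH suffix are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

static int debug_locks = 1;	/* 1 while lock debugging is active */
static int warnings_printed;	/* counts emitted warnings */

/*
 * Turn lock debugging off. Assumed semantics: returns nonzero only if
 * debugging was still on, i.e. only for the first failure.
 */
static int debug_locks_off_sketch(void)
{
	int was_on = debug_locks;

	debug_locks = 0;
	return was_on;
}

/*
 * Same shape as the quoted macro, using a GNU C statement expression:
 * evaluates to 1 whenever the condition holds, but warns only once.
 */
#define DEBUG_LOCKS_WARN_ON_SKETCH(c)					\
({									\
	int ret_ = 0;							\
									\
	if (c) {							\
		if (debug_locks_off_sketch()) {				\
			fprintf(stderr, "lock debugging: %s\n", #c);	\
			warnings_printed++;				\
		}							\
		ret_ = 1;						\
	}								\
	ret_;								\
})
```

The point of the pattern is that callers can still branch on the result every time, while the first failure both reports the problem and switches the debugging infrastructure off so that follow-on failures stay quiet.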

> > +#ifdef CONFIG_SMP
> > +# define SMP_DEBUG_WARN_ON(c)			DEBUG_WARN_ON(c)
> > +#else
> > +# define SMP_DEBUG_WARN_ON(c)			do { } while (0)
> > +#endif
> 
> Probably ditto.

agreed.

	Ingo


* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-06-23 10:04     ` Ingo Molnar
@ 2006-06-23 10:38       ` Andrew Morton
  2006-06-23 10:52         ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-06-23 10:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Fri, 23 Jun 2006 12:04:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > >  	for (i = 0; i < MAX_HRTIMER_BASES; i++, base++)
> > > -		spin_lock_init(&base->lock);
> > > +		spin_lock_init_static(&base->lock);
> > >  }
> > >  
> > 
> > Perhaps the validator core's implementation of spin_lock_init() could 
> > look at the address and work out if it's within the static storage 
> > sections.
> 
> yeah, but there are two cases: places where we want to 'unify' array 
> locks into a single type, and places where we want to treat them 
> separately. The case where we 'unify' is the more common one: locks 
> embedded into hash-tables for example. So i went for annotating the ones 
> that are rarer. There are 2 right now: scheduler, hrtimers, with the 
> hrtimers one going away in the high-res-timers implementation. (we 
> unified the hrtimers locks into a per-CPU lock) (there's also a kgdb 
> annotation for -mm)
> 
> perhaps the naming should be clearer? I had it named 
> spin_lock_init_standalone() originally, then cleaned it up to be 
> spin_lock_init_static(). Maybe the original name is better?
> 

hm.  This is where a "term of art" is needed.  What is lockdep's internal
term for locks-of-a-different-type?  It should have such a term.

"class" would be a good term, although terribly overused.  Using that as an
example, spin_lock_init_standalone_class()?  ug.

<gives up>

You want spin_lock_init_singleton().


* Re: [patch 18/61] lock validator: irqtrace: core
  2006-05-30  1:34   ` Andrew Morton
@ 2006-06-23 10:42     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:24:32 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > accurate hard-IRQ-flags state tracing. This allows us to attach 
> > extra functionality to IRQ flags on/off events (such as 
> > trace-on/off).
> 
> That's a fairly skimpy description of some fairly substantial new 
> infrastructure.

ok, here's some more info (i'll add this to the irq-flags-tracing core 
patch):

the "irq state tracing" feature "traces" hardirq and softirq state, in 
that it gives interested subsystems an opportunity to be notified of 
every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that 
happens in the kernel.

CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING 
and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging 
code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and 
CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these 
are locking APIs that are not used in IRQ context. (the one exception 
for rwsems is worked around)

Right now the only interested subsystem is the lock validator, but 
things like RTLinux, ADEOS, the -rt tree and the latency tracer would 
certainly be interested in managing irq-flags state too. (I did not add 
any expansive (and probably expensive) notifier mechanism yet, before 
someone actually tries to mix multiple users of this infrastructure and 
comes up with the right abstraction.)

architecture support for this is certainly not in the "trivial" 
category, because lots of lowlevel assembly code deals with irq-flags 
state changes. But an architecture can be irq-flags-tracing enabled in a 
rather straightforward and risk-free manner.

Architectures that want to support this need to do a couple of 
code-organizational changes first:

- move their irq-flags manipulation code from their asm/system.h header 
  to asm/irqflags.h

- rename local_irq_disable()/etc to raw_local_irq_disable()/etc. so that 
  the linux/irqflags.h code can inject callbacks and can construct the 
  real local_irq_disable()/etc APIs.

- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file

and then a couple of functional changes are needed as well to implement 
irq-flags-tracing support:

- in lowlevel entry code add (build-conditional) calls to the
  trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator 
  closely guards whether the 'real' irq-flags matches the 'virtual' 
  irq-flags state, and complains loudly (and turns itself off) if the 
  two do not match. Usually most of the time for arch support for 
  irq-flags-tracing is spent in this state: look at the lockdep 
  complaint, try to figure out the assembly code we did not cover yet, 
  fix and repeat. Once the system has booted up and works without a 
  lockdep complaint in the irq-flags-tracing functions arch support is 
  complete.

- if the architecture has non-maskable interrupts then those need to be 
  excluded from the irq-tracing [and lock validation] mechanism via
  lockdep_off()/lockdep_on().
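
The layering described in the two lists above can be sketched in userspace C: raw operations change the 'real' state, the generic wrappers add the trace callbacks, and a consistency check turns the tracer off when the 'real' and 'virtual' states disagree. All identifiers with a _sketch suffix are hypothetical, the hardware flag is simulated with a plain variable, and the ordering of callback versus raw operation is an assumption chosen so that callbacks always run with interrupts off:

```c
#include <assert.h>

static int hw_irqs_enabled = 1;		/* stand-in for the CPU irq flag */
static int traced_irqs_enabled = 1;	/* 'virtual' state kept by the tracer */
static int tracer_active = 1;		/* cleared on the first mismatch */

/* Stand-ins for the arch-provided raw operations. */
static void raw_local_irq_disable_sketch(void) { hw_irqs_enabled = 0; }
static void raw_local_irq_enable_sketch(void)  { hw_irqs_enabled = 1; }

/* Callbacks that an interested subsystem would hook into. */
static void trace_hardirqs_off_sketch(void) { traced_irqs_enabled = 0; }
static void trace_hardirqs_on_sketch(void)  { traced_irqs_enabled = 1; }

/* Generic wrappers constructed on top of the raw operations. */
static void local_irq_disable_sketch(void)
{
	raw_local_irq_disable_sketch();
	trace_hardirqs_off_sketch();
}

static void local_irq_enable_sketch(void)
{
	trace_hardirqs_on_sketch();
	raw_local_irq_enable_sketch();
}

/*
 * Consistency check: if the real and virtual states differ, turn the
 * tracer off. Returns 1 while the tracer is still active.
 */
static int check_irq_state_sketch(void)
{
	if (tracer_active && hw_irqs_enabled != traced_irqs_enabled)
		tracer_active = 0;
	return tracer_active;
}
```

A code path that calls the raw operation directly, the analogue of assembly code that has not been annotated yet, leaves the two states out of sync, and the next check disables the tracer instead of crashing.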

in general there is no risk from having an incomplete irq-flags-tracing 
implementation in an architecture: lockdep will detect that and will 
turn itself off. I.e. the lock validator will still be reliable. There 
should be no crashes due to irq-tracing bugs. (except if the assembly 
changes break other code by modifying conditions or registers that 
shouldn't be modified)

	Ingo


* Re: [patch 27/61] lock validator: prove spinlock/rwlock locking correctness
  2006-05-30  1:35   ` Andrew Morton
@ 2006-06-23 10:44     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 29 May 2006 23:25:23 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > +# define spin_lock_init_key(lock, key)				\
> > +	__spin_lock_init((lock), #lock, key)
> 
> erk.  This adds a whole new layer of obfuscation on top of the 
> existing spinlock header files.  You already need to run the 
> preprocessor and disassembler to even work out which flavour you're 
> presently using.
> 
> Ho hum.

agreed. I think the API we started using in latest -mm 
(lockdep_init_key()) is the cleaner approach - that also makes it 
trivially sure that lockdep doesn't impact non-lockdep code. I'll fix the 
current lockdep_init_key() shortcomings and i'll get rid of the 
*_init_key() APIs.

	Ingo


* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-06-23 10:38       ` Andrew Morton
@ 2006-06-23 10:52         ` Ingo Molnar
  2006-06-23 11:52           ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > perhaps the naming should be clearer? I had it named 
> > spin_lock_init_standalone() originally, then cleaned it up to be 
> > spin_lock_init_static(). Maybe the original name is better?
> > 
> 
> hm.  This is where a "term of art" is needed.  What is lockdep's 
> internal term for locks-of-a-different-type?  It should have such a 
> term.

'lock type' is what i tried to use consistently.

> "class" would be a good term, although terribly overused.  Using that 
> as an example, spin_lock_init_standalone_class()?  ug.
> 
> <gives up>
> 
> You want spin_lock_init_singleton().

hehe ;)

singleton wouldn't be enough here as we don't want just one instance of 
this lock type: we want separate types for each array entry. I.e. we 
don't want to unify the lock types (as the common spin_lock_init() call 
suggests), we want to split them along their static addresses.

singleton initialization is what spin_lock_init() itself accomplishes: 
the first call to a given spin_lock_init() will register a 'lock type' 
structure, and all subsequent calls to spin_lock_init() will find this 
type registered already. (keyed by the lockdep-type-key embedded in the 
spin_lock_init() macro)
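
A userspace sketch of that call-site keying, assuming only what the paragraph above states: the init macro embeds a static key, so every lock initialized through the same expansion ends up with the same type. The struct layouts and all _sketch names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* One static instance of this exists per init call site. */
struct lock_type_key_sketch { char dummy; };

struct spinlock_sketch {
	const struct lock_type_key_sketch *key;	/* identifies the lock type */
	const char *name;
};

static void spin_lock_init_key_sketch(struct spinlock_sketch *lock,
				      const char *name,
				      const struct lock_type_key_sketch *key)
{
	lock->key = key;
	lock->name = name;
}

/* Each expansion of this macro carries its own static key. */
#define spin_lock_init_sketch(lock)					\
do {									\
	static struct lock_type_key_sketch key_;			\
	spin_lock_init_key_sketch((lock), #lock, &key_);		\
} while (0)

#define NR_BUCKETS 4
static struct spinlock_sketch bucket_locks[NR_BUCKETS];
static struct spinlock_sketch other_lock;

static void init_all_sketch(void)
{
	int i;

	/* One call site: all array entries share a single key. */
	for (i = 0; i < NR_BUCKETS; i++)
		spin_lock_init_sketch(&bucket_locks[i]);

	/* A second call site gets a second, distinct key. */
	spin_lock_init_sketch(&other_lock);
}
```

In this sketch the array locks are 'unified' simply because they go through one call site; splitting them would mean passing a different key per entry to spin_lock_init_key_sketch() instead of relying on the macro's embedded key.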

so - spin_lock_init_split_type() might be better i think and expresses 
the purpose (to split away this type from the other lock types 
initialized here).

Or we could simply get rid of this static-variables special-case and 
embed a lock_type_key in the runqueue and use 
spin_lock_init_key(&rq->rq_lock_key)? That would unify the 'splitting' 
of types for static and dynamic locks. (at a minimal cost of .data) Hm?

	Ingo


* Re: [patch 55/61] lock validator: special locking: sb->s_umount
  2006-05-30  1:36   ` Andrew Morton
@ 2006-06-23 10:55     ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 10:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > +++ linux/fs/dcache.c
> > @@ -470,8 +470,9 @@ static void prune_dcache(int count, stru
> >  		s_umount = &dentry->d_sb->s_umount;
> >  		if (down_read_trylock(s_umount)) {
> >  			if (dentry->d_sb->s_root != NULL) {
> > -				prune_one_dentry(dentry);
> > +// lockdep hack: do this better!
> >  				up_read(s_umount);
> > +				prune_one_dentry(dentry);
> >  				continue;
> 
> argh, you broke my kernel!
> 
> I'll whack some ifdefs in here so it's only known-broken if 
> CONFIG_LOCKDEP.
> 
> Again, we'd need the real fix here.

yeah. We should undo this patch for now. This will only be complained 
about if CONFIG_DEBUG_NON_NESTED_UNLOCKS is enabled. [i'll do this in my 
refactored queue]

	Ingo


* Re: [patch 61/61] lock validator: enable lock validator in Kconfig
  2006-05-30 13:33   ` Roman Zippel
@ 2006-06-23 11:01     ` Ingo Molnar
  2006-06-26 11:37       ` Roman Zippel
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 11:01 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > +config PROVE_SPIN_LOCKING
> > +	bool "Prove spin-locking correctness"
> > +	default y
> 
> Could you please keep all the defaults in a separate -mm-only patch, 
> so it doesn't get merged?

yep - the default got removed.

> There are also a number of dependencies on DEBUG_KERNEL missing, it 
> completely breaks the debugging menu.

i have solved this problem in current -mm by making more advanced 
versions of lock debugging (allocation/exit checks, validator) depend on 
more basic lock debugging options. All the basic lock debugging options 
have a DEBUG_KERNEL dependency, which thus gets inherited by the other 
options as well.

> > +config LOCKDEP
> > +	bool
> > +	default y
> > +	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING
> 
> This can be written shorter as:
> 
> config LOCKDEP
> 	def_bool PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING

ok, done. (Btw., there's tons of other Kconfig code that uses the 
bool + depends syntax though, and def_bool usage is quite rare.)

	Ingo


* Re: [patch 07/61] lock validator: better lock debugging
  2006-06-23 11:06       ` Andrew Morton
@ 2006-06-23 11:04         ` Ingo Molnar
  0 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 11:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Andrew Morton <akpm@osdl.org> wrote:

> > > Either the name of this thing is too generic, or we _make_ it 
> > > generic, in which case it's in the wrong header file.
> > 
> > this op is intended to be used only by the lock debugging 
> > infrastructure. So it should be renamed - but i fail to find a good 
> > name for it. (it's used quite frequently within the lock debugging 
> > code, at 60+ places) Maybe INTERNAL_WARN_ON()? [that makes it sound 
> > special enough.] DEBUG_LOCKS_WARN_ON() might work too.
> 
> Well it has a debug_locks_off() in there, so DEBUG_LOCKS_WARN_ON() 
> seems right.

done.

	Ingo


* Re: [patch 07/61] lock validator: better lock debugging
  2006-06-23 10:25     ` Ingo Molnar
@ 2006-06-23 11:06       ` Andrew Morton
  2006-06-23 11:04         ` Ingo Molnar
  0 siblings, 1 reply; 319+ messages in thread
From: Andrew Morton @ 2006-06-23 11:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Fri, 23 Jun 2006 12:25:23 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > > +#define DEBUG_WARN_ON(c)						\
> > > +({									\
> > > +	int __ret = 0;							\
> > > +									\
> > > +	if (unlikely(c)) {						\
> > > +		if (debug_locks_off())					\
> > > +			WARN_ON(1);					\
> > > +		__ret = 1;						\
> > > +	}								\
> > > +	__ret;								\
> > > +})
> > 
> > Either the name of this thing is too generic, or we _make_ it generic, 
> > in which case it's in the wrong header file.
> 
> this op is intended to be used only by the lock debugging 
> infrastructure. So it should be renamed - but i fail to find a good name 
> for it. (it's used quite frequently within the lock debugging code, at 
> 60+ places) Maybe INTERNAL_WARN_ON()? [that makes it sound special 
> enough.] DEBUG_LOCKS_WARN_ON() might work too.

Well it has a debug_locks_off() in there, so DEBUG_LOCKS_WARN_ON() seems right.



* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-06-23 10:52         ` Ingo Molnar
@ 2006-06-23 11:52           ` Ingo Molnar
  2006-06-23 12:06             ` Andrew Morton
  0 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2006-06-23 11:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan


* Ingo Molnar <mingo@elte.hu> wrote:

> > > perhaps the naming should be clearer? I had it named 
> > > spin_lock_init_standalone() originally, then cleaned it up to be 
> > > spin_lock_init_static(). Maybe the original name is better?
> > > 
> > 
> > hm.  This is where a "term of art" is needed.  What is lockdep's 
> > internal term for locks-of-a-different-type?  It should have such a 
> > term.
> 
> 'lock type' is what i tried to use consistently.
> 
> > "class" would be a good term, although terribly overused.  Using that 
> > as an example, spin_lock_init_standalone_class()?  ug.

actually ... 'class' might be an even better term than 'type', mainly 
because type is even more overloaded in this context than class. "Q: 
What type does this lock have?" The natural answer: "it's a spinlock".

so i'm strongly considering the renaming of 'lock type' to 'lock class' 
and push that through all the APIs (and documentation). (i.e. we'd have 
'subclasses' of locks, not 'subtypes'.)

then we could do the annotations (where the call-site heuristics get the 
class wrong and either do false splits or don't do a split) via:

	spin_lock_set_class(&lock, &class_key)
	rwlock_set_class(&rwlock, &class_key)
	mutex_set_class(&mutex, &class_key)
	rwsem_set_class(&rwsem, &class_key)

[And for class-internal nesting, we'd have subclass nesting levels.]

hm?

	Ingo


* Re: [patch 50/61] lock validator: special locking: hrtimer.c
  2006-06-23 11:52           ` Ingo Molnar
@ 2006-06-23 12:06             ` Andrew Morton
  0 siblings, 0 replies; 319+ messages in thread
From: Andrew Morton @ 2006-06-23 12:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Fri, 23 Jun 2006 13:52:54 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > > perhaps the naming should be clearer? I had it named 
> > > > spin_lock_init_standalone() originally, then cleaned it up to be 
> > > > spin_lock_init_static(). Maybe the original name is better?
> > > > 
> > > 
> > > hm.  This is where a "term of art" is needed.  What is lockdep's 
> > > internal term for locks-of-a-different-type?  It should have such a 
> > > term.
> > 
> > 'lock type' is what i tried to use consistently.
> > 
> > > "class" would be a good term, although terribly overused.  Using that 
> > > as an example, spin_lock_init_standalone_class()?  ug.
> 
> actually ... 'class' might be an even better term than 'type', mainly 
> because type is even more overloaded in this context than class. "Q: 
> What type does this lock have?" The natural answer: "it's a spinlock".
> 
> so i'm strongly considering the renaming of 'lock type' to 'lock class' 
> and push that through all the APIs (and documentation). (i.e. we'd have 
> 'subclasses' of locks, not 'subtypes'.)
> 
> then we could do the annotations (where the call-site heuristics get the 
> class wrong and either do false splits or don't do a split) via:
> 
> 	spin_lock_set_class(&lock, &class_key)
> 	rwlock_set_class(&rwlock, &class_key)
> 	mutex_set_class(&mutex, &class_key)
> 	rwsem_set_class(&rwsem, &class_key)
> 
> [And for class-internal nesting, we'd have subclass nesting levels.]
> 

Works for me.


* Re: [patch 61/61] lock validator: enable lock validator in Kconfig
  2006-06-23 11:01     ` Ingo Molnar
@ 2006-06-26 11:37       ` Roman Zippel
  0 siblings, 0 replies; 319+ messages in thread
From: Roman Zippel @ 2006-06-26 11:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Arjan van de Ven, Andrew Morton

Hi,

On Fri, 23 Jun 2006, Ingo Molnar wrote:

> > Could you please keep all the defaults in a separate -mm-only patch, 
> > so it doesn't get merged?
> 
> yep - the default got removed.

Thanks.

> > > +config LOCKDEP
> > > +	bool
> > > +	default y
> > > +	depends on PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING
> > 
> > This can be written shorter as:
> > 
> > config LOCKDEP
> > 	def_bool PROVE_SPIN_LOCKING || PROVE_RW_LOCKING || PROVE_MUTEX_LOCKING || PROVE_RWSEM_LOCKING
> 
> ok, done. (Btw., there's tons of other Kconfig code that uses the 
> bool + depends syntax though, and def_bool usage is quite rare.)

The new syntax was added later, so everything that was converted uses the 
basic syntax and is still copied around a lot (it probably also 
doesn't help that it's not properly documented yet). I'm still planning to 
go through this and convert most of them...

bye, Roman


* [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (64 preceding siblings ...)
  2006-05-30  9:14 ` Benoit Boissinot
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 15:00   ` Alan
                     ` (6 more replies)
  2007-02-13 14:20 ` [patch 01/11] syslets: add async.h include file, kernel-side API definitions Ingo Molnar
                   ` (7 subsequent siblings)
  73 siblings, 7 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

I'm pleased to announce the first release of the "Syslet" kernel feature 
and kernel subsystem, which provides generic asynchronous system call 
support:

   http://redhat.com/~mingo/syslet-patches/

Syslets are small, simple, lightweight programs (consisting of 
system-calls, 'atoms') that the kernel can execute autonomously (and, 
not the least, asynchronously), without having to exit back into 
user-space. Syslets can be freely constructed and submitted by any 
unprivileged user-space context - and they have access to all the 
resources (and only those resources) that the original context has 
access to.

because the proof of the pudding is eating it, here are the performance 
results from async-test.c which does open()+read()+close() of 1000 small 
random files (smaller is better):

                  synchronous IO      |   Syslets:
                  --------------------------------------------
  uncached:       45.8 seconds        |  34.2 seconds   ( +33.9% )
  cached:         31.6 msecs          |  26.5 msecs     ( +19.2% )

("uncached" results were done via "echo 3 > /proc/sys/vm/drop_caches". 
The default IO scheduler was the deadline scheduler, the test was run on 
ext3, using a single PATA IDE disk.)

So syslets, in this particular workload, are a nice speedup /both/ in 
the uncached and in the cached case. (note that i used only a single
disk, so the level of parallelism in the hardware is quite limited.)

the testcode can be found at:

     http://redhat.com/~mingo/syslet-patches/async-test-0.1.tar.gz

The boring details:

Syslets consist of 'syslet atoms', where each atom represents a single 
system-call. These atoms can be chained to each other: serially, in 
branches or in loops. The return value of an executed atom is checked 
against the condition flags. So an atom can specify 'exit on nonzero' or 
'loop until non-negative' kind of constructs.
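
To make the condition flags concrete, here is a minimal user-space sketch 
of how a stop decision could be derived from an atom's flags and the 
return value of its system call. The flag values are the ones from 
include/linux/syslet.h in this series; atom_should_stop() and 
stop_flags_selfcheck() are hypothetical names used only for illustration, 
and the sketch checks each flag independently, which may differ from how 
the kernel treats combined flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/syslet.h of this series. */
#define SYSLET_STOP_ON_NONZERO		0x00000008
#define SYSLET_STOP_ON_ZERO		0x00000010
#define SYSLET_STOP_ON_NEGATIVE		0x00000020
#define SYSLET_STOP_ON_NON_POSITIVE	0x00000040

/*
 * Hypothetical helper (not part of the patch): returns true if an
 * atom carrying 'flags' should stop the syslet, given the return
 * value 'ret' of its system call. Each flag is tested on its own.
 */
bool atom_should_stop(unsigned long flags, long ret)
{
	if ((flags & SYSLET_STOP_ON_NONZERO) && ret != 0)
		return true;
	if ((flags & SYSLET_STOP_ON_ZERO) && ret == 0)
		return true;
	if ((flags & SYSLET_STOP_ON_NEGATIVE) && ret < 0)
		return true;
	if ((flags & SYSLET_STOP_ON_NON_POSITIVE) && ret <= 0)
		return true;

	return false;
}

/* Usage example: a few cases worked through by hand. */
void stop_flags_selfcheck(void)
{
	/* 'exit on nonzero': 0 continues, anything else stops */
	assert(!atom_should_stop(SYSLET_STOP_ON_NONZERO, 0));
	assert(atom_should_stop(SYSLET_STOP_ON_NONZERO, -1));

	/* a failing accept() (negative return) stops the syslet */
	assert(atom_should_stop(SYSLET_STOP_ON_NEGATIVE, -1));

	/* no stop flags set: the return value never stops execution */
	assert(!atom_should_stop(0, -1));
}
```

An atom that should keep running while its system call returns a 
positive value would carry SYSLET_STOP_ON_NON_POSITIVE in this model.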

Syslet atoms fundamentally execute only system calls, thus to be able to 
manipulate user-space variables from syslets i've added a simple special 
system call: sys_umem_add(ptr, val). This can be used to increase or 
decrease the user-space variable (and to get the result), or to simply 
read out the variable (if 'val' is 0).
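
The semantics described above can be modelled in plain user-space C. This 
is only a sketch of the behaviour, not the kernel code: the name 
umem_add_model() is hypothetical, and the real system call has to deal 
with faulting user pointers, which this model replaces with a simple 
assertion.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model of sys_umem_add(ptr, val): add 'val' to the
 * variable at 'ptr' and return the new value, or just return the
 * current value if 'val' is 0.
 */
long umem_add_model(unsigned long *ptr, unsigned long val)
{
	assert(ptr != NULL);

	/* val == 0 means "only read the variable" */
	if (!val)
		return (long)*ptr;

	/*
	 * Unsigned addition wraps around, so passing (unsigned long)-1
	 * decreases the variable by one.
	 */
	*ptr += val;

	return (long)*ptr;
}
```

With a counter initialised to 5, umem_add_model(&counter, 0) reads 5, 
umem_add_model(&counter, 2) yields 7, and a following 
umem_add_model(&counter, (unsigned long)-1) yields 6.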

So a single syslet (submitted and executed via a single system call) can 
be arbitrarily complex. For example it can be like this:

       --------------------
       |     accept()     |-----> [ stop if returns negative ]
       --------------------
                |
                V
  -------------------------------
  |   setsockopt(TCP_NODELAY)   |-----> [ stop if returns negative ]
  -------------------------------
                |
                v
       --------------------
       |      read()      |<---------
       --------------------         | [ loop while positive ]
           |    |                   |
           |    ---------------------
           |
        -----------------------------------------
        | decrease and read user space variable |
        -----------------------------------------                    A
                    |                                                |
                    -------[ loop back to accept() if positive ]------

(you can find a VFS example and a hello.c example in the user-space 
testcode.)

A syslet is executed opportunistically: i.e. the syslet subsystem 
assumes that the syslet will not block, and switches to a cachemiss 
kernel thread from the scheduler only if it does. This means that even a 
single-atom syslet (i.e. a pure system call) is very close in 
performance to a pure system call. The syslet NULL-overhead in the 
cached case is roughly 10% of the SYSENTER NULL-syscall overhead. This 
means that two atoms are a win already, even in the cached case.

When a 'cachemiss' occurs, i.e. if we hit schedule() and are about to 
consider other threads, the syslet subsystem picks up a 'cachemiss 
thread' and switches the current task's user-space context over to the 
cachemiss thread, and makes the cachemiss thread available. The original 
thread (which now becomes a 'busy' cachemiss thread) continues to block. 
This means that user-space will still be executed without stopping - 
even if user-space is single-threaded.

if the submitting user-space context /knows/ that a system call will 
block, it can request immediate 'cachemiss' via the SYSLET_ASYNC flag. 
This would be used if for example an O_DIRECT file is read() or 
write()n.

likewise, if user-space knows (or expects) that a system call takes a lot 
of CPU time even in the cached case, and it wants to offload it to 
another asynchronous context, it can request that via the SYSLET_ASYNC 
flag too.

completions of asynchronous syslets are done via a user-space ringbuffer 
that the kernel fills and user-space clears. Waiting is done via the 
sys_async_wait() system call. Completion can be suppressed on a per-atom 
basis via the SYSLET_NO_COMPLETE flag, for atoms that include some 
implicit notification mechanism. (such as sys_kill(), etc.)
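
The "kernel fills, user-space clears" protocol can be illustrated with a 
small single-threaded model. Everything here is an assumption made for 
the sketch: the struct and function names are hypothetical, a NULL slot is 
taken to mean "empty", the ring size is arbitrary, and the memory ordering 
that a real kernel/user-space ring would need is left out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_ENTRIES 4	/* arbitrary size for this sketch */

struct completion_ring_model {
	void		*slot[RING_ENTRIES];	/* NULL means "empty" */
	unsigned long	produce_idx;		/* next slot to fill */
	unsigned long	consume_idx;		/* next slot to clear */
};

/*
 * Producer side (the kernel's role): queue a completed atom.
 * Returns 0 on success, -1 if the next slot was not cleared yet.
 */
int ring_complete(struct completion_ring_model *r, void *atom)
{
	assert(r != NULL && atom != NULL);

	if (r->slot[r->produce_idx] != NULL)
		return -1;

	r->slot[r->produce_idx] = atom;
	r->produce_idx = (r->produce_idx + 1) % RING_ENTRIES;

	return 0;
}

/*
 * Consumer side (user-space's role): fetch the oldest completion
 * and clear its slot. Returns NULL if nothing has completed.
 */
void *ring_consume(struct completion_ring_model *r)
{
	void *atom;

	assert(r != NULL);

	atom = r->slot[r->consume_idx];
	if (atom == NULL)
		return NULL;

	r->slot[r->consume_idx] = NULL;	/* clearing frees the slot */
	r->consume_idx = (r->consume_idx + 1) % RING_ENTRIES;

	return atom;
}
```

In this model a consumer that finds the ring empty would be the point 
where a real application calls sys_async_wait().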

As might be obvious to some of you, the syslet subsystem takes many 
ideas and experience from my Tux in-kernel webserver :) The syslet code 
originates from a heavy rewrite of the Tux-atom and the Tux-cachemiss 
infrastructure.

Open issues:

 - the 'TID' of the 'head' thread currently varies depending on which 
   thread is running the user-space context.

 - signal support is not fully thought through - probably the head 
   should be getting all of them - the cachemiss threads are not really 
   interested in executing signal handlers.

 - sys_fork() and sys_async_exec() should be filtered out from the 
   syscalls that are allowed - first one only makes sense with ptregs, 
   second one is a nice kernel recursion thing :) I didn't want to 
   duplicate the sys_call_table though - maybe others have a better 
   idea.

See more details in Documentation/syslet-design.txt. The patchset is 
against v2.6.20, but should apply to the -git head as well.

Thanks to Zach Brown for the idea to drive cachemisses via the 
scheduler. Thanks to Arjan van de Ven for early review feedback.

Comments, suggestions, reports are welcome!

	Ingo


* [patch 01/11] syslets: add async.h include file, kernel-side API definitions
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (65 preceding siblings ...)
  2007-02-13 14:20 ` [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 14:20 ` [patch 02/11] syslets: add syslet.h include file, user API/ABI definitions Ingo Molnar
                   ` (6 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add include/linux/async.h which contains the kernel-side API
declarations.

it also provides NOP stubs for the !CONFIG_ASYNC_SUPPORT case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/async.h |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Index: linux/include/linux/async.h
===================================================================
--- /dev/null
+++ linux/include/linux/async.h
@@ -0,0 +1,25 @@
+#ifndef _LINUX_ASYNC_H
+#define _LINUX_ASYNC_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Generic kernel API definitions:
+ */
+
+#ifdef CONFIG_ASYNC_SUPPORT
+extern void async_init(struct task_struct *t);
+extern void async_exit(struct task_struct *t);
+extern void __async_schedule(struct task_struct *t);
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline void async_init(struct task_struct *t)
+{
+}
+static inline void async_exit(struct task_struct *t)
+{
+}
+static inline void __async_schedule(struct task_struct *t)
+{
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
+#endif


* [patch 02/11] syslets: add syslet.h include file, user API/ABI definitions
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (66 preceding siblings ...)
  2007-02-13 14:20 ` [patch 01/11] syslets: add async.h include file, kernel-side API definitions Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 20:17   ` Indan Zupancic
  2007-02-19  0:22   ` Paul Mackerras
  2007-02-13 14:20 ` [patch 03/11] syslets: generic kernel bits Ingo Molnar
                   ` (5 subsequent siblings)
  73 siblings, 2 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add include/linux/syslet.h which contains the user-space API/ABI
declarations. Add the new header to include/linux/Kbuild as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/Kbuild   |    1 
 include/linux/syslet.h |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

Index: linux/include/linux/Kbuild
===================================================================
--- linux.orig/include/linux/Kbuild
+++ linux/include/linux/Kbuild
@@ -140,6 +140,7 @@ header-y += sockios.h
 header-y += som.h
 header-y += sound.h
 header-y += synclink.h
+header-y += syslet.h
 header-y += telephony.h
 header-y += termios.h
 header-y += ticable.h
Index: linux/include/linux/syslet.h
===================================================================
--- /dev/null
+++ linux/include/linux/syslet.h
@@ -0,0 +1,136 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * User-space API/ABI definitions:
+ */
+
+/*
+ * This is the 'Syslet Atom' - the basic unit of execution
+ * within the syslet framework. A syslet always represents
+ * a single system-call plus its arguments, plus has conditions
+ * attached to it that allows the construction of larger
+ * programs from these atoms. User-space variables can be used
+ * (for example a loop index) via the special sys_umem*() syscalls.
+ *
+ * Arguments are implemented via pointers to arguments. This not
+ * only increases the flexibility of syslet atoms (multiple syslets
+ * can share the same variable for example), but is also an
+ * optimization: copy_uatom() will only fetch syscall parameters
+ * up until the point it meets the first NULL pointer. 50% of all
+ * syscalls have 2 or less parameters (and 90% of all syscalls have
+ * 4 or less parameters).
+ *
+ * [ Note: since the argument array is at the end of the atom, and the
+ *   kernel will not touch any argument beyond the final NULL one, atoms
+ *   might be packed more tightly. (the only special case exception to
+ *   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+ *   jump a full syslet_uatom number of bytes.) ]
+ */
+struct syslet_uatom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom	__user		*next;
+	unsigned long		__user		*arg_ptr[6];
+	/*
+	 * User-space can put anything in here, kernel will not
+	 * touch it:
+	 */
+	void __user				*private;
+};
+
+/*
+ * Flags to modify/control syslet atom behavior:
+ */
+
+/*
+ * Immediately queue this syslet asynchronously - do not even
+ * attempt to execute it synchronously in the user context:
+ */
+#define SYSLET_ASYNC				0x00000001
+
+/*
+ * Never queue this syslet asynchronously - even if synchronous
+ * execution causes a context-switching:
+ */
+#define SYSLET_SYNC				0x00000002
+
+/*
+ * Do not queue the syslet in the completion ring when done.
+ *
+ * ( the default is that the final atom of a syslet is queued
+ *   in the completion ring. )
+ *
+ * Some syscalls generate implicit completion events of their
+ * own.
+ */
+#define SYSLET_NO_COMPLETE			0x00000004
+
+/*
+ * Execution control: conditions upon the return code
+ * of the previous syslet atom. 'Stop' means syslet
+ * execution is stopped and the atom is put into the
+ * completion ring:
+ */
+#define SYSLET_STOP_ON_NONZERO			0x00000008
+#define SYSLET_STOP_ON_ZERO			0x00000010
+#define SYSLET_STOP_ON_NEGATIVE			0x00000020
+#define SYSLET_STOP_ON_NON_POSITIVE		0x00000040
+
+#define SYSLET_STOP_MASK				\
+	(	SYSLET_STOP_ON_NONZERO		|	\
+		SYSLET_STOP_ON_ZERO		|	\
+		SYSLET_STOP_ON_NEGATIVE		|	\
+		SYSLET_STOP_ON_NON_POSITIVE		)
+
+/*
+ * Special modifier to 'stop' handling: instead of stopping the
+ * execution of the syslet, the linearly next syslet is executed.
+ * (Normal execution flows along atom->next, and execution stops
+ *  if atom->next is NULL or a stop condition becomes true.)
+ *
+ * This is what allows true branches of execution within syslets.
+ */
+#define SYSLET_SKIP_TO_NEXT_ON_STOP		0x00000080
+
+/*
+ * This is the (per-user-context) descriptor of the async completion
+ * ring. This gets registered via sys_async_register().
+ */
+struct async_head_user {
+	/*
+	 * Pointers to completed async syslets (i.e. syslets that
+	 * generated a cachemiss and went async, returning -EASYNCSYSLET
+	 * to the user context by sys_async_exec()) are queued here.
+	 * Syslets that were executed synchronously are not queued here.
+	 *
+	 * Note: the final atom that generated the exit condition is
+	 * queued here. Normally this would be the last atom of a syslet.
+	 */
+	struct syslet_uatom __user		**completion_ring;
+	/*
+	 * Ring size in bytes:
+	 */
+	unsigned long				ring_size_bytes;
+
+	/*
+	 * Maximum number of asynchronous contexts the kernel creates.
+	 *
+	 * -1UL has a special meaning: the kernel manages the optimal
+	 * size of the async pool.
+	 *
+	 * Note: this field should be valid for the lifetime of async
+	 * processing, because future kernels detect changes to this
+	 * field. (enabling user-space to control the size of the async
+	 * pool in a low-overhead fashion)
+	 */
+	unsigned long				max_nr_threads;
+};
+
+#endif


* [patch 03/11] syslets: generic kernel bits
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (67 preceding siblings ...)
  2007-02-13 14:20 ` [patch 02/11] syslets: add syslet.h include file, user API/ABI definitions Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 14:20 ` [patch 04/11] syslets: core, data structures Ingo Molnar
                   ` (4 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the kernel generic bits - these are present even if !CONFIG_ASYNC_SUPPORT.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/sched.h |    7 ++++++-
 kernel/exit.c         |    3 +++
 kernel/fork.c         |    2 ++
 kernel/sched.c        |    9 +++++++++
 4 files changed, 20 insertions(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -88,7 +88,8 @@ struct sched_param {
 
 struct exec_domain;
 struct futex_pi_state;
-
+struct async_thread;
+struct async_head;
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
@@ -997,6 +998,10 @@ struct task_struct {
 /* journalling filesystem info */
 	void *journal_info;
 
+/* async syscall support: */
+	struct async_thread *at, *async_ready;
+	struct async_head *ah;
+
 /* VM state */
 	struct reclaim_state *reclaim_state;
 
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -26,6 +26,7 @@
 #include <linux/ptrace.h>
 #include <linux/profile.h>
 #include <linux/mount.h>
+#include <linux/async.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
@@ -889,6 +890,8 @@ fastcall NORET_TYPE void do_exit(long co
 		schedule();
 	}
 
+	async_exit(tsk);
+
 	tsk->flags |= PF_EXITING;
 
 	if (unlikely(in_atomic()))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -22,6 +22,7 @@
 #include <linux/personality.h>
 #include <linux/mempolicy.h>
 #include <linux/sem.h>
+#include <linux/async.h>
 #include <linux/file.h>
 #include <linux/key.h>
 #include <linux/binfmts.h>
@@ -1054,6 +1055,7 @@ static struct task_struct *copy_process(
 
 	p->lock_depth = -1;		/* -1 = no lock */
 	do_posix_clock_monotonic_gettime(&p->start_time);
+	async_init(p);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -38,6 +38,7 @@
 #include <linux/vmalloc.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/async.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/timer.h>
@@ -3436,6 +3437,14 @@ asmlinkage void __sched schedule(void)
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+	prev = current;
+	if (unlikely(prev->async_ready)) {
+		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+			(!(prev->state & TASK_INTERRUPTIBLE) ||
+				!signal_pending(prev)))
+			__async_schedule(prev);
+	}
+
 need_resched:
 	preempt_disable();
 	prev = current;


* [patch 04/11] syslets: core, data structures
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (68 preceding siblings ...)
  2007-02-13 14:20 ` [patch 03/11] syslets: generic kernel bits Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 14:20 ` [patch 05/11] syslets: core code Ingo Molnar
                   ` (3 subsequent siblings)
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

this adds the data structures used by the syslet / async system calls
infrastructure.

This is used only if CONFIG_ASYNC_SUPPORT is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/async.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

Index: linux/kernel/async.h
===================================================================
--- /dev/null
+++ linux/kernel/async.h
@@ -0,0 +1,58 @@
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Syslet-subsystem internal definitions:
+ */
+
+/*
+ * The kernel-side copy of a syslet atom - with arguments expanded:
+ */
+struct syslet_atom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom	__user		*next;
+	unsigned long				args[6];
+};
+
+/*
+ * The 'async head' is the thread which has user-space context (ptregs)
+ * 'below it' - this is the one that can return to user-space:
+ */
+struct async_head {
+	spinlock_t				lock;
+	struct task_struct			*user_task;
+
+	struct list_head			ready_async_threads;
+	struct list_head			busy_async_threads;
+
+	unsigned long				events_left;
+	wait_queue_head_t			wait;
+
+	struct async_head_user	__user		*uah;
+	struct syslet_uatom	__user		**completion_ring;
+	unsigned long				curr_ring_idx;
+	unsigned long				max_ring_idx;
+	unsigned long				ring_size_bytes;
+
+	unsigned int				nr_threads;
+	unsigned int				max_nr_threads;
+
+	struct completion			start_done;
+	struct completion			exit_done;
+};
+
+/*
+ * The 'async thread' is either a newly created async thread or it is
+ * an 'ex-head' - it cannot return to user-space and only has kernel
+ * context.
+ */
+struct async_thread {
+	struct task_struct			*task;
+	struct syslet_uatom	__user		*work;
+	struct async_head			*ah;
+
+	struct list_head			entry;
+
+	unsigned int				exit;
+};


* [patch 05/11] syslets: core code
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (69 preceding siblings ...)
  2007-02-13 14:20 ` [patch 04/11] syslets: core, data structures Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 23:15   ` Andi Kleen
                     ` (3 more replies)
  2007-02-13 14:20 ` [patch 06/11] syslets: core, documentation Ingo Molnar
                   ` (2 subsequent siblings)
  73 siblings, 4 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

the core syslet / async system calls infrastructure code.

Is built only if CONFIG_ASYNC_SUPPORT is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/Makefile |    1 
 kernel/async.c  |  811 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 812 insertions(+)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
+obj-$(CONFIG_ASYNC_SUPPORT) += async.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/async.c
===================================================================
--- /dev/null
+++ linux/kernel/async.c
@@ -0,0 +1,811 @@
+/*
+ * kernel/async.c
+ *
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This code implements asynchronous syscalls via 'syslets'.
+ *
+ * Syslets consist of a set of 'syslet atoms' which are residing
+ * purely in user-space memory and have no kernel-space resource
+ * attached to them. These atoms can be linked to each other via
+ * pointers. Besides the fundamental ability to execute system
+ * calls, syslet atoms can also implement branches, loops and
+ * arithmetics.
+ *
+ * Thus syslets can be used to build small autonomous programs that
+ * the kernel can execute purely from kernel-space, without having
+ * to return to any user-space context. Syslets can be run by any
+ * unprivileged user-space application - they are executed safely
+ * by the kernel.
+ */
+#include <linux/syscalls.h>
+#include <linux/syslet.h>
+#include <linux/delay.h>
+#include <linux/async.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/err.h>
+
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+
+#include "async.h"
+
+typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long);
+
+extern syscall_fn_t sys_call_table[NR_syscalls];
+
+static void
+__mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->ready_async_threads);
+	if (list_empty(&ah->busy_async_threads))
+		wake_up(&ah->wait);
+}
+
+static void
+mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->busy_async_threads);
+}
+
+static void
+mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_busy(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__async_thread_init(struct task_struct *t, struct async_thread *at,
+		    struct async_head *ah)
+{
+	INIT_LIST_HEAD(&at->entry);
+	at->exit = 0;
+	at->task = t;
+	at->ah = ah;
+	at->work = NULL;
+
+	t->at = at;
+	ah->nr_threads++;
+}
+
+static void
+async_thread_init(struct task_struct *t, struct async_thread *at,
+		  struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__async_thread_init(t, at, ah);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+
+static void
+async_thread_exit(struct async_thread *at, struct task_struct *t)
+{
+	struct async_head *ah;
+
+	ah = at->ah;
+
+	spin_lock(&ah->lock);
+	list_del_init(&at->entry);
+	if (at->exit)
+		complete(&ah->exit_done);
+	t->at = NULL;
+	at->task = NULL;
+	WARN_ON(!ah->nr_threads);
+	ah->nr_threads--;
+	spin_unlock(&ah->lock);
+}
+
+static struct async_thread *
+pick_ready_cachemiss_thread(struct async_head *ah)
+{
+	struct list_head *head = &ah->ready_async_threads;
+	struct async_thread *at;
+
+	if (list_empty(head))
+		return NULL;
+
+	at = list_entry(head->next, struct async_thread, entry);
+
+	return at;
+}
+
+static void pick_new_async_head(struct async_head *ah,
+				struct task_struct *t, struct pt_regs *old_regs)
+{
+	struct async_thread *new_async_thread;
+	struct async_thread *async_ready;
+	struct task_struct *new_task;
+	struct pt_regs *new_regs;
+
+	spin_lock(&ah->lock);
+
+	new_async_thread = pick_ready_cachemiss_thread(ah);
+	if (!new_async_thread)
+		goto out_unlock;
+
+	async_ready = t->async_ready;
+	WARN_ON(!async_ready);
+	t->async_ready = NULL;
+
+	new_task = new_async_thread->task;
+	new_regs = task_pt_regs(new_task);
+	*new_regs = *old_regs;
+
+	new_task->at = NULL;
+	t->ah = NULL;
+	new_task->ah = ah;
+
+	wake_up_process(new_task);
+
+	__async_thread_init(t, async_ready, ah);
+	__mark_async_thread_busy(t->at, ah);
+
+ out_unlock:
+	spin_unlock(&ah->lock);
+}
+
+void __async_schedule(struct task_struct *t)
+{
+	struct async_head *ah = t->ah;
+	struct pt_regs *old_regs = task_pt_regs(t);
+
+	pick_new_async_head(ah, t, old_regs);
+}
+
+static void async_schedule(struct task_struct *t)
+{
+	if (t->async_ready)
+		__async_schedule(t);
+}
+
+static long __exec_atom(struct task_struct *t, struct syslet_atom *atom)
+{
+	struct async_thread *async_ready_save;
+	long ret;
+
+	/*
+	 * If user-space expects the syscall to schedule then
+	 * (try to) switch user-space to another thread straight
+	 * away and execute the syscall asynchronously:
+	 */
+	if (unlikely(atom->flags & SYSLET_ASYNC))
+		async_schedule(t);
+	/*
+	 * Does user-space want synchronous execution for this atom?:
+	 */
+	async_ready_save = t->async_ready;
+	if (unlikely(atom->flags & SYSLET_SYNC))
+		t->async_ready = NULL;
+
+	if (unlikely(atom->nr >= NR_syscalls))
+		return -ENOSYS;
+
+	ret = sys_call_table[atom->nr](atom->args[0], atom->args[1],
+				       atom->args[2], atom->args[3],
+				       atom->args[4], atom->args[5]);
+	if (atom->ret_ptr && put_user(ret, atom->ret_ptr))
+		return -EFAULT;
+
+	if (t->ah)
+		t->async_ready = async_ready_save;
+
+	return ret;
+}
+
+/*
+ * Arithmetics syscall, add a value to a user-space memory location.
+ *
+ * Generic C version - in case the architecture has not implemented it
+ * in assembly.
+ */
+asmlinkage __attribute__((weak)) long
+sys_umem_add(unsigned long __user *uptr, unsigned long inc)
+{
+	unsigned long val, new_val;
+
+	if (get_user(val, uptr))
+		return -EFAULT;
+	/*
+	 * inc == 0 means 'read memory value':
+	 */
+	if (!inc)
+		return val;
+
+	new_val = val + inc;
+	__put_user(new_val, uptr);
+
+	return new_val;
+}
+
+/*
+ * Open-coded because this is a very hot codepath during syslet
+ * execution and every cycle counts ...
+ *
+ * [ NOTE: it's an explicit fastcall because optimized assembly code
+ *   might depend on this. There are some kernels that disable regparm,
+ *   so lets not break those if possible. ]
+ */
+fastcall __attribute__((weak)) long
+copy_uatom(struct syslet_atom *atom, struct syslet_uatom __user *uatom)
+{
+	unsigned long __user *arg_ptr;
+	long ret = 0;
+
+	if (!access_ok(VERIFY_WRITE, uatom, sizeof(*uatom)))
+		return -EFAULT;
+
+	ret = __get_user(atom->nr, &uatom->nr);
+	ret |= __get_user(atom->ret_ptr, &uatom->ret_ptr);
+	ret |= __get_user(atom->flags, &uatom->flags);
+	ret |= __get_user(atom->next, &uatom->next);
+
+	memset(atom->args, 0, sizeof(atom->args));
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[0]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[0], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[1]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[1], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[2]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[2], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[3]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[3], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[4]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[4], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[5]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_WRITE, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[5], arg_ptr);
+
+	return ret;
+}
+
+/*
+ * Should the next atom run, depending on the return value of
+ * the current atom - or should we stop execution?
+ */
+static int run_next_atom(struct syslet_atom *atom, long ret)
+{
+	switch (atom->flags & SYSLET_STOP_MASK) {
+		case SYSLET_STOP_ON_NONZERO:
+			if (!ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_ZERO:
+			if (ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NEGATIVE:
+			if (ret >= 0)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NON_POSITIVE:
+			if (ret > 0)
+				return 1;
+			return 0;
+	}
+	return 1;
+}
+
+static struct syslet_uatom __user *
+next_uatom(struct syslet_atom *atom, struct syslet_uatom *uatom, long ret)
+{
+	/*
+	 * If the stop condition is false then continue
+	 * to atom->next:
+	 */
+	if (run_next_atom(atom, ret))
+		return atom->next;
+	/*
+	 * Special-case: if the stop condition is true and the atom
+	 * has SKIP_TO_NEXT_ON_STOP set, then instead of
+	 * stopping we skip to the atom directly after this atom
+	 * (in linear address-space).
+	 *
+	 * This, combined with the atom->next pointer and the
+	 * stop condition flags is what allows true branches and
+	 * loops in syslets:
+	 */
+	if (atom->flags & SYSLET_SKIP_TO_NEXT_ON_STOP)
+		return uatom + 1;
+
+	return NULL;
+}
+
+/*
+ * If user-space requested a completion event then put the last
+ * executed uatom into the completion ring:
+ */
+static long
+complete_uatom(struct async_head *ah, struct task_struct *t,
+	       struct syslet_atom *atom, struct syslet_uatom __user *uatom)
+{
+	struct syslet_uatom __user **ring_slot, *slot_val = NULL;
+	long ret;
+
+	WARN_ON(!t->at);
+	WARN_ON(t->ah);
+
+	if (unlikely(atom->flags & SYSLET_NO_COMPLETE))
+		return 0;
+
+	/*
+	 * Asynchronous threads can complete in parallel, so use the
+	 * head-lock to serialize:
+	 */
+	spin_lock(&ah->lock);
+	ring_slot = ah->completion_ring + ah->curr_ring_idx;
+	ret = __copy_from_user_inatomic(&slot_val, ring_slot, sizeof(slot_val));
+	/*
+	 * User-space submitted more work than what fits into the
+	 * completion ring - do not stomp over it silently and signal
+	 * the error condition:
+	 */
+	if (unlikely(slot_val)) {
+		spin_unlock(&ah->lock);
+		return -EFAULT;
+	}
+	slot_val = uatom;
+	ret |= __copy_to_user_inatomic(ring_slot, &slot_val, sizeof(slot_val));
+
+	ah->curr_ring_idx++;
+	if (unlikely(ah->curr_ring_idx == ah->max_ring_idx))
+		ah->curr_ring_idx = 0;
+
+	/*
+	 * See whether the async-head is waiting and needs a wakeup:
+	 */
+	if (ah->events_left) {
+		ah->events_left--;
+		if (!ah->events_left)
+			wake_up(&ah->wait);
+	}
+
+	spin_unlock(&ah->lock);
+
+	return ret;
+}
+
+/*
+ * This is the main syslet atom execution loop. This fetches atoms
+ * and executes them until it runs out of atoms or until the
+ * stop condition becomes true:
+ */
+static struct syslet_uatom __user *
+exec_atom(struct async_head *ah, struct task_struct *t,
+	  struct syslet_uatom __user *uatom)
+{
+	struct syslet_uatom __user *last_uatom;
+	struct syslet_atom atom;
+	long ret;
+
+ run_next:
+	if (unlikely(copy_uatom(&atom, uatom)))
+		return ERR_PTR(-EFAULT);
+
+	last_uatom = uatom;
+	ret = __exec_atom(t, &atom);
+	if (unlikely(signal_pending(t) || need_resched()))
+		goto stop;
+
+	uatom = next_uatom(&atom, uatom, ret);
+	if (uatom)
+		goto run_next;
+ stop:
+	/*
+	 * We do completion only in async context:
+	 */
+	if (t->at && complete_uatom(ah, t, &atom, last_uatom))
+		return ERR_PTR(-EFAULT);
+
+	return last_uatom;
+}
+
+static void cachemiss_execute(struct async_thread *at, struct async_head *ah,
+			      struct task_struct *t)
+{
+	struct syslet_uatom __user *uatom;
+
+	uatom = at->work;
+	WARN_ON(!uatom);
+	at->work = NULL;
+
+	exec_atom(ah, t, uatom);
+}
+
+static void
+cachemiss_loop(struct async_thread *at, struct async_head *ah,
+	       struct task_struct *t)
+{
+	for (;;) {
+		schedule();
+		mark_async_thread_busy(at, ah);
+		set_task_state(t, TASK_INTERRUPTIBLE);
+		if (at->work)
+			cachemiss_execute(at, ah, t);
+		if (unlikely(t->ah || at->exit || signal_pending(t)))
+			break;
+		mark_async_thread_ready(at, ah);
+	}
+	t->state = TASK_RUNNING;
+
+	async_thread_exit(at, t);
+}
+
+static int cachemiss_thread(void *data)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = data;
+	struct async_thread at;
+
+	async_thread_init(t, &at, ah);
+	complete(&ah->start_done);
+
+	cachemiss_loop(&at, ah, t);
+	if (at.exit)
+		do_exit(0);
+
+	if (!t->ah && signal_pending(t)) {
+		WARN_ON(1);
+		do_exit(0);
+	}
+
+	/*
+	 * Return to user-space with NULL:
+	 */
+	return 0;
+}
+
+static void __notify_async_thread_exit(struct async_thread *at,
+				       struct async_head *ah)
+{
+	list_del_init(&at->entry);
+	at->exit = 1;
+	init_completion(&ah->exit_done);
+	wake_up_process(at->task);
+}
+
+static void stop_cachemiss_threads(struct async_head *ah)
+{
+	struct async_thread *at;
+
+repeat:
+	spin_lock(&ah->lock);
+	list_for_each_entry(at, &ah->ready_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+
+	list_for_each_entry(at, &ah->busy_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+	spin_unlock(&ah->lock);
+}
+
+static void async_head_exit(struct async_head *ah, struct task_struct *t)
+{
+	stop_cachemiss_threads(ah);
+	WARN_ON(!list_empty(&ah->ready_async_threads));
+	WARN_ON(!list_empty(&ah->busy_async_threads));
+	WARN_ON(ah->nr_threads);
+	WARN_ON(spin_is_locked(&ah->lock));
+	kfree(ah);
+	t->ah = NULL;
+}
+
+/*
+ * Pretty arbitrary for now. The kernel resource-controls the number
+ * of threads anyway.
+ */
+#define DEFAULT_THREAD_LIMIT 1024
+
+/*
+ * Initialize the in-kernel async head, based on the user-space async
+ * head:
+ */
+static long
+async_head_init(struct task_struct *t, struct async_head_user __user *uah)
+{
+	unsigned long max_nr_threads, ring_size_bytes, max_ring_idx;
+	struct syslet_uatom __user **completion_ring;
+	struct async_head *ah;
+	long ret;
+
+	if (get_user(max_nr_threads, &uah->max_nr_threads))
+		return -EFAULT;
+	if (get_user(completion_ring, &uah->completion_ring))
+		return -EFAULT;
+	if (get_user(ring_size_bytes, &uah->ring_size_bytes))
+		return -EFAULT;
+	if (!ring_size_bytes)
+		return -EINVAL;
+	/*
+	 * We pre-check the ring pointer, so that in the fastpath
+	 * we can use __put_user():
+	 */
+	if (!access_ok(VERIFY_WRITE, completion_ring, ring_size_bytes))
+		return -EFAULT;
+
+	max_ring_idx = ring_size_bytes / sizeof(void *);
+	if (ring_size_bytes != max_ring_idx * sizeof(void *))
+		return -EINVAL;
+
+	/*
+	 * Lock down the ring. Note: user-space should not munlock() this,
+	 * because if the ring pages get swapped out then the async
+	 * completion code might return a -EFAULT instead of the expected
+	 * completion. (the kernel safely handles that case too, so this
+	 * isn't a security problem.)
+	 *
+	 * mlock() is better here because it gets resource-accounted
+	 * properly, and even unprivileged userspace has a few pages
+	 * of mlock-able memory available. (which is more than enough
+	 * for the completion-pointers ringbuffer)
+	 */
+	ret = sys_mlock((unsigned long)completion_ring, ring_size_bytes);
+	if (ret)
+		return ret;
+
+	/*
+	 * -1 means: the kernel manages the optimal size of the async pool.
+	 * Simple static limit for now.
+	 */
+	if (max_nr_threads == -1UL)
+		max_nr_threads = DEFAULT_THREAD_LIMIT;
+	/*
+	 * If the ring is smaller than the number of threads requested
+	 * then lower the thread count - otherwise we might lose
+	 * syslet completion events:
+	 */
+	max_nr_threads = min(max_ring_idx, max_nr_threads);
+
+	ah = kmalloc(sizeof(*ah), GFP_KERNEL);
+	if (!ah)
+		return -ENOMEM;
+
+	spin_lock_init(&ah->lock);
+	ah->nr_threads = 0;
+	ah->max_nr_threads = max_nr_threads;
+	INIT_LIST_HEAD(&ah->ready_async_threads);
+	INIT_LIST_HEAD(&ah->busy_async_threads);
+	init_waitqueue_head(&ah->wait);
+	ah->events_left = 0;
+	ah->uah = uah;
+	ah->curr_ring_idx = 0;
+	ah->max_ring_idx = max_ring_idx;
+	ah->completion_ring = completion_ring;
+	ah->ring_size_bytes = ring_size_bytes;
+
+	ah->user_task = t;
+	t->ah = ah;
+
+	return 0;
+}
+
+/**
+ * sys_async_register - enable async syscall support
+ */
+asmlinkage long
+sys_async_register(struct async_head_user __user *uah, unsigned int len)
+{
+	struct task_struct *t = current;
+
+	/*
+	 * This 'len' check enables future extension of
+	 * the async_head ABI:
+	 */
+	if (len != sizeof(struct async_head_user))
+		return -EINVAL;
+	/*
+	 * Already registered?
+	 */
+	if (t->ah)
+		return -EEXIST;
+
+	return async_head_init(t, uah);
+}
+
+/**
+ * sys_async_unregister - disable async syscall support
+ */
+asmlinkage long
+sys_async_unregister(struct async_head_user __user *uah, unsigned int len)
+{
+	struct syslet_uatom __user **completion_ring;
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	unsigned long ring_size_bytes;
+
+	if (len != sizeof(struct async_head_user))
+		return -EINVAL;
+	/*
+	 * Already unregistered?
+	 */
+	if (!ah)
+		return -EINVAL;
+
+	completion_ring = ah->completion_ring;
+	ring_size_bytes = ah->ring_size_bytes;
+
+	async_head_exit(ah, t);
+
+	/*
+	 * Unpin the ring:
+	 */
+	return sys_munlock((unsigned long)completion_ring, ring_size_bytes);
+}
+
+/*
+ * Simple limit and pool management mechanism for now:
+ */
+static void refill_cachemiss_pool(struct async_head *ah)
+{
+	int pid;
+
+	if (ah->nr_threads >= ah->max_nr_threads)
+		return;
+
+	init_completion(&ah->start_done);
+
+	pid = create_async_thread(cachemiss_thread, (void *)ah,
+			   CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
+			   CLONE_PTRACE | CLONE_THREAD | CLONE_SYSVSEM);
+	if (pid < 0)
+		return;
+
+	wait_for_completion(&ah->start_done);
+}
+
+/**
+ * sys_async_wait - wait for async completion events
+ *
+ * This syscall waits for @min_wait_events syslet completion events
+ * to finish or for all async processing to finish (whichever
+ * comes first).
+ */
+asmlinkage long sys_async_wait(unsigned long min_wait_events)
+{
+	struct async_head *ah = current->ah;
+
+	if (!ah)
+		return -EINVAL;
+
+	if (min_wait_events) {
+		spin_lock(&ah->lock);
+		ah->events_left = min_wait_events;
+		spin_unlock(&ah->lock);
+	}
+
+	return wait_event_interruptible(ah->wait,
+		list_empty(&ah->busy_async_threads) || !ah->events_left);
+}
+
+/**
+ * sys_async_exec - execute a syslet.
+ *
+ * returns the uatom that was last executed, if the kernel was able to
+ * execute the syslet synchronously, or NULL if the syslet became
+ * asynchronous. (in the latter case syslet completion will be notified
+ * via the completion ring)
+ *
+ * (Various errors might also be returned via the usual negative numbers.)
+ */
+asmlinkage struct syslet_uatom __user *
+sys_async_exec(struct syslet_uatom __user *uatom)
+{
+	struct syslet_uatom __user *ret;
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	struct async_thread at;
+
+	if (unlikely(!ah))
+		return ERR_PTR(-EINVAL);
+
+	if (list_empty(&ah->ready_async_threads))
+		refill_cachemiss_pool(ah);
+
+	t->async_ready = &at;
+	ret = exec_atom(ah, t, uatom);
+
+	if (t->ah) {
+		WARN_ON(!t->async_ready);
+		t->async_ready = NULL;
+		return ret;
+	}
+	ret = ERR_PTR(-EINTR);
+	if (!at.exit && !signal_pending(t)) {
+		set_task_state(t, TASK_INTERRUPTIBLE);
+		mark_async_thread_ready(&at, ah);
+		cachemiss_loop(&at, ah, t);
+	}
+	if (t->ah)
+		return NULL;
+	else
+		do_exit(0);
+}
+
+/*
+ * fork()-time initialization:
+ */
+void async_init(struct task_struct *t)
+{
+	t->at = NULL;
+	t->async_ready = NULL;
+	t->ah = NULL;
+}
+
+/*
+ * do_exit()-time cleanup:
+ */
+void async_exit(struct task_struct *t)
+{
+	struct async_thread *at = t->at;
+	struct async_head *ah = t->ah;
+
+	WARN_ON(at && ah);
+	WARN_ON(t->async_ready);
+
+	if (unlikely(at))
+		async_thread_exit(at, t);
+
+	if (unlikely(ah))
+		async_head_exit(ah, t);
+}
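
A minimal user-space sketch of the atom chaining rules implemented by
run_next_atom() and next_uatom() in the patch above. The demo_* names and
the DEMO_* flag values are assumptions made for this illustration only;
the real SYSLET_* constants live in the syslet header, which is not shown
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define DEMO_STOP_ON_NONZERO		0x01UL
#define DEMO_STOP_ON_ZERO		0x02UL
#define DEMO_STOP_ON_NEGATIVE		0x04UL
#define DEMO_STOP_ON_NON_POSITIVE	0x08UL
#define DEMO_STOP_MASK			0x0fUL
#define DEMO_SKIP_TO_NEXT_ON_STOP	0x10UL

/* Hypothetical stand-in for the flags/next part of an atom. */
struct demo_atom {
	unsigned long flags;
	struct demo_atom *next;
};

/* Returns 1 if the stop condition encoded in flags is false for ret. */
static int demo_run_next(unsigned long flags, long ret)
{
	switch (flags & DEMO_STOP_MASK) {
	case DEMO_STOP_ON_NONZERO:
		return ret == 0;
	case DEMO_STOP_ON_ZERO:
		return ret != 0;
	case DEMO_STOP_ON_NEGATIVE:
		return ret >= 0;
	case DEMO_STOP_ON_NON_POSITIVE:
		return ret > 0;
	}
	/* No stop flag set: always continue. */
	return 1;
}

/* Picks the atom to execute after 'atom' returned 'ret', or NULL to stop. */
static struct demo_atom *demo_next(struct demo_atom *atom, long ret)
{
	assert(atom != NULL);

	/* Stop condition false: follow the explicit next pointer. */
	if (demo_run_next(atom->flags, ret))
		return atom->next;
	/* Stop condition true, but skip requested: take the adjacent atom. */
	if (atom->flags & DEMO_SKIP_TO_NEXT_ON_STOP)
		return atom + 1;

	return NULL;
}
```

With an atom whose next pointer refers back to itself and whose flags are
DEMO_STOP_ON_NEGATIVE | DEMO_SKIP_TO_NEXT_ON_STOP, the atom repeats for as
long as the modelled syscall returns a non-negative value, and control
falls through to the atom placed directly after it once the return value
goes negative. That is the loop-plus-branch construct the next_uatom()
comment refers to.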


* [patch 06/11] syslets: core, documentation
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (70 preceding siblings ...)
  2007-02-13 14:20 ` [patch 05/11] syslets: core code Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
  2007-02-13 20:18   ` Davide Libenzi
  2007-02-14 10:36   ` Russell King
  2007-02-13 14:20 ` [patch 07/11] syslets: x86, add create_async_thread() method Ingo Molnar
       [not found] ` <20061213130211.GT21847@elte.hu>
  73 siblings, 2 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add Documentation/syslet-design.txt with a high-level description
of the syslet concepts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/syslet-design.txt |  137 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

Index: linux/Documentation/syslet-design.txt
===================================================================
--- /dev/null
+++ linux/Documentation/syslet-design.txt
@@ -0,0 +1,137 @@
+Syslets / asynchronous system calls
+===================================
+
+started by Ingo Molnar <mingo@redhat.com>
+
+Goal:
+-----
+
+The goal of the syslet subsystem is to allow user-space to execute
+arbitrary system calls asynchronously. It does so by allowing user-space
+to execute "syslets" which are small scriptlets that the kernel can execute
+both securely and asynchronously without having to exit to user-space.
+
+The core syslet concepts are:
+
+The Syslet Atom:
+----------------
+
+The syslet atom is a small, fixed-size (44 bytes on 32-bit) piece of
+user-space memory, which is the basic unit of execution within the syslet
+framework. A syslet atom represents a single system-call and its arguments.
+In addition it also has condition flags attached to it that allow the
+construction of larger programs (syslets) from these atoms.
+
+Arguments to the system call are implemented via pointers to arguments.
+This not only increases the flexibility of syslet atoms (multiple syslets
+can share the same variable for example), but is also an optimization:
+copy_uatom() will only fetch syscall parameters up until the point it
+meets the first NULL pointer. 50% of all syscalls have 2 or fewer
+parameters (and 90% of all syscalls have 4 or fewer parameters).
+
+ [ Note: since the argument array is at the end of the atom, and the
+   kernel will not touch any argument beyond the final NULL one, atoms
+   might be packed more tightly. (the only special case exception to
+   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+   jump a full syslet_uatom number of bytes.) ]
+
+The Syslet:
+-----------
+
+A syslet is a program, represented by a graph of syslet atoms. The
+syslet atoms are chained to each other either via the atom->next pointer,
+or via the SYSLET_SKIP_TO_NEXT_ON_STOP flag.
+
+Running Syslets:
+----------------
+
+Syslets can be run via the sys_async_exec() system call, which takes
+the first atom of the syslet as an argument. The kernel does not need
+to be told about the other atoms - it will fetch them on the fly as
+execution goes forward.
+
+A syslet might either be executed 'cached', or it might generate a
+'cachemiss'.
+
+'Cached' syslet execution means that the whole syslet was executed
+without blocking. The system-call returns the address of the last
+executed atom in this case.
+
+If a syslet blocks while the kernel executes a system-call embedded in
+one of its atoms, the kernel will keep working on that syscall in
+parallel, but it immediately returns to user-space with a NULL pointer,
+so the submitting task can submit other syslets.
+
+Completion of asynchronous syslets:
+-----------------------------------
+
+Completion of asynchronous syslets is done via the 'completion ring',
+which is a ringbuffer of syslet atom pointers in user-space memory,
+provided by user-space in the sys_async_register() syscall. The
+kernel fills in the ringbuffer starting at index 0, and user-space
+must clear out these pointers. Once the kernel reaches the end of
+the ring it wraps back to index 0. The kernel will not overwrite
+non-NULL pointers (it returns an error instead), so user-space has to
+make sure it processes all completion events it asked for.
+
+Waiting for completions:
+------------------------
+
+Syslet completions can be waited for via the sys_async_wait()
+system call - which takes the number of events it should wait for as
+a parameter. This system call will also return if the number of
+pending events goes down to zero.
+
+Sample Hello World syslet code:
+
+--------------------------->
+/*
+ * Set up a syslet atom:
+ */
+static void
+init_atom(struct syslet_uatom *atom, int nr,
+	  void *arg_ptr0, void *arg_ptr1, void *arg_ptr2,
+	  void *arg_ptr3, void *arg_ptr4, void *arg_ptr5,
+	  void *ret_ptr, unsigned long flags, struct syslet_uatom *next)
+{
+	atom->nr = nr;
+	atom->arg_ptr[0] = arg_ptr0;
+	atom->arg_ptr[1] = arg_ptr1;
+	atom->arg_ptr[2] = arg_ptr2;
+	atom->arg_ptr[3] = arg_ptr3;
+	atom->arg_ptr[4] = arg_ptr4;
+	atom->arg_ptr[5] = arg_ptr5;
+	atom->ret_ptr = ret_ptr;
+	atom->flags = flags;
+	atom->next = next;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long int fd_out = 1; /* standard output */
+	char *buf = "Hello Syslet World!\n";
+	unsigned long size = strlen(buf);
+	struct syslet_uatom atom, *done;
+
+	async_head_init();
+
+	/*
+	 * Simple syslet consisting of a single atom:
+	 */
+	init_atom(&atom, __NR_sys_write, &fd_out, &buf, &size,
+		  NULL, NULL, NULL, NULL, SYSLET_ASYNC, NULL);
+	done = sys_async_exec(&atom);
+	if (!done) {
+		sys_async_wait(1);
+		if (completion_ring[curr_ring_idx] == &atom) {
+			completion_ring[curr_ring_idx] = NULL;
+			printf("completed an async syslet atom!\n");
+		}
+	} else {
+		printf("completed a cached syslet atom!\n");
+	}
+
+	async_head_exit();
+
+	return 0;
+}
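
A minimal user-space model of the completion ring protocol described in
the document above: the kernel side fills slots in order, wraps at the
end, and refuses to overwrite a non-NULL slot, while user-space clears
each slot it has consumed. The demo_* names, the ring size and the
separate user-side index are assumptions made for this sketch; the
-EFAULT return mirrors what complete_uatom() does in the core patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_RING_SLOTS 4	/* hypothetical ring size */

struct demo_ring {
	void *slot[DEMO_RING_SLOTS];
	unsigned long kernel_idx;	/* next slot the "kernel" fills */
	unsigned long user_idx;		/* next slot user-space reaps */
};

/* Kernel side: publish one completed atom without stomping on old ones. */
static int demo_ring_complete(struct demo_ring *r, void *atom)
{
	assert(atom != NULL);

	/* Slot still occupied: user-space has not cleared it yet. */
	if (r->slot[r->kernel_idx] != NULL)
		return -EFAULT;

	r->slot[r->kernel_idx] = atom;
	if (++r->kernel_idx == DEMO_RING_SLOTS)
		r->kernel_idx = 0;

	return 0;
}

/* User side: take one completion and clear its slot, or return NULL. */
static void *demo_ring_reap(struct demo_ring *r)
{
	void *atom = r->slot[r->user_idx];

	if (atom == NULL)
		return NULL;

	r->slot[r->user_idx] = NULL;
	if (++r->user_idx == DEMO_RING_SLOTS)
		r->user_idx = 0;

	return atom;
}
```

In this model, a fifth completion on a four-slot ring fails until the
consumer has reaped at least one entry. That is why the ring has to be at
least as large as the number of completion events that can be
outstanding at any one time.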


* [patch 07/11] syslets: x86, add create_async_thread() method
  2006-05-29 21:21 [patch 00/61] ANNOUNCE: lock validator -V1 Ingo Molnar
                   ` (71 preceding siblings ...)
  2007-02-13 14:20 ` [patch 06/11] syslets: core, documentation Ingo Molnar
@ 2007-02-13 14:20 ` Ingo Molnar
       [not found] ` <20061213130211.GT21847@elte.hu>
  73 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the create_async_thread() way of creating kernel threads:
these threads first execute a kernel function and when they
return from it they execute user-space.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/entry.S     |   25 +++++++++++++++++++++++++
 arch/i386/kernel/process.c   |   31 +++++++++++++++++++++++++++++++
 include/asm-i386/processor.h |    5 +++++
 3 files changed, 61 insertions(+)

Index: linux/arch/i386/kernel/entry.S
===================================================================
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -996,6 +996,31 @@ ENTRY(kernel_thread_helper)
 	CFI_ENDPROC
 ENDPROC(kernel_thread_helper)
 
+ENTRY(async_thread_helper)
+	CFI_STARTPROC
+	/*
+	 * Allocate space on the stack for pt-regs.
+	 * sizeof(struct pt_regs) == 64, and we've got 8 bytes on the
+	 * kernel stack already:
+	 */
+	subl $64-8, %esp
+	CFI_ADJUST_CFA_OFFSET 64
+	movl %edx,%eax
+	push %edx
+	CFI_ADJUST_CFA_OFFSET 4
+	call *%ebx
+	addl $4, %esp
+	CFI_ADJUST_CFA_OFFSET -4
+
+	movl %eax, PT_EAX(%esp)
+
+	GET_THREAD_INFO(%ebp)
+
+	jmp syscall_exit
+	CFI_ENDPROC
+ENDPROC(async_thread_helper)
+
+
 .section .rodata,"a"
 #include "syscall_table.S"
 
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -352,6 +352,37 @@ int kernel_thread(int (*fn)(void *), voi
 EXPORT_SYMBOL(kernel_thread);
 
 /*
+ * This gets run with %ebx containing the
+ * function to call, and %edx containing
+ * the "args".
+ */
+extern void async_thread_helper(void);
+
+/*
+ * Create an async thread
+ */
+int create_async_thread(int (*fn)(void *), void * arg, unsigned long flags)
+{
+	struct pt_regs regs;
+
+	memset(&regs, 0, sizeof(regs));
+
+	regs.ebx = (unsigned long) fn;
+	regs.edx = (unsigned long) arg;
+
+	regs.xds = __USER_DS;
+	regs.xes = __USER_DS;
+	regs.xgs = __KERNEL_PDA;
+	regs.orig_eax = -1;
+	regs.eip = (unsigned long) async_thread_helper;
+	regs.xcs = __KERNEL_CS | get_kernel_rpl();
+	regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
+
+	/* Ok, create the new task.. */
+	return do_fork(flags | CLONE_VM, 0, &regs, 0, NULL, NULL);
+}
+
+/*
  * Free current thread data structures etc..
  */
 void exit_thread(void)
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -468,6 +468,11 @@ extern void prepare_to_copy(struct task_
  */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
+/*
+ * create an async thread:
+ */
+extern int create_async_thread(int (*fn)(void *), void * arg, unsigned long flags);
+
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
 void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long *stack);
 


* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 15:00   ` Alan
@ 2007-02-13 14:58     ` Benjamin LaHaise
  2007-02-13 15:09       ` Arjan van de Ven
                         ` (3 more replies)
  2007-02-13 15:46     ` Dmitry Torokhov
                       ` (2 subsequent siblings)
  3 siblings, 4 replies; 319+ messages in thread
From: Benjamin LaHaise @ 2007-02-13 14:58 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Thomas Gleixner

On Tue, Feb 13, 2007 at 03:00:19PM +0000, Alan wrote:
> > Open issues:
> 
> Let me add some more

Also: FPU state (especially important with the FPU and SSE memory copy 
variants), segment register bases on x86-64, interaction with set_fs()...  
There is no easy way of getting around the full thread context switch and 
its associated overhead (mucking around in CR0 is one of the more expensive 
bits of the context switch code path, and at the very least, setting the FPU 
not present is mandatory).  I have looked into exactly this approach, and 
it's only cheaper if the code is incomplete.  Linux's native threads are 
pretty damned good.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.


* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 14:20 ` [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support Ingo Molnar
@ 2007-02-13 15:00   ` Alan
  2007-02-13 14:58     ` Benjamin LaHaise
                       ` (3 more replies)
  2007-02-13 20:22   ` Davide Libenzi
                     ` (5 subsequent siblings)
  6 siblings, 4 replies; 319+ messages in thread
From: Alan @ 2007-02-13 15:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

> A syslet is executed opportunistically: i.e. the syslet subsystem 
> assumes that the syslet will not block, and it will switch to a 
> cachemiss kernel thread from the scheduler. This means that even a 

How is scheduler fairness maintained ? and what is done for resource
accounting here ?

> that the kernel fills and user-space clears. Waiting is done via the 
> sys_async_wait() system call. Completion can be suppressed on a per-atom

They should be selectable as well iff possible.

> Open issues:

Let me add some more

	sys_setuid/gid/etc need to be synchronous only and not occur
while other async syscalls are running in parallel to meet current kernel
assumptions.

	sys_exec and other security boundaries must be synchronous only
and not allow async "spill over" (consider setuid async binary patching)

>  - sys_fork() and sys_async_exec() should be filtered out from the 
>    syscalls that are allowed - first one only makes sense with ptregs, 

clone and vfork. async_vfork is a real mindbender actually.

>    second one is a nice kernel recursion thing :) I didnt want to 
>    duplicate the sys_call_table though - maybe others have a better 
>    idea.

What are the semantics of async sys_async_wait and async sys_async ?



* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 14:58     ` Benjamin LaHaise
@ 2007-02-13 15:09       ` Arjan van de Ven
  2007-02-13 16:24       ` bert hubert
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 319+ messages in thread
From: Arjan van de Ven @ 2007-02-13 15:09 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Alan, Ingo Molnar, linux-kernel, Linus Torvalds,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Thomas Gleixner

On Tue, 2007-02-13 at 09:58 -0500, Benjamin LaHaise wrote:
> On Tue, Feb 13, 2007 at 03:00:19PM +0000, Alan wrote:
> > > Open issues:
> > 
> > Let me add some more
> 
> Also: FPU state (especially important with the FPU and SSE memory copy 
> variants)

are these preserved over explicit system calls? 
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 15:00   ` Alan
  2007-02-13 14:58     ` Benjamin LaHaise
@ 2007-02-13 15:46     ` Dmitry Torokhov
  2007-02-13 20:39       ` Ingo Molnar
  2007-02-13 16:39     ` Andi Kleen
  2007-02-13 16:42     ` Ingo Molnar
  3 siblings, 1 reply; 319+ messages in thread
From: Dmitry Torokhov @ 2007-02-13 15:46 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On 2/13/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:
> > A syslet is executed opportunistically: i.e. the syslet subsystem
> > assumes that the syslet will not block, and it will switch to a
> > cachemiss kernel thread from the scheduler. This means that even a
>
> How is scheduler fairness maintained ? and what is done for resource
> accounting here ?
>
> > that the kernel fills and user-space clears. Waiting is done via the
> > sys_async_wait() system call. Completion can be suppressed on a per-atom
>
> They should be selectable as well iff possible.
>
> > Open issues:
>
> Let me add some more
>
>        sys_setuid/gid/etc need to be synchronous only and not occur
> while other async syscalls are running in parallel to meet current kernel
> assumptions.
>
>        sys_exec and other security boundaries must be synchronous only
> and not allow async "spill over" (consider setuid async binary patching)
>
> >  - sys_fork() and sys_async_exec() should be filtered out from the
> >    syscalls that are allowed - first one only makes sense with ptregs,
>
> clone and vfork. async_vfork is a real mindbender actually.
>
> >    second one is a nice kernel recursion thing :) I didnt want to
> >    duplicate the sys_call_table though - maybe others have a better
> >    idea.
>
> What are the semantics of async sys_async_wait and async sys_async ?
>

Ooooohh. OpenVMS lives forever ;) Me likeee ;)

-- 
Dmitry


* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 14:58     ` Benjamin LaHaise
  2007-02-13 15:09       ` Arjan van de Ven
@ 2007-02-13 16:24       ` bert hubert
  2007-02-13 16:56       ` Ingo Molnar
  2007-02-13 20:34       ` Ingo Molnar
  3 siblings, 0 replies; 319+ messages in thread
From: bert hubert @ 2007-02-13 16:24 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Alan, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On Tue, Feb 13, 2007 at 09:58:48AM -0500, Benjamin LaHaise wrote:

> not present is mandatory).  I have looked into exactly this approach, and 
> it's only cheaper if the code is incomplete.  Linux's native threads are 
> pretty damned good.

Cheaper in time or in memory? Iow, would you be able to queue up as many
threads as syslets?

	Bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services


* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 16:39     ` Andi Kleen
@ 2007-02-13 16:26       ` Linus Torvalds
  2007-02-13 17:03         ` Ingo Molnar
  2007-02-13 20:26         ` Davide Libenzi
  2007-02-13 16:49       ` Ingo Molnar
  1 sibling, 2 replies; 319+ messages in thread
From: Linus Torvalds @ 2007-02-13 16:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan, Ingo Molnar, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner



On Tue, 13 Feb 2007, Andi Kleen wrote:

> > 	sys_exec and other security boundaries must be synchronous only
> > and not allow async "spill over" (consider setuid async binary patching)
> 
> He probably would need some generalization of Andrea's seccomp work.
> Perhaps using bitmaps? For paranoia I would suggest to white list, not black list
> calls.

It's actually more likely a lot more efficient to let the system call 
itself do the sanity checking. That allows the common system calls (that 
*don't* need to even check) to just not do anything at all, instead of 
having some complex logic in the common system call execution trying to 
figure out for each system call whether it is ok or not.

Ie, we could just add to "do_fork()" (which is where all of the 
vfork/clone/fork cases end up) a simple case like

	err = wait_async_context();
	if (err)
		return err;

or

	if (in_async_context())
		return -EINVAL;

or similar. We need that "async_context()" function anyway for the other 
cases where we can't do other things concurrently, like changing the UID.

I would suggest that "wait_async_context()" would do:

 - if we are *in* an async context, return an error. We cannot wait for 
   ourselves!
 - if we are the "real thread", wait for all async contexts to go away 
   (and since we are the real thread, no new ones will be created, so this 
   is not going to be an infinite wait)

The new thing would be that wait_async_context() would possibly return 
-ERESTARTSYS (signal while an async context was executing), so any system 
call that does this would possibly return EINTR. Which "fork()" hasn't 
historically done. But if you have async events active, some operations 
likely cannot be done (setuid() and execve() come to mind), so you really 
do need something like this.

And obviously it would only affect any program that actually would _use_ 
any of the suggested new interfaces, so it's not like a new error return 
would break anything old.
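
A minimal user-space sketch of the two rules above, assuming a hypothetical
per-task structure (task_sketch and do_fork_sketch are illustrative names,
not taken from the patch set). Returning -EINVAL for the "in an async
context" case is an assumption, and -EINTR stands in for the kernel-internal
-ERESTARTSYS:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state; these names are not from the patch set. */
struct task_sketch {
	bool in_async;		/* running inside an async context? */
	int busy_async;		/* async contexts still executing */
	bool signal_pending;	/* a signal arrived while waiting */
};

static bool in_async_context(const struct task_sketch *t)
{
	return t->in_async;
}

static int wait_async_context(struct task_sketch *t)
{
	/* Rule 1: an async context cannot wait for itself. */
	if (in_async_context(t))
		return -EINVAL;

	/* Rule 2: the real thread drains all outstanding async work. */
	while (t->busy_async > 0) {
		/* The kernel would return -ERESTARTSYS; user space sees EINTR. */
		if (t->signal_pending)
			return -EINTR;
		t->busy_async--;	/* stand-in for "one context completed" */
	}
	assert(t->busy_async == 0);
	return 0;
}

/* Stand-in for the check at the top of do_fork(). */
static int do_fork_sketch(struct task_sketch *t)
{
	int err = wait_async_context(t);

	if (err)
		return err;
	return 0;	/* the real fork work would happen here */
}
```

The wait is bounded because only the real thread can submit new async work,
and it is the one sleeping here.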

		Linus

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 15:00   ` Alan
  2007-02-13 14:58     ` Benjamin LaHaise
  2007-02-13 15:46     ` Dmitry Torokhov
@ 2007-02-13 16:39     ` Andi Kleen
  2007-02-13 16:26       ` Linus Torvalds
  2007-02-13 16:49       ` Ingo Molnar
  2007-02-13 16:42     ` Ingo Molnar
  3 siblings, 2 replies; 319+ messages in thread
From: Andi Kleen @ 2007-02-13 16:39 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

Alan <alan@lxorguk.ukuu.org.uk> writes:

Funny, it sounds like batch() on steroids @) Ok, with an async context it becomes
somewhat more interesting.
 
> 	sys_setuid/gid/etc need to be synchronous only and not occur
> while other async syscalls are running in parallel to meet current kernel
> assumptions.
> 
> 	sys_exec and other security boundaries must be synchronous only
> and not allow async "spill over" (consider setuid async binary patching)

He probably would need some generalization of Andrea's seccomp work.
Perhaps using bitmaps? For paranoia I would suggest to white list, not black list
calls.

-Andi

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 15:00   ` Alan
                       ` (2 preceding siblings ...)
  2007-02-13 16:39     ` Andi Kleen
@ 2007-02-13 16:42     ` Ingo Molnar
  3 siblings, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 16:42 UTC (permalink / raw)
  To: Alan
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner


* Alan <alan@lxorguk.ukuu.org.uk> wrote:

> > A syslet is executed opportunistically: i.e. the syslet subsystem 
> > assumes that the syslet will not block, and it will switch to a 
> > cachemiss kernel thread from the scheduler. This means that even a
> 
> How is scheduler fairness maintained ? and what is done for resource 
> accounting here ?

the async threads are as if the user created user-space threads - and 
it's accounted (and scheduled) accordingly.

> > that the kernel fills and user-space clears. Waiting is done via the 
> > sys_async_wait() system call. Completion can be suppressed on a 
> > per-atom
> 
> They should be selectable as well iff possible.

basically arbitrary notification interfaces are supported. For example 
if you add a sys_kill() call as the last syslet atom then this will 
notify any waiter in sigwait().

or if you want to select(), just do it in the fds that you are 
interested in, and the write that the syslet does triggers select() 
completion.

but the fastest one will be by using syslets: to just check the 
notification ring pointer in user-space, and then call into 
sys_async_wait() if the ring is empty.

I just noticed a small bug here: sys_async_wait() should also take the 
ring index userspace checked as a second parameter, and fix up the 
number of events it waits for with the delta between the ring index the 
kernel maintains and the ring index user-space has. The patch below 
fixes this bug.
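
The index arithmetic can be sketched in isolation. This is a simplified
user-space illustration, not the kernel code: events_still_needed() and
should_keep_waiting() are hypothetical helpers, and both ring indices are
assumed to be free-running counters rather than values reduced modulo the
ring size:

```c
#include <stdbool.h>

/*
 * How many more completions must arrive before the waiter may return.
 * A result <= 0 means the request was already satisfied by completions
 * that landed after user-space last looked at the ring.
 */
static long events_still_needed(unsigned long min_wait_events,
				unsigned long kernel_idx,
				unsigned long user_idx)
{
	/* Unsigned subtraction gives the right delta even across wraparound. */
	unsigned long already_done = kernel_idx - user_idx;

	/* Assumes both counts are small enough to fit in a signed long. */
	return (long)min_wait_events - (long)already_done;
}

/* Sleep only while async work is outstanding and events are still owed. */
static bool should_keep_waiting(unsigned long busy_threads, long events_left)
{
	return busy_threads != 0 && events_left > 0;
}
```

For example, a caller that asks for 2 events after having seen index 6, while
the kernel is already at index 10, gets 2 - 4 = -2 and does not sleep at all.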

> > Open issues:
> 
> Let me add some more
> 
> 	sys_setuid/gid/etc need to be synchronous only and not occur 
> while other async syscalls are running in parallel to meet current 
> kernel assumptions.

these should probably be taken out of the 'async syscall table', along 
with fork and the async syscalls themselves.

> 	sys_exec and other security boundaries must be synchronous 
> only and not allow async "spill over" (consider setuid async binary 
> patching)

i've tested sys_exec() and it seems to work, but i might have missed 
some corner-cases. (And what you raise is not academic, it might even 
make sense to do it, in the vfork() way.)

> >  - sys_fork() and sys_async_exec() should be filtered out from the 
> >    syscalls that are allowed - first one only makes sense with ptregs, 
> 
> clone and vfork. async_vfork is a real mindbender actually.

yeah. Also, create_module() perhaps. I'm starting to lean towards an 
async_syscall_table[]. At which point we could reduce the max syslet 
parameter count to 4, and do those few 5 and 6 parameter syscalls (of 
which only splice() and futex() truly matter i suspect) via wrappers. 
This would fit a syslet atom into 32 bytes on x86. Hm?

> >    second one is a nice kernel recursion thing :) I didnt want to 
> >    duplicate the sys_call_table though - maybe others have a better 
> >    idea.
> 
> What are the semantics of async sys_async_wait and async sys_async ?

agreed, that should be forbidden too.

	Ingo

---------------------->
---
 kernel/async.c |   12 +++++++++---
 kernel/async.h |    2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/async.c
===================================================================
--- linux.orig/kernel/async.c
+++ linux/kernel/async.c
@@ -721,7 +721,8 @@ static void refill_cachemiss_pool(struct
  * to finish or for all async processing to finish (whichever
  * comes first).
  */
-asmlinkage long sys_async_wait(unsigned long min_wait_events)
+asmlinkage long
+sys_async_wait(unsigned long min_wait_events, unsigned long user_curr_ring_idx)
 {
 	struct async_head *ah = current->ah;
 
@@ -730,12 +731,17 @@ asmlinkage long sys_async_wait(unsigned 
 
 	if (min_wait_events) {
 		spin_lock(&ah->lock);
-		ah->events_left = min_wait_events;
+		/*
+		 * Account any completions that happened since user-space
+		 * checked the ring:
+	 	 */
+		ah->events_left = min_wait_events -
+				(ah->curr_ring_idx - user_curr_ring_idx);
 		spin_unlock(&ah->lock);
 	}
 
 	return wait_event_interruptible(ah->wait,
-		list_empty(&ah->busy_async_threads) || !ah->events_left);
+		list_empty(&ah->busy_async_threads) || ah->events_left <= 0);
 }
 
 /**
Index: linux/kernel/async.h
===================================================================
--- linux.orig/kernel/async.h
+++ linux/kernel/async.h
@@ -26,7 +26,7 @@ struct async_head {
 	struct list_head			ready_async_threads;
 	struct list_head			busy_async_threads;
 
-	unsigned long				events_left;
+	long					events_left;
 	wait_queue_head_t			wait;
 
 	struct async_head_user	__user		*uah;

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 16:39     ` Andi Kleen
  2007-02-13 16:26       ` Linus Torvalds
@ 2007-02-13 16:49       ` Ingo Molnar
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 16:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner


* Andi Kleen <andi@firstfloor.org> wrote:

> > 	sys_exec and other security boundaries must be synchronous 
> > only and not allow async "spill over" (consider setuid async binary 
> > patching)
> 
> He probably would need some generalization of Andrea's seccomp work. 
> Perhaps using bitmaps? For paranoia I would suggest to white list, not 
> black list calls.

what i've implemented in my tree is sys_async_call_table[] which is a 
copy of sys_call_table[] with certain entries modified (by architecture 
level code, not by kernel/async.c) to sys_ni_syscall(). It's up to the 
architecture to decide which syscalls are allowed.

but i could use a bitmap too - whatever linear construct. [ I'm not sure 
there's much connection to seccomp - seccomp uses a NULL terminated 
whitelist - while syslets would use most of the entries (and would not 
want to have the overhead of checking a blacklist). ]
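
A toy user-space model of the table approach, under stated assumptions: the
two-entry table, the sketch_* handlers and the NR_*_SKETCH numbers are all
invented for illustration, and only the idea (copy the table, then point
forbidden slots at a stub that returns -ENOSYS, as sys_ni_syscall() does)
comes from the description above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn_t)(void);

/* Invented syscall numbers for the sketch. */
enum { NR_GETPID_SKETCH, NR_FORK_SKETCH, NR_SKETCH_MAX };

static long sketch_getpid(void) { return 42; }
static long sketch_fork(void) { return 43; }
static long sketch_ni_syscall(void) { return -ENOSYS; }

/* The ordinary table: every call is allowed. */
static const syscall_fn_t call_table[NR_SKETCH_MAX] = {
	[NR_GETPID_SKETCH] = sketch_getpid,
	[NR_FORK_SKETCH]   = sketch_fork,
};

/* The async table: a copy with selected entries disabled. */
static syscall_fn_t async_call_table[NR_SKETCH_MAX];

static void init_async_call_table(void)
{
	size_t i;

	for (i = 0; i < NR_SKETCH_MAX; i++)
		async_call_table[i] = call_table[i];

	/* Which entries to disable would be an architecture-level decision. */
	async_call_table[NR_FORK_SKETCH] = sketch_ni_syscall;
}

/* Dispatch costs one bounds check and one indirect call, no per-call filter. */
static long async_dispatch(unsigned int nr)
{
	if (nr >= NR_SKETCH_MAX)
		return -ENOSYS;
	assert(async_call_table[nr] != NULL);
	return async_call_table[nr]();
}
```

The cost of the restriction is paid once, when the table is built, which is
why no blacklist lookup is needed on the dispatch path.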

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 14:58     ` Benjamin LaHaise
  2007-02-13 15:09       ` Arjan van de Ven
  2007-02-13 16:24       ` bert hubert
@ 2007-02-13 16:56       ` Ingo Molnar
  2007-02-13 18:56         ` Evgeniy Polyakov
  2007-02-13 20:34       ` Ingo Molnar
  3 siblings, 1 reply; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 16:56 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Alan, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Thomas Gleixner


* Benjamin LaHaise <bcrl@kvack.org> wrote:

> > > Open issues:
> > 
> > Let me add some more
> 
> Also: FPU state (especially important with the FPU and SSE memory copy 
> variants), segment register bases on x86-64, interaction with 
> set_fs()...

agreed - i'll fix this. But i can see no big conceptual issue here - 
these resources are all attached to the user context, and that doesnt 
change upon an 'async context-switch'. So it's "only" a matter of 
properly separating the user execution context from the kernel execution 
context. The hardest bit was getting the ptregs details right - the 
FPU/SSE state is pretty much async already (in the hardware too) and 
isnt even touched by any of these codepaths.

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 16:26       ` Linus Torvalds
@ 2007-02-13 17:03         ` Ingo Molnar
  2007-02-13 20:26         ` Davide Libenzi
  1 sibling, 0 replies; 319+ messages in thread
From: Ingo Molnar @ 2007-02-13 17:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Alan, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Benjamin LaHaise,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ie, we could just add to "do_fork()" (which is where all of the 
> vfork/clone/fork cases end up) a simple case like
> 
> 	err = wait_async_context();
> 	if (err)
> 		return err;
> 
> or
> 
> 	if (in_async_context())
> 		return -EINVAL;

ok, this is a much nicer solution. I've scrapped the 
sys_async_sys_call_table[] thing.

	Ingo

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 16:56       ` Ingo Molnar
@ 2007-02-13 18:56         ` Evgeniy Polyakov
  2007-02-13 19:12           ` Evgeniy Polyakov
  2007-02-13 22:18           ` Ingo Molnar
  0 siblings, 2 replies; 319+ messages in thread
From: Evgeniy Polyakov @ 2007-02-13 18:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Benjamin LaHaise, Alan, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On Tue, Feb 13, 2007 at 05:56:42PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Benjamin LaHaise <bcrl@kvack.org> wrote:
> 
> > > > Open issues:
> > > 
> > > Let me add some more
> > 
> > Also: FPU state (especially important with the FPU and SSE memory copy 
> > variants), segment register bases on x86-64, interaction with 
> > set_fs()...
> 
> agreed - i'll fix this. But i can see no big conceptual issue here - 
> these resources are all attached to the user context, and that doesnt 
> change upon an 'async context-switch'. So it's "only" a matter of 
> properly separating the user execution context from the kernel execution 
> context. The hardest bit was getting the ptregs details right - the 
> FPU/SSE state is pretty much async already (in the hardware too) and 
> isnt even touched by any of these codepaths.

Good work, Ingo.

I have not received first mail with announcement yet, so I will place 
my thoughts here if you do not mind.

First one is per-thread data like TID. What about TLS related kernel
data (is non-exec stack property stored in TLS block or in kernel)?
Should it be copied with regs too (or better introduce new clone flag,
which would force that info copy)?

Btw, is SSE?/MMX?/call-it-yourself state really saved on context switch?
As far as I can see, no syscalls (or the kernel at all) use those registers.

Another one is more global AIO question - while this approach IMHO
outperforms micro-thread design (Zach and Linus created really good
starting points, but they too have fundamental limiting factor), it
still has a problem - syscall blocks and the same thread thus is not
allowed to continue execution and fill the pipe - so what if system
issues thousands of requests and there are only tens of working thread
at most. What Tux did, as far as I recall, (and some other similar 
state machines do :) was to break blocking syscall issues and return
to the next execution entity (next syslet or atom). Is it possible to
extend exactly this state machine and interface to allow that (so that
some other state machine implementations would not continue its life :)?

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 00/11] ANNOUNCE: "Syslets", generic asynchronous system call support
  2007-02-13 18:56         ` Evgeniy Polyakov
@ 2007-02-13 19:12           ` Evgeniy Polyakov
  2007-02-13 22:19             ` Ingo Molnar
  2007-02-13 22:18           ` Ingo Molnar
  1 sibling, 1 reply; 319+ messages in thread
From: Evgeniy Polyakov @ 2007-02-13 19:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Benjamin LaHaise, Alan, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

> I have not received first mail with announcement yet, so I will place 
> my thoughts here if you do not mind.

An issue with sys_async_wait():
is it possible that events_left will be set up too late, so that all
events are already ready and thus sys_async_wait() can wait forever
(or until next $sys_async_wait are ready)?

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 319+ messages in thread

* Re: [patch 02/11] syslets: add syslet.h include file, user API/ABI definitions
  2007-02-13 14:20 ` [patch 02/11] syslets: add syslet.h include file, user API/ABI definitions Ingo Molnar
@ 2007-02-13 20:17   ` Indan Zupancic
  2007-02-13 21:43     ` Ingo Molnar
  2007-02-19  0:22   ` Paul Mackerras
  1 sibling, 1 reply; 319+ messages in thread
From: Indan Zupancic @ 2007-02-13 20:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper

On Tue, February 13, 2007 15:20, Ingo Molnar wrote:
> +/*
> + * Execution control: conditions upon the return code
> + * of the previous syslet atom. 'Stop' means syslet
> + *