LKML Archive on lore.kernel.org
* [git pull] core kernel fixes
@ 2008-10-30 23:29 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-10-30 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Suresh Siddha (1):
      resources: fix x86info results ioremap.c:226 __ioremap_caller+0xf2/0x2d6() WARNINGs


 kernel/resource.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 7fec0e4..6aac5c6 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -17,6 +17,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/device.h>
+#include <linux/pfn.h>
 #include <asm/io.h>
 
 
@@ -849,7 +850,8 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 			continue;
 		if (p->end < addr)
 			continue;
-		if (p->start <= addr && (p->end >= addr + size - 1))
+		if (PFN_DOWN(p->start) <= PFN_DOWN(addr) &&
+		    PFN_DOWN(p->end) >= PFN_DOWN(addr + size - 1))
 			continue;
 		printk(KERN_WARNING "resource map sanity check conflict: "
 		       "0x%llx 0x%llx 0x%llx 0x%llx %s\n",

^ permalink raw reply	[flat|nested] 97+ messages in thread
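
Note on the kernel/resource.c change above: the patch relaxes the
containment test in iomem_map_sanity_check() from byte granularity to
page-frame granularity. The userspace sketch below contrasts the two
comparisons. It is an illustration only: the 4 KiB page size and every
SKETCH_/covers_ name are assumptions made for this example, not kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
/* Same idea as the kernel's PFN_DOWN(): address -> page frame number. */
#define SKETCH_PFN_DOWN(x) ((uint64_t)(x) >> SKETCH_PAGE_SHIFT)

/* Byte-exact containment, as in the line the patch removes. */
static bool covers_exact(uint64_t start, uint64_t end,
			 uint64_t addr, uint64_t size)
{
	return start <= addr && end >= addr + size - 1;
}

/* Page-granular containment, as in the lines the patch adds. */
static bool covers_by_page(uint64_t start, uint64_t end,
			   uint64_t addr, uint64_t size)
{
	assert(size > 0);	/* addr + size - 1 must not underflow */
	return SKETCH_PFN_DOWN(start) <= SKETCH_PFN_DOWN(addr) &&
	       SKETCH_PFN_DOWN(end) >= SKETCH_PFN_DOWN(addr + size - 1);
}
```

With a resource that ends in the middle of a page (say 0x1000..0x17ff),
a whole-page mapping of 0x1000 bytes at 0x1000 fails the byte-exact test
but passes the page-granular one, so the warning is presumably no longer
printed for that case.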

* Re: [GIT PULL] core kernel fixes
  2020-01-29 11:53 [GIT PULL] " Ingo Molnar
@ 2020-01-29 19:10 ` pr-tracker-bot
  0 siblings, 0 replies; 97+ messages in thread
From: pr-tracker-bot @ 2020-01-29 19:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

The pull request you sent on Wed, 29 Jan 2020 12:53:48 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/80b60e3849bfe987801a73ebd4bab43b7b591a09

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [GIT PULL] core kernel fixes
@ 2020-01-29 11:53 Ingo Molnar
  2020-01-29 19:10 ` pr-tracker-bot
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2020-01-29 11:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 74777eaf7aef0f80276cb1c3fad5b8292c368859 Merge branch 'core/documentation' into core/urgent, to pick up single commit

Three objtool fixes, plus marking SFI as obsolete.

 Thanks,

	Ingo

------------------>
Josh Poimboeuf (1):
      objtool: Skip samples subdirectory

Lukas Bulwahn (1):
      MAINTAINERS: Mark simple firmware interface (SFI) obsolete

Olof Johansson (1):
      objtool: Silence build output

Shile Zhang (1):
      objtool: Fix ARCH=x86_64 build error


 MAINTAINERS                 | 5 +----
 samples/Makefile            | 1 +
 tools/objtool/Makefile      | 6 +-----
 tools/objtool/sync-check.sh | 2 --
 4 files changed, 3 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2019-05-18  8:51 Ingo Molnar
@ 2019-05-19 17:45 ` pr-tracker-bot
  0 siblings, 0 replies; 97+ messages in thread
From: pr-tracker-bot @ 2019-05-19 17:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Borislav Petkov, Dave Hansen, Andrew Morton

The pull request you sent on Sat, 18 May 2019 10:51:11 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1335d9a1fb2abbe5022de3c517989cc7c7161dee

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [GIT PULL] core kernel fixes
@ 2019-05-18  8:51 Ingo Molnar
  2019-05-19 17:45 ` pr-tracker-bot
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2019-05-18  8:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Dave Hansen, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 8ea58f1e8b11cca3087b294779bf5959bf89cc10 objtool: Allow AR to be overridden with HOSTAR

This fixes a particularly thorny munmap() bug with MPX, plus fixes a host 
build environment assumption in objtool.

 Thanks,

	Ingo

------------------>
Dave Hansen (1):
      x86/mpx, mm/core: Fix recursive munmap() corruption

Nathan Chancellor (1):
      objtool: Allow AR to be overridden with HOSTAR


 arch/powerpc/include/asm/mmu_context.h   |  1 -
 arch/um/include/asm/mmu_context.h        |  1 -
 arch/unicore32/include/asm/mmu_context.h |  1 -
 arch/x86/include/asm/mmu_context.h       |  6 +++---
 arch/x86/include/asm/mpx.h               | 15 ++++++++-------
 arch/x86/mm/mpx.c                        | 10 ++++++----
 include/asm-generic/mm_hooks.h           |  1 -
 mm/mmap.c                                | 15 ++++++++-------
 tools/objtool/Makefile                   |  3 ++-
 9 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 6ee8195a2ffb..4a6dd3ba0b0b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -237,7 +237,6 @@ extern void arch_exit_mmap(struct mm_struct *mm);
 #endif
 
 static inline void arch_unmap(struct mm_struct *mm,
-			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
 	if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index fca34b2177e2..9f4b4bb78120 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -22,7 +22,6 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 }
 extern void arch_exit_mmap(struct mm_struct *mm);
 static inline void arch_unmap(struct mm_struct *mm,
-			struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h
index 5c205a9cb5a6..9f06ea5466dd 100644
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -88,7 +88,6 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
-			struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 93dff1963337..9024236693d2 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -278,8 +278,8 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 	mpx_mm_init(mm);
 }
 
-static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-			      unsigned long start, unsigned long end)
+static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
+			      unsigned long end)
 {
 	/*
 	 * mpx_notify_unmap() goes and reads a rarely-hot
@@ -299,7 +299,7 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * consistently wrong.
 	 */
 	if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
-		mpx_notify_unmap(mm, vma, start, end);
+		mpx_notify_unmap(mm, start, end);
 }
 
 /*
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index d0b1434fb0b6..143a5c193ed3 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -64,12 +64,15 @@ struct mpx_fault_info {
 };
 
 #ifdef CONFIG_X86_INTEL_MPX
-int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
-int mpx_handle_bd_fault(void);
+
+extern int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
+extern int mpx_handle_bd_fault(void);
+
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
 	return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
 }
+
 static inline void mpx_mm_init(struct mm_struct *mm)
 {
 	/*
@@ -78,11 +81,10 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 	 */
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
 }
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end);
 
-unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
-		unsigned long flags);
+extern void mpx_notify_unmap(struct mm_struct *mm, unsigned long start, unsigned long end);
+extern unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, unsigned long flags);
+
 #else
 static inline int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
 {
@@ -100,7 +102,6 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 {
 }
 static inline void mpx_notify_unmap(struct mm_struct *mm,
-				    struct vm_area_struct *vma,
 				    unsigned long start, unsigned long end)
 {
 }
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index c805db6236b4..7aeb9fe2955f 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -881,9 +881,10 @@ static int mpx_unmap_tables(struct mm_struct *mm,
  * the virtual address region start...end have already been split if
  * necessary, and the 'vma' is the first vma in this range (start -> end).
  */
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long start, unsigned long end)
+void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
+		      unsigned long end)
 {
+	struct vm_area_struct *vma;
 	int ret;
 
 	/*
@@ -902,11 +903,12 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * which should not occur normally. Being strict about it here
 	 * helps ensure that we do not have an exploitable stack overflow.
 	 */
-	do {
+	vma = find_vma(mm, start);
+	while (vma && vma->vm_start < end) {
 		if (vma->vm_flags & VM_MPX)
 			return;
 		vma = vma->vm_next;
-	} while (vma && vma->vm_start < end);
+	}
 
 	ret = mpx_unmap_tables(mm, start, end);
 	if (ret)
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index 8ac4e68a12f0..6736ed2f632b 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -18,7 +18,6 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
-			struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index bd7b9f293b39..2d6a6662edb9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2735,9 +2735,17 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 		return -EINVAL;
 
 	len = PAGE_ALIGN(len);
+	end = start + len;
 	if (len == 0)
 		return -EINVAL;
 
+	/*
+	 * arch_unmap() might do unmaps itself.  It must be called
+	 * and finish any rbtree manipulation before this code
+	 * runs and also starts to manipulate the rbtree.
+	 */
+	arch_unmap(mm, start, end);
+
 	/* Find the first overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma)
@@ -2746,7 +2754,6 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	/* we have  start < vma->vm_end  */
 
 	/* if it doesn't overlap, we have nothing.. */
-	end = start + len;
 	if (vma->vm_start >= end)
 		return 0;
 
@@ -2816,12 +2823,6 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	/* Detach vmas from rbtree */
 	detach_vmas_to_be_unmapped(mm, vma, prev, end);
 
-	/*
-	 * mpx unmap needs to be called with mmap_sem held for write.
-	 * It is safe to call it before unmap_region().
-	 */
-	arch_unmap(mm, vma, start, end);
-
 	if (downgrade)
 		downgrade_write(&mm->mmap_sem);
 
diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 53f8be0f4a1f..88158239622b 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -7,11 +7,12 @@ ARCH := x86
 endif
 
 # always use the host compiler
+HOSTAR	?= ar
 HOSTCC	?= gcc
 HOSTLD	?= ld
+AR	 = $(HOSTAR)
 CC	 = $(HOSTCC)
 LD	 = $(HOSTLD)
-AR	 = ar
 
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(CURDIR)))

^ permalink raw reply	[flat|nested] 97+ messages in thread
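
Note on the mpx_notify_unmap() change above: the function no longer
receives a vma from its caller. It looks one up with find_vma() and
walks the list with a while loop, so a lookup that returns NULL is
handled before anything is dereferenced. The sketch below shows that
loop shape in userspace. struct sketch_vma, sketch_find_vma() and
range_has_mpx() are hypothetical stand-ins written for this example and
are not the kernel's types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;		/* exclusive */
	bool is_mpx;			/* stands in for vm_flags & VM_MPX */
	struct sketch_vma *vm_next;
};

/* Stand-in for find_vma(): first area with vm_end > addr, or NULL. */
static struct sketch_vma *sketch_find_vma(struct sketch_vma *head,
					  unsigned long addr)
{
	struct sketch_vma *vma;

	for (vma = head; vma; vma = vma->vm_next)
		if (vma->vm_end > addr)
			return vma;
	return NULL;
}

/* True if any area overlapping [start, end) is flagged as MPX. */
static bool range_has_mpx(struct sketch_vma *head,
			  unsigned long start, unsigned long end)
{
	struct sketch_vma *vma;

	assert(start < end);
	vma = sketch_find_vma(head, start);
	/* while, not do-while: vma may be NULL on the first test */
	while (vma && vma->vm_start < end) {
		if (vma->is_mpx)
			return true;
		vma = vma->vm_next;
	}
	return false;
}
```

A do-while version would dereference the pointer once before checking
it, which is only safe when the caller guarantees a valid first vma.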

* Re: [GIT PULL] core kernel fixes
  2019-05-16 15:51 Ingo Molnar
@ 2019-05-16 18:20 ` pr-tracker-bot
  0 siblings, 0 replies; 97+ messages in thread
From: pr-tracker-bot @ 2019-05-16 18:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Borislav Petkov, Andrew Morton

The pull request you sent on Thu, 16 May 2019 17:51:46 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b2ca74d32bba153a1507e6b7e36d3ec8a89311a1

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [GIT PULL] core kernel fixes
@ 2019-05-16 15:51 Ingo Molnar
  2019-05-16 18:20 ` pr-tracker-bot
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2019-05-16 15:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Borislav Petkov,
	Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 2decec48b0fd28ffdbf4cc684bd04e735f0839dd objtool: Fix whitelist documentation typo

A handful of objtool updates, plus a documentation addition for 
__ab_c_size().

 Thanks,

	Ingo

------------------>
Josh Poimboeuf (2):
      objtool: Don't use ignore flag for fake jumps
      objtool: Fix function fallthrough detection

Raphael Gault (1):
      objtool: Fix whitelist documentation typo

Rasmus Villemoes (1):
      overflow.h: Add comment documenting __ab_c_size()


 include/linux/overflow.h                         |  8 ++++++--
 tools/objtool/Documentation/stack-validation.txt |  2 +-
 tools/objtool/check.c                            | 11 +++++++----
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/linux/overflow.h b/include/linux/overflow.h
index 40b48e2133cb..6534a727cadb 100644
--- a/include/linux/overflow.h
+++ b/include/linux/overflow.h
@@ -278,11 +278,15 @@ static inline __must_check size_t array3_size(size_t a, size_t b, size_t c)
 	return bytes;
 }
 
-static inline __must_check size_t __ab_c_size(size_t n, size_t size, size_t c)
+/*
+ * Compute a*b+c, returning SIZE_MAX on overflow. Internal helper for
+ * struct_size() below.
+ */
+static inline __must_check size_t __ab_c_size(size_t a, size_t b, size_t c)
 {
 	size_t bytes;
 
-	if (check_mul_overflow(n, size, &bytes))
+	if (check_mul_overflow(a, b, &bytes))
 		return SIZE_MAX;
 	if (check_add_overflow(bytes, c, &bytes))
 		return SIZE_MAX;
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 3995735a878f..cd17ee022072 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -306,7 +306,7 @@ ignore it:
 
 - To skip validation of a file, add
 
-    OBJECT_FILES_NON_STANDARD_filename.o := n
+    OBJECT_FILES_NON_STANDARD_filename.o := y
 
   to the Makefile.
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index ac743a1d53ab..7325d89ccad9 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -28,6 +28,8 @@
 #include <linux/hashtable.h>
 #include <linux/kernel.h>
 
+#define FAKE_JUMP_OFFSET -1
+
 struct alternative {
 	struct list_head list;
 	struct instruction *insn;
@@ -568,7 +570,7 @@ static int add_jump_destinations(struct objtool_file *file)
 		    insn->type != INSN_JUMP_UNCONDITIONAL)
 			continue;
 
-		if (insn->ignore)
+		if (insn->ignore || insn->offset == FAKE_JUMP_OFFSET)
 			continue;
 
 		rela = find_rela_by_dest_range(insn->sec, insn->offset,
@@ -745,10 +747,10 @@ static int handle_group_alt(struct objtool_file *file,
 		clear_insn_state(&fake_jump->state);
 
 		fake_jump->sec = special_alt->new_sec;
-		fake_jump->offset = -1;
+		fake_jump->offset = FAKE_JUMP_OFFSET;
 		fake_jump->type = INSN_JUMP_UNCONDITIONAL;
 		fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
-		fake_jump->ignore = true;
+		fake_jump->func = orig_insn->func;
 	}
 
 	if (!special_alt->new_len) {
@@ -1957,7 +1959,8 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 			return 1;
 		}
 
-		func = insn->func ? insn->func->pfunc : NULL;
+		if (insn->func)
+			func = insn->func->pfunc;
 
 		if (func && insn->ignore) {
 			WARN_FUNC("BUG: why am I validating an ignored function?",

^ permalink raw reply	[flat|nested] 97+ messages in thread
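
Note on the __ab_c_size() comment added above: the helper computes
a*b+c and returns SIZE_MAX if either step overflows. A minimal userspace
sketch of the same saturating arithmetic follows. It assumes a GCC or
Clang compiler and uses their __builtin_mul_overflow() and
__builtin_add_overflow() builtins in place of the kernel's
check_mul_overflow() and check_add_overflow() macros. The name
sketch_ab_c_size() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute a*b+c, saturating to SIZE_MAX if either step overflows. */
static size_t sketch_ab_c_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	/* The builtins return true when the result does not fit. */
	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	if (__builtin_add_overflow(bytes, c, &bytes))
		return SIZE_MAX;

	return bytes;
}
```

Saturating to SIZE_MAX rather than wrapping means that a size computed
from hostile inputs and then passed to an allocator is likely to fail
the allocation instead of producing an undersized buffer.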

* [GIT PULL] core kernel fixes
@ 2018-07-21 11:58 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2018-07-21 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 092b31aa2048cf7561a39697974adcd147fbb27b x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling

These are mostly the copy_to_user_mcsafe()-related fixes from Dan Williams,
plus an ORC fix for Clang.

 Thanks,

	Ingo

------------------>
Dan Williams (4):
      lib/iov_iter: Document _copy_to_iter_mcsafe()
      lib/iov_iter: Document _copy_to_iter_flushcache()
      lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
      x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling

Simon Ser (1):
      objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang


 arch/x86/Kconfig                  |  2 +-
 arch/x86/include/asm/uaccess_64.h |  7 +++-
 lib/iov_iter.c                    | 77 +++++++++++++++++++++++++++++++++++++--
 tools/objtool/elf.c               |  6 ++-
 4 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..887d3a7bb646 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -63,7 +63,7 @@ config X86
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
-	select ARCH_HAS_UACCESS_MCSAFE		if X86_64
+	select ARCH_HAS_UACCESS_MCSAFE		if X86_64 && X86_MCE
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 62acb613114b..a9d637bc301d 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -52,7 +52,12 @@ copy_to_user_mcsafe(void *to, const void *from, unsigned len)
 	unsigned long ret;
 
 	__uaccess_begin();
-	ret = memcpy_mcsafe(to, from, len);
+	/*
+	 * Note, __memcpy_mcsafe() is explicitly used since it can
+	 * handle exceptions / faults.  memcpy_mcsafe() may fall back to
+	 * memcpy() which lacks this handling.
+	 */
+	ret = __memcpy_mcsafe(to, from, len);
 	__uaccess_end();
 	return ret;
 }
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 7e43cd54c84c..8be175df3075 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -596,15 +596,70 @@ static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset,
 	return ret;
 }
 
+static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
+				struct iov_iter *i)
+{
+	struct pipe_inode_info *pipe = i->pipe;
+	size_t n, off, xfer = 0;
+	int idx;
+
+	if (!sanity(i))
+		return 0;
+
+	bytes = n = push_pipe(i, bytes, &idx, &off);
+	if (unlikely(!n))
+		return 0;
+	for ( ; n; idx = next_idx(idx, pipe), off = 0) {
+		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
+		unsigned long rem;
+
+		rem = memcpy_mcsafe_to_page(pipe->bufs[idx].page, off, addr,
+				chunk);
+		i->idx = idx;
+		i->iov_offset = off + chunk - rem;
+		xfer += chunk - rem;
+		if (rem)
+			break;
+		n -= chunk;
+		addr += chunk;
+	}
+	i->count -= xfer;
+	return xfer;
+}
+
+/**
+ * _copy_to_iter_mcsafe - copy to user with source-read error exception handling
+ * @addr: source kernel address
+ * @bytes: total transfer length
+ * @iter: destination iterator
+ *
+ * The pmem driver arranges for filesystem-dax to use this facility via
+ * dax_copy_to_iter() for protecting read/write to persistent memory.
+ * Unless / until an architecture can guarantee identical performance
+ * between _copy_to_iter_mcsafe() and _copy_to_iter() it would be a
+ * performance regression to switch more users to the mcsafe version.
+ *
+ * Otherwise, the main differences between this and typical _copy_to_iter() are:
+ *
+ * * Typical tail/residue handling after a fault retries the copy
+ *   byte-by-byte until the fault happens again. Re-triggering machine
+ *   checks is potentially fatal so the implementation uses source
+ *   alignment and poison alignment assumptions to avoid re-triggering
+ *   hardware exceptions.
+ *
+ * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies.
+ *   Compare to copy_to_iter() where only ITER_IOVEC attempts might return
+ *   a short copy.
+ *
+ * See MCSAFE_TEST for self-test.
+ */
 size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	const char *from = addr;
 	unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
 
-	if (unlikely(i->type & ITER_PIPE)) {
-		WARN_ON(1);
-		return 0;
-	}
+	if (unlikely(i->type & ITER_PIPE))
+		return copy_pipe_to_iter_mcsafe(addr, bytes, i);
 	if (iter_is_iovec(i))
 		might_fault();
 	iterate_and_advance(i, bytes, v,
@@ -701,6 +756,20 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL(_copy_from_iter_nocache);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/**
+ * _copy_from_iter_flushcache - write destination through cpu cache
+ * @addr: destination kernel address
+ * @bytes: total transfer length
+ * @iter: source iterator
+ *
+ * The pmem driver arranges for filesystem-dax to use this facility via
+ * dax_copy_from_iter() for ensuring that writes to persistent memory
+ * are flushed through the CPU cache. It is differentiated from
+ * _copy_from_iter_nocache() in that it guarantees all data is flushed for
+ * all iterator types. The _copy_from_iter_nocache() only attempts to
+ * bypass the cache for the ITER_IOVEC case, and on some archs may use
+ * instructions that strand dirty-data in the cache.
+ */
 size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 0d1acb704f64..7ec85d567598 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -519,10 +519,12 @@ struct section *elf_create_section(struct elf *elf, const char *name,
 	sec->sh.sh_flags = SHF_ALLOC;
 
 
-	/* Add section name to .shstrtab */
+	/* Add section name to .shstrtab (or .strtab for Clang) */
 	shstrtab = find_section_by_name(elf, ".shstrtab");
+	if (!shstrtab)
+		shstrtab = find_section_by_name(elf, ".strtab");
 	if (!shstrtab) {
-		WARN("can't find .shstrtab section");
+		WARN("can't find .shstrtab or .strtab section");
 		return NULL;
 	}
 

^ permalink raw reply	[flat|nested] 97+ messages in thread
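
Note on copy_pipe_to_iter_mcsafe() above: the loop copies one chunk at a
time, adds "chunk - rem" to the running total, and stops at the first
chunk that comes back short. The userspace sketch below follows the same
accounting. The faulting copy is simulated: sketch_copy() and its
"poison" offset are hypothetical stand-ins for memcpy_mcsafe_to_page()
and a machine-check location, and SKETCH_CHUNK stands in for PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK 8		/* stands in for PAGE_SIZE */

/*
 * Simulated faulting copy: source bytes at offset >= poison cannot be
 * read. Returns the number of bytes NOT copied (the "rem" convention).
 */
static size_t sketch_copy(char *dst, const char *src, size_t len,
			  size_t src_off, size_t poison)
{
	size_t ok = 0;

	if (src_off < poison)
		ok = (poison - src_off < len) ? poison - src_off : len;
	memcpy(dst, src, ok);
	return len - ok;
}

/* Copy in chunks; stop at the first short chunk and report the total. */
static size_t sketch_copy_chunked(char *dst, const char *src,
				  size_t bytes, size_t poison)
{
	size_t n = bytes, off = 0, xfer = 0;

	while (n) {
		size_t chunk = n < SKETCH_CHUNK ? n : SKETCH_CHUNK;
		size_t rem = sketch_copy(dst + off, src + off, chunk,
					 off, poison);

		xfer += chunk - rem;	/* count only what really landed */
		if (rem)
			break;		/* do not retry past the fault */
		n -= chunk;
		off += chunk;
	}
	assert(xfer <= bytes);
	return xfer;
}
```

The caller gets a short count back rather than an error code, which
matches the patch's documentation that this path can return short
copies.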

* [GIT PULL] core kernel fixes
@ 2017-12-06 22:01 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2017-12-06 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Josh Poimboeuf,
	Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 14c47b54b0d9389e3ca0718e805cdd90c5a4303a objtool: Fix 64-bit build on 32-bit host

Two fixes:

 - objtool cross-build fixes
 - removal of an obsolete CPU-hotplug state name from comments.

 Thanks,

	Ingo

------------------>
Brendan Jackman (1):
      cpu/hotplug: Fix state name in takedown_cpu() comment

Mikulas Patocka (1):
      objtool: Fix 64-bit build on 32-bit host


 kernel/cpu.c             | 4 ++--
 tools/objtool/Makefile   | 8 +++++---
 tools/objtool/orc_dump.c | 7 ++++---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 04892a82f6ac..2a885c5f2429 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -780,8 +780,8 @@ static int takedown_cpu(unsigned int cpu)
 	BUG_ON(cpu_online(cpu));
 
 	/*
-	 * The CPUHP_AP_SCHED_MIGRATE_DYING callback will have removed all
-	 * runnable tasks from the cpu, there's only the idle task left now
+	 * The teardown callback for CPUHP_AP_SCHED_STARTING will have removed
+	 * all runnable tasks from the CPU, there's only the idle task left now
 	 * that the migration thread is done doing the stop_machine thing.
 	 *
 	 * Wait for the stop thread to go away.
diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 0f94af3ccaaa..ae0272f9a091 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -7,9 +7,11 @@ ARCH := x86
 endif
 
 # always use the host compiler
-CC = gcc
-LD = ld
-AR = ar
+HOSTCC	?= gcc
+HOSTLD	?= ld
+CC	 = $(HOSTCC)
+LD	 = $(HOSTLD)
+AR	 = ar
 
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(CURDIR)))
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index 36c5bf6a2675..c3343820916a 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -76,7 +76,8 @@ int orc_dump(const char *_objname)
 	int fd, nr_entries, i, *orc_ip = NULL, orc_size = 0;
 	struct orc_entry *orc = NULL;
 	char *name;
-	unsigned long nr_sections, orc_ip_addr = 0;
+	size_t nr_sections;
+	Elf64_Addr orc_ip_addr = 0;
 	size_t shstrtab_idx;
 	Elf *elf;
 	Elf_Scn *scn;
@@ -187,10 +188,10 @@ int orc_dump(const char *_objname)
 				return -1;
 			}
 
-			printf("%s+%lx:", name, rela.r_addend);
+			printf("%s+%llx:", name, (unsigned long long)rela.r_addend);
 
 		} else {
-			printf("%lx:", orc_ip_addr + (i * sizeof(int)) + orc_ip[i]);
+			printf("%llx:", (unsigned long long)(orc_ip_addr + (i * sizeof(int)) + orc_ip[i]));
 		}
 
 

^ permalink raw reply	[flat|nested] 97+ messages in thread
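
Note on the orc_dump.c change above: the patch stops printing 64-bit ELF
values with "%lx", since unsigned long is only 32 bits wide on a 32-bit
build host, and instead casts to unsigned long long and prints with
"%llx". A small userspace sketch of that pattern follows.
sketch_format_addr() is a made-up helper for this example, and uint64_t
stands in for Elf64_Addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit address as hex followed by ':'.
 * The explicit cast makes the argument match "%llx" on every host,
 * whatever the width of unsigned long happens to be.
 */
static int sketch_format_addr(char *buf, size_t len, uint64_t addr)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%llx:", (unsigned long long)addr);
}
```

An alternative with the same effect is the PRIx64 macro from
<inttypes.h>. The cast form shown here is the one the patch uses.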

* Re: [GIT PULL] core kernel fixes
  2017-11-05 20:12     ` Linus Torvalds
@ 2017-11-05 21:01       ` Josh Poimboeuf
  0 siblings, 0 replies; 97+ messages in thread
From: Josh Poimboeuf @ 2017-11-05 21:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Sun, Nov 05, 2017 at 12:12:31PM -0800, Linus Torvalds wrote:
> On Sun, Nov 5, 2017 at 11:53 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > The GCC manual says:
> >
> >   "asm statements that have no output operands, including asm goto
> >    statements, are implicitly volatile."
> 
> Hmm. Fair enough.
> 
> And the manual does say that it can merge and duplicate those asms
> (and suggests using "%=" to generate a unique number, but I guess
> "%c0" with __COUNTER__ is equivalent).
> 
> I think the gcc manual has changed. I'm pretty certain it used to say
> that "volatile" asms would not be "moved significantly". They've
> silently changed semantics before too, oh well.

I had tried the '%=' thing before, because that was exactly what I
needed.  But alas, it's not supported by the older GCCs.

-- 
Josh

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2017-11-05 19:53   ` Josh Poimboeuf
@ 2017-11-05 20:12     ` Linus Torvalds
  2017-11-05 21:01       ` Josh Poimboeuf
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2017-11-05 20:12 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Sun, Nov 5, 2017 at 11:53 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> The GCC manual says:
>
>   "asm statements that have no output operands, including asm goto
>    statements, are implicitly volatile."

Hmm. Fair enough.

And the manual does say that it can merge and duplicate those asms
(and suggests using "%=" to generate a unique number, but I guess
"%c0" with __COUNTER__ is equivalent).

I think the gcc manual has changed. I'm pretty certain it used to say
that "volatile" asms would not be "moved significantly". They've
silently changed semantics before too, oh well.

              Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2017-11-05 18:09 ` Linus Torvalds
@ 2017-11-05 19:53   ` Josh Poimboeuf
  2017-11-05 20:12     ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Josh Poimboeuf @ 2017-11-05 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Sun, Nov 05, 2017 at 10:09:59AM -0800, Linus Torvalds wrote:
> On Sun, Nov 5, 2017 at 6:33 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Please note that this pull request is RFC due to the top commit:
> >
> >   ec1e1b610917: objtool: Prevent GCC from merging annotate_unreachable(), take 2
> >
> > ... which is admittedly somewhat of an ad-hoc workaround for something the
> > compiler should have done - if there's another solution we can try that.
> 
> So I'm certainly ok with that workaround since apparently "asm
> volatile" doesn't do it.
> 
> That said, I think that if that asm needs to not be merged, it should
> _also_ be marked as "volatile" - since that's the documented bit for
> "not moved significantly". Of course, then because apparently that
> isn't enough, the __COUNTER__ games are ok, but might really mention
> an explicit comment in the code as to why they exist. Because right
> now they look just odd and nonsensical.

The GCC manual says:

  "asm statements that have no output operands, including asm goto
   statements, are implicitly volatile."

Since these macros have input operands, but no output operands, I assume
they're already implicitly volatile.  But we can certainly make it
explicit.  And yes, a comment would be good.

Something like so?

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 3672353a0acd..188ed9f65517 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -88,17 +88,22 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 
 /* Unreachable code */
 #ifdef CONFIG_STACK_VALIDATION
+/*
+ * These macros help objtool understand GCC code flow for unreachable code.
+ * The __COUNTER__ based labels are a hack to make each instance of the macros
+ * unique, to convince GCC not to merge duplicate inline asm statements.
+ */
 #define annotate_reachable() ({						\
-	asm("%c0:\n\t"							\
-	    ".pushsection .discard.reachable\n\t"			\
-	    ".long %c0b - .\n\t"					\
-	    ".popsection\n\t" : : "i" (__COUNTER__));			\
+	asm volatile("%c0:\n\t"						\
+		     ".pushsection .discard.reachable\n\t"		\
+		     ".long %c0b - .\n\t"				\
+		     ".popsection\n\t" : : "i" (__COUNTER__));		\
 })
 #define annotate_unreachable() ({					\
-	asm("%c0:\n\t"							\
-	    ".pushsection .discard.unreachable\n\t"			\
-	    ".long %c0b - .\n\t"					\
-	    ".popsection\n\t" : : "i" (__COUNTER__));			\
+	asm volatile("%c0:\n\t"						\
+		     ".pushsection .discard.unreachable\n\t"		\
+		     ".long %c0b - .\n\t"				\
+		     ".popsection\n\t" : : "i" (__COUNTER__));		\
 })
 #define ASM_UNREACHABLE							\
 	"999:\n\t"							\

^ permalink raw reply	[flat|nested] 97+ messages in thread
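
Note on the __COUNTER__ labels discussed above: the macros pass
__COUNTER__ as an immediate operand so that each expansion produces
textually different assembly, which is the hack used to keep GCC from
merging the statements. The sketch below only demonstrates that each
expansion site receives a distinct constant. It does not prove anything
about GCC's merging behaviour. It assumes GCC or Clang (statement
expressions and __COUNTER__ are extensions), and the SKETCH_ and _site
names are invented for this example.

```c
#include <assert.h>

/*
 * The argument is macro-expanded once before substitution, so both
 * uses of "id" below see the same number for a given expansion site.
 * The asm template is empty: it emits no instructions and only carries
 * the constant as an input operand.
 */
#define SKETCH_ANNOTATE_ID(id) ({				\
	__asm__ __volatile__("" : : "i" (id));			\
	(id);							\
})
#define SKETCH_ANNOTATE() SKETCH_ANNOTATE_ID(__COUNTER__)

/* Two separate expansion sites, hence two different constants. */
static int first_site(void)
{
	return SKETCH_ANNOTATE();
}

static int second_site(void)
{
	return SKETCH_ANNOTATE();
}
```

The real macros in the patch additionally emit a label and a
.discard.* section entry from the asm template, which this sketch leaves
out to stay portable.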

* Re: [GIT PULL] core kernel fixes
  2017-11-05 14:33 Ingo Molnar
@ 2017-11-05 18:09 ` Linus Torvalds
  2017-11-05 19:53   ` Josh Poimboeuf
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2017-11-05 18:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Josh Poimboeuf, Andrew Morton

On Sun, Nov 5, 2017 at 6:33 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Please note that this pull request is RFC due to the top commit:
>
>   ec1e1b610917: objtool: Prevent GCC from merging annotate_unreachable(), take 2
>
> ... which is admittedly somewhat of an ad-hoc workaround for something the
> compiler should have done - if there's another solution we can try that.

So I'm certainly ok with that workaround since apparently "asm
volatile" doesn't do it.

That said, I think that if that asm needs to not be merged, it should
_also_ be marked as "volatile" - since that's the documented bit for
"not moved significantly". Of course, then because apparently that
isn't enough, the __COUNTER__ games are ok, but they really deserve
an explicit comment in the code as to why they exist. Because right
now they just look odd and nonsensical.

               Linus


* [GIT PULL] core kernel fixes
@ 2017-11-05 14:33 Ingo Molnar
  2017-11-05 18:09 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2017-11-05 14:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Josh Poimboeuf,
	Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: ec1e1b6109171d1890a437481c35b2b56d2327b8 objtool: Prevent GCC from merging annotate_unreachable(), take 2

Please note that this pull request is RFC due to the top commit:

  ec1e1b610917: objtool: Prevent GCC from merging annotate_unreachable(), take 2

... which is admittedly somewhat of an ad-hoc workaround for something the 
compiler should have done - if there's another solution we can try that.

The other changes:

 - futex race fixes

 - objtool build warning fix

 - two watchdog fixes: a crash fix (revert) and a /proc/sys/kernel/watchdog_thresh 
   handling bug fix.

 Thanks,

	Ingo

------------------>
Don Zickus (1):
      watchdog/hardlockup/perf: Use atomics to track in-use cpu counter

Josh Poimboeuf (2):
      objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
      objtool: Prevent GCC from merging annotate_unreachable(), take 2

Peter Zijlstra (1):
      futex: Fix more put_pi_state() vs. exit_pi_state_list() races

Thomas Gleixner (1):
      watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")


 include/linux/compiler.h                          |  4 ++--
 kernel/futex.c                                    | 23 ++++++++++++++++++++---
 kernel/watchdog_hld.c                             | 15 ++++++++++-----
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk |  1 +
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index fd8697aa4f73..202710420d6d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -191,13 +191,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	asm("%c0:\n\t"							\
 	    ".pushsection .discard.reachable\n\t"			\
 	    ".long %c0b - .\n\t"					\
-	    ".popsection\n\t" : : "i" (__LINE__));			\
+	    ".popsection\n\t" : : "i" (__COUNTER__));			\
 })
 #define annotate_unreachable() ({					\
 	asm("%c0:\n\t"							\
 	    ".pushsection .discard.unreachable\n\t"			\
 	    ".long %c0b - .\n\t"					\
-	    ".popsection\n\t" : : "i" (__LINE__));			\
+	    ".popsection\n\t" : : "i" (__COUNTER__));			\
 })
 #define ASM_UNREACHABLE							\
 	"999:\n\t"							\
diff --git a/kernel/futex.c b/kernel/futex.c
index 0d638f008bb1..76ed5921117a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -903,11 +903,27 @@ void exit_pi_state_list(struct task_struct *curr)
 	 */
 	raw_spin_lock_irq(&curr->pi_lock);
 	while (!list_empty(head)) {
-
 		next = head->next;
 		pi_state = list_entry(next, struct futex_pi_state, list);
 		key = pi_state->key;
 		hb = hash_futex(&key);
+
+		/*
+		 * We can race against put_pi_state() removing itself from the
+		 * list (a waiter going away). put_pi_state() will first
+		 * decrement the reference count and then modify the list, so
+		 * its possible to see the list entry but fail this reference
+		 * acquire.
+		 *
+		 * In that case; drop the locks to let put_pi_state() make
+		 * progress and retry the loop.
+		 */
+		if (!atomic_inc_not_zero(&pi_state->refcount)) {
+			raw_spin_unlock_irq(&curr->pi_lock);
+			cpu_relax();
+			raw_spin_lock_irq(&curr->pi_lock);
+			continue;
+		}
 		raw_spin_unlock_irq(&curr->pi_lock);
 
 		spin_lock(&hb->lock);
@@ -918,8 +934,10 @@ void exit_pi_state_list(struct task_struct *curr)
 		 * task still owns the PI-state:
 		 */
 		if (head->next != next) {
+			/* retain curr->pi_lock for the loop invariant */
 			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
+			put_pi_state(pi_state);
 			continue;
 		}
 
@@ -927,9 +945,8 @@ void exit_pi_state_list(struct task_struct *curr)
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
 		pi_state->owner = NULL;
-		raw_spin_unlock(&curr->pi_lock);
 
-		get_pi_state(pi_state);
+		raw_spin_unlock(&curr->pi_lock);
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 4583feb66393..e449a23e9d59 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -13,6 +13,7 @@
 #define pr_fmt(fmt) "NMI watchdog: " fmt
 
 #include <linux/nmi.h>
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
 
@@ -22,10 +23,11 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +191,8 @@ void hardlockup_detector_perf_enable(void)
 	if (hardlockup_detector_event_create())
 		return;
 
-	if (!watchdog_cpus++)
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
 	perf_event_enable(this_cpu_read(watchdog_ev));
@@ -204,8 +207,10 @@ void hardlockup_detector_perf_disable(void)
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		watchdog_cpus--;
+		atomic_dec(&watchdog_cpus);
 	}
 }
 
@@ -219,7 +224,7 @@ void hardlockup_detector_perf_cleanup(void)
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports  unconditionally
@@ -227,7 +232,7 @@ void hardlockup_detector_perf_cleanup(void)
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }
diff --git a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
index a3d2c62fd805..b02a36b2c14f 100644
--- a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
+++ b/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
@@ -1,4 +1,5 @@
 #!/bin/awk -f
+# SPDX-License-Identifier: GPL-2.0
 # gen-insn-attr-x86.awk: Instruction attribute table generator
 # Written by Masami Hiramatsu <mhiramat@redhat.com>
 #

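The futex hunk above takes its reference with atomic_inc_not_zero() and retries the loop when that fails. A minimal userspace sketch of that "get a reference unless the count already hit zero" pattern, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_t API. The names ref_get_unless_zero() and ref_put() are made up for this sketch, and the locking around the real code is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's atomic_inc_not_zero(): take a
 * reference only if the count has not already dropped to zero.
 */
bool ref_get_unless_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}

	return false;	/* the object is already on its way out */
}

/* Drop a reference; returns true when the caller dropped the last one. */
bool ref_put(atomic_int *refcount)
{
	return atomic_fetch_sub(refcount, 1) == 1;
}
```

A caller that gets false back must not touch the object; in the hunk above that is the point where the lock is dropped and the loop retried.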

* [GIT PULL] core kernel fixes
@ 2017-07-21 10:01 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2017-07-21 10:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Josh Poimboeuf

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 325cdacd03c12629aa5f9ee2ace49b1f3dc184a8 debug: Fix WARN_ON_ONCE() for modules

A fix for WARN_ON_ONCE() when called from modules, plus a MAINTAINERS update.

 Thanks,

	Ingo

------------------>
Ingo Molnar (1):
      MAINTAINERS: Update the PTRACE entry

Josh Poimboeuf (1):
      debug: Fix WARN_ON_ONCE() for modules


 MAINTAINERS                     | 6 +++++-
 arch/arm/include/asm/bug.h      | 2 +-
 arch/arm64/include/asm/bug.h    | 2 +-
 arch/blackfin/include/asm/bug.h | 4 ++--
 arch/mn10300/include/asm/bug.h  | 2 +-
 arch/parisc/include/asm/bug.h   | 6 +++---
 arch/powerpc/include/asm/bug.h  | 8 ++++----
 arch/s390/include/asm/bug.h     | 4 ++--
 arch/sh/include/asm/bug.h       | 4 ++--
 arch/x86/include/asm/bug.h      | 4 ++--
 10 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d357695ee4fe..cbe90323c35a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10355,7 +10355,6 @@ F:	drivers/ptp/*
 F:	include/linux/ptp_cl*
 
 PTRACE SUPPORT
-M:	Roland McGrath <roland@hack.frob.com>
 M:	Oleg Nesterov <oleg@redhat.com>
 S:	Maintained
 F:	include/asm-generic/syscall.h
@@ -10363,7 +10362,12 @@ F:	include/linux/ptrace.h
 F:	include/linux/regset.h
 F:	include/linux/tracehook.h
 F:	include/uapi/linux/ptrace.h
+F:	include/uapi/linux/ptrace.h
+F:	include/asm-generic/ptrace.h
 F:	kernel/ptrace.c
+F:	arch/*/ptrace*.c
+F:	arch/*/*/ptrace*.c
+F:	arch/*/include/asm/ptrace*.h
 
 PULSE8-CEC DRIVER
 M:	Hans Verkuil <hverkuil@xs4all.nl>
diff --git a/arch/arm/include/asm/bug.h b/arch/arm/include/asm/bug.h
index 4e6e88a6b2f4..2244a94ed9c9 100644
--- a/arch/arm/include/asm/bug.h
+++ b/arch/arm/include/asm/bug.h
@@ -37,7 +37,7 @@ do {								\
 		".pushsection .rodata.str, \"aMS\", %progbits, 1\n" \
 		"2:\t.asciz " #__file "\n" 			\
 		".popsection\n" 				\
-		".pushsection __bug_table,\"a\"\n"		\
+		".pushsection __bug_table,\"aw\"\n"		\
 		".align 2\n"					\
 		"3:\t.word 1b, 2b\n"				\
 		"\t.hword " #__line ", 0\n"			\
diff --git a/arch/arm64/include/asm/bug.h b/arch/arm64/include/asm/bug.h
index 366448eb0fb7..a02a57186f56 100644
--- a/arch/arm64/include/asm/bug.h
+++ b/arch/arm64/include/asm/bug.h
@@ -36,7 +36,7 @@
 #ifdef CONFIG_GENERIC_BUG
 
 #define __BUG_ENTRY(flags) 				\
-		".pushsection __bug_table,\"a\"\n\t"	\
+		".pushsection __bug_table,\"aw\"\n\t"	\
 		".align 2\n\t"				\
 	"0:	.long 1f - 0b\n\t"			\
 _BUGVERBOSE_LOCATION(__FILE__, __LINE__)		\
diff --git a/arch/blackfin/include/asm/bug.h b/arch/blackfin/include/asm/bug.h
index 8d9b1eba89c4..76b2e82ee730 100644
--- a/arch/blackfin/include/asm/bug.h
+++ b/arch/blackfin/include/asm/bug.h
@@ -21,7 +21,7 @@
 #define _BUG_OR_WARN(flags)						\
 	asm volatile(							\
 		"1:	.hword	%0\n"					\
-		"	.section __bug_table,\"a\",@progbits\n"		\
+		"	.section __bug_table,\"aw\",@progbits\n"	\
 		"2:	.long	1b\n"					\
 		"	.long	%1\n"					\
 		"	.short	%2\n"					\
@@ -38,7 +38,7 @@
 #define _BUG_OR_WARN(flags)						\
 	asm volatile(							\
 		"1:	.hword	%0\n"					\
-		"	.section __bug_table,\"a\",@progbits\n"		\
+		"	.section __bug_table,\"aw\",@progbits\n"	\
 		"2:	.long	1b\n"					\
 		"	.short	%1\n"					\
 		"	.org	2b + %2\n"				\
diff --git a/arch/mn10300/include/asm/bug.h b/arch/mn10300/include/asm/bug.h
index aa6a38886391..811414fb002d 100644
--- a/arch/mn10300/include/asm/bug.h
+++ b/arch/mn10300/include/asm/bug.h
@@ -21,7 +21,7 @@ do {								\
 	asm volatile(						\
 		"	syscall 15			\n"	\
 		"0:					\n"	\
-		"	.section __bug_table,\"a\"	\n"	\
+		"	.section __bug_table,\"aw\"	\n"	\
 		"	.long 0b,%0,%1			\n"	\
 		"	.previous			\n"	\
 		:						\
diff --git a/arch/parisc/include/asm/bug.h b/arch/parisc/include/asm/bug.h
index d2742273a685..07ea467f22fc 100644
--- a/arch/parisc/include/asm/bug.h
+++ b/arch/parisc/include/asm/bug.h
@@ -27,7 +27,7 @@
 	do {								\
 		asm volatile("\n"					\
 			     "1:\t" PARISC_BUG_BREAK_ASM "\n"		\
-			     "\t.pushsection __bug_table,\"a\"\n"	\
+			     "\t.pushsection __bug_table,\"aw\"\n"	\
 			     "2:\t" ASM_WORD_INSN "1b, %c0\n"		\
 			     "\t.short %c1, %c2\n"			\
 			     "\t.org 2b+%c3\n"				\
@@ -50,7 +50,7 @@
 	do {								\
 		asm volatile("\n"					\
 			     "1:\t" PARISC_BUG_BREAK_ASM "\n"		\
-			     "\t.pushsection __bug_table,\"a\"\n"	\
+			     "\t.pushsection __bug_table,\"aw\"\n"	\
 			     "2:\t" ASM_WORD_INSN "1b, %c0\n"		\
 			     "\t.short %c1, %c2\n"			\
 			     "\t.org 2b+%c3\n"				\
@@ -64,7 +64,7 @@
 	do {								\
 		asm volatile("\n"					\
 			     "1:\t" PARISC_BUG_BREAK_ASM "\n"		\
-			     "\t.pushsection __bug_table,\"a\"\n"	\
+			     "\t.pushsection __bug_table,\"aw\"\n"	\
 			     "2:\t" ASM_WORD_INSN "1b\n"		\
 			     "\t.short %c0\n"				\
 			     "\t.org 2b+%c1\n"				\
diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 0151af6c2a50..87fcc1948817 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -18,7 +18,7 @@
 #include <asm/asm-offsets.h>
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 .macro EMIT_BUG_ENTRY addr,file,line,flags
-	 .section __bug_table,"a"
+	 .section __bug_table,"aw"
 5001:	 PPC_LONG \addr, 5002f
 	 .short \line, \flags
 	 .org 5001b+BUG_ENTRY_SIZE
@@ -29,7 +29,7 @@
 .endm
 #else
 .macro EMIT_BUG_ENTRY addr,file,line,flags
-	 .section __bug_table,"a"
+	 .section __bug_table,"aw"
 5001:	 PPC_LONG \addr
 	 .short \flags
 	 .org 5001b+BUG_ENTRY_SIZE
@@ -42,14 +42,14 @@
    sizeof(struct bug_entry), respectively */
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 #define _EMIT_BUG_ENTRY				\
-	".section __bug_table,\"a\"\n"		\
+	".section __bug_table,\"aw\"\n"		\
 	"2:\t" PPC_LONG "1b, %0\n"		\
 	"\t.short %1, %2\n"			\
 	".org 2b+%3\n"				\
 	".previous\n"
 #else
 #define _EMIT_BUG_ENTRY				\
-	".section __bug_table,\"a\"\n"		\
+	".section __bug_table,\"aw\"\n"		\
 	"2:\t" PPC_LONG "1b\n"			\
 	"\t.short %2\n"				\
 	".org 2b+%3\n"				\
diff --git a/arch/s390/include/asm/bug.h b/arch/s390/include/asm/bug.h
index 1bbd9dbfe4e0..ce9cc123988b 100644
--- a/arch/s390/include/asm/bug.h
+++ b/arch/s390/include/asm/bug.h
@@ -14,7 +14,7 @@
 		".section .rodata.str,\"aMS\",@progbits,1\n"	\
 		"2:	.asciz	\""__FILE__"\"\n"		\
 		".previous\n"					\
-		".section __bug_table,\"a\"\n"			\
+		".section __bug_table,\"aw\"\n"			\
 		"3:	.long	1b-3b,2b-3b\n"			\
 		"	.short	%0,%1\n"			\
 		"	.org	3b+%2\n"			\
@@ -30,7 +30,7 @@
 	asm volatile(					\
 		"0:	j	0b+2\n"			\
 		"1:\n"					\
-		".section __bug_table,\"a\"\n"		\
+		".section __bug_table,\"aw\"\n"		\
 		"2:	.long	1b-2b\n"		\
 		"	.short	%0\n"			\
 		"	.org	2b+%1\n"		\
diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
index 1b77f068be2b..986c8781d89f 100644
--- a/arch/sh/include/asm/bug.h
+++ b/arch/sh/include/asm/bug.h
@@ -24,14 +24,14 @@
  */
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 #define _EMIT_BUG_ENTRY				\
-	"\t.pushsection __bug_table,\"a\"\n"	\
+	"\t.pushsection __bug_table,\"aw\"\n"	\
 	"2:\t.long 1b, %O1\n"			\
 	"\t.short %O2, %O3\n"			\
 	"\t.org 2b+%O4\n"			\
 	"\t.popsection\n"
 #else
 #define _EMIT_BUG_ENTRY				\
-	"\t.pushsection __bug_table,\"a\"\n"	\
+	"\t.pushsection __bug_table,\"aw\"\n"	\
 	"2:\t.long 1b\n"			\
 	"\t.short %O3\n"			\
 	"\t.org 2b+%O4\n"			\
diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 39e702d90cdb..aa6b2023d8f8 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -35,7 +35,7 @@
 #define _BUG_FLAGS(ins, flags)						\
 do {									\
 	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"a\"\n"			\
+		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
 		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
 		     "\t.word %c1"        "\t# bug_entry::line\n"	\
@@ -52,7 +52,7 @@ do {									\
 #define _BUG_FLAGS(ins, flags)						\
 do {									\
 	asm volatile("1:\t" ins "\n"					\
-		     ".pushsection __bug_table,\"a\"\n"			\
+		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
 		     "\t.word %c0"        "\t# bug_entry::flags\n"	\
 		     "\t.org 2b+%c1\n"					\

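Every hunk above changes the __bug_table section flags from "a" to "aw", i.e. the section becomes writable as well as allocatable. The pull text does not say why. One reading, stated here as an assumption, is that the once-only logic records its "already warned" state in the table entry itself, so the entry must live in writable memory. A userspace sketch of that idea, with made-up names and flag values (struct site_entry, FLAG_ONCE, FLAG_DONE, should_report()):

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up flag values and struct, loosely modelled on a bug table entry. */
#define FLAG_ONCE 0x1u	/* report this site at most once */
#define FLAG_DONE 0x2u	/* set after the first report */

struct site_entry {
	unsigned int line;
	unsigned int flags;
};

/*
 * Returns true if a report should be emitted for this site.  For ONCE
 * sites the handler writes FLAG_DONE back into the entry, which only
 * works if the entry lives in writable memory.
 */
bool should_report(struct site_entry *entry)
{
	if (entry->flags & FLAG_ONCE) {
		if (entry->flags & FLAG_DONE)
			return false;
		entry->flags |= FLAG_DONE;
	}

	return true;
}
```

If the table were mapped read-only, the write to entry->flags would be the faulting access in this model.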

* [GIT PULL] core kernel fixes
@ 2016-10-18 10:14 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2016-10-18 10:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Josh Poimboeuf

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: a705e07b9c80df27b6bb12f7a4cd4cf4ed2f728b cpu/hotplug: Use distinct name for cpu_hotplug.dep_map

A CPU hotplug debuggability fix and three objtool fixes for false-positive
warnings triggered by new GCC6 code generation patterns.

 Thanks,

	Ingo

------------------>
Joonas Lahtinen (1):
      cpu/hotplug: Use distinct name for cpu_hotplug.dep_map

Josh Poimboeuf (3):
      objtool: Support '-mtune=atom' stack frame setup instruction
      objtool: Improve rare switch jump table pattern detection
      objtool: Skip all "unreachable instruction" warnings for gcov kernels


 kernel/cpu.c                    |  2 +-
 tools/objtool/arch/x86/decode.c |  9 ++++++
 tools/objtool/builtin-check.c   | 68 +++++++++++++++++++++--------------------
 3 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5df20d6d1520..29de1a9352c0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -228,7 +228,7 @@ static struct {
 	.wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq),
 	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-	.dep_map = {.name = "cpu_hotplug.lock" },
+	.dep_map = STATIC_LOCKDEP_MAP_INIT("cpu_hotplug.dep_map", &cpu_hotplug.dep_map),
 #endif
 };
 
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c0c0b265e88e..b63a31be1218 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -98,6 +98,15 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 			*type = INSN_FP_SETUP;
 		break;
 
+	case 0x8d:
+		if (insn.rex_prefix.bytes &&
+		    insn.rex_prefix.bytes[0] == 0x48 &&
+		    insn.modrm.nbytes && insn.modrm.bytes[0] == 0x2c &&
+		    insn.sib.nbytes && insn.sib.bytes[0] == 0x24)
+			/* lea %(rsp), %rbp */
+			*type = INSN_FP_SETUP;
+		break;
+
 	case 0x90:
 		*type = INSN_NOP;
 		break;
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 143b6cdd7f06..4490601a9235 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -97,6 +97,19 @@ static struct instruction *next_insn_same_sec(struct objtool_file *file,
 	return next;
 }
 
+static bool gcov_enabled(struct objtool_file *file)
+{
+	struct section *sec;
+	struct symbol *sym;
+
+	list_for_each_entry(sec, &file->elf->sections, list)
+		list_for_each_entry(sym, &sec->symbol_list, list)
+			if (!strncmp(sym->name, "__gcov_.", 8))
+				return true;
+
+	return false;
+}
+
 #define for_each_insn(file, insn)					\
 	list_for_each_entry(insn, &file->insn_list, list)
 
@@ -713,6 +726,7 @@ static struct rela *find_switch_table(struct objtool_file *file,
 				      struct instruction *insn)
 {
 	struct rela *text_rela, *rodata_rela;
+	struct instruction *orig_insn = insn;
 
 	text_rela = find_rela_by_dest_range(insn->sec, insn->offset, insn->len);
 	if (text_rela && text_rela->sym == file->rodata->sym) {
@@ -733,10 +747,16 @@ static struct rela *find_switch_table(struct objtool_file *file,
 
 	/* case 3 */
 	func_for_each_insn_continue_reverse(file, func, insn) {
-		if (insn->type == INSN_JUMP_UNCONDITIONAL ||
-		    insn->type == INSN_JUMP_DYNAMIC)
+		if (insn->type == INSN_JUMP_DYNAMIC)
 			break;
 
+		/* allow small jumps within the range */
+		if (insn->type == INSN_JUMP_UNCONDITIONAL &&
+		    insn->jump_dest &&
+		    (insn->jump_dest->offset <= insn->offset ||
+		     insn->jump_dest->offset >= orig_insn->offset))
+		    break;
+
 		text_rela = find_rela_by_dest_range(insn->sec, insn->offset,
 						    insn->len);
 		if (text_rela && text_rela->sym == file->rodata->sym)
@@ -1034,34 +1054,6 @@ static int validate_branch(struct objtool_file *file,
 	return 0;
 }
 
-static bool is_gcov_insn(struct instruction *insn)
-{
-	struct rela *rela;
-	struct section *sec;
-	struct symbol *sym;
-	unsigned long offset;
-
-	rela = find_rela_by_dest_range(insn->sec, insn->offset, insn->len);
-	if (!rela)
-		return false;
-
-	if (rela->sym->type != STT_SECTION)
-		return false;
-
-	sec = rela->sym->sec;
-	offset = rela->addend + insn->offset + insn->len - rela->offset;
-
-	list_for_each_entry(sym, &sec->symbol_list, list) {
-		if (sym->type != STT_OBJECT)
-			continue;
-
-		if (offset >= sym->offset && offset < sym->offset + sym->len)
-			return (!memcmp(sym->name, "__gcov0.", 8));
-	}
-
-	return false;
-}
-
 static bool is_kasan_insn(struct instruction *insn)
 {
 	return (insn->type == INSN_CALL &&
@@ -1083,9 +1075,6 @@ static bool ignore_unreachable_insn(struct symbol *func,
 	if (insn->type == INSN_NOP)
 		return true;
 
-	if (is_gcov_insn(insn))
-		return true;
-
 	/*
 	 * Check if this (or a subsequent) instruction is related to
 	 * CONFIG_UBSAN or CONFIG_KASAN.
@@ -1146,6 +1135,19 @@ static int validate_functions(struct objtool_file *file)
 				    ignore_unreachable_insn(func, insn))
 					continue;
 
+				/*
+				 * gcov produces a lot of unreachable
+				 * instructions.  If we get an unreachable
+				 * warning and the file has gcov enabled, just
+				 * ignore it, and all other such warnings for
+				 * the file.
+				 */
+				if (!file->ignore_unreachables &&
+				    gcov_enabled(file)) {
+					file->ignore_unreachables = true;
+					continue;
+				}
+
 				WARN_FUNC("function has unreachable instruction", insn->sec, insn->offset);
 				warnings++;
 			}

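The gcov_enabled() helper added above boils down to a prefix match over symbol names. A minimal userspace sketch of that check, assuming the symbol names are available as a plain array of strings (objtool's own section and symbol lists are not modelled). has_gcov_symbol() is a made-up name; the "__gcov_." prefix is taken from the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prefix taken from the gcov_enabled() hunk above. */
#define GCOV_PREFIX "__gcov_."

/* True if any of the 'count' symbol names starts with GCOV_PREFIX. */
bool has_gcov_symbol(const char *const *names, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (!strncmp(names[i], GCOV_PREFIX, strlen(GCOV_PREFIX)))
			return true;

	return false;
}
```

Note that a name such as "__gcov0.foo", the prefix the removed is_gcov_insn() compared against, does not match the new prefix.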

* [GIT PULL] core kernel fixes
@ 2016-07-13 10:55 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2016-07-13 10:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: a7c734140aa36413944eef0f8c660e0e2256357d cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble

Fix an objtool false positive plus a UP kernel memory corruption bug on certain
configs.

 Thanks,

	Ingo

------------------>
Josh Poimboeuf (1):
      objtool: Fix STACK_FRAME_NON_STANDARD macro checking for function symbols

Thomas Gleixner (1):
      cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble


 kernel/cpu.c                  | 2 ++
 tools/objtool/builtin-check.c | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d948e44c471e..7b61887f7ccd 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1201,6 +1201,8 @@ static struct cpuhp_step cpuhp_bp_states[] = {
 		.teardown		= takedown_cpu,
 		.cant_stop		= true,
 	},
+#else
+	[CPUHP_BRINGUP_CPU] = { },
 #endif
 };
 
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index e8a1e69eb92c..25d803148f5c 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -122,10 +122,14 @@ static bool ignore_func(struct objtool_file *file, struct symbol *func)
 
 	/* check for STACK_FRAME_NON_STANDARD */
 	if (file->whitelist && file->whitelist->rela)
-		list_for_each_entry(rela, &file->whitelist->rela->rela_list, list)
-			if (rela->sym->sec == func->sec &&
+		list_for_each_entry(rela, &file->whitelist->rela->rela_list, list) {
+			if (rela->sym->type == STT_SECTION &&
+			    rela->sym->sec == func->sec &&
 			    rela->addend == func->offset)
 				return true;
+			if (rela->sym->type == STT_FUNC && rela->sym == func)
+				return true;
+		}
 
 	/* check if it has a context switching instruction */
 	func_for_each_insn(file, func, insn)

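The kernel/cpu.c hunk above adds an empty [CPUHP_BRINGUP_CPU] entry in the #else branch. The C rule it relies on is that an array with no explicit size is sized by its highest designated index. A minimal sketch with a made-up enum and tables (demo_state, short_table, full_table), using { 0 } instead of the GNU-style empty braces so it builds as plain C99:

```c
#include <assert.h>
#include <stddef.h>

/* Made-up state list; only the ordering matters for the sketch. */
enum demo_state {
	DEMO_OFFLINE,
	DEMO_PREPARE,
	DEMO_BRINGUP	/* last state that lookups may index */
};

struct demo_step {
	const char *name;
};

/* Without an entry for DEMO_BRINGUP the array stops after DEMO_PREPARE. */
struct demo_step short_table[] = {
	[DEMO_OFFLINE] = { .name = "offline" },
	[DEMO_PREPARE] = { .name = "prepare" },
};

/* An empty designated entry is enough to extend the array to cover it. */
struct demo_step full_table[] = {
	[DEMO_OFFLINE] = { .name = "offline" },
	[DEMO_PREPARE] = { .name = "prepare" },
	[DEMO_BRINGUP] = { 0 },
};

size_t short_table_len(void)
{
	return sizeof(short_table) / sizeof(short_table[0]);
}

size_t full_table_len(void)
{
	return sizeof(full_table) / sizeof(full_table[0]);
}
```

Indexing short_table[DEMO_BRINGUP] would be out of bounds, while full_table[DEMO_BRINGUP] is a valid, zeroed element.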

* [GIT PULL] core kernel fixes
@ 2016-04-03 10:45 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2016-04-03 10:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Frédéric Weisbecker

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 353def94606fda16d9ae1761b4b0583286481ec5 MAINTAINERS: Update my email address

This contains the nohz/atomic cleanup/fix for the fetch_or() ugliness you noted
during the original nohz pull request, plus there are also misc fixes:

 - fix liblockdep build bug
 - fix uapi header build bug
 - print more lockdep hash collision info to help debug recent reports of hash collisions
 - update MAINTAINERS email address
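
For reference, the shape of a compare-and-exchange based fetch_or, as a minimal userspace sketch assuming C11 <stdatomic.h> in place of the kernel's atomic_t API. demo_fetch_or() is a made-up name; C11 already provides atomic_fetch_or(), and the loop is spelled out only to show the retry structure.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * OR 'mask' into *p and return the value *p held before the update,
 * built from a compare-and-exchange retry loop.
 */
int demo_fetch_or(atomic_int *p, int mask)
{
	int old = atomic_load(p);

	/* On failure 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak(p, &old, old | mask))
		;

	return old;
}
```

Returning the old value lets a caller tell whether it was the one that set a given bit.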

 Thanks,

	Ingo

------------------>
Alfredo Alvarez Fernandez (1):
      locking/lockdep: Print chain_key collision information

Denys Vlasenko (1):
      uapi/linux/stddef.h: Provide __always_inline to userspace headers

Frederic Weisbecker (3):
      locking/atomic: Introduce atomic_fetch_or()
      timers/nohz: Convert tick dependency mask to atomic_t
      locking/atomic, sched: Unexport fetch_or()

Masami Hiramatsu (1):
      MAINTAINERS: Update my email address

Sedat Dilek (1):
      tools/lib/lockdep: Fix unsupported 'basename -s' in run_tests.sh


 MAINTAINERS                    |  2 +-
 include/linux/atomic.h         | 34 +++++++++---------
 include/linux/sched.h          |  4 +--
 include/uapi/linux/stddef.h    |  4 +++
 kernel/locking/lockdep.c       | 79 ++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/core.c            | 18 ++++++++++
 kernel/time/tick-sched.c       | 61 ++++++++++++++++----------------
 kernel/time/tick-sched.h       |  2 +-
 tools/lib/lockdep/run_tests.sh | 12 ++++---
 9 files changed, 158 insertions(+), 58 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 378ebfff2d1f..ed121a5b9319 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6402,7 +6402,7 @@ KPROBES
 M:	Ananth N Mavinakayanahalli <ananth@in.ibm.com>
 M:	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
 M:	"David S. Miller" <davem@davemloft.net>
-M:	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+M:	Masami Hiramatsu <mhiramat@kernel.org>
 S:	Maintained
 F:	Documentation/kprobes.txt
 F:	include/linux/kprobes.h
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index df4f369254c0..506c3531832e 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -559,25 +559,25 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 #endif
 
 /**
- * fetch_or - perform *ptr |= mask and return old value of *ptr
- * @ptr: pointer to value
- * @mask: mask to OR on the value
- *
- * cmpxchg based fetch_or, macro so it works for different integer types
+ * atomic_fetch_or - perform *p |= mask and return old value of *p
+ * @p: pointer to atomic_t
+ * @mask: mask to OR on the atomic_t
  */
-#ifndef fetch_or
-#define fetch_or(ptr, mask)						\
-({	typeof(*(ptr)) __old, __val = *(ptr);				\
-	for (;;) {							\
-		__old = cmpxchg((ptr), __val, __val | (mask));		\
-		if (__old == __val)					\
-			break;						\
-		__val = __old;						\
-	}								\
-	__old;								\
-})
-#endif
+#ifndef atomic_fetch_or
+static inline int atomic_fetch_or(atomic_t *p, int mask)
+{
+	int old, val = atomic_read(p);
+
+	for (;;) {
+		old = atomic_cmpxchg(p, val, val | mask);
+		if (old == val)
+			break;
+		val = old;
+	}
 
+	return old;
+}
+#endif
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 60bba7e032dc..52c4847b05e2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -720,7 +720,7 @@ struct signal_struct {
 	struct task_cputime cputime_expires;
 
 #ifdef CONFIG_NO_HZ_FULL
-	unsigned long tick_dep_mask;
+	atomic_t tick_dep_mask;
 #endif
 
 	struct list_head cpu_timers[3];
@@ -1549,7 +1549,7 @@ struct task_struct {
 #endif
 
 #ifdef CONFIG_NO_HZ_FULL
-	unsigned long tick_dep_mask;
+	atomic_t tick_dep_mask;
 #endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	u64 start_time;		/* monotonic time in nsec */
diff --git a/include/uapi/linux/stddef.h b/include/uapi/linux/stddef.h
index aa9f10428743..621fa8ac4425 100644
--- a/include/uapi/linux/stddef.h
+++ b/include/uapi/linux/stddef.h
@@ -1 +1,5 @@
 #include <linux/compiler.h>
+
+#ifndef __always_inline
+#define __always_inline inline
+#endif
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 53ab2f85d77e..2324ba5310db 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2000,6 +2000,77 @@ static inline int get_first_held_lock(struct task_struct *curr,
 }
 
 /*
+ * Returns the next chain_key iteration
+ */
+static u64 print_chain_key_iteration(int class_idx, u64 chain_key)
+{
+	u64 new_chain_key = iterate_chain_key(chain_key, class_idx);
+
+	printk(" class_idx:%d -> chain_key:%016Lx",
+		class_idx,
+		(unsigned long long)new_chain_key);
+	return new_chain_key;
+}
+
+static void
+print_chain_keys_held_locks(struct task_struct *curr, struct held_lock *hlock_next)
+{
+	struct held_lock *hlock;
+	u64 chain_key = 0;
+	int depth = curr->lockdep_depth;
+	int i;
+
+	printk("depth: %u\n", depth + 1);
+	for (i = get_first_held_lock(curr, hlock_next); i < depth; i++) {
+		hlock = curr->held_locks + i;
+		chain_key = print_chain_key_iteration(hlock->class_idx, chain_key);
+
+		print_lock(hlock);
+	}
+
+	print_chain_key_iteration(hlock_next->class_idx, chain_key);
+	print_lock(hlock_next);
+}
+
+static void print_chain_keys_chain(struct lock_chain *chain)
+{
+	int i;
+	u64 chain_key = 0;
+	int class_id;
+
+	printk("depth: %u\n", chain->depth);
+	for (i = 0; i < chain->depth; i++) {
+		class_id = chain_hlocks[chain->base + i];
+		chain_key = print_chain_key_iteration(class_id + 1, chain_key);
+
+		print_lock_name(lock_classes + class_id);
+		printk("\n");
+	}
+}
+
+static void print_collision(struct task_struct *curr,
+			struct held_lock *hlock_next,
+			struct lock_chain *chain)
+{
+	printk("\n");
+	printk("======================\n");
+	printk("[chain_key collision ]\n");
+	print_kernel_ident();
+	printk("----------------------\n");
+	printk("%s/%d: ", current->comm, task_pid_nr(current));
+	printk("Hash chain already cached but the contents don't match!\n");
+
+	printk("Held locks:");
+	print_chain_keys_held_locks(curr, hlock_next);
+
+	printk("Locks in cached chain:");
+	print_chain_keys_chain(chain);
+
+	printk("\nstack backtrace:\n");
+	dump_stack();
+}
+
+/*
  * Checks whether the chain and the current held locks are consistent
  * in depth and also in content. If they are not it most likely means
  * that there was a collision during the calculation of the chain_key.
@@ -2014,14 +2085,18 @@ static int check_no_collision(struct task_struct *curr,
 
 	i = get_first_held_lock(curr, hlock);
 
-	if (DEBUG_LOCKS_WARN_ON(chain->depth != curr->lockdep_depth - (i - 1)))
+	if (DEBUG_LOCKS_WARN_ON(chain->depth != curr->lockdep_depth - (i - 1))) {
+		print_collision(curr, hlock, chain);
 		return 0;
+	}
 
 	for (j = 0; j < chain->depth - 1; j++, i++) {
 		id = curr->held_locks[i].class_idx - 1;
 
-		if (DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base + j] != id))
+		if (DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base + j] != id)) {
+			print_collision(curr, hlock, chain);
 			return 0;
+		}
 	}
 #endif
 	return 1;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8465eeab8b3..8b489fcac37b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -321,6 +321,24 @@ static inline void init_hrtick(void)
 }
 #endif	/* CONFIG_SCHED_HRTICK */
 
+/*
+ * cmpxchg based fetch_or, macro so it works for different integer types
+ */
+#define fetch_or(ptr, mask)						\
+	({								\
+		typeof(ptr) _ptr = (ptr);				\
+		typeof(mask) _mask = (mask);				\
+		typeof(*_ptr) _old, _val = *_ptr;			\
+									\
+		for (;;) {						\
+			_old = cmpxchg(_ptr, _val, _val | _mask);	\
+			if (_old == _val)				\
+				break;					\
+			_val = _old;					\
+		}							\
+	_old;								\
+})
+
 #if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
 /*
  * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 084b79f5917e..58e3310c9b21 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -157,52 +157,50 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 cpumask_var_t tick_nohz_full_mask;
 cpumask_var_t housekeeping_mask;
 bool tick_nohz_full_running;
-static unsigned long tick_dep_mask;
+static atomic_t tick_dep_mask;
 
-static void trace_tick_dependency(unsigned long dep)
+static bool check_tick_dependency(atomic_t *dep)
 {
-	if (dep & TICK_DEP_MASK_POSIX_TIMER) {
+	int val = atomic_read(dep);
+
+	if (val & TICK_DEP_MASK_POSIX_TIMER) {
 		trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
-		return;
+		return true;
 	}
 
-	if (dep & TICK_DEP_MASK_PERF_EVENTS) {
+	if (val & TICK_DEP_MASK_PERF_EVENTS) {
 		trace_tick_stop(0, TICK_DEP_MASK_PERF_EVENTS);
-		return;
+		return true;
 	}
 
-	if (dep & TICK_DEP_MASK_SCHED) {
+	if (val & TICK_DEP_MASK_SCHED) {
 		trace_tick_stop(0, TICK_DEP_MASK_SCHED);
-		return;
+		return true;
 	}
 
-	if (dep & TICK_DEP_MASK_CLOCK_UNSTABLE)
+	if (val & TICK_DEP_MASK_CLOCK_UNSTABLE) {
 		trace_tick_stop(0, TICK_DEP_MASK_CLOCK_UNSTABLE);
+		return true;
+	}
+
+	return false;
 }
 
 static bool can_stop_full_tick(struct tick_sched *ts)
 {
 	WARN_ON_ONCE(!irqs_disabled());
 
-	if (tick_dep_mask) {
-		trace_tick_dependency(tick_dep_mask);
+	if (check_tick_dependency(&tick_dep_mask))
 		return false;
-	}
 
-	if (ts->tick_dep_mask) {
-		trace_tick_dependency(ts->tick_dep_mask);
+	if (check_tick_dependency(&ts->tick_dep_mask))
 		return false;
-	}
 
-	if (current->tick_dep_mask) {
-		trace_tick_dependency(current->tick_dep_mask);
+	if (check_tick_dependency(&current->tick_dep_mask))
 		return false;
-	}
 
-	if (current->signal->tick_dep_mask) {
-		trace_tick_dependency(current->signal->tick_dep_mask);
+	if (check_tick_dependency(&current->signal->tick_dep_mask))
 		return false;
-	}
 
 	return true;
 }
@@ -259,12 +257,12 @@ static void tick_nohz_full_kick_all(void)
 	preempt_enable();
 }
 
-static void tick_nohz_dep_set_all(unsigned long *dep,
+static void tick_nohz_dep_set_all(atomic_t *dep,
 				  enum tick_dep_bits bit)
 {
-	unsigned long prev;
+	int prev;
 
-	prev = fetch_or(dep, BIT_MASK(bit));
+	prev = atomic_fetch_or(dep, BIT(bit));
 	if (!prev)
 		tick_nohz_full_kick_all();
 }
@@ -280,7 +278,7 @@ void tick_nohz_dep_set(enum tick_dep_bits bit)
 
 void tick_nohz_dep_clear(enum tick_dep_bits bit)
 {
-	clear_bit(bit, &tick_dep_mask);
+	atomic_andnot(BIT(bit), &tick_dep_mask);
 }
 
 /*
@@ -289,12 +287,12 @@ void tick_nohz_dep_clear(enum tick_dep_bits bit)
  */
 void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 {
-	unsigned long prev;
+	int prev;
 	struct tick_sched *ts;
 
 	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
-	prev = fetch_or(&ts->tick_dep_mask, BIT_MASK(bit));
+	prev = atomic_fetch_or(&ts->tick_dep_mask, BIT(bit));
 	if (!prev) {
 		preempt_disable();
 		/* Perf needs local kick that is NMI safe */
@@ -313,7 +311,7 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
 {
 	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
-	clear_bit(bit, &ts->tick_dep_mask);
+	atomic_andnot(BIT(bit), &ts->tick_dep_mask);
 }
 
 /*
@@ -331,7 +329,7 @@ void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
 
 void tick_nohz_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit)
 {
-	clear_bit(bit, &tsk->tick_dep_mask);
+	atomic_andnot(BIT(bit), &tsk->tick_dep_mask);
 }
 
 /*
@@ -345,7 +343,7 @@ void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
 
 void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
 {
-	clear_bit(bit, &sig->tick_dep_mask);
+	atomic_andnot(BIT(bit), &sig->tick_dep_mask);
 }
 
 /*
@@ -366,7 +364,8 @@ void __tick_nohz_task_switch(void)
 	ts = this_cpu_ptr(&tick_cpu_sched);
 
 	if (ts->tick_stopped) {
-		if (current->tick_dep_mask || current->signal->tick_dep_mask)
+		if (atomic_read(&current->tick_dep_mask) ||
+		    atomic_read(&current->signal->tick_dep_mask))
 			tick_nohz_full_kick();
 	}
 out:
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index eb4e32566a83..bf38226e5c17 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -60,7 +60,7 @@ struct tick_sched {
 	u64				next_timer;
 	ktime_t				idle_expires;
 	int				do_timer_last;
-	unsigned long			tick_dep_mask;
+	atomic_t			tick_dep_mask;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);
diff --git a/tools/lib/lockdep/run_tests.sh b/tools/lib/lockdep/run_tests.sh
index 5334ad9d39b7..1069d96248c1 100755
--- a/tools/lib/lockdep/run_tests.sh
+++ b/tools/lib/lockdep/run_tests.sh
@@ -3,7 +3,7 @@
 make &> /dev/null
 
 for i in `ls tests/*.c`; do
-	testname=$(basename -s .c "$i")
+	testname=$(basename "$i" .c)
 	gcc -o tests/$testname -pthread -lpthread $i liblockdep.a -Iinclude -D__USE_LIBLOCKDEP &> /dev/null
 	echo -ne "$testname... "
 	if [ $(timeout 1 ./tests/$testname | wc -l) -gt 0 ]; then
@@ -11,11 +11,13 @@ for i in `ls tests/*.c`; do
 	else
 		echo "FAILED!"
 	fi
-	rm tests/$testname
+	if [ -f "tests/$testname" ]; then
+		rm tests/$testname
+	fi
 done
 
 for i in `ls tests/*.c`; do
-	testname=$(basename -s .c "$i")
+	testname=$(basename "$i" .c)
 	gcc -o tests/$testname -pthread -lpthread -Iinclude $i &> /dev/null
 	echo -ne "(PRELOAD) $testname... "
 	if [ $(timeout 1 ./lockdep ./tests/$testname | wc -l) -gt 0 ]; then
@@ -23,5 +25,7 @@ for i in `ls tests/*.c`; do
 	else
 		echo "FAILED!"
 	fi
-	rm tests/$testname
+	if [ -f "tests/$testname" ]; then
+		rm tests/$testname
+	fi
 done
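
For readers following the tick_dep_mask conversion above: the removed
fetch_or() macro and the atomic_fetch_or() calls that replace it both return
the value seen before the OR, so the caller can tell whether it set the first
dependency bit. Below is a minimal userspace sketch of that pattern using C11
atomics. It is not the kernel implementation; fetch_or_u32(),
dep_set_needs_kick() and dep_clear() are hypothetical names chosen for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userspace sketch (not the kernel macro): OR `mask` into *ptr with a
 * compare-and-swap loop and return the value seen before the update.
 */
static uint32_t fetch_or_u32(_Atomic uint32_t *ptr, uint32_t mask)
{
	uint32_t val = atomic_load(ptr);

	/* On failure, atomic_compare_exchange_weak() reloads `val` from *ptr. */
	while (!atomic_compare_exchange_weak(ptr, &val, val | mask))
		;
	return val;
}

/* Hypothetical helper: returns 1 only if the mask was empty before this set. */
static int dep_set_needs_kick(_Atomic uint32_t *dep, unsigned int bit)
{
	return fetch_or_u32(dep, UINT32_C(1) << bit) == 0;
}

/* Hypothetical helper: clear one bit, analogous to the and-not in the patch. */
static void dep_clear(_Atomic uint32_t *dep, unsigned int bit)
{
	atomic_fetch_and(dep, ~(UINT32_C(1) << bit));
}

/* Small self-check of the sketch; call it from a test harness or main(). */
static void fetch_or_selftest(void)
{
	_Atomic uint32_t dep = 0;

	assert(dep_set_needs_kick(&dep, 0));	/* 0 -> 1: first dependency */
	assert(!dep_set_needs_kick(&dep, 2));	/* mask was already non-zero */
	dep_clear(&dep, 0);
	assert(atomic_load(&dep) == 4);		/* only bit 2 remains set */
}
```

The "previous value was zero" test mirrors the `if (!prev)` check in
tick_nohz_dep_set_all(): only the transition from an empty mask triggers the
kick.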

* [GIT PULL] core kernel fixes
@ 2015-02-06 18:28 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2015-02-06 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   # HEAD: 135818bf494e6f8f7c9327d9d9e015f7548b6f8d Merge branch 'liblockdep-fixes-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux into core/urgent

Two liblockdep fixes and a CPU hotplug race fix.

 Thanks,

	Ingo

------------------>
Baruch Siach (2):
      tools/liblockdep: ignore generated .so file
      tools/liblockdep: don't include host headers

Lai Jiangshan (1):
      smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()


 kernel/smpboot.c             | 2 ++
 tools/lib/lockdep/.gitignore | 1 +
 tools/lib/lockdep/Makefile   | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/lockdep/.gitignore

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index f032fb5284e3..40190f28db35 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -280,6 +280,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	unsigned int cpu;
 	int ret = 0;
 
+	get_online_cpus();
 	mutex_lock(&smpboot_threads_lock);
 	for_each_online_cpu(cpu) {
 		ret = __smpboot_create_thread(plug_thread, cpu);
@@ -292,6 +293,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	list_add(&plug_thread->list, &hotplug_threads);
 out:
 	mutex_unlock(&smpboot_threads_lock);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
diff --git a/tools/lib/lockdep/.gitignore b/tools/lib/lockdep/.gitignore
new file mode 100644
index 000000000000..cc0e7a9f99e3
--- /dev/null
+++ b/tools/lib/lockdep/.gitignore
@@ -0,0 +1 @@
+liblockdep.so.*
diff --git a/tools/lib/lockdep/Makefile b/tools/lib/lockdep/Makefile
index 52f9279c6c13..4b866c54f624 100644
--- a/tools/lib/lockdep/Makefile
+++ b/tools/lib/lockdep/Makefile
@@ -104,7 +104,7 @@ N		=
 
 export Q VERBOSE
 
-INCLUDES = -I. -I/usr/local/include -I./uinclude -I./include -I../../include $(CONFIG_INCLUDES)
+INCLUDES = -I. -I./uinclude -I./include -I../../include $(CONFIG_INCLUDES)
 
 # Set compile option CFLAGS if not set elsewhere
 CFLAGS ?= -g -DCONFIG_LOCKDEP -DCONFIG_STACKTRACE -DCONFIG_PROVE_LOCKING -DBITS_PER_LONG=__WORDSIZE -DLIBLOCKDEP_VERSION='"$(LIBLOCKDEP_VERSION)"' -rdynamic -O0 -g

* [GIT PULL] core kernel fixes
@ 2012-10-23 10:57 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2012-10-23 10:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   HEAD: fd0587339d80dd2fea5ead7f734676c9c618eace Documentation: Reflect the new location of the NMI watchdog info

Two small fixes.

 Thanks,

	Ingo

------------------>
Jean Delvare (1):
      Documentation: Reflect the new location of the NMI watchdog info

Michal Hocko (1):
      nohz: Fix idle ticks in cpu summary line of /proc/stat


 Documentation/00-INDEX |  4 ++--
 fs/proc/stat.c         | 14 ++++++++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 49c0513..fec55dc 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -210,6 +210,8 @@ local_ops.txt
 	- semantics and behavior of local atomic operations.
 lockdep-design.txt
 	- documentation on the runtime locking correctness validator.
+lockup-watchdogs.txt
+	- info on soft and hard lockup detectors (aka nmi_watchdog).
 logo.gif
 	- full colour GIF image of Linux logo (penguin - Tux).
 logo.txt
@@ -240,8 +242,6 @@ netlabel/
 	- directory with information on the NetLabel subsystem.
 networking/
 	- directory with info on various aspects of networking with Linux.
-nmi_watchdog.txt
-	- info on NMI watchdog for SMP systems.
 nommu-mmap.txt
 	- documentation about no-mmu memory mapping support.
 numastat.txt
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 64c3b31..e296572 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -45,10 +45,13 @@ static cputime64_t get_iowait_time(int cpu)
 
 static u64 get_idle_time(int cpu)
 {
-	u64 idle, idle_time = get_cpu_idle_time_us(cpu, NULL);
+	u64 idle, idle_time = -1ULL;
+
+	if (cpu_online(cpu))
+		idle_time = get_cpu_idle_time_us(cpu, NULL);
 
 	if (idle_time == -1ULL)
-		/* !NO_HZ so we can rely on cpustat.idle */
+		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
 		idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
 	else
 		idle = usecs_to_cputime64(idle_time);
@@ -58,10 +61,13 @@ static u64 get_idle_time(int cpu)
 
 static u64 get_iowait_time(int cpu)
 {
-	u64 iowait, iowait_time = get_cpu_iowait_time_us(cpu, NULL);
+	u64 iowait, iowait_time = -1ULL;
+
+	if (cpu_online(cpu))
+		iowait_time = get_cpu_iowait_time_us(cpu, NULL);
 
 	if (iowait_time == -1ULL)
-		/* !NO_HZ so we can rely on cpustat.iowait */
+		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
 		iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
 	else
 		iowait = usecs_to_cputime64(iowait_time);
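
The fallback logic in the fs/proc/stat.c hunks above can be sketched in
isolation. This is a simplified userspace model, not the kernel code: struct
cpu_sketch and both helpers are hypothetical, all values stay in microseconds
(the real code converts with usecs_to_cputime64()), and NO_IDLE_TIME stands in
for the -1ULL sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_IDLE_TIME UINT64_MAX	/* plays the role of -1ULL in the patch */

/* Hypothetical per-CPU state; only the fields the sketch needs. */
struct cpu_sketch {
	bool online;
	bool nohz;			/* is the nohz idle counter available? */
	uint64_t nohz_idle_us;		/* value the nohz accounting would report */
	uint64_t cpustat_idle_us;	/* value kept in the per-CPU stat array */
};

/* Stand-in for get_cpu_idle_time_us(): sentinel when nohz is not in use. */
static uint64_t nohz_idle_time_us(const struct cpu_sketch *c)
{
	return c->nohz ? c->nohz_idle_us : NO_IDLE_TIME;
}

/* Patched flow: ask the nohz counter only for online CPUs, else fall back. */
static uint64_t idle_time_us(const struct cpu_sketch *c)
{
	uint64_t t = NO_IDLE_TIME;

	if (c->online)
		t = nohz_idle_time_us(c);

	if (t == NO_IDLE_TIME)
		return c->cpustat_idle_us;	/* !NO_HZ or CPU offline */
	return t;
}

/* Small self-check of the sketch; call it from a test harness or main(). */
static void idle_time_selftest(void)
{
	struct cpu_sketch offline = { false, true, 500, 100 };

	assert(idle_time_us(&offline) == 100);
}
```

Initializing the local to the sentinel lets one comparison cover both
fallback cases, which is the shape the patch gives get_idle_time() and
get_iowait_time().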

* Re: [GIT PULL] core kernel fixes
  2012-08-03 17:01   ` Ingo Molnar
@ 2012-08-03 17:24     ` Darren Hart
  0 siblings, 0 replies; 97+ messages in thread
From: Darren Hart @ 2012-08-03 17:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton



On 08/03/2012 10:01 AM, Ingo Molnar wrote:
> 
> * Darren Hart <dvhart@linux.intel.com> wrote:
> 
>> On 08/03/2012 09:31 AM, Ingo Molnar wrote:
>>> Linus,
>>>
>>> Please pull the latest core-urgent-for-linus git tree from:
>>>
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus
>>>
>>>    HEAD: 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
>>>
>>> Various futex fixes for bugs Darren Hart found via his 
>>> testsuite.
>>>
>>
>> Minor correction. I fixed two bugs reported by Dave Jones 
>> (found with his trinity test) and Dan Carpenter through static 
>> analysis. The other I found while debugging the first two. 
>> Credit where credit is due.
> 
> Hm, from the wording of the changelogs I thought you were 
> running those tests. Please put such bug reporting info into the 
> changelog and/or add a Reported-by tag next time around - 
> testers are our most valuable contributors.


I see that I left the attribution of the testing only in the cover letter;
my apologies, that was sloppy of me.

I had followed Dave's request that I mention trinity and CC him on bugs
found with trinity - but looking at that patch now, it doesn't attribute
that well enough.

I'll correct this in the future.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

* Re: [GIT PULL] core kernel fixes
  2012-08-03 16:55 ` Darren Hart
@ 2012-08-03 17:01   ` Ingo Molnar
  2012-08-03 17:24     ` Darren Hart
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2012-08-03 17:01 UTC (permalink / raw)
  To: Darren Hart
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton


* Darren Hart <dvhart@linux.intel.com> wrote:

> On 08/03/2012 09:31 AM, Ingo Molnar wrote:
> > Linus,
> > 
> > Please pull the latest core-urgent-for-linus git tree from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus
> > 
> >    HEAD: 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
> > 
> > Various futex fixes for bugs Darren Hart found via his 
> > testsuite.
> > 
> 
> Minor correction. I fixed two bugs reported by Dave Jones 
> (found with his trinity test) and Dan Carpenter through static 
> analysis. The other I found while debugging the first two. 
> Credit where credit is due.

Hm, from the wording of the changelogs I thought you were 
running those tests. Please put such bug reporting info into the 
changelog and/or add a Reported-by tag next time around - 
testers are our most valuable contributors.

Thanks,

	Ingo

* Re: [GIT PULL] core kernel fixes
  2012-08-03 16:31 Ingo Molnar
@ 2012-08-03 16:55 ` Darren Hart
  2012-08-03 17:01   ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Darren Hart @ 2012-08-03 16:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton



On 08/03/2012 09:31 AM, Ingo Molnar wrote:
> Linus,
> 
> Please pull the latest core-urgent-for-linus git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus
> 
>    HEAD: 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
> 
> Various futex fixes for bugs Darren Hart found via his 
> testsuite.
> 

Minor correction. I fixed two bugs reported by Dave Jones (found with
his trinity test) and Dan Carpenter through static analysis. The other
I found while debugging the first two. Credit where credit is due.

Thanks,

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

* [GIT PULL] core kernel fixes
@ 2012-08-03 16:31 Ingo Molnar
  2012-08-03 16:55 ` Darren Hart
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2012-08-03 16:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Darren Hart

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   HEAD: 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()

Various futex fixes for bugs Darren Hart found via his 
testsuite.

 Thanks,

	Ingo

------------------>
Darren Hart (3):
      futex: Test for pi_mutex on fault in futex_wait_requeue_pi()
      futex: Fix bug in WARN_ON for NULL q.pi_state
      futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()


 kernel/futex.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e2b0fb9..3717e7b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2231,11 +2231,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  * @uaddr2:	the pi futex we will take prior to returning to user-space
  *
  * The caller will wait on uaddr and will be requeued by futex_requeue() to
- * uaddr2 which must be PI aware.  Normal wakeup will wake on uaddr2 and
- * complete the acquisition of the rt_mutex prior to returning to userspace.
- * This ensures the rt_mutex maintains an owner when it has waiters; without
- * one, the pi logic wouldn't know which task to boost/deboost, if there was a
- * need to.
+ * uaddr2 which must be PI aware and unique from uaddr.  Normal wakeup will wake
+ * on uaddr2 and complete the acquisition of the rt_mutex prior to returning to
+ * userspace.  This ensures the rt_mutex maintains an owner when it has waiters;
+ * without one, the pi logic would not know which task to boost/deboost, if
+ * there was a need to.
  *
  * We call schedule in futex_wait_queue_me() when we enqueue and return there
  * via the following:
@@ -2272,6 +2272,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	struct futex_q q = futex_q_init;
 	int res, ret;
 
+	if (uaddr == uaddr2)
+		return -EINVAL;
+
 	if (!bitset)
 		return -EINVAL;
 
@@ -2343,7 +2346,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		 * signal.  futex_unlock_pi() will not destroy the lock_ptr nor
 		 * the pi_state.
 		 */
-		WARN_ON(!&q.pi_state);
+		WARN_ON(!q.pi_state);
 		pi_mutex = &q.pi_state->pi_mutex;
 		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
 		debug_rt_mutex_free_waiter(&rt_waiter);
@@ -2370,7 +2373,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	 * fault, unlock the rt_mutex and return the fault to userspace.
 	 */
 	if (ret == -EFAULT) {
-		if (rt_mutex_owner(pi_mutex) == current)
+		if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
 			rt_mutex_unlock(pi_mutex);
 	} else if (ret == -EINTR) {
 		/*
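
The WARN_ON() change in the patch above is easy to misread, so here is a
small standalone illustration. The old expression tested the address of the
pi_state member, which can never be NULL for a valid object, so the warning
could not fire. The fixed expression tests the pointer stored in the member.
The struct and function names below are hypothetical stand-ins, not the
kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the shape matters. */
struct pi_state_sketch {
	int owner;
};

struct futex_q_sketch {
	struct pi_state_sketch *pi_state;
};

/* Old check, equivalent to !&q->pi_state: tests where the member lives. */
static int old_check_fires(struct futex_q_sketch *q)
{
	struct pi_state_sketch **slot = &q->pi_state;

	return slot == NULL;	/* never true for a valid q */
}

/* Fixed check, equivalent to !q->pi_state: tests what the member holds. */
static int new_check_fires(const struct futex_q_sketch *q)
{
	return q->pi_state == NULL;
}

/* Small self-check of the sketch; call it from a test harness or main(). */
static void warn_on_selftest(void)
{
	struct futex_q_sketch q = { .pi_state = NULL };

	assert(!old_check_fires(&q));	/* bug: stays silent on a NULL pi_state */
	assert(new_check_fires(&q));	/* fix: reports the NULL pi_state */
}
```

The old form is spelled out through a local variable only because compilers
may warn about comparing an address-of expression against NULL directly.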

* [GIT PULL] core kernel fixes
@ 2012-06-15 18:45 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2012-06-15 18:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Paul E. McKenney

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   HEAD: 4a1e001d2bb75c47a9cdbbfb66ae51daff1ddcba Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent

Most of the diffstat comes from the RCU slow boot regression 
fixes, but there are also debuggability improvements/fixes.

 Thanks,

	Ingo

------------------>
Christian Borntraeger (1):
      spinlock: Indicate that a lockup is only suspected

Kyle McMartin (1):
      panic: Make panic_on_oops configurable

Paul E. McKenney (4):
      rcu: RCU_FAST_NO_HZ detection of callback adoption
      rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
      rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
      rcu: Precompute RCU_FAST_NO_HZ timer offsets

Seiji Aguchi (1):
      kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()

Stephen Boyd (1):
      memblock: Document memblock_is_region_{memory,reserved}()


 include/linux/rcutiny.h    |   6 +-
 include/linux/rcutree.h    |   2 +-
 include/trace/events/rcu.h |   1 +
 kernel/panic.c             |   6 +-
 kernel/rcutree.c           |   2 +
 kernel/rcutree.h           |  14 ++++
 kernel/rcutree_plugin.h    | 165 ++++++++++++++++++++++++---------------------
 kernel/time/tick-sched.c   |   7 +-
 lib/Kconfig.debug          |  20 ++++++
 lib/spinlock_debug.c       |   2 +-
 mm/memblock.c              |  20 ++++++
 11 files changed, 160 insertions(+), 85 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index adb5e5a..854dc4c 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -87,8 +87,9 @@ static inline void kfree_call_rcu(struct rcu_head *head,
 
 #ifdef CONFIG_TINY_RCU
 
-static inline int rcu_needs_cpu(int cpu)
+static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return 0;
 }
 
@@ -96,8 +97,9 @@ static inline int rcu_needs_cpu(int cpu)
 
 int rcu_preempt_needs_cpu(void);
 
-static inline int rcu_needs_cpu(int cpu)
+static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return rcu_preempt_needs_cpu();
 }
 
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 3c6083c..952b793 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -32,7 +32,7 @@
 
 extern void rcu_init(void);
 extern void rcu_note_context_switch(int cpu);
-extern int rcu_needs_cpu(int cpu);
+extern int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies);
 extern void rcu_cpu_stall_reset(void);
 
 /*
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 1480900..d274734 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -289,6 +289,7 @@ TRACE_EVENT(rcu_dyntick,
  *	"In holdoff": Nothing to do, holding off after unsuccessful attempt.
  *	"Begin holdoff": Attempt failed, don't retry until next jiffy.
  *	"Dyntick with callbacks": Entering dyntick-idle despite callbacks.
+ *	"Dyntick with lazy callbacks": Entering dyntick-idle w/lazy callbacks.
  *	"More callbacks": Still more callbacks, try again to clear them out.
  *	"Callbacks drained": All callbacks processed, off to dyntick idle!
  *	"Timer": Timer fired to cause CPU to continue processing callbacks.
diff --git a/kernel/panic.c b/kernel/panic.c
index 8ed89a1..d2a5f4e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -27,7 +27,7 @@
 #define PANIC_TIMER_STEP 100
 #define PANIC_BLINK_SPD 18
 
-int panic_on_oops;
+int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE;
 static unsigned long tainted_mask;
 static int pause_on_oops;
 static int pause_on_oops_flag;
@@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
 	 */
 	smp_send_stop();
 
+	kmsg_dump(KMSG_DUMP_PANIC);
+
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
 	bust_spinlocks(0);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0da7b88..3b0f133 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1397,6 +1397,8 @@ static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
 	rdp->qlen_lazy += rsp->qlen_lazy;
 	rdp->qlen += rsp->qlen;
 	rdp->n_cbs_adopted += rsp->qlen;
+	if (rsp->qlen_lazy != rsp->qlen)
+		rcu_idle_count_callbacks_posted();
 	rsp->qlen_lazy = 0;
 	rsp->qlen = 0;
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7f5d138..ea05649 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -84,6 +84,20 @@ struct rcu_dynticks {
 				    /* Process level is worth LLONG_MAX/2. */
 	int dynticks_nmi_nesting;   /* Track NMI nesting level. */
 	atomic_t dynticks;	    /* Even value for idle, else odd. */
+#ifdef CONFIG_RCU_FAST_NO_HZ
+	int dyntick_drain;	    /* Prepare-for-idle state variable. */
+	unsigned long dyntick_holdoff;
+				    /* No retries for the jiffy of failure. */
+	struct timer_list idle_gp_timer;
+				    /* Wake up CPU sleeping with callbacks. */
+	unsigned long idle_gp_timer_expires;
+				    /* When to wake up CPU (for repost). */
+	bool idle_first_pass;	    /* First pass of attempt to go idle? */
+	unsigned long nonlazy_posted;
+				    /* # times non-lazy CBs posted to CPU. */
+	unsigned long nonlazy_posted_snap;
+				    /* idle-period nonlazy_posted snapshot. */
+#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
 
 /* RCU's kthread states for tracing. */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 2411000..5271a02 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1886,8 +1886,9 @@ static void __cpuinit rcu_prepare_kthreads(int cpu)
  * Because we not have RCU_FAST_NO_HZ, just check whether this CPU needs
  * any flavor of RCU.
  */
-int rcu_needs_cpu(int cpu)
+int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
 {
+	*delta_jiffies = ULONG_MAX;
 	return rcu_cpu_has_callbacks(cpu);
 }
 
@@ -1962,41 +1963,6 @@ static void rcu_idle_count_callbacks_posted(void)
 #define RCU_IDLE_GP_DELAY 6		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
-/* Loop counter for rcu_prepare_for_idle(). */
-static DEFINE_PER_CPU(int, rcu_dyntick_drain);
-/* If rcu_dyntick_holdoff==jiffies, don't try to enter dyntick-idle mode. */
-static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
-/* Timer to awaken the CPU if it enters dyntick-idle mode with callbacks. */
-static DEFINE_PER_CPU(struct timer_list, rcu_idle_gp_timer);
-/* Scheduled expiry time for rcu_idle_gp_timer to allow reposting. */
-static DEFINE_PER_CPU(unsigned long, rcu_idle_gp_timer_expires);
-/* Enable special processing on first attempt to enter dyntick-idle mode. */
-static DEFINE_PER_CPU(bool, rcu_idle_first_pass);
-/* Running count of non-lazy callbacks posted, never decremented. */
-static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted);
-/* Snapshot of rcu_nonlazy_posted to detect meaningful exits from idle. */
-static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted_snap);
-
-/*
- * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
- * callbacks on this CPU, (2) this CPU has not yet attempted to enter
- * dyntick-idle mode, or (3) this CPU is in the process of attempting to
- * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
- * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
- * it is better to incur scheduling-clock interrupts than to spin
- * continuously for the same time duration!
- */
-int rcu_needs_cpu(int cpu)
-{
-	/* Flag a new idle sojourn to the idle-entry state machine. */
-	per_cpu(rcu_idle_first_pass, cpu) = 1;
-	/* If no callbacks, RCU doesn't need the CPU. */
-	if (!rcu_cpu_has_callbacks(cpu))
-		return 0;
-	/* Otherwise, RCU needs the CPU only if it recently tried and failed. */
-	return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;
-}
-
 /*
  * Does the specified flavor of RCU have non-lazy callbacks pending on
  * the specified CPU?  Both RCU flavor and CPU are specified by the
@@ -2040,6 +2006,47 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
 }
 
 /*
+ * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
+ * callbacks on this CPU, (2) this CPU has not yet attempted to enter
+ * dyntick-idle mode, or (3) this CPU is in the process of attempting to
+ * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
+ * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
+ * it is better to incur scheduling-clock interrupts than to spin
+ * continuously for the same time duration!
+ *
+ * The delta_jiffies argument is used to store the time when RCU is
+ * going to need the CPU again if it still has callbacks.  The reason
+ * for this is that rcu_prepare_for_idle() might need to post a timer,
+ * but if so, it will do so after tick_nohz_stop_sched_tick() has set
+ * the wakeup time for this CPU.  This means that RCU's timer can be
+ * delayed until the wakeup time, which defeats the purpose of posting
+ * a timer.
+ */
+int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
+{
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	/* Flag a new idle sojourn to the idle-entry state machine. */
+	rdtp->idle_first_pass = 1;
+	/* If no callbacks, RCU doesn't need the CPU. */
+	if (!rcu_cpu_has_callbacks(cpu)) {
+		*delta_jiffies = ULONG_MAX;
+		return 0;
+	}
+	if (rdtp->dyntick_holdoff == jiffies) {
+		/* RCU recently tried and failed, so don't try again. */
+		*delta_jiffies = 1;
+		return 1;
+	}
+	/* Set up for the possibility that RCU will post a timer. */
+	if (rcu_cpu_has_nonlazy_callbacks(cpu))
+		*delta_jiffies = RCU_IDLE_GP_DELAY;
+	else
+		*delta_jiffies = RCU_IDLE_LAZY_GP_DELAY;
+	return 0;
+}
+
+/*
  * Handler for smp_call_function_single().  The only point of this
  * handler is to wake the CPU up, so the handler does only tracing.
  */
@@ -2075,21 +2082,24 @@ static void rcu_idle_gp_timer_func(unsigned long cpu_in)
  */
 static void rcu_prepare_for_idle_init(int cpu)
 {
-	per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
-	setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
-		    rcu_idle_gp_timer_func, cpu);
-	per_cpu(rcu_idle_gp_timer_expires, cpu) = jiffies - 1;
-	per_cpu(rcu_idle_first_pass, cpu) = 1;
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	rdtp->dyntick_holdoff = jiffies - 1;
+	setup_timer(&rdtp->idle_gp_timer, rcu_idle_gp_timer_func, cpu);
+	rdtp->idle_gp_timer_expires = jiffies - 1;
+	rdtp->idle_first_pass = 1;
 }
 
 /*
  * Clean up for exit from idle.  Because we are exiting from idle, there
- * is no longer any point to rcu_idle_gp_timer, so cancel it.  This will
+ * is no longer any point to ->idle_gp_timer, so cancel it.  This will
  * do nothing if this timer is not active, so just cancel it unconditionally.
  */
 static void rcu_cleanup_after_idle(int cpu)
 {
-	del_timer(&per_cpu(rcu_idle_gp_timer, cpu));
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+	del_timer(&rdtp->idle_gp_timer);
 	trace_rcu_prep_idle("Cleanup after idle");
 }
 
@@ -2108,42 +2118,41 @@ static void rcu_cleanup_after_idle(int cpu)
  * Because it is not legal to invoke rcu_process_callbacks() with irqs
  * disabled, we do one pass of force_quiescent_state(), then do a
  * invoke_rcu_core() to cause rcu_process_callbacks() to be invoked
- * later.  The per-cpu rcu_dyntick_drain variable controls the sequencing.
+ * later.  The ->dyntick_drain field controls the sequencing.
  *
  * The caller must have disabled interrupts.
  */
 static void rcu_prepare_for_idle(int cpu)
 {
 	struct timer_list *tp;
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
 	/*
 	 * If this is an idle re-entry, for example, due to use of
 	 * RCU_NONIDLE() or the new idle-loop tracing API within the idle
 	 * loop, then don't take any state-machine actions, unless the
 	 * momentary exit from idle queued additional non-lazy callbacks.
-	 * Instead, repost the rcu_idle_gp_timer if this CPU has callbacks
+	 * Instead, repost the ->idle_gp_timer if this CPU has callbacks
 	 * pending.
 	 */
-	if (!per_cpu(rcu_idle_first_pass, cpu) &&
-	    (per_cpu(rcu_nonlazy_posted, cpu) ==
-	     per_cpu(rcu_nonlazy_posted_snap, cpu))) {
+	if (!rdtp->idle_first_pass &&
+	    (rdtp->nonlazy_posted == rdtp->nonlazy_posted_snap)) {
 		if (rcu_cpu_has_callbacks(cpu)) {
-			tp = &per_cpu(rcu_idle_gp_timer, cpu);
-			mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
+			tp = &rdtp->idle_gp_timer;
+			mod_timer_pinned(tp, rdtp->idle_gp_timer_expires);
 		}
 		return;
 	}
-	per_cpu(rcu_idle_first_pass, cpu) = 0;
-	per_cpu(rcu_nonlazy_posted_snap, cpu) =
-		per_cpu(rcu_nonlazy_posted, cpu) - 1;
+	rdtp->idle_first_pass = 0;
+	rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted - 1;
 
 	/*
 	 * If there are no callbacks on this CPU, enter dyntick-idle mode.
 	 * Also reset state to avoid prejudicing later attempts.
 	 */
 	if (!rcu_cpu_has_callbacks(cpu)) {
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
-		per_cpu(rcu_dyntick_drain, cpu) = 0;
+		rdtp->dyntick_holdoff = jiffies - 1;
+		rdtp->dyntick_drain = 0;
 		trace_rcu_prep_idle("No callbacks");
 		return;
 	}
@@ -2152,36 +2161,37 @@ static void rcu_prepare_for_idle(int cpu)
 	 * If in holdoff mode, just return.  We will presumably have
 	 * refrained from disabling the scheduling-clock tick.
 	 */
-	if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies) {
+	if (rdtp->dyntick_holdoff == jiffies) {
 		trace_rcu_prep_idle("In holdoff");
 		return;
 	}
 
-	/* Check and update the rcu_dyntick_drain sequencing. */
-	if (per_cpu(rcu_dyntick_drain, cpu) <= 0) {
+	/* Check and update the ->dyntick_drain sequencing. */
+	if (rdtp->dyntick_drain <= 0) {
 		/* First time through, initialize the counter. */
-		per_cpu(rcu_dyntick_drain, cpu) = RCU_IDLE_FLUSHES;
-	} else if (per_cpu(rcu_dyntick_drain, cpu) <= RCU_IDLE_OPT_FLUSHES &&
+		rdtp->dyntick_drain = RCU_IDLE_FLUSHES;
+	} else if (rdtp->dyntick_drain <= RCU_IDLE_OPT_FLUSHES &&
 		   !rcu_pending(cpu) &&
 		   !local_softirq_pending()) {
 		/* Can we go dyntick-idle despite still having callbacks? */
-		trace_rcu_prep_idle("Dyntick with callbacks");
-		per_cpu(rcu_dyntick_drain, cpu) = 0;
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
-		if (rcu_cpu_has_nonlazy_callbacks(cpu))
-			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+		rdtp->dyntick_drain = 0;
+		rdtp->dyntick_holdoff = jiffies;
+		if (rcu_cpu_has_nonlazy_callbacks(cpu)) {
+			trace_rcu_prep_idle("Dyntick with callbacks");
+			rdtp->idle_gp_timer_expires =
 					   jiffies + RCU_IDLE_GP_DELAY;
-		else
-			per_cpu(rcu_idle_gp_timer_expires, cpu) =
+		} else {
+			rdtp->idle_gp_timer_expires =
 					   jiffies + RCU_IDLE_LAZY_GP_DELAY;
-		tp = &per_cpu(rcu_idle_gp_timer, cpu);
-		mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
-		per_cpu(rcu_nonlazy_posted_snap, cpu) =
-			per_cpu(rcu_nonlazy_posted, cpu);
+			trace_rcu_prep_idle("Dyntick with lazy callbacks");
+		}
+		tp = &rdtp->idle_gp_timer;
+		mod_timer_pinned(tp, rdtp->idle_gp_timer_expires);
+		rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted;
 		return; /* Nothing more to do immediately. */
-	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
+	} else if (--(rdtp->dyntick_drain) <= 0) {
 		/* We have hit the limit, so time to give up. */
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
+		rdtp->dyntick_holdoff = jiffies;
 		trace_rcu_prep_idle("Begin holdoff");
 		invoke_rcu_core();  /* Force the CPU out of dyntick-idle. */
 		return;
@@ -2227,7 +2237,7 @@ static void rcu_prepare_for_idle(int cpu)
  */
 static void rcu_idle_count_callbacks_posted(void)
 {
-	__this_cpu_add(rcu_nonlazy_posted, 1);
+	__this_cpu_add(rcu_dynticks.nonlazy_posted, 1);
 }
 
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
@@ -2238,11 +2248,12 @@ static void rcu_idle_count_callbacks_posted(void)
 
 static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
 {
-	struct timer_list *tltp = &per_cpu(rcu_idle_gp_timer, cpu);
+	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+	struct timer_list *tltp = &rdtp->idle_gp_timer;
 
 	sprintf(cp, "drain=%d %c timer=%lu",
-		per_cpu(rcu_dyntick_drain, cpu),
-		per_cpu(rcu_dyntick_holdoff, cpu) == jiffies ? 'H' : '.',
+		rdtp->dyntick_drain,
+		rdtp->dyntick_holdoff == jiffies ? 'H' : '.',
 		timer_pending(tltp) ? tltp->expires - jiffies : -1);
 }
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da70c6d..8699978 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -274,6 +274,7 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 {
 	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
+	unsigned long rcu_delta_jiffies;
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
@@ -322,7 +323,7 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 		time_delta = timekeeping_max_deferment();
 	} while (read_seqretry(&xtime_lock, seq));
 
-	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
+	if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
 	    arch_needs_cpu(cpu)) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
@@ -330,6 +331,10 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 		/* Get the next timer wheel timer */
 		next_jiffies = get_next_timer_interrupt(last_jiffies);
 		delta_jiffies = next_jiffies - last_jiffies;
+		if (rcu_delta_jiffies < delta_jiffies) {
+			next_jiffies = last_jiffies + rcu_delta_jiffies;
+			delta_jiffies = rcu_delta_jiffies;
+		}
 	}
 	/*
 	 * Do not stop the tick, if we are only one off
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a42d3ae..ff5bdee 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -241,6 +241,26 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
 	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
 	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
 
+config PANIC_ON_OOPS
+	bool "Panic on Oops" if EXPERT
+	default n
+	help
+	  Say Y here to enable the kernel to panic when it oopses. This
+	  has the same effect as setting oops=panic on the kernel command
+	  line.
+
+	  This feature is useful to ensure that the kernel does not do
+	  anything erroneous after an oops which could result in data
+	  corruption or other issues.
+
+	  Say N if unsure.
+
+config PANIC_ON_OOPS_VALUE
+	int
+	range 0 1
+	default 0 if !PANIC_ON_OOPS
+	default 1 if PANIC_ON_OOPS
+
 config DETECT_HUNG_TASK
 	bool "Detect Hung Tasks"
 	depends on DEBUG_KERNEL
diff --git a/lib/spinlock_debug.c b/lib/spinlock_debug.c
index d0ec4f3..e91fbc2 100644
--- a/lib/spinlock_debug.c
+++ b/lib/spinlock_debug.c
@@ -118,7 +118,7 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
 		/* lockup suspected: */
 		if (print_once) {
 			print_once = 0;
-			spin_dump(lock, "lockup");
+			spin_dump(lock, "lockup suspected");
 #ifdef CONFIG_SMP
 			trigger_all_cpu_backtrace();
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index 952123e..32a0a5e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -867,6 +867,16 @@ int __init_memblock memblock_is_memory(phys_addr_t addr)
 	return memblock_search(&memblock.memory, addr) != -1;
 }
 
+/**
+ * memblock_is_region_memory - check if a region is a subset of memory
+ * @base: base of region to check
+ * @size: size of region to check
+ *
+ * Check if the region [@base, @base+@size) is a subset of a memory block.
+ *
+ * RETURNS:
+ * 0 if false, non-zero if true
+ */
 int __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size)
 {
 	int idx = memblock_search(&memblock.memory, base);
@@ -879,6 +889,16 @@ int __init_memblock memblock_is_region_memory(phys_addr_t base, phys_addr_t size
 		 memblock.memory.regions[idx].size) >= end;
 }
 
+/**
+ * memblock_is_region_reserved - check if a region intersects reserved memory
+ * @base: base of region to check
+ * @size: size of region to check
+ *
+ * Check if the region [@base, @base+@size) intersects a reserved memory block.
+ *
+ * RETURNS:
+ * 0 if false, non-zero if true
+ */
 int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
 {
 	memblock_cap_size(base, &size);
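
The two kernel-doc comments added above distinguish "is a subset of a memory block" from "intersects a reserved memory block". A minimal userspace sketch of that distinction for a single half-open range follows. struct region_sketch and both helper names are hypothetical and are not the kernel's memblock types; the sketch also assumes base + size does not wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one memblock region; not the kernel's struct. */
struct region_sketch {
	uint64_t base;
	uint64_t size;
};

/* True if [base, base+size) lies entirely inside region r. */
static bool region_contains(const struct region_sketch *r,
			    uint64_t base, uint64_t size)
{
	assert(base + size >= base);	/* assumption: no wrap-around */
	return base >= r->base && base + size <= r->base + r->size;
}

/* True if [base, base+size) shares at least one byte with region r. */
static bool region_intersects(const struct region_sketch *r,
			      uint64_t base, uint64_t size)
{
	assert(base + size >= base);	/* assumption: no wrap-around */
	return base < r->base + r->size && r->base < base + size;
}
```

A range that straddles the end of a region intersects it without being a subset of it, and a range that starts exactly at the region's end does neither, because both intervals are half-open.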


* [GIT PULL] core kernel fixes
@ 2012-01-26 18:05 Ingo Molnar
From: Ingo Molnar @ 2012-01-26 18:05 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

   HEAD: b64b223aed5f8aeeb6c046f1b050a8f976b87de0 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent

 Thanks,

	Ingo

------------------>
Heiko Carstens (1):
      rcu: Add missing __cpuinit annotation in rcutorture code

Paul E. McKenney (1):
      sched: Add "const" to is_idle_task() parameter

Rusty Russell (1):
      rcu: Make rcutorture bool parameters really bool (core code)

Tejun Heo (1):
      memblock: Fix alloc failure due to dumb underflow protection in memblock_find_in_range_node()


 include/linux/sched.h |    2 +-
 kernel/rcutorture.c   |    8 ++++----
 mm/memblock.c         |    7 +++++--
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index cf0eb34..40d8448 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2089,7 +2089,7 @@ extern struct task_struct *idle_task(int cpu);
  * is_idle_task - is the specified task an idle task?
  * @tsk: the task in question.
  */
-static inline bool is_idle_task(struct task_struct *p)
+static inline bool is_idle_task(const struct task_struct *p)
 {
 	return p->pid == 0;
 }
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 88f17b8..a58ac28 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -56,8 +56,8 @@ static int nreaders = -1;	/* # reader threads, defaults to 2*ncpus */
 static int nfakewriters = 4;	/* # fake writer threads */
 static int stat_interval;	/* Interval between stats, in seconds. */
 				/*  Defaults to "only at end of test". */
-static int verbose;		/* Print more debug info. */
-static int test_no_idle_hz;	/* Test RCU's support for tickless idle CPUs. */
+static bool verbose;		/* Print more debug info. */
+static bool test_no_idle_hz;	/* Test RCU's support for tickless idle CPUs. */
 static int shuffle_interval = 3; /* Interval between shuffles (in sec)*/
 static int stutter = 5;		/* Start/stop testing interval (in sec) */
 static int irqreader = 1;	/* RCU readers from irq (timers). */
@@ -1399,7 +1399,7 @@ rcu_torture_shutdown(void *arg)
  * Execute random CPU-hotplug operations at the interval specified
  * by the onoff_interval.
  */
-static int
+static int __cpuinit
 rcu_torture_onoff(void *arg)
 {
 	int cpu;
@@ -1447,7 +1447,7 @@ rcu_torture_onoff(void *arg)
 	return 0;
 }
 
-static int
+static int __cpuinit
 rcu_torture_onoff_init(void)
 {
 	if (onoff_interval <= 0)
diff --git a/mm/memblock.c b/mm/memblock.c
index 2f55f19..77b5f22 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -106,14 +106,17 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
 
-	/* adjust @start to avoid underflow and allocating the first page */
-	start = max3(start, size, (phys_addr_t)PAGE_SIZE);
+	/* avoid allocating the first page */
+	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
 
 	for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
 		this_start = clamp(this_start, start, end);
 		this_end = clamp(this_end, start, end);
 
+		if (this_end < size)
+			continue;
+
 		cand = round_down(this_end - size, align);
 		if (cand >= this_start)
 			return cand;
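
The memblock hunk above drops the max3()-based adjustment of start and instead skips any free range whose end is smaller than the requested size, so that this_end - size cannot wrap around. A rough userspace sketch of that per-range check follows. top_down_candidate() and round_down_pow2() are hypothetical names, and the sketch assumes align is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Hypothetical helper: look for the highest aligned start address of a
 * block of 'size' bytes inside [this_start, this_end).  Returns true and
 * stores the address in *cand on success.
 */
static bool top_down_candidate(uint64_t this_start, uint64_t this_end,
			       uint64_t size, uint64_t align, uint64_t *cand)
{
	uint64_t c;

	/* Without this check, this_end - size would wrap around. */
	if (this_end < size)
		return false;

	c = round_down_pow2(this_end - size, align);
	if (c < this_start)
		return false;

	*cand = c;
	return true;
}
```

With unsigned arithmetic, a wrapped this_end - size becomes a very large value that would pass the cand >= this_start test, which is why the range is rejected before the subtraction.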


* [GIT PULL] core kernel fixes
@ 2011-08-04 20:45 Ingo Molnar
From: Ingo Molnar @ 2011-08-04 20:45 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-urgent-for-linus

 Thanks,

	Ingo

------------------>
Peter Zijlstra (4):
      lockdep: Fix trace_hardirqs_on_caller()
      lockdep: Fix up warning
      slab, lockdep: Annotate slab -> rcu -> debug_object -> slab
      slab, lockdep: Annotate the locks before using them

Shawn Bohrer (1):
      futex: Fix regression with read only mappings

Tejun Heo (1):
      lockdep: Clear whole lockdep_map on initialization


 kernel/futex.c   |   54 ++++++++++++++++++++++++-------
 kernel/lockdep.c |   37 ++++++++++-----------
 mm/slab.c        |   92 +++++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 131 insertions(+), 52 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0a30897..11cbe05 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -218,6 +218,8 @@ static void drop_futex_key_refs(union futex_key *key)
  * @uaddr:	virtual address of the futex
  * @fshared:	0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
  * @key:	address where result is stored.
+ * @rw:		mapping needs to be read/write (values: VERIFY_READ,
+ *              VERIFY_WRITE)
  *
  * Returns a negative error code or 0
  * The key words are stored in *key on success.
@@ -229,12 +231,12 @@ static void drop_futex_key_refs(union futex_key *key)
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
 static int
-get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
+get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
 	struct page *page, *page_head;
-	int err;
+	int err, ro = 0;
 
 	/*
 	 * The futex address must be "naturally" aligned.
@@ -262,8 +264,18 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 
 again:
 	err = get_user_pages_fast(address, 1, 1, &page);
+	/*
+	 * If write access is not required (eg. FUTEX_WAIT), try
+	 * and get read-only access.
+	 */
+	if (err == -EFAULT && rw == VERIFY_READ) {
+		err = get_user_pages_fast(address, 1, 0, &page);
+		ro = 1;
+	}
 	if (err < 0)
 		return err;
+	else
+		err = 0;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	page_head = page;
@@ -305,6 +317,13 @@ again:
 	if (!page_head->mapping) {
 		unlock_page(page_head);
 		put_page(page_head);
+		/*
+		* ZERO_PAGE pages don't have a mapping. Avoid a busy loop
+		* trying to find one. RW mapping would have COW'd (and thus
+		* have a mapping) so this page is RO and won't ever change.
+		*/
+		if ((page_head == ZERO_PAGE(address)))
+			return -EFAULT;
 		goto again;
 	}
 
@@ -316,6 +335,15 @@ again:
 	 * the object not the particular process.
 	 */
 	if (PageAnon(page_head)) {
+		/*
+		 * A RO anonymous page will never change and thus doesn't make
+		 * sense for futex operations.
+		 */
+		if (ro) {
+			err = -EFAULT;
+			goto out;
+		}
+
 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
@@ -327,9 +355,10 @@ again:
 
 	get_futex_key_refs(key);
 
+out:
 	unlock_page(page_head);
 	put_page(page_head);
-	return 0;
+	return err;
 }
 
 static inline void put_futex_key(union futex_key *key)
@@ -940,7 +969,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -986,10 +1015,10 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	int ret, op_ret;
 
 retry:
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -1243,10 +1272,11 @@ retry:
 		pi_state = NULL;
 	}
 
-	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
+			    requeue_pi ? VERIFY_WRITE : VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -1790,7 +1820,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	 * while the syscall executes.
 	 */
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, VERIFY_READ);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -1941,7 +1971,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, int detect,
 	}
 
 retry:
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2060,7 +2090,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != vpid)
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2249,7 +2279,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	debug_rt_mutex_init_waiter(&rt_waiter);
 	rt_waiter.task = NULL;
 
-	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 3956f51..8c24294 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2468,7 +2468,7 @@ mark_held_locks(struct task_struct *curr, enum mark_type mark)
 
 		BUG_ON(usage_bit >= LOCK_USAGE_STATES);
 
-		if (hlock_class(hlock)->key == &__lockdep_no_validate__)
+		if (hlock_class(hlock)->key == __lockdep_no_validate__.subkeys)
 			continue;
 
 		if (!mark_lock(curr, hlock, usage_bit))
@@ -2485,23 +2485,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip)
 {
 	struct task_struct *curr = current;
 
-	if (DEBUG_LOCKS_WARN_ON(unlikely(early_boot_irqs_disabled)))
-		return;
-
-	if (unlikely(curr->hardirqs_enabled)) {
-		/*
-		 * Neither irq nor preemption are disabled here
-		 * so this is racy by nature but losing one hit
-		 * in a stat is not a big deal.
-		 */
-		__debug_atomic_inc(redundant_hardirqs_on);
-		return;
-	}
 	/* we'll do an OFF -> ON transition: */
 	curr->hardirqs_enabled = 1;
 
-	if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
-		return;
 	/*
 	 * We are going to turn hardirqs on, so set the
 	 * usage bit for all held locks:
@@ -2529,9 +2515,25 @@ void trace_hardirqs_on_caller(unsigned long ip)
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
+	if (unlikely(current->hardirqs_enabled)) {
+		/*
+		 * Neither irq nor preemption are disabled here
+		 * so this is racy by nature but losing one hit
+		 * in a stat is not a big deal.
+		 */
+		__debug_atomic_inc(redundant_hardirqs_on);
+		return;
+	}
+
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return;
 
+	if (DEBUG_LOCKS_WARN_ON(unlikely(early_boot_irqs_disabled)))
+		return;
+
+	if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
+		return;
+
 	current->lockdep_recursion = 1;
 	__trace_hardirqs_on_caller(ip);
 	current->lockdep_recursion = 0;
@@ -2872,10 +2874,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 void lockdep_init_map(struct lockdep_map *lock, const char *name,
 		      struct lock_class_key *key, int subclass)
 {
-	int i;
-
-	for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
-		lock->class_cache[i] = NULL;
+	memset(lock, 0, sizeof(*lock));
 
 #ifdef CONFIG_LOCK_STAT
 	lock->cpu = raw_smp_processor_id();
diff --git a/mm/slab.c b/mm/slab.c
index 9594740..6d90a09 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -622,6 +622,51 @@ int slab_is_available(void)
 static struct lock_class_key on_slab_l3_key;
 static struct lock_class_key on_slab_alc_key;
 
+static struct lock_class_key debugobj_l3_key;
+static struct lock_class_key debugobj_alc_key;
+
+static void slab_set_lock_classes(struct kmem_cache *cachep,
+		struct lock_class_key *l3_key, struct lock_class_key *alc_key,
+		int q)
+{
+	struct array_cache **alc;
+	struct kmem_list3 *l3;
+	int r;
+
+	l3 = cachep->nodelists[q];
+	if (!l3)
+		return;
+
+	lockdep_set_class(&l3->list_lock, l3_key);
+	alc = l3->alien;
+	/*
+	 * FIXME: This check for BAD_ALIEN_MAGIC
+	 * should go away when common slab code is taught to
+	 * work even without alien caches.
+	 * Currently, non NUMA code returns BAD_ALIEN_MAGIC
+	 * for alloc_alien_cache,
+	 */
+	if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
+		return;
+	for_each_node(r) {
+		if (alc[r])
+			lockdep_set_class(&alc[r]->lock, alc_key);
+	}
+}
+
+static void slab_set_debugobj_lock_classes_node(struct kmem_cache *cachep, int node)
+{
+	slab_set_lock_classes(cachep, &debugobj_l3_key, &debugobj_alc_key, node);
+}
+
+static void slab_set_debugobj_lock_classes(struct kmem_cache *cachep)
+{
+	int node;
+
+	for_each_online_node(node)
+		slab_set_debugobj_lock_classes_node(cachep, node);
+}
+
 static void init_node_lock_keys(int q)
 {
 	struct cache_sizes *s = malloc_sizes;
@@ -630,29 +675,14 @@ static void init_node_lock_keys(int q)
 		return;
 
 	for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
-		struct array_cache **alc;
 		struct kmem_list3 *l3;
-		int r;
 
 		l3 = s->cs_cachep->nodelists[q];
 		if (!l3 || OFF_SLAB(s->cs_cachep))
 			continue;
-		lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
-		alc = l3->alien;
-		/*
-		 * FIXME: This check for BAD_ALIEN_MAGIC
-		 * should go away when common slab code is taught to
-		 * work even without alien caches.
-		 * Currently, non NUMA code returns BAD_ALIEN_MAGIC
-		 * for alloc_alien_cache,
-		 */
-		if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
-			continue;
-		for_each_node(r) {
-			if (alc[r])
-				lockdep_set_class(&alc[r]->lock,
-					&on_slab_alc_key);
-		}
+
+		slab_set_lock_classes(s->cs_cachep, &on_slab_l3_key,
+				&on_slab_alc_key, q);
 	}
 }
 
@@ -671,6 +701,14 @@ static void init_node_lock_keys(int q)
 static inline void init_lock_keys(void)
 {
 }
+
+static void slab_set_debugobj_lock_classes_node(struct kmem_cache *cachep, int node)
+{
+}
+
+static void slab_set_debugobj_lock_classes(struct kmem_cache *cachep)
+{
+}
 #endif
 
 /*
@@ -1264,6 +1302,8 @@ static int __cpuinit cpuup_prepare(long cpu)
 		spin_unlock_irq(&l3->list_lock);
 		kfree(shared);
 		free_alien_cache(alien);
+		if (cachep->flags & SLAB_DEBUG_OBJECTS)
+			slab_set_debugobj_lock_classes_node(cachep, node);
 	}
 	init_node_lock_keys(node);
 
@@ -1626,6 +1666,9 @@ void __init kmem_cache_init_late(void)
 {
 	struct kmem_cache *cachep;
 
+	/* Annotate slab for lockdep -- annotate the malloc caches */
+	init_lock_keys();
+
 	/* 6) resize the head arrays to their final sizes */
 	mutex_lock(&cache_chain_mutex);
 	list_for_each_entry(cachep, &cache_chain, next)
@@ -1636,9 +1679,6 @@ void __init kmem_cache_init_late(void)
 	/* Done! */
 	g_cpucache_up = FULL;
 
-	/* Annotate slab for lockdep -- annotate the malloc caches */
-	init_lock_keys();
-
 	/*
 	 * Register a cpu startup notifier callback that initializes
 	 * cpu_cache_get for all new cpus
@@ -2426,6 +2466,16 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 		goto oops;
 	}
 
+	if (flags & SLAB_DEBUG_OBJECTS) {
+		/*
+		 * Would deadlock through slab_destroy()->call_rcu()->
+		 * debug_object_activate()->kmem_cache_alloc().
+		 */
+		WARN_ON_ONCE(flags & SLAB_DESTROY_BY_RCU);
+
+		slab_set_debugobj_lock_classes(cachep);
+	}
+
 	/* cache setup completed, link it into the list */
 	list_add(&cachep->next, &cache_chain);
 oops:
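
The futex change in this pull makes get_futex_key() retry with read-only access when the caller only needs to read (VERIFY_READ), and then rejects read-only anonymous pages. The decision logic can be modelled in userspace roughly as below. Everything here is a hypothetical stand-in: struct page_sketch, pin_page_sketch() and the NEED_* values only imitate the roles of the page, get_user_pages_fast() and VERIFY_READ/VERIFY_WRITE, and the ZERO_PAGE check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for VERIFY_READ / VERIFY_WRITE. */
enum access_sketch { NEED_READ, NEED_WRITE };

/* Hypothetical model of a user page; not a kernel structure. */
struct page_sketch {
	bool writable;	/* a write pin would succeed */
	bool readable;	/* a read-only pin would succeed */
	bool anon;	/* anonymous (private) page */
};

/* Stand-in for get_user_pages_fast(): 1 page pinned, or -EFAULT. */
static int pin_page_sketch(const struct page_sketch *pg, bool write)
{
	if (write)
		return pg->writable ? 1 : -EFAULT;
	return pg->readable ? 1 : -EFAULT;
}

/* Returns 0 if a futex key could be formed, or a negative errno. */
static int futex_key_access_sketch(const struct page_sketch *pg,
				   enum access_sketch rw)
{
	bool ro = false;
	int err;

	assert(pg);

	/* Try for write access first, as the patched code does. */
	err = pin_page_sketch(pg, true);
	if (err == -EFAULT && rw == NEED_READ) {
		/* Write access is not required: fall back to read-only. */
		err = pin_page_sketch(pg, false);
		ro = true;
	}
	if (err < 0)
		return err;

	/* A read-only anonymous page never changes, so reject it. */
	if (pg->anon && ro)
		return -EFAULT;

	return 0;
}
```

In this model a read-only file-backed mapping is accepted for a read-type operation such as FUTEX_WAIT and still fails for operations that request write access.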


* [GIT PULL] core kernel fixes
@ 2011-04-02 10:21 Ingo Molnar
From: Ingo Molnar @ 2011-04-02 10:21 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Paul E. McKenney (1):
      rcu: create new rcu_access_index() and use in mce

Steven Rostedt (1):
      WARN_ON_SMP(): Add comment to explain ({0;})


 arch/x86/kernel/cpu/mcheck/mce.c |    2 +-
 include/asm-generic/bug.h        |    7 +++++++
 include/linux/rcupdate.h         |   20 ++++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5a05ef6..3385ea2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1626,7 +1626,7 @@ out:
 static unsigned int mce_poll(struct file *file, poll_table *wait)
 {
 	poll_wait(file, &mce_wait, wait);
-	if (rcu_dereference_check_mce(mcelog.next))
+	if (rcu_access_index(mcelog.next))
 		return POLLIN | POLLRDNORM;
 	if (!mce_apei_read_done && apei_check_mce())
 		return POLLIN | POLLRDNORM;
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index f2d2faf..e5a3f58 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -194,6 +194,13 @@ extern void warn_slowpath_null(const char *file, const int line);
 #ifdef CONFIG_SMP
 # define WARN_ON_SMP(x)			WARN_ON(x)
 #else
+/*
+ * Use of ({0;}) because WARN_ON_SMP(x) may be used either as
+ * a stand alone line statement or as a condition in an if ()
+ * statement.
+ * A simple "0" would cause gcc to give a "statement has no effect"
+ * warning.
+ */
 # define WARN_ON_SMP(x)			({0;})
 #endif
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index af56148..ff422d2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -339,6 +339,12 @@ extern int rcu_my_thread_group_empty(void);
 		((typeof(*p) __force __kernel *)(p)); \
 	})
 
+#define __rcu_access_index(p, space) \
+	({ \
+		typeof(p) _________p1 = ACCESS_ONCE(p); \
+		rcu_dereference_sparse(p, space); \
+		(_________p1); \
+	})
 #define __rcu_dereference_index_check(p, c) \
 	({ \
 		typeof(p) _________p1 = ACCESS_ONCE(p); \
@@ -429,6 +435,20 @@ extern int rcu_my_thread_group_empty(void);
 #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/
 
 /**
+ * rcu_access_index() - fetch RCU index with no dereferencing
+ * @p: The index to read
+ *
+ * Return the value of the specified RCU-protected index, but omit the
+ * smp_read_barrier_depends() and keep the ACCESS_ONCE().  This is useful
+ * when the value of this index is accessed, but the index is not
+ * dereferenced, for example, when testing an RCU-protected index against
+ * -1.  Although rcu_access_index() may also be used in cases where
+ * update-side locks prevent the value of the index from changing, you
+ * should instead use rcu_dereference_index_protected() for this use case.
+ */
+#define rcu_access_index(p) __rcu_access_index((p), __rcu)
+
+/**
  * rcu_dereference_index_check() - rcu_dereference for indices with debug checking
  * @p: The pointer to read, prior to dereferencing
  * @c: The conditions under which the dereference will take place
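
The comment added to include/asm-generic/bug.h in this pull explains why the uniprocessor WARN_ON_SMP() is defined as ({0;}): it has to work both as a stand-alone statement and as the condition of an if (). A small userspace sketch of that property follows. WARN_ON_UP_SKETCH() and guarded_step() are hypothetical names, and the statement-expression syntax is a GNU C extension, so gcc or clang is assumed.

```c
#include <assert.h>

/* Hypothetical stand-in for the !CONFIG_SMP definition quoted above. */
#define WARN_ON_UP_SKETCH(x)	({0;})

static int guarded_step(int broken)
{
	/* Used as a stand-alone statement: compiles and does nothing. */
	WARN_ON_UP_SKETCH(broken);

	/* Used as a condition: always false, so this return is never taken. */
	if (WARN_ON_UP_SKETCH(broken))
		return -1;

	/* The stub is an expression whose value is always 0. */
	assert(WARN_ON_UP_SKETCH(broken) == 0);

	return broken;
}
```

Note that the argument is never evaluated by the stub, so any side effects in the condition are lost on uniprocessor builds. A do { } while (0) definition would satisfy the first use but would not compile in the second, since it is a statement rather than an expression.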


* [GIT PULL] core kernel fixes
@ 2011-03-25 12:52 Ingo Molnar
From: Ingo Molnar @ 2011-03-25 12:52 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Namhyung Kim (2):
      vsprintf: Introduce %pB format specifier
      x86, dumpstack: Use %pB format specifier for stack trace

Sergey Senozhatsky (1):
      lockdep: Remove unused 'factor' variable from lockdep_stats_show()

Steven Rostedt (2):
      WARN_ON_SMP(): Allow use in if() statements on UP
      futex: Fix WARN_ON() test for UP


 arch/x86/kernel/dumpstack.c |    2 +-
 include/asm-generic/bug.h   |   28 ++++++++++++++++++++++++++-
 include/linux/kallsyms.h    |    7 ++++++
 kernel/futex.c              |    4 +-
 kernel/kallsyms.c           |   44 ++++++++++++++++++++++++++++++++++++++++--
 kernel/lockdep_proc.c       |    9 +-------
 lib/vsprintf.c              |    7 +++++-
 7 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 999e279..24d0479 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -27,7 +27,7 @@ static int die_counter;
 
 void printk_address(unsigned long address, int reliable)
 {
-	printk(" [<%p>] %s%pS\n", (void *) address,
+	printk(" [<%p>] %s%pB\n", (void *) address,
 			reliable ? "" : "? ", (void *) address);
 }
 
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index c2c9ba0..f2d2faf 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -165,10 +165,36 @@ extern void warn_slowpath_null(const char *file, const int line);
 #define WARN_ON_RATELIMIT(condition, state)			\
 		WARN_ON((condition) && __ratelimit(state))
 
+/*
+ * WARN_ON_SMP() is for cases that the warning is either
+ * meaningless for !SMP or may even cause failures.
+ * This is usually used for cases that we have
+ * WARN_ON(!spin_is_locked(&lock)) checks, as spin_is_locked()
+ * returns 0 for uniprocessor settings.
+ * It can also be used with values that are only defined
+ * on SMP:
+ *
+ * struct foo {
+ *  [...]
+ * #ifdef CONFIG_SMP
+ *	int bar;
+ * #endif
+ * };
+ *
+ * void func(struct foo *zoot)
+ * {
+ *	WARN_ON_SMP(!zoot->bar);
+ *
+ * For CONFIG_SMP, WARN_ON_SMP() should act the same as WARN_ON(),
+ * and should be a nop and return false for uniprocessor.
+ *
+ * if (WARN_ON_SMP(x)) returns true only when CONFIG_SMP is set
+ * and x is true.
+ */
 #ifdef CONFIG_SMP
 # define WARN_ON_SMP(x)			WARN_ON(x)
 #else
-# define WARN_ON_SMP(x)			do { } while (0)
+# define WARN_ON_SMP(x)			({0;})
 #endif
 
 #endif
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index d8e9b3d..0df513b 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -36,6 +36,7 @@ const char *kallsyms_lookup(unsigned long addr,
 
 /* Look up a kernel symbol and return it in a text buffer. */
 extern int sprint_symbol(char *buffer, unsigned long address);
+extern int sprint_backtrace(char *buffer, unsigned long address);
 
 /* Look up a kernel symbol and print it to the kernel messages. */
 extern void __print_symbol(const char *fmt, unsigned long address);
@@ -79,6 +80,12 @@ static inline int sprint_symbol(char *buffer, unsigned long addr)
 	return 0;
 }
 
+static inline int sprint_backtrace(char *buffer, unsigned long addr)
+{
+	*buffer = '\0';
+	return 0;
+}
+
 static inline int lookup_symbol_name(unsigned long addr, char *symname)
 {
 	return -ERANGE;
diff --git a/kernel/futex.c b/kernel/futex.c
index bda4157..823aae3 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -782,8 +782,8 @@ static void __unqueue_futex(struct futex_q *q)
 {
 	struct futex_hash_bucket *hb;
 
-	if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
-			|| plist_node_empty(&q->list)))
+	if (WARN_ON_SMP(!q->lock_ptr || !spin_is_locked(q->lock_ptr))
+	    || WARN_ON(plist_node_empty(&q->list)))
 		return;
 
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 6f6d091..59e8799 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -342,13 +342,15 @@ int lookup_symbol_attrs(unsigned long addr, unsigned long *size,
 }
 
 /* Look up a kernel symbol and return it in a text buffer. */
-int sprint_symbol(char *buffer, unsigned long address)
+static int __sprint_symbol(char *buffer, unsigned long address,
+			   int symbol_offset)
 {
 	char *modname;
 	const char *name;
 	unsigned long offset, size;
 	int len;
 
+	address += symbol_offset;
 	name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
 	if (!name)
 		return sprintf(buffer, "0x%lx", address);
@@ -357,17 +359,53 @@ int sprint_symbol(char *buffer, unsigned long address)
 		strcpy(buffer, name);
 	len = strlen(buffer);
 	buffer += len;
+	offset -= symbol_offset;
 
 	if (modname)
-		len += sprintf(buffer, "+%#lx/%#lx [%s]",
-						offset, size, modname);
+		len += sprintf(buffer, "+%#lx/%#lx [%s]", offset, size, modname);
 	else
 		len += sprintf(buffer, "+%#lx/%#lx", offset, size);
 
 	return len;
 }
+
+/**
+ * sprint_symbol - Look up a kernel symbol and return it in a text buffer
+ * @buffer: buffer to be stored
+ * @address: address to lookup
+ *
+ * This function looks up a kernel symbol with @address and stores its name,
+ * offset, size and module name to @buffer if possible. If no symbol was found,
+ * just saves its @address as is.
+ *
+ * This function returns the number of bytes stored in @buffer.
+ */
+int sprint_symbol(char *buffer, unsigned long address)
+{
+	return __sprint_symbol(buffer, address, 0);
+}
+
 EXPORT_SYMBOL_GPL(sprint_symbol);
 
+/**
+ * sprint_backtrace - Look up a backtrace symbol and return it in a text buffer
+ * @buffer: buffer to be stored
+ * @address: address to lookup
+ *
+ * This function is for stack backtrace and does the same thing as
+ * sprint_symbol() but with modified/decreased @address. If there is a
+ * tail-call to the function marked "noreturn", gcc optimized out code after
+ * the call so that the stack-saved return address could point outside of the
+ * caller. This function ensures that kallsyms will find the original caller
+ * by decreasing @address.
+ *
+ * This function returns the number of bytes stored in @buffer.
+ */
+int sprint_backtrace(char *buffer, unsigned long address)
+{
+	return __sprint_symbol(buffer, address, -1);
+}
+
 /* Look up a kernel symbol and print it to the kernel messages. */
 void __print_symbol(const char *fmt, unsigned long address)
 {
diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index 1969d2f..71edd2f 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -225,7 +225,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
 		      nr_irq_read_safe = 0, nr_irq_read_unsafe = 0,
 		      nr_softirq_read_safe = 0, nr_softirq_read_unsafe = 0,
 		      nr_hardirq_read_safe = 0, nr_hardirq_read_unsafe = 0,
-		      sum_forward_deps = 0, factor = 0;
+		      sum_forward_deps = 0;
 
 	list_for_each_entry(class, &all_lock_classes, lock_entry) {
 
@@ -283,13 +283,6 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
 			nr_hardirq_unsafe * nr_hardirq_safe +
 			nr_list_entries);
 
-	/*
-	 * Estimated factor between direct and indirect
-	 * dependencies:
-	 */
-	if (nr_list_entries)
-		factor = sum_forward_deps / nr_list_entries;
-
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " dependency chains:             %11lu [max: %lu]\n",
 			nr_lock_chains, MAX_LOCKDEP_CHAINS);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d3023df..d9e01fc 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -574,7 +574,9 @@ char *symbol_string(char *buf, char *end, void *ptr,
 	unsigned long value = (unsigned long) ptr;
 #ifdef CONFIG_KALLSYMS
 	char sym[KSYM_SYMBOL_LEN];
-	if (ext != 'f' && ext != 's')
+	if (ext == 'B')
+		sprint_backtrace(sym, value);
+	else if (ext != 'f' && ext != 's')
 		sprint_symbol(sym, value);
 	else
 		kallsyms_lookup(value, NULL, NULL, NULL, sym);
@@ -949,6 +951,7 @@ int kptr_restrict = 1;
  * - 'f' For simple symbolic function names without offset
  * - 'S' For symbolic direct pointers with offset
  * - 's' For symbolic direct pointers without offset
+ * - 'B' For backtraced symbolic direct pointers with offset
  * - 'R' For decoded struct resource, e.g., [mem 0x0-0x1f 64bit pref]
  * - 'r' For raw struct resource, e.g., [mem 0x0-0x1f flags 0x201]
  * - 'M' For a 6-byte MAC address, it prints the address in the
@@ -1008,6 +1011,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		/* Fallthrough */
 	case 'S':
 	case 's':
+	case 'B':
 		return symbol_string(buf, end, ptr, spec, *fmt);
 	case 'R':
 	case 'r':
@@ -1279,6 +1283,7 @@ qualifier:
  * %ps output the name of a text symbol without offset
  * %pF output the name of a function pointer with its offset
  * %pf output the name of a function pointer without its offset
+ * %pB output the name of a backtrace symbol with its offset
  * %pR output the address range in a struct resource with decoded flags
  * %pr output the address range in a struct resource with raw flags
  * %pM output a 6-byte MAC address with colons
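
The address adjustment done by __sprint_symbol() can be tried outside the
kernel. What follows is a minimal user-space sketch, not the kernel code: the
symbol table, its addresses, and the names lookup() and format_symbol() are
made up for illustration. It shows why looking up address - 1 and then adding
1 back to the printed offset attributes a return address that sits exactly at
the end of a function to that function instead of to its neighbour.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol table entry: [start, start + size) belongs to name. */
struct sym {
	const char *name;
	unsigned long start;
	unsigned long size;
};

/* Made-up layout: caller() ends exactly where next_func() begins. */
static const struct sym symtab[] = {
	{ "caller",    0x1000, 0x40 },
	{ "next_func", 0x1040, 0x20 },
};

/* Linear search standing in for kallsyms_lookup(). */
static const struct sym *lookup(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (addr >= symtab[i].start &&
		    addr - symtab[i].start < symtab[i].size)
			return &symtab[i];
	return NULL;
}

/*
 * Same shape as __sprint_symbol() in the patch: shift the address before
 * the lookup, then undo the shift in the offset that gets printed.
 */
static int format_symbol(char *buf, size_t len, unsigned long address,
			 int symbol_offset)
{
	const struct sym *s;
	unsigned long offset;

	address += symbol_offset;
	s = lookup(address);
	if (!s)
		return snprintf(buf, len, "0x%lx", address);

	offset = address - s->start;
	offset -= symbol_offset;
	return snprintf(buf, len, "%s+%#lx/%#lx", s->name, offset, s->size);
}
```

With a return address of 0x1040, a symbol_offset of 0 resolves to next_func,
while a symbol_offset of -1 resolves to caller with an offset equal to its
size, which is the behaviour the sprint_backtrace() comment describes.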


* [GIT PULL] core kernel fixes
@ 2011-01-21  2:11 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2011-01-21  2:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

This fixes the percpu-related boot crash that I reported.

 Thanks,

	Ingo

------------------>
Tejun Heo (2):
      lockdep: Move early boot local IRQ enable/disable status to init/main.c
      smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled is set


 arch/x86/xen/enlighten.c     |    2 +-
 include/linux/kernel.h       |    2 ++
 include/linux/lockdep.h      |    8 --------
 init/main.c                  |   13 +++++++++++--
 kernel/lockdep.c             |   18 +-----------------
 kernel/smp.c                 |   11 +++++++----
 kernel/trace/trace_irqsoff.c |    8 --------
 7 files changed, 22 insertions(+), 40 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7e8d3bc..50542ef 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1194,7 +1194,7 @@ asmlinkage void __init xen_start_kernel(void)
 	per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
 
 	local_irq_disable();
-	early_boot_irqs_off();
+	early_boot_irqs_disabled = true;
 
 	memblock_init();
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5a9d905..d07d805 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -243,6 +243,8 @@ extern int test_taint(unsigned flag);
 extern unsigned long get_taint(void);
 extern int root_mountflags;
 
+extern bool early_boot_irqs_disabled;
+
 /* Values used for system_state */
 extern enum system_states {
 	SYSTEM_BOOTING,
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 71c09b2..f638fd7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -436,16 +436,8 @@ do {								\
 #endif /* CONFIG_LOCKDEP */
 
 #ifdef CONFIG_TRACE_IRQFLAGS
-extern void early_boot_irqs_off(void);
-extern void early_boot_irqs_on(void);
 extern void print_irqtrace_events(struct task_struct *curr);
 #else
-static inline void early_boot_irqs_off(void)
-{
-}
-static inline void early_boot_irqs_on(void)
-{
-}
 static inline void print_irqtrace_events(struct task_struct *curr)
 {
 }
diff --git a/init/main.c b/init/main.c
index 00799c1..33c37c3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -96,6 +96,15 @@ static inline void mark_rodata_ro(void) { }
 extern void tc_init(void);
 #endif
 
+/*
+ * Debug helper: via this flag we know that we are in 'early bootup code'
+ * where only the boot processor is running with IRQ disabled.  This means
+ * two things - IRQ must not be enabled before the flag is cleared and some
+ * operations which are not allowed with IRQ disabled are allowed while the
+ * flag is set.
+ */
+bool early_boot_irqs_disabled __read_mostly;
+
 enum system_states system_state __read_mostly;
 EXPORT_SYMBOL(system_state);
 
@@ -554,7 +563,7 @@ asmlinkage void __init start_kernel(void)
 	cgroup_init_early();
 
 	local_irq_disable();
-	early_boot_irqs_off();
+	early_boot_irqs_disabled = true;
 
 /*
  * Interrupts are still disabled. Do necessary setups, then
@@ -621,7 +630,7 @@ asmlinkage void __init start_kernel(void)
 	if (!irqs_disabled())
 		printk(KERN_CRIT "start_kernel(): bug: interrupts were "
 				 "enabled early\n");
-	early_boot_irqs_on();
+	early_boot_irqs_disabled = false;
 	local_irq_enable();
 
 	/* Interrupts are enabled now so all GFP allocations are safe. */
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 42ba65d..0d2058d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2292,22 +2292,6 @@ mark_held_locks(struct task_struct *curr, enum mark_type mark)
 }
 
 /*
- * Debugging helper: via this flag we know that we are in
- * 'early bootup code', and will warn about any invalid irqs-on event:
- */
-static int early_boot_irqs_enabled;
-
-void early_boot_irqs_off(void)
-{
-	early_boot_irqs_enabled = 0;
-}
-
-void early_boot_irqs_on(void)
-{
-	early_boot_irqs_enabled = 1;
-}
-
-/*
  * Hardirqs will be enabled:
  */
 void trace_hardirqs_on_caller(unsigned long ip)
@@ -2319,7 +2303,7 @@ void trace_hardirqs_on_caller(unsigned long ip)
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
-	if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
+	if (DEBUG_LOCKS_WARN_ON(unlikely(early_boot_irqs_disabled)))
 		return;
 
 	if (unlikely(curr->hardirqs_enabled)) {
diff --git a/kernel/smp.c b/kernel/smp.c
index 4ec30e0..4b83cd6 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -430,7 +430,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	 * can't happen.
 	 */
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress);
+		     && !oops_in_progress && !early_boot_irqs_disabled);
 
 	/* So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
@@ -533,17 +533,20 @@ void ipi_call_unlock_irq(void)
 #endif /* USE_GENERIC_SMP_HELPERS */
 
 /*
- * Call a function on all processors
+ * Call a function on all processors.  May be used during early boot while
+ * early_boot_irqs_disabled is set.  Use local_irq_save/restore() instead
+ * of local_irq_disable/enable().
  */
 int on_each_cpu(void (*func) (void *info), void *info, int wait)
 {
+	unsigned long flags;
 	int ret = 0;
 
 	preempt_disable();
 	ret = smp_call_function(func, info, wait);
-	local_irq_disable();
+	local_irq_save(flags);
 	func(info);
-	local_irq_enable();
+	local_irq_restore(flags);
 	preempt_enable();
 	return ret;
 }
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 5cf8c60..92b6e1e 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -453,14 +453,6 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
  * Stubs:
  */
 
-void early_boot_irqs_off(void)
-{
-}
-
-void early_boot_irqs_on(void)
-{
-}
-
 void trace_softirqs_on(unsigned long ip)
 {
 }
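
The on_each_cpu() change in this series swaps local_irq_disable/enable() for
local_irq_save/restore(). The difference can be seen in a small user-space
model. This is only a sketch: irqs_enabled and the model_irq_*() helpers are
hypothetical stand-ins for the real interrupt flag and the kernel primitives,
and nothing here touches hardware.

```c
#include <stdbool.h>

/* Stand-in for the CPU interrupt-enable flag (user-space model only). */
static bool irqs_enabled;

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* Remember the previous state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Put back whatever state was remembered. */
static void model_irq_restore(unsigned long flags)
{
	irqs_enabled = flags != 0;
}

/* Old pattern: always leaves interrupts enabled on return. */
static void run_with_disable_enable(void (*func)(void *), void *info)
{
	model_irq_disable();
	func(info);
	model_irq_enable();
}

/* New pattern: leaves interrupts in the state the caller had. */
static void run_with_save_restore(void (*func)(void *), void *info)
{
	unsigned long flags = model_irq_save();

	func(info);
	model_irq_restore(flags);
}

/* Trivial callback used to exercise both helpers. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

If the caller enters with the flag cleared, as early boot code does while
early_boot_irqs_disabled is set, the first helper returns with it set and the
second returns with it still cleared.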


* [GIT PULL] core kernel fixes
@ 2011-01-15 15:15 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2011-01-15 15:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Paul E. McKenney, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Paul E. McKenney (2):
      rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
      rcu: avoid pointless blocked-task warnings

Steven Rostedt (1):
      rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()


 init/Kconfig     |   15 ---------------
 kernel/futex.c   |    7 +++----
 kernel/rcutiny.c |    3 ++-
 kernel/srcu.c    |   15 +++++++++++++--
 4 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8dfd094..bd1ea92 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -497,21 +497,6 @@ config RCU_BOOST_DELAY
 
 	  Accept the default if unsure.
 
-config SRCU_SYNCHRONIZE_DELAY
-	int "Microseconds to delay before waiting for readers"
-	range 0 20
-	default 10
-	help
-	  This option controls how long SRCU delays before entering its
-	  loop waiting on SRCU readers.  The purpose of this loop is
-	  to avoid the unconditional context-switch penalty that would
-	  otherwise be incurred if there was an active SRCU reader,
-	  in a manner similar to adaptive locking schemes.  This should
-	  be set to be a bit longer than the common-case SRCU read-side
-	  critical-section overhead.
-
-	  Accept the default if unsure.
-
 endmenu # "RCU Subsystem"
 
 config IKCONFIG
diff --git a/kernel/futex.c b/kernel/futex.c
index 3019b92..5696d38 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -791,10 +791,9 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 
 	/*
-	 * This happens when we have stolen the lock and the original
-	 * pending owner did not enqueue itself back on the rt_mutex.
-	 * Thats not a tragedy. We know that way, that a lock waiter
-	 * is on the fly. We make the futex_q waiter the pending owner.
+	 * It is possible that the next waiter (the one that brought
+	 * this owner to the kernel) timed out and is no longer
+	 * waiting on the lock.
 	 */
 	if (!new_owner)
 		new_owner = this->task;
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0344937..0c343b9 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -189,7 +189,8 @@ static int rcu_kthread(void *arg)
 	unsigned long flags;
 
 	for (;;) {
-		wait_event(rcu_kthread_wq, have_rcu_kthread_work != 0);
+		wait_event_interruptible(rcu_kthread_wq,
+					 have_rcu_kthread_work != 0);
 		morework = rcu_boost();
 		local_irq_save(flags);
 		work = have_rcu_kthread_work;
diff --git a/kernel/srcu.c b/kernel/srcu.c
index 98d8c1e..73ce23f 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -156,6 +156,16 @@ void __srcu_read_unlock(struct srcu_struct *sp, int idx)
 EXPORT_SYMBOL_GPL(__srcu_read_unlock);
 
 /*
+ * We use an adaptive strategy for synchronize_srcu() and especially for
+ * synchronize_srcu_expedited().  We spin for a fixed time period
+ * (defined below) to allow SRCU readers to exit their read-side critical
+ * sections.  If there are still some readers after 10 microseconds,
+ * we repeatedly block for 1-millisecond time periods.  This approach
+ * has done well in testing, so there is no need for a config parameter.
+ */
+#define SYNCHRONIZE_SRCU_READER_DELAY 10
+
+/*
  * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
  */
 static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
@@ -207,11 +217,12 @@ static void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
 	 * will have finished executing.  We initially give readers
 	 * an arbitrarily chosen 10 microseconds to get out of their
 	 * SRCU read-side critical sections, then loop waiting 1/HZ
-	 * seconds per iteration.
+	 * seconds per iteration.  The 10-microsecond value has done
+	 * very well in testing.
 	 */
 
 	if (srcu_readers_active_idx(sp, idx))
-		udelay(CONFIG_SRCU_SYNCHRONIZE_DELAY);
+		udelay(SYNCHRONIZE_SRCU_READER_DELAY);
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);
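
The spin-then-block strategy described in the srcu.c comment can be modelled
without any kernel machinery. The sketch below is an assumption-laden toy:
adaptive_wait(), struct wait_stats, and the idea of passing in how long the
readers still need are all invented for illustration. It only counts how much
spinning and how many blocking sleeps the strategy would use.

```c
/* Mirrors the value of SYNCHRONIZE_SRCU_READER_DELAY in the patch. */
#define READER_SPIN_DELAY_US 10

/* What the waiter ended up doing. */
struct wait_stats {
	unsigned int spin_us;	/* microseconds spent in the initial delay */
	unsigned int sleeps;	/* number of blocking sleep periods */
};

/*
 * Hypothetical model: reader_us_left is how long the readers still need,
 * sleep_quantum_us is the length of one blocking sleep.
 */
static struct wait_stats adaptive_wait(unsigned int reader_us_left,
				       unsigned int sleep_quantum_us)
{
	struct wait_stats st = { 0, 0 };

	if (sleep_quantum_us == 0)
		sleep_quantum_us = 1;	/* avoid an endless loop in the model */

	/* Readers active: pay the short fixed delay once. */
	if (reader_us_left > 0) {
		st.spin_us = READER_SPIN_DELAY_US;
		reader_us_left = reader_us_left > READER_SPIN_DELAY_US ?
				 reader_us_left - READER_SPIN_DELAY_US : 0;
	}

	/* Still active after the delay: block repeatedly until they are done. */
	while (reader_us_left > 0) {
		st.sleeps++;
		reader_us_left = reader_us_left > sleep_quantum_us ?
				 reader_us_left - sleep_quantum_us : 0;
	}
	return st;
}
```

A reader that finishes within the 10-microsecond window costs no sleep at
all, which is the case the fixed delay is meant to catch.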
 


* Re: [GIT PULL] core kernel fixes
  2010-10-07  8:11                 ` Ingo Molnar
@ 2010-10-07 17:42                   ` Paul E. McKenney
  0 siblings, 0 replies; 97+ messages in thread
From: Paul E. McKenney @ 2010-10-07 17:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Linus Torvalds, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Thu, Oct 07, 2010 at 10:11:00AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Oct 06, 2010 at 08:20:49PM +0200, Ingo Molnar wrote:
> > > 
> > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > Ingo, I have this queued up at:
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> > > > 
> > > > It passes targeted testing.  If Eric is OK with it, it is ready to be 
> > > > pulled.  At least as soon as kernel.org gets it out to the mirrors. 
> > > > Commit 773e3f9357.
> > > 
> > > Now that the original fix is in Linus's tree, can the followup commits 
> > > wait until v2.6.37?
> > > 
> > > Linus, Paul, what's your preference?
> > 
> > I am fine either way.
> 
> OK, Linus released -rc7, which is probably the last -rc, so to reduce
> final-rc code flux I have pulled your fix into tip:core/rcu, for
> v2.6.37.

Ah, good, thank you!  I am rebasing on top of that.  Here is hoping that
this approach quiets down the conflict in -next as well.  ;-)

							Thanx, Paul


* Re: [GIT PULL] core kernel fixes
  2010-10-06 21:27               ` Paul E. McKenney
@ 2010-10-07  8:11                 ` Ingo Molnar
  2010-10-07 17:42                   ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2010-10-07  8:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Linus Torvalds, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Oct 06, 2010 at 08:20:49PM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Ingo, I have this queued up at:
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> > > 
> > > It passes targeted testing.  If Eric is OK with it, it is ready to be 
> > > pulled.  At least as soon as kernel.org gets it out to the mirrors. 
> > > Commit 773e3f9357.
> > 
> > Now that the original fix is in Linus's tree, can the followup commits 
> > wait until v2.6.37?
> > 
> > Linus, Paul, what's your preference?
> 
> I am fine either way.

OK, Linus released -rc7, which is probably the last -rc, so to reduce
final-rc code flux I have pulled your fix into tip:core/rcu, for
v2.6.37.

Thanks,

	Ingo


* Re: [GIT PULL] core kernel fixes
  2010-10-06 18:20             ` Ingo Molnar
@ 2010-10-06 21:27               ` Paul E. McKenney
  2010-10-07  8:11                 ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2010-10-06 21:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Linus Torvalds, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Wed, Oct 06, 2010 at 08:20:49PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Ingo, I have this queued up at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> > 
> > It passes targeted testing.  If Eric is OK with it, it is ready to be 
> > pulled.  At least as soon as kernel.org gets it out to the mirrors. 
> > Commit 773e3f9357.
> 
> Now that the original fix is in Linus's tree, can the followup commits 
> wait until v2.6.37?
> 
> Linus, Paul, what's your preference?

I am fine either way.

							Thanx, Paul

> Thanks,
> 
> 	Ingo


* Re: [GIT PULL] core kernel fixes
  2010-10-06  4:59           ` Paul E. McKenney
@ 2010-10-06 18:20             ` Ingo Molnar
  2010-10-06 21:27               ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2010-10-06 18:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Linus Torvalds, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> Ingo, I have this queued up at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> 
> It passes targeted testing.  If Eric is OK with it, it is ready to be 
> pulled.  At least as soon as kernel.org gets it out to the mirrors. 
> Commit 773e3f9357.

Now that the original fix is in Linus's tree, can the followup commits 
wait until v2.6.37?

Linus, Paul, what's your preference?

Thanks,

	Ingo


* Re: [GIT PULL] core kernel fixes
  2010-10-06  2:56         ` Eric Dumazet
@ 2010-10-06  4:59           ` Paul E. McKenney
  2010-10-06 18:20             ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2010-10-06  4:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Wed, Oct 06, 2010 at 04:56:54AM +0200, Eric Dumazet wrote:
> On Tuesday, 05 October 2010 at 15:05 -0700, Paul E. McKenney wrote:
> > On Tue, Oct 05, 2010 at 02:45:07PM -0700, Linus Torvalds wrote:
> > > On Tue, Oct 5, 2010 at 2:09 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > >
> > > > Good point!!!  If the following diff looks good to you, I will get it
> > > > tested and pushed.
> > > 
> > > Looks good. Except since I pulled the thing despite my complaint,
> > > you'll also need to undo the extra irqs_disabled() test.
> > 
> > No problem, the diff I sent you combined two commits.  ;-)
> > 
> 
> Hi Paul & Linus
> 
> I originally considered adding the test in rcu_read_lock_bh_held() too,
> but thought (wrongly):
> 
> 	"rcu_read_unlock_bh() doesn't block hard IRQs, thus
> 	 rcu_read_lock_bh_held() should not test hard IRQs."
> 
> Anyway, I remember my first patch was wrong, since I was using
> in_irq() rather than irqs_disabled().
> 
> http://kerneltrap.org/mailarchive/linux-kernel/2010/9/22/4622784

Eric,

I believe that history shows that we were all groping around in the
dark on this one...  Me not least of all.  ;-)

Not sure we are completely there yet, but we are at least in a
feasible state.

Ingo, I have this queued up at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent

It passes targeted testing.  If Eric is OK with it, it is ready to be
pulled.  At least as soon as kernel.org gets it out to the mirrors.
Commit 773e3f9357.

							Thanx, Paul


* Re: [GIT PULL] core kernel fixes
  2010-10-05 22:05       ` Paul E. McKenney
@ 2010-10-06  2:56         ` Eric Dumazet
  2010-10-06  4:59           ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Eric Dumazet @ 2010-10-06  2:56 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Tuesday, 05 October 2010 at 15:05 -0700, Paul E. McKenney wrote:
> On Tue, Oct 05, 2010 at 02:45:07PM -0700, Linus Torvalds wrote:
> > On Tue, Oct 5, 2010 at 2:09 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > Good point!!!  If the following diff looks good to you, I will get it
> > > tested and pushed.
> > 
> > Looks good. Except since I pulled the thing despite my complaint,
> > you'll also need to undo the extra irqs_disabled() test.
> 
> No problem, the diff I sent you combined two commits.  ;-)
> 

Hi Paul & Linus

I originally considered adding the test in rcu_read_lock_bh_held() too,
but thought (wrongly):

	"rcu_read_unlock_bh() doesn't block hard IRQs, thus
	 rcu_read_lock_bh_held() should not test hard IRQs."

Anyway, I remember my first patch was wrong, since I was using
in_irq() rather than irqs_disabled().

http://kerneltrap.org/mailarchive/linux-kernel/2010/9/22/4622784

Thanks !




* Re: [GIT PULL] core kernel fixes
  2010-10-05 21:45     ` Linus Torvalds
@ 2010-10-05 22:05       ` Paul E. McKenney
  2010-10-06  2:56         ` Eric Dumazet
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2010-10-05 22:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Eric Dumazet, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Tue, Oct 05, 2010 at 02:45:07PM -0700, Linus Torvalds wrote:
> On Tue, Oct 5, 2010 at 2:09 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Good point!!!  If the following diff looks good to you, I will get it
> > tested and pushed.
> 
> Looks good. Except since I pulled the thing despite my complaint,
> you'll also need to undo the extra irqs_disabled() test.

No problem, the diff I sent you combined two commits.  ;-)

							Thanx, Paul


* Re: [GIT PULL] core kernel fixes
  2010-10-05 21:09   ` Paul E. McKenney
@ 2010-10-05 21:45     ` Linus Torvalds
  2010-10-05 22:05       ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2010-10-05 21:45 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Eric Dumazet, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Tue, Oct 5, 2010 at 2:09 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Good point!!!  If the following diff looks good to you, I will get it
> tested and pushed.

Looks good. Except since I pulled the thing despite my complaint,
you'll also need to undo the extra irqs_disabled() test.

                Linus


* Re: [GIT PULL] core kernel fixes
  2010-10-05 20:15 ` Linus Torvalds
@ 2010-10-05 21:09   ` Paul E. McKenney
  2010-10-05 21:45     ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Paul E. McKenney @ 2010-10-05 21:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Eric Dumazet, linux-kernel, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

On Tue, Oct 05, 2010 at 01:15:02PM -0700, Linus Torvalds wrote:
> On Tue, Oct 5, 2010 at 12:12 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >  #define rcu_dereference_bh(p) \
> > -               rcu_dereference_check(p, rcu_read_lock_bh_held())
> > +               rcu_dereference_check(p, rcu_read_lock_bh_held() || irqs_disabled())
> 
> Wouldn't that irqs_disabled() check have made more sense inside
> rcu_read_lock_bh_held()?
> 
> That's the function that is _supposed_ to check whether bottom halves
> are disabled, no? So why add a workaround for that function being
> buggy/incomplete in one place that uses it?

Good point!!!  If the following diff looks good to you, I will get it
tested and pushed.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 4d16983..0af1dc7 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -86,7 +86,7 @@ int rcu_read_lock_bh_held(void)
 {
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
-	return in_softirq();
+	return in_softirq() || irqs_disabled();
 }
 EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
 


* Re: [GIT PULL] core kernel fixes
  2010-10-05 19:12 Ingo Molnar
@ 2010-10-05 20:15 ` Linus Torvalds
  2010-10-05 21:09   ` Paul E. McKenney
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2010-10-05 20:15 UTC (permalink / raw)
  To: Ingo Molnar, Eric Dumazet
  Cc: linux-kernel, Peter Zijlstra, Andrew Morton, Paul E. McKenney,
	Thomas Gleixner

On Tue, Oct 5, 2010 at 12:12 PM, Ingo Molnar <mingo@elte.hu> wrote:
>  #define rcu_dereference_bh(p) \
> -               rcu_dereference_check(p, rcu_read_lock_bh_held())
> +               rcu_dereference_check(p, rcu_read_lock_bh_held() || irqs_disabled())

Wouldn't that irqs_disabled() check have made more sense inside
rcu_read_lock_bh_held()?

That's the function that is _supposed_ to check whether bottom halves
are disabled, no? So why add a workaround for that function being
buggy/incomplete in one place that uses it?

                    Linus


* [GIT PULL] core kernel fixes
@ 2010-10-05 19:12 Ingo Molnar
  2010-10-05 20:15 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2010-10-05 19:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Andrew Morton, Paul E. McKenney,
	Thomas Gleixner

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Eric Dumazet (1):
      rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh

Heiko Carstens (1):
      generic-ipi: Fix deadlock in __smp_call_function_single


 include/linux/rcupdate.h |    2 +-
 kernel/smp.c             |   17 ++++++++++++++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9fbc54a..83af1f8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -454,7 +454,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * Makes rcu_dereference_check() do the dirty work.
  */
 #define rcu_dereference_bh(p) \
-		rcu_dereference_check(p, rcu_read_lock_bh_held())
+		rcu_dereference_check(p, rcu_read_lock_bh_held() || irqs_disabled())
 
 /**
  * rcu_dereference_sched - fetch RCU-protected pointer, checking for RCU-sched
diff --git a/kernel/smp.c b/kernel/smp.c
index 75c970c..ed6aacf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -365,9 +365,10 @@ call:
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
 /**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
  * @cpu: The CPU to run on.
  * @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
  *
  * Like smp_call_function_single(), but allow caller to pass in a
  * pre-allocated data structure. Useful for embedding @data inside
@@ -376,8 +377,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
 void __smp_call_function_single(int cpu, struct call_single_data *data,
 				int wait)
 {
-	csd_lock(data);
+	unsigned int this_cpu;
+	unsigned long flags;
 
+	this_cpu = get_cpu();
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -387,7 +390,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
-	generic_exec_single(cpu, data, wait);
+	if (cpu == this_cpu) {
+		local_irq_save(flags);
+		data->func(data->info);
+		local_irq_restore(flags);
+	} else {
+		csd_lock(data);
+		generic_exec_single(cpu, data, wait);
+	}
+	put_cpu();
 }
 
 /**
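
The control flow that the __smp_call_function_single() hunk introduces, run
the function directly when the target is the current CPU and only hand it off
otherwise, can be sketched in user space. Everything below is a hypothetical
model: call_single(), struct call_data, the pending[] array, and
MODEL_NR_CPUS are invented names, and the array merely stands in for whatever
the kernel does to reach a remote CPU.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Reduced stand-in for struct call_single_data. */
struct call_data {
	void (*func)(void *info);
	void *info;
};

enum dispatch { RAN_LOCALLY, QUEUED_REMOTE, BAD_CPU };

/* One pending slot per modelled CPU; replaces the real cross-CPU queue. */
static struct call_data *pending[MODEL_NR_CPUS];

static enum dispatch call_single(int this_cpu, int cpu,
				 struct call_data *data)
{
	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return BAD_CPU;

	/* Local target: call the function here, no queueing involved. */
	if (cpu == this_cpu) {
		data->func(data->info);
		return RAN_LOCALLY;
	}

	/* Remote target: hand the request over and let that CPU run it. */
	pending[cpu] = data;
	return QUEUED_REMOTE;
}

/* Trivial callback used to observe where the function ran. */
static void bump(void *info)
{
	++*(int *)info;
}
```

The model leaves out the interrupt handling and locking of the real function;
it only shows that the local case never goes through the hand-off path.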


* [GIT PULL] core kernel fixes
@ 2010-09-08 13:04 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2010-09-08 13:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Paul E. McKenney,
	Thomas Gleixner

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Andi Kleen (1):
      gcc-4.6: kernel/*: Fix unused but set warnings

Paul E. McKenney (2):
      MAINTAINERS: Add RCU's public git tree
      pid: make setpgid() system call use RCU read-side critical section

Randy Dunlap (1):
      mutex: Fix annotations to include it in kernel-locking docbook


 Documentation/DocBook/kernel-locking.tmpl |    6 ++++++
 Documentation/mutex-design.txt            |    3 ++-
 MAINTAINERS                               |    2 ++
 include/linux/mutex.h                     |    8 ++++++++
 kernel/debug/kdb/kdb_bp.c                 |    2 --
 kernel/hrtimer.c                          |    3 +--
 kernel/mutex.c                            |   23 +++++++----------------
 kernel/sched_fair.c                       |    3 +--
 kernel/sys.c                              |    2 ++
 kernel/sysctl.c                           |    5 +----
 kernel/trace/ring_buffer.c                |    2 --
 11 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 0b1a3f9..a0d479d 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1961,6 +1961,12 @@ machines due to caching.
    </sect1>
   </chapter>
 
+  <chapter id="apiref">
+   <title>Mutex API reference</title>
+!Iinclude/linux/mutex.h
+!Ekernel/mutex.c
+  </chapter>
+
   <chapter id="references">
    <title>Further reading</title>
 
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
index c91ccc0..38c10fd 100644
--- a/Documentation/mutex-design.txt
+++ b/Documentation/mutex-design.txt
@@ -9,7 +9,7 @@ firstly, there's nothing wrong with semaphores. But if the simpler
 mutex semantics are sufficient for your code, then there are a couple
 of advantages of mutexes:
 
- - 'struct mutex' is smaller on most architectures: .e.g on x86,
+ - 'struct mutex' is smaller on most architectures: E.g. on x86,
    'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
    A smaller structure size means less RAM footprint, and better
    CPU-cache utilization.
@@ -136,3 +136,4 @@ the APIs of 'struct mutex' have been streamlined:
  void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
  int  mutex_lock_interruptible_nested(struct mutex *lock,
                                       unsigned int subclass);
+ int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
diff --git a/MAINTAINERS b/MAINTAINERS
index 5fa8451..614861a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4799,6 +4799,7 @@ RCUTORTURE MODULE
 M:	Josh Triplett <josh@freedesktop.org>
 M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
 S:	Supported
+T:	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
 F:	Documentation/RCU/torture.txt
 F:	kernel/rcutorture.c
 
@@ -4823,6 +4824,7 @@ M:	Dipankar Sarma <dipankar@in.ibm.com>
 M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
 W:	http://www.rdrop.com/users/paulmck/rclock/
 S:	Supported
+T:	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
 F:	Documentation/RCU/
 F:	include/linux/rcu*
 F:	include/linux/srcu*
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 878cab4..f363bc8 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -78,6 +78,14 @@ struct mutex_waiter {
 # include <linux/mutex-debug.h>
 #else
 # define __DEBUG_MUTEX_INITIALIZER(lockname)
+/**
+ * mutex_init - initialize the mutex
+ * @mutex: the mutex to be initialized
+ *
+ * Initialize the mutex to unlocked state.
+ *
+ * It is not allowed to initialize an already locked mutex.
+ */
 # define mutex_init(mutex) \
 do {							\
 	static struct lock_class_key __key;		\
diff --git a/kernel/debug/kdb/kdb_bp.c b/kernel/debug/kdb/kdb_bp.c
index 75bd9b3..20059ef 100644
--- a/kernel/debug/kdb/kdb_bp.c
+++ b/kernel/debug/kdb/kdb_bp.c
@@ -274,7 +274,6 @@ static int kdb_bp(int argc, const char **argv)
 	int i, bpno;
 	kdb_bp_t *bp, *bp_check;
 	int diag;
-	int free;
 	char *symname = NULL;
 	long offset = 0ul;
 	int nextarg;
@@ -305,7 +304,6 @@ static int kdb_bp(int argc, const char **argv)
 	/*
 	 * Find an empty bp structure to allocate
 	 */
-	free = KDB_MAXBPT;
 	for (bpno = 0, bp = kdb_breakpoints; bpno < KDB_MAXBPT; bpno++, bp++) {
 		if (bp->bp_free)
 			break;
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ce66917..1decafb 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1091,11 +1091,10 @@ EXPORT_SYMBOL_GPL(hrtimer_cancel);
  */
 ktime_t hrtimer_get_remaining(const struct hrtimer *timer)
 {
-	struct hrtimer_clock_base *base;
 	unsigned long flags;
 	ktime_t rem;
 
-	base = lock_hrtimer_base(timer, &flags);
+	lock_hrtimer_base(timer, &flags);
 	rem = hrtimer_expires_remaining(timer);
 	unlock_hrtimer_base(timer, &flags);
 
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 4c0b7b3..200407c 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -36,15 +36,6 @@
 # include <asm/mutex.h>
 #endif
 
-/***
- * mutex_init - initialize the mutex
- * @lock: the mutex to be initialized
- * @key: the lock_class_key for the class; used by mutex lock debugging
- *
- * Initialize the mutex to unlocked state.
- *
- * It is not allowed to initialize an already locked mutex.
- */
 void
 __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 {
@@ -68,7 +59,7 @@ EXPORT_SYMBOL(__mutex_init);
 static __used noinline void __sched
 __mutex_lock_slowpath(atomic_t *lock_count);
 
-/***
+/**
  * mutex_lock - acquire the mutex
  * @lock: the mutex to be acquired
  *
@@ -105,7 +96,7 @@ EXPORT_SYMBOL(mutex_lock);
 
 static __used noinline void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
 
-/***
+/**
  * mutex_unlock - release the mutex
  * @lock: the mutex to be released
  *
@@ -364,8 +355,8 @@ __mutex_lock_killable_slowpath(atomic_t *lock_count);
 static noinline int __sched
 __mutex_lock_interruptible_slowpath(atomic_t *lock_count);
 
-/***
- * mutex_lock_interruptible - acquire the mutex, interruptable
+/**
+ * mutex_lock_interruptible - acquire the mutex, interruptible
  * @lock: the mutex to be acquired
  *
  * Lock the mutex like mutex_lock(), and return 0 if the mutex has
@@ -456,15 +447,15 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
 	return prev == 1;
 }
 
-/***
- * mutex_trylock - try acquire the mutex, without waiting
+/**
+ * mutex_trylock - try to acquire the mutex, without waiting
  * @lock: the mutex to be acquired
  *
  * Try to acquire the mutex atomically. Returns 1 if the mutex
  * has been acquired successfully, and 0 on contention.
  *
  * NOTE: this function follows the spin_trylock() convention, so
- * it is negated to the down_trylock() return values! Be careful
+ * it is negated from the down_trylock() return values! Be careful
  * about this when converting semaphore users to mutexes.
  *
  * This function must not be used in interrupt context. The
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ab661eb..134f7ed 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1313,7 +1313,7 @@ static struct sched_group *
 find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int load_idx)
 {
-	struct sched_group *idlest = NULL, *this = NULL, *group = sd->groups;
+	struct sched_group *idlest = NULL, *group = sd->groups;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
 
@@ -1348,7 +1348,6 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 
 		if (local_group) {
 			this_load = avg_load;
-			this = group;
 		} else if (avg_load < min_load) {
 			min_load = avg_load;
 			idlest = group;
diff --git a/kernel/sys.c b/kernel/sys.c
index e9ad444..7f5a0cd 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -931,6 +931,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 		pgid = pid;
 	if (pgid < 0)
 		return -EINVAL;
+	rcu_read_lock();
 
 	/* From this point forward we keep holding onto the tasklist lock
 	 * so that our parent does not change from under us. -DaveM
@@ -984,6 +985,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 out:
 	/* All paths lead to here, thus we are safe. -DaveM */
 	write_unlock_irq(&tasklist_lock);
+	rcu_read_unlock();
 	return err;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ca38e8e..f88552c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1713,10 +1713,7 @@ static __init int sysctl_init(void)
 {
 	sysctl_set_parent(NULL, root_table);
 #ifdef CONFIG_SYSCTL_SYSCALL_CHECK
-	{
-		int err;
-		err = sysctl_check_table(current->nsproxy, root_table);
-	}
+	sysctl_check_table(current->nsproxy, root_table);
 #endif
 	return 0;
 }
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 19cccc3..492197e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2985,13 +2985,11 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
 
 static void rb_advance_iter(struct ring_buffer_iter *iter)
 {
-	struct ring_buffer *buffer;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
 	unsigned length;
 
 	cpu_buffer = iter->cpu_buffer;
-	buffer = cpu_buffer->buffer;
 
 	/*
 	 * Check if we are at the end of the buffer.

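Note on the mutex_trylock() comment in the kernel/mutex.c hunk above: the
return value follows the spin_trylock() convention (1 on success, 0 on
contention), which is the inverse of down_trylock() (0 on success, 1 on
contention). A minimal userspace sketch of the two conventions follows. The
toy_* names and the atomic_flag-based lock are assumptions made for
illustration only; this is not the kernel implementation.

```c
#include <stdatomic.h>

/* Hypothetical toy lock: a single atomic flag, set while the lock is held. */
struct toy_mutex {
	atomic_flag locked;
};

#define TOY_MUTEX_INIT { ATOMIC_FLAG_INIT }

/* mutex_trylock()/spin_trylock() convention: 1 = acquired, 0 = contended. */
static int toy_mutex_trylock(struct toy_mutex *m)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set(&m->locked);
}

/* down_trylock() convention: 0 = acquired, 1 = contended. */
static int toy_down_trylock(struct toy_mutex *m)
{
	return atomic_flag_test_and_set(&m->locked) ? 1 : 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	atomic_flag_clear(&m->locked);
}
```

When converting a semaphore user to a mutex, a test such as
"if (down_trylock(&sem))" (true on failure) therefore becomes
"if (!mutex_trylock(&lock))".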

* [GIT PULL] core kernel fixes
@ 2010-03-26 14:53 Ingo Molnar
From: Ingo Molnar @ 2010-03-26 14:53 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Colin Ian King (1):
      softlockup: Stop spurious softlockup messages due to overflow

Jiri Kosina (1):
      x86: Remove excessive early_res debug output

Lai Jiangshan (2):
      rcu: Fix tracepoints & lockdep false positive
      rcu: Fix local_irq_disable() CONFIG_PROVE_RCU=y false positives

Paul E. McKenney (1):
      rcu: Make rcu_read_lock_bh_held() allow for disabled BH


 include/linux/rcupdate.h   |   23 ++++++-----------------
 include/linux/tracepoint.h |    2 +-
 kernel/rcupdate.c          |   23 +++++++++++++++++++++++
 kernel/softlockup.c        |    4 ++--
 mm/bootmem.c               |   13 -------------
 5 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3024050..872a98e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -123,22 +123,11 @@ static inline int rcu_read_lock_held(void)
 	return lock_is_held(&rcu_lock_map);
 }
 
-/**
- * rcu_read_lock_bh_held - might we be in RCU-bh read-side critical section?
- *
- * If CONFIG_PROVE_LOCKING is selected and enabled, returns nonzero iff in
- * an RCU-bh read-side critical section.  In absence of CONFIG_PROVE_LOCKING,
- * this assumes we are in an RCU-bh read-side critical section unless it can
- * prove otherwise.
- *
- * Check rcu_scheduler_active to prevent false positives during boot.
+/*
+ * rcu_read_lock_bh_held() is defined out of line to avoid #include-file
+ * hell.
  */
-static inline int rcu_read_lock_bh_held(void)
-{
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	return lock_is_held(&rcu_bh_lock_map);
-}
+extern int rcu_read_lock_bh_held(void);
 
 /**
  * rcu_read_lock_sched_held - might we be in RCU-sched read-side critical section?
@@ -160,7 +149,7 @@ static inline int rcu_read_lock_sched_held(void)
 		return 1;
 	if (debug_locks)
 		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
-	return lockdep_opinion || preempt_count() != 0;
+	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
 }
 #else /* #ifdef CONFIG_PREEMPT */
 static inline int rcu_read_lock_sched_held(void)
@@ -191,7 +180,7 @@ static inline int rcu_read_lock_bh_held(void)
 #ifdef CONFIG_PREEMPT
 static inline int rcu_read_lock_sched_held(void)
 {
-	return !rcu_scheduler_active || preempt_count() != 0;
+	return !rcu_scheduler_active || preempt_count() != 0 || irqs_disabled();
 }
 #else /* #ifdef CONFIG_PREEMPT */
 static inline int rcu_read_lock_sched_held(void)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index f59604e..78b4bd3 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -49,7 +49,7 @@ struct tracepoint {
 		void **it_func;						\
 									\
 		rcu_read_lock_sched_notrace();				\
-		it_func = rcu_dereference((tp)->funcs);			\
+		it_func = rcu_dereference_sched((tp)->funcs);		\
 		if (it_func) {						\
 			do {						\
 				((void(*)(proto))(*it_func))(args);	\
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index f1125c1..63fe254 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -45,6 +45,7 @@
 #include <linux/mutex.h>
 #include <linux/module.h>
 #include <linux/kernel_stat.h>
+#include <linux/hardirq.h>
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
@@ -66,6 +67,28 @@ EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 int rcu_scheduler_active __read_mostly;
 EXPORT_SYMBOL_GPL(rcu_scheduler_active);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+/**
+ * rcu_read_lock_bh_held - might we be in RCU-bh read-side critical section?
+ *
+ * Check for bottom half being disabled, which covers both the
+ * CONFIG_PROVE_RCU and not cases.  Note that if someone uses
+ * rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled)
+ * will show the situation.
+ *
+ * Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
+ */
+int rcu_read_lock_bh_held(void)
+{
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	return in_softirq();
+}
+EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
+
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
 /*
  * This function is invoked towards the end of the scheduler's initialization
  * process.  Before this is called, the idle task might contain
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 0d4c789..4b493f6 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -155,11 +155,11 @@ void softlockup_tick(void)
 	 * Wake up the high-prio watchdog task twice per
 	 * threshold timespan.
 	 */
-	if (now > touch_ts + softlockup_thresh/2)
+	if (time_after(now - softlockup_thresh/2, touch_ts))
 		wake_up_process(per_cpu(softlockup_watchdog, this_cpu));
 
 	/* Warn about unreasonable delays: */
-	if (now <= (touch_ts + softlockup_thresh))
+	if (time_before_eq(now - softlockup_thresh, touch_ts))
 		return;
 
 	per_cpu(softlockup_print_ts, this_cpu) = touch_ts;
diff --git a/mm/bootmem.c b/mm/bootmem.c
index d7c791e..9b13446 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -180,19 +180,12 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	end_aligned = end & ~(BITS_PER_LONG - 1);
 
 	if (end_aligned <= start_aligned) {
-#if 1
-		printk(KERN_DEBUG " %lx - %lx\n", start, end);
-#endif
 		for (i = start; i < end; i++)
 			__free_pages_bootmem(pfn_to_page(i), 0);
 
 		return;
 	}
 
-#if 1
-	printk(KERN_DEBUG " %lx %lx - %lx %lx\n",
-		 start, start_aligned, end_aligned, end);
-#endif
 	for (i = start; i < start_aligned; i++)
 		__free_pages_bootmem(pfn_to_page(i), 0);
 
@@ -428,9 +421,6 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 {
 #ifdef CONFIG_NO_BOOTMEM
 	free_early(physaddr, physaddr + size);
-#if 0
-	printk(KERN_DEBUG "free %lx %lx\n", physaddr, size);
-#endif
 #else
 	unsigned long start, end;
 
@@ -456,9 +446,6 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
 {
 #ifdef CONFIG_NO_BOOTMEM
 	free_early(addr, addr + size);
-#if 0
-	printk(KERN_DEBUG "free %lx %lx\n", addr, size);
-#endif
 #else
 	unsigned long start, end;
 


* [GIT PULL] core kernel fixes
@ 2010-03-13 16:35 Ingo Molnar
From: Ingo Molnar @ 2010-03-13 16:35 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Paul E. McKenney, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
FUJITA Tomonori (1):
      x86/gart: Unexport gart_iommu_aperture

Luca Barbieri (1):
      locking: Make sparse work with inline spinlocks and rwlocks

Paul E. McKenney (13):
      rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
      rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
      sched, rcu: Fix rcu_dereference() for RCU-lockdep
      rcu: Use wrapper function instead of exporting tasklist_lock
      rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
      rcu: Add control variables to lockdep_rcu_dereference() diagnostics
      rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
      rcu: Suppress __mpol_dup() false positive from RCU lockdep
      rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
      rcu: Suppress RCU lockdep warnings during early boot
      ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
      rcu: Increase RCU CPU stall timeouts if PROVE_RCU
      x86/mce: Fix RCU lockdep splats

Thomas Gleixner (1):
      futex: Protect pid lookup in compat code with RCU


 arch/x86/kernel/aperture_64.c      |    1 -
 arch/x86/kernel/cpu/mcheck/mce.c   |   11 ++++++--
 include/linux/cred.h               |    2 +-
 include/linux/rcupdate.h           |   45 ++++++++++++++++++++++++++++-------
 include/linux/rwlock.h             |   20 ++++++++--------
 include/linux/sched.h              |    4 +++
 include/linux/spinlock.h           |   13 ++++++----
 include/trace/ftrace.h             |    4 +-
 kernel/exit.c                      |    2 +-
 kernel/fork.c                      |    9 ++++++-
 kernel/futex_compat.c              |    6 ++--
 kernel/lockdep.c                   |    1 +
 kernel/pid.c                       |    4 ++-
 kernel/rcutree.h                   |   21 ++++++++++++----
 kernel/rcutree_plugin.h            |    8 ++++--
 kernel/sched_fair.c                |    2 +-
 kernel/trace/ftrace.c              |   22 ++++++++++-------
 kernel/trace/trace_event_profile.c |    4 +-
 mm/mempolicy.c                     |    2 +
 19 files changed, 123 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f147a95..3704997 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -31,7 +31,6 @@
 #include <asm/x86_init.h>
 
 int gart_iommu_aperture;
-EXPORT_SYMBOL_GPL(gart_iommu_aperture);
 int gart_iommu_aperture_disabled __initdata;
 int gart_iommu_aperture_allowed __initdata;
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a8aacd4..4442e9e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -46,6 +46,11 @@
 
 #include "mce-internal.h"
 
+#define rcu_dereference_check_mce(p) \
+	rcu_dereference_check((p), \
+			      rcu_read_lock_sched_held() || \
+			      lockdep_is_held(&mce_read_mutex))
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/mce.h>
 
@@ -158,7 +163,7 @@ void mce_log(struct mce *mce)
 	mce->finished = 0;
 	wmb();
 	for (;;) {
-		entry = rcu_dereference(mcelog.next);
+		entry = rcu_dereference_check_mce(mcelog.next);
 		for (;;) {
 			/*
 			 * When the buffer fills up discard new entries.
@@ -1500,7 +1505,7 @@ static ssize_t mce_read(struct file *filp, char __user *ubuf, size_t usize,
 		return -ENOMEM;
 
 	mutex_lock(&mce_read_mutex);
-	next = rcu_dereference(mcelog.next);
+	next = rcu_dereference_check_mce(mcelog.next);
 
 	/* Only supports full reads right now */
 	if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce)) {
@@ -1565,7 +1570,7 @@ timeout:
 static unsigned int mce_poll(struct file *file, poll_table *wait)
 {
 	poll_wait(file, &mce_wait, wait);
-	if (rcu_dereference(mcelog.next))
+	if (rcu_dereference_check_mce(mcelog.next))
 		return POLLIN | POLLRDNORM;
 	return 0;
 }
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 4db09f8..52507c3 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -280,7 +280,7 @@ static inline void put_cred(const struct cred *_cred)
  * task or by holding tasklist_lock to prevent it from being unlinked.
  */
 #define __task_cred(task) \
-	((const struct cred *)(rcu_dereference_check((task)->real_cred, rcu_read_lock_held() || lockdep_is_held(&tasklist_lock))))
+	((const struct cred *)(rcu_dereference_check((task)->real_cred, rcu_read_lock_held() || lockdep_tasklist_lock_is_held())))
 
 /**
  * get_task_cred - Get another task's objective credentials
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index c843736..75921b8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -97,6 +97,11 @@ extern struct lockdep_map rcu_sched_lock_map;
 # define rcu_read_release_sched() \
 		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
 
+static inline int debug_lockdep_rcu_enabled(void)
+{
+	return likely(rcu_scheduler_active && debug_locks);
+}
+
 /**
  * rcu_read_lock_held - might we be in RCU read-side critical section?
  *
@@ -104,12 +109,14 @@ extern struct lockdep_map rcu_sched_lock_map;
  * an RCU read-side critical section.  In absence of CONFIG_PROVE_LOCKING,
  * this assumes we are in an RCU read-side critical section unless it can
  * prove otherwise.
+ *
+ * Check rcu_scheduler_active to prevent false positives during boot.
  */
 static inline int rcu_read_lock_held(void)
 {
-	if (debug_locks)
-		return lock_is_held(&rcu_lock_map);
-	return 1;
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	return lock_is_held(&rcu_lock_map);
 }
 
 /**
@@ -119,12 +126,14 @@ static inline int rcu_read_lock_held(void)
  * an RCU-bh read-side critical section.  In absence of CONFIG_PROVE_LOCKING,
  * this assumes we are in an RCU-bh read-side critical section unless it can
  * prove otherwise.
+ *
+ * Check rcu_scheduler_active to prevent false positives during boot.
  */
 static inline int rcu_read_lock_bh_held(void)
 {
-	if (debug_locks)
-		return lock_is_held(&rcu_bh_lock_map);
-	return 1;
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	return lock_is_held(&rcu_bh_lock_map);
 }
 
 /**
@@ -135,15 +144,26 @@ static inline int rcu_read_lock_bh_held(void)
  * this assumes we are in an RCU-sched read-side critical section unless it
  * can prove otherwise.  Note that disabling of preemption (including
  * disabling irqs) counts as an RCU-sched read-side critical section.
+ *
+ * Check rcu_scheduler_active to prevent false positives during boot.
  */
+#ifdef CONFIG_PREEMPT
 static inline int rcu_read_lock_sched_held(void)
 {
 	int lockdep_opinion = 0;
 
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
 	if (debug_locks)
 		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
-	return lockdep_opinion || preempt_count() != 0 || !rcu_scheduler_active;
+	return lockdep_opinion || preempt_count() != 0;
+}
+#else /* #ifdef CONFIG_PREEMPT */
+static inline int rcu_read_lock_sched_held(void)
+{
+	return 1;
 }
+#endif /* #else #ifdef CONFIG_PREEMPT */
 
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
@@ -164,10 +184,17 @@ static inline int rcu_read_lock_bh_held(void)
 	return 1;
 }
 
+#ifdef CONFIG_PREEMPT
 static inline int rcu_read_lock_sched_held(void)
 {
-	return preempt_count() != 0 || !rcu_scheduler_active;
+	return !rcu_scheduler_active || preempt_count() != 0;
+}
+#else /* #ifdef CONFIG_PREEMPT */
+static inline int rcu_read_lock_sched_held(void)
+{
+	return 1;
 }
+#endif /* #else #ifdef CONFIG_PREEMPT */
 
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
@@ -184,7 +211,7 @@ static inline int rcu_read_lock_sched_held(void)
  */
 #define rcu_dereference_check(p, c) \
 	({ \
-		if (debug_locks && !(c)) \
+		if (debug_lockdep_rcu_enabled() && !(c)) \
 			lockdep_rcu_dereference(__FILE__, __LINE__); \
 		rcu_dereference_raw(p); \
 	})
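Note on the include/linux/rcupdate.h hunks above: every lockdep-RCU check is
now gated on debug_lockdep_rcu_enabled(), so nothing is reported until the
scheduler is active and lock debugging is still enabled. The sketch below
models that gating in userspace. The two variables reuse the kernel names as
plain stand-ins, while deref_check(), report_unprotected() and splat_count
are hypothetical names introduced for illustration.

```c
#include <stdio.h>

static int rcu_scheduler_active;	/* stand-in: 0 during "early boot" */
static int debug_locks = 1;		/* stand-in: cleared once lockdep gives up */
static int splat_count;			/* hypothetical: number of reports emitted */

static int debug_lockdep_rcu_enabled(void)
{
	return rcu_scheduler_active && debug_locks;
}

/* Hypothetical reporter standing in for lockdep_rcu_dereference(). */
static void report_unprotected(const char *file, int line)
{
	splat_count++;
	fprintf(stderr, "%s:%d dereference without protection\n", file, line);
}

/*
 * Evaluate to p, reporting first if checking is enabled and the caller's
 * protection condition c does not hold.
 */
#define deref_check(p, c) \
	(((debug_lockdep_rcu_enabled() && !(c)) ? \
		report_unprotected(__FILE__, __LINE__) : (void)0), (p))
```

In this model a failing condition is silent while rcu_scheduler_active is 0,
which corresponds to the "prevent false positives during boot" comments
added by the patch.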
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 71e0b00..bc2994e 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -29,25 +29,25 @@ do {								\
 #endif
 
 #ifdef CONFIG_DEBUG_SPINLOCK
- extern void do_raw_read_lock(rwlock_t *lock);
+ extern void do_raw_read_lock(rwlock_t *lock) __acquires(lock);
 #define do_raw_read_lock_flags(lock, flags) do_raw_read_lock(lock)
  extern int do_raw_read_trylock(rwlock_t *lock);
- extern void do_raw_read_unlock(rwlock_t *lock);
- extern void do_raw_write_lock(rwlock_t *lock);
+ extern void do_raw_read_unlock(rwlock_t *lock) __releases(lock);
+ extern void do_raw_write_lock(rwlock_t *lock) __acquires(lock);
 #define do_raw_write_lock_flags(lock, flags) do_raw_write_lock(lock)
  extern int do_raw_write_trylock(rwlock_t *lock);
- extern void do_raw_write_unlock(rwlock_t *lock);
+ extern void do_raw_write_unlock(rwlock_t *lock) __releases(lock);
 #else
-# define do_raw_read_lock(rwlock)	arch_read_lock(&(rwlock)->raw_lock)
+# define do_raw_read_lock(rwlock)	do {__acquire(lock); arch_read_lock(&(rwlock)->raw_lock); } while (0)
 # define do_raw_read_lock_flags(lock, flags) \
-		arch_read_lock_flags(&(lock)->raw_lock, *(flags))
+		do {__acquire(lock); arch_read_lock_flags(&(lock)->raw_lock, *(flags)); } while (0)
 # define do_raw_read_trylock(rwlock)	arch_read_trylock(&(rwlock)->raw_lock)
-# define do_raw_read_unlock(rwlock)	arch_read_unlock(&(rwlock)->raw_lock)
-# define do_raw_write_lock(rwlock)	arch_write_lock(&(rwlock)->raw_lock)
+# define do_raw_read_unlock(rwlock)	do {arch_read_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
+# define do_raw_write_lock(rwlock)	do {__acquire(lock); arch_write_lock(&(rwlock)->raw_lock); } while (0)
 # define do_raw_write_lock_flags(lock, flags) \
-		arch_write_lock_flags(&(lock)->raw_lock, *(flags))
+		do {__acquire(lock); arch_write_lock_flags(&(lock)->raw_lock, *(flags)); } while (0)
 # define do_raw_write_trylock(rwlock)	arch_write_trylock(&(rwlock)->raw_lock)
-# define do_raw_write_unlock(rwlock)	arch_write_unlock(&(rwlock)->raw_lock)
+# define do_raw_write_unlock(rwlock)	do {arch_write_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
 #endif
 
 #define read_can_lock(rwlock)		arch_read_can_lock(&(rwlock)->raw_lock)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0eef87b..a47af20 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -258,6 +258,10 @@ extern spinlock_t mmlist_lock;
 
 struct task_struct;
 
+#ifdef CONFIG_PROVE_RCU
+extern int lockdep_tasklist_lock_is_held(void);
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
 extern void sched_init(void);
 extern void sched_init_smp(void);
 extern asmlinkage void schedule_tail(struct task_struct *prev);
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 8608821..89fac6a 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -128,19 +128,21 @@ static inline void smp_mb__after_lock(void) { smp_mb(); }
 #define raw_spin_unlock_wait(lock)	arch_spin_unlock_wait(&(lock)->raw_lock)
 
 #ifdef CONFIG_DEBUG_SPINLOCK
- extern void do_raw_spin_lock(raw_spinlock_t *lock);
+ extern void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock);
 #define do_raw_spin_lock_flags(lock, flags) do_raw_spin_lock(lock)
  extern int do_raw_spin_trylock(raw_spinlock_t *lock);
- extern void do_raw_spin_unlock(raw_spinlock_t *lock);
+ extern void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock);
 #else
-static inline void do_raw_spin_lock(raw_spinlock_t *lock)
+static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
 {
+	__acquire(lock);
 	arch_spin_lock(&lock->raw_lock);
 }
 
 static inline void
-do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags)
+do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lock)
 {
+	__acquire(lock);
 	arch_spin_lock_flags(&lock->raw_lock, *flags);
 }
 
@@ -149,9 +151,10 @@ static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
 	return arch_spin_trylock(&(lock)->raw_lock);
 }
 
-static inline void do_raw_spin_unlock(raw_spinlock_t *lock)
+static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 {
 	arch_spin_unlock(&lock->raw_lock);
+	__release(lock);
 }
 #endif
 
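Note on the include/linux/spinlock.h and include/linux/rwlock.h hunks above:
__acquires()/__releases() annotate a function's effect on the lock context
for sparse, and __acquire()/__release() adjust that context inside a body
without generating code. The sketch below shows the pairing on a toy lock.
The macro definitions follow the usual __CHECKER__ pattern and compile to
nothing under a normal compiler; struct toy_lock and the toy_* functions are
hypothetical and provide no real mutual exclusion.

```c
#ifdef __CHECKER__
/* Seen only by sparse: track the lock context. */
# define __acquires(x)	__attribute__((context(x, 0, 1)))
# define __releases(x)	__attribute__((context(x, 1, 0)))
# define __acquire(x)	__context__(x, 1)
# define __release(x)	__context__(x, -1)
#else
/* Normal builds: the annotations disappear. */
# define __acquires(x)
# define __releases(x)
# define __acquire(x)	(void)0
# define __release(x)	(void)0
#endif

/* Hypothetical single-threaded stand-in for a raw lock. */
struct toy_lock {
	int held;
};

static inline void toy_lock_acquire(struct toy_lock *lock) __acquires(lock)
{
	__acquire(lock);	/* tell sparse the context count goes up */
	lock->held = 1;
}

static inline void toy_lock_release(struct toy_lock *lock) __releases(lock)
{
	lock->held = 0;
	__release(lock);	/* tell sparse the context count goes down */
}
```

A caller that returns with toy_lock_acquire() unmatched by
toy_lock_release() is the kind of imbalance sparse is meant to flag once the
inline helpers carry these annotations.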
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 0804cd5..601ad77 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -699,9 +699,9 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
  *	__cpu = smp_processor_id();
  *
  *	if (in_nmi())
- *		trace_buf = rcu_dereference(perf_trace_buf_nmi);
+ *		trace_buf = rcu_dereference_sched(perf_trace_buf_nmi);
  *	else
- *		trace_buf = rcu_dereference(perf_trace_buf);
+ *		trace_buf = rcu_dereference_sched(perf_trace_buf);
  *
  *	if (!trace_buf)
  *		goto end;
diff --git a/kernel/exit.c b/kernel/exit.c
index 45ed043..fed3a4d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -87,7 +87,7 @@ static void __exit_signal(struct task_struct *tsk)
 
 	sighand = rcu_dereference_check(tsk->sighand,
 					rcu_read_lock_held() ||
-					lockdep_is_held(&tasklist_lock));
+					lockdep_tasklist_lock_is_held());
 	spin_lock(&sighand->siglock);
 
 	posix_cpu_timers_exit(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index 17bbf09..8691c54 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -86,7 +86,14 @@ int max_threads;		/* tunable limit on nr_threads */
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
-EXPORT_SYMBOL_GPL(tasklist_lock);
+
+#ifdef CONFIG_PROVE_RCU
+int lockdep_tasklist_lock_is_held(void)
+{
+	return lockdep_is_held(&tasklist_lock);
+}
+EXPORT_SYMBOL_GPL(lockdep_tasklist_lock_is_held);
+#endif /* #ifdef CONFIG_PROVE_RCU */
 
 int nr_processes(void)
 {
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index 2357165..d49afb2 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -146,7 +146,7 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
 		struct task_struct *p;
 
 		ret = -ESRCH;
-		read_lock(&tasklist_lock);
+		rcu_read_lock();
 		p = find_task_by_vpid(pid);
 		if (!p)
 			goto err_unlock;
@@ -157,7 +157,7 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
 		    !capable(CAP_SYS_PTRACE))
 			goto err_unlock;
 		head = p->compat_robust_list;
-		read_unlock(&tasklist_lock);
+		rcu_read_unlock();
 	}
 
 	if (put_user(sizeof(*head), len_ptr))
@@ -165,7 +165,7 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
 	return put_user(ptr_to_compat(head), head_ptr);
 
 err_unlock:
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 
 	return ret;
 }
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 0c30d04..681bc2e 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3822,6 +3822,7 @@ void lockdep_rcu_dereference(const char *file, const int line)
 	printk("%s:%d invoked rcu_dereference_check() without protection!\n",
 			file, line);
 	printk("\nother info that might help us debug this:\n\n");
+	printk("\nrcu_scheduler_active = %d, debug_locks = %d\n", rcu_scheduler_active, debug_locks);
 	lockdep_print_held_locks(curr);
 	printk("\nstack backtrace:\n");
 	dump_stack();
diff --git a/kernel/pid.c b/kernel/pid.c
index b08e697..b606440 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -367,7 +367,9 @@ struct task_struct *pid_task(struct pid *pid, enum pid_type type)
 	struct task_struct *result = NULL;
 	if (pid) {
 		struct hlist_node *first;
-		first = rcu_dereference_check(pid->tasks[type].first, rcu_read_lock_held() || lockdep_is_held(&tasklist_lock));
+		first = rcu_dereference_check(pid->tasks[type].first,
+					      rcu_read_lock_held() ||
+					      lockdep_tasklist_lock_is_held());
 		if (first)
 			result = hlist_entry(first, struct task_struct, pids[(type)].node);
 	}
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 1439eb5..4a525a3 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -246,12 +246,21 @@ struct rcu_data {
 
 #define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for rsp->jiffies_force_qs */
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
-#define RCU_SECONDS_TILL_STALL_CHECK   (10 * HZ)  /* for rsp->jiffies_stall */
-#define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ)  /* for rsp->jiffies_stall */
-#define RCU_STALL_RAT_DELAY		2	  /* Allow other CPUs time */
-						  /*  to take at least one */
-						  /*  scheduling clock irq */
-						  /*  before ratting on them. */
+
+#ifdef CONFIG_PROVE_RCU
+#define RCU_STALL_DELAY_DELTA	       (5 * HZ)
+#else
+#define RCU_STALL_DELAY_DELTA	       0
+#endif
+
+#define RCU_SECONDS_TILL_STALL_CHECK   (10 * HZ + RCU_STALL_DELAY_DELTA)
+						/* for rsp->jiffies_stall */
+#define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ + RCU_STALL_DELAY_DELTA)
+						/* for rsp->jiffies_stall */
+#define RCU_STALL_RAT_DELAY		2	/* Allow other CPUs time */
+						/*  to take at least one */
+						/*  scheduling clock irq */
+						/*  before ratting on them. */
 
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 464ad2c..79b53bd 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1010,6 +1010,10 @@ int rcu_needs_cpu(int cpu)
 	int c = 0;
 	int thatcpu;
 
+	/* Check for being in the holdoff period. */
+	if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies)
+		return rcu_needs_cpu_quick_check(cpu);
+
 	/* Don't bother unless we are the last non-dyntick-idle CPU. */
 	for_each_cpu_not(thatcpu, nohz_cpu_mask)
 		if (thatcpu != cpu) {
@@ -1041,10 +1045,8 @@ int rcu_needs_cpu(int cpu)
 	}
 
 	/* If RCU callbacks are still pending, RCU still needs this CPU. */
-	if (c) {
+	if (c)
 		raise_softirq(RCU_SOFTIRQ);
-		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
-	}
 	return c;
 }
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 3e1fd96..5a5ea2c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3476,7 +3476,7 @@ static void run_rebalance_domains(struct softirq_action *h)
 
 static inline int on_null_domain(int cpu)
 {
-	return !rcu_dereference(cpu_rq(cpu)->sd);
+	return !rcu_dereference_sched(cpu_rq(cpu)->sd);
 }
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8378357..8c5adc0 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -27,6 +27,7 @@
 #include <linux/ctype.h>
 #include <linux/list.h>
 #include <linux/hash.h>
+#include <linux/rcupdate.h>
 
 #include <trace/events/sched.h>
 
@@ -88,18 +89,22 @@ ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
 static int ftrace_set_func(unsigned long *array, int *idx, char *buffer);
 #endif
 
+/*
+ * Traverse the ftrace_list, invoking all entries.  The reason that we
+ * can use rcu_dereference_raw() is that elements removed from this list
+ * are simply leaked, so there is no need to interact with a grace-period
+ * mechanism.  The rcu_dereference_raw() calls are needed to handle
+ * concurrent insertions into the ftrace_list.
+ *
+ * Silly Alpha and silly pointer-speculation compiler optimizations!
+ */
 static void ftrace_list_func(unsigned long ip, unsigned long parent_ip)
 {
-	struct ftrace_ops *op = ftrace_list;
-
-	/* in case someone actually ports this to alpha! */
-	read_barrier_depends();
+	struct ftrace_ops *op = rcu_dereference_raw(ftrace_list); /*see above*/
 
 	while (op != &ftrace_list_end) {
-		/* silly alpha */
-		read_barrier_depends();
 		op->func(ip, parent_ip);
-		op = op->next;
+		op = rcu_dereference_raw(op->next); /*see above*/
 	};
 }
 
@@ -154,8 +159,7 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
 	 * the ops->next pointer is valid before another CPU sees
 	 * the ops pointer included into the ftrace_list.
 	 */
-	smp_wmb();
-	ftrace_list = ops;
+	rcu_assign_pointer(ftrace_list, ops);
 
 	if (ftrace_enabled) {
 		ftrace_func_t func;
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..c1cc3ab 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -138,9 +138,9 @@ __kprobes void *ftrace_perf_buf_prepare(int size, unsigned short type,
 	cpu = smp_processor_id();
 
 	if (in_nmi())
-		trace_buf = rcu_dereference(perf_trace_buf_nmi);
+		trace_buf = rcu_dereference_sched(perf_trace_buf_nmi);
 	else
-		trace_buf = rcu_dereference(perf_trace_buf);
+		trace_buf = rcu_dereference_sched(perf_trace_buf);
 
 	if (!trace_buf)
 		goto err;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 290fb5b..3cec080 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1730,10 +1730,12 @@ struct mempolicy *__mpol_dup(struct mempolicy *old)
 
 	if (!new)
 		return ERR_PTR(-ENOMEM);
+	rcu_read_lock();
 	if (current_cpuset_is_being_rebound()) {
 		nodemask_t mems = cpuset_mems_allowed(current);
 		mpol_rebind_policy(old, &mems);
 	}
+	rcu_read_unlock();
 	*new = *old;
 	atomic_set(&new->refcnt, 1);
 	return new;
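
The ftrace hunk above replaces open-coded smp_wmb()/read_barrier_depends()
with rcu_assign_pointer()/rcu_dereference_raw(). What follows is an
illustrative userspace sketch of that publish/subscribe ordering, not part
of the pulled patches. It assumes C11 atomics as a stand-in: a release
store plays the role of rcu_assign_pointer(), and an acquire load is a
stronger substitute for the dependency-ordered rcu_dereference_raw().
The names struct ops, register_ops(), call_all() and bump() are
hypothetical, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops: a callback plus a next pointer. */
struct ops {
	void (*func)(int *counter);
	struct ops *_Atomic next;
};

static void stub(int *counter) { (void)counter; }
static void bump(int *counter) { (*counter)++; }	/* example callback */

/* Sentinel terminating the list, mirroring ftrace_list_end. */
static struct ops list_end = { .func = stub, .next = NULL };
static struct ops *_Atomic list_head = &list_end;

/*
 * Publish: fully initialise the new entry, then release-store the head,
 * so a reader that sees the new head also sees a valid ->next.
 * Assumption: only one writer runs at a time.
 */
static void register_ops(struct ops *new_ops)
{
	assert(new_ops && new_ops->func);
	atomic_store_explicit(&new_ops->next,
			      atomic_load_explicit(&list_head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&list_head, new_ops, memory_order_release);
}

/* Subscribe: acquire-load each pointer before dereferencing it. */
static int call_all(int *counter)
{
	int n = 0;
	struct ops *op = atomic_load_explicit(&list_head,
					      memory_order_acquire);

	while (op != &list_end) {
		op->func(counter);
		n++;
		op = atomic_load_explicit(&op->next, memory_order_acquire);
	}
	return n;	/* number of entries invoked */
}
```

Removal is deliberately left out of the sketch: as the comment in the
patch notes, entries removed from ftrace_list are simply leaked, so no
grace-period handling is modelled here.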


* [GIT PULL] core kernel fixes
@ 2009-12-18 18:52 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-12-18 18:52 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Paul E. McKenney,
	Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Thomas Gleixner (3):
      signal: Fix racy access to __task_cred in kill_pid_info_as_uid()
      signals: Fix more rcu assumptions
      sys: Fix missing rcu protection for __task_cred() access


 kernel/signal.c |   25 ++++++++++++++-----------
 kernel/sys.c    |    2 ++
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6b982f2..f67545f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -218,13 +218,13 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimi
 	struct user_struct *user;
 
 	/*
-	 * We won't get problems with the target's UID changing under us
-	 * because changing it requires RCU be used, and if t != current, the
-	 * caller must be holding the RCU readlock (by way of a spinlock) and
-	 * we use RCU protection here
+	 * Protect access to @t credentials. This can go away when all
+	 * callers hold rcu read lock.
 	 */
+	rcu_read_lock();
 	user = get_uid(__task_cred(t)->user);
 	atomic_inc(&user->sigpending);
+	rcu_read_unlock();
 
 	if (override_rlimit ||
 	    atomic_read(&user->sigpending) <=
@@ -1175,11 +1175,12 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 	int ret = -EINVAL;
 	struct task_struct *p;
 	const struct cred *pcred;
+	unsigned long flags;
 
 	if (!valid_signal(sig))
 		return ret;
 
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	p = pid_task(pid, PIDTYPE_PID);
 	if (!p) {
 		ret = -ESRCH;
@@ -1196,14 +1197,16 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 	ret = security_task_kill(p, info, sig, secid);
 	if (ret)
 		goto out_unlock;
-	if (sig && p->sighand) {
-		unsigned long flags;
-		spin_lock_irqsave(&p->sighand->siglock, flags);
-		ret = __send_signal(sig, info, p, 1, 0);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+
+	if (sig) {
+		if (lock_task_sighand(p, &flags)) {
+			ret = __send_signal(sig, info, p, 1, 0);
+			unlock_task_sighand(p, &flags);
+		} else
+			ret = -ESRCH;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kill_pid_info_as_uid);
diff --git a/kernel/sys.c b/kernel/sys.c
index 9968c5f..bc1dc61 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -163,6 +163,7 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	if (niceval > 19)
 		niceval = 19;
 
+	rcu_read_lock();
 	read_lock(&tasklist_lock);
 	switch (which) {
 		case PRIO_PROCESS:
@@ -200,6 +201,7 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	}
 out_unlock:
 	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 out:
 	return error;
 }


* [GIT PULL] core kernel fixes
@ 2009-11-10 17:53 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-11-10 17:53 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Paul E. McKenney (1):
      rcu: Fix long-grace-period race between forcing and initialization

Soeren Sandmann (2):
      highmem: Fix race in debug_kmap_atomic() which could cause warn_count to underflow
      highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE

Thomas Gleixner (1):
      uids: Prevent tear down race


 kernel/rcutree.c |   16 +++++++++++-----
 kernel/rcutree.h |    7 ++++---
 kernel/user.c    |    2 +-
 mm/highmem.c     |   17 ++++++++++++-----
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0536125..f3077c0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -59,7 +59,7 @@
 		NUM_RCU_LVL_2, \
 		NUM_RCU_LVL_3, /* == MAX_RCU_LVLS */ \
 	}, \
-	.signaled = RCU_SIGNAL_INIT, \
+	.signaled = RCU_GP_IDLE, \
 	.gpnum = -300, \
 	.completed = -300, \
 	.onofflock = __SPIN_LOCK_UNLOCKED(&name.onofflock), \
@@ -657,14 +657,17 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	 * irqs disabled.
 	 */
 	rcu_for_each_node_breadth_first(rsp, rnp) {
-		spin_lock(&rnp->lock);	/* irqs already disabled. */
+		spin_lock(&rnp->lock);		/* irqs already disabled. */
 		rcu_preempt_check_blocked_tasks(rnp);
 		rnp->qsmask = rnp->qsmaskinit;
 		rnp->gpnum = rsp->gpnum;
-		spin_unlock(&rnp->lock);	/* irqs already disabled. */
+		spin_unlock(&rnp->lock);	/* irqs remain disabled. */
 	}
 
+	rnp = rcu_get_root(rsp);
+	spin_lock(&rnp->lock);			/* irqs already disabled. */
 	rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
+	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 	spin_unlock_irqrestore(&rsp->onofflock, flags);
 }
 
@@ -706,6 +709,7 @@ static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags)
 {
 	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	rsp->completed = rsp->gpnum;
+	rsp->signaled = RCU_GP_IDLE;
 	rcu_process_gp_end(rsp, rsp->rda[smp_processor_id()]);
 	rcu_start_gp(rsp, flags);  /* releases root node's rnp->lock. */
 }
@@ -1162,9 +1166,10 @@ static void force_quiescent_state(struct rcu_state *rsp, int relaxed)
 	}
 	spin_unlock(&rnp->lock);
 	switch (signaled) {
+	case RCU_GP_IDLE:
 	case RCU_GP_INIT:
 
-		break; /* grace period still initializing, ignore. */
+		break; /* grace period idle or initializing, ignore. */
 
 	case RCU_SAVE_DYNTICK:
 
@@ -1178,7 +1183,8 @@ static void force_quiescent_state(struct rcu_state *rsp, int relaxed)
 
 		/* Update state, record completion counter. */
 		spin_lock(&rnp->lock);
-		if (lastcomp == rsp->completed) {
+		if (lastcomp == rsp->completed &&
+		    rsp->signaled == RCU_SAVE_DYNTICK) {
 			rsp->signaled = RCU_FORCE_QS;
 			dyntick_record_completed(rsp, lastcomp);
 		}
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 1823c6e..1899023 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -201,9 +201,10 @@ struct rcu_data {
 };
 
 /* Values for signaled field in struct rcu_state. */
-#define RCU_GP_INIT		0	/* Grace period being initialized. */
-#define RCU_SAVE_DYNTICK	1	/* Need to scan dyntick state. */
-#define RCU_FORCE_QS		2	/* Need to force quiescent state. */
+#define RCU_GP_IDLE		0	/* No grace period in progress. */
+#define RCU_GP_INIT		1	/* Grace period being initialized. */
+#define RCU_SAVE_DYNTICK	2	/* Need to scan dyntick state. */
+#define RCU_FORCE_QS		3	/* Need to force quiescent state. */
 #ifdef CONFIG_NO_HZ
 #define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
 #else /* #ifdef CONFIG_NO_HZ */
diff --git a/kernel/user.c b/kernel/user.c
index 2c000e7..46d0165 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -330,9 +330,9 @@ done:
  */
 static void free_user(struct user_struct *up, unsigned long flags)
 {
-	spin_unlock_irqrestore(&uidhash_lock, flags);
 	INIT_DELAYED_WORK(&up->work, cleanup_user_struct);
 	schedule_delayed_work(&up->work, msecs_to_jiffies(1000));
+	spin_unlock_irqrestore(&uidhash_lock, flags);
 }
 
 #else	/* CONFIG_USER_SCHED && CONFIG_SYSFS */
diff --git a/mm/highmem.c b/mm/highmem.c
index 25878cc..9c1e627 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -426,16 +426,21 @@ void __init page_address_init(void)
 
 void debug_kmap_atomic(enum km_type type)
 {
-	static unsigned warn_count = 10;
+	static int warn_count = 10;
 
-	if (unlikely(warn_count == 0))
+	if (unlikely(warn_count < 0))
 		return;
 
 	if (unlikely(in_interrupt())) {
-		if (in_irq()) {
+		if (in_nmi()) {
+			if (type != KM_NMI && type != KM_NMI_PTE) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (in_irq()) {
 			if (type != KM_IRQ0 && type != KM_IRQ1 &&
 			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
-			    type != KM_BOUNCE_READ) {
+			    type != KM_BOUNCE_READ && type != KM_IRQ_PTE) {
 				WARN_ON(1);
 				warn_count--;
 			}
@@ -452,7 +457,9 @@ void debug_kmap_atomic(enum km_type type)
 	}
 
 	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ ||
-			type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) {
+			type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ ||
+			type == KM_IRQ_PTE || type == KM_NMI ||
+			type == KM_NMI_PTE ) {
 		if (!irqs_disabled()) {
 			WARN_ON(1);
 			warn_count--;
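
The debug_kmap_atomic() hunk above changes warn_count from unsigned to
int and the exhaustion test from "== 0" to "< 0". The following
illustrative userspace sketch, not part of the pulled patches, shows why
that matters, on the assumption (taken from the commit title) that
racing callers can decrement the counter past zero. All function names
here are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Old scheme: unsigned budget, treated as exhausted only at exactly zero. */
static bool old_budget_exhausted(unsigned int warn_count)
{
	return warn_count == 0;
}

/* New scheme: signed budget, treated as exhausted for any negative value. */
static bool new_budget_exhausted(int warn_count)
{
	return warn_count < 0;
}

/* Model `extra` decrements that slip past the check (hypothetical helper). */
static unsigned int old_after_overshoot(unsigned int start, unsigned int extra)
{
	return start - extra;	/* unsigned arithmetic wraps modulo 2^N */
}

static int new_after_overshoot(int start, int extra)
{
	assert(extra >= 0 && start >= INT_MIN + extra);	/* no signed overflow */
	return start - extra;
}
```

With a remaining budget of 1 and two racing decrements, the unsigned
counter wraps to UINT_MAX and the "== 0" test never fires again, whereas
the signed counter becomes -1 and the "< 0" test keeps suppressing
further warnings.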


* [GIT PULL] core kernel fixes
@ 2009-10-23 14:53 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-10-23 14:53 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Paul E. McKenney, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Darren Hart (2):
      futex: Check for NULL keys in match_futex
      futex: Move drop_futex_key_refs out of spinlock'ed region

Paul E. McKenney (3):
      rcu: Prevent RCU IPI storms in presence of high call_rcu() load
      rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
      rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang

Thomas Gleixner (1):
      futex: Handle spurious wake up


 include/linux/rcutree.h |    6 +-----
 kernel/futex.c          |   24 +++++++++++++++++++-----
 kernel/rcutree.c        |   44 ++++++++++++++++++++++++++++++++++++++------
 kernel/rcutree.h        |   10 +++++++---
 kernel/rcutree_plugin.h |   46 ++++++++++++++++++++++++++++++++++++++--------
 5 files changed, 103 insertions(+), 27 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 46e9ab3..9642c6b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -76,11 +76,7 @@ static inline void __rcu_read_unlock_bh(void)
 
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
-
-static inline void synchronize_rcu_expedited(void)
-{
-	synchronize_sched_expedited();
-}
+extern void synchronize_rcu_expedited(void);
 
 static inline void synchronize_rcu_bh_expedited(void)
 {
diff --git a/kernel/futex.c b/kernel/futex.c
index 4949d33..642f3bb 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -150,7 +150,8 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
  */
 static inline int match_futex(union futex_key *key1, union futex_key *key2)
 {
-	return (key1->both.word == key2->both.word
+	return (key1 && key2
+		&& key1->both.word == key2->both.word
 		&& key1->both.ptr == key2->both.ptr
 		&& key1->both.offset == key2->both.offset);
 }
@@ -1028,7 +1029,6 @@ static inline
 void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
 			   struct futex_hash_bucket *hb)
 {
-	drop_futex_key_refs(&q->key);
 	get_futex_key_refs(key);
 	q->key = *key;
 
@@ -1226,6 +1226,7 @@ retry_private:
 		 */
 		if (ret == 1) {
 			WARN_ON(pi_state);
+			drop_count++;
 			task_count++;
 			ret = get_futex_value_locked(&curval2, uaddr2);
 			if (!ret)
@@ -1304,6 +1305,7 @@ retry_private:
 			if (ret == 1) {
 				/* We got the lock. */
 				requeue_pi_wake_futex(this, &key2, hb2);
+				drop_count++;
 				continue;
 			} else if (ret) {
 				/* -EDEADLK */
@@ -1791,6 +1793,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 					     current->timer_slack_ns);
 	}
 
+retry:
 	/* Prepare to wait on uaddr. */
 	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
 	if (ret)
@@ -1808,9 +1811,14 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 		goto out_put_key;
 
 	/*
-	 * We expect signal_pending(current), but another thread may
-	 * have handled it for us already.
+	 * We expect signal_pending(current), but we might be the
+	 * victim of a spurious wakeup as well.
 	 */
+	if (!signal_pending(current)) {
+		put_futex_key(fshared, &q.key);
+		goto retry;
+	}
+
 	ret = -ERESTARTSYS;
 	if (!abs_time)
 		goto out_put_key;
@@ -2118,9 +2126,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		 */
 		plist_del(&q->list, &q->list.plist);
 
+		/* Handle spurious wakeups gracefully */
+		ret = -EAGAIN;
 		if (timeout && !timeout->task)
 			ret = -ETIMEDOUT;
-		else
+		else if (signal_pending(current))
 			ret = -ERESTARTNOINTR;
 	}
 	return ret;
@@ -2198,6 +2208,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	debug_rt_mutex_init_waiter(&rt_waiter);
 	rt_waiter.task = NULL;
 
+retry:
 	key2 = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
 	if (unlikely(ret != 0))
@@ -2292,6 +2303,9 @@ out_put_keys:
 out_key2:
 	put_futex_key(fshared, &key2);
 
+	/* Spurious wakeup ? */
+	if (ret == -EAGAIN)
+		goto retry;
 out:
 	if (to) {
 		hrtimer_cancel(&to->timer);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 705f02a..0536125 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -913,7 +913,20 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 			spin_unlock(&rnp->lock); /* irqs remain disabled. */
 			break;
 		}
-		rcu_preempt_offline_tasks(rsp, rnp, rdp);
+
+		/*
+		 * If there was a task blocking the current grace period,
+		 * and if all CPUs have checked in, we need to propagate
+		 * the quiescent state up the rcu_node hierarchy.  But that
+		 * is inconvenient at the moment due to deadlock issues if
+		 * this should end the current grace period.  So set the
+		 * offlined CPU's bit in ->qsmask in order to force the
+		 * next force_quiescent_state() invocation to clean up this
+		 * mess in a deadlock-free manner.
+		 */
+		if (rcu_preempt_offline_tasks(rsp, rnp, rdp) && !rnp->qsmask)
+			rnp->qsmask |= mask;
+
 		mask = rnp->grpmask;
 		spin_unlock(&rnp->lock);	/* irqs remain disabled. */
 		rnp = rnp->parent;
@@ -958,7 +971,7 @@ static void rcu_offline_cpu(int cpu)
  * Invoke any RCU callbacks that have made it to the end of their grace
  * period.  Thottle as specified by rdp->blimit.
  */
-static void rcu_do_batch(struct rcu_data *rdp)
+static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 {
 	unsigned long flags;
 	struct rcu_head *next, *list, **tail;
@@ -1011,6 +1024,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
 		rdp->blimit = blimit;
 
+	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
+	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
+		rdp->qlen_last_fqs_check = 0;
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
+		rdp->qlen_last_fqs_check = rdp->qlen;
+
 	local_irq_restore(flags);
 
 	/* Re-raise the RCU softirq if there are callbacks remaining. */
@@ -1224,7 +1244,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
 	}
 
 	/* If there are callbacks ready, invoke them. */
-	rcu_do_batch(rdp);
+	rcu_do_batch(rsp, rdp);
 }
 
 /*
@@ -1288,10 +1308,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
 	}
 
-	/* Force the grace period if too many callbacks or too long waiting. */
-	if (unlikely(++rdp->qlen > qhimark)) {
+	/*
+	 * Force the grace period if too many callbacks or too long waiting.
+	 * Enforce hysteresis, and don't invoke force_quiescent_state()
+	 * if some other CPU has recently done so.  Also, don't bother
+	 * invoking force_quiescent_state() if the newly enqueued callback
+	 * is the only one waiting for a grace period to complete.
+	 */
+	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
 		rdp->blimit = LONG_MAX;
-		force_quiescent_state(rsp, 0);
+		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
+		    *rdp->nxttail[RCU_DONE_TAIL] != head)
+			force_quiescent_state(rsp, 0);
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+		rdp->qlen_last_fqs_check = rdp->qlen;
 	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
 		force_quiescent_state(rsp, 1);
 	local_irq_restore(flags);
@@ -1523,6 +1553,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
 	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->preemptable = preemptable;
 	rdp->passed_quiesc_completed = lastcomp - 1;
+	rdp->qlen_last_fqs_check = 0;
+	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
 	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index b40ac57..1823c6e 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -167,6 +167,10 @@ struct rcu_data {
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
 	long		qlen;		/* # of queued callbacks */
+	long		qlen_last_fqs_check;
+					/* qlen at last check for QS forcing */
+	unsigned long	n_force_qs_snap;
+					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
 
 #ifdef CONFIG_NO_HZ
@@ -302,9 +306,9 @@ static void rcu_print_task_stall(struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
-static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
-				      struct rcu_node *rnp,
-				      struct rcu_data *rdp);
+static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
+				     struct rcu_node *rnp,
+				     struct rcu_data *rdp);
 static void rcu_preempt_offline_cpu(int cpu);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 static void rcu_preempt_check_callbacks(int cpu);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c0cb783..ef2a58c 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -304,21 +304,25 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
  * parent is to remove the need for rcu_read_unlock_special() to
  * make more than two attempts to acquire the target rcu_node's lock.
  *
+ * Returns 1 if there was previously a task blocking the current grace
+ * period on the specified rcu_node structure.
+ *
  * The caller must hold rnp->lock with irqs disabled.
  */
-static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
-				      struct rcu_node *rnp,
-				      struct rcu_data *rdp)
+static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
+				     struct rcu_node *rnp,
+				     struct rcu_data *rdp)
 {
 	int i;
 	struct list_head *lp;
 	struct list_head *lp_root;
+	int retval = rcu_preempted_readers(rnp);
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
 	struct task_struct *tp;
 
 	if (rnp == rnp_root) {
 		WARN_ONCE(1, "Last CPU thought to be offlined?");
-		return;  /* Shouldn't happen: at least one CPU online. */
+		return 0;  /* Shouldn't happen: at least one CPU online. */
 	}
 	WARN_ON_ONCE(rnp != rdp->mynode &&
 		     (!list_empty(&rnp->blocked_tasks[0]) ||
@@ -342,6 +346,8 @@ static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
 			spin_unlock(&rnp_root->lock); /* irqs remain disabled */
 		}
 	}
+
+	return retval;
 }
 
 /*
@@ -393,6 +399,17 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period.  We are supposed to expedite the
+ * grace period, but this is the crude slow compatability hack, so just
+ * invoke synchronize_rcu().
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
  * Check to see if there is any immediate preemptable-RCU-related work
  * to be done.
  */
@@ -521,12 +538,15 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
 
 /*
  * Because preemptable RCU does not exist, it never needs to migrate
- * tasks that were blocked within RCU read-side critical sections.
+ * tasks that were blocked within RCU read-side critical sections, and
+ * such non-existent tasks cannot possibly have been blocking the current
+ * grace period.
  */
-static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
-				      struct rcu_node *rnp,
-				      struct rcu_data *rdp)
+static int rcu_preempt_offline_tasks(struct rcu_state *rsp,
+				     struct rcu_node *rnp,
+				     struct rcu_data *rdp)
 {
+	return 0;
 }
 
 /*
@@ -565,6 +585,16 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period, but make it happen quickly.
+ * But because preemptable RCU does not exist, map to rcu-sched.
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_sched_expedited();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
  * Because preemptable RCU does not exist, it never has any work to do.
  */
 static int rcu_preempt_pending(int cpu)
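
The __call_rcu() hunk above adds hysteresis to the "too many callbacks"
trigger. The following illustrative userspace model, not part of the
pulled patches, compares how often the old and new threshold tests fire
under a steady stream of enqueues. It is deliberately simplified: it
ignores the n_force_qs_snap check, the "only callback waiting" check and
the draining done in rcu_do_batch(), and it counts every threshold
crossing as a force. struct cb_queue, enqueue_old(), enqueue_new() and
run() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the per-CPU fields in struct rcu_data. */
struct cb_queue {
	long qlen;			/* number of queued callbacks */
	long qlen_last_fqs_check;	/* qlen at the last threshold crossing */
	long n_forced;			/* how many times the trigger fired */
};

/* Old trigger: fires on every enqueue once qlen exceeds qhimark. */
static void enqueue_old(struct cb_queue *q, long qhimark)
{
	assert(qhimark > 0);
	if (++q->qlen > qhimark)
		q->n_forced++;
}

/* New trigger: fires only after qhimark more callbacks than at the last check. */
static void enqueue_new(struct cb_queue *q, long qhimark)
{
	assert(qhimark > 0);
	if (++q->qlen > q->qlen_last_fqs_check + qhimark) {
		q->n_forced++;
		q->qlen_last_fqs_check = q->qlen;
	}
}

/* Enqueue n callbacks with no draining and report how often the trigger fired. */
static long run(void (*enqueue)(struct cb_queue *, long), long n, long qhimark)
{
	struct cb_queue q = { 0, 0, 0 };
	long i;

	for (i = 0; i < n; i++)
		enqueue(&q, qhimark);
	return q.n_forced;
}
```

In this model, 30 enqueues with a threshold of 10 fire the old trigger
20 times and the new one twice, which is the storm-damping effect the
patch title describes.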


* [GIT PULL] core kernel fixes
@ 2009-10-13 18:29 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-10-13 18:29 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
David Rientjes (1):
      oprofile: fix race condition in event_buffer free

Peter Zijlstra (1):
      lockdep: Use cpu_clock() for lockstat

Robert Richter (1):
      oprofile: warn on freeing event buffer too early


 drivers/oprofile/event_buffer.c |   35 ++++++++++++++++++++++++++---------
 kernel/lockdep.c                |   20 ++++++++++++--------
 2 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 2b7ae36..5df60a6 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -35,12 +35,23 @@ static size_t buffer_pos;
 /* atomic_t because wait_event checks it outside of buffer_mutex */
 static atomic_t buffer_ready = ATOMIC_INIT(0);
 
-/* Add an entry to the event buffer. When we
- * get near to the end we wake up the process
- * sleeping on the read() of the file.
+/*
+ * Add an entry to the event buffer. When we get near to the end we
+ * wake up the process sleeping on the read() of the file. To protect
+ * the event_buffer this function may only be called when buffer_mutex
+ * is set.
  */
 void add_event_entry(unsigned long value)
 {
+	/*
+	 * This shouldn't happen since all workqueues or handlers are
+	 * canceled or flushed before the event buffer is freed.
+	 */
+	if (!event_buffer) {
+		WARN_ON_ONCE(1);
+		return;
+	}
+
 	if (buffer_pos == buffer_size) {
 		atomic_inc(&oprofile_stats.event_lost_overflow);
 		return;
@@ -69,7 +80,6 @@ void wake_up_buffer_waiter(void)
 
 int alloc_event_buffer(void)
 {
-	int err = -ENOMEM;
 	unsigned long flags;
 
 	spin_lock_irqsave(&oprofilefs_lock, flags);
@@ -80,21 +90,22 @@ int alloc_event_buffer(void)
 	if (buffer_watershed >= buffer_size)
 		return -EINVAL;
 
+	buffer_pos = 0;
 	event_buffer = vmalloc(sizeof(unsigned long) * buffer_size);
 	if (!event_buffer)
-		goto out;
+		return -ENOMEM;
 
-	err = 0;
-out:
-	return err;
+	return 0;
 }
 
 
 void free_event_buffer(void)
 {
+	mutex_lock(&buffer_mutex);
 	vfree(event_buffer);
-
+	buffer_pos = 0;
 	event_buffer = NULL;
+	mutex_unlock(&buffer_mutex);
 }
 
 
@@ -167,6 +178,12 @@ static ssize_t event_buffer_read(struct file *file, char __user *buf,
 
 	mutex_lock(&buffer_mutex);
 
+	/* May happen if the buffer is freed during pending reads. */
+	if (!event_buffer) {
+		retval = -EINTR;
+		goto out;
+	}
+
 	atomic_set(&buffer_ready, 0);
 
 	retval = -EFAULT;
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 3815ac1..9af5672 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -142,6 +142,11 @@ static inline struct lock_class *hlock_class(struct held_lock *hlock)
 #ifdef CONFIG_LOCK_STAT
 static DEFINE_PER_CPU(struct lock_class_stats[MAX_LOCKDEP_KEYS], lock_stats);
 
+static inline u64 lockstat_clock(void)
+{
+	return cpu_clock(smp_processor_id());
+}
+
 static int lock_point(unsigned long points[], unsigned long ip)
 {
 	int i;
@@ -158,7 +163,7 @@ static int lock_point(unsigned long points[], unsigned long ip)
 	return i;
 }
 
-static void lock_time_inc(struct lock_time *lt, s64 time)
+static void lock_time_inc(struct lock_time *lt, u64 time)
 {
 	if (time > lt->max)
 		lt->max = time;
@@ -234,12 +239,12 @@ static void put_lock_stats(struct lock_class_stats *stats)
 static void lock_release_holdtime(struct held_lock *hlock)
 {
 	struct lock_class_stats *stats;
-	s64 holdtime;
+	u64 holdtime;
 
 	if (!lock_stat)
 		return;
 
-	holdtime = sched_clock() - hlock->holdtime_stamp;
+	holdtime = lockstat_clock() - hlock->holdtime_stamp;
 
 	stats = get_lock_stats(hlock_class(hlock));
 	if (hlock->read)
@@ -2792,7 +2797,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	hlock->references = references;
 #ifdef CONFIG_LOCK_STAT
 	hlock->waittime_stamp = 0;
-	hlock->holdtime_stamp = sched_clock();
+	hlock->holdtime_stamp = lockstat_clock();
 #endif
 
 	if (check == 2 && !mark_irqflags(curr, hlock))
@@ -3322,7 +3327,7 @@ found_it:
 	if (hlock->instance != lock)
 		return;
 
-	hlock->waittime_stamp = sched_clock();
+	hlock->waittime_stamp = lockstat_clock();
 
 	contention_point = lock_point(hlock_class(hlock)->contention_point, ip);
 	contending_point = lock_point(hlock_class(hlock)->contending_point,
@@ -3345,8 +3350,7 @@ __lock_acquired(struct lockdep_map *lock, unsigned long ip)
 	struct held_lock *hlock, *prev_hlock;
 	struct lock_class_stats *stats;
 	unsigned int depth;
-	u64 now;
-	s64 waittime = 0;
+	u64 now, waittime = 0;
 	int i, cpu;
 
 	depth = curr->lockdep_depth;
@@ -3374,7 +3378,7 @@ found_it:
 
 	cpu = smp_processor_id();
 	if (hlock->waittime_stamp) {
-		now = sched_clock();
+		now = lockstat_clock();
 		waittime = now - hlock->waittime_stamp;
 		hlock->holdtime_stamp = now;
 	}


* Re: [GIT PULL] core kernel fixes
  2009-10-08 19:16 ` Linus Torvalds
@ 2009-10-08 19:20   ` Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-10-08 19:20 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Paul E. McKenney, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Thu, 8 Oct 2009, Ingo Molnar wrote:
> > 
> > Please pull the latest core-fixes-for-linus git tree from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus
> > 
> > Sigh ... this now looks a bit large for -rc3, due to the RCU 
> > cleanups. When I queued it up in -rc1 it looked acceptable :-/ 
> > Should we redo it again to extract the cleanups from the fixes 
> > (which cause most of the diffstat increase)?
> 
> Gaah.
> 
> Looking at the actual patch, it looks ok to me. But I really would 
> have preferred to see just fixes by now.
> 
> But I'll take it this way.

Thanks. Will be more careful ...

	Ingo


* Re: [GIT PULL] core kernel fixes
  2009-10-08 19:06 Ingo Molnar
@ 2009-10-08 19:16 ` Linus Torvalds
  2009-10-08 19:20   ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-10-08 19:16 UTC
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List, Paul E. McKenney, Andrew Morton



On Thu, 8 Oct 2009, Ingo Molnar wrote:
> 
> Please pull the latest core-fixes-for-linus git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus
> 
> Sigh ... this now looks a bit large for -rc3, due to the RCU cleanups. 
> When I queued it up in -rc1 it looked acceptable :-/ Should we redo it
> again to extract the cleanups from the fixes (which cause most of the 
> diffstat increase)?

Gaah.

Looking at the actual patch, it looks ok to me. But I really would have 
preferred to see just fixes by now.

But I'll take it this way.

		Linus


* [GIT PULL] core kernel fixes
@ 2009-10-08 19:06 Ingo Molnar
  2009-10-08 19:16 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-10-08 19:06 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Paul E. McKenney, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Sigh ... this now looks a bit large for -rc3, due to the RCU cleanups. 
When I queued it up in -rc1 it looked acceptable :-/ Should we redo it
again to extract the cleanups from the fixes (which cause most of the 
diffstat increase)?

 Thanks,

	Ingo

------------------>
Aaro Koskinen (1):
      panic: Fix panic message visibility by calling bust_spinlocks(0) before dying

Anton Blanchard (1):
      futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions

Darren Hart (1):
      futex: fix requeue_pi key imbalance

Paul E. McKenney (10):
      rcu: Clean up code based on review feedback from Josh Triplett
      rcu: Clean up code based on review feedback from Josh Triplett, part 2
      rcu: Clean up code to address Ingo's checkpatch feedback
      rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
      rcu: Clean up code based on review feedback from Josh Triplett, part 3
      rcu: Clean up code based on review feedback from Josh Triplett, part 4
      rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
      rcu: Move rcu_barrier() to rcutree
      rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
      rcu: Place root rcu_node structure in separate lockdep class

Peter Zijlstra (1):
      futex: Nullify robust lists after cleanup

Thomas Gleixner (2):
      futex: Fix locking imbalance
      futex: Move exit_pi_state() call to release_mm()


 include/linux/futex.h    |    4 +-
 include/linux/rcupdate.h |   18 ++-
 include/linux/rcutree.h  |   13 +-
 init/main.c              |    1 -
 kernel/exit.c            |    2 -
 kernel/fork.c            |   10 +-
 kernel/futex.c           |    3 +-
 kernel/panic.c           |    3 +-
 kernel/rcupdate.c        |  140 +-------------------
 kernel/rcutorture.c      |    4 +-
 kernel/rcutree.c         |  330 ++++++++++++++++++++++++++++++----------------
 kernel/rcutree.h         |   86 ++++++++++--
 kernel/rcutree_plugin.h  |  103 +++++++++++----
 kernel/rcutree_trace.c   |   14 +-
 14 files changed, 410 insertions(+), 321 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 34956c8..78b92ec 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -38,8 +38,8 @@ union ktime;
 #define FUTEX_LOCK_PI_PRIVATE	(FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_UNLOCK_PI_PRIVATE	(FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITS | FUTEX_PRIVATE_FLAG)
-#define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITS | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6fe0363..3ebd0b7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -77,7 +77,7 @@ extern int rcu_scheduler_active;
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
 
-#define RCU_HEAD_INIT 	{ .next = NULL, .func = NULL }
+#define RCU_HEAD_INIT	{ .next = NULL, .func = NULL }
 #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
 #define INIT_RCU_HEAD(ptr) do { \
        (ptr)->next = NULL; (ptr)->func = NULL; \
@@ -129,12 +129,6 @@ static inline void rcu_read_lock(void)
 	rcu_read_acquire();
 }
 
-/**
- * rcu_read_unlock - marks the end of an RCU read-side critical section.
- *
- * See rcu_read_lock() for more information.
- */
-
 /*
  * So where is rcu_write_lock()?  It does not exist, as there is no
  * way for writers to lock out RCU readers.  This is a feature, not
@@ -144,6 +138,12 @@ static inline void rcu_read_lock(void)
  * used as well.  RCU does not care how the writers keep out of each
  * others' way, as long as they do so.
  */
+
+/**
+ * rcu_read_unlock - marks the end of an RCU read-side critical section.
+ *
+ * See rcu_read_lock() for more information.
+ */
 static inline void rcu_read_unlock(void)
 {
 	rcu_read_release();
@@ -196,6 +196,8 @@ static inline void rcu_read_lock_sched(void)
 	__acquire(RCU_SCHED);
 	rcu_read_acquire();
 }
+
+/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
 static inline notrace void rcu_read_lock_sched_notrace(void)
 {
 	preempt_disable_notrace();
@@ -213,6 +215,8 @@ static inline void rcu_read_unlock_sched(void)
 	__release(RCU_SCHED);
 	preempt_enable();
 }
+
+/* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
 static inline notrace void rcu_read_unlock_sched_notrace(void)
 {
 	__release(RCU_SCHED);
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 3768277..46e9ab3 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -30,10 +30,14 @@
 #ifndef __LINUX_RCUTREE_H
 #define __LINUX_RCUTREE_H
 
+struct notifier_block;
+
 extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
-
+extern int rcu_cpu_notify(struct notifier_block *self,
+			  unsigned long action, void *hcpu);
 extern int rcu_needs_cpu(int cpu);
+extern int rcu_expedited_torture_stats(char *page);
 
 #ifdef CONFIG_TREE_PREEMPT_RCU
 
@@ -85,16 +89,11 @@ static inline void synchronize_rcu_bh_expedited(void)
 
 extern void __rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
-extern void rcu_restart_cpu(int cpu);
 
 extern long rcu_batches_completed(void);
 extern long rcu_batches_completed_bh(void);
 extern long rcu_batches_completed_sched(void);
 
-static inline void rcu_init_sched(void)
-{
-}
-
 #ifdef CONFIG_NO_HZ
 void rcu_enter_nohz(void);
 void rcu_exit_nohz(void);
@@ -107,7 +106,7 @@ static inline void rcu_exit_nohz(void)
 }
 #endif /* CONFIG_NO_HZ */
 
-/* A context switch is a grace period for rcutree. */
+/* A context switch is a grace period for RCU-sched and RCU-bh. */
 static inline int rcu_blocking_is_gp(void)
 {
 	return num_online_cpus() == 1;
diff --git a/init/main.c b/init/main.c
index 34971be..833d675 100644
--- a/init/main.c
+++ b/init/main.c
@@ -782,7 +782,6 @@ static void __init do_initcalls(void)
  */
 static void __init do_basic_setup(void)
 {
-	rcu_init_sched(); /* needed by module_init stage. */
 	init_workqueues();
 	cpuset_init_smp();
 	usermodehelper_init();
diff --git a/kernel/exit.c b/kernel/exit.c
index ae5d866..bc2b1fd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -989,8 +989,6 @@ NORET_TYPE void do_exit(long code)
 	tsk->mempolicy = NULL;
 #endif
 #ifdef CONFIG_FUTEX
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
-		exit_pi_state_list(tsk);
 	if (unlikely(current->pi_state_cache))
 		kfree(current->pi_state_cache);
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index bfee931..341965b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -543,12 +543,18 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 
 	/* Get rid of any futexes when releasing the mm */
 #ifdef CONFIG_FUTEX
-	if (unlikely(tsk->robust_list))
+	if (unlikely(tsk->robust_list)) {
 		exit_robust_list(tsk);
+		tsk->robust_list = NULL;
+	}
 #ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list))
+	if (unlikely(tsk->compat_robust_list)) {
 		compat_exit_robust_list(tsk);
+		tsk->compat_robust_list = NULL;
+	}
 #endif
+	if (unlikely(!list_empty(&tsk->pi_state_list)))
+		exit_pi_state_list(tsk);
 #endif
 
 	/* Get rid of any cached register state */
diff --git a/kernel/futex.c b/kernel/futex.c
index 463af2e..c3bb2fc 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -916,8 +916,8 @@ retry:
 	hb1 = hash_futex(&key1);
 	hb2 = hash_futex(&key2);
 
-	double_lock_hb(hb1, hb2);
 retry_private:
+	double_lock_hb(hb1, hb2);
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
 
@@ -2111,7 +2111,6 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		 * Unqueue the futex_q and determine which it was.
 		 */
 		plist_del(&q->list, &q->list.plist);
-		drop_futex_key_refs(&q->key);
 
 		if (timeout && !timeout->task)
 			ret = -ETIMEDOUT;
diff --git a/kernel/panic.c b/kernel/panic.c
index 512ab73..bc4dcb6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -90,6 +90,8 @@ NORET_TYPE void panic(const char * fmt, ...)
 
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	bust_spinlocks(0);
+
 	if (!panic_blink)
 		panic_blink = no_blink;
 
@@ -136,7 +138,6 @@ NORET_TYPE void panic(const char * fmt, ...)
 		mdelay(1);
 		i++;
 	}
-	bust_spinlocks(0);
 }
 
 EXPORT_SYMBOL(panic);
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 37ac454..4001833 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -46,22 +46,15 @@
 #include <linux/module.h>
 #include <linux/kernel_stat.h>
 
-enum rcu_barrier {
-	RCU_BARRIER_STD,
-	RCU_BARRIER_BH,
-	RCU_BARRIER_SCHED,
-};
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key rcu_lock_key;
+struct lockdep_map rcu_lock_map =
+	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+EXPORT_SYMBOL_GPL(rcu_lock_map);
+#endif
 
-static DEFINE_PER_CPU(struct rcu_head, rcu_barrier_head) = {NULL};
-static atomic_t rcu_barrier_cpu_count;
-static DEFINE_MUTEX(rcu_barrier_mutex);
-static struct completion rcu_barrier_completion;
 int rcu_scheduler_active __read_mostly;
 
-static atomic_t rcu_migrate_type_count = ATOMIC_INIT(0);
-static struct rcu_head rcu_migrate_head[3];
-static DECLARE_WAIT_QUEUE_HEAD(rcu_migrate_wq);
-
 /*
  * Awaken the corresponding synchronize_rcu() instance now that a
  * grace period has elapsed.
@@ -164,129 +157,10 @@ void synchronize_rcu_bh(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
 
-static void rcu_barrier_callback(struct rcu_head *notused)
-{
-	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
-		complete(&rcu_barrier_completion);
-}
-
-/*
- * Called with preemption disabled, and from cross-cpu IRQ context.
- */
-static void rcu_barrier_func(void *type)
-{
-	int cpu = smp_processor_id();
-	struct rcu_head *head = &per_cpu(rcu_barrier_head, cpu);
-
-	atomic_inc(&rcu_barrier_cpu_count);
-	switch ((enum rcu_barrier)type) {
-	case RCU_BARRIER_STD:
-		call_rcu(head, rcu_barrier_callback);
-		break;
-	case RCU_BARRIER_BH:
-		call_rcu_bh(head, rcu_barrier_callback);
-		break;
-	case RCU_BARRIER_SCHED:
-		call_rcu_sched(head, rcu_barrier_callback);
-		break;
-	}
-}
-
-static inline void wait_migrated_callbacks(void)
-{
-	wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
-	smp_mb(); /* In case we didn't sleep. */
-}
-
-/*
- * Orchestrate the specified type of RCU barrier, waiting for all
- * RCU callbacks of the specified type to complete.
- */
-static void _rcu_barrier(enum rcu_barrier type)
-{
-	BUG_ON(in_interrupt());
-	/* Take cpucontrol mutex to protect against CPU hotplug */
-	mutex_lock(&rcu_barrier_mutex);
-	init_completion(&rcu_barrier_completion);
-	/*
-	 * Initialize rcu_barrier_cpu_count to 1, then invoke
-	 * rcu_barrier_func() on each CPU, so that each CPU also has
-	 * incremented rcu_barrier_cpu_count.  Only then is it safe to
-	 * decrement rcu_barrier_cpu_count -- otherwise the first CPU
-	 * might complete its grace period before all of the other CPUs
-	 * did their increment, causing this function to return too
-	 * early.
-	 */
-	atomic_set(&rcu_barrier_cpu_count, 1);
-	on_each_cpu(rcu_barrier_func, (void *)type, 1);
-	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
-		complete(&rcu_barrier_completion);
-	wait_for_completion(&rcu_barrier_completion);
-	mutex_unlock(&rcu_barrier_mutex);
-	wait_migrated_callbacks();
-}
-
-/**
- * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
- */
-void rcu_barrier(void)
-{
-	_rcu_barrier(RCU_BARRIER_STD);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier);
-
-/**
- * rcu_barrier_bh - Wait until all in-flight call_rcu_bh() callbacks complete.
- */
-void rcu_barrier_bh(void)
-{
-	_rcu_barrier(RCU_BARRIER_BH);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier_bh);
-
-/**
- * rcu_barrier_sched - Wait for in-flight call_rcu_sched() callbacks.
- */
-void rcu_barrier_sched(void)
-{
-	_rcu_barrier(RCU_BARRIER_SCHED);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier_sched);
-
-static void rcu_migrate_callback(struct rcu_head *notused)
-{
-	if (atomic_dec_and_test(&rcu_migrate_type_count))
-		wake_up(&rcu_migrate_wq);
-}
-
-extern int rcu_cpu_notify(struct notifier_block *self,
-			  unsigned long action, void *hcpu);
-
 static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
 		unsigned long action, void *hcpu)
 {
-	rcu_cpu_notify(self, action, hcpu);
-	if (action == CPU_DYING) {
-		/*
-		 * preempt_disable() in on_each_cpu() prevents stop_machine(),
-		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
-		 * returns, all online cpus have queued rcu_barrier_func(),
-		 * and the dead cpu(if it exist) queues rcu_migrate_callback()s.
-		 *
-		 * These callbacks ensure _rcu_barrier() waits for all
-		 * RCU callbacks of the specified type to complete.
-		 */
-		atomic_set(&rcu_migrate_type_count, 3);
-		call_rcu_bh(rcu_migrate_head, rcu_migrate_callback);
-		call_rcu_sched(rcu_migrate_head + 1, rcu_migrate_callback);
-		call_rcu(rcu_migrate_head + 2, rcu_migrate_callback);
-	} else if (action == CPU_DOWN_PREPARE) {
-		/* Don't need to wait until next removal operation. */
-		/* rcu_migrate_head is protected by cpu_add_remove_lock */
-		wait_migrated_callbacks();
-	}
-
-	return NOTIFY_OK;
+	return rcu_cpu_notify(self, action, hcpu);
 }
 
 void __init rcu_init(void)
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 233768f..697c0a0 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -606,8 +606,6 @@ static struct rcu_torture_ops sched_ops_sync = {
 	.name		= "sched_sync"
 };
 
-extern int rcu_expedited_torture_stats(char *page);
-
 static struct rcu_torture_ops sched_expedited_ops = {
 	.init		= rcu_sync_torture_init,
 	.cleanup	= NULL,
@@ -650,7 +648,7 @@ rcu_torture_writer(void *arg)
 		old_rp = rcu_torture_current;
 		rp->rtort_mbtest = 1;
 		rcu_assign_pointer(rcu_torture_current, rp);
-		smp_wmb();
+		smp_wmb(); /* Mods to old_rp must follow rcu_assign_pointer() */
 		if (old_rp) {
 			i = old_rp->rtort_pipe_count;
 			if (i > RCU_TORTURE_PIPE_LEN)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 52b06f6..705f02a 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -49,13 +49,6 @@
 
 #include "rcutree.h"
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-static struct lock_class_key rcu_lock_key;
-struct lockdep_map rcu_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
-EXPORT_SYMBOL_GPL(rcu_lock_map);
-#endif
-
 /* Data structures. */
 
 #define RCU_STATE_INITIALIZER(name) { \
@@ -70,6 +63,9 @@ EXPORT_SYMBOL_GPL(rcu_lock_map);
 	.gpnum = -300, \
 	.completed = -300, \
 	.onofflock = __SPIN_LOCK_UNLOCKED(&name.onofflock), \
+	.orphan_cbs_list = NULL, \
+	.orphan_cbs_tail = &name.orphan_cbs_list, \
+	.orphan_qlen = 0, \
 	.fqslock = __SPIN_LOCK_UNLOCKED(&name.fqslock), \
 	.n_force_qs = 0, \
 	.n_force_qs_ngp = 0, \
@@ -81,24 +77,16 @@ DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);
 struct rcu_state rcu_bh_state = RCU_STATE_INITIALIZER(rcu_bh_state);
 DEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
 
-extern long rcu_batches_completed_sched(void);
-static struct rcu_node *rcu_get_root(struct rcu_state *rsp);
-static void cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp,
-			  struct rcu_node *rnp, unsigned long flags);
-static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags);
-#ifdef CONFIG_HOTPLUG_CPU
-static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp);
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-static void __rcu_process_callbacks(struct rcu_state *rsp,
-				    struct rcu_data *rdp);
-static void __call_rcu(struct rcu_head *head,
-		       void (*func)(struct rcu_head *rcu),
-		       struct rcu_state *rsp);
-static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp);
-static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
-					   int preemptable);
 
-#include "rcutree_plugin.h"
+/*
+ * Return true if an RCU grace period is in progress.  The ACCESS_ONCE()s
+ * permit this function to be invoked without holding the root rcu_node
+ * structure's ->lock, but of course results can be subject to change.
+ */
+static int rcu_gp_in_progress(struct rcu_state *rsp)
+{
+	return ACCESS_ONCE(rsp->completed) != ACCESS_ONCE(rsp->gpnum);
+}
 
 /*
  * Note a quiescent state.  Because we do not need to know
@@ -137,6 +125,10 @@ static int blimit = 10;		/* Maximum callbacks per softirq. */
 static int qhimark = 10000;	/* If this many pending, ignore blimit. */
 static int qlowmark = 100;	/* Once only this many pending, use blimit. */
 
+module_param(blimit, int, 0);
+module_param(qhimark, int, 0);
+module_param(qlowmark, int, 0);
+
 static void force_quiescent_state(struct rcu_state *rsp, int relaxed);
 static int rcu_pending(int cpu);
 
@@ -173,9 +165,7 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
 static int
 cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 {
-	/* ACCESS_ONCE() because we are accessing outside of lock. */
-	return *rdp->nxttail[RCU_DONE_TAIL] &&
-	       ACCESS_ONCE(rsp->completed) == ACCESS_ONCE(rsp->gpnum);
+	return *rdp->nxttail[RCU_DONE_TAIL] && !rcu_gp_in_progress(rsp);
 }
 
 /*
@@ -369,7 +359,7 @@ static long dyntick_recall_completed(struct rcu_state *rsp)
 /*
  * Snapshot the specified CPU's dynticks counter so that we can later
  * credit them with an implicit quiescent state.  Return 1 if this CPU
- * is already in a quiescent state courtesy of dynticks idle mode.
+ * is in dynticks idle mode, which is an extended quiescent state.
  */
 static int dyntick_save_progress_counter(struct rcu_data *rdp)
 {
@@ -475,30 +465,34 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	long delta;
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root(rsp);
-	struct rcu_node *rnp_cur = rsp->level[NUM_RCU_LVLS - 1];
-	struct rcu_node *rnp_end = &rsp->node[NUM_RCU_NODES];
 
 	/* Only let one CPU complain about others per time interval. */
 
 	spin_lock_irqsave(&rnp->lock, flags);
 	delta = jiffies - rsp->jiffies_stall;
-	if (delta < RCU_STALL_RAT_DELAY || rsp->gpnum == rsp->completed) {
+	if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
 		spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
 	}
 	rsp->jiffies_stall = jiffies + RCU_SECONDS_TILL_STALL_RECHECK;
+
+	/*
+	 * Now rat on any tasks that got kicked up to the root rcu_node
+	 * due to CPU offlining.
+	 */
+	rcu_print_task_stall(rnp);
 	spin_unlock_irqrestore(&rnp->lock, flags);
 
 	/* OK, time to rat on our buddy... */
 
 	printk(KERN_ERR "INFO: RCU detected CPU stalls:");
-	for (; rnp_cur < rnp_end; rnp_cur++) {
+	rcu_for_each_leaf_node(rsp, rnp) {
 		rcu_print_task_stall(rnp);
-		if (rnp_cur->qsmask == 0)
+		if (rnp->qsmask == 0)
 			continue;
-		for (cpu = 0; cpu <= rnp_cur->grphi - rnp_cur->grplo; cpu++)
-			if (rnp_cur->qsmask & (1UL << cpu))
-				printk(" %d", rnp_cur->grplo + cpu);
+		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
+			if (rnp->qsmask & (1UL << cpu))
+				printk(" %d", rnp->grplo + cpu);
 	}
 	printk(" (detected by %d, t=%ld jiffies)\n",
 	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
@@ -537,8 +531,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(rsp);
 
-	} else if (rsp->gpnum != rsp->completed &&
-		   delta >= RCU_STALL_RAT_DELAY) {
+	} else if (rcu_gp_in_progress(rsp) && delta >= RCU_STALL_RAT_DELAY) {
 
 		/* They had two time units to dump stack, so complain. */
 		print_other_cpu_stall(rsp);
@@ -617,9 +610,15 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	note_new_gpnum(rsp, rdp);
 
 	/*
-	 * Because we are first, we know that all our callbacks will
-	 * be covered by this upcoming grace period, even the ones
-	 * that were registered arbitrarily recently.
+	 * Because this CPU just now started the new grace period, we know
+	 * that all of its callbacks will be covered by this upcoming grace
+	 * period, even the ones that were registered arbitrarily recently.
+	 * Therefore, advance all outstanding callbacks to RCU_WAIT_TAIL.
+	 *
+	 * Other CPUs cannot be sure exactly when the grace period started.
+	 * Therefore, their recently registered callbacks must pass through
+	 * an additional RCU_NEXT_READY stage, so that they will be handled
+	 * by the next RCU grace period.
 	 */
 	rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
 	rdp->nxttail[RCU_WAIT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
@@ -657,7 +656,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	 * one corresponding to this CPU, due to the fact that we have
 	 * irqs disabled.
 	 */
-	for (rnp = &rsp->node[0]; rnp < &rsp->node[NUM_RCU_NODES]; rnp++) {
+	rcu_for_each_node_breadth_first(rsp, rnp) {
 		spin_lock(&rnp->lock);	/* irqs already disabled. */
 		rcu_preempt_check_blocked_tasks(rnp);
 		rnp->qsmask = rnp->qsmaskinit;
@@ -703,9 +702,9 @@ rcu_process_gp_end(struct rcu_state *rsp, struct rcu_data *rdp)
  * hold rnp->lock, as required by rcu_start_gp(), which will release it.
  */
 static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags)
-	__releases(rnp->lock)
+	__releases(rcu_get_root(rsp)->lock)
 {
-	WARN_ON_ONCE(rsp->completed == rsp->gpnum);
+	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	rsp->completed = rsp->gpnum;
 	rcu_process_gp_end(rsp, rsp->rda[smp_processor_id()]);
 	rcu_start_gp(rsp, flags);  /* releases root node's rnp->lock. */
@@ -842,17 +841,63 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 #ifdef CONFIG_HOTPLUG_CPU
 
 /*
+ * Move a dying CPU's RCU callbacks to the ->orphan_cbs_list for the
+ * specified flavor of RCU.  The callbacks will be adopted by the next
+ * _rcu_barrier() invocation or by the CPU_DEAD notifier, whichever
+ * comes first.  Because this is invoked from the CPU_DYING notifier,
+ * irqs are already disabled.
+ */
+static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
+{
+	int i;
+	struct rcu_data *rdp = rsp->rda[smp_processor_id()];
+
+	if (rdp->nxtlist == NULL)
+		return;  /* irqs disabled, so comparison is stable. */
+	spin_lock(&rsp->onofflock);  /* irqs already disabled. */
+	*rsp->orphan_cbs_tail = rdp->nxtlist;
+	rsp->orphan_cbs_tail = rdp->nxttail[RCU_NEXT_TAIL];
+	rdp->nxtlist = NULL;
+	for (i = 0; i < RCU_NEXT_SIZE; i++)
+		rdp->nxttail[i] = &rdp->nxtlist;
+	rsp->orphan_qlen += rdp->qlen;
+	rdp->qlen = 0;
+	spin_unlock(&rsp->onofflock);  /* irqs remain disabled. */
+}
+
+/*
+ * Adopt previously orphaned RCU callbacks.
+ */
+static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
+{
+	unsigned long flags;
+	struct rcu_data *rdp;
+
+	spin_lock_irqsave(&rsp->onofflock, flags);
+	rdp = rsp->rda[smp_processor_id()];
+	if (rsp->orphan_cbs_list == NULL) {
+		spin_unlock_irqrestore(&rsp->onofflock, flags);
+		return;
+	}
+	*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_list;
+	rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_tail;
+	rdp->qlen += rsp->orphan_qlen;
+	rsp->orphan_cbs_list = NULL;
+	rsp->orphan_cbs_tail = &rsp->orphan_cbs_list;
+	rsp->orphan_qlen = 0;
+	spin_unlock_irqrestore(&rsp->onofflock, flags);
+}
+
+/*
  * Remove the outgoing CPU from the bitmasks in the rcu_node hierarchy
  * and move all callbacks from the outgoing CPU to the current one.
  */
 static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 {
-	int i;
 	unsigned long flags;
 	long lastcomp;
 	unsigned long mask;
 	struct rcu_data *rdp = rsp->rda[cpu];
-	struct rcu_data *rdp_me;
 	struct rcu_node *rnp;
 
 	/* Exclude any attempts to start a new grace period. */
@@ -875,32 +920,9 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 	} while (rnp != NULL);
 	lastcomp = rsp->completed;
 
-	spin_unlock(&rsp->onofflock);		/* irqs remain disabled. */
+	spin_unlock_irqrestore(&rsp->onofflock, flags);
 
-	/*
-	 * Move callbacks from the outgoing CPU to the running CPU.
-	 * Note that the outgoing CPU is now quiscent, so it is now
-	 * (uncharacteristically) safe to access its rcu_data structure.
-	 * Note also that we must carefully retain the order of the
-	 * outgoing CPU's callbacks in order for rcu_barrier() to work
-	 * correctly.  Finally, note that we start all the callbacks
-	 * afresh, even those that have passed through a grace period
-	 * and are therefore ready to invoke.  The theory is that hotplug
-	 * events are rare, and that if they are frequent enough to
-	 * indefinitely delay callbacks, you have far worse things to
-	 * be worrying about.
-	 */
-	rdp_me = rsp->rda[smp_processor_id()];
-	if (rdp->nxtlist != NULL) {
-		*rdp_me->nxttail[RCU_NEXT_TAIL] = rdp->nxtlist;
-		rdp_me->nxttail[RCU_NEXT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
-		rdp->nxtlist = NULL;
-		for (i = 0; i < RCU_NEXT_SIZE; i++)
-			rdp->nxttail[i] = &rdp->nxtlist;
-		rdp_me->qlen += rdp->qlen;
-		rdp->qlen = 0;
-	}
-	local_irq_restore(flags);
+	rcu_adopt_orphan_cbs(rsp);
 }
 
 /*
@@ -918,6 +940,14 @@ static void rcu_offline_cpu(int cpu)
 
 #else /* #ifdef CONFIG_HOTPLUG_CPU */
 
+static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
+{
+}
+
+static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
+{
+}
+
 static void rcu_offline_cpu(int cpu)
 {
 }
@@ -1050,33 +1080,32 @@ static int rcu_process_dyntick(struct rcu_state *rsp, long lastcomp,
 	int cpu;
 	unsigned long flags;
 	unsigned long mask;
-	struct rcu_node *rnp_cur = rsp->level[NUM_RCU_LVLS - 1];
-	struct rcu_node *rnp_end = &rsp->node[NUM_RCU_NODES];
+	struct rcu_node *rnp;
 
-	for (; rnp_cur < rnp_end; rnp_cur++) {
+	rcu_for_each_leaf_node(rsp, rnp) {
 		mask = 0;
-		spin_lock_irqsave(&rnp_cur->lock, flags);
+		spin_lock_irqsave(&rnp->lock, flags);
 		if (rsp->completed != lastcomp) {
-			spin_unlock_irqrestore(&rnp_cur->lock, flags);
+			spin_unlock_irqrestore(&rnp->lock, flags);
 			return 1;
 		}
-		if (rnp_cur->qsmask == 0) {
-			spin_unlock_irqrestore(&rnp_cur->lock, flags);
+		if (rnp->qsmask == 0) {
+			spin_unlock_irqrestore(&rnp->lock, flags);
 			continue;
 		}
-		cpu = rnp_cur->grplo;
+		cpu = rnp->grplo;
 		bit = 1;
-		for (; cpu <= rnp_cur->grphi; cpu++, bit <<= 1) {
-			if ((rnp_cur->qsmask & bit) != 0 && f(rsp->rda[cpu]))
+		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
+			if ((rnp->qsmask & bit) != 0 && f(rsp->rda[cpu]))
 				mask |= bit;
 		}
 		if (mask != 0 && rsp->completed == lastcomp) {
 
-			/* cpu_quiet_msk() releases rnp_cur->lock. */
-			cpu_quiet_msk(mask, rsp, rnp_cur, flags);
+			/* cpu_quiet_msk() releases rnp->lock. */
+			cpu_quiet_msk(mask, rsp, rnp, flags);
 			continue;
 		}
-		spin_unlock_irqrestore(&rnp_cur->lock, flags);
+		spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 	return 0;
 }
@@ -1092,7 +1121,7 @@ static void force_quiescent_state(struct rcu_state *rsp, int relaxed)
 	struct rcu_node *rnp = rcu_get_root(rsp);
 	u8 signaled;
 
-	if (ACCESS_ONCE(rsp->completed) == ACCESS_ONCE(rsp->gpnum))
+	if (!rcu_gp_in_progress(rsp))
 		return;  /* No grace period in progress, nothing to force. */
 	if (!spin_trylock_irqsave(&rsp->fqslock, flags)) {
 		rsp->n_force_qs_lh++; /* Inexact, can lose counts.  Tough! */
@@ -1251,7 +1280,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
 
 	/* Start a new grace period if one not already started. */
-	if (ACCESS_ONCE(rsp->completed) == ACCESS_ONCE(rsp->gpnum)) {
+	if (!rcu_gp_in_progress(rsp)) {
 		unsigned long nestflag;
 		struct rcu_node *rnp_root = rcu_get_root(rsp);
 
@@ -1331,7 +1360,7 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 	}
 
 	/* Has an RCU GP gone long enough to send resched IPIs &c? */
-	if (ACCESS_ONCE(rsp->completed) != ACCESS_ONCE(rsp->gpnum) &&
+	if (rcu_gp_in_progress(rsp) &&
 	    ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)) {
 		rdp->n_rp_need_fqs++;
 		return 1;
@@ -1368,6 +1397,82 @@ int rcu_needs_cpu(int cpu)
 	       rcu_preempt_needs_cpu(cpu);
 }
 
+static DEFINE_PER_CPU(struct rcu_head, rcu_barrier_head) = {NULL};
+static atomic_t rcu_barrier_cpu_count;
+static DEFINE_MUTEX(rcu_barrier_mutex);
+static struct completion rcu_barrier_completion;
+
+static void rcu_barrier_callback(struct rcu_head *notused)
+{
+	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+		complete(&rcu_barrier_completion);
+}
+
+/*
+ * Called with preemption disabled, and from cross-cpu IRQ context.
+ */
+static void rcu_barrier_func(void *type)
+{
+	int cpu = smp_processor_id();
+	struct rcu_head *head = &per_cpu(rcu_barrier_head, cpu);
+	void (*call_rcu_func)(struct rcu_head *head,
+			      void (*func)(struct rcu_head *head));
+
+	atomic_inc(&rcu_barrier_cpu_count);
+	call_rcu_func = type;
+	call_rcu_func(head, rcu_barrier_callback);
+}
+
+/*
+ * Orchestrate the specified type of RCU barrier, waiting for all
+ * RCU callbacks of the specified type to complete.
+ */
+static void _rcu_barrier(struct rcu_state *rsp,
+			 void (*call_rcu_func)(struct rcu_head *head,
+					       void (*func)(struct rcu_head *head)))
+{
+	BUG_ON(in_interrupt());
+	/* Take mutex to serialize concurrent rcu_barrier() requests. */
+	mutex_lock(&rcu_barrier_mutex);
+	init_completion(&rcu_barrier_completion);
+	/*
+	 * Initialize rcu_barrier_cpu_count to 1, then invoke
+	 * rcu_barrier_func() on each CPU, so that each CPU also has
+	 * incremented rcu_barrier_cpu_count.  Only then is it safe to
+	 * decrement rcu_barrier_cpu_count -- otherwise the first CPU
+	 * might complete its grace period before all of the other CPUs
+	 * did their increment, causing this function to return too
+	 * early.
+	 */
+	atomic_set(&rcu_barrier_cpu_count, 1);
+	preempt_disable(); /* stop CPU_DYING from filling orphan_cbs_list */
+	rcu_adopt_orphan_cbs(rsp);
+	on_each_cpu(rcu_barrier_func, (void *)call_rcu_func, 1);
+	preempt_enable(); /* CPU_DYING can again fill orphan_cbs_list */
+	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+		complete(&rcu_barrier_completion);
+	wait_for_completion(&rcu_barrier_completion);
+	mutex_unlock(&rcu_barrier_mutex);
+}
+
+/**
+ * rcu_barrier_bh - Wait until all in-flight call_rcu_bh() callbacks complete.
+ */
+void rcu_barrier_bh(void)
+{
+	_rcu_barrier(&rcu_bh_state, call_rcu_bh);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_bh);
+
+/**
+ * rcu_barrier_sched - Wait for in-flight call_rcu_sched() callbacks.
+ */
+void rcu_barrier_sched(void)
+{
+	_rcu_barrier(&rcu_sched_state, call_rcu_sched);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_sched);
+
 /*
  * Do boot-time initialization of a CPU's per-CPU RCU data.
  */
@@ -1464,6 +1569,22 @@ int __cpuinit rcu_cpu_notify(struct notifier_block *self,
 	case CPU_UP_PREPARE_FROZEN:
 		rcu_online_cpu(cpu);
 		break;
+	case CPU_DYING:
+	case CPU_DYING_FROZEN:
+		/*
+		 * preempt_disable() in _rcu_barrier() prevents stop_machine(),
+		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
+		 * returns, all online cpus have queued rcu_barrier_func().
+		 * The dying CPU clears its cpu_online_mask bit and
+		 * moves all of its RCU callbacks to ->orphan_cbs_list
+		 * in the context of stop_machine(), so subsequent calls
+		 * to _rcu_barrier() will adopt these callbacks and only
+		 * then queue rcu_barrier_func() on all remaining CPUs.
+		 */
+		rcu_send_cbs_to_orphanage(&rcu_bh_state);
+		rcu_send_cbs_to_orphanage(&rcu_sched_state);
+		rcu_preempt_send_cbs_to_orphanage();
+		break;
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 	case CPU_UP_CANCELED:
@@ -1526,7 +1647,8 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 		cpustride *= rsp->levelspread[i];
 		rnp = rsp->level[i];
 		for (j = 0; j < rsp->levelcnt[i]; j++, rnp++) {
-			spin_lock_init(&rnp->lock);
+			if (rnp != rcu_get_root(rsp))
+				spin_lock_init(&rnp->lock);
 			rnp->gpnum = 0;
 			rnp->qsmask = 0;
 			rnp->qsmaskinit = 0;
@@ -1549,6 +1671,7 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 			INIT_LIST_HEAD(&rnp->blocked_tasks[1]);
 		}
 	}
+	spin_lock_init(&rcu_get_root(rsp)->lock);
 }
 
 /*
@@ -1558,6 +1681,10 @@ static void __init rcu_init_one(struct rcu_state *rsp)
  */
 #define RCU_INIT_FLAVOR(rsp, rcu_data) \
 do { \
+	int i; \
+	int j; \
+	struct rcu_node *rnp; \
+	\
 	rcu_init_one(rsp); \
 	rnp = (rsp)->level[NUM_RCU_LVLS - 1]; \
 	j = 0; \
@@ -1570,31 +1697,8 @@ do { \
 	} \
 } while (0)
 
-#ifdef CONFIG_TREE_PREEMPT_RCU
-
-void __init __rcu_init_preempt(void)
-{
-	int i;			/* All used by RCU_INIT_FLAVOR(). */
-	int j;
-	struct rcu_node *rnp;
-
-	RCU_INIT_FLAVOR(&rcu_preempt_state, rcu_preempt_data);
-}
-
-#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
-
-void __init __rcu_init_preempt(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
-
 void __init __rcu_init(void)
 {
-	int i;			/* All used by RCU_INIT_FLAVOR(). */
-	int j;
-	struct rcu_node *rnp;
-
 	rcu_bootup_announce();
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 	printk(KERN_INFO "RCU-based detection of stalled CPUs is enabled.\n");
@@ -1605,6 +1709,4 @@ void __init __rcu_init(void)
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 }
 
-module_param(blimit, int, 0);
-module_param(qhimark, int, 0);
-module_param(qlowmark, int, 0);
+#include "rcutree_plugin.h"
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 8e8287a..b40ac57 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -48,14 +48,14 @@
 #elif NR_CPUS <= RCU_FANOUT_SQ
 #  define NUM_RCU_LVLS	      2
 #  define NUM_RCU_LVL_0	      1
-#  define NUM_RCU_LVL_1	      (((NR_CPUS) + RCU_FANOUT - 1) / RCU_FANOUT)
+#  define NUM_RCU_LVL_1	      DIV_ROUND_UP(NR_CPUS, RCU_FANOUT)
 #  define NUM_RCU_LVL_2	      (NR_CPUS)
 #  define NUM_RCU_LVL_3	      0
 #elif NR_CPUS <= RCU_FANOUT_CUBE
 #  define NUM_RCU_LVLS	      3
 #  define NUM_RCU_LVL_0	      1
-#  define NUM_RCU_LVL_1	      (((NR_CPUS) + RCU_FANOUT_SQ - 1) / RCU_FANOUT_SQ)
-#  define NUM_RCU_LVL_2	      (((NR_CPUS) + (RCU_FANOUT) - 1) / (RCU_FANOUT))
+#  define NUM_RCU_LVL_1	      DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_SQ)
+#  define NUM_RCU_LVL_2	      DIV_ROUND_UP(NR_CPUS, RCU_FANOUT)
 #  define NUM_RCU_LVL_3	      NR_CPUS
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
@@ -79,15 +79,21 @@ struct rcu_dynticks {
  * Definition for node within the RCU grace-period-detection hierarchy.
  */
 struct rcu_node {
-	spinlock_t lock;
+	spinlock_t lock;	/* Root rcu_node's lock protects some */
+				/*  rcu_state fields as well as following. */
 	long	gpnum;		/* Current grace period for this node. */
 				/*  This will either be equal to or one */
 				/*  behind the root rcu_node's gpnum. */
 	unsigned long qsmask;	/* CPUs or groups that need to switch in */
 				/*  order for current grace period to proceed.*/
+				/*  In leaf rcu_node, each bit corresponds to */
+				/*  an rcu_data structure, otherwise, each */
+				/*  bit corresponds to a child rcu_node */
+				/*  structure. */
 	unsigned long qsmaskinit;
 				/* Per-GP initialization for qsmask. */
 	unsigned long grpmask;	/* Mask to apply to parent qsmask. */
+				/*  Only one bit will be set in this mask. */
 	int	grplo;		/* lowest-numbered CPU or group here. */
 	int	grphi;		/* highest-numbered CPU or group here. */
 	u8	grpnum;		/* CPU/group number for next level up. */
@@ -95,8 +101,23 @@ struct rcu_node {
 	struct rcu_node *parent;
 	struct list_head blocked_tasks[2];
 				/* Tasks blocked in RCU read-side critsect. */
+				/*  Grace period number (->gpnum) x blocked */
+				/*  by tasks on the (x & 0x1) element of the */
+				/*  blocked_tasks[] array. */
 } ____cacheline_internodealigned_in_smp;
 
+/*
+ * Do a full breadth-first scan of the rcu_node structures for the
+ * specified rcu_state structure.
+ */
+#define rcu_for_each_node_breadth_first(rsp, rnp) \
+	for ((rnp) = &(rsp)->node[0]; \
+	     (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
+
+#define rcu_for_each_leaf_node(rsp, rnp) \
+	for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
+	     (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
+
 /* Index values for nxttail array in struct rcu_data. */
 #define RCU_DONE_TAIL		0	/* Also RCU_WAIT head. */
 #define RCU_WAIT_TAIL		1	/* Also RCU_NEXT_READY head. */
@@ -126,19 +147,22 @@ struct rcu_data {
 	 * Any of the partitions might be empty, in which case the
 	 * pointer to that partition will be equal to the pointer for
 	 * the following partition.  When the list is empty, all of
-	 * the nxttail elements point to nxtlist, which is NULL.
+	 * the nxttail elements point to the ->nxtlist pointer itself,
+	 * which in that case is NULL.
 	 *
-	 * [*nxttail[RCU_NEXT_READY_TAIL], NULL = *nxttail[RCU_NEXT_TAIL]):
-	 *	Entries that might have arrived after current GP ended
-	 * [*nxttail[RCU_WAIT_TAIL], *nxttail[RCU_NEXT_READY_TAIL]):
-	 *	Entries known to have arrived before current GP ended
-	 * [*nxttail[RCU_DONE_TAIL], *nxttail[RCU_WAIT_TAIL]):
-	 *	Entries that batch # <= ->completed - 1: waiting for current GP
 	 * [nxtlist, *nxttail[RCU_DONE_TAIL]):
 	 *	Entries that batch # <= ->completed
 	 *	The grace period for these entries has completed, and
 	 *	the other grace-period-completed entries may be moved
 	 *	here temporarily in rcu_process_callbacks().
+	 * [*nxttail[RCU_DONE_TAIL], *nxttail[RCU_WAIT_TAIL]):
+	 *	Entries that batch # <= ->completed - 1: waiting for current GP
+	 * [*nxttail[RCU_WAIT_TAIL], *nxttail[RCU_NEXT_READY_TAIL]):
+	 *	Entries known to have arrived before current GP ended
+	 * [*nxttail[RCU_NEXT_READY_TAIL], *nxttail[RCU_NEXT_TAIL]):
+	 *	Entries that might have arrived after current GP ended
+	 *	Note that the value of *nxttail[RCU_NEXT_TAIL] will
+	 *	always be NULL, as this is the end of the list.
 	 */
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
@@ -216,8 +240,19 @@ struct rcu_state {
 						/* Force QS state. */
 	long	gpnum;				/* Current gp number. */
 	long	completed;			/* # of last completed gp. */
+
+	/* End  of fields guarded by root rcu_node's lock. */
+
 	spinlock_t onofflock;			/* exclude on/offline and */
-						/*  starting new GP. */
+						/*  starting new GP.  Also */
+						/*  protects the following */
+						/*  orphan_cbs fields. */
+	struct rcu_head *orphan_cbs_list;	/* list of rcu_head structs */
+						/*  orphaned by all CPUs in */
+						/*  a given leaf rcu_node */
+						/*  going offline. */
+	struct rcu_head **orphan_cbs_tail;	/* And tail pointer. */
+	long orphan_qlen;			/* Number of orphaned cbs. */
 	spinlock_t fqslock;			/* Only one task forcing */
 						/*  quiescent states. */
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
@@ -255,5 +290,30 @@ extern struct rcu_state rcu_preempt_state;
 DECLARE_PER_CPU(struct rcu_data, rcu_preempt_data);
 #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
 
-#endif /* #ifdef RCU_TREE_NONCORE */
+#else /* #ifdef RCU_TREE_NONCORE */
+
+/* Forward declarations for rcutree_plugin.h */
+static inline void rcu_bootup_announce(void);
+long rcu_batches_completed(void);
+static void rcu_preempt_note_context_switch(int cpu);
+static int rcu_preempted_readers(struct rcu_node *rnp);
+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
+static void rcu_print_task_stall(struct rcu_node *rnp);
+#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
+#ifdef CONFIG_HOTPLUG_CPU
+static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
+				      struct rcu_node *rnp,
+				      struct rcu_data *rdp);
+static void rcu_preempt_offline_cpu(int cpu);
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
+static void rcu_preempt_check_callbacks(int cpu);
+static void rcu_preempt_process_callbacks(void);
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
+static int rcu_preempt_pending(int cpu);
+static int rcu_preempt_needs_cpu(int cpu);
+static void __cpuinit rcu_preempt_init_percpu_data(int cpu);
+static void rcu_preempt_send_cbs_to_orphanage(void);
+static void __init __rcu_init_preempt(void);
 
+#endif /* #else #ifdef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 1cee04f..c0cb783 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -150,6 +150,16 @@ void __rcu_read_lock(void)
 }
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
 
+/*
+ * Check for preempted RCU readers blocking the current grace period
+ * for the specified rcu_node structure.  If the caller needs a reliable
+ * answer, it must hold the rcu_node's ->lock.
+ */
+static int rcu_preempted_readers(struct rcu_node *rnp)
+{
+	return !list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
+}
+
 static void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
@@ -196,7 +206,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 				break;
 			spin_unlock(&rnp->lock);  /* irqs remain disabled. */
 		}
-		empty = list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
+		empty = !rcu_preempted_readers(rnp);
 		list_del_init(&t->rcu_node_entry);
 		t->rcu_blocked_node = NULL;
 
@@ -207,7 +217,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		 * drop rnp->lock and restore irq.
 		 */
 		if (!empty && rnp->qsmask == 0 &&
-		    list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
+		    !rcu_preempted_readers(rnp)) {
 			struct rcu_node *rnp_p;
 
 			if (rnp->parent == NULL) {
@@ -257,12 +267,12 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
 {
 	unsigned long flags;
 	struct list_head *lp;
-	int phase = rnp->gpnum & 0x1;
+	int phase;
 	struct task_struct *t;
 
-	if (!list_empty(&rnp->blocked_tasks[phase])) {
+	if (rcu_preempted_readers(rnp)) {
 		spin_lock_irqsave(&rnp->lock, flags);
-		phase = rnp->gpnum & 0x1; /* re-read under lock. */
+		phase = rnp->gpnum & 0x1;
 		lp = &rnp->blocked_tasks[phase];
 		list_for_each_entry(t, lp, rcu_node_entry)
 			printk(" P%d", t->pid);
@@ -281,20 +291,10 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
  */
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
 {
-	WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+	WARN_ON_ONCE(rcu_preempted_readers(rnp));
 	WARN_ON_ONCE(rnp->qsmask);
 }
 
-/*
- * Check for preempted RCU readers for the specified rcu_node structure.
- * If the caller needs a reliable answer, it must hold the rcu_node's
- * >lock.
- */
-static int rcu_preempted_readers(struct rcu_node *rnp)
-{
-	return !list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 
 /*
@@ -410,6 +410,15 @@ static int rcu_preempt_needs_cpu(int cpu)
 	return !!per_cpu(rcu_preempt_data, cpu).nxtlist;
 }
 
+/**
+ * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
+ */
+void rcu_barrier(void)
+{
+	_rcu_barrier(&rcu_preempt_state, call_rcu);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
 /*
  * Initialize preemptable RCU's per-CPU data.
  */
@@ -419,6 +428,22 @@ static void __cpuinit rcu_preempt_init_percpu_data(int cpu)
 }
 
 /*
+ * Move preemptable RCU's callbacks to ->orphan_cbs_list.
+ */
+static void rcu_preempt_send_cbs_to_orphanage(void)
+{
+	rcu_send_cbs_to_orphanage(&rcu_preempt_state);
+}
+
+/*
+ * Initialize preemptable RCU's state structures.
+ */
+static void __init __rcu_init_preempt(void)
+{
+	RCU_INIT_FLAVOR(&rcu_preempt_state, rcu_preempt_data);
+}
+
+/*
  * Check for a task exiting while in a preemptable-RCU read-side
  * critical section, clean up if so.  No need to issue warnings,
  * as debug_check_no_locks_held() already does this if lockdep
@@ -461,6 +486,15 @@ static void rcu_preempt_note_context_switch(int cpu)
 {
 }
 
+/*
+ * Because preemptable RCU does not exist, there are never any preempted
+ * RCU readers.
+ */
+static int rcu_preempted_readers(struct rcu_node *rnp)
+{
+	return 0;
+}
+
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 
 /*
@@ -483,15 +517,6 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
 	WARN_ON_ONCE(rnp->qsmask);
 }
 
-/*
- * Because preemptable RCU does not exist, there are never any preempted
- * RCU readers.
- */
-static int rcu_preempted_readers(struct rcu_node *rnp)
-{
-	return 0;
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 
 /*
@@ -518,7 +543,7 @@ static void rcu_preempt_offline_cpu(int cpu)
  * Because preemptable RCU does not exist, it never has any callbacks
  * to check.
  */
-void rcu_preempt_check_callbacks(int cpu)
+static void rcu_preempt_check_callbacks(int cpu)
 {
 }
 
@@ -526,7 +551,7 @@ void rcu_preempt_check_callbacks(int cpu)
  * Because preemptable RCU does not exist, it never has any callbacks
  * to process.
  */
-void rcu_preempt_process_callbacks(void)
+static void rcu_preempt_process_callbacks(void)
 {
 }
 
@@ -556,6 +581,16 @@ static int rcu_preempt_needs_cpu(int cpu)
 }
 
 /*
+ * Because preemptable RCU does not exist, rcu_barrier() is just
+ * another name for rcu_barrier_sched().
+ */
+void rcu_barrier(void)
+{
+	rcu_barrier_sched();
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+/*
  * Because preemptable RCU does not exist, there is no per-CPU
  * data to initialize.
  */
@@ -563,4 +598,18 @@ static void __cpuinit rcu_preempt_init_percpu_data(int cpu)
 {
 }
 
+/*
+ * Because there is no preemptable RCU, there are no callbacks to move.
+ */
+static void rcu_preempt_send_cbs_to_orphanage(void)
+{
+}
+
+/*
+ * Because preemptable RCU does not exist, it need not be initialized.
+ */
+static void __init __rcu_init_preempt(void)
+{
+}
+
 #endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index c89f5e9..4b31c77 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -93,7 +93,7 @@ static int rcudata_open(struct inode *inode, struct file *file)
 	return single_open(file, show_rcudata, NULL);
 }
 
-static struct file_operations rcudata_fops = {
+static const struct file_operations rcudata_fops = {
 	.owner = THIS_MODULE,
 	.open = rcudata_open,
 	.read = seq_read,
@@ -145,7 +145,7 @@ static int rcudata_csv_open(struct inode *inode, struct file *file)
 	return single_open(file, show_rcudata_csv, NULL);
 }
 
-static struct file_operations rcudata_csv_fops = {
+static const struct file_operations rcudata_csv_fops = {
 	.owner = THIS_MODULE,
 	.open = rcudata_csv_open,
 	.read = seq_read,
@@ -159,13 +159,13 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 	struct rcu_node *rnp;
 
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
-	              "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu\n",
+		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
 		   rsp->completed, rsp->gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
 		   rsp->n_force_qs - rsp->n_force_qs_ngp,
-		   rsp->n_force_qs_lh);
+		   rsp->n_force_qs_lh, rsp->orphan_qlen);
 	for (rnp = &rsp->node[0]; rnp - &rsp->node[0] < NUM_RCU_NODES; rnp++) {
 		if (rnp->level != level) {
 			seq_puts(m, "\n");
@@ -196,7 +196,7 @@ static int rcuhier_open(struct inode *inode, struct file *file)
 	return single_open(file, show_rcuhier, NULL);
 }
 
-static struct file_operations rcuhier_fops = {
+static const struct file_operations rcuhier_fops = {
 	.owner = THIS_MODULE,
 	.open = rcuhier_open,
 	.read = seq_read,
@@ -222,7 +222,7 @@ static int rcugp_open(struct inode *inode, struct file *file)
 	return single_open(file, show_rcugp, NULL);
 }
 
-static struct file_operations rcugp_fops = {
+static const struct file_operations rcugp_fops = {
 	.owner = THIS_MODULE,
 	.open = rcugp_open,
 	.read = seq_read,
@@ -276,7 +276,7 @@ static int rcu_pending_open(struct inode *inode, struct file *file)
 	return single_open(file, show_rcu_pending, NULL);
 }
 
-static struct file_operations rcu_pending_fops = {
+static const struct file_operations rcu_pending_fops = {
 	.owner = THIS_MODULE,
 	.open = rcu_pending_open,
 	.read = seq_read,

^ permalink raw reply	[flat|nested] 97+ messages in thread
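
The comment in _rcu_barrier() in the patch above explains why
rcu_barrier_cpu_count is set to 1 before rcu_barrier_func() runs on each
CPU: without that extra reference, the first CPU's callback could drop
the count to zero before the other CPUs have incremented it. The
userspace model below is a minimal sketch of that counting argument
only. It is not kernel code: struct barrier_model, simulate_barrier()
and the worst-case assumption that each callback runs immediately after
it is posted are all hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model state; these are not kernel symbols. */
struct barrier_model {
	atomic_int count;	/* stands in for rcu_barrier_cpu_count */
	int registered;		/* CPUs that have posted a callback so far */
	int completed_at;	/* 'registered' when completion fired, else -1 */
};

/* Models atomic_dec_and_test() followed by complete(). */
static void model_dec_and_maybe_complete(struct barrier_model *b)
{
	/* atomic_fetch_sub() returns the old value; 1 means we reached 0. */
	if (atomic_fetch_sub(&b->count, 1) == 1 && b->completed_at < 0)
		b->completed_at = b->registered;
}

/*
 * Returns how many CPUs had posted their callback at the moment the
 * completion fired.  A correct barrier returns ncpus.
 */
int simulate_barrier(int ncpus, int initial_count)
{
	struct barrier_model b;
	int cpu;

	atomic_init(&b.count, initial_count);
	b.registered = 0;
	b.completed_at = -1;

	for (cpu = 0; cpu < ncpus; cpu++) {
		/* rcu_barrier_func(): increment, then post the callback. */
		atomic_fetch_add(&b.count, 1);
		b.registered++;
		/* Assumed worst case: the callback is invoked right away. */
		model_dec_and_maybe_complete(&b);
	}

	/* The initiator drops its own reference only after all CPUs posted. */
	if (initial_count > 0)
		model_dec_and_maybe_complete(&b);

	return b.completed_at;
}
```

Under these assumptions, simulate_barrier(4, 1) reports completion only
after all four CPUs have posted, whereas simulate_barrier(4, 0) reports
completion after the first CPU, which is the early return the kernel
comment warns about.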

* [GIT PULL] core kernel fixes
@ 2009-09-21 13:13 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-09-21 13:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Paul E. McKenney, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

RCU fixes.

 Thanks,

	Ingo

------------------>
Josh Triplett (1):
      rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state

Paul E. McKenney (10):
      rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable
      rcu: Initialize multi-level RCU grace periods holding locks
      rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
      rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
      rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
      rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
      rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
      rcu: Apply results of code inspection of kernel/rcutree_plugin.h
      rcu: Fix thinko, actually initialize full tree
      rcu: Fix whitespace inconsistencies


 include/linux/rculist_nulls.h |    2 +-
 include/linux/rcupdate.h      |   29 +++--------
 include/linux/rcutree.h       |    6 +-
 include/linux/sched.h         |    1 -
 init/Kconfig                  |    3 +-
 kernel/rcupdate.c             |   48 +++++++++++++++++-
 kernel/rcutorture.c           |   43 +++++++++-------
 kernel/rcutree.c              |  105 ++++++++++++++-------------------------
 kernel/rcutree.h              |    2 +-
 kernel/rcutree_plugin.h       |  110 +++++++++++++++++++++++++++--------------
 kernel/rcutree_trace.c        |    2 +-
 11 files changed, 195 insertions(+), 156 deletions(-)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index f9ddd03..589a409 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -102,7 +102,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
  */
 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
 	for (pos = rcu_dereference((head)->first);			 \
-		(!is_a_nulls(pos)) && 			\
+		(!is_a_nulls(pos)) &&			\
 		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
 		pos = rcu_dereference(pos->next))
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 95e0615..6fe0363 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1,5 +1,5 @@
 /*
- * Read-Copy Update mechanism for mutual exclusion 
+ * Read-Copy Update mechanism for mutual exclusion
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -18,7 +18,7 @@
  * Copyright IBM Corporation, 2001
  *
  * Author: Dipankar Sarma <dipankar@in.ibm.com>
- * 
+ *
  * Based on the original work by Paul McKenney <paulmck@us.ibm.com>
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
@@ -26,7 +26,7 @@
  * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
  *
  * For detailed explanation of Read-Copy Update mechanism see -
- * 		http://lse.sourceforge.net/locking/rcupdate.html
+ *		http://lse.sourceforge.net/locking/rcupdate.html
  *
  */
 
@@ -52,8 +52,13 @@ struct rcu_head {
 };
 
 /* Exported common interfaces */
+#ifdef CONFIG_TREE_PREEMPT_RCU
 extern void synchronize_rcu(void);
+#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+#define synchronize_rcu synchronize_sched
+#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
 extern void synchronize_rcu_bh(void);
+extern void synchronize_sched(void);
 extern void rcu_barrier(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
@@ -262,24 +267,6 @@ struct rcu_synchronize {
 extern void wakeme_after_rcu(struct rcu_head  *head);
 
 /**
- * synchronize_sched - block until all CPUs have exited any non-preemptive
- * kernel code sequences.
- *
- * This means that all preempt_disable code sequences, including NMI and
- * hardware-interrupt handlers, in progress on entry will have completed
- * before this primitive returns.  However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels, these
- * handlers can run in process context, and can block.
- *
- * This primitive provides the guarantees made by the (now removed)
- * synchronize_kernel() API.  In contrast, synchronize_rcu() only
- * guarantees that rcu_read_lock() sections will have completed.
- * In "classic RCU", these two guarantees happen to be one and
- * the same, but can differ in realtime RCU implementations.
- */
-#define synchronize_sched() __synchronize_sched()
-
-/**
  * call_rcu - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
  * @func: actual update function to be invoked after the grace period
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index a893077..3768277 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -24,7 +24,7 @@
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  *
  * For detailed explanation of Read-Copy Update mechanism see -
- * 	Documentation/RCU
+ *	Documentation/RCU
  */
 
 #ifndef __LINUX_RCUTREE_H
@@ -53,6 +53,8 @@ static inline void __rcu_read_unlock(void)
 	preempt_enable();
 }
 
+#define __synchronize_sched() synchronize_rcu()
+
 static inline void exit_rcu(void)
 {
 }
@@ -68,8 +70,6 @@ static inline void __rcu_read_unlock_bh(void)
 	local_bh_enable();
 }
 
-#define __synchronize_sched() synchronize_rcu()
-
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3d74bd..c62a9f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1740,7 +1740,6 @@ extern cputime_t task_gtime(struct task_struct *p);
 
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
 #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
-#define RCU_READ_UNLOCK_GOT_QS  (1 << 2) /* CPU has responded to RCU core. */
 
 static inline void rcu_copy_process(struct task_struct *p)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 8e8b76d..4c2c936 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,7 +331,8 @@ config TREE_PREEMPT_RCU
 	  This option selects the RCU implementation that is
 	  designed for very large SMP systems with hundreds or
 	  thousands of CPUs, but for which real-time response
-	  is also required.
+	  is also required.  It also scales down nicely to
+	  smaller systems.
 
 endchoice
 
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index bd5d5c8..37ac454 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -19,7 +19,7 @@
  *
  * Authors: Dipankar Sarma <dipankar@in.ibm.com>
  *	    Manfred Spraul <manfred@colorfullife.com>
- * 
+ *
  * Based on the original work by Paul McKenney <paulmck@us.ibm.com>
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
@@ -27,7 +27,7 @@
  * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
  *
  * For detailed explanation of Read-Copy Update mechanism see -
- * 		http://lse.sourceforge.net/locking/rcupdate.html
+ *		http://lse.sourceforge.net/locking/rcupdate.html
  *
  */
 #include <linux/types.h>
@@ -74,6 +74,8 @@ void wakeme_after_rcu(struct rcu_head  *head)
 	complete(&rcu->completion);
 }
 
+#ifdef CONFIG_TREE_PREEMPT_RCU
+
 /**
  * synchronize_rcu - wait until a grace period has elapsed.
  *
@@ -87,7 +89,7 @@ void synchronize_rcu(void)
 {
 	struct rcu_synchronize rcu;
 
-	if (rcu_blocking_is_gp())
+	if (!rcu_scheduler_active)
 		return;
 
 	init_completion(&rcu.completion);
@@ -98,6 +100,46 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+/**
+ * synchronize_sched - wait until an rcu-sched grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full rcu-sched
+ * grace period has elapsed, in other words after all currently executing
+ * rcu-sched read-side critical sections have completed.   These read-side
+ * critical sections are delimited by rcu_read_lock_sched() and
+ * rcu_read_unlock_sched(), and may be nested.  Note that preempt_disable(),
+ * local_irq_disable(), and so on may be used in place of
+ * rcu_read_lock_sched().
+ *
+ * This means that all preempt_disable code sequences, including NMI and
+ * hardware-interrupt handlers, in progress on entry will have completed
+ * before this primitive returns.  However, this does not guarantee that
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
+ *
+ * This primitive provides the guarantees made by the (now removed)
+ * synchronize_kernel() API.  In contrast, synchronize_rcu() only
+ * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
+ */
+void synchronize_sched(void)
+{
+	struct rcu_synchronize rcu;
+
+	if (rcu_blocking_is_gp())
+		return;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu_sched(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
 /**
  * synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
  *
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index b33db53..233768f 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -18,7 +18,7 @@
  * Copyright (C) IBM Corporation, 2005, 2006
  *
  * Authors: Paul E. McKenney <paulmck@us.ibm.com>
- *          Josh Triplett <josh@freedesktop.org>
+ *	  Josh Triplett <josh@freedesktop.org>
  *
  * See also:  Documentation/RCU/torture.txt
  */
@@ -50,7 +50,7 @@
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and "
-              "Josh Triplett <josh@freedesktop.org>");
+	      "Josh Triplett <josh@freedesktop.org>");
 
 static int nreaders = -1;	/* # reader threads, defaults to 2*ncpus */
 static int nfakewriters = 4;	/* # fake writer threads */
@@ -110,8 +110,8 @@ struct rcu_torture {
 };
 
 static LIST_HEAD(rcu_torture_freelist);
-static struct rcu_torture *rcu_torture_current = NULL;
-static long rcu_torture_current_version = 0;
+static struct rcu_torture *rcu_torture_current;
+static long rcu_torture_current_version;
 static struct rcu_torture rcu_tortures[10 * RCU_TORTURE_PIPE_LEN];
 static DEFINE_SPINLOCK(rcu_torture_lock);
 static DEFINE_PER_CPU(long [RCU_TORTURE_PIPE_LEN + 1], rcu_torture_count) =
@@ -124,11 +124,11 @@ static atomic_t n_rcu_torture_alloc_fail;
 static atomic_t n_rcu_torture_free;
 static atomic_t n_rcu_torture_mberror;
 static atomic_t n_rcu_torture_error;
-static long n_rcu_torture_timers = 0;
+static long n_rcu_torture_timers;
 static struct list_head rcu_torture_removed;
 static cpumask_var_t shuffle_tmp_mask;
 
-static int stutter_pause_test = 0;
+static int stutter_pause_test;
 
 #if defined(MODULE) || defined(CONFIG_RCU_TORTURE_TEST_RUNNABLE)
 #define RCUTORTURE_RUNNABLE_INIT 1
@@ -267,7 +267,8 @@ struct rcu_torture_ops {
 	int irq_capable;
 	char *name;
 };
-static struct rcu_torture_ops *cur_ops = NULL;
+
+static struct rcu_torture_ops *cur_ops;
 
 /*
  * Definitions for rcu torture testing.
@@ -281,14 +282,17 @@ static int rcu_torture_read_lock(void) __acquires(RCU)
 
 static void rcu_read_delay(struct rcu_random_state *rrsp)
 {
-	long delay;
-	const long longdelay = 200;
+	const unsigned long shortdelay_us = 200;
+	const unsigned long longdelay_ms = 50;
 
-	/* We want there to be long-running readers, but not all the time. */
+	/* We want a short delay sometimes to make a reader delay the grace
+	 * period, and we want a long delay occasionally to trigger
+	 * force_quiescent_state. */
 
-	delay = rcu_random(rrsp) % (nrealreaders * 2 * longdelay);
-	if (!delay)
-		udelay(longdelay);
+	if (!(rcu_random(rrsp) % (nrealreaders * 2000 * longdelay_ms)))
+		mdelay(longdelay_ms);
+	if (!(rcu_random(rrsp) % (nrealreaders * 2 * shortdelay_us)))
+		udelay(shortdelay_us);
 }
 
 static void rcu_torture_read_unlock(int idx) __releases(RCU)
@@ -339,8 +343,8 @@ static struct rcu_torture_ops rcu_ops = {
 	.sync		= synchronize_rcu,
 	.cb_barrier	= rcu_barrier,
 	.stats		= NULL,
-	.irq_capable 	= 1,
-	.name 		= "rcu"
+	.irq_capable	= 1,
+	.name		= "rcu"
 };
 
 static void rcu_sync_torture_deferred_free(struct rcu_torture *p)
@@ -638,7 +642,8 @@ rcu_torture_writer(void *arg)
 
 	do {
 		schedule_timeout_uninterruptible(1);
-		if ((rp = rcu_torture_alloc()) == NULL)
+		rp = rcu_torture_alloc();
+		if (rp == NULL)
 			continue;
 		rp->rtort_pipe_count = 0;
 		udelay(rcu_random(&rand) & 0x3ff);
@@ -1110,7 +1115,7 @@ rcu_torture_init(void)
 		printk(KERN_ALERT "rcutorture: invalid torture type: \"%s\"\n",
 		       torture_type);
 		mutex_unlock(&fullstop_mutex);
-		return (-EINVAL);
+		return -EINVAL;
 	}
 	if (cur_ops->init)
 		cur_ops->init(); /* no "goto unwind" prior to this point!!! */
@@ -1161,7 +1166,7 @@ rcu_torture_init(void)
 		goto unwind;
 	}
 	fakewriter_tasks = kzalloc(nfakewriters * sizeof(fakewriter_tasks[0]),
-	                           GFP_KERNEL);
+				   GFP_KERNEL);
 	if (fakewriter_tasks == NULL) {
 		VERBOSE_PRINTK_ERRSTRING("out of memory");
 		firsterr = -ENOMEM;
@@ -1170,7 +1175,7 @@ rcu_torture_init(void)
 	for (i = 0; i < nfakewriters; i++) {
 		VERBOSE_PRINTK_STRING("Creating rcu_torture_fakewriter task");
 		fakewriter_tasks[i] = kthread_run(rcu_torture_fakewriter, NULL,
-		                                  "rcu_torture_fakewriter");
+						  "rcu_torture_fakewriter");
 		if (IS_ERR(fakewriter_tasks[i])) {
 			firsterr = PTR_ERR(fakewriter_tasks[i]);
 			VERBOSE_PRINTK_ERRSTRING("Failed to create fakewriter");
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 6b11b07..52b06f6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -25,7 +25,7 @@
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  *
  * For detailed explanation of Read-Copy Update mechanism see -
- * 	Documentation/RCU
+ *	Documentation/RCU
  */
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -107,27 +107,23 @@ static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
  */
 void rcu_sched_qs(int cpu)
 {
-	unsigned long flags;
 	struct rcu_data *rdp;
 
-	local_irq_save(flags);
 	rdp = &per_cpu(rcu_sched_data, cpu);
-	rdp->passed_quiesc = 1;
 	rdp->passed_quiesc_completed = rdp->completed;
-	rcu_preempt_qs(cpu);
-	local_irq_restore(flags);
+	barrier();
+	rdp->passed_quiesc = 1;
+	rcu_preempt_note_context_switch(cpu);
 }
 
 void rcu_bh_qs(int cpu)
 {
-	unsigned long flags;
 	struct rcu_data *rdp;
 
-	local_irq_save(flags);
 	rdp = &per_cpu(rcu_bh_data, cpu);
-	rdp->passed_quiesc = 1;
 	rdp->passed_quiesc_completed = rdp->completed;
-	local_irq_restore(flags);
+	barrier();
+	rdp->passed_quiesc = 1;
 }
 
 #ifdef CONFIG_NO_HZ
@@ -605,8 +601,6 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 {
 	struct rcu_data *rdp = rsp->rda[smp_processor_id()];
 	struct rcu_node *rnp = rcu_get_root(rsp);
-	struct rcu_node *rnp_cur;
-	struct rcu_node *rnp_end;
 
 	if (!cpu_needs_another_gp(rsp, rdp)) {
 		spin_unlock_irqrestore(&rnp->lock, flags);
@@ -615,6 +609,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 
 	/* Advance to a new grace period and initialize state. */
 	rsp->gpnum++;
+	WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
 	rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
 	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
 	record_gp_stall_check_time(rsp);
@@ -631,7 +626,9 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 
 	/* Special-case the common single-level case. */
 	if (NUM_RCU_NODES == 1) {
+		rcu_preempt_check_blocked_tasks(rnp);
 		rnp->qsmask = rnp->qsmaskinit;
+		rnp->gpnum = rsp->gpnum;
 		rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
 		spin_unlock_irqrestore(&rnp->lock, flags);
 		return;
@@ -644,42 +641,28 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	spin_lock(&rsp->onofflock);  /* irqs already disabled. */
 
 	/*
-	 * Set the quiescent-state-needed bits in all the non-leaf RCU
-	 * nodes for all currently online CPUs.  This operation relies
-	 * on the layout of the hierarchy within the rsp->node[] array.
-	 * Note that other CPUs will access only the leaves of the
-	 * hierarchy, which still indicate that no grace period is in
-	 * progress.  In addition, we have excluded CPU-hotplug operations.
-	 *
-	 * We therefore do not need to hold any locks.  Any required
-	 * memory barriers will be supplied by the locks guarding the
-	 * leaf rcu_nodes in the hierarchy.
-	 */
-
-	rnp_end = rsp->level[NUM_RCU_LVLS - 1];
-	for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++)
-		rnp_cur->qsmask = rnp_cur->qsmaskinit;
-
-	/*
-	 * Now set up the leaf nodes.  Here we must be careful.  First,
-	 * we need to hold the lock in order to exclude other CPUs, which
-	 * might be contending for the leaf nodes' locks.  Second, as
-	 * soon as we initialize a given leaf node, its CPUs might run
-	 * up the rest of the hierarchy.  We must therefore acquire locks
-	 * for each node that we touch during this stage.  (But we still
-	 * are excluding CPU-hotplug operations.)
+	 * Set the quiescent-state-needed bits in all the rcu_node
+	 * structures for all currently online CPUs in breadth-first
+	 * order, starting from the root rcu_node structure.  This
+	 * operation relies on the layout of the hierarchy within the
+	 * rsp->node[] array.  Note that other CPUs will access only
+	 * the leaves of the hierarchy, which still indicate that no
+	 * grace period is in progress, at least until the corresponding
+	 * leaf node has been initialized.  In addition, we have excluded
+	 * CPU-hotplug operations.
 	 *
 	 * Note that the grace period cannot complete until we finish
 	 * the initialization process, as there will be at least one
 	 * qsmask bit set in the root node until that time, namely the
-	 * one corresponding to this CPU.
+	 * one corresponding to this CPU, due to the fact that we have
+	 * irqs disabled.
 	 */
-	rnp_end = &rsp->node[NUM_RCU_NODES];
-	rnp_cur = rsp->level[NUM_RCU_LVLS - 1];
-	for (; rnp_cur < rnp_end; rnp_cur++) {
-		spin_lock(&rnp_cur->lock);	/* irqs already disabled. */
-		rnp_cur->qsmask = rnp_cur->qsmaskinit;
-		spin_unlock(&rnp_cur->lock);	/* irqs already disabled. */
+	for (rnp = &rsp->node[0]; rnp < &rsp->node[NUM_RCU_NODES]; rnp++) {
+		spin_lock(&rnp->lock);	/* irqs already disabled. */
+		rcu_preempt_check_blocked_tasks(rnp);
+		rnp->qsmask = rnp->qsmaskinit;
+		rnp->gpnum = rsp->gpnum;
+		spin_unlock(&rnp->lock);	/* irqs already disabled. */
 	}
 
 	rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
@@ -722,6 +705,7 @@ rcu_process_gp_end(struct rcu_state *rsp, struct rcu_data *rdp)
 static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags)
 	__releases(rnp->lock)
 {
+	WARN_ON_ONCE(rsp->completed == rsp->gpnum);
 	rsp->completed = rsp->gpnum;
 	rcu_process_gp_end(rsp, rsp->rda[smp_processor_id()]);
 	rcu_start_gp(rsp, flags);  /* releases root node's rnp->lock. */
@@ -739,6 +723,8 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
 	      unsigned long flags)
 	__releases(rnp->lock)
 {
+	struct rcu_node *rnp_c;
+
 	/* Walk up the rcu_node hierarchy. */
 	for (;;) {
 		if (!(rnp->qsmask & mask)) {
@@ -762,8 +748,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
 			break;
 		}
 		spin_unlock_irqrestore(&rnp->lock, flags);
+		rnp_c = rnp;
 		rnp = rnp->parent;
 		spin_lock_irqsave(&rnp->lock, flags);
+		WARN_ON_ONCE(rnp_c->qsmask);
 	}
 
 	/*
@@ -776,10 +764,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
 
 /*
  * Record a quiescent state for the specified CPU, which must either be
- * the current CPU or an offline CPU.  The lastcomp argument is used to
- * make sure we are still in the grace period of interest.  We don't want
- * to end the current grace period based on quiescent states detected in
- * an earlier grace period!
+ * the current CPU.  The lastcomp argument is used to make sure we are
+ * still in the grace period of interest.  We don't want to end the current
+ * grace period based on quiescent states detected in an earlier grace
+ * period!
  */
 static void
 cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
@@ -814,7 +802,6 @@ cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
 		 * This GP can't end until cpu checks in, so all of our
 		 * callbacks can be processed during the next GP.
 		 */
-		rdp = rsp->rda[smp_processor_id()];
 		rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
 
 		cpu_quiet_msk(mask, rsp, rnp, flags); /* releases rnp->lock */
@@ -872,7 +859,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 	spin_lock_irqsave(&rsp->onofflock, flags);
 
 	/* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */
-	rnp = rdp->mynode;
+	rnp = rdp->mynode;	/* this is the outgoing CPU's rnp. */
 	mask = rdp->grpmask;	/* rnp->grplo is constant. */
 	do {
 		spin_lock(&rnp->lock);		/* irqs already disabled. */
@@ -881,7 +868,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 			spin_unlock(&rnp->lock); /* irqs remain disabled. */
 			break;
 		}
-		rcu_preempt_offline_tasks(rsp, rnp);
+		rcu_preempt_offline_tasks(rsp, rnp, rdp);
 		mask = rnp->grpmask;
 		spin_unlock(&rnp->lock);	/* irqs remain disabled. */
 		rnp = rnp->parent;
@@ -890,9 +877,6 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 
 	spin_unlock(&rsp->onofflock);		/* irqs remain disabled. */
 
-	/* Being offline is a quiescent state, so go record it. */
-	cpu_quiet(cpu, rsp, rdp, lastcomp);
-
 	/*
 	 * Move callbacks from the outgoing CPU to the running CPU.
 	 * Note that the outgoing CPU is now quiscent, so it is now
@@ -1457,20 +1441,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
 		rnp = rnp->parent;
 	} while (rnp != NULL && !(rnp->qsmaskinit & mask));
 
-	spin_unlock(&rsp->onofflock);		/* irqs remain disabled. */
-
-	/*
-	 * A new grace period might start here.  If so, we will be part of
-	 * it, and its gpnum will be greater than ours, so we will
-	 * participate.  It is also possible for the gpnum to have been
-	 * incremented before this function was called, and the bitmasks
-	 * to not be filled out until now, in which case we will also
-	 * participate due to our gpnum being behind.
-	 */
-
-	/* Since it is coming online, the CPU is in a quiescent state. */
-	cpu_quiet(cpu, rsp, rdp, lastcomp);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&rsp->onofflock, flags);
 }
 
 static void __cpuinit rcu_online_cpu(int cpu)
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index bf8a6f9..8e8287a 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -142,7 +142,7 @@ struct rcu_data {
 	 */
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
-	long		qlen; 	 	/* # of queued callbacks */
+	long		qlen;		/* # of queued callbacks */
 	long		blimit;		/* Upper limit on a processed batch */
 
 #ifdef CONFIG_NO_HZ
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4778936..1cee04f 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -64,22 +64,31 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
  * not in a quiescent state.  There might be any number of tasks blocked
  * while in an RCU read-side critical section.
  */
-static void rcu_preempt_qs_record(int cpu)
+static void rcu_preempt_qs(int cpu)
 {
 	struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
-	rdp->passed_quiesc = 1;
 	rdp->passed_quiesc_completed = rdp->completed;
+	barrier();
+	rdp->passed_quiesc = 1;
 }
 
 /*
- * We have entered the scheduler or are between softirqs in ksoftirqd.
- * If we are in an RCU read-side critical section, we need to reflect
- * that in the state of the rcu_node structure corresponding to this CPU.
- * Caller must disable hardirqs.
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from.  If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the appropriate entry
+ * of the blocked_tasks[] array.  The task will dequeue itself when
+ * it exits the outermost enclosing RCU read-side critical section.
+ * Therefore, the current grace period cannot be permitted to complete
+ * until the blocked_tasks[] entry indexed by the low-order bit of
+ * rnp->gpnum empties.
+ *
+ * Caller must disable preemption.
  */
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
 {
 	struct task_struct *t = current;
+	unsigned long flags;
 	int phase;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
@@ -90,7 +99,7 @@ static void rcu_preempt_qs(int cpu)
 		/* Possibly blocking in an RCU read-side critical section. */
 		rdp = rcu_preempt_state.rda[cpu];
 		rnp = rdp->mynode;
-		spin_lock(&rnp->lock);
+		spin_lock_irqsave(&rnp->lock, flags);
 		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
 		t->rcu_blocked_node = rnp;
 
@@ -103,11 +112,15 @@ static void rcu_preempt_qs(int cpu)
 		 * state for the current grace period), then as long
 		 * as that task remains queued, the current grace period
 		 * cannot end.
+		 *
+		 * But first, note that the current CPU must still be
+		 * on line!
 		 */
-		phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
+		WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
+		WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
+		phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
 		list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
-		smp_mb();  /* Ensure later ctxt swtch seen after above. */
-		spin_unlock(&rnp->lock);
+		spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 
 	/*
@@ -119,9 +132,10 @@ static void rcu_preempt_qs(int cpu)
 	 * grace period, then the fact that the task has been enqueued
 	 * means that we continue to block the current grace period.
 	 */
-	rcu_preempt_qs_record(cpu);
-	t->rcu_read_unlock_special &= ~(RCU_READ_UNLOCK_NEED_QS |
-					RCU_READ_UNLOCK_GOT_QS);
+	rcu_preempt_qs(cpu);
+	local_irq_save(flags);
+	t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+	local_irq_restore(flags);
 }
 
 /*
@@ -157,7 +171,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	special = t->rcu_read_unlock_special;
 	if (special & RCU_READ_UNLOCK_NEED_QS) {
 		t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
-		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_GOT_QS;
+		rcu_preempt_qs(smp_processor_id());
 	}
 
 	/* Hardware IRQ handlers cannot block. */
@@ -177,10 +191,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		 */
 		for (;;) {
 			rnp = t->rcu_blocked_node;
-			spin_lock(&rnp->lock);
+			spin_lock(&rnp->lock);  /* irqs already disabled. */
 			if (rnp == t->rcu_blocked_node)
 				break;
-			spin_unlock(&rnp->lock);
+			spin_unlock(&rnp->lock);  /* irqs remain disabled. */
 		}
 		empty = list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
 		list_del_init(&t->rcu_node_entry);
@@ -194,9 +208,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		 */
 		if (!empty && rnp->qsmask == 0 &&
 		    list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
-			t->rcu_read_unlock_special &=
-				~(RCU_READ_UNLOCK_NEED_QS |
-				  RCU_READ_UNLOCK_GOT_QS);
+			struct rcu_node *rnp_p;
+
 			if (rnp->parent == NULL) {
 				/* Only one rcu_node in the tree. */
 				cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -205,9 +218,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			/* Report up the rest of the hierarchy. */
 			mask = rnp->grpmask;
 			spin_unlock_irqrestore(&rnp->lock, flags);
-			rnp = rnp->parent;
-			spin_lock_irqsave(&rnp->lock, flags);
-			cpu_quiet_msk(mask, &rcu_preempt_state, rnp, flags);
+			rnp_p = rnp->parent;
+			spin_lock_irqsave(&rnp_p->lock, flags);
+			WARN_ON_ONCE(rnp->qsmask);
+			cpu_quiet_msk(mask, &rcu_preempt_state, rnp_p, flags);
 			return;
 		}
 		spin_unlock(&rnp->lock);
@@ -259,6 +273,19 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 
 /*
+ * Check that the list of blocked tasks for the newly completed grace
+ * period is in fact empty.  It is a serious bug to complete a grace
+ * period that still has RCU readers blocked!  This function must be
+ * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * must be held by the caller.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+	WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+	WARN_ON_ONCE(rnp->qsmask);
+}
+
+/*
  * Check for preempted RCU readers for the specified rcu_node structure.
  * If the caller needs a reliable answer, it must hold the rcu_node's
  * >lock.
@@ -280,7 +307,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
  * The caller must hold rnp->lock with irqs disabled.
  */
 static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
-				      struct rcu_node *rnp)
+				      struct rcu_node *rnp,
+				      struct rcu_data *rdp)
 {
 	int i;
 	struct list_head *lp;
@@ -292,6 +320,9 @@ static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
 		WARN_ONCE(1, "Last CPU thought to be offlined?");
 		return;  /* Shouldn't happen: at least one CPU online. */
 	}
+	WARN_ON_ONCE(rnp != rdp->mynode &&
+		     (!list_empty(&rnp->blocked_tasks[0]) ||
+		      !list_empty(&rnp->blocked_tasks[1])));
 
 	/*
 	 * Move tasks up to root rcu_node.  Rely on the fact that the
@@ -335,20 +366,12 @@ static void rcu_preempt_check_callbacks(int cpu)
 	struct task_struct *t = current;
 
 	if (t->rcu_read_lock_nesting == 0) {
-		t->rcu_read_unlock_special &=
-			~(RCU_READ_UNLOCK_NEED_QS | RCU_READ_UNLOCK_GOT_QS);
-		rcu_preempt_qs_record(cpu);
+		t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+		rcu_preempt_qs(cpu);
 		return;
 	}
-	if (per_cpu(rcu_preempt_data, cpu).qs_pending) {
-		if (t->rcu_read_unlock_special & RCU_READ_UNLOCK_GOT_QS) {
-			rcu_preempt_qs_record(cpu);
-			t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_GOT_QS;
-		} else if (!(t->rcu_read_unlock_special &
-			     RCU_READ_UNLOCK_NEED_QS)) {
-			t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
-		}
-	}
+	if (per_cpu(rcu_preempt_data, cpu).qs_pending)
+		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
 }
 
 /*
@@ -434,7 +457,7 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
  * Because preemptable RCU does not exist, we never have to check for
  * CPUs being in quiescent states.
  */
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
 {
 }
 
@@ -451,6 +474,16 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 
 /*
+ * Because there is no preemptable RCU, there can be no readers blocked,
+ * so there is no need to check for blocked tasks.  So check only for
+ * bogus qsmask values.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+	WARN_ON_ONCE(rnp->qsmask);
+}
+
+/*
  * Because preemptable RCU does not exist, there are never any preempted
  * RCU readers.
  */
@@ -466,7 +499,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
  * tasks that were blocked within RCU read-side critical sections.
  */
 static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
-				      struct rcu_node *rnp)
+				      struct rcu_node *rnp,
+				      struct rcu_data *rdp)
 {
 }
 
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 0ea1bff..c89f5e9 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -20,7 +20,7 @@
  * Papers:  http://www.rdrop.com/users/paulmck/RCU
  *
  * For detailed explanation of Read-Copy Update mechanism see -
- * 		Documentation/RCU
+ *		Documentation/RCU
  *
  */
 #include <linux/types.h>
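
The rcutree_plugin.h hunk above rewrites the expression that chooses which blocked_tasks[] list a preempted reader is queued on. The following is a minimal user-space sketch, not kernel code, that places the removed and the added expression side by side. The names demo_node, phase_old and phase_new are hypothetical stand-ins, and the sketch assumes that only the gpnum and qsmask fields matter for the choice.

```c
#include <assert.h>

/* Hypothetical stand-in for the two rcu_node fields the expression reads. */
struct demo_node {
	unsigned long gpnum;	/* current grace-period number */
	unsigned long qsmask;	/* CPUs that still owe a quiescent state */
};

/* Expression removed by the patch. */
static int phase_old(const struct demo_node *rnp, unsigned long grpmask)
{
	assert(rnp != 0);
	return !(rnp->qsmask & grpmask) ^ (rnp->gpnum & 0x1);
}

/* Expression added by the patch. */
static int phase_new(const struct demo_node *rnp, unsigned long grpmask)
{
	assert(rnp != 0);
	return (rnp->gpnum + !(rnp->qsmask & grpmask)) & 0x1;
}

/*
 * Nonzero when a task queued now would land on the list indexed by the
 * low-order bit of gpnum, i.e. the list that holds up the current
 * grace period.
 */
static int blocks_current_gp(const struct demo_node *rnp, unsigned long grpmask)
{
	return phase_new(rnp, grpmask) == (int)(rnp->gpnum & 0x1);
}
```

In this sketch both expressions pick the same list for every input, so the rewrite reads as a change of form rather than of behavior. When the CPU's bit is still set in qsmask the task goes on the list for the current grace period, otherwise on the list for the next one.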


* [GIT PULL] core kernel fixes
@ 2009-08-13 18:54 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-08-13 18:54 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Darren Hart (2):
      futex: Update futex_q lock_ptr on requeue proxy lock
      futex: Fix handling of bad requeue syscall pairing

Dinakar Guniguntala (1):
      futex: Fix compat_futex to be same as futex for REQUEUE_PI

Peter Zijlstra (1):
      locking, sched: Give waitqueue spinlocks their own lockdep classes


 include/linux/wait.h  |    9 ++++++++-
 kernel/futex.c        |   28 ++++++++++++++++++++++------
 kernel/futex_compat.c |    6 ++++--
 kernel/wait.c         |    5 +++--
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 6788e1a..cf3c2f5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -77,7 +77,14 @@ struct task_struct;
 #define __WAIT_BIT_KEY_INITIALIZER(word, bit)				\
 	{ .flags = word, .bit_nr = bit, }
 
-extern void init_waitqueue_head(wait_queue_head_t *q);
+extern void __init_waitqueue_head(wait_queue_head_t *q, struct lock_class_key *);
+
+#define init_waitqueue_head(q)				\
+	do {						\
+		static struct lock_class_key __key;	\
+							\
+		__init_waitqueue_head((q), &__key);	\
+	} while (0)
 
 #ifdef CONFIG_LOCKDEP
 # define __WAIT_QUEUE_HEAD_INIT_ONSTACK(name) \
diff --git a/kernel/futex.c b/kernel/futex.c
index 0672ff8..e18cfbd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1010,15 +1010,19 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
  * requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue
  * q:	the futex_q
  * key:	the key of the requeue target futex
+ * hb:  the hash_bucket of the requeue target futex
  *
  * During futex_requeue, with requeue_pi=1, it is possible to acquire the
  * target futex if it is uncontended or via a lock steal.  Set the futex_q key
  * to the requeue target futex so the waiter can detect the wakeup on the right
  * futex, but remove it from the hb and NULL the rt_waiter so it can detect
- * atomic lock acquisition.  Must be called with the q->lock_ptr held.
+ * atomic lock acquisition.  Set the q->lock_ptr to the requeue target hb->lock
+ * to protect access to the pi_state to fixup the owner later.  Must be called
+ * with both q->lock_ptr and hb->lock held.
  */
 static inline
-void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key)
+void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
+			   struct futex_hash_bucket *hb)
 {
 	drop_futex_key_refs(&q->key);
 	get_futex_key_refs(key);
@@ -1030,6 +1034,11 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key)
 	WARN_ON(!q->rt_waiter);
 	q->rt_waiter = NULL;
 
+	q->lock_ptr = &hb->lock;
+#ifdef CONFIG_DEBUG_PI_LIST
+	q->list.plist.lock = &hb->lock;
+#endif
+
 	wake_up_state(q->task, TASK_NORMAL);
 }
 
@@ -1088,7 +1097,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
 				   set_waiters);
 	if (ret == 1)
-		requeue_pi_wake_futex(top_waiter, key2);
+		requeue_pi_wake_futex(top_waiter, key2, hb2);
 
 	return ret;
 }
@@ -1247,8 +1256,15 @@ retry_private:
 		if (!match_futex(&this->key, &key1))
 			continue;
 
-		WARN_ON(!requeue_pi && this->rt_waiter);
-		WARN_ON(requeue_pi && !this->rt_waiter);
+		/*
+		 * FUTEX_WAIT_REQEUE_PI and FUTEX_CMP_REQUEUE_PI should always
+		 * be paired with each other and no other futex ops.
+		 */
+		if ((requeue_pi && !this->rt_waiter) ||
+		    (!requeue_pi && this->rt_waiter)) {
+			ret = -EINVAL;
+			break;
+		}
 
 		/*
 		 * Wake nr_wake waiters.  For requeue_pi, if we acquired the
@@ -1273,7 +1289,7 @@ retry_private:
 							this->task, 1);
 			if (ret == 1) {
 				/* We got the lock. */
-				requeue_pi_wake_futex(this, &key2);
+				requeue_pi_wake_futex(this, &key2, hb2);
 				continue;
 			} else if (ret) {
 				/* -EDEADLK */
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index d607a5b..2357165 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -180,7 +180,8 @@ asmlinkage long compat_sys_futex(u32 __user *uaddr, int op, u32 val,
 	int cmd = op & FUTEX_CMD_MASK;
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
-		      cmd == FUTEX_WAIT_BITSET)) {
+		      cmd == FUTEX_WAIT_BITSET ||
+		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
 		if (get_compat_timespec(&ts, utime))
 			return -EFAULT;
 		if (!timespec_valid(&ts))
@@ -191,7 +192,8 @@ asmlinkage long compat_sys_futex(u32 __user *uaddr, int op, u32 val,
 			t = ktime_add_safe(ktime_get(), t);
 		tp = &t;
 	}
-	if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE)
+	if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
+	    cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
 		val2 = (int) (unsigned long) utime;
 
 	return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
diff --git a/kernel/wait.c b/kernel/wait.c
index ea7c3b4..c4bd3d8 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -10,13 +10,14 @@
 #include <linux/wait.h>
 #include <linux/hash.h>
 
-void init_waitqueue_head(wait_queue_head_t *q)
+void __init_waitqueue_head(wait_queue_head_t *q, struct lock_class_key *key)
 {
 	spin_lock_init(&q->lock);
+	lockdep_set_class(&q->lock, key);
 	INIT_LIST_HEAD(&q->task_list);
 }
 
-EXPORT_SYMBOL(init_waitqueue_head);
+EXPORT_SYMBOL(__init_waitqueue_head);
 
 void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 {
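
The wait.h hunk above turns init_waitqueue_head() into a macro that declares a static struct lock_class_key at each expansion site. The following is a minimal user-space sketch, not kernel code, of why that gives every call site its own key. All demo_* names and the two init_site_* helpers are hypothetical, and a plain pointer stands in for the lockdep class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_class_key (only its address matters). */
struct demo_key {
	char unused;
};

/* Hypothetical stand-in for a wait queue head that records its lock class. */
struct demo_waitq {
	const struct demo_key *lock_class;
};

/* Plays the role of __init_waitqueue_head(): takes the key explicitly. */
static void demo_init_waitq_fn(struct demo_waitq *q, const struct demo_key *key)
{
	assert(q != NULL && key != NULL);
	q->lock_class = key;
}

/* Plays the role of the new macro: one static key per expansion site. */
#define demo_init_waitq(q)					\
	do {							\
		static struct demo_key site_key;		\
								\
		demo_init_waitq_fn((q), &site_key);		\
	} while (0)

/* Two separate call sites, hence two separate static keys. */
static void init_site_a(struct demo_waitq *q)
{
	demo_init_waitq(q);
}

static void init_site_b(struct demo_waitq *q)
{
	demo_init_waitq(q);
}
```

Queues initialized through the same call site share a key, while queues initialized from different call sites get different ones. That matches the stated goal of giving waitqueue spinlocks their own lockdep classes, as far as this simplified model goes.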


* Re: [GIT PULL] core kernel fixes
  2009-08-09 16:07 Ingo Molnar
@ 2009-08-09 18:41 ` Darren Hart
  0 siblings, 0 replies; 97+ messages in thread
From: Darren Hart @ 2009-08-09 18:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Thomas Gleixner, Linus Torvalds

Ingo Molnar wrote:
> Linus,
> 
> Please pull the latest core-fixes-for-linus git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus
> 
>  Thanks,
> 
> 	Ingo
> 
> ------------------>
> Darren Hart (2):
>       rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
>       futex: Update woken requeued futex_q lock_ptr

Ingo, this still has the older version of:

"futex: Update woken requeued futex_q lock_ptr"

Please update to the resend on Aug 7:


Update futex_q lock_ptr on requeue proxy lock (resend)

From: Darren Hart <dvhltc@us.ibm.com>

futex_requeue() can acquire the lock on behalf of a waiter early on or during
the requeue loop if it is uncontended or in the event of a lock steal or owner
died. On wakeup, the waiter (in futex_wait_requeue_pi()) cleans up the pi_state
owner using the lock_ptr to protect against concurrent access to the pi_state.
The pi_state is hung off futex_q's on the requeue target futex hash bucket so
the lock_ptr needs to be updated accordingly.

The problem manifested as the WARN_ON in lookup_pi_state() firing on
pid != pi_state->owner->pid.  With this patch, the pi_state is properly guarded
against concurrent access via the requeue target hb lock.

The astute reviewer may notice that there is a window of time between when
futex_requeue() unlocks the hb locks and when futex_wait_requeue_pi() will
acquire hb2->lock.  During this time the pi_state and uval are not in sync with
the underlying rtmutex owner (but the uval does indicate there are waiters, so
no atomic changes will occur in userspace).  However, this is not a problem.
Should a contending thread enter lookup_pi_state() and acquire hb2->lock before
the ownership is fixed up, it will find the pi_state hung off a waiter's
(possibly the pending owner's) futex_q and block on the rtmutex.  Once
futex_wait_requeue_pi() fixes up the owner, it will also move the pi_state from
the old owner's task->pi_state_list to its own.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
---

 kernel/futex.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)


diff --git a/kernel/futex.c b/kernel/futex.c
index abce822..df30983 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1059,15 +1059,19 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
  * requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue
  * q:	the futex_q
  * key:	the key of the requeue target futex
+ * hb:  the hash_bucket of the requeue target futex
  *
  * During futex_requeue, with requeue_pi=1, it is possible to acquire the
  * target futex if it is uncontended or via a lock steal.  Set the futex_q key
  * to the requeue target futex so the waiter can detect the wakeup on the right
  * futex, but remove it from the hb and NULL the rt_waiter so it can detect
- * atomic lock acquisition.  Must be called with the q->lock_ptr held.
+ * atomic lock acquisition.  Set the q->lock_ptr to the requeue target hb->lock
+ * to protect access to the pi_state to fixup the owner later.  Must be called
+ * with both q->lock_ptr and hb->lock held.
  */
 static inline
-void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key)
+void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
+			   struct futex_hash_bucket *hb)
 {
 	drop_futex_key_refs(&q->key);
 	get_futex_key_refs(key);
@@ -1079,6 +1083,11 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key)
 	WARN_ON(!q->rt_waiter);
 	q->rt_waiter = NULL;
 
+	q->lock_ptr = &hb->lock;
+#ifdef CONFIG_DEBUG_PI_LIST
+	q->list.plist.slock = &hb->lock;
+#endif
+
 	wake_up_state(q->task, TASK_NORMAL);
 }
 
@@ -1137,7 +1146,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
 				   set_waiters);
 	if (ret == 1)
-		requeue_pi_wake_futex(top_waiter, key2);
+		requeue_pi_wake_futex(top_waiter, key2, hb2);
 
 	return ret;
 }
@@ -1323,7 +1332,7 @@ retry_private:
 							this->task, 1);
 			if (ret == 1) {
 				/* We got the lock. */
-				requeue_pi_wake_futex(this, &key2);
+				requeue_pi_wake_futex(this, &key2, hb2);
 				continue;
 			} else if (ret) {
 				/* -EDEADLK */

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
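
The commit message above says that the woken waiter later takes q->lock_ptr to fix up the pi_state owner, so the pointer has to name the requeue target bucket's lock. The following is a minimal user-space sketch, not kernel code, of that retargeting. The demo_* types and functions are hypothetical simplifications; key handling, plist removal and the actual wakeup are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hash-bucket lock and the bucket itself. */
struct demo_lock {
	int id;
};

struct demo_bucket {
	struct demo_lock lock;
};

/* Hypothetical stand-in for the futex_q fields the patch touches. */
struct demo_q {
	struct demo_lock *lock_ptr;	/* lock the waiter takes on wakeup */
	int key;			/* which futex the waiter believes it is on */
	void *rt_waiter;		/* NULL signals atomic lock acquisition */
};

/*
 * Models the patched requeue_pi_wake_futex(): besides switching the key
 * and clearing rt_waiter, point lock_ptr at the target bucket's lock.
 */
static void demo_requeue_pi_wake(struct demo_q *q, int target_key,
				 struct demo_bucket *target_hb)
{
	assert(q != NULL && target_hb != NULL);
	q->key = target_key;
	q->rt_waiter = NULL;
	q->lock_ptr = &target_hb->lock;
}

/* The lock the woken waiter would take before fixing up the owner. */
static struct demo_lock *demo_fixup_lock(const struct demo_q *q)
{
	return q->lock_ptr;
}
```

Without the lock_ptr assignment, demo_fixup_lock() would still return the source bucket's lock. In the real code that would leave the pi_state on the target bucket unguarded, which is the situation the message describes.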


* [GIT PULL] core kernel fixes
@ 2009-08-09 16:07 Ingo Molnar
  2009-08-09 18:41 ` Darren Hart
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-08-09 16:07 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Darren Hart (2):
      rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
      futex: Update woken requeued futex_q lock_ptr

Li Zefan (2):
      lockdep: Fix file mode of lock_stat
      lockdep: Fix typos in documentation


 Documentation/lockdep-design.txt |    6 +++---
 kernel/futex.c                   |   13 +++++++++----
 kernel/lockdep_proc.c            |    3 ++-
 kernel/rtmutex.c                 |    4 +---
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index e20d913..abf768c 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -30,9 +30,9 @@ State
 The validator tracks lock-class usage history into 4n + 1 separate state bits:
 
 - 'ever held in STATE context'
-- 'ever head as readlock in STATE context'
-- 'ever head with STATE enabled'
-- 'ever head as readlock with STATE enabled'
+- 'ever held as readlock in STATE context'
+- 'ever held with STATE enabled'
+- 'ever held as readlock with STATE enabled'
 
 Where STATE can be either one of (kernel/lockdep_states.h)
  - hardirq
diff --git a/kernel/futex.c b/kernel/futex.c
index 0672ff8..57f5a80 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1010,19 +1010,24 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
  * requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue
  * q:	the futex_q
  * key:	the key of the requeue target futex
+ * hb:  the hash_bucket of the requeue target futex
  *
  * During futex_requeue, with requeue_pi=1, it is possible to acquire the
  * target futex if it is uncontended or via a lock steal.  Set the futex_q key
  * to the requeue target futex so the waiter can detect the wakeup on the right
  * futex, but remove it from the hb and NULL the rt_waiter so it can detect
- * atomic lock acquisition.  Must be called with the q->lock_ptr held.
+ * atomic lock acquisition.  Set the q->lock_ptr to the requeue target hb->lock
+ * to protect access to the pi_state to fixup the owner later.  Must be called
+ * with the q->lock_ptr held.
  */
 static inline
-void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key)
+void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
+			   struct futex_hash_bucket *hb)
 {
 	drop_futex_key_refs(&q->key);
 	get_futex_key_refs(key);
 	q->key = *key;
+	q->lock_ptr = &hb->lock;
 
 	WARN_ON(plist_node_empty(&q->list));
 	plist_del(&q->list, &q->list.plist);
@@ -1088,7 +1093,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
 				   set_waiters);
 	if (ret == 1)
-		requeue_pi_wake_futex(top_waiter, key2);
+		requeue_pi_wake_futex(top_waiter, key2, hb2);
 
 	return ret;
 }
@@ -1273,7 +1278,7 @@ retry_private:
 							this->task, 1);
 			if (ret == 1) {
 				/* We got the lock. */
-				requeue_pi_wake_futex(this, &key2);
+				requeue_pi_wake_futex(this, &key2, hb2);
 				continue;
 			} else if (ret) {
 				/* -EDEADLK */
diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index d7135aa..e94caa6 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -758,7 +758,8 @@ static int __init lockdep_proc_init(void)
 		    &proc_lockdep_stats_operations);
 
 #ifdef CONFIG_LOCK_STAT
-	proc_create("lock_stat", S_IRUSR, NULL, &proc_lock_stat_operations);
+	proc_create("lock_stat", S_IRUSR | S_IWUSR, NULL,
+		    &proc_lock_stat_operations);
 #endif
 
 	return 0;
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index fcd107a..29bd4ba 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1039,16 +1039,14 @@ int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 	if (!rt_mutex_owner(lock) || try_to_steal_lock(lock, task)) {
 		/* We got the lock for task. */
 		debug_rt_mutex_lock(lock);
-
 		rt_mutex_set_owner(lock, task, 0);
-
+		spin_unlock(&lock->wait_lock);
 		rt_mutex_deadlock_account_lock(lock, task);
 		return 1;
 	}
 
 	ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
 
-
 	if (ret && !waiter->task) {
 		/*
 		 * Reset the return value. We might have


* Re: [GIT PULL] core kernel fixes
  2009-07-10 19:06 ` Linus Torvalds
  2009-07-10 19:31   ` Ingo Molnar
@ 2009-07-13 14:52   ` Joerg Roedel
  1 sibling, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2009-07-13 14:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra

On Fri, Jul 10, 2009 at 12:06:23PM -0700, Linus Torvalds wrote:
> What am I missing (apart from the fact that all those variables are 
> horribly badly named)?
> 
> Also, the tests make no sense. That's not how you are supposed to check 
> for overlap to begin with.

The tests made sense in my brain when I wrote that function ;-) My code
checked for possible overlap scenarios. I haven't thought about doing it
much much simpler...

> 
> Isn't it easier to test for _not_ overlapping?
> 
> 	/* range1 is fully before range2 */
> 	(end1 <= start2 || 
> 	/* range1 is fully after range2 */
> 	start1 >= end2)

... by checking for non-overlap and negating the result. But now I know
better :-)

	Joerg




* Re: [GIT PULL] core kernel fixes
  2009-07-10 19:52     ` Linus Torvalds
@ 2009-07-10 20:02       ` Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-07-10 20:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Joerg Roedel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 10 Jul 2009, Ingo Molnar wrote:
> > > 
> > > but I really might have done something wrong there. It's a 
> > > simple function, but somebody needs to double-check that I 
> > > haven't made it worse.
> > 
> > Looks correct to me.
> 
> Note, I didn't look at how 'end' works, and it really does matter 
> if 'end' is an "inclusive" or "exclusive" end pointer address. So 
> my replacement overlap() function was written more as a conceptual 
> patch - I did not check the exact semantics of the arguments 
> passed in.
> 
> If 'end' is inclusive, then 'b1' should be calculated as 
> 'a1+size-1', because the ranges must have the same rules. And then 
> you should use the 'strict inequality' operators for testing the 
> ranges.

The ranges are inclusive in terms of non-overlap: we can have 
adjacent ranges with b1==a2 or b2==a1 that are still considered 
non-overlapping. Hence the sharp test you used (which is negated) 
looks correct to me.

The end-of-range symbols we use:

        if (overlap(addr, len, _text, _etext) ||
            overlap(addr, len, __start_rodata, __end_rodata))

Are all at the first byte outside of the to-be-avoided range:

        .text : {
                _text = .;      /* Text */
                *(.text)
                *(.text.*)
                _etext = . ;
        }

        ...

        __param : AT(ADDR(__param) - LOAD_OFFSET) {                     \
                VMLINUX_SYMBOL(__start___param) = .;                    \
                *(__param)                                              \
                VMLINUX_SYMBOL(__stop___param) = .;                     \
                . = ALIGN((align));                                     \
                VMLINUX_SYMBOL(__end_rodata) = .;                       \
        }                                                               \

        ...

I think ...

	Ingo


* Re: [GIT PULL] core kernel fixes
  2009-07-10 19:31   ` Ingo Molnar
@ 2009-07-10 19:52     ` Linus Torvalds
  2009-07-10 20:02       ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-07-10 19:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Joerg Roedel



On Fri, 10 Jul 2009, Ingo Molnar wrote:
> > 
> > but I really might have done something wrong there. It's a simple 
> > function, but somebody needs to double-check that I haven't made 
> > it worse.
> 
> Looks correct to me.

Note, I didn't look at how 'end' works, and it really does matter if 'end' 
is an "inclusive" or "exclusive" end pointer address. So my replacement 
overlap() function was written more as a conceptual patch - I did not 
check the exact semantics of the arguments passed in.

If 'end' is inclusive, then 'b1' should be calculated as 'a1+size-1', 
because the ranges must have the same rules. And then you should use the 
'strict inequality' operators for testing the ranges.
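
A minimal user-space sketch of the variant where 'end' is the last byte
inside the range, so both ranges are closed intervals. The function name
and the uintptr_t parameters are assumptions made for illustration; this
is not the code in lib/dma-debug.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: both ranges are inclusive,
 * [a1, a1 + size - 1] and [a2, b2]. 'size' must be nonzero.
 */
bool ranges_overlap_inclusive(uintptr_t a1, uintptr_t size,
			      uintptr_t a2, uintptr_t b2)
{
	uintptr_t b1 = a1 + size - 1;

	/* Strict inequalities: sharing even one byte counts as overlap. */
	return !(b1 < a2 || a1 > b2);
}

/* Self-check for the boundary cases. */
void ranges_overlap_inclusive_selftest(void)
{
	assert(!ranges_overlap_inclusive(0, 10, 10, 19)); /* ends at 9 */
	assert(ranges_overlap_inclusive(0, 11, 10, 19));  /* shares byte 10 */
	assert(ranges_overlap_inclusive(19, 1, 10, 19));  /* shares byte 19 */
	assert(!ranges_overlap_inclusive(20, 5, 10, 19)); /* starts after */
}
```

With an exclusive 'end' the same test would use 'a1+size' and the
non-strict operators instead.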

		Linus


* Re: [GIT PULL] core kernel fixes
  2009-07-10 19:06 ` Linus Torvalds
@ 2009-07-10 19:31   ` Ingo Molnar
  2009-07-10 19:52     ` Linus Torvalds
  2009-07-13 14:52   ` Joerg Roedel
  1 sibling, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-07-10 19:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Joerg Roedel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> On Fri, 10 Jul 2009, Ingo Molnar wrote:
> > 
> > Joerg Roedel (1):
> >       dma-debug: fix off-by-one error in overlap function
> >
> > diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> > index 3b93129..c9187fe 100644
> > --- a/lib/dma-debug.c
> > +++ b/lib/dma-debug.c
> > @@ -862,7 +862,7 @@ static inline bool overlap(void *addr, u64 size, void *start, void *end)
> >  
> >  	return ((addr >= start && addr < end) ||
> >  		(addr2 >= start && addr2 < end) ||
> > -		((addr < start) && (addr2 >= end)));
> > +		((addr < start) && (addr2 > end)));
> >  }
> >  
> >  static void check_for_illegal_area(struct device *dev, void *addr, u64 size)
> 
> The above seems like total shit.
> 
> If (addr < start && addr2 == end) then the two areas very much overlap.
> 
> What am I missing (apart from the fact that all those variables are 
> horribly badly named)?
> 
> Also, the tests make no sense. That's not how you are supposed to check 
> for overlap to begin with.
> 
> Isn't it easier to test for _not_ overlapping?
> 
> 	/* range1 is fully before range2 */
> 	(end1 <= start2 || 
> 	/* range1 is fully after range2 */
> 	start1 >= end2)
> 
> possibly together with checking for overflow in the size addition? 
> But I didn't think that through, so maybe I'm doing something 
> stupid.
> 
> Finally, why is 'size' a u64? It will overflow anyway if it's 
> bigger than a pointer, so it should be just 'unsigned long'. Or it 
> should all be done in u64 if people care. Or we should care about 
> overflow (which cannot be done with pointers).
> 
> Also, comparing pointers is unsafe to begin with. It's not clear 
> if they are signed or unsigned comparisons, and gcc has 
> historically had bugs here (only unsigned comparisons make sense 
> for pointers, but _technically_ a crazy compiler person could 
> argue that at least in some environments any valid pointers to the 
> same object - which is the only thing C defines - must not cross 
> the sign barrier, so they use a buggy signed compare).

hm, indeed - and i missed that.

[ Even in the pointer space i think this cast is slightly confused 
  too:

    static inline bool overlap(void *addr, u64 size, void *start, void *end)
    {
            void *addr2 = (char *)addr + size;

  as void * has byte granular arithmetics already so 'addr + size'
  would suffice. ]

> IOW, I think this whole function is just total crap, apparently 
> put together by randomly assembling characters until it compiles. 
> Somebody should put more effort into looking at it, but I think it 
> should be something like
> 
> 	static inline int overlap(void *addr, unsigned long len, void *start, void *end)
> 	{
> 		unsigned long a1 = (unsigned long) addr;
> 		unsigned long b1 = a1 + len;
> 		unsigned long a2 = (unsigned long) start;
> 		unsigned long b2 = (unsigned long) end;

At least some arguments have unsigned long natural types (they come 
out of page_address() for example) so the function parameters could 
perhaps be changed to unsigned long as well.

> 	#ifdef WE_CARE_DEEPLY
> 		/* Overflow? */
> 		if (b1 < a1)
> 			return 1;
> 	#ifdef AND_ARE_ANAL
> 		if (b2 < a2)
> 			return 1;
> 	#endif
> 	#endif
> 		return !(b1 <= a2 || a1 >= b2);
> 	}
> 
> but I really might have done something wrong there. It's a simple 
> function, but somebody needs to double-check that I haven't made 
> it worse.

Looks correct to me.

	Ingo


* Re: [GIT PULL] core kernel fixes
  2009-07-10 16:28 Ingo Molnar
@ 2009-07-10 19:06 ` Linus Torvalds
  2009-07-10 19:31   ` Ingo Molnar
  2009-07-13 14:52   ` Joerg Roedel
  0 siblings, 2 replies; 97+ messages in thread
From: Linus Torvalds @ 2009-07-10 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra, Joerg Roedel


On Fri, 10 Jul 2009, Ingo Molnar wrote:
> 
> Joerg Roedel (1):
>       dma-debug: fix off-by-one error in overlap function
>
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 3b93129..c9187fe 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -862,7 +862,7 @@ static inline bool overlap(void *addr, u64 size, void *start, void *end)
>  
>  	return ((addr >= start && addr < end) ||
>  		(addr2 >= start && addr2 < end) ||
> -		((addr < start) && (addr2 >= end)));
> +		((addr < start) && (addr2 > end)));
>  }
>  
>  static void check_for_illegal_area(struct device *dev, void *addr, u64 size)

The above seems like total shit.

If (addr < start && addr2 == end) then the two areas very much overlap.

What am I missing (apart from the fact that all those variables are 
horribly badly named)?

Also, the tests make no sense. That's not how you are supposed to check 
for overlap to begin with.

Isn't it easier to test for _not_ overlapping?

	/* range1 is fully before range2 */
	(end1 <= start2 || 
	/* range1 is fully after range2 */
	start1 >= end2)

possibly together with checking for overflow in the size addition? But I 
didn't think that through, so maybe I'm doing something stupid.

Finally, why is 'size' a u64? It will overflow anyway if it's bigger than 
a pointer, so it should be just 'unsigned long'. Or it should all be done 
in u64 if people care. Or we should care about overflow (which cannot be 
done with pointers).
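
A small user-space sketch of the overflow point, assuming the arithmetic
is done in an unsigned integer type of pointer width rather than on the
pointers themselves. The helper name is hypothetical, and
__builtin_add_overflow() is a GCC/Clang builtin, not a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute addr + size into *end.
 * Returns false if the sum wraps around the address space.
 */
bool range_end(uintptr_t addr, uintptr_t size, uintptr_t *end)
{
	/* The builtin returns true when the result does not fit. */
	return !__builtin_add_overflow(addr, size, end);
}

/* Self-check: one ordinary sum and one that wraps. */
void range_end_selftest(void)
{
	uintptr_t end;

	assert(range_end(16, 16, &end) && end == 32);
	assert(!range_end(UINTPTR_MAX, 1, &end));
}
```

The portable equivalent is to add first and then test 'end < addr',
which is well defined for unsigned types.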

Also, comparing pointers is unsafe to begin with. It's not clear if they 
are signed or unsigned comparisons, and gcc has historically had bugs here 
(only unsigned comparisons make sense for pointers, but _technically_ a 
crazy compiler person could argue that at least in some environments any 
valid pointers to the same object - which is the only thing C defines - 
must not cross the sign barrier, so they use a buggy signed compare).

IOW, I think this whole function is just total crap, apparently put 
together by randomly assembling characters until it compiles. Somebody 
should put more effort into looking at it, but I think it should be 
something like

	static inline int overlap(void *addr, unsigned long len, void *start, void *end)
	{
		unsigned long a1 = (unsigned long) addr;
		unsigned long b1 = a1 + len;
		unsigned long a2 = (unsigned long) start;
		unsigned long b2 = (unsigned long) end;

	#ifdef WE_CARE_DEEPLY
		/* Overflow? */
		if (b1 < a1)
			return 1;
	#ifdef AND_ARE_ANAL
		if (b2 < a2)
			return 1;
	#endif
	#endif
		return !(b1 <= a2 || a1 >= b2);
	}

but I really might have done something wrong there. It's a simple 
function, but somebody needs to double-check that I haven't made it worse.
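
A user-space sketch of the function above with a few boundary cases
spelled out, assuming both ranges are half-open, i.e. 'end' is the first
byte outside the range. The name ranges_overlap() and the uintptr_t
parameters are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space variant of the proposed overlap().
 * Both ranges are half-open: [a1, a1 + len) and [a2, b2).
 */
bool ranges_overlap(uintptr_t a1, uintptr_t len, uintptr_t a2, uintptr_t b2)
{
	uintptr_t b1 = a1 + len;

	/* Wrapped or inverted ranges are reported as overlapping. */
	if (b1 < a1 || b2 < a2)
		return true;

	/* Overlap is the negation of "fully before or fully after". */
	return !(b1 <= a2 || a1 >= b2);
}

/* Self-check covering the cases raised in this thread. */
void ranges_overlap_selftest(void)
{
	assert(!ranges_overlap(0, 10, 10, 20));  /* adjacent: b1 == a2 */
	assert(!ranges_overlap(20, 10, 10, 20)); /* adjacent: a1 == b2 */
	assert(ranges_overlap(5, 15, 10, 20));   /* addr < start, addr2 == end */
	assert(ranges_overlap(12, 2, 10, 20));   /* fully inside */
	assert(ranges_overlap(UINTPTR_MAX - 1, 4, 10, 20)); /* sum wraps */
}
```

The third case is the one the patch gets wrong: with 'addr2 > end' in
the last clause, a range that starts before 'start' and stops exactly at
'end' is no longer reported.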

		Linus


* [GIT PULL] core kernel fixes
@ 2009-07-10 16:28 Ingo Molnar
  2009-07-10 19:06 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-07-10 16:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Arnd Bergmann (1):
      signals: declare sys_rt_tgsigqueueinfo in syscalls.h

Ingo Molnar (1):
      dma-debug: Put all hash-chain locks into the same lock class

Joerg Roedel (1):
      dma-debug: fix off-by-one error in overlap function

Maynard Johnson (1):
      oprofile: reset bt_lost_no_mapping with other stats

Paul E. McKenney (1):
      rcu: Mark Hierarchical RCU no longer experimental

Robert Richter (1):
      x86/oprofile: rename kernel parameter for architectural perfmon to arch_perfmon


 Documentation/kernel-parameters.txt |    4 ++--
 arch/x86/oprofile/nmi_int.c         |    2 +-
 drivers/oprofile/oprofile_stats.c   |    1 +
 include/linux/syscalls.h            |    2 ++
 kernel/rcutree.c                    |    3 +--
 lib/dma-debug.c                     |    4 ++--
 6 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 92e1ab8..c59e965 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1728,8 +1728,8 @@ and is between 256 and 4096 characters. It is defined in the file
 	oprofile.cpu_type=	Force an oprofile cpu type
 			This might be useful if you have an older oprofile
 			userland or if you want common events.
-			Format: { archperfmon }
-			archperfmon: [X86] Force use of architectural
+			Format: { arch_perfmon }
+			arch_perfmon: [X86] Force use of architectural
 				perfmon on Intel CPUs instead of the
 				CPU specific event set.
 
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index b07dd8d..89b9a5c 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -390,7 +390,7 @@ static int __init p4_init(char **cpu_type)
 static int force_arch_perfmon;
 static int force_cpu_type(const char *str, struct kernel_param *kp)
 {
-	if (!strcmp(str, "archperfmon")) {
+	if (!strcmp(str, "arch_perfmon")) {
 		force_arch_perfmon = 1;
 		printk(KERN_INFO "oprofile: forcing architectural perfmon\n");
 	}
diff --git a/drivers/oprofile/oprofile_stats.c b/drivers/oprofile/oprofile_stats.c
index e1f6ce0..3c2270a 100644
--- a/drivers/oprofile/oprofile_stats.c
+++ b/drivers/oprofile/oprofile_stats.c
@@ -33,6 +33,7 @@ void oprofile_reset_stats(void)
 	atomic_set(&oprofile_stats.sample_lost_no_mm, 0);
 	atomic_set(&oprofile_stats.sample_lost_no_mapping, 0);
 	atomic_set(&oprofile_stats.event_lost_overflow, 0);
+	atomic_set(&oprofile_stats.bt_lost_no_mapping, 0);
 }
 
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index fa4242c..80de700 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -321,6 +321,8 @@ asmlinkage long sys_rt_sigtimedwait(const sigset_t __user *uthese,
 				siginfo_t __user *uinfo,
 				const struct timespec __user *uts,
 				size_t sigsetsize);
+asmlinkage long sys_rt_tgsigqueueinfo(pid_t tgid, pid_t  pid, int sig,
+		siginfo_t __user *uinfo);
 asmlinkage long sys_kill(int pid, int sig);
 asmlinkage long sys_tgkill(int tgid, int pid, int sig);
 asmlinkage long sys_tkill(int pid, int sig);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0dccfbb..7717b95 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1533,7 +1533,7 @@ void __init __rcu_init(void)
 	int j;
 	struct rcu_node *rnp;
 
-	printk(KERN_WARNING "Experimental hierarchical RCU implementation.\n");
+	printk(KERN_INFO "Hierarchical RCU implementation.\n");
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 	printk(KERN_INFO "RCU-based detection of stalled CPUs is enabled.\n");
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
@@ -1546,7 +1546,6 @@ void __init __rcu_init(void)
 		rcu_cpu_notify(&rcu_nb, CPU_UP_PREPARE, (void *)(long)i);
 	/* Register notifier for non-boot CPUs */
 	register_cpu_notifier(&rcu_nb);
-	printk(KERN_WARNING "Experimental hierarchical RCU init done.\n");
 }
 
 module_param(blimit, int, 0);
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 3b93129..c9187fe 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -716,7 +716,7 @@ void dma_debug_init(u32 num_entries)
 
 	for (i = 0; i < HASH_SIZE; ++i) {
 		INIT_LIST_HEAD(&dma_entry_hash[i].list);
-		dma_entry_hash[i].lock = SPIN_LOCK_UNLOCKED;
+		spin_lock_init(&dma_entry_hash[i].lock);
 	}
 
 	if (dma_debug_fs_init() != 0) {
@@ -862,7 +862,7 @@ static inline bool overlap(void *addr, u64 size, void *start, void *end)
 
 	return ((addr >= start && addr < end) ||
 		(addr2 >= start && addr2 < end) ||
-		((addr < start) && (addr2 >= end)));
+		((addr < start) && (addr2 > end)));
 }
 
 static void check_for_illegal_area(struct device *dev, void *addr, u64 size)


* Re: [GIT PULL] core kernel fixes
  2009-06-21 17:57         ` Linus Torvalds
@ 2009-06-21 19:26           ` Thomas Gleixner
  0 siblings, 0 replies; 97+ messages in thread
From: Thomas Gleixner @ 2009-06-21 19:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Andrew Morton

Linus,

On Sun, 21 Jun 2009, Linus Torvalds wrote:
> So just doing a "make_sure_its_writable()" and using handle_fault() is the 
> right thing to do. Because it's what get_user_fast() would have done too, 
> except it would have gone through first the fast case, and failed, then 
> the slow case, and failed the lookup there, and then the slow case would 
> have done that handle_mm_fault() in the end anyway.
> 
> In fact, since you're not actually interested in the page, you _could_ 
> just do
> 
> 	get_user_pages(tsk, mm, uaddr, 4, 1, 0, NULL, NULL);
> 
> where a NULL "pages" pointer already tells get_user_pages() that you're 
> not interested.
> 
> That's at least cleaner than doing a "gup_fast()" (which isn't fast), and 
> then freeing the page that you weren't even interested in.

Yes, you are right. The retry fixup path is after the fault and we
should go through handle_mm_fault as long as we do not have a generally
available nondestructive counterpart of get_user().

I confused myself by twisting my brain whether we can simplify or even
get rid of the whole retry business.

Sorry, I did not express myself very well - looking for more than an
hour into the futex code definitely hurts your brain. It's worse than
the drugs you suspected we're on. :)

Thanks,

	tglx


* Re: [GIT PULL] core kernel fixes
  2009-06-21 17:37       ` Linus Torvalds
@ 2009-06-21 17:57         ` Linus Torvalds
  2009-06-21 19:26           ` Thomas Gleixner
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-06-21 17:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Andrew Morton



On Sun, 21 Jun 2009, Linus Torvalds wrote:
> 
> And as far as I can tell, that is indeed the only case where you use that 
> 'get_user_writeable()' thing. You've had futex_atomic_op_inuser() fail, 
> and need to repeat. No?

I just checked. Yes, this code is _only_ entered when an atomic op 
returned EFAULT. IOW, we absolutely know that the page tables are not set 
up for writability, and thus that the "fast" case will never ever trigger.

(Ok, in theory you could have some other thread writing to that page at 
the same time and handle the page fault and making it writable, but in 
practice that's not really relevant).

So just doing a "make_sure_its_writable()" and using handle_fault() is the 
right thing to do. Because it's what get_user_fast() would have done too, 
except it would have gone through first the fast case, and failed, then 
the slow case, and failed the lookup there, and then the slow case would 
have done that handle_mm_fault() in the end anyway.

In fact, since you're not actually interested in the page, you _could_ 
just do

	get_user_pages(tsk, mm, uaddr, 4, 1, 0, NULL, NULL);

where a NULL "pages" pointer already tells get_user_pages() that you're 
not interested.

That's at least cleaner than doing a "gup_fast()" (which isn't fast), and 
then freeing the page that you weren't even interested in.

		Linus


* Re: [GIT PULL] core kernel fixes
  2009-06-21 17:12     ` Thomas Gleixner
@ 2009-06-21 17:37       ` Linus Torvalds
  2009-06-21 17:57         ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-06-21 17:37 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Andrew Morton



On Sun, 21 Jun 2009, Thomas Gleixner wrote:
> 
> Hmm. The main reason why we switched to get_user_pages_fast() in the
> futex code is to avoid mmap_sem contention which was observed as a
> real big performance problem especially with those horrible JavaVM
> applications.

Not relevant.

get_user_pages_fast() takes the mmap_sem for the case where it needs to 
fault things in too.

So assuming the _only_ reason this thing is called is because we failed 
earlier when doing the futex_atomic_op_inuser(), then you're basically 
guaranteed that the "fast" case of get_user_pages_fast() is never actually 
taken, since we already know that the page tables aren't amenable to an 
atomic access.

And as far as I can tell, that is indeed the only case where you use that 
'get_user_writeable()' thing. You've had futex_atomic_op_inuser() fail, 
and need to repeat. No?

> As a fallout of this we got rid of the private find_vma /
> handle_mm_fault magic (as above) in the futex code which mm folks
> frowned upon for quite a while. Unfortunately we got it wrong :(

Sure. But "get_user_pages_fast()" really is the wrong thing. You're not at 
all interested in the user pages. You're interested in making sure that 
the page is atomically writable, and nothing else. Right?

Which is why I said that "lock ; addl $0,(mem)" would be a _single_ 
instruction, and do everything that your "get_user_pages_fast()" hack 
would do. If the fault is unlikely, that would be a better operation. I 
just don't think the fault is unlikely, I suspect it happens every time.
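
A user-space sketch of that single-instruction idea, for illustration
only. The helper name is hypothetical. In the kernel a faulting access
to a user address would additionally need exception-table fixup, which
this sketch does not model, and the non-x86 fallback is an assumption
that a compiler may or may not turn into a real store.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: touch *uaddr with an atomic read-modify-write
 * that leaves the value unchanged. On a read-only mapping the access
 * would fault as a write.
 */
void touch_for_write(volatile uint32_t *uaddr)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("lock; addl $0,%0"
			     : "+m" (*uaddr)
			     :
			     : "memory", "cc");
#else
	/* Fallback: adding zero atomically; not guaranteed to emit a store. */
	(void)__atomic_fetch_add(uaddr, 0, __ATOMIC_SEQ_CST);
#endif
}

/* Self-check: the value must be unchanged after the touch. */
void touch_for_write_selftest(void)
{
	uint32_t v = 42;

	touch_for_write(&v);
	assert(v == 42);
}
```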

			Linus


* Re: [GIT PULL] core kernel fixes
  2009-06-20 19:01   ` Linus Torvalds
  2009-06-20 20:27     ` Ingo Molnar
@ 2009-06-21 17:12     ` Thomas Gleixner
  2009-06-21 17:37       ` Linus Torvalds
  1 sibling, 1 reply; 97+ messages in thread
From: Thomas Gleixner @ 2009-06-21 17:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Andrew Morton

On Sat, 20 Jun 2009, Linus Torvalds wrote:
> On Sat, 20 Jun 2009, Linus Torvalds wrote:
> > 
> > On x86, the natural way to do what you want done is ONE SINGLE 
> > INSTRUCTION! As far as I can tell, the above crazy function is 100% 
> > equivalent to this:
> > 
> > 	asm __inline__("lock ; addl $0,%0":"+m" (*uaddr): :"memory", "cc");
> > 
> > which really makes me think that using "get_user_pages_fast()" for it is 
> > some truly crazy crap.
> 
> We could also take the opposite approach - knowing that this is called 
> only when the page doesn't exist, and just doing
> 
> 	down_read(mmap_sem)
> 	vma = find_vma(..)
> 	ret = VM_FAULT_ERROR;
> 	if (vma && vma->vm_start <= address)
> 		ret = handle_mm_fault(mm, vma, address, 1);
> 	up_read(mmap_sem);
> 	return (ret & VM_FAULT_ERROR) ? -EFAULT : 0;
> 
> or something like that. Again, that looks saner than using 
> get_user_pages() for this and then dropping the page.

Hmm. The main reason why we switched to get_user_pages_fast() in the
futex code is to avoid mmap_sem contention which was observed as a
real big performance problem especially with those horrible JavaVM
applications.

As a fallout of this we got rid of the private find_vma /
handle_mm_fault magic (as above) in the futex code which mm folks
frowned upon for quite a while. Unfortunately we got it wrong :(

I agree that in the fault path we might go back to the mmap_sem
version, but I want to avoid it when possible. At least we need to run
some of those horrible JavaVM apps to see whether it matters or not.

OTOH, I really wonder whether we can simplify the whole logic when we
keep the page reference for the atomic access to the user space
address, but I still have doubts about the life time rules of all
this. Also I need to find out how this would affect the fast path
optimization of private futexes where we avoid the lookup in the first
place.

Thanks,

	tglx


* Re: [GIT PULL] core kernel fixes
  2009-06-20 19:01   ` Linus Torvalds
@ 2009-06-20 20:27     ` Ingo Molnar
  2009-06-21 17:12     ` Thomas Gleixner
  1 sibling, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-06-20 20:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, 20 Jun 2009, Linus Torvalds wrote:
> > 
> > On x86, the natural way to do what you want done is ONE SINGLE 
> > INSTRUCTION! As far as I can tell, the above crazy function is 100% 
> > equivalent to this:
> > 
> > 	asm __inline__("lock ; addl $0,%0":"+m" (*uaddr): :"memory", "cc");
> > 
> > which really makes me think that using "get_user_pages_fast()" for it is 
> > some truly crazy crap.
> 
> We could also take the opposite approach - knowing that this is called 
> only when the page doesn't exist, and just doing
> 
> 	down_read(mmap_sem)
> 	vma = find_vma(..)
> 	ret = VM_FAULT_ERROR;
> 	if (vma && vma->vm_start <= address)
> 		ret = handle_mm_fault(mm, vma, address, 1);
> 	up_read(mmap_sem);
> 	return (ret & VM_FAULT_ERROR) ? -EFAULT : 0;
> 
> or something like that. Again, that looks saner than using 
> get_user_pages() for this and then dropping the page.

We'll sort this out tomorrow, sorry about this.

( The other embarrassing bit is that i flagged this commit as bad
  nine days ago - during review i noticed the same bad pattern of a 
  pointless get-put cycle ... but the commit stuck around and i forgot 
  about it. Sloppy. )

	Ingo


* Re: [GIT PULL] core kernel fixes
  2009-06-20 18:49 ` Linus Torvalds
@ 2009-06-20 19:01   ` Linus Torvalds
  2009-06-20 20:27     ` Ingo Molnar
  2009-06-21 17:12     ` Thomas Gleixner
  0 siblings, 2 replies; 97+ messages in thread
From: Linus Torvalds @ 2009-06-20 19:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton



On Sat, 20 Jun 2009, Linus Torvalds wrote:
> 
> On x86, the natural way to do what you want done is ONE SINGLE 
> INSTRUCTION! As far as I can tell, the above crazy function is 100% 
> equivalent to this:
> 
> 	asm __inline__("lock ; addl $0,%0":"+m" (*uaddr): :"memory", "cc");
> 
> which really makes me think that using "get_user_pages_fast()" for it is 
> some truly crazy crap.

We could also take the opposite approach - knowing that this is called 
only when the page doesn't exist, and just doing

	down_read(mmap_sem)
	vma = find_vma(..)
	ret = VM_FAULT_ERROR;
	if (vma && vma->vm_start <= address)
		ret = handle_mm_fault(mm, vma, address, 1);
	up_read(mmap_sem);
	return (ret & VM_FAULT_ERROR) ? -EFAULT : 0;

or something like that. Again, that looks saner than using 
get_user_pages() for this and then dropping the page.

		Linus


* Re: [GIT PULL] core kernel fixes
  2009-06-20 17:30 Ingo Molnar
@ 2009-06-20 18:49 ` Linus Torvalds
  2009-06-20 19:01   ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-06-20 18:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton



On Sat, 20 Jun 2009, Ingo Molnar wrote:
> 
> Please pull the latest core-fixes-for-linus git tree from:

I need to think about this one.

> +/*
> + * get_user_writeable - get user page and verify RW access
> + * @uaddr:	pointer to faulting user space address
> + *
> + * We cannot write to the user space address and get_user just faults
> + * the page in, but does not tell us whether the mapping is writeable.
> + *
> + * We can not rely on access_ok() for private futexes as it is just a
> + * range check and we can neither rely on get_user_pages() as there
> + * might be a mprotect(PROT_READ) for that mapping after
> + * get_user_pages() and before the fault in the atomic write access.
> + */
> +static int get_user_writeable(u32 __user *uaddr)
> +{
> +	unsigned long addr = (unsigned long)uaddr;
> +	struct page *page;
> +	int ret;
> +
> +	ret = get_user_pages_fast(addr, 1, 1, &page);
> +	if (ret > 0)
> +		put_page(page);
> +
> +	return ret;
> +}

This is some seriously crazy sh*t, man. What drugs were you on, and are 
the police on to you? Because whatever drugs they were, I seriously doubt 
they are legal even with a prescription.

There's something wrong in futex land. This whole retry crap has been so 
incredibly broken so many times, and this particular fix looks so horribly 
ugly that I really need to ask people: "is the loop really sane?"

I also think that the above is a singularly stupid way to do what you want 
done. It's slow, it's nasty, it's complicated.

On x86, the natural way to do what you want done is ONE SINGLE 
INSTRUCTION! As far as I can tell, the above crazy function is 100% 
equivalent to this:

	asm __inline__("lock ; addl $0,%0":"+m" (*uaddr): :"memory", "cc");

which really makes me think that using "get_user_pages_fast()" for it is 
some truly crazy crap.

Sure, we don't have any architecture-generic way to do that, but still..

		Linus


* [GIT PULL] core kernel fixes
@ 2009-06-20 17:30 Ingo Molnar
  2009-06-20 18:49 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-06-20 17:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Joerg Roedel (2):
      dma-debug: check for sg_call_ents in best-fit algorithm too
      dma-debug: be more careful when building reference entries

Peter Zijlstra (1):
      lockdep: Select frame pointers on x86

Thomas Gleixner (1):
      futex: Fix the write access fault problem for real


 kernel/futex.c    |   51 +++++++++++--------
 lib/Kconfig.debug |    2 +-
 lib/dma-debug.c   |  149 +++++++++++++++++++++++++++++++++++------------------
 3 files changed, 129 insertions(+), 73 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 80b5ce7..c0ff820 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -284,6 +284,31 @@ void put_futex_key(int fshared, union futex_key *key)
 	drop_futex_key_refs(key);
 }
 
+/*
+ * get_user_writeable - get user page and verify RW access
+ * @uaddr:	pointer to faulting user space address
+ *
+ * We cannot write to the user space address and get_user just faults
+ * the page in, but does not tell us whether the mapping is writeable.
+ *
+ * We can not rely on access_ok() for private futexes as it is just a
+ * range check and we can neither rely on get_user_pages() as there
+ * might be a mprotect(PROT_READ) for that mapping after
+ * get_user_pages() and before the fault in the atomic write access.
+ */
+static int get_user_writeable(u32 __user *uaddr)
+{
+	unsigned long addr = (unsigned long)uaddr;
+	struct page *page;
+	int ret;
+
+	ret = get_user_pages_fast(addr, 1, 1, &page);
+	if (ret > 0)
+		put_page(page);
+
+	return ret;
+}
+
 /**
  * futex_top_waiter() - Return the highest priority waiter on a futex
  * @hb:     the hash bucket the futex_q's reside in
@@ -896,7 +921,6 @@ retry:
 retry_private:
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
-		u32 dummy;
 
 		double_unlock_hb(hb1, hb2);
 
@@ -914,7 +938,7 @@ retry_private:
 			goto out_put_keys;
 		}
 
-		ret = get_user(dummy, uaddr2);
+		ret = get_user_writeable(uaddr2);
 		if (ret)
 			goto out_put_keys;
 
@@ -1204,7 +1228,7 @@ retry_private:
 			double_unlock_hb(hb1, hb2);
 			put_futex_key(fshared, &key2);
 			put_futex_key(fshared, &key1);
-			ret = get_user(curval2, uaddr2);
+			ret = get_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
 			goto out;
@@ -1482,7 +1506,7 @@ retry:
 handle_fault:
 	spin_unlock(q->lock_ptr);
 
-	ret = get_user(uval, uaddr);
+	ret = get_user_writeable(uaddr);
 
 	spin_lock(q->lock_ptr);
 
@@ -1807,7 +1831,6 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct futex_hash_bucket *hb;
-	u32 uval;
 	struct futex_q q;
 	int res, ret;
 
@@ -1909,16 +1932,9 @@ out:
 	return ret != -EINTR ? ret : -ERESTARTNOINTR;
 
 uaddr_faulted:
-	/*
-	 * We have to r/w  *(int __user *)uaddr, and we have to modify it
-	 * atomically.  Therefore, if we continue to fault after get_user()
-	 * below, we need to handle the fault ourselves, while still holding
-	 * the mmap_sem.  This can occur if the uaddr is under contention as
-	 * we have to drop the mmap_sem in order to call get_user().
-	 */
 	queue_unlock(&q, hb);
 
-	ret = get_user(uval, uaddr);
+	ret = get_user_writeable(uaddr);
 	if (ret)
 		goto out_put_key;
 
@@ -2013,17 +2029,10 @@ out:
 	return ret;
 
 pi_faulted:
-	/*
-	 * We have to r/w  *(int __user *)uaddr, and we have to modify it
-	 * atomically.  Therefore, if we continue to fault after get_user()
-	 * below, we need to handle the fault ourselves, while still holding
-	 * the mmap_sem.  This can occur if the uaddr is under contention as
-	 * we have to drop the mmap_sem in order to call get_user().
-	 */
 	spin_unlock(&hb->lock);
 	put_futex_key(fshared, &key);
 
-	ret = get_user(uval, uaddr);
+	ret = get_user_writeable(uaddr);
 	if (!ret)
 		goto retry;
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6cdcf38..3be4b7c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -440,7 +440,7 @@ config LOCKDEP
 	bool
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
 	select STACKTRACE
-	select FRAME_POINTER if !X86 && !MIPS && !PPC && !ARM_UNWIND && !S390
+	select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390
 	select KALLSYMS
 	select KALLSYMS_ALL
 
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index ad65fc0..3b93129 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -262,11 +262,12 @@ static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
 		 */
 		matches += 1;
 		match_lvl = 0;
-		entry->size      == ref->size      ? ++match_lvl : match_lvl;
-		entry->type      == ref->type      ? ++match_lvl : match_lvl;
-		entry->direction == ref->direction ? ++match_lvl : match_lvl;
+		entry->size         == ref->size         ? ++match_lvl : 0;
+		entry->type         == ref->type         ? ++match_lvl : 0;
+		entry->direction    == ref->direction    ? ++match_lvl : 0;
+		entry->sg_call_ents == ref->sg_call_ents ? ++match_lvl : 0;
 
-		if (match_lvl == 3) {
+		if (match_lvl == 4) {
 			/* perfect-fit - return the result */
 			return entry;
 		} else if (match_lvl > last_lvl) {
@@ -873,72 +874,68 @@ static void check_for_illegal_area(struct device *dev, void *addr, u64 size)
 				"[addr=%p] [size=%llu]\n", addr, size);
 }
 
-static void check_sync(struct device *dev, dma_addr_t addr,
-		       u64 size, u64 offset, int direction, bool to_cpu)
+static void check_sync(struct device *dev,
+		       struct dma_debug_entry *ref,
+		       bool to_cpu)
 {
-	struct dma_debug_entry ref = {
-		.dev            = dev,
-		.dev_addr       = addr,
-		.size           = size,
-		.direction      = direction,
-	};
 	struct dma_debug_entry *entry;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 
-	bucket = get_hash_bucket(&ref, &flags);
+	bucket = get_hash_bucket(ref, &flags);
 
-	entry = hash_bucket_find(bucket, &ref);
+	entry = hash_bucket_find(bucket, ref);
 
 	if (!entry) {
 		err_printk(dev, NULL, "DMA-API: device driver tries "
 				"to sync DMA memory it has not allocated "
 				"[device address=0x%016llx] [size=%llu bytes]\n",
-				(unsigned long long)addr, size);
+				(unsigned long long)ref->dev_addr, ref->size);
 		goto out;
 	}
 
-	if ((offset + size) > entry->size) {
+	if (ref->size > entry->size) {
 		err_printk(dev, entry, "DMA-API: device driver syncs"
 				" DMA memory outside allocated range "
 				"[device address=0x%016llx] "
-				"[allocation size=%llu bytes] [sync offset=%llu] "
-				"[sync size=%llu]\n", entry->dev_addr, entry->size,
-				offset, size);
+				"[allocation size=%llu bytes] "
+				"[sync offset+size=%llu]\n",
+				entry->dev_addr, entry->size,
+				ref->size);
 	}
 
-	if (direction != entry->direction) {
+	if (ref->direction != entry->direction) {
 		err_printk(dev, entry, "DMA-API: device driver syncs "
 				"DMA memory with different direction "
 				"[device address=0x%016llx] [size=%llu bytes] "
 				"[mapped with %s] [synced with %s]\n",
-				(unsigned long long)addr, entry->size,
+				(unsigned long long)ref->dev_addr, entry->size,
 				dir2name[entry->direction],
-				dir2name[direction]);
+				dir2name[ref->direction]);
 	}
 
 	if (entry->direction == DMA_BIDIRECTIONAL)
 		goto out;
 
 	if (to_cpu && !(entry->direction == DMA_FROM_DEVICE) &&
-		      !(direction == DMA_TO_DEVICE))
+		      !(ref->direction == DMA_TO_DEVICE))
 		err_printk(dev, entry, "DMA-API: device driver syncs "
 				"device read-only DMA memory for cpu "
 				"[device address=0x%016llx] [size=%llu bytes] "
 				"[mapped with %s] [synced with %s]\n",
-				(unsigned long long)addr, entry->size,
+				(unsigned long long)ref->dev_addr, entry->size,
 				dir2name[entry->direction],
-				dir2name[direction]);
+				dir2name[ref->direction]);
 
 	if (!to_cpu && !(entry->direction == DMA_TO_DEVICE) &&
-		       !(direction == DMA_FROM_DEVICE))
+		       !(ref->direction == DMA_FROM_DEVICE))
 		err_printk(dev, entry, "DMA-API: device driver syncs "
 				"device write-only DMA memory to device "
 				"[device address=0x%016llx] [size=%llu bytes] "
 				"[mapped with %s] [synced with %s]\n",
-				(unsigned long long)addr, entry->size,
+				(unsigned long long)ref->dev_addr, entry->size,
 				dir2name[entry->direction],
-				dir2name[direction]);
+				dir2name[ref->direction]);
 
 out:
 	put_hash_bucket(bucket, &flags);
@@ -1036,19 +1033,16 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 }
 EXPORT_SYMBOL(debug_dma_map_sg);
 
-static int get_nr_mapped_entries(struct device *dev, struct scatterlist *s)
+static int get_nr_mapped_entries(struct device *dev,
+				 struct dma_debug_entry *ref)
 {
-	struct dma_debug_entry *entry, ref;
+	struct dma_debug_entry *entry;
 	struct hash_bucket *bucket;
 	unsigned long flags;
 	int mapped_ents;
 
-	ref.dev      = dev;
-	ref.dev_addr = sg_dma_address(s);
-	ref.size     = sg_dma_len(s),
-
-	bucket       = get_hash_bucket(&ref, &flags);
-	entry        = hash_bucket_find(bucket, &ref);
+	bucket       = get_hash_bucket(ref, &flags);
+	entry        = hash_bucket_find(bucket, ref);
 	mapped_ents  = 0;
 
 	if (entry)
@@ -1076,16 +1070,14 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = dir,
-			.sg_call_ents   = 0,
+			.sg_call_ents   = nelems,
 		};
 
 		if (mapped_ents && i >= mapped_ents)
 			break;
 
-		if (!i) {
-			ref.sg_call_ents = nelems;
-			mapped_ents = get_nr_mapped_entries(dev, s);
-		}
+		if (!i)
+			mapped_ents = get_nr_mapped_entries(dev, &ref);
 
 		check_unmap(&ref);
 	}
@@ -1140,10 +1132,19 @@ EXPORT_SYMBOL(debug_dma_free_coherent);
 void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
 				   size_t size, int direction)
 {
+	struct dma_debug_entry ref;
+
 	if (unlikely(global_disable))
 		return;
 
-	check_sync(dev, dma_handle, size, 0, direction, true);
+	ref.type         = dma_debug_single;
+	ref.dev          = dev;
+	ref.dev_addr     = dma_handle;
+	ref.size         = size;
+	ref.direction    = direction;
+	ref.sg_call_ents = 0;
+
+	check_sync(dev, &ref, true);
 }
 EXPORT_SYMBOL(debug_dma_sync_single_for_cpu);
 
@@ -1151,10 +1152,19 @@ void debug_dma_sync_single_for_device(struct device *dev,
 				      dma_addr_t dma_handle, size_t size,
 				      int direction)
 {
+	struct dma_debug_entry ref;
+
 	if (unlikely(global_disable))
 		return;
 
-	check_sync(dev, dma_handle, size, 0, direction, false);
+	ref.type         = dma_debug_single;
+	ref.dev          = dev;
+	ref.dev_addr     = dma_handle;
+	ref.size         = size;
+	ref.direction    = direction;
+	ref.sg_call_ents = 0;
+
+	check_sync(dev, &ref, false);
 }
 EXPORT_SYMBOL(debug_dma_sync_single_for_device);
 
@@ -1163,10 +1173,19 @@ void debug_dma_sync_single_range_for_cpu(struct device *dev,
 					 unsigned long offset, size_t size,
 					 int direction)
 {
+	struct dma_debug_entry ref;
+
 	if (unlikely(global_disable))
 		return;
 
-	check_sync(dev, dma_handle, size, offset, direction, true);
+	ref.type         = dma_debug_single;
+	ref.dev          = dev;
+	ref.dev_addr     = dma_handle;
+	ref.size         = offset + size;
+	ref.direction    = direction;
+	ref.sg_call_ents = 0;
+
+	check_sync(dev, &ref, true);
 }
 EXPORT_SYMBOL(debug_dma_sync_single_range_for_cpu);
 
@@ -1175,10 +1194,19 @@ void debug_dma_sync_single_range_for_device(struct device *dev,
 					    unsigned long offset,
 					    size_t size, int direction)
 {
+	struct dma_debug_entry ref;
+
 	if (unlikely(global_disable))
 		return;
 
-	check_sync(dev, dma_handle, size, offset, direction, false);
+	ref.type         = dma_debug_single;
+	ref.dev          = dev;
+	ref.dev_addr     = dma_handle;
+	ref.size         = offset + size;
+	ref.direction    = direction;
+	ref.sg_call_ents = 0;
+
+	check_sync(dev, &ref, false);
 }
 EXPORT_SYMBOL(debug_dma_sync_single_range_for_device);
 
@@ -1192,14 +1220,24 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		return;
 
 	for_each_sg(sg, s, nelems, i) {
+
+		struct dma_debug_entry ref = {
+			.type           = dma_debug_sg,
+			.dev            = dev,
+			.paddr          = sg_phys(s),
+			.dev_addr       = sg_dma_address(s),
+			.size           = sg_dma_len(s),
+			.direction      = direction,
+			.sg_call_ents   = nelems,
+		};
+
 		if (!i)
-			mapped_ents = get_nr_mapped_entries(dev, s);
+			mapped_ents = get_nr_mapped_entries(dev, &ref);
 
 		if (i >= mapped_ents)
 			break;
 
-		check_sync(dev, sg_dma_address(s), sg_dma_len(s), 0,
-			   direction, true);
+		check_sync(dev, &ref, true);
 	}
 }
 EXPORT_SYMBOL(debug_dma_sync_sg_for_cpu);
@@ -1214,14 +1252,23 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		return;
 
 	for_each_sg(sg, s, nelems, i) {
+
+		struct dma_debug_entry ref = {
+			.type           = dma_debug_sg,
+			.dev            = dev,
+			.paddr          = sg_phys(s),
+			.dev_addr       = sg_dma_address(s),
+			.size           = sg_dma_len(s),
+			.direction      = direction,
+			.sg_call_ents   = nelems,
+		};
 		if (!i)
-			mapped_ents = get_nr_mapped_entries(dev, s);
+			mapped_ents = get_nr_mapped_entries(dev, &ref);
 
 		if (i >= mapped_ents)
 			break;
 
-		check_sync(dev, sg_dma_address(s), sg_dma_len(s), 0,
-			   direction, false);
+		check_sync(dev, &ref, false);
 	}
 }
 EXPORT_SYMBOL(debug_dma_sync_sg_for_device);
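
The best-fit matching that the hash_bucket_find() hunk above extends to
sg_call_ents can be sketched in userspace roughly as follows. This is
an illustration under stated assumptions, not the kernel code: struct
dbg_entry, match_level() and best_fit() are hypothetical names, the
lookup keys on dev_addr only, last_lvl starts at -1 so that an entry
matching on address alone is still a candidate, and the kernel's
bookkeeping of how many entries matched is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct dma_debug_entry. */
struct dbg_entry {
	unsigned long long dev_addr;
	unsigned long long size;
	int type;
	int direction;
	int sg_call_ents;
};

/* Count how many of the four secondary fields agree (0..4). */
static int match_level(const struct dbg_entry *e, const struct dbg_entry *ref)
{
	int lvl = 0;

	lvl += (e->size == ref->size);
	lvl += (e->type == ref->type);
	lvl += (e->direction == ref->direction);
	lvl += (e->sg_call_ents == ref->sg_call_ents);
	return lvl;
}

/*
 * Among entries with the same dev_addr, return the one that agrees
 * with ref on the most fields. A perfect fit (all four fields) is
 * returned immediately; on a tie the earliest candidate is kept.
 */
static const struct dbg_entry *
best_fit(const struct dbg_entry *tab, size_t n, const struct dbg_entry *ref)
{
	const struct dbg_entry *best = NULL;
	int last_lvl = -1;
	size_t i;

	assert(ref != NULL);
	assert(tab != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int lvl;

		if (tab[i].dev_addr != ref->dev_addr)
			continue;

		lvl = match_level(&tab[i], ref);
		if (lvl == 4)
			return &tab[i];	/* perfect fit */

		if (lvl > last_lvl) {
			last_lvl = lvl;
			best = &tab[i];
		}
	}

	return best;
}
```

With sg_call_ents part of the comparison, two mappings of the same
address and size that differ only in the number of scatterlist entries
no longer both count as a perfect fit.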

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2009-05-18 19:20   ` Thomas Gleixner
  2009-05-19 20:52     ` Linus Torvalds
@ 2009-05-19 22:20     ` Darren Hart
  1 sibling, 0 replies; 97+ messages in thread
From: Darren Hart @ 2009-05-19 22:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra

Thomas Gleixner wrote:
> Linus,
> 
> On Mon, 18 May 2009, Linus Torvalds wrote:
>> On Mon, 18 May 2009, Ingo Molnar wrote:
>>> Thomas Gleixner (1):
>>>       futex: futex mapping needs to be writable
>> I do not believe this is right.
> 
> You are right to believe that :)
> 
>> Just a few lines later, we have:
>>
>>          * NOTE: When userspace waits on a MAP_SHARED mapping, even if
>>          * it's a read-only handle, it's expected that futexes attach to   
>>          * the object not the particular process.
>>
>> note how we are _supposed_ to be able to wait for something that is 
>> read-only. As such, asking for a writable page is bogus.
> 
> We write to the user space address in various places in the futex
> functions, so we really need a writeable mapping for a bunch of the
> futex ops. 
> 
> We have an explicit check for the private futexes a few lines up.
> 
>       if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))))
> 
> There are some of the futex ops which can be done with a RO mapping
> though. I'm not sure if it makes much sense as user space (at least
> glibc) always writes the variable and/or the surrounding members in
> the user space data structures, but for now we should leave it RO for
> the ones which do not modify the user value.
> 
>> I'm not going to pull this. I can well imagine that there was a real bug, 
>> but this is _not_ the real fix.
>>
>> The commentary is also TOTAL CRAP as far as I can tell. It starts out 
>> with:
>>
>>     commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf (futex: use
>>     fast_gup()) calls get_user_pages_fast() with the write argument set to
>>     0. This went unnoticed [...]
>>
>> and that is pure and utter SHIT. The fact is, the write argument was 
>> ALWAYS zero, and commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf has 
>> nothing to do with anything what-so-ever, and nothing went unnoticed 
>> anywhere.
> 
> Sorry, I misread the GUP commit.
> 
>> The real bug was apparently just commit e4dc5b7a3 ("clean up").
> 
> So we always had that write=0 mapping. And we did not notice that we
> always faulted in the futex functions which modify the user space
> variable simply because the fault was fixed up in the private futex
> fault handling code. The removal of that code led to the problem which
> we have right now.
> 
> Correct fix below.
> 
> Thanks,
> 
> 	tglx
> ------>
> futex: setup writeable mapping for futex ops which modify user space data
> 
> The futex code installs a read only mapping via get_user_pages_fast()
> even if the futex op function has to modify user space data. The
> eventual fault was fixed up by futex_handle_fault() which walked the
> VMA with mmap_sem held.
> 
> After the cleanup patches which removed the mmap_sem dependency of the
> futex code, commit e4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
> clean up fault logic) removed the private VMA walk logic from the
> futex code. This change results in a stale RO mapping which is not
> fixed up.
> 
> Instead of reintroducing the previous fault logic we set up the
> mapping in get_user_pages_fast() read/write for all operations which
> modify user space data. Also handle private futexes in the same way
> and make the current unconditional access_ok(VERIFY_WRITE) depend on
> the futex op.
> 
> Reported-by: Andreas Schwab <schwab@linux-m68k.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

This looks to be an elegant fix to me and maintains the fault logic 
cleanup that actually caused the stale RO mapping issue mentioned above.
My only comment would be that it further obsoletes some commentary in
functions like futex_lock_pi() about how fault handling is done.  I'll 
just add this to my list of futex commentary to fix!

Acked-by: Darren Hart <dvhltc@us.ibm.com>

> CC: stable@kernel.org
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index eef8cd2..3d7519d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -193,6 +193,7 @@ static void drop_futex_key_refs(union futex_key *key)
>   * @uaddr: virtual address of the futex
>   * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
>   * @key: address where result is stored.
> + * @rw: mapping needs to be read/write (values: VERIFY_READ, VERIFY_WRITE)
>   *
>   * Returns a negative error code or 0
>   * The key words are stored in *key on success.
> @@ -203,7 +204,8 @@ static void drop_futex_key_refs(union futex_key *key)
>   *
>   * lock_page() might sleep, the caller should not hold a spinlock.
>   */
> -static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
> +static int
> +get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
>  {
>  	unsigned long address = (unsigned long)uaddr;
>  	struct mm_struct *mm = current->mm;
> @@ -226,7 +228,7 @@ static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
>  	 *        but access_ok() should be faster than find_vma()
>  	 */
>  	if (!fshared) {
> -		if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))))
> +		if (unlikely(!access_ok(rw, uaddr, sizeof(u32))))
>  			return -EFAULT;
>  		key->private.mm = mm;
>  		key->private.address = address;
> @@ -235,7 +237,7 @@ static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
>  	}
> 
>  again:
> -	err = get_user_pages_fast(address, 1, 0, &page);
> +	err = get_user_pages_fast(address, 1, rw == VERIFY_WRITE, &page);
>  	if (err < 0)
>  		return err;
> 
> @@ -677,7 +679,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
>  	if (!bitset)
>  		return -EINVAL;
> 
> -	ret = get_futex_key(uaddr, fshared, &key);
> +	ret = get_futex_key(uaddr, fshared, &key, VERIFY_READ);
>  	if (unlikely(ret != 0))
>  		goto out;
> 
> @@ -723,10 +725,10 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
>  	int ret, op_ret;
> 
>  retry:
> -	ret = get_futex_key(uaddr1, fshared, &key1);
> +	ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
>  	if (unlikely(ret != 0))
>  		goto out;
> -	ret = get_futex_key(uaddr2, fshared, &key2);
> +	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
>  	if (unlikely(ret != 0))
>  		goto out_put_key1;
> 
> @@ -814,10 +816,10 @@ static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
>  	int ret, drop_count = 0;
> 
>  retry:
> -	ret = get_futex_key(uaddr1, fshared, &key1);
> +	ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
>  	if (unlikely(ret != 0))
>  		goto out;
> -	ret = get_futex_key(uaddr2, fshared, &key2);
> +	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_READ);
>  	if (unlikely(ret != 0))
>  		goto out_put_key1;
> 
> @@ -1140,7 +1142,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
>  	q.bitset = bitset;
>  retry:
>  	q.key = FUTEX_KEY_INIT;
> -	ret = get_futex_key(uaddr, fshared, &q.key);
> +	ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_READ);
>  	if (unlikely(ret != 0))
>  		goto out;
> 
> @@ -1330,7 +1332,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
>  	q.pi_state = NULL;
>  retry:
>  	q.key = FUTEX_KEY_INIT;
> -	ret = get_futex_key(uaddr, fshared, &q.key);
> +	ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_WRITE);
>  	if (unlikely(ret != 0))
>  		goto out;
> 
> @@ -1594,7 +1596,7 @@ retry:
>  	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
>  		return -EPERM;
> 
> -	ret = get_futex_key(uaddr, fshared, &key);
> +	ret = get_futex_key(uaddr, fshared, &key, VERIFY_WRITE);
>  	if (unlikely(ret != 0))
>  		goto out;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2009-05-19 20:52     ` Linus Torvalds
@ 2009-05-19 21:45       ` Thomas Gleixner
  0 siblings, 0 replies; 97+ messages in thread
From: Thomas Gleixner @ 2009-05-19 21:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra

On Tue, 19 May 2009, Linus Torvalds wrote:
> Should I take it like this, or pull from something?

Either way. It's ready to pull from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/urgent

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2009-05-18 19:20   ` Thomas Gleixner
@ 2009-05-19 20:52     ` Linus Torvalds
  2009-05-19 21:45       ` Thomas Gleixner
  2009-05-19 22:20     ` Darren Hart
  1 sibling, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-05-19 20:52 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra



On Mon, 18 May 2009, Thomas Gleixner wrote:
> 
> So we always had that write=0 mapping. And we did not notice that we
> always faulted in the futex functions which modify the user space
> variable simply because the fault was fixed up in the private futex
> fault handling code. The removal of that code led to the problem which
> we have right now.
> 
> Correct fix below.

Ok, this looks fine to me. 

Should I take it like this, or pull from something?

		Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2009-05-18 15:48 ` Linus Torvalds
@ 2009-05-18 19:20   ` Thomas Gleixner
  2009-05-19 20:52     ` Linus Torvalds
  2009-05-19 22:20     ` Darren Hart
  0 siblings, 2 replies; 97+ messages in thread
From: Thomas Gleixner @ 2009-05-18 19:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra

Linus,

On Mon, 18 May 2009, Linus Torvalds wrote:
> On Mon, 18 May 2009, Ingo Molnar wrote:
> > 
> > Thomas Gleixner (1):
> >       futex: futex mapping needs to be writable
> 
> I do not believe this is right.

You are right to believe that :)

> Just a few lines later, we have:
> 
>          * NOTE: When userspace waits on a MAP_SHARED mapping, even if
>          * it's a read-only handle, it's expected that futexes attach to   
>          * the object not the particular process.
> 
> note how we are _supposed_ to be able to wait for something that is 
> read-only. As such, asking for a writable page is bogus.

We write to the user space address in various places in the futex
functions, so we really need a writeable mapping for a bunch of the
futex ops. 

We have an explicit check for the private futexes a few lines up.

      if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))))

There are some of the futex ops which can be done with a RO mapping
though. I'm not sure if it makes much sense as user space (at least
glibc) always writes the variable and/or the surrounding members in
the user space data structures, but for now we should leave it RO for
the ones which do not modify the user value.

> I'm not going to pull this. I can well imagine that there was a real bug, 
> but this is _not_ the real fix.
> 
> The commentary is also TOTAL CRAP as far as I can tell. It starts out 
> with:
> 
>     commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf (futex: use
>     fast_gup()) calls get_user_pages_fast() with the write argument set to
>     0. This went unnoticed [...]
> 
> and that is pure and utter SHIT. The fact is, the write argument was 
> ALWAYS zero, and commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf has 
> nothing to do with anything what-so-ever, and nothing went unnoticed 
> anywhere.

Sorry, I misread the GUP commit.

> The real bug was apparently just commit e4dc5b7a3 ("clean up").

So we always had that write=0 mapping. And we did not notice that we
always faulted in the futex functions which modify the user space
variable simply because the fault was fixed up in the private futex
fault handling code. The removal of that code led to the problem which
we have right now.

Correct fix below.

Thanks,

	tglx
------>
futex: setup writeable mapping for futex ops which modify user space data

The futex code installs a read only mapping via get_user_pages_fast()
even if the futex op function has to modify user space data. The
eventual fault was fixed up by futex_handle_fault() which walked the
VMA with mmap_sem held.

After the cleanup patches which removed the mmap_sem dependency of the
futex code, commit e4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex:
clean up fault logic) removed the private VMA walk logic from the
futex code. This change results in a stale RO mapping which is not
fixed up.

Instead of reintroducing the previous fault logic we set up the
mapping in get_user_pages_fast() read/write for all operations which
modify user space data. Also handle private futexes in the same way
and make the current unconditional access_ok(VERIFY_WRITE) depend on
the futex op.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org

diff --git a/kernel/futex.c b/kernel/futex.c
index eef8cd2..3d7519d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -193,6 +193,7 @@ static void drop_futex_key_refs(union futex_key *key)
  * @uaddr: virtual address of the futex
  * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
  * @key: address where result is stored.
+ * @rw: mapping needs to be read/write (values: VERIFY_READ, VERIFY_WRITE)
  *
  * Returns a negative error code or 0
  * The key words are stored in *key on success.
@@ -203,7 +204,8 @@ static void drop_futex_key_refs(union futex_key *key)
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
  */
-static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
+static int
+get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
@@ -226,7 +228,7 @@ static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 	 *        but access_ok() should be faster than find_vma()
 	 */
 	if (!fshared) {
-		if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))))
+		if (unlikely(!access_ok(rw, uaddr, sizeof(u32))))
 			return -EFAULT;
 		key->private.mm = mm;
 		key->private.address = address;
@@ -235,7 +237,7 @@ static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 	}
 
 again:
-	err = get_user_pages_fast(address, 1, 0, &page);
+	err = get_user_pages_fast(address, 1, rw == VERIFY_WRITE, &page);
 	if (err < 0)
 		return err;
 
@@ -677,7 +679,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, fshared, &key, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -723,10 +725,10 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	int ret, op_ret;
 
 retry:
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -814,10 +816,10 @@ static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	int ret, drop_count = 0;
 
 retry:
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -1140,7 +1142,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 	q.bitset = bitset;
 retry:
 	q.key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q.key);
+	ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1330,7 +1332,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	q.pi_state = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q.key);
+	ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1594,7 +1596,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, fshared, &key, VERIFY_WRITE);
 	if (unlikely(ret != 0))
 		goto out;
 

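The per-call-site access choice made by the patch above can be
summarised as a small lookup. This is a userspace sketch, not kernel
code: enum access_mode stands in for VERIFY_READ/VERIFY_WRITE, and the
futex_path names, required_access() and gup_write_flag() are
hypothetical. The values follow the hunks shown; the final hunk, whose
enclosing function is not named in its header, also passes
VERIFY_WRITE and is not listed here.

```c
#include <assert.h>

/* Stand-ins for VERIFY_READ / VERIFY_WRITE. */
enum access_mode { ACCESS_READ, ACCESS_WRITE };

/* One entry per get_futex_key() call site named in the patch. */
enum futex_path {
	PATH_WAKE,
	PATH_WAKE_OP_UADDR1,
	PATH_WAKE_OP_UADDR2,
	PATH_REQUEUE_UADDR1,
	PATH_REQUEUE_UADDR2,
	PATH_WAIT,
	PATH_LOCK_PI,
	PATH_COUNT
};

/* Access mode each call site requests, as read from the diff. */
static enum access_mode required_access(enum futex_path p)
{
	assert(p >= PATH_WAKE && p < PATH_COUNT);

	switch (p) {
	case PATH_WAKE_OP_UADDR2:	/* futex_wake_op() modifies *uaddr2 */
	case PATH_LOCK_PI:		/* futex_lock_pi() modifies *uaddr */
		return ACCESS_WRITE;
	default:			/* wake, requeue and wait only read */
		return ACCESS_READ;
	}
}

/* Mirrors the "rw == VERIFY_WRITE" write argument in the patch. */
static int gup_write_flag(enum access_mode rw)
{
	return rw == ACCESS_WRITE;
}
```

A waiter on a read-only mapping therefore still asks for read access
only, while the operations that store to user space request a writable
page up front.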

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [GIT PULL] core kernel fixes
  2009-05-18 14:23 Ingo Molnar
@ 2009-05-18 15:48 ` Linus Torvalds
  2009-05-18 19:20   ` Thomas Gleixner
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2009-05-18 15:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra



On Mon, 18 May 2009, Ingo Molnar wrote:
> 
> Thomas Gleixner (1):
>       futex: futex mapping needs to be writable

I do not believe this is right.

Just a few lines later, we have:

         * NOTE: When userspace waits on a MAP_SHARED mapping, even if
         * it's a read-only handle, it's expected that futexes attach to   
         * the object not the particular process.

note how we are _supposed_ to be able to wait for something that is 
read-only. As such, asking for a writable page is bogus.

I'm not going to pull this. I can well imagine that there was a real bug, 
but this is _not_ the real fix.

The commentary is also TOTAL CRAP as far as I can tell. It starts out 
with:

    commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf (futex: use
    fast_gup()) calls get_user_pages_fast() with the write argument set to
    0. This went unnoticed [...]

and that is pure and utter SHIT. The fact is, the write argument was 
ALWAYS zero, and commit 734b05b10e51d4ba38c8fc3ee02e846aab09eedf has 
nothing to do with anything what-so-ever, and nothing went unnoticed 
anywhere.

The real bug was apparently just commit e4dc5b7a3 ("clean up").

I also have to object to the "Impact" line of that commit. That line is 
nonsensical and stupid. I hate to bring up this discussion again, but 
dammit, if those Impact lines are crap, then they are crap and should not 
be there! 

The fact that they _look_ nicer and do not break up the story any more 
doesn't change that fact. If you cannot write sane and meaningful impact 
lines, then f*ck me with a spoon - JUST DON'T DO THEM!

I'm upset. Quite frankly, there are just _so_ many things wrong with that 
commit that I get angry when it is sent this late in the game. 

		Linus

* [GIT PULL] core kernel fixes
@ 2009-05-18 14:23 Ingo Molnar
  2009-05-18 15:48 ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2009-05-18 14:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Ingo Molnar (1):
      lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS

Thomas Gleixner (1):
      futex: futex mapping needs to be writable


 kernel/futex.c             |    2 +-
 kernel/lockdep_internals.h |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index eef8cd2..09db3ae 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -235,7 +235,7 @@ static int get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 	}
 
 again:
-	err = get_user_pages_fast(address, 1, 0, &page);
+	err = get_user_pages_fast(address, 1, 1, &page);
 	if (err < 0)
 		return err;
 
diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index a2cc7e9..699a2ac 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -54,9 +54,9 @@ enum {
  * table (if it's not there yet), and we check it for lock order
  * conflicts and deadlocks.
  */
-#define MAX_LOCKDEP_ENTRIES	8192UL
+#define MAX_LOCKDEP_ENTRIES	16384UL
 
-#define MAX_LOCKDEP_CHAINS_BITS	14
+#define MAX_LOCKDEP_CHAINS_BITS	15
 #define MAX_LOCKDEP_CHAINS	(1UL << MAX_LOCKDEP_CHAINS_BITS)
 
 #define MAX_LOCKDEP_CHAIN_HLOCKS (MAX_LOCKDEP_CHAINS*5)
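
For reference, the arithmetic behind the lockdep limits raised in the hunk above can be checked with a small userspace sketch. The macro names and post-patch values are copied from the diff; the helpers lockdep_chains_for_bits() and lockdep_print_limits() are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Post-patch values, as quoted in the kernel/lockdep_internals.h hunk. */
#define MAX_LOCKDEP_ENTRIES		16384UL
#define MAX_LOCKDEP_CHAINS_BITS		15
#define MAX_LOCKDEP_CHAINS		(1UL << MAX_LOCKDEP_CHAINS_BITS)
#define MAX_LOCKDEP_CHAIN_HLOCKS	(MAX_LOCKDEP_CHAINS*5)

/* Hypothetical helper: number of chains for a given bit width. */
static unsigned long lockdep_chains_for_bits(unsigned int bits)
{
	return 1UL << bits;
}

/* Hypothetical helper: print the old and new limits side by side. */
static void lockdep_print_limits(void)
{
	/* 14 and 8192UL are the pre-patch values from the removed lines. */
	printf("entries:      %lu -> %lu\n", 8192UL, MAX_LOCKDEP_ENTRIES);
	printf("chains:       %lu -> %lu\n",
	       lockdep_chains_for_bits(14), MAX_LOCKDEP_CHAINS);
	printf("chain hlocks: %lu -> %lu\n",
	       lockdep_chains_for_bits(14) * 5, MAX_LOCKDEP_CHAIN_HLOCKS);
}
```

Each table doubles: entries go from 8192 to 16384, chains from 16384 to 32768, and the hlock table (five per chain) from 81920 to 163840.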

* [GIT PULL] core kernel fixes
@ 2009-05-05  9:33 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-05-05  9:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
H Hartley Sweeten (1):
      kernel/posix-cpu-timers.c: fix sparse warning

Joerg Roedel (1):
      dma-debug: remove broken dma memory leak detection for 2.6.30

Ming Lei (1):
      locking: Documentation: lockdep-design.txt, fix note of state bits


 Documentation/lockdep-design.txt |    6 ++--
 kernel/posix-cpu-timers.c        |    8 +++---
 lib/dma-debug.c                  |   53 +-------------------------------------
 3 files changed, 8 insertions(+), 59 deletions(-)

diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index 938ea22..e20d913 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -54,9 +54,9 @@ locking error messages, inside curlies. A contrived example:
 The bit position indicates STATE, STATE-read, for each of the states listed
 above, and the character displayed in each indicates:
 
-   '.'  acquired while irqs disabled
-   '+'  acquired in irq context
-   '-'  acquired with irqs enabled
+   '.'  acquired while irqs disabled and not in irq context
+   '-'  acquired in irq context
+   '+'  acquired with irqs enabled
    '?'  acquired in irq context with irqs enabled.
 
 Unused mutexes cannot be part of the cause of an error.
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c9dcf98..bece7c0 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1420,19 +1420,19 @@ void run_posix_cpu_timers(struct task_struct *tsk)
 	 * timer call will interfere.
 	 */
 	list_for_each_entry_safe(timer, next, &firing, it.cpu.entry) {
-		int firing;
+		int cpu_firing;
+
 		spin_lock(&timer->it_lock);
 		list_del_init(&timer->it.cpu.entry);
-		firing = timer->it.cpu.firing;
+		cpu_firing = timer->it.cpu.firing;
 		timer->it.cpu.firing = 0;
 		/*
 		 * The firing flag is -1 if we collided with a reset
 		 * of the timer, which already reported this
 		 * almost-firing as an overrun.  So don't generate an event.
 		 */
-		if (likely(firing >= 0)) {
+		if (likely(cpu_firing >= 0))
 			cpu_timer_fire(timer);
-		}
 		spin_unlock(&timer->it_lock);
 	}
 }
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index d3da7ed..69da09a 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -400,60 +400,9 @@ out_err:
 	return -ENOMEM;
 }
 
-static int device_dma_allocations(struct device *dev)
-{
-	struct dma_debug_entry *entry;
-	unsigned long flags;
-	int count = 0, i;
-
-	for (i = 0; i < HASH_SIZE; ++i) {
-		spin_lock_irqsave(&dma_entry_hash[i].lock, flags);
-		list_for_each_entry(entry, &dma_entry_hash[i].list, list) {
-			if (entry->dev == dev)
-				count += 1;
-		}
-		spin_unlock_irqrestore(&dma_entry_hash[i].lock, flags);
-	}
-
-	return count;
-}
-
-static int dma_debug_device_change(struct notifier_block *nb,
-				    unsigned long action, void *data)
-{
-	struct device *dev = data;
-	int count;
-
-
-	switch (action) {
-	case BUS_NOTIFY_UNBIND_DRIVER:
-		count = device_dma_allocations(dev);
-		if (count == 0)
-			break;
-		err_printk(dev, NULL, "DMA-API: device driver has pending "
-				"DMA allocations while released from device "
-				"[count=%d]\n", count);
-		break;
-	default:
-		break;
-	}
-
-	return 0;
-}
-
 void dma_debug_add_bus(struct bus_type *bus)
 {
-	struct notifier_block *nb;
-
-	nb = kzalloc(sizeof(struct notifier_block), GFP_KERNEL);
-	if (nb == NULL) {
-		printk(KERN_ERR "dma_debug_add_bus: out of memory\n");
-		return;
-	}
-
-	nb->notifier_call = dma_debug_device_change;
-
-	bus_register_notifier(bus, nb);
+	/* FIXME: register notifier */
 }
 
 /*
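
The state-bit legend corrected in the Documentation/lockdep-design.txt hunk above can be read as a two-input mapping. A minimal sketch, assuming the two inputs are "was ever acquired in irq context" and "was ever acquired with irqs enabled"; the function name lockdep_state_char() is hypothetical and is not claimed to be the kernel's own helper.

```c
#include <stdbool.h>

/*
 * Map the two usage facts onto the character shown in lockdep reports,
 * following the corrected legend:
 *   '.'  acquired while irqs disabled and not in irq context
 *   '-'  acquired in irq context
 *   '+'  acquired with irqs enabled
 *   '?'  acquired in irq context with irqs enabled
 */
static char lockdep_state_char(bool in_irq_context, bool irqs_enabled)
{
	if (in_irq_context && irqs_enabled)
		return '?';
	if (in_irq_context)
		return '-';
	if (irqs_enabled)
		return '+';
	return '.';
}
```

The pre-patch text had '+' and '-' swapped, which is the whole of the documentation fix.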

* [git pull] core kernel fixes
@ 2009-01-30 23:12 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-01-30 23:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Ed Swierk (1):
      signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal()

Rusty Russell (1):
      cpumask: convert lib/smp_processor_id to new cpumask ops

Steven Rostedt (1):
      generic-ipi: use per cpu data for single cpu ipi calls


 kernel/signal.c        |    2 ++
 kernel/smp.c           |   36 +++++++++++++++++++++++++++++++++---
 lib/smp_processor_id.c |    2 +-
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index e737597..b6b3676 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -909,7 +909,9 @@ static void print_fatal_signal(struct pt_regs *regs, int signr)
 	}
 #endif
 	printk("\n");
+	preempt_disable();
 	show_regs(regs);
+	preempt_enable();
 }
 
 static int __init setup_print_fatal_signals(char *str)
diff --git a/kernel/smp.c b/kernel/smp.c
index 5cfa0e5..bbedbb7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -18,6 +18,7 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(call_function_lock);
 enum {
 	CSD_FLAG_WAIT		= 0x01,
 	CSD_FLAG_ALLOC		= 0x02,
+	CSD_FLAG_LOCK		= 0x04,
 };
 
 struct call_function_data {
@@ -186,6 +187,9 @@ void generic_smp_call_function_single_interrupt(void)
 			if (data_flags & CSD_FLAG_WAIT) {
 				smp_wmb();
 				data->flags &= ~CSD_FLAG_WAIT;
+			} else if (data_flags & CSD_FLAG_LOCK) {
+				smp_wmb();
+				data->flags &= ~CSD_FLAG_LOCK;
 			} else if (data_flags & CSD_FLAG_ALLOC)
 				kfree(data);
 		}
@@ -196,6 +200,8 @@ void generic_smp_call_function_single_interrupt(void)
 	}
 }
 
+static DEFINE_PER_CPU(struct call_single_data, csd_data);
+
 /*
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
@@ -224,14 +230,38 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		func(info);
 		local_irq_restore(flags);
 	} else if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
-		struct call_single_data *data = NULL;
+		struct call_single_data *data;
 
 		if (!wait) {
+			/*
+			 * We are calling a function on a single CPU
+			 * and we are not going to wait for it to finish.
+			 * We first try to allocate the data, but if we
+			 * fail, we fall back to use a per cpu data to pass
+			 * the information to that CPU. Since all callers
+			 * of this code will use the same data, we must
+			 * synchronize the callers to prevent a new caller
+			 * from corrupting the data before the callee
+			 * can access it.
+			 *
+			 * The CSD_FLAG_LOCK is used to let us know when
+			 * the IPI handler is done with the data.
+			 * The first caller will set it, and the callee
+			 * will clear it. The next caller must wait for
+			 * it to clear before we set it again. This
+			 * will make sure the callee is done with the
+			 * data before a new caller will use it.
+			 */
 			data = kmalloc(sizeof(*data), GFP_ATOMIC);
 			if (data)
 				data->flags = CSD_FLAG_ALLOC;
-		}
-		if (!data) {
+			else {
+				data = &per_cpu(csd_data, me);
+				while (data->flags & CSD_FLAG_LOCK)
+					cpu_relax();
+				data->flags = CSD_FLAG_LOCK;
+			}
+		} else {
 			data = &d;
 			data->flags = CSD_FLAG_WAIT;
 		}
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 0f8fc22..4689cb0 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
-	if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
+	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
 		goto out;
 
 	/*
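
The CSD_FLAG_LOCK handshake described in the comment added to kernel/smp.c above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel code: struct csd_slot, csd_slot_claim(), csd_slot_run() and csd_demo_bump() are hypothetical names, the slot is assumed to have a single claimer at a time (as per-CPU data would), and the delivery of the slot to the handler (the IPI in the kernel) is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define CSD_FLAG_LOCK	0x04u	/* same value as in the patch */

/* Hypothetical stand-in for struct call_single_data. */
struct csd_slot {
	atomic_uint flags;
	void (*func)(void *info);
	void *info;
};

/* Caller side: wait until the previous user is done, then take the slot. */
static void csd_slot_claim(struct csd_slot *slot,
			   void (*func)(void *info), void *info)
{
	while (atomic_load_explicit(&slot->flags, memory_order_acquire) &
	       CSD_FLAG_LOCK)
		;	/* the kernel calls cpu_relax() in this loop */

	atomic_store_explicit(&slot->flags, CSD_FLAG_LOCK,
			      memory_order_relaxed);
	slot->func = func;
	slot->info = info;
}

/* Callee side: run the function, then clear the lock bit to free the slot. */
static void csd_slot_run(struct csd_slot *slot)
{
	slot->func(slot->info);
	/* Release ordering plays the role of smp_wmb() before the clear. */
	atomic_fetch_and_explicit(&slot->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);
}

/* Demo callback: increment the int that info points to. */
static void csd_demo_bump(void *info)
{
	++*(int *)info;
}
```

The point of the flag is that the slot is reused: a second csd_slot_claim() on the same slot spins until csd_slot_run() has cleared the bit, so the callee never sees half-overwritten data.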

* [git pull] core kernel fixes
@ 2009-01-26 17:24 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-01-26 17:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Jiri Slaby (1):
      relay: fix lock imbalance in relay_late_setup_files

Lai Jiangshan (2):
      rcu: add __cpuinit to rcu_init_percpu_data()
      rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR

Mandeep Singh Baines (1):
      softlock: fix false panic which can occur if softlockup_thresh is reduced

Mike Travis (1):
      rcu: move Kconfig menu

Robert Richter (1):
      oprofile: fix uninitialized use of struct op_entry

Thomas Gleixner (1):
      debugobjects: add and use INIT_WORK_ON_STACK


 arch/x86/kernel/hpet.c        |    3 +-
 drivers/oprofile/cpu_buffer.c |    5 +
 drivers/oprofile/cpu_buffer.h |    7 ++
 include/linux/sched.h         |    3 +
 include/linux/workqueue.h     |    6 ++
 init/Kconfig                  |  179 +++++++++++++++++++++--------------------
 kernel/rcuclassic.c           |    2 +-
 kernel/rcutree.c              |    2 +-
 kernel/relay.c                |    4 +-
 kernel/softlockup.c           |    9 ++
 kernel/sysctl.c               |    2 +-
 lib/Kconfig.debug             |   13 ---
 12 files changed, 130 insertions(+), 105 deletions(-)
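
As a reading aid for the oprofile change in this series: the fix marks an entry as invalid when the reserve step fails, and the later calls check that mark instead of touching uninitialized fields. A minimal userspace sketch of that pattern follows. All names here (struct sample, struct entry, entry_reserve(), entry_add_data(), entry_commit(), SAMPLE_SLOTS) are hypothetical; only the NULL marker, the 0 return from the add path and the -EINVAL return from the commit path are taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

#define SAMPLE_SLOTS	4	/* hypothetical capacity */

struct sample {
	unsigned long data[SAMPLE_SLOTS];
};

struct entry {
	struct sample *event;	/* NULL once a reserve has failed */
	unsigned long *data;	/* next free slot */
	size_t size;		/* free slots remaining */
};

/* Reserve room for 'size' values; on failure only mark the entry invalid. */
static void entry_reserve(struct entry *e, struct sample *buf, size_t size)
{
	if (!buf || size > SAMPLE_SLOTS) {
		e->event = NULL;	/* like "entry->event = NULL" at fail: */
		return;			/* data and size stay uninitialized */
	}
	e->event = buf;
	e->data = buf->data;
	e->size = size;
}

/* Returns the number of values stored: 1 on success, 0 otherwise. */
static int entry_add_data(struct entry *e, unsigned long val)
{
	if (!e->event)		/* guard: never read data/size of a bad entry */
		return 0;
	if (!e->size)
		return 0;
	*e->data++ = val;
	e->size--;
	return 1;
}

static int entry_commit(struct entry *e)
{
	if (!e->event)		/* guard: nothing was reserved */
		return -EINVAL;
	return 0;
}
```

Without the two guards, a caller that ignores a failed reserve would go on to dereference whatever happened to be in data and size.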

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index cd759ad..64d5ad0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -628,11 +628,12 @@ static int hpet_cpuhp_notify(struct notifier_block *n,
 
 	switch (action & 0xf) {
 	case CPU_ONLINE:
-		INIT_DELAYED_WORK(&work.work, hpet_work);
+		INIT_DELAYED_WORK_ON_STACK(&work.work, hpet_work);
 		init_completion(&work.complete);
 		/* FIXME: add schedule_work_on() */
 		schedule_delayed_work_on(cpu, &work.work, 0);
 		wait_for_completion(&work.complete);
+		destroy_timer_on_stack(&work.work.timer);
 		break;
 	case CPU_DEAD:
 		if (hdev) {
diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 2e03b6d..e76d715 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -393,16 +393,21 @@ oprofile_write_reserve(struct op_entry *entry, struct pt_regs * const regs,
 	return;
 
 fail:
+	entry->event = NULL;
 	cpu_buf->sample_lost_overflow++;
 }
 
 int oprofile_add_data(struct op_entry *entry, unsigned long val)
 {
+	if (!entry->event)
+		return 0;
 	return op_cpu_buffer_add_data(entry, val);
 }
 
 int oprofile_write_commit(struct op_entry *entry)
 {
+	if (!entry->event)
+		return -EINVAL;
 	return op_cpu_buffer_write_commit(entry);
 }
 
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index 63f81c4..272995d 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -66,6 +66,13 @@ static inline void op_cpu_buffer_reset(int cpu)
 	cpu_buf->last_task = NULL;
 }
 
+/*
+ * op_cpu_buffer_add_data() and op_cpu_buffer_write_commit() may be
+ * called only if op_cpu_buffer_write_reserve() did not return NULL or
+ * entry->event != NULL, otherwise entry->size or entry->event will be
+ * used uninitialized.
+ */
+
 struct op_sample
 *op_cpu_buffer_write_reserve(struct op_entry *entry, unsigned long size);
 int op_cpu_buffer_write_commit(struct op_entry *entry);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4cae9b8..54cbabf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -293,6 +293,9 @@ extern void sched_show_task(struct task_struct *p);
 extern void softlockup_tick(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_all_softlockup_watchdogs(void);
+extern int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
+				    struct file *filp, void __user *buffer,
+				    size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
 extern unsigned long sysctl_hung_task_check_count;
 extern unsigned long sysctl_hung_task_timeout_secs;
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index b362911..20b59eb 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -124,6 +124,12 @@ struct execute_work {
 		init_timer_deferrable(&(_work)->timer);		\
 	} while (0)
 
+#define INIT_DELAYED_WORK_ON_STACK(_work, _func)		\
+	do {							\
+		INIT_WORK(&(_work)->work, (_func));		\
+		init_timer_on_stack(&(_work)->timer);		\
+	} while (0)
+
 /**
  * work_pending - Find out whether a work item is currently pending
  * @work: The work item in question
diff --git a/init/Kconfig b/init/Kconfig
index 2af8382..3be35f3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -238,6 +238,98 @@ config AUDIT_TREE
 	def_bool y
 	depends on AUDITSYSCALL && INOTIFY
 
+menu "RCU Subsystem"
+
+choice
+	prompt "RCU Implementation"
+	default CLASSIC_RCU
+
+config CLASSIC_RCU
+	bool "Classic RCU"
+	help
+	  This option selects the classic RCU implementation that is
+	  designed for best read-side performance on non-realtime
+	  systems.
+
+	  Select this option if you are unsure.
+
+config TREE_RCU
+	bool "Tree-based hierarchical RCU"
+	help
+	  This option selects the RCU implementation that is
+	  designed for very large SMP system with hundreds or
+	  thousands of CPUs.
+
+config PREEMPT_RCU
+	bool "Preemptible RCU"
+	depends on PREEMPT
+	help
+	  This option reduces the latency of the kernel by making certain
+	  RCU sections preemptible. Normally RCU code is non-preemptible, if
+	  this option is selected then read-only RCU sections become
+	  preemptible. This helps latency, but may expose bugs due to
+	  now-naive assumptions about each RCU read-side critical section
+	  remaining on a given CPU through its execution.
+
+endchoice
+
+config RCU_TRACE
+	bool "Enable tracing for RCU"
+	depends on TREE_RCU || PREEMPT_RCU
+	help
+	  This option provides tracing in RCU which presents stats
+	  in debugfs for debugging RCU implementation.
+
+	  Say Y here if you want to enable RCU tracing
+	  Say N if you are unsure.
+
+config RCU_FANOUT
+	int "Tree-based hierarchical RCU fanout value"
+	range 2 64 if 64BIT
+	range 2 32 if !64BIT
+	depends on TREE_RCU
+	default 64 if 64BIT
+	default 32 if !64BIT
+	help
+	  This option controls the fanout of hierarchical implementations
+	  of RCU, allowing RCU to work efficiently on machines with
+	  large numbers of CPUs.  This value must be at least the cube
+	  root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
+	  systems and up to 262,144 for 64-bit systems.
+
+	  Select a specific number if testing RCU itself.
+	  Take the default if unsure.
+
+config RCU_FANOUT_EXACT
+	bool "Disable tree-based hierarchical RCU auto-balancing"
+	depends on TREE_RCU
+	default n
+	help
+	  This option forces use of the exact RCU_FANOUT value specified,
+	  regardless of imbalances in the hierarchy.  This is useful for
+	  testing RCU itself, and might one day be useful on systems with
+	  strong NUMA behavior.
+
+	  Without RCU_FANOUT_EXACT, the code will balance the hierarchy.
+
+	  Say N if unsure.
+
+config TREE_RCU_TRACE
+	def_bool RCU_TRACE && TREE_RCU
+	select DEBUG_FS
+	help
+	  This option provides tracing for the TREE_RCU implementation,
+	  permitting Makefile to trivially select kernel/rcutree_trace.c.
+
+config PREEMPT_RCU_TRACE
+	def_bool RCU_TRACE && PREEMPT_RCU
+	select DEBUG_FS
+	help
+	  This option provides tracing for the PREEMPT_RCU implementation,
+	  permitting Makefile to trivially select kernel/rcupreempt_trace.c.
+
+endmenu # "RCU Subsystem"
+
 config IKCONFIG
 	tristate "Kernel .config support"
 	---help---
@@ -972,90 +1064,3 @@ source "block/Kconfig"
 config PREEMPT_NOTIFIERS
 	bool
 
-choice
-	prompt "RCU Implementation"
-	default CLASSIC_RCU
-
-config CLASSIC_RCU
-	bool "Classic RCU"
-	help
-	  This option selects the classic RCU implementation that is
-	  designed for best read-side performance on non-realtime
-	  systems.
-
-	  Select this option if you are unsure.
-
-config TREE_RCU
-	bool "Tree-based hierarchical RCU"
-	help
-	  This option selects the RCU implementation that is
-	  designed for very large SMP system with hundreds or
-	  thousands of CPUs.
-
-config PREEMPT_RCU
-	bool "Preemptible RCU"
-	depends on PREEMPT
-	help
-	  This option reduces the latency of the kernel by making certain
-	  RCU sections preemptible. Normally RCU code is non-preemptible, if
-	  this option is selected then read-only RCU sections become
-	  preemptible. This helps latency, but may expose bugs due to
-	  now-naive assumptions about each RCU read-side critical section
-	  remaining on a given CPU through its execution.
-
-endchoice
-
-config RCU_TRACE
-	bool "Enable tracing for RCU"
-	depends on TREE_RCU || PREEMPT_RCU
-	help
-	  This option provides tracing in RCU which presents stats
-	  in debugfs for debugging RCU implementation.
-
-	  Say Y here if you want to enable RCU tracing
-	  Say N if you are unsure.
-
-config RCU_FANOUT
-	int "Tree-based hierarchical RCU fanout value"
-	range 2 64 if 64BIT
-	range 2 32 if !64BIT
-	depends on TREE_RCU
-	default 64 if 64BIT
-	default 32 if !64BIT
-	help
-	  This option controls the fanout of hierarchical implementations
-	  of RCU, allowing RCU to work efficiently on machines with
-	  large numbers of CPUs.  This value must be at least the cube
-	  root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
-	  systems and up to 262,144 for 64-bit systems.
-
-	  Select a specific number if testing RCU itself.
-	  Take the default if unsure.
-
-config RCU_FANOUT_EXACT
-	bool "Disable tree-based hierarchical RCU auto-balancing"
-	depends on TREE_RCU
-	default n
-	help
-	  This option forces use of the exact RCU_FANOUT value specified,
-	  regardless of imbalances in the hierarchy.  This is useful for
-	  testing RCU itself, and might one day be useful on systems with
-	  strong NUMA behavior.
-
-	  Without RCU_FANOUT_EXACT, the code will balance the hierarchy.
-
-	  Say N if unsure.
-
-config TREE_RCU_TRACE
-	def_bool RCU_TRACE && TREE_RCU
-	select DEBUG_FS
-	help
-	  This option provides tracing for the TREE_RCU implementation,
-	  permitting Makefile to trivially select kernel/rcutree_trace.c.
-
-config PREEMPT_RCU_TRACE
-	def_bool RCU_TRACE && PREEMPT_RCU
-	select DEBUG_FS
-	help
-	  This option provides tracing for the PREEMPT_RCU implementation,
-	  permitting Makefile to trivially select kernel/rcupreempt_trace.c.
diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index 490934f..bd5a900 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -716,7 +716,7 @@ void rcu_check_callbacks(int cpu, int user)
 	raise_rcu_softirq();
 }
 
-static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
+static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
 						struct rcu_data *rdp)
 {
 	unsigned long flags;
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f2d8638..b2fd602 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1314,7 +1314,7 @@ int rcu_needs_cpu(int cpu)
  * access due to the fact that this CPU cannot possibly have any RCU
  * callbacks in flight yet.
  */
-static void
+static void __cpuinit
 rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
diff --git a/kernel/relay.c b/kernel/relay.c
index 09ac200..9d79b78 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -663,8 +663,10 @@ int relay_late_setup_files(struct rchan *chan,
 
 	mutex_lock(&relay_channels_mutex);
 	/* Is chan already set up? */
-	if (unlikely(chan->has_base_filename))
+	if (unlikely(chan->has_base_filename)) {
+		mutex_unlock(&relay_channels_mutex);
 		return -EEXIST;
+	}
 	chan->has_base_filename = 1;
 	chan->parent = parent;
 	curr_cpu = get_cpu();
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index d9188c6..85d5a24 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -16,6 +16,7 @@
 #include <linux/lockdep.h>
 #include <linux/notifier.h>
 #include <linux/module.h>
+#include <linux/sysctl.h>
 
 #include <asm/irq_regs.h>
 
@@ -88,6 +89,14 @@ void touch_all_softlockup_watchdogs(void)
 }
 EXPORT_SYMBOL(touch_all_softlockup_watchdogs);
 
+int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
+			     struct file *filp, void __user *buffer,
+			     size_t *lenp, loff_t *ppos)
+{
+	touch_all_softlockup_watchdogs();
+	return proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+}
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 368d163..790f9d7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -809,7 +809,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &softlockup_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
+		.proc_handler	= &proc_dosoftlockup_thresh,
 		.strategy	= &sysctl_intvec,
 		.extra1		= &neg_one,
 		.extra2		= &sixty,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4c9ae60..e770e85 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -633,19 +633,6 @@ config RCU_TORTURE_TEST_RUNNABLE
 
 config RCU_CPU_STALL_DETECTOR
 	bool "Check for stalled CPUs delaying RCU grace periods"
-	depends on CLASSIC_RCU
-	default n
-	help
-	  This option causes RCU to printk information on which
-	  CPUs are delaying the current grace period, but only when
-	  the grace period extends for excessive time periods.
-
-	  Say Y if you want RCU to perform such checks.
-
-	  Say N if you are unsure.
-
-config RCU_CPU_STALL_DETECTOR
-	bool "Check for stalled CPUs delaying RCU grace periods"
 	depends on CLASSIC_RCU || TREE_RCU
 	default n
 	help
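
The RCU_FANOUT help text moved by the init/Kconfig hunk above says the fanout must be at least the cube root of NR_CPUS, giving limits of 32,768 and 262,144 CPUs. That arithmetic can be checked directly; the helper names rcu_fanout_max_cpus() and rcu_fanout_sufficient() are hypothetical.

```c
/* Largest NR_CPUS a given fanout can cover under the cube-root rule. */
static unsigned long rcu_fanout_max_cpus(unsigned long fanout)
{
	return fanout * fanout * fanout;
}

/* Nonzero if 'fanout' is at least the cube root of 'nr_cpus'. */
static int rcu_fanout_sufficient(unsigned long fanout, unsigned long nr_cpus)
{
	return rcu_fanout_max_cpus(fanout) >= nr_cpus;
}
```

With the default fanout of 32 on 32-bit this gives 32 * 32 * 32 = 32768, and with 64 on 64-bit it gives 64 * 64 * 64 = 262144, matching the figures in the help text.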

* [git pull] core kernel fixes
@ 2009-01-11 14:36 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2009-01-11 14:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Andrew Morton (1):
      smp_call_function_single(): be slightly less stupid

David Miller (1):
      sparc64: Fix cpumask related build failure

Ingo Molnar (1):
      smp_call_function_single(): be slightly less stupid, fix

Paul E. McKenney (1):
      rcu: fix bug in rcutorture system-shutdown code


 arch/sparc/include/asm/topology_64.h |    4 +
 include/linux/smp.h                  |   13 +---
 kernel/Makefile                      |    6 ++-
 kernel/rcutorture.c                  |  113 ++++++++++++++++++++-------------
 kernel/up.c                          |   20 ++++++
 5 files changed, 100 insertions(+), 56 deletions(-)
 create mode 100644 kernel/up.c
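
As a reading aid for the rcutorture change in this series: the patch turns the old boolean-like fullstop into a three-state value and only lets the shutdown notifier move it out of the "normal operation" state. A minimal sketch of those transitions follows, with the mutex left out. The FULLSTOP_* values are copied from the patch; the function names are hypothetical. The patch initializes fullstop to FULLSTOP_RMMOD, and the place where it is switched to FULLSTOP_DONTSTOP is not part of the diff quoted here, so the sketch takes the current state as a parameter.

```c
#define FULLSTOP_DONTSTOP 0	/* Normal operation. */
#define FULLSTOP_SHUTDOWN 1	/* System shutdown with rcutorture running. */
#define FULLSTOP_RMMOD    2	/* Normal rmmod of rcutorture. */

/*
 * Shutdown notifier logic: returns 1 if the state moved to
 * FULLSTOP_SHUTDOWN, 0 if it was left alone (the patch prints a
 * "Concurrent 'rmmod rcutorture' and shutdown illegal!" warning then).
 */
static int fullstop_on_shutdown(int *fullstop)
{
	if (*fullstop == FULLSTOP_DONTSTOP) {
		*fullstop = FULLSTOP_SHUTDOWN;
		return 1;
	}
	return 0;
}

/* Loop condition used by the torture kthreads after the patch. */
static int fullstop_keep_running(int fullstop)
{
	return fullstop == FULLSTOP_DONTSTOP;
}

/* Condition under which rcutorture_shutdown_absorb() parks the thread. */
static int fullstop_should_park(int fullstop)
{
	return fullstop == FULLSTOP_SHUTDOWN;
}
```

Starting from FULLSTOP_RMMOD the notifier changes nothing, so kthreads are only parked when a shutdown arrives while the test is actually running.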

diff --git a/arch/sparc/include/asm/topology_64.h b/arch/sparc/include/asm/topology_64.h
index b8a65b6..5bc0b8f 100644
--- a/arch/sparc/include/asm/topology_64.h
+++ b/arch/sparc/include/asm/topology_64.h
@@ -47,6 +47,10 @@ static inline int pcibus_to_node(struct pci_bus *pbus)
 	(pcibus_to_node(bus) == -1 ? \
 	 CPU_MASK_ALL : \
 	 node_to_cpumask(pcibus_to_node(bus)))
+#define cpumask_of_pcibus(bus)	\
+	(pcibus_to_node(bus) == -1 ? \
+	 CPU_MASK_ALL_PTR : \
+	 cpumask_of_node(pcibus_to_node(bus)))
 
 #define SD_NODE_INIT (struct sched_domain) {		\
 	.min_interval		= 8,			\
diff --git a/include/linux/smp.h b/include/linux/smp.h
index b824669..715196b 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -24,6 +24,9 @@ struct call_single_data {
 /* total number of cpus in this system (may exceed NR_CPUS) */
 extern unsigned int total_cpus;
 
+int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
+				int wait);
+
 #ifdef CONFIG_SMP
 
 #include <linux/preempt.h>
@@ -79,8 +82,6 @@ smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
 	return 0;
 }
 
-int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
-				int wait);
 void __smp_call_function_single(int cpuid, struct call_single_data *data);
 
 /*
@@ -140,14 +141,6 @@ static inline int up_smp_call_function(void (*func)(void *), void *info)
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
 #define smp_prepare_boot_cpu()			do {} while (0)
-#define smp_call_function_single(cpuid, func, info, wait) \
-({ \
-	WARN_ON(cpuid != 0);	\
-	local_irq_disable();	\
-	(func)(info);		\
-	local_irq_enable();	\
-	0;			\
-})
 #define smp_call_function_mask(mask, func, info, wait) \
 			(up_smp_call_function(func, info))
 #define smp_call_function_many(mask, func, info, wait) \
diff --git a/kernel/Makefile b/kernel/Makefile
index 2921d90..2aebc4c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -40,7 +40,11 @@ obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
 obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
-obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o
+ifeq ($(CONFIG_USE_GENERIC_SMP_HELPERS),y)
+obj-y += smp.o
+else
+obj-y += up.o
+endif
 obj-$(CONFIG_SMP) += spinlock.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
 obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 1cff28d..7c4142a 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -136,29 +136,47 @@ static int stutter_pause_test = 0;
 #endif
 int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;
 
-#define FULLSTOP_SHUTDOWN 1	/* Bail due to system shutdown/panic. */
-#define FULLSTOP_CLEANUP  2	/* Orderly shutdown. */
-static int fullstop;		/* stop generating callbacks at test end. */
-DEFINE_MUTEX(fullstop_mutex);	/* protect fullstop transitions and */
-				/*  spawning of kthreads. */
+/* Mediate rmmod and system shutdown.  Concurrent rmmod & shutdown illegal! */
+
+#define FULLSTOP_DONTSTOP 0	/* Normal operation. */
+#define FULLSTOP_SHUTDOWN 1	/* System shutdown with rcutorture running. */
+#define FULLSTOP_RMMOD    2	/* Normal rmmod of rcutorture. */
+static int fullstop = FULLSTOP_RMMOD;
+DEFINE_MUTEX(fullstop_mutex);	/* Protect fullstop transitions and spawning */
+				/*  of kthreads. */
 
 /*
- * Detect and respond to a signal-based shutdown.
+ * Detect and respond to a system shutdown.
  */
 static int
 rcutorture_shutdown_notify(struct notifier_block *unused1,
 			   unsigned long unused2, void *unused3)
 {
-	if (fullstop)
-		return NOTIFY_DONE;
 	mutex_lock(&fullstop_mutex);
-	if (!fullstop)
+	if (fullstop == FULLSTOP_DONTSTOP)
 		fullstop = FULLSTOP_SHUTDOWN;
+	else
+		printk(KERN_WARNING /* but going down anyway, so... */
+		       "Concurrent 'rmmod rcutorture' and shutdown illegal!\n");
 	mutex_unlock(&fullstop_mutex);
 	return NOTIFY_DONE;
 }
 
 /*
+ * Absorb kthreads into a kernel function that won't return, so that
+ * they won't ever access module text or data again.
+ */
+static void rcutorture_shutdown_absorb(char *title)
+{
+	if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) {
+		printk(KERN_NOTICE
+		       "rcutorture thread %s parking due to system shutdown\n",
+		       title);
+		schedule_timeout_uninterruptible(MAX_SCHEDULE_TIMEOUT);
+	}
+}
+
+/*
  * Allocate an element from the rcu_tortures pool.
  */
 static struct rcu_torture *
@@ -219,13 +237,14 @@ rcu_random(struct rcu_random_state *rrsp)
 }
 
 static void
-rcu_stutter_wait(void)
+rcu_stutter_wait(char *title)
 {
-	while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
+	while (stutter_pause_test || !rcutorture_runnable) {
 		if (rcutorture_runnable)
 			schedule_timeout_interruptible(1);
 		else
 			schedule_timeout_interruptible(round_jiffies_relative(HZ));
+		rcutorture_shutdown_absorb(title);
 	}
 }
 
@@ -287,7 +306,7 @@ rcu_torture_cb(struct rcu_head *p)
 	int i;
 	struct rcu_torture *rp = container_of(p, struct rcu_torture, rtort_rcu);
 
-	if (fullstop) {
+	if (fullstop != FULLSTOP_DONTSTOP) {
 		/* Test is ending, just drop callbacks on the floor. */
 		/* The next initialization will pick up the pieces. */
 		return;
@@ -619,10 +638,11 @@ rcu_torture_writer(void *arg)
 		}
 		rcu_torture_current_version++;
 		oldbatch = cur_ops->completed();
-		rcu_stutter_wait();
-	} while (!kthread_should_stop() && !fullstop);
+		rcu_stutter_wait("rcu_torture_writer");
+	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
 	VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
-	while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
+	rcutorture_shutdown_absorb("rcu_torture_writer");
+	while (!kthread_should_stop())
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@@ -643,11 +663,12 @@ rcu_torture_fakewriter(void *arg)
 		schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
 		udelay(rcu_random(&rand) & 0x3ff);
 		cur_ops->sync();
-		rcu_stutter_wait();
-	} while (!kthread_should_stop() && !fullstop);
+		rcu_stutter_wait("rcu_torture_fakewriter");
+	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
 
 	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task stopping");
-	while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
+	rcutorture_shutdown_absorb("rcu_torture_fakewriter");
+	while (!kthread_should_stop())
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@@ -752,12 +773,13 @@ rcu_torture_reader(void *arg)
 		preempt_enable();
 		cur_ops->readunlock(idx);
 		schedule();
-		rcu_stutter_wait();
-	} while (!kthread_should_stop() && !fullstop);
+		rcu_stutter_wait("rcu_torture_reader");
+	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
 	VERBOSE_PRINTK_STRING("rcu_torture_reader task stopping");
+	rcutorture_shutdown_absorb("rcu_torture_reader");
 	if (irqreader && cur_ops->irqcapable)
 		del_timer_sync(&t);
-	while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
+	while (!kthread_should_stop())
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@@ -854,7 +876,8 @@ rcu_torture_stats(void *arg)
 	do {
 		schedule_timeout_interruptible(stat_interval * HZ);
 		rcu_torture_stats_print();
-	} while (!kthread_should_stop() && !fullstop);
+		rcutorture_shutdown_absorb("rcu_torture_stats");
+	} while (!kthread_should_stop());
 	VERBOSE_PRINTK_STRING("rcu_torture_stats task stopping");
 	return 0;
 }
@@ -866,52 +889,49 @@ static int rcu_idle_cpu;	/* Force all torture tasks off this CPU */
  */
 static void rcu_torture_shuffle_tasks(void)
 {
-	cpumask_var_t tmp_mask;
+	cpumask_t tmp_mask;
 	int i;
 
-	if (!alloc_cpumask_var(&tmp_mask, GFP_KERNEL))
-		BUG();
-
-	cpumask_setall(tmp_mask);
+	cpus_setall(tmp_mask);
 	get_online_cpus();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
-	if (num_online_cpus() == 1)
-		goto out;
+	if (num_online_cpus() == 1) {
+		put_online_cpus();
+		return;
+	}
 
 	if (rcu_idle_cpu != -1)
-		cpumask_clear_cpu(rcu_idle_cpu, tmp_mask);
+		cpu_clear(rcu_idle_cpu, tmp_mask);
 
-	set_cpus_allowed_ptr(current, tmp_mask);
+	set_cpus_allowed_ptr(current, &tmp_mask);
 
 	if (reader_tasks) {
 		for (i = 0; i < nrealreaders; i++)
 			if (reader_tasks[i])
 				set_cpus_allowed_ptr(reader_tasks[i],
-						     tmp_mask);
+						     &tmp_mask);
 	}
 
 	if (fakewriter_tasks) {
 		for (i = 0; i < nfakewriters; i++)
 			if (fakewriter_tasks[i])
 				set_cpus_allowed_ptr(fakewriter_tasks[i],
-						     tmp_mask);
+						     &tmp_mask);
 	}
 
 	if (writer_task)
-		set_cpus_allowed_ptr(writer_task, tmp_mask);
+		set_cpus_allowed_ptr(writer_task, &tmp_mask);
 
 	if (stats_task)
-		set_cpus_allowed_ptr(stats_task, tmp_mask);
+		set_cpus_allowed_ptr(stats_task, &tmp_mask);
 
 	if (rcu_idle_cpu == -1)
 		rcu_idle_cpu = num_online_cpus() - 1;
 	else
 		rcu_idle_cpu--;
 
-out:
 	put_online_cpus();
-	free_cpumask_var(tmp_mask);
 }
 
 /* Shuffle tasks across CPUs, with the intent of allowing each CPU in the
@@ -925,7 +945,8 @@ rcu_torture_shuffle(void *arg)
 	do {
 		schedule_timeout_interruptible(shuffle_interval * HZ);
 		rcu_torture_shuffle_tasks();
-	} while (!kthread_should_stop() && !fullstop);
+		rcutorture_shutdown_absorb("rcu_torture_shuffle");
+	} while (!kthread_should_stop());
 	VERBOSE_PRINTK_STRING("rcu_torture_shuffle task stopping");
 	return 0;
 }
@@ -940,10 +961,11 @@ rcu_torture_stutter(void *arg)
 	do {
 		schedule_timeout_interruptible(stutter * HZ);
 		stutter_pause_test = 1;
-		if (!kthread_should_stop() && !fullstop)
+		if (!kthread_should_stop())
 			schedule_timeout_interruptible(stutter * HZ);
 		stutter_pause_test = 0;
-	} while (!kthread_should_stop() && !fullstop);
+		rcutorture_shutdown_absorb("rcu_torture_stutter");
+	} while (!kthread_should_stop());
 	VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
 	return 0;
 }
@@ -970,15 +992,16 @@ rcu_torture_cleanup(void)
 	int i;
 
 	mutex_lock(&fullstop_mutex);
-	if (!fullstop) {
-		/* If being signaled, let it happen, then exit. */
+	if (fullstop == FULLSTOP_SHUTDOWN) {
+		printk(KERN_WARNING /* but going down anyway, so... */
+		       "Concurrent 'rmmod rcutorture' and shutdown illegal!\n");
 		mutex_unlock(&fullstop_mutex);
-		schedule_timeout_interruptible(10 * HZ);
+		schedule_timeout_uninterruptible(10);
 		if (cur_ops->cb_barrier != NULL)
 			cur_ops->cb_barrier();
 		return;
 	}
-	fullstop = FULLSTOP_CLEANUP;
+	fullstop = FULLSTOP_RMMOD;
 	mutex_unlock(&fullstop_mutex);
 	unregister_reboot_notifier(&rcutorture_nb);
 	if (stutter_task) {
@@ -1078,7 +1101,7 @@ rcu_torture_init(void)
 	else
 		nrealreaders = 2 * num_online_cpus();
 	rcu_torture_print_module_parms("Start of test");
-	fullstop = 0;
+	fullstop = FULLSTOP_DONTSTOP;
 
 	/* Set up the freelist. */
 
diff --git a/kernel/up.c b/kernel/up.c
new file mode 100644
index 0000000..c04b9dc
--- /dev/null
+++ b/kernel/up.c
@@ -0,0 +1,20 @@
+/*
+ * Uniprocessor-only support functions.  The counterpart to kernel/smp.c
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/smp.h>
+
+int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
+				int wait)
+{
+	WARN_ON(cpu != 0);
+
+	local_irq_disable();
+	(func)(info);
+	local_irq_enable();
+
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_function_single);


* [git pull] core kernel fixes
@ 2008-12-04 19:39 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-12-04 19:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Mathieu Desnoyers (1):
      documentation: local_ops fix on_each_cpu

Roel Kluin (1):
      check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0


 Documentation/local_ops.txt |    2 +-
 kernel/softlockup.c         |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/local_ops.txt b/Documentation/local_ops.txt
index f4f8b1c..23045b8 100644
--- a/Documentation/local_ops.txt
+++ b/Documentation/local_ops.txt
@@ -149,7 +149,7 @@ static void do_test_timer(unsigned long data)
 	int cpu;
 
 	/* Increment the counters */
-	on_each_cpu(test_each, NULL, 0, 1);
+	on_each_cpu(test_each, NULL, 1);
 	/* Read all the counters */
 	printk("Counters read from CPU %d\n", smp_processor_id());
 	for_each_online_cpu(cpu) {
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 3953e4a..dc0b3be 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -188,7 +188,7 @@ static void check_hung_task(struct task_struct *t, unsigned long now)
 	if ((long)(now - t->last_switch_timestamp) <
 					sysctl_hung_task_timeout_secs)
 		return;
-	if (sysctl_hung_task_warnings < 0)
+	if (!sysctl_hung_task_warnings)
 		return;
 	sysctl_hung_task_warnings--;
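
The check above matters because the counter is unsigned: a "< 0" test on
it can never be true, so the old code decremented unconditionally and a
budget of zero wrapped around to a huge value. A minimal userspace sketch
of that behaviour follows; the function names and the plain unsigned long
budget are illustrative assumptions, not the kernel's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Old logic (sketch): the guard was "budget < 0". For an unsigned type
 * that is always false, so it is omitted here and the decrement always
 * runs, wrapping 0 around to ULONG_MAX.
 */
static bool take_warning_old(unsigned long *budget)
{
	assert(budget != NULL);
	(*budget)--;
	return true;
}

/* New logic (sketch): stop once the budget is used up. */
static bool take_warning_new(unsigned long *budget)
{
	assert(budget != NULL);
	if (!*budget)
		return false;
	(*budget)--;
	return true;
}
```

With a budget of 0, the old variant still reports a warning and leaves the
budget at ULONG_MAX, while the new variant returns false and leaves it at 0.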
 


* [git pull] core kernel fixes
@ 2008-11-29 19:36 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-11-29 19:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Li Zefan (1):
      lockdep: consistent alignement for lockdep info


 kernel/lockdep.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 06e1571..46a4041 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3276,10 +3276,10 @@ void __init lockdep_info(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
-	printk("... MAX_LOCKDEP_SUBCLASSES:    %lu\n", MAX_LOCKDEP_SUBCLASSES);
+	printk("... MAX_LOCKDEP_SUBCLASSES:  %lu\n", MAX_LOCKDEP_SUBCLASSES);
 	printk("... MAX_LOCK_DEPTH:          %lu\n", MAX_LOCK_DEPTH);
 	printk("... MAX_LOCKDEP_KEYS:        %lu\n", MAX_LOCKDEP_KEYS);
-	printk("... CLASSHASH_SIZE:           %lu\n", CLASSHASH_SIZE);
+	printk("... CLASSHASH_SIZE:          %lu\n", CLASSHASH_SIZE);
 	printk("... MAX_LOCKDEP_ENTRIES:     %lu\n", MAX_LOCKDEP_ENTRIES);
 	printk("... MAX_LOCKDEP_CHAINS:      %lu\n", MAX_LOCKDEP_CHAINS);
 	printk("... CHAINHASH_SIZE:          %lu\n", CHAINHASH_SIZE);


* [git pull] core kernel fixes
@ 2008-11-18 14:14 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-11-18 14:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
FUJITA Tomonori (1):
      swiotlb: use coherent_dma_mask in alloc_coherent

Ingo Molnar (1):
      MAINTAINERS: remove me as RAID maintainer


 MAINTAINERS   |    2 --
 lib/swiotlb.c |   10 +++++++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e0777f..627e4c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3928,8 +3928,6 @@ M:	bootc@bootc.net
 S:	Maintained
 
 SOFTWARE RAID (Multiple Disks) SUPPORT
-P:	Ingo Molnar
-M:	mingo@redhat.com
 P:	Neil Brown
 M:	neilb@suse.de
 L:	linux-raid@vger.kernel.org
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 78330c3..5f6c629 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -467,9 +467,13 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	dma_addr_t dev_addr;
 	void *ret;
 	int order = get_order(size);
+	u64 dma_mask = DMA_32BIT_MASK;
+
+	if (hwdev && hwdev->coherent_dma_mask)
+		dma_mask = hwdev->coherent_dma_mask;
 
 	ret = (void *)__get_free_pages(flags, order);
-	if (ret && address_needs_mapping(hwdev, virt_to_bus(ret), size)) {
+	if (ret && !is_buffer_dma_capable(dma_mask, virt_to_bus(ret), size)) {
 		/*
 		 * The allocated memory isn't reachable by the device.
 		 * Fall back on swiotlb_map_single().
@@ -493,9 +497,9 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	dev_addr = virt_to_bus(ret);
 
 	/* Confirm address can be DMA'd by device */
-	if (address_needs_mapping(hwdev, dev_addr, size)) {
+	if (!is_buffer_dma_capable(dma_mask, dev_addr, size)) {
 		printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
-		       (unsigned long long)*hwdev->dma_mask,
+		       (unsigned long long)dma_mask,
 		       (unsigned long long)dev_addr);
 
 		/* DMA_TO_DEVICE to avoid memcpy in unmap_single */
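
The mask selection in this patch can be sketched in userspace as follows.
The struct, the function names and the "last byte must fit under the mask"
rule are assumptions for illustration; the kernel's
is_buffer_dma_capable() is not shown in this diff and may compare
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DMA_32BIT_MASK 0x00000000ffffffffULL

/* Hypothetical stand-in for the one struct device field used here. */
struct sketch_device {
	uint64_t coherent_dma_mask;
};

/* Use the device's coherent mask if it has one, else assume 32 bits. */
static uint64_t pick_coherent_mask(const struct sketch_device *dev)
{
	if (dev && dev->coherent_dma_mask)
		return dev->coherent_dma_mask;
	return SKETCH_DMA_32BIT_MASK;
}

/*
 * Assumed rule: the buffer is reachable if its last byte lies at or
 * below the mask. Ignores wraparound of addr + size.
 */
static bool buffer_dma_capable(uint64_t mask, uint64_t addr, size_t size)
{
	assert(size > 0);
	return addr + size - 1 <= mask;
}
```

A NULL device or a device with a zero coherent mask falls back to the
32-bit default, which mirrors the "hwdev && hwdev->coherent_dma_mask" test
in the patch.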


* [git pull] core kernel fixes
@ 2008-11-07 16:28 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-11-07 16:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Xen fixes filed under tip/core/urgent because they have MM impact.

 Thanks,

	Ingo

------------------>
Jeremy Fitzhardinge (2):
      vmap: cope with vm_unmap_aliases before vmalloc_init()
      xen: make sure stray alias mappings are gone before pinning


 arch/x86/xen/enlighten.c |    5 +++--
 arch/x86/xen/mmu.c       |    9 ++++++---
 mm/vmalloc.c             |    7 +++++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b61534c..5e4686d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -863,15 +863,16 @@ static void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn, unsigned l
 	if (PagePinned(virt_to_page(mm->pgd))) {
 		SetPagePinned(page);
 
+		vm_unmap_aliases();
 		if (!PageHighMem(page)) {
 			make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));
 			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
 				pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
-		} else
+		} else {
 			/* make sure there are no stray mappings of
 			   this page */
 			kmap_flush_unused();
-			vm_unmap_aliases();
+		}
 	}
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index aba77b2..89f3b6e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -850,13 +850,16 @@ static int xen_pin_page(struct mm_struct *mm, struct page *page,
    read-only, and can be pinned. */
 static void __xen_pgd_pin(struct mm_struct *mm, pgd_t *pgd)
 {
+	vm_unmap_aliases();
+
 	xen_mc_batch();
 
-	if (xen_pgd_walk(mm, xen_pin_page, USER_LIMIT)) {
-		/* re-enable interrupts for kmap_flush_unused */
+	 if (xen_pgd_walk(mm, xen_pin_page, USER_LIMIT)) {
+		/* re-enable interrupts for flushing */
 		xen_mc_issue(0);
+
 		kmap_flush_unused();
-		vm_unmap_aliases();
+
 		xen_mc_batch();
 	}
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 66fad3f..ba6b0f5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -592,6 +592,8 @@ static void free_unmap_vmap_area_addr(unsigned long addr)
 
 #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
 
+static bool vmap_initialized __read_mostly = false;
+
 struct vmap_block_queue {
 	spinlock_t lock;
 	struct list_head free;
@@ -828,6 +830,9 @@ void vm_unmap_aliases(void)
 	int cpu;
 	int flush = 0;
 
+	if (unlikely(!vmap_initialized))
+		return;
+
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 		struct vmap_block *vb;
@@ -942,6 +947,8 @@ void __init vmalloc_init(void)
 		INIT_LIST_HEAD(&vbq->dirty);
 		vbq->nr_dirty = 0;
 	}
+
+	vmap_initialized = true;
 }
 
 void unmap_kernel_range(unsigned long addr, unsigned long size)


* [git pull] core kernel fixes
  2008-10-16 22:32 ` Linus Torvalds
@ 2008-10-17  6:23   ` Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-10-17  6:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Benjamin Herrenschmidt, Kumar Gala, Linux Kernel Mailing List,
	Andrew Morton, Jeremy Fitzhardinge, Becky Bruce


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> >       softirq, warning fix: correct a format to avoid a warning
> 
> Ingo, stop dicking around with this crap. You apparently fixed some 
> warning that I never saw by turning it into a warning that I _do_ see:

yes, sorry - i noticed the new warning too and notified you two days ago 
(see the mail below) but got held up by the ftrace stuff. I have two 
other core/urgent fixes queued up as well - see the pull request below.

	Ingo

----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----

Date: Wed, 15 Oct 2008 18:29:01 +0200
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [git pull] core kernel updates for v2.6.28


something i just noticed:

>    core/softirq

>       softirq, warning fix: correct a format to avoid a warning

Sorry, this warning fix was not complete and will produce a new warning 
on 64-bit x86. Will queue up a fix.

	Ingo

----- End forwarded message -----

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Ingo Molnar (1):
      m32r: fix build due to notify_cpu_starting() change

Stephen Rothwell (1):
      powerpc: fix linux-next build failure


 arch/m32r/kernel/smpboot.c      |    1 +
 arch/powerpc/include/asm/page.h |    6 +++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/m32r/kernel/smpboot.c b/arch/m32r/kernel/smpboot.c
index fc29948..39cb6da 100644
--- a/arch/m32r/kernel/smpboot.c
+++ b/arch/m32r/kernel/smpboot.c
@@ -40,6 +40,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/cpu.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e088545..94fe513 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -10,9 +10,13 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+#else
+#include <asm/types.h>
+#endif
 #include <asm/asm-compat.h>
 #include <asm/kdump.h>
-#include <asm/types.h>
 
 /*
  * On PPC32 page size is 4K. For PPC64 we support either 4K or 64K software


* [git pull] core kernel fixes
@ 2008-08-28 11:44 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-08-28 11:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Thanks,

	Ingo

------------------>
Joe Korty (2):
      lockstat: fix numerical output rounding error
      lockstat: repair erronous contention statistics

Steve VanDeBogart (1):
      exit signals: use of uninitialized field notify_count

Zhu Yi (1):
      lockdep: fix invalid list_del_rcu in zap_class


 kernel/exit.c         |    4 ++--
 kernel/lockdep.c      |    6 +++---
 kernel/lockdep_proc.c |    3 ++-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 38ec406..75c6473 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -918,8 +918,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 
 	/* mt-exec, de_thread() is waiting for us */
 	if (thread_group_leader(tsk) &&
-	    tsk->signal->notify_count < 0 &&
-	    tsk->signal->group_exit_task)
+	    tsk->signal->group_exit_task &&
+	    tsk->signal->notify_count < 0)
 		wake_up_process(tsk->signal->group_exit_task);
 
 	write_unlock_irq(&tasklist_lock);
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 3bfb187..dbda475 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -875,11 +875,11 @@ static int add_lock_to_list(struct lock_class *class, struct lock_class *this,
 	if (!entry)
 		return 0;
 
-	entry->class = this;
-	entry->distance = distance;
 	if (!save_trace(&entry->trace))
 		return 0;
 
+	entry->class = this;
+	entry->distance = distance;
 	/*
 	 * Since we never remove from the dependency list, the list can
 	 * be walked lockless by other CPUs, it's only allocation
@@ -3029,7 +3029,7 @@ found_it:
 
 	stats = get_lock_stats(hlock_class(hlock));
 	if (point < ARRAY_SIZE(stats->contention_point))
-		stats->contention_point[i]++;
+		stats->contention_point[point]++;
 	if (lock->cpu != smp_processor_id())
 		stats->bounces[bounce_contended + !!hlock->read]++;
 	put_lock_stats(stats);
diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index 4b194d3..20dbcbf 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -472,8 +472,9 @@ static void snprint_time(char *buf, size_t bufsiz, s64 nr)
 {
 	unsigned long rem;
 
+	nr += 5; /* for display rounding */
 	rem = do_div(nr, 1000); /* XXX: do_div_signed */
-	snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, ((int)rem+5)/10);
+	snprintf(buf, bufsiz, "%lld.%02d", (long long)nr, (int)rem/10);
 }
 
 static void seq_time(struct seq_file *m, s64 time)
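
The effect of moving the "+5" in snprint_time() can be reproduced in
userspace. The sketch below replaces do_div() with plain / and % and
assumes non-negative input; the function names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Old order: divide first, then round the remainder. */
static void format_old(char *buf, size_t bufsiz, long long nr)
{
	long long rem;

	assert(buf != NULL && bufsiz > 0 && nr >= 0);
	rem = nr % 1000;
	nr /= 1000;
	/* A remainder of 995..999 rounds up to 100, which does not fit %02d. */
	snprintf(buf, bufsiz, "%lld.%02d", nr, ((int)rem + 5) / 10);
}

/* New order: add the rounding term first so the carry reaches the integer part. */
static void format_new(char *buf, size_t bufsiz, long long nr)
{
	long long rem;

	assert(buf != NULL && bufsiz > 0 && nr >= 0);
	nr += 5; /* for display rounding */
	rem = nr % 1000;
	nr /= 1000;
	snprintf(buf, bufsiz, "%lld.%02d", nr, (int)rem / 10);
}
```

For an input of 1995 the old variant produces a three-digit fraction next
to an integer part of 1, while the new variant carries into the integer
part. For inputs such as 1234, where no carry is needed, both variants
agree.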


* [git pull] core kernel fixes
@ 2008-08-18 18:35 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-08-18 18:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Peter Zijlstra

Linus,

Please pull the latest core-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

 Thanks,

	Ingo

------------------>
Dmitry Baryshkov (1):
      lockdep: fix spurious 'inconsistent lock state' warning


 kernel/lockdep.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 77fa776..3bfb187 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2582,7 +2582,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	hlock->trylock = trylock;
 	hlock->read = read;
 	hlock->check = check;
-	hlock->hardirqs_off = hardirqs_off;
+	hlock->hardirqs_off = !!hardirqs_off;
 #ifdef CONFIG_LOCK_STAT
 	hlock->waittime_stamp = 0;
 	hlock->holdtime_stamp = sched_clock();
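
Why the "!!" matters can be shown with a small userspace sketch. It
assumes that hardirqs_off is a one-bit bitfield and that some
architectures report "interrupts off" in a bit other than bit 0; neither
detail is visible in this diff, and the struct and function names are made
up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant part of struct held_lock. */
struct sketch_held_lock {
	unsigned int hardirqs_off:1;
};

/* Direct assignment: only bit 0 of the value survives. */
static unsigned int store_raw(unsigned long flags_value)
{
	struct sketch_held_lock h = { 0 };

	h.hardirqs_off = flags_value;
	return h.hardirqs_off;
}

/* Normalized assignment: any nonzero value becomes 1. */
static unsigned int store_normalized(unsigned long flags_value)
{
	struct sketch_held_lock h = { 0 };

	h.hardirqs_off = !!flags_value;
	assert(h.hardirqs_off == (flags_value != 0));
	return h.hardirqs_off;
}
```

A value such as 0x80 is stored as 0 by the raw variant and as 1 by the
normalized one, which is the kind of mismatch that could make lockdep see
an inconsistent state.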


* [git pull] core kernel fixes
@ 2008-07-24 15:13 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-07-24 15:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton

Linus,

Please pull the latest core kernel fixes git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Thanks,

	Ingo

------------------>
Andrew Morton (1):
      arch/mips/kernel/stacktrace.c: Heiko can't type

Heiko Carstens (1):
      fix core/stacktrace changes on avr32, mips, sh

Mike Travis (1):
      kthread: reduce stack pressure in create_kthread and kthreadd


 arch/avr32/kernel/stacktrace.c |    1 +
 arch/mips/kernel/stacktrace.c  |    1 +
 arch/sh/kernel/stacktrace.c    |    1 +
 kernel/kthread.c               |    4 ++--
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/avr32/kernel/stacktrace.c b/arch/avr32/kernel/stacktrace.c
index f4bdb44..c09f0d8 100644
--- a/arch/avr32/kernel/stacktrace.c
+++ b/arch/avr32/kernel/stacktrace.c
@@ -10,6 +10,7 @@
 #include <linux/sched.h>
 #include <linux/stacktrace.h>
 #include <linux/thread_info.h>
+#include <linux/module.h>
 
 register unsigned long current_frame_pointer asm("r7");
 
diff --git a/arch/mips/kernel/stacktrace.c b/arch/mips/kernel/stacktrace.c
index 5eb4681..0632e2a 100644
--- a/arch/mips/kernel/stacktrace.c
+++ b/arch/mips/kernel/stacktrace.c
@@ -7,6 +7,7 @@
  */
 #include <linux/sched.h>
 #include <linux/stacktrace.h>
+#include <linux/module.h>
 #include <asm/stacktrace.h>
 
 /*
diff --git a/arch/sh/kernel/stacktrace.c b/arch/sh/kernel/stacktrace.c
index 1b2ae35..54d1f61 100644
--- a/arch/sh/kernel/stacktrace.c
+++ b/arch/sh/kernel/stacktrace.c
@@ -12,6 +12,7 @@
 #include <linux/sched.h>
 #include <linux/stacktrace.h>
 #include <linux/thread_info.h>
+#include <linux/module.h>
 #include <asm/ptrace.h>
 
 /*
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ac3fb73..6111c27 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -106,7 +106,7 @@ static void create_kthread(struct kthread_create_info *create)
 		 */
 		sched_setscheduler(create->result, SCHED_NORMAL, &param);
 		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
-		set_cpus_allowed(create->result, CPU_MASK_ALL);
+		set_cpus_allowed_ptr(create->result, CPU_MASK_ALL_PTR);
 	}
 	complete(&create->done);
 }
@@ -233,7 +233,7 @@ int kthreadd(void *unused)
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
 	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
-	set_cpus_allowed(tsk, CPU_MASK_ALL);
+	set_cpus_allowed_ptr(tsk, CPU_MASK_ALL_PTR);
 
 	current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG;
 


* Re: [git pull] core kernel fixes
  2008-06-30 19:51         ` Vegard Nossum
@ 2008-06-30 19:54           ` Thomas Gleixner
  0 siblings, 0 replies; 97+ messages in thread
From: Thomas Gleixner @ 2008-06-30 19:54 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Andrew Morton,
	Daniel J Blueman

On Mon, 30 Jun 2008, Vegard Nossum wrote:
> >
> > alloc_object() is called with interrupts disabled from __debug_object_init()
> 
> Ok, thanks for clearing that up! :-)
> 
> (I'm just wary of patches that mutate when they have my name on the top...)

Yeah, that's why I added:

[ daniel.blueman@gmail.com: pool_lock needs to be taken irq safe in fill_pool ]

Thanks,
	tglx


* Re: [git pull] core kernel fixes
  2008-06-30 19:46       ` Thomas Gleixner
@ 2008-06-30 19:51         ` Vegard Nossum
  2008-06-30 19:54           ` Thomas Gleixner
  0 siblings, 1 reply; 97+ messages in thread
From: Vegard Nossum @ 2008-06-30 19:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Andrew Morton,
	Daniel J Blueman

On Mon, Jun 30, 2008 at 9:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 30 Jun 2008, Vegard Nossum wrote:
>> On Mon, Jun 30, 2008 at 8:20 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> > The patch was tested with our standard tests so it's certainly good in
>> > practice - but i haven't specifically tried your testcase (maybe Thomas
>> > has). Can you see any problem with the fix?
>>
>> Well, what I can see is that the patch that was committed has some
>> missing changes. In Daniel's patch:
>>
>> -repeat:
>> -     spin_lock(&pool_lock);
>> +     spin_lock_irqsave(&pool_lock, flags);
>>       if (obj_pool.first) {
>>               obj         = hlist_entry(obj_pool.first, typeof(*obj), node);
>>
>> The patch that was committed:
>>
>> -repeat:
>>         spin_lock(&pool_lock);
>>         if (obj_pool.first) {
>>                 obj         = hlist_entry(obj_pool.first, typeof(*obj), node);
>>
>> Was it not necessary to make the pool lock irq-safe in this place?
>>
>> For reference:
>>
>> Daniel's patch: http://lkml.org/lkml/2008/6/15/27
>> Actual commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=70c85057e0bde35eb56352a293ecb5d1641a0334;hp=e6100f23375c0c71ce595d04551fa6553b611918
>
> alloc_object() is called with interrupts disabled from __debug_object_init()

Ok, thanks for clearing that up! :-)

(I'm just wary of patches that mutate when they have my name on the top...)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: [git pull] core kernel fixes
  2008-06-30 18:43     ` Vegard Nossum
@ 2008-06-30 19:46       ` Thomas Gleixner
  2008-06-30 19:51         ` Vegard Nossum
  0 siblings, 1 reply; 97+ messages in thread
From: Thomas Gleixner @ 2008-06-30 19:46 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Andrew Morton,
	Daniel J Blueman

On Mon, 30 Jun 2008, Vegard Nossum wrote:
> On Mon, Jun 30, 2008 at 8:20 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > The patch was tested with our standard tests so it's certainly good in
> > practice - but i haven't specifically tried your testcase (maybe Thomas
> > has). Can you see any problem with the fix?
> 
> Well, what I can see is that the patch that was committed has some
> missing changes. In Daniel's patch:
> 
> -repeat:
> -	spin_lock(&pool_lock);
> +	spin_lock_irqsave(&pool_lock, flags);
>  	if (obj_pool.first) {
>  		obj	    = hlist_entry(obj_pool.first, typeof(*obj), node);
> 
> The patch that was committed:
> 
> -repeat:
>         spin_lock(&pool_lock);
>         if (obj_pool.first) {
>                 obj         = hlist_entry(obj_pool.first, typeof(*obj), node);
> 
> Was it not necessary to make the pool lock irq-safe in this place?
> 
> For reference:
> 
> Daniel's patch: http://lkml.org/lkml/2008/6/15/27
> Actual commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=70c85057e0bde35eb56352a293ecb5d1641a0334;hp=e6100f23375c0c71ce595d04551fa6553b611918

alloc_object() is called with interrupts disabled from __debug_object_init()

Thanks,
	tglx
 


* Re: [git pull] core kernel fixes
  2008-06-30 18:20   ` Ingo Molnar
@ 2008-06-30 18:43     ` Vegard Nossum
  2008-06-30 19:46       ` Thomas Gleixner
  0 siblings, 1 reply; 97+ messages in thread
From: Vegard Nossum @ 2008-06-30 18:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Thomas Gleixner,
	Daniel J Blueman

On Mon, Jun 30, 2008 at 8:20 PM, Ingo Molnar <mingo@elte.hu> wrote:
> The patch was tested with our standard tests so it's certainly good in
> practice - but i haven't specifically tried your testcase (maybe Thomas
> has). Can you see any problem with the fix?

Well, what I can see is that the patch that was committed has some
missing changes. In Daniel's patch:

-repeat:
-	spin_lock(&pool_lock);
+	spin_lock_irqsave(&pool_lock, flags);
 	if (obj_pool.first) {
 		obj	    = hlist_entry(obj_pool.first, typeof(*obj), node);

The patch that was committed:

-repeat:
        spin_lock(&pool_lock);
        if (obj_pool.first) {
                obj         = hlist_entry(obj_pool.first, typeof(*obj), node);

Was it not necessary to make the pool lock irq-safe in this place?

For reference:

Daniel's patch: http://lkml.org/lkml/2008/6/15/27
Actual commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=70c85057e0bde35eb56352a293ecb5d1641a0334;hp=e6100f23375c0c71ce595d04551fa6553b611918


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: [git pull] core kernel fixes
  2008-06-30 17:02 ` Vegard Nossum
@ 2008-06-30 18:20   ` Ingo Molnar
  2008-06-30 18:43     ` Vegard Nossum
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2008-06-30 18:20 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Thomas Gleixner,
	Daniel J Blueman


* Vegard Nossum <vegard.nossum@gmail.com> wrote:

> On Mon, Jun 30, 2008 at 5:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> > index a76a5e1..85b18d7 100644
> > --- a/lib/debugobjects.c
> > +++ b/lib/debugobjects.c
> > @@ -68,6 +68,7 @@ static int fill_pool(void)
> >  {
> >        gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
> >        struct debug_obj *new;
> > +       unsigned long flags;
> >
> >        if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
> >                return obj_pool_free;
> > @@ -81,10 +82,10 @@ static int fill_pool(void)
> >                if (!new)
> >                        return obj_pool_free;
> >
> > -               spin_lock(&pool_lock);
> > +               spin_lock_irqsave(&pool_lock, flags);
> >                hlist_add_head(&new->node, &obj_pool);
> >                obj_pool_free++;
> > -               spin_unlock(&pool_lock);
> > +               spin_unlock_irqrestore(&pool_lock, flags);
> >        }
> >        return obj_pool_free;
> >  }
> > @@ -110,16 +111,13 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
> >  }
> >
> >  /*
> > - * Allocate a new object. If the pool is empty and no refill possible,
> > - * switch off the debugger.
> > + * Allocate a new object. If the pool is empty, switch off the debugger.
> >  */
> >  static struct debug_obj *
> >  alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
> >  {
> >        struct debug_obj *obj = NULL;
> > -       int retry = 0;
> >
> > -repeat:
> >        spin_lock(&pool_lock);
> >        if (obj_pool.first) {
> >                obj         = hlist_entry(obj_pool.first, typeof(*obj), node);
> > @@ -141,9 +139,6 @@ repeat:
> >        }
> >        spin_unlock(&pool_lock);
> >
> > -       if (fill_pool() && !obj && !retry++)
> > -               goto repeat;
> > -
> >        return obj;
> >  }
> >
> > @@ -261,6 +256,8 @@ __debug_object_init(void *addr, struct debug_obj_descr *descr, int onstack)
> >        struct debug_obj *obj;
> >        unsigned long flags;
> >
> > +       fill_pool();
> > +
> >        db = get_bucket((unsigned long) addr);
> >
> >        spin_lock_irqsave(&db->lock, flags);
> > --
> 
> Hm. I have to wonder where this patch came from.
> 
> This was my (faulty) patch: http://lkml.org/lkml/2008/6/14/193
> and Daniel J Blueman followed up with this: http://lkml.org/lkml/2008/6/15/27
> 
> ..but this one looks different from both. I am guessing the last bits 
> were added (or removed?) by Thomas?
> 
> I am wondering if the final patch was tested with the reproducible 
> test case (if so, by whom?) and whether it should be credited to Daniel
> (or Thomas?) instead...

You can use "git log -1 -p --pretty=fuller 50db04dd9c" to see the exact 
details of the commit:

----------
| commit 50db04dd9c74178e68a981a7127c37252ffb3242
| Author:     Vegard Nossum <vegard.nossum@gmail.com>
| AuthorDate: Sun Jun 15 00:47:36 2008 +0200
| Commit:     Thomas Gleixner <tglx@linutronix.de>
| CommitDate: Wed Jun 18 11:09:54 2008 +0200
|
| [...]
|
|   [ daniel.blueman@gmail.com: pool_lock needs to be taken irq safe in fill_pool ]
|
|   Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
|   Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
|   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
----------

As you can see from the Commit line, it was committed by Thomas.

The "[ daniel.blueman: pool_lock ... ]" line shows that Thomas - instead 
of creating two commits - merged the two fixes into a single commit and 
credited Daniel for the irq-safe fix. This is the standard technique to 
squash small patches and to make fixes multi-authored.

The patch was tested with our standard tests so it's certainly good in 
practice - but I haven't specifically tried your testcase (maybe Thomas 
has). Can you see any problem with the fix?

	Ingo


* Re: [git pull] core kernel fixes
  2008-06-30 15:32 Ingo Molnar
@ 2008-06-30 17:02 ` Vegard Nossum
  2008-06-30 18:20   ` Ingo Molnar
  0 siblings, 1 reply; 97+ messages in thread
From: Vegard Nossum @ 2008-06-30 17:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Thomas Gleixner,
	Daniel J Blueman

On Mon, Jun 30, 2008 at 5:32 PM, Ingo Molnar <mingo@elte.hu> wrote:
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index a76a5e1..85b18d7 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -68,6 +68,7 @@ static int fill_pool(void)
>  {
>        gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>        struct debug_obj *new;
> +       unsigned long flags;
>
>        if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
>                return obj_pool_free;
> @@ -81,10 +82,10 @@ static int fill_pool(void)
>                if (!new)
>                        return obj_pool_free;
>
> -               spin_lock(&pool_lock);
> +               spin_lock_irqsave(&pool_lock, flags);
>                hlist_add_head(&new->node, &obj_pool);
>                obj_pool_free++;
> -               spin_unlock(&pool_lock);
> +               spin_unlock_irqrestore(&pool_lock, flags);
>        }
>        return obj_pool_free;
>  }
> @@ -110,16 +111,13 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
>  }
>
>  /*
> - * Allocate a new object. If the pool is empty and no refill possible,
> - * switch off the debugger.
> + * Allocate a new object. If the pool is empty, switch off the debugger.
>  */
>  static struct debug_obj *
>  alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
>  {
>        struct debug_obj *obj = NULL;
> -       int retry = 0;
>
> -repeat:
>        spin_lock(&pool_lock);
>        if (obj_pool.first) {
>                obj         = hlist_entry(obj_pool.first, typeof(*obj), node);
> @@ -141,9 +139,6 @@ repeat:
>        }
>        spin_unlock(&pool_lock);
>
> -       if (fill_pool() && !obj && !retry++)
> -               goto repeat;
> -
>        return obj;
>  }
>
> @@ -261,6 +256,8 @@ __debug_object_init(void *addr, struct debug_obj_descr *descr, int onstack)
>        struct debug_obj *obj;
>        unsigned long flags;
>
> +       fill_pool();
> +
>        db = get_bucket((unsigned long) addr);
>
>        spin_lock_irqsave(&db->lock, flags);
> --

Hm. I have to wonder where this patch came from.

This was my (faulty) patch: http://lkml.org/lkml/2008/6/14/193
and Daniel J Blueman followed up with this: http://lkml.org/lkml/2008/6/15/27

..but this one looks different from both. I am guessing the last bits
were added (or removed?) by Thomas?

I am wondering if the final patch was tested with the reproducible
test case (if so, by whom?) and whether it should be credited to Daniel
(or Thomas?) instead...

?


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* [git pull] core kernel fixes
@ 2008-06-30 15:32 Ingo Molnar
  2008-06-30 17:02 ` Vegard Nossum
  0 siblings, 1 reply; 97+ messages in thread
From: Ingo Molnar @ 2008-06-30 15:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner

Linus,

Please pull the latest core kernel fixes git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Thanks,

	Ingo

------------------>
Vegard Nossum (1):
      debugobjects: fix lockdep warning

 lib/debugobjects.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index a76a5e1..85b18d7 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -68,6 +68,7 @@ static int fill_pool(void)
 {
 	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
 	struct debug_obj *new;
+	unsigned long flags;
 
 	if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
 		return obj_pool_free;
@@ -81,10 +82,10 @@ static int fill_pool(void)
 		if (!new)
 			return obj_pool_free;
 
-		spin_lock(&pool_lock);
+		spin_lock_irqsave(&pool_lock, flags);
 		hlist_add_head(&new->node, &obj_pool);
 		obj_pool_free++;
-		spin_unlock(&pool_lock);
+		spin_unlock_irqrestore(&pool_lock, flags);
 	}
 	return obj_pool_free;
 }
@@ -110,16 +111,13 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 }
 
 /*
- * Allocate a new object. If the pool is empty and no refill possible,
- * switch off the debugger.
+ * Allocate a new object. If the pool is empty, switch off the debugger.
  */
 static struct debug_obj *
 alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
 {
 	struct debug_obj *obj = NULL;
-	int retry = 0;
 
-repeat:
 	spin_lock(&pool_lock);
 	if (obj_pool.first) {
 		obj	    = hlist_entry(obj_pool.first, typeof(*obj), node);
@@ -141,9 +139,6 @@ repeat:
 	}
 	spin_unlock(&pool_lock);
 
-	if (fill_pool() && !obj && !retry++)
-		goto repeat;
-
 	return obj;
 }
 
@@ -261,6 +256,8 @@ __debug_object_init(void *addr, struct debug_obj_descr *descr, int onstack)
 	struct debug_obj *obj;
 	unsigned long flags;
 
+	fill_pool();
+
 	db = get_bucket((unsigned long) addr);
 
 	spin_lock_irqsave(&db->lock, flags);
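
For readers following the locking change, here is a minimal userspace
sketch of the ordering the patch above moves to: top up the free pool
before the per-bucket lock is taken, then allocate under the lock with no
refill and no retry path. It is an illustration only, not the kernel code.
The pthread mutexes merely stand in for the kernel spinlocks (so the
irqsave part of the fix is not modelled), and POOL_MIN_LEVEL, struct obj,
pool_fill(), pool_alloc() and tracked_init() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN_LEVEL 4	/* hypothetical low-water mark */

struct obj {			/* hypothetical stand-in for a tracking object */
	struct obj *next;
	void *addr;
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *pool_head;
static int pool_free;

/* Refill the pool up to the low-water mark; returns the level reached. */
static int pool_fill(void)
{
	for (;;) {
		struct obj *new;
		int level;

		pthread_mutex_lock(&pool_lock);
		level = pool_free;
		pthread_mutex_unlock(&pool_lock);
		if (level >= POOL_MIN_LEVEL)
			return level;

		/* The allocation itself happens with no lock held. */
		new = malloc(sizeof(*new));
		if (!new)
			return level;

		pthread_mutex_lock(&pool_lock);
		new->next = pool_head;
		pool_head = new;
		pool_free++;
		pthread_mutex_unlock(&pool_lock);
	}
}

/* Take one object from the pool; returns NULL if the pool is empty. */
static struct obj *pool_alloc(void *addr)
{
	struct obj *o;

	pthread_mutex_lock(&pool_lock);
	o = pool_head;
	if (o) {
		pool_head = o->next;
		pool_free--;
		o->next = NULL;
		o->addr = addr;
	}
	pthread_mutex_unlock(&pool_lock);
	return o;
}

/* Caller-side ordering: fill first, then lock the bucket, then allocate. */
static struct obj *tracked_init(void *addr)
{
	struct obj *o;

	pool_fill();
	pthread_mutex_lock(&bucket_lock);
	o = pool_alloc(addr);
	pthread_mutex_unlock(&bucket_lock);
	return o;
}
```

A caller would use tracked_init(&some_object) and treat a NULL return as
"pool exhausted". The point of the shape is that the only code which can
reach the memory allocator runs before bucket_lock is held.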


* [git pull] core kernel fixes
@ 2008-06-23 19:45 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-06-23 19:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Andrew Morton

Linus,

please pull the latest core kernel fixes git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

	Ingo

------------------>
Thomas Gleixner (1):
      futexes: fix fault handling in futex_lock_pi

 kernel/futex.c |   93 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 449def8..7d1136e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1096,21 +1096,64 @@ static void unqueue_me_pi(struct futex_q *q)
  * private futexes.
  */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner)
+				struct task_struct *newowner,
+				struct rw_semaphore *fshared)
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
+	struct task_struct *oldowner = pi_state->owner;
 	u32 uval, curval, newval;
-	int ret;
+	int ret, attempt = 0;
 
 	/* Owner died? */
+	if (!pi_state->owner)
+		newtid |= FUTEX_OWNER_DIED;
+
+	/*
+	 * We are here either because we stole the rtmutex from the
+	 * pending owner or we are the pending owner which failed to
+	 * get the rtmutex. We have to replace the pending owner TID
+	 * in the user space variable. This must be atomic as we have
+	 * to preserve the owner died bit here.
+	 *
+	 * Note: We write the user space value _before_ changing the
+	 * pi_state because we can fault here. Imagine swapped out
+	 * pages or a fork, which was running right before we acquired
+	 * mmap_sem, that marked all the anonymous memory readonly for
+	 * cow.
+	 *
+	 * Modifying pi_state _before_ the user space value would
+	 * leave the pi_state in an inconsistent state when we fault
+	 * here, because we need to drop the hash bucket lock to
+	 * handle the fault. This might be observed in the PID check
+	 * in lookup_pi_state.
+	 */
+retry:
+	if (get_futex_value_locked(&uval, uaddr))
+		goto handle_fault;
+
+	while (1) {
+		newval = (uval & FUTEX_OWNER_DIED) | newtid;
+
+		curval = cmpxchg_futex_value_locked(uaddr, uval, newval);
+
+		if (curval == -EFAULT)
+			goto handle_fault;
+		if (curval == uval)
+			break;
+		uval = curval;
+	}
+
+	/*
+	 * We fixed up user space. Now we need to fix the pi_state
+	 * itself.
+	 */
 	if (pi_state->owner != NULL) {
 		spin_lock_irq(&pi_state->owner->pi_lock);
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
 		spin_unlock_irq(&pi_state->owner->pi_lock);
-	} else
-		newtid |= FUTEX_OWNER_DIED;
+	}
 
 	pi_state->owner = newowner;
 
@@ -1118,26 +1161,35 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 	WARN_ON(!list_empty(&pi_state->list));
 	list_add(&pi_state->list, &newowner->pi_state_list);
 	spin_unlock_irq(&newowner->pi_lock);
+	return 0;
 
 	/*
-	 * We own it, so we have to replace the pending owner
-	 * TID. This must be atomic as we have preserve the
-	 * owner died bit here.
+	 * To handle the page fault we need to drop the hash bucket
+	 * lock here. That gives the other task (either the pending
+	 * owner itself or the task which stole the rtmutex) the
+	 * chance to try the fixup of the pi_state. So once we are
+	 * back from handling the fault we need to check the pi_state
+	 * after reacquiring the hash bucket lock and before trying to
+	 * do another fixup. When the fixup has been done already we
+	 * simply return.
 	 */
-	ret = get_futex_value_locked(&uval, uaddr);
+handle_fault:
+	spin_unlock(q->lock_ptr);
 
-	while (!ret) {
-		newval = (uval & FUTEX_OWNER_DIED) | newtid;
+	ret = futex_handle_fault((unsigned long)uaddr, fshared, attempt++);
 
-		curval = cmpxchg_futex_value_locked(uaddr, uval, newval);
+	spin_lock(q->lock_ptr);
 
-		if (curval == -EFAULT)
-			ret = -EFAULT;
-		if (curval == uval)
-			break;
-		uval = curval;
-	}
-	return ret;
+	/*
+	 * Check if someone else fixed it for us:
+	 */
+	if (pi_state->owner != oldowner)
+		return 0;
+
+	if (ret)
+		return ret;
+
+	goto retry;
 }
 
 /*
@@ -1507,7 +1559,7 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 		 * that case:
 		 */
 		if (q.pi_state->owner != curr)
-			ret = fixup_pi_state_owner(uaddr, &q, curr);
+			ret = fixup_pi_state_owner(uaddr, &q, curr, fshared);
 	} else {
 		/*
 		 * Catch the rare case, where the lock was released
@@ -1539,7 +1591,8 @@ static int futex_lock_pi(u32 __user *uaddr, struct rw_semaphore *fshared,
 				int res;
 
 				owner = rt_mutex_owner(&q.pi_state->pi_mutex);
-				res = fixup_pi_state_owner(uaddr, &q, owner);
+				res = fixup_pi_state_owner(uaddr, &q, owner,
+							   fshared);
 
 				/* propagate -EFAULT, if the fixup failed */
 				if (res)
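
The core of the new fixup path is the compare-and-exchange loop that
installs the new owner TID while preserving the owner-died bit. Below is
a minimal userspace sketch of just that loop, using C11 atomics. Fault
handling, the hash bucket lock and the pi_state update are deliberately
left out, and the sketch does not model the case where the patch ORs
FUTEX_OWNER_DIED into newtid. set_owner_tid() is a hypothetical helper,
and the two SK_ constants are local copies assumed to match the
FUTEX_WAITERS and FUTEX_OWNER_DIED bits from <linux/futex.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_FUTEX_WAITERS    0x80000000u	/* local copy, see lead-in */
#define SK_FUTEX_OWNER_DIED 0x40000000u	/* local copy, see lead-in */

/*
 * Replace the TID stored in *uaddr with tid, set the waiters bit and
 * keep whatever owner-died bit is currently there. Returns the value
 * that was finally written.
 */
static uint32_t set_owner_tid(_Atomic uint32_t *uaddr, uint32_t tid)
{
	uint32_t newtid = tid | SK_FUTEX_WAITERS;
	uint32_t uval = atomic_load(uaddr);
	uint32_t newval;

	do {
		newval = (uval & SK_FUTEX_OWNER_DIED) | newtid;
		/*
		 * If the word changed under us, uval is updated to the
		 * current contents and newval is recomputed from it.
		 */
	} while (!atomic_compare_exchange_weak(uaddr, &uval, newval));

	return newval;
}
```

For example, starting from a word holding SK_FUTEX_OWNER_DIED | 1234 and
calling set_owner_tid(&word, 42) leaves the owner-died bit set, sets the
waiters bit and stores 42 as the TID.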


* [git pull] core kernel fixes
@ 2008-06-19 15:16 Ingo Molnar
  0 siblings, 0 replies; 97+ messages in thread
From: Ingo Molnar @ 2008-06-19 15:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


Linus,

please pull the latest misc core-kernel fixes git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-fixes-for-linus

Thanks,

	Ingo

------------------>
Jason Wessel (1):
      softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression

Li Zefan (1):
      cpuset: limit the input of cpuset.sched_relax_domain_level

Steven Rostedt (1):
      rcupreempt: remove export of rcu_batches_completed_bh

 Documentation/cpusets.txt |    2 +-
 kernel/cpuset.c           |    4 ++--
 kernel/rcupreempt.c       |    2 --
 kernel/sched.c            |    7 ++++++-
 kernel/softlockup.c       |   15 ++++++++++-----
 5 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index d803c5c..353504d 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -542,7 +542,7 @@ otherwise initial value -1 that indicates the cpuset has no request.
    2  : search cores in a package.
    3  : search cpus in a node [= system wide on non-NUMA system]
  ( 4  : search nodes in a chunk of node [on NUMA system] )
- ( 5~ : search system wide [on NUMA system])
+ ( 5  : search system wide [on NUMA system] )
 
 This file is per-cpuset and affect the sched domain where the cpuset
 belongs to.  Therefore if the flag 'sched_load_balance' of a cpuset
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 039baa4..66103a1 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1037,8 +1037,8 @@ int current_cpuset_is_being_rebound(void)
 
 static int update_relax_domain_level(struct cpuset *cs, s64 val)
 {
-	if ((int)val < 0)
-		val = -1;
+	if (val < -1 || val >= SD_LV_MAX)
+		return -EINVAL;
 
 	if (val != cs->relax_domain_level) {
 		cs->relax_domain_level = val;
diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
index e1cdf19..5e02b77 100644
--- a/kernel/rcupreempt.c
+++ b/kernel/rcupreempt.c
@@ -217,8 +217,6 @@ long rcu_batches_completed(void)
 }
 EXPORT_SYMBOL_GPL(rcu_batches_completed);
 
-EXPORT_SYMBOL_GPL(rcu_batches_completed_bh);
-
 void __rcu_read_lock(void)
 {
 	int idx;
diff --git a/kernel/sched.c b/kernel/sched.c
index eaf6751..bb2c699 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6877,7 +6877,12 @@ static int default_relax_domain_level = -1;
 
 static int __init setup_relax_domain_level(char *str)
 {
-	default_relax_domain_level = simple_strtoul(str, NULL, 0);
+	unsigned long val;
+
+	val = simple_strtoul(str, NULL, 0);
+	if (val < SD_LV_MAX)
+		default_relax_domain_level = val;
+
 	return 1;
 }
 __setup("relax_domain_level=", setup_relax_domain_level);
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 01b6522..c828c23 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -49,12 +49,17 @@ static unsigned long get_timestamp(int this_cpu)
 	return cpu_clock(this_cpu) >> 30LL;  /* 2^30 ~= 10^9 */
 }
 
-void touch_softlockup_watchdog(void)
+static void __touch_softlockup_watchdog(void)
 {
 	int this_cpu = raw_smp_processor_id();
 
 	__raw_get_cpu_var(touch_timestamp) = get_timestamp(this_cpu);
 }
+
+void touch_softlockup_watchdog(void)
+{
+	__raw_get_cpu_var(touch_timestamp) = 0;
+}
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
 void touch_all_softlockup_watchdogs(void)
@@ -80,7 +85,7 @@ void softlockup_tick(void)
 	unsigned long now;
 
 	if (touch_timestamp == 0) {
-		touch_softlockup_watchdog();
+		__touch_softlockup_watchdog();
 		return;
 	}
 
@@ -95,7 +100,7 @@ void softlockup_tick(void)
 
 	/* do not print during early bootup: */
 	if (unlikely(system_state != SYSTEM_RUNNING)) {
-		touch_softlockup_watchdog();
+		__touch_softlockup_watchdog();
 		return;
 	}
 
@@ -214,7 +219,7 @@ static int watchdog(void *__bind_cpu)
 	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	/* initialize timestamp */
-	touch_softlockup_watchdog();
+	__touch_softlockup_watchdog();
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	/*
@@ -223,7 +228,7 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
-		touch_softlockup_watchdog();
+		__touch_softlockup_watchdog();
 		schedule();
 
 		if (kthread_should_stop())
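
The softlockup hunks change what "touching" the watchdog means: the
exported function now only stores 0, and the tick handler takes a fresh
timestamp the next time it sees that 0. The sketch below models this
deferred-timestamp idea in plain C. It is an illustration under loud
assumptions: fake_clock, THRESHOLD, watchdog_touch(), watchdog_touch_now()
and watchdog_tick() are hypothetical names, a single global replaces the
per-CPU variable, and the early-boot check and the printout are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define THRESHOLD 10UL			/* hypothetical lockup threshold */

static unsigned long fake_clock = 100;	/* hypothetical stand-in for the clock */
static unsigned long touch_timestamp;	/* 0 means "refresh on next tick" */

/* Cheap touch: no clock read, only mark the timestamp as stale. */
static void watchdog_touch(void)
{
	touch_timestamp = 0;
}

/* Internal helper: actually read the clock and record it. */
static void watchdog_touch_now(void)
{
	touch_timestamp = fake_clock;
}

/* Periodic tick: returns true if a lockup would be reported. */
static bool watchdog_tick(void)
{
	if (touch_timestamp == 0) {
		/* Someone touched the watchdog; take the timestamp here. */
		watchdog_touch_now();
		return false;
	}

	return fake_clock - touch_timestamp > THRESHOLD;
}
```

With this split, a caller of watchdog_touch() never reads the clock
itself. Judging only by the patch subject, that seems to be the property
the fix is after, since the clock read then always happens from the tick
path.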


end of thread, other threads:[~2020-01-29 19:10 UTC | newest]

Thread overview: 97+ messages
2008-10-30 23:29 [git pull] core kernel fixes Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2020-01-29 11:53 [GIT PULL] " Ingo Molnar
2020-01-29 19:10 ` pr-tracker-bot
2019-05-18  8:51 Ingo Molnar
2019-05-19 17:45 ` pr-tracker-bot
2019-05-16 15:51 Ingo Molnar
2019-05-16 18:20 ` pr-tracker-bot
2018-07-21 11:58 Ingo Molnar
2017-12-06 22:01 Ingo Molnar
2017-11-05 14:33 Ingo Molnar
2017-11-05 18:09 ` Linus Torvalds
2017-11-05 19:53   ` Josh Poimboeuf
2017-11-05 20:12     ` Linus Torvalds
2017-11-05 21:01       ` Josh Poimboeuf
2017-07-21 10:01 Ingo Molnar
2016-10-18 10:14 Ingo Molnar
2016-07-13 10:55 Ingo Molnar
2016-04-03 10:45 Ingo Molnar
2015-02-06 18:28 Ingo Molnar
2012-10-23 10:57 Ingo Molnar
2012-08-03 16:31 Ingo Molnar
2012-08-03 16:55 ` Darren Hart
2012-08-03 17:01   ` Ingo Molnar
2012-08-03 17:24     ` Darren Hart
2012-06-15 18:45 Ingo Molnar
2012-01-26 18:05 Ingo Molnar
2011-08-04 20:45 Ingo Molnar
2011-04-02 10:21 Ingo Molnar
2011-03-25 12:52 Ingo Molnar
2011-01-21  2:11 Ingo Molnar
2011-01-15 15:15 Ingo Molnar
2010-10-05 19:12 Ingo Molnar
2010-10-05 20:15 ` Linus Torvalds
2010-10-05 21:09   ` Paul E. McKenney
2010-10-05 21:45     ` Linus Torvalds
2010-10-05 22:05       ` Paul E. McKenney
2010-10-06  2:56         ` Eric Dumazet
2010-10-06  4:59           ` Paul E. McKenney
2010-10-06 18:20             ` Ingo Molnar
2010-10-06 21:27               ` Paul E. McKenney
2010-10-07  8:11                 ` Ingo Molnar
2010-10-07 17:42                   ` Paul E. McKenney
2010-09-08 13:04 Ingo Molnar
2010-03-26 14:53 Ingo Molnar
2010-03-13 16:35 Ingo Molnar
2009-12-18 18:52 Ingo Molnar
2009-11-10 17:53 Ingo Molnar
2009-10-23 14:53 Ingo Molnar
2009-10-13 18:29 Ingo Molnar
2009-10-08 19:06 Ingo Molnar
2009-10-08 19:16 ` Linus Torvalds
2009-10-08 19:20   ` Ingo Molnar
2009-09-21 13:13 Ingo Molnar
2009-08-13 18:54 Ingo Molnar
2009-08-09 16:07 Ingo Molnar
2009-08-09 18:41 ` Darren Hart
2009-07-10 16:28 Ingo Molnar
2009-07-10 19:06 ` Linus Torvalds
2009-07-10 19:31   ` Ingo Molnar
2009-07-10 19:52     ` Linus Torvalds
2009-07-10 20:02       ` Ingo Molnar
2009-07-13 14:52   ` Joerg Roedel
2009-06-20 17:30 Ingo Molnar
2009-06-20 18:49 ` Linus Torvalds
2009-06-20 19:01   ` Linus Torvalds
2009-06-20 20:27     ` Ingo Molnar
2009-06-21 17:12     ` Thomas Gleixner
2009-06-21 17:37       ` Linus Torvalds
2009-06-21 17:57         ` Linus Torvalds
2009-06-21 19:26           ` Thomas Gleixner
2009-05-18 14:23 Ingo Molnar
2009-05-18 15:48 ` Linus Torvalds
2009-05-18 19:20   ` Thomas Gleixner
2009-05-19 20:52     ` Linus Torvalds
2009-05-19 21:45       ` Thomas Gleixner
2009-05-19 22:20     ` Darren Hart
2009-05-05  9:33 Ingo Molnar
2009-01-30 23:12 [git pull] " Ingo Molnar
2009-01-26 17:24 Ingo Molnar
2009-01-11 14:36 Ingo Molnar
2008-12-04 19:39 Ingo Molnar
2008-11-29 19:36 Ingo Molnar
2008-11-18 14:14 Ingo Molnar
2008-11-07 16:28 Ingo Molnar
2008-10-15 12:50 [git pull] core kernel updates for v2.6.28 Ingo Molnar
2008-10-16 22:32 ` Linus Torvalds
2008-10-17  6:23   ` [git pull] core kernel fixes Ingo Molnar
2008-08-28 11:44 Ingo Molnar
2008-08-18 18:35 Ingo Molnar
2008-07-24 15:13 Ingo Molnar
2008-06-30 15:32 Ingo Molnar
2008-06-30 17:02 ` Vegard Nossum
2008-06-30 18:20   ` Ingo Molnar
2008-06-30 18:43     ` Vegard Nossum
2008-06-30 19:46       ` Thomas Gleixner
2008-06-30 19:51         ` Vegard Nossum
2008-06-30 19:54           ` Thomas Gleixner
2008-06-23 19:45 Ingo Molnar
2008-06-19 15:16 Ingo Molnar
