LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers
@ 2021-11-30 11:44 Marco Elver
  2021-11-30 11:44 ` [PATCH v3 01/25] kcsan: Refactor reading of instrumented memory Marco Elver
                   ` (24 more replies)
  0 siblings, 25 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86, Josh Poimboeuf

Detection of some missing memory barriers has been on the KCSAN feature
wishlist for some time: this series adds support for modeling a subset
of weak memory as defined by the LKMM, which enables detection of a
subset of data races due to missing memory barriers.

KCSAN's approach to detecting missing memory barriers is based on
modeling access reordering. Each memory access for which a watchpoint is
set up, is also selected for simulated reordering within the scope of
its function (at most 1 in-flight access).

We are limited to modeling the effects of "buffering" (delaying the
access), since the runtime cannot "prefetch" accesses. Once an access
has been selected for reordering, it is checked along every other access
until the end of the function scope. If an appropriate memory barrier is
encountered, the access will no longer be considered for reordering.

When the result of a memory operation should be ordered by a barrier,
KCSAN can then detect data races where the conflict only occurs as a
result of a missing barrier due to reordering accesses.

Some more details and an example are captured in the updated
<Documentation/dev-tools/kcsan.rst>.

Some light fuzzing with the feature also resulted in a discussion [1]
around an issue which appears to be allowed, but unlikely in practice.

[1] https://lkml.kernel.org/r/YRo58c+JGOvec7tc@elver.google.com


The first half of the series are core KCSAN changes, documentation
updates, and test changes. The second half adds instrumentation to
barriers, atomics, bitops, along with enabling barrier instrumentation
for some currently uninstrumented subsystems.

Followed by objtool changes to add the usual entries to the uaccess
whitelist, but also instruct objtool to remove memory barrier
instrumentation from noinstr code (on x86), given not all versions of
Clang currently respect __no_kcsan (== __no_sanitize_thread) for the new
instrumentation.

The last 2 patches (new in v3) fix up __no_kcsan for newer versions of
Clang, so that non-x86 architectures can enable weak memory modeling
with Clang 14.0 or newer.

Changelog
---------

v3:
* Rework to avoid kcsan_noinstr hackery, because it is unclear if
  this works on architectures like arm64. A better alternative exists
  where we can get __no_kcsan to work for barrier instrumentation, too.
  Clang's and GCC's __no_kcsan (== __no_sanitize_thread) behave slightly
  differently, which is reflected in KCSAN_WEAK_MEMORY's dependencies
  (either STACK_VALIDATION for older Clang, or GCC which works as-is).
* Rework to avoid inserting explicit calls for barrier instrumentation,
  and instead repurpose __atomic_signal_fence (see comment at
  __tsan_atomic_signal_fence), which is handled by fsanitize=thread
  instrumentation and can therefore be removed via __no_kcsan.
* objtool: s/removable_instr/profiling_func/, and more comments per
  Josh's suggestion.
* Minimize diff in patch removing zero-initialization of globals.
* Don't define kcsan_weak_memory bool if !KCSAN_WEAK_MEMORY.
* Apply Acks.
* 2 new patches to make it work with Clang >= 14.0 without objtool,
  which will be required on non-x86 architectures.

v2: https://lkml.kernel.org/r/20211118081027.3175699-1-elver@google.com
* Rewrite objtool patch after rebase to v5.16-rc1.
* Note the reason in documentation that address or control dependencies
  do not require special handling.
* Rename kcsan_atomic_release() to kcsan_atomic_builtin_memorder() to
  avoid confusion.
* Define kcsan_noinstr as noinline if we rely on objtool nop'ing out
  calls, to avoid things like LTO inlining it.

v1: https://lore.kernel.org/all/20211005105905.1994700-1-elver@google.com/
---

Alexander Potapenko (1):
  compiler_attributes.h: Add __disable_sanitizer_instrumentation

Marco Elver (24):
  kcsan: Refactor reading of instrumented memory
  kcsan: Remove redundant zero-initialization of globals
  kcsan: Avoid checking scoped accesses from nested contexts
  kcsan: Add core support for a subset of weak memory modeling
  kcsan: Add core memory barrier instrumentation functions
  kcsan, kbuild: Add option for barrier instrumentation only
  kcsan: Call scoped accesses reordered in reports
  kcsan: Show location access was reordered to
  kcsan: Document modeling of weak memory
  kcsan: test: Match reordered or normal accesses
  kcsan: test: Add test cases for memory barrier instrumentation
  kcsan: Ignore GCC 11+ warnings about TSan runtime support
  kcsan: selftest: Add test case to check memory barrier instrumentation
  locking/barriers, kcsan: Add instrumentation for barriers
  locking/barriers, kcsan: Support generic instrumentation
  locking/atomics, kcsan: Add instrumentation for barriers
  asm-generic/bitops, kcsan: Add instrumentation for barriers
  x86/barriers, kcsan: Use generic instrumentation for non-smp barriers
  x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock()
  mm, kcsan: Enable barrier instrumentation
  sched, kcsan: Enable memory barrier instrumentation
  objtool, kcsan: Add memory barrier instrumentation to whitelist
  objtool, kcsan: Remove memory barrier instrumentation from noinstr
  kcsan: Support WEAK_MEMORY with Clang where no objtool support exists

 Documentation/dev-tools/kcsan.rst             |  76 +++-
 arch/x86/include/asm/barrier.h                |  10 +-
 arch/x86/include/asm/qspinlock.h              |   1 +
 include/asm-generic/barrier.h                 |  54 ++-
 .../asm-generic/bitops/instrumented-atomic.h  |   3 +
 .../asm-generic/bitops/instrumented-lock.h    |   3 +
 include/linux/atomic/atomic-instrumented.h    | 135 +++++-
 include/linux/compiler_attributes.h           |  18 +
 include/linux/compiler_types.h                |  13 +-
 include/linux/kcsan-checks.h                  |  81 +++-
 include/linux/kcsan.h                         |  11 +-
 include/linux/sched.h                         |   3 +
 include/linux/spinlock.h                      |   2 +-
 init/init_task.c                              |   5 -
 kernel/kcsan/Makefile                         |   2 +
 kernel/kcsan/core.c                           | 345 ++++++++++++---
 kernel/kcsan/kcsan_test.c                     | 415 ++++++++++++++++--
 kernel/kcsan/report.c                         |  51 ++-
 kernel/kcsan/selftest.c                       | 141 ++++++
 kernel/sched/Makefile                         |   7 +-
 lib/Kconfig.kcsan                             |  20 +
 mm/Makefile                                   |   2 +
 scripts/Makefile.kcsan                        |  15 +-
 scripts/Makefile.lib                          |   5 +
 scripts/atomic/gen-atomic-instrumented.sh     |  41 +-
 tools/objtool/check.c                         |  41 +-
 tools/objtool/include/objtool/elf.h           |   2 +-
 27 files changed, 1330 insertions(+), 172 deletions(-)

-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 01/25] kcsan: Refactor reading of instrumented memory
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 02/25] kcsan: Remove redundant zero-initialization of globals Marco Elver
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Factor out the switch statement reading instrumented memory into a
helper read_instrumented_memory().

No functional change.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
 kernel/kcsan/core.c | 51 +++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 4b84c8e7884b..6bfd3040f46b 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -325,6 +325,21 @@ static void delay_access(int type)
 	udelay(delay);
 }
 
+/*
+ * Reads the instrumented memory for value change detection; value change
+ * detection is currently done for accesses up to a size of 8 bytes.
+ */
+static __always_inline u64 read_instrumented_memory(const volatile void *ptr, size_t size)
+{
+	switch (size) {
+	case 1:  return READ_ONCE(*(const u8 *)ptr);
+	case 2:  return READ_ONCE(*(const u16 *)ptr);
+	case 4:  return READ_ONCE(*(const u32 *)ptr);
+	case 8:  return READ_ONCE(*(const u64 *)ptr);
+	default: return 0; /* Ignore; we do not diff the values. */
+	}
+}
+
 void kcsan_save_irqtrace(struct task_struct *task)
 {
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -482,23 +497,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * Read the current value, to later check and infer a race if the data
 	 * was modified via a non-instrumented access, e.g. from a device.
 	 */
-	old = 0;
-	switch (size) {
-	case 1:
-		old = READ_ONCE(*(const u8 *)ptr);
-		break;
-	case 2:
-		old = READ_ONCE(*(const u16 *)ptr);
-		break;
-	case 4:
-		old = READ_ONCE(*(const u32 *)ptr);
-		break;
-	case 8:
-		old = READ_ONCE(*(const u64 *)ptr);
-		break;
-	default:
-		break; /* ignore; we do not diff the values */
-	}
+	old = read_instrumented_memory(ptr, size);
 
 	/*
 	 * Delay this thread, to increase probability of observing a racy
@@ -511,23 +510,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * racy access.
 	 */
 	access_mask = ctx->access_mask;
-	new = 0;
-	switch (size) {
-	case 1:
-		new = READ_ONCE(*(const u8 *)ptr);
-		break;
-	case 2:
-		new = READ_ONCE(*(const u16 *)ptr);
-		break;
-	case 4:
-		new = READ_ONCE(*(const u32 *)ptr);
-		break;
-	case 8:
-		new = READ_ONCE(*(const u64 *)ptr);
-		break;
-	default:
-		break; /* ignore; we do not diff the values */
-	}
+	new = read_instrumented_memory(ptr, size);
 
 	diff = old ^ new;
 	if (access_mask)
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 02/25] kcsan: Remove redundant zero-initialization of globals
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
  2021-11-30 11:44 ` [PATCH v3 01/25] kcsan: Refactor reading of instrumented memory Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 03/25] kcsan: Avoid checking scoped accesses from nested contexts Marco Elver
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

They are implicitly zero-initialized, remove explicit initialization.
It keeps the upcoming additions to kcsan_ctx consistent with the rest.

No functional change intended.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
v3:
* Minimize diff by leaving "scoped_accesses" on its own line, which
  should also reduce diff of future changes.
---
 init/init_task.c    | 5 -----
 kernel/kcsan/core.c | 5 -----
 2 files changed, 10 deletions(-)

diff --git a/init/init_task.c b/init/init_task.c
index 2d024066e27b..73cc8f03511a 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -182,11 +182,6 @@ struct task_struct init_task
 #endif
 #ifdef CONFIG_KCSAN
 	.kcsan_ctx = {
-		.disable_count		= 0,
-		.atomic_next		= 0,
-		.atomic_nest_count	= 0,
-		.in_flat_atomic		= false,
-		.access_mask		= 0,
 		.scoped_accesses	= {LIST_POISON1, NULL},
 	},
 #endif
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 6bfd3040f46b..e34a1710b7bc 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -44,11 +44,6 @@ bool kcsan_enabled;
 
 /* Per-CPU kcsan_ctx for interrupts */
 static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
-	.disable_count		= 0,
-	.atomic_next		= 0,
-	.atomic_nest_count	= 0,
-	.in_flat_atomic		= false,
-	.access_mask		= 0,
 	.scoped_accesses	= {LIST_POISON1, NULL},
 };
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 03/25] kcsan: Avoid checking scoped accesses from nested contexts
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
  2021-11-30 11:44 ` [PATCH v3 01/25] kcsan: Refactor reading of instrumented memory Marco Elver
  2021-11-30 11:44 ` [PATCH v3 02/25] kcsan: Remove redundant zero-initialization of globals Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling Marco Elver
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Avoid checking scoped accesses from nested contexts (such as nested
interrupts or in scheduler code) which share the same kcsan_ctx.

This is to avoid detecting false positive races of accesses in the same
thread with currently scoped accesses: consider setting up a watchpoint
for a non-scoped (normal) access that also "conflicts" with a current
scoped access. In a nested interrupt (or in the scheduler), which shares
the same kcsan_ctx, we cannot check scoped accesses set up in the parent
context -- simply ignore them in this case.

With the introduction of kcsan_ctx::disable_scoped, we can also clean up
kcsan_check_scoped_accesses()'s recursion guard, and do not need to
modify the list's prev pointer.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/kcsan.h |  1 +
 kernel/kcsan/core.c   | 18 +++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
index fc266ecb2a4d..13cef3458fed 100644
--- a/include/linux/kcsan.h
+++ b/include/linux/kcsan.h
@@ -21,6 +21,7 @@
  */
 struct kcsan_ctx {
 	int disable_count; /* disable counter */
+	int disable_scoped; /* disable scoped access counter */
 	int atomic_next; /* number of following atomic ops */
 
 	/*
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index e34a1710b7bc..bd359f8ee63a 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -204,15 +204,17 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip);
 static noinline void kcsan_check_scoped_accesses(void)
 {
 	struct kcsan_ctx *ctx = get_ctx();
-	struct list_head *prev_save = ctx->scoped_accesses.prev;
 	struct kcsan_scoped_access *scoped_access;
 
-	ctx->scoped_accesses.prev = NULL;  /* Avoid recursion. */
+	if (ctx->disable_scoped)
+		return;
+
+	ctx->disable_scoped++;
 	list_for_each_entry(scoped_access, &ctx->scoped_accesses, list) {
 		check_access(scoped_access->ptr, scoped_access->size,
 			     scoped_access->type, scoped_access->ip);
 	}
-	ctx->scoped_accesses.prev = prev_save;
+	ctx->disable_scoped--;
 }
 
 /* Rules for generic atomic accesses. Called from fast-path. */
@@ -465,6 +467,15 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 		goto out;
 	}
 
+	/*
+	 * Avoid races of scoped accesses from nested interrupts (or scheduler).
+	 * Assume setting up a watchpoint for a non-scoped (normal) access that
+	 * also conflicts with a current scoped access. In a nested interrupt,
+	 * which shares the context, it would check a conflicting scoped access.
+	 * To avoid, disable scoped access checking.
+	 */
+	ctx->disable_scoped++;
+
 	/*
 	 * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
 	 * runtime is entered for every memory access, and potentially useful
@@ -578,6 +589,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	if (!kcsan_interrupt_watcher)
 		local_irq_restore(irq_flags);
 	kcsan_restore_irqtrace(current);
+	ctx->disable_scoped--;
 out:
 	user_access_restore(ua_flags);
 }
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (2 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 03/25] kcsan: Avoid checking scoped accesses from nested contexts Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-12-03  8:56   ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 05/25] kcsan: Add core memory barrier instrumentation functions Marco Elver
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Add support for modeling a subset of weak memory, which will enable
detection of a subset of data races due to missing memory barriers.

KCSAN's approach to detecting missing memory barriers is based on
modeling access reordering, and enabled if `CONFIG_KCSAN_WEAK_MEMORY=y`,
which depends on `CONFIG_KCSAN_STRICT=y`. The feature can be enabled or
disabled at boot and runtime via the `kcsan.weak_memory` boot parameter.

Each memory access for which a watchpoint is set up, is also selected
for simulated reordering within the scope of its function (at most 1
in-flight access).

We are limited to modeling the effects of "buffering" (delaying the
access), since the runtime cannot "prefetch" accesses (therefore no
acquire modeling). Once an access has been selected for reordering, it
is checked along every other access until the end of the function scope.
If an appropriate memory barrier is encountered, the access will no
longer be considered for reordering.

When the result of a memory operation should be ordered by a barrier,
KCSAN can then detect data races where the conflict only occurs as a
result of a missing barrier due to reordering accesses.

Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Remove kcsan_noinstr hackery, since we now try to avoid adding any
  instrumentation to .noinstr.text in the first place.
* Restrict config WEAK_MEMORY to only be enabled with tooling where
  we actually remove instrumentation from noinstr.
* Don't define kcsan_weak_memory bool if !KCSAN_WEAK_MEMORY.

v2:
* Define kcsan_noinstr as noinline if we rely on objtool nop'ing out
  calls, to avoid things like LTO inlining it.
---
 include/linux/kcsan-checks.h |  10 +-
 include/linux/kcsan.h        |  10 +-
 include/linux/sched.h        |   3 +
 kernel/kcsan/core.c          | 209 ++++++++++++++++++++++++++++++++---
 lib/Kconfig.kcsan            |  20 ++++
 scripts/Makefile.kcsan       |   9 +-
 6 files changed, 242 insertions(+), 19 deletions(-)

diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
index 5f5965246877..a1c6a89fde71 100644
--- a/include/linux/kcsan-checks.h
+++ b/include/linux/kcsan-checks.h
@@ -99,7 +99,15 @@ void kcsan_set_access_mask(unsigned long mask);
 
 /* Scoped access information. */
 struct kcsan_scoped_access {
-	struct list_head list;
+	union {
+		struct list_head list; /* scoped_accesses list */
+		/*
+		 * Not an entry in scoped_accesses list; stack depth from where
+		 * the access was initialized.
+		 */
+		int stack_depth;
+	};
+
 	/* Access information. */
 	const volatile void *ptr;
 	size_t size;
diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
index 13cef3458fed..c07c71f5ba4f 100644
--- a/include/linux/kcsan.h
+++ b/include/linux/kcsan.h
@@ -49,8 +49,16 @@ struct kcsan_ctx {
 	 */
 	unsigned long access_mask;
 
-	/* List of scoped accesses. */
+	/* List of scoped accesses; likely to be empty. */
 	struct list_head scoped_accesses;
+
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	/*
+	 * Scoped access for modeling access reordering to detect missing memory
+	 * barriers; only keep 1 to keep fast-path complexity manageable.
+	 */
+	struct kcsan_scoped_access reorder_access;
+#endif
 };
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..0cd40b010487 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1339,6 +1339,9 @@ struct task_struct {
 #ifdef CONFIG_TRACE_IRQFLAGS
 	struct irqtrace_events		kcsan_save_irqtrace;
 #endif
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	int				kcsan_stack_depth;
+#endif
 #endif
 
 #if IS_ENABLED(CONFIG_KUNIT)
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index bd359f8ee63a..36a75e79a0bd 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -12,6 +12,7 @@
 #include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/init.h>
+#include <linux/instrumentation.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/moduleparam.h>
@@ -20,6 +21,8 @@
 #include <linux/sched.h>
 #include <linux/uaccess.h>
 
+#include <asm/sections.h>
+
 #include "encoding.h"
 #include "kcsan.h"
 #include "permissive.h"
@@ -40,6 +43,13 @@ module_param_named(udelay_interrupt, kcsan_udelay_interrupt, uint, 0644);
 module_param_named(skip_watch, kcsan_skip_watch, long, 0644);
 module_param_named(interrupt_watcher, kcsan_interrupt_watcher, bool, 0444);
 
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+static bool kcsan_weak_memory = true;
+module_param_named(weak_memory, kcsan_weak_memory, bool, 0644);
+#else
+#define kcsan_weak_memory false
+#endif
+
 bool kcsan_enabled;
 
 /* Per-CPU kcsan_ctx for interrupts */
@@ -351,6 +361,67 @@ void kcsan_restore_irqtrace(struct task_struct *task)
 #endif
 }
 
+static __always_inline int get_kcsan_stack_depth(void)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	return current->kcsan_stack_depth;
+#else
+	BUILD_BUG();
+	return 0;
+#endif
+}
+
+static __always_inline void add_kcsan_stack_depth(int val)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	current->kcsan_stack_depth += val;
+#else
+	BUILD_BUG();
+#endif
+}
+
+static __always_inline struct kcsan_scoped_access *get_reorder_access(struct kcsan_ctx *ctx)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	return ctx->disable_scoped ? NULL : &ctx->reorder_access;
+#else
+	return NULL;
+#endif
+}
+
+static __always_inline bool
+find_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
+		    int type, unsigned long ip)
+{
+	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+	if (!reorder_access)
+		return false;
+
+	/*
+	 * Note: If accesses are repeated while reorder_access is identical,
+	 * never matches the new access, because !(type & KCSAN_ACCESS_SCOPED).
+	 */
+	return reorder_access->ptr == ptr && reorder_access->size == size &&
+	       reorder_access->type == type && reorder_access->ip == ip;
+}
+
+static inline void
+set_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
+		   int type, unsigned long ip)
+{
+	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+	if (!reorder_access || !kcsan_weak_memory)
+		return;
+
+	reorder_access->ptr		= ptr;
+	reorder_access->size		= size;
+	reorder_access->type		= type | KCSAN_ACCESS_SCOPED;
+	reorder_access->ip		= ip;
+	reorder_access->stack_depth	= get_kcsan_stack_depth();
+}
+
 /*
  * Pull everything together: check_access() below contains the performance
  * critical operations; the fast-path (including check_access) functions should
@@ -389,8 +460,10 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
 	 * The access_mask check relies on value-change comparison. To avoid
 	 * reporting a race where e.g. the writer set up the watchpoint, but the
 	 * reader has access_mask!=0, we have to ignore the found watchpoint.
+	 *
+	 * reorder_access is never created from an access with access_mask set.
 	 */
-	if (ctx->access_mask)
+	if (ctx->access_mask && !find_reorder_access(ctx, ptr, size, type, ip))
 		return;
 
 	/*
@@ -440,11 +513,13 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	const bool is_assert = (type & KCSAN_ACCESS_ASSERT) != 0;
 	atomic_long_t *watchpoint;
 	u64 old, new, diff;
-	unsigned long access_mask;
 	enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
+	bool interrupt_watcher = kcsan_interrupt_watcher;
 	unsigned long ua_flags = user_access_save();
 	struct kcsan_ctx *ctx = get_ctx();
+	unsigned long access_mask = ctx->access_mask;
 	unsigned long irq_flags = 0;
+	bool is_reorder_access;
 
 	/*
 	 * Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -467,6 +542,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 		goto out;
 	}
 
+	/*
+	 * The local CPU cannot observe reordering of its own accesses, and
+	 * therefore we need to take care of 2 cases to avoid false positives:
+	 *
+	 *	1. Races of the reordered access with interrupts. To avoid, if
+	 *	   the current access is reorder_access, disable interrupts.
+	 *	2. Avoid races of scoped accesses from nested interrupts (below).
+	 */
+	is_reorder_access = find_reorder_access(ctx, ptr, size, type, ip);
+	if (is_reorder_access)
+		interrupt_watcher = false;
 	/*
 	 * Avoid races of scoped accesses from nested interrupts (or scheduler).
 	 * Assume setting up a watchpoint for a non-scoped (normal) access that
@@ -482,7 +568,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * information is lost if dirtied by KCSAN.
 	 */
 	kcsan_save_irqtrace(current);
-	if (!kcsan_interrupt_watcher)
+	if (!interrupt_watcher)
 		local_irq_save(irq_flags);
 
 	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
@@ -503,7 +589,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * Read the current value, to later check and infer a race if the data
 	 * was modified via a non-instrumented access, e.g. from a device.
 	 */
-	old = read_instrumented_memory(ptr, size);
+	old = is_reorder_access ? 0 : read_instrumented_memory(ptr, size);
 
 	/*
 	 * Delay this thread, to increase probability of observing a racy
@@ -515,8 +601,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * Re-read value, and check if it is as expected; if not, we infer a
 	 * racy access.
 	 */
-	access_mask = ctx->access_mask;
-	new = read_instrumented_memory(ptr, size);
+	if (!is_reorder_access) {
+		new = read_instrumented_memory(ptr, size);
+	} else {
+		/*
+		 * Reordered accesses cannot be used for value change detection,
+		 * because the memory location may no longer be accessible and
+		 * could result in a fault.
+		 */
+		new = 0;
+		access_mask = 0;
+	}
 
 	diff = old ^ new;
 	if (access_mask)
@@ -585,11 +680,20 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 */
 	remove_watchpoint(watchpoint);
 	atomic_long_dec(&kcsan_counters[KCSAN_COUNTER_USED_WATCHPOINTS]);
+
 out_unlock:
-	if (!kcsan_interrupt_watcher)
+	if (!interrupt_watcher)
 		local_irq_restore(irq_flags);
 	kcsan_restore_irqtrace(current);
 	ctx->disable_scoped--;
+
+	/*
+	 * Reordered accesses cannot be used for value change detection,
+	 * therefore never consider for reordering if access_mask is set.
+	 * ASSERT_EXCLUSIVE are not real accesses, ignore them as well.
+	 */
+	if (!access_mask && !is_assert)
+		set_reorder_access(ctx, ptr, size, type, ip);
 out:
 	user_access_restore(ua_flags);
 }
@@ -597,7 +701,6 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 static __always_inline void
 check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 {
-	const bool is_write = (type & KCSAN_ACCESS_WRITE) != 0;
 	atomic_long_t *watchpoint;
 	long encoded_watchpoint;
 
@@ -608,12 +711,14 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 	if (unlikely(size == 0))
 		return;
 
+again:
 	/*
 	 * Avoid user_access_save in fast-path: find_watchpoint is safe without
 	 * user_access_save, as the address that ptr points to is only used to
 	 * check if a watchpoint exists; ptr is never dereferenced.
 	 */
-	watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
+	watchpoint = find_watchpoint((unsigned long)ptr, size,
+				     !(type & KCSAN_ACCESS_WRITE),
 				     &encoded_watchpoint);
 	/*
 	 * It is safe to check kcsan_is_enabled() after find_watchpoint in the
@@ -627,9 +732,42 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 	else {
 		struct kcsan_ctx *ctx = get_ctx(); /* Call only once in fast-path. */
 
-		if (unlikely(should_watch(ctx, ptr, size, type)))
+		if (unlikely(should_watch(ctx, ptr, size, type))) {
 			kcsan_setup_watchpoint(ptr, size, type, ip);
-		else if (unlikely(ctx->scoped_accesses.prev))
+			return;
+		}
+
+		if (!(type & KCSAN_ACCESS_SCOPED)) {
+			struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+			if (reorder_access) {
+				/*
+				 * reorder_access check: simulates reordering of
+				 * the access after subsequent operations.
+				 */
+				ptr = reorder_access->ptr;
+				type = reorder_access->type;
+				ip = reorder_access->ip;
+				/*
+				 * Upon a nested interrupt, this context's
+				 * reorder_access can be modified (shared ctx).
+				 * We know that upon return, reorder_access is
+				 * always invalidated by setting size to 0 via
+				 * __tsan_func_exit(). Therefore we must read
+				 * and check size after the other fields.
+				 */
+				barrier();
+				size = READ_ONCE(reorder_access->size);
+				if (size)
+					goto again;
+			}
+		}
+
+		/*
+		 * Always checked last, right before returning from runtime;
+		 * if reorder_access is valid, checked after it was checked.
+		 */
+		if (unlikely(ctx->scoped_accesses.prev))
 			kcsan_check_scoped_accesses();
 	}
 }
@@ -916,19 +1054,60 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(8);
 DEFINE_TSAN_VOLATILE_READ_WRITE(16);
 
 /*
- * The below are not required by KCSAN, but can still be emitted by the
- * compiler.
+ * Function entry and exit are used to determine the validty of reorder_access.
+ * Reordering of the access ends at the end of the function scope where the
+ * access happened. This is done for two reasons:
+ *
+ *	1. Artificially limits the scope where missing barriers are detected.
+ *	   This minimizes false positives due to uninstrumented functions that
+ *	   contain the required barriers but were missed.
+ *
+ *	2. Simplifies generating the stack trace of the access.
  */
 void __tsan_func_entry(void *call_pc);
-void __tsan_func_entry(void *call_pc)
+noinline void __tsan_func_entry(void *call_pc)
 {
+	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		return;
+
+	instrumentation_begin();
+	add_kcsan_stack_depth(1);
+	instrumentation_end();
 }
 EXPORT_SYMBOL(__tsan_func_entry);
+
 void __tsan_func_exit(void);
-void __tsan_func_exit(void)
+noinline void __tsan_func_exit(void)
 {
+	struct kcsan_scoped_access *reorder_access;
+
+	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		return;
+
+	instrumentation_begin();
+	reorder_access = get_reorder_access(get_ctx());
+	if (!reorder_access)
+		goto out;
+
+	if (get_kcsan_stack_depth() <= reorder_access->stack_depth) {
+		/*
+		 * Access check to catch cases where write without a barrier
+		 * (supposed release) was last access in function: because
+		 * instrumentation is inserted before the real access, a data
+		 * race due to the write giving up a c-s would only be caught if
+		 * we do the conflicting access after.
+		 */
+		check_access(reorder_access->ptr, reorder_access->size,
+			     reorder_access->type, reorder_access->ip);
+		reorder_access->size = 0;
+		reorder_access->stack_depth = INT_MIN;
+	}
+out:
+	add_kcsan_stack_depth(-1);
+	instrumentation_end();
 }
 EXPORT_SYMBOL(__tsan_func_exit);
+
 void __tsan_init(void);
 void __tsan_init(void)
 {
diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
index e0a93ffdef30..e4394ea8068b 100644
--- a/lib/Kconfig.kcsan
+++ b/lib/Kconfig.kcsan
@@ -191,6 +191,26 @@ config KCSAN_STRICT
 	  closely aligns with the rules defined by the Linux-kernel memory
 	  consistency model (LKMM).
 
+config KCSAN_WEAK_MEMORY
+	bool "Enable weak memory modeling to detect missing memory barriers"
+	default y
+	depends on KCSAN_STRICT
+	# We can either let objtool nop __tsan_func_{entry,exit}() and builtin
+	# atomics instrumentation in .noinstr.text, or use a compiler that can
+	# implement __no_kcsan to really remove all instrumentation.
+	depends on STACK_VALIDATION || CC_IS_GCC
+	help
+	  Enable support for modeling a subset of weak memory, which allows
+	  detecting a subset of data races due to missing memory barriers.
+
+	  Depends on KCSAN_STRICT, because the options strenghtening certain
+	  plain accesses by default (depending on !KCSAN_STRICT) reduce the
+	  ability to detect any data races invoving reordered accesses, in
+	  particular reordered writes.
+
+	  Weak memory modeling relies on additional instrumentation and may
+	  affect performance.
+
 config KCSAN_REPORT_VALUE_CHANGE_ONLY
 	bool "Only report races where watcher observed a data value change"
 	default y
diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
index 37cb504c77e1..4c7f0d282e42 100644
--- a/scripts/Makefile.kcsan
+++ b/scripts/Makefile.kcsan
@@ -9,7 +9,12 @@ endif
 
 # Keep most options here optional, to allow enabling more compilers if absence
 # of some options does not break KCSAN nor causes false positive reports.
-export CFLAGS_KCSAN := -fsanitize=thread \
-	$(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0) -fno-optimize-sibling-calls) \
+kcsan-cflags := -fsanitize=thread -fno-optimize-sibling-calls \
 	$(call cc-option,$(call cc-param,tsan-compound-read-before-write=1),$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1))) \
 	$(call cc-param,tsan-distinguish-volatile=1)
+
+ifndef CONFIG_KCSAN_WEAK_MEMORY
+kcsan-cflags += $(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0))
+endif
+
+export CFLAGS_KCSAN := $(kcsan-cflags)
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 05/25] kcsan: Add core memory barrier instrumentation functions
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (3 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 06/25] kcsan, kbuild: Add option for barrier instrumentation only Marco Elver
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Add the core memory barrier instrumentation functions. These invalidate
the current in-flight reordered access based on the rules for the
respective barrier types and in-flight access type.

To obtain barrier instrumentation that can be disabled via __no_kcsan
with appropriate compiler-support (and not just with objtool help),
barrier instrumentation repurposes __atomic_signal_fence(), instead of
inserting explicit calls. Crucially, __atomic_signal_fence() normally
does not map to any real instructions, but is still intercepted by
fsanitize=thread. As a result, like any other instrumentation done by
the compiler, barrier instrumentation can be disabled with __no_kcsan.

Unfortunately Clang and GCC currently differ in their __no_kcsan aka
__no_sanitize_thread behaviour with respect to builtin atomics (and
__tsan_func_{entry,exit}) instrumentation. This is already reflected in
Kconfig.kcsan's dependencies for KCSAN_WEAK_MEMORY. A later change will
introduce support for newer versions of Clang that can implement
__no_kcsan to also remove the additional instrumentation introduced by
KCSAN_WEAK_MEMORY.

Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Rework to avoid inserting explicit calls, and instead repurpose
  __atomic_signal_fence (see comment at __tsan_atomic_signal_fence),
  which is known to the ThreadSanitizer instrumentation and can
  therefore be removed via function attributes.

v2:
* Rename kcsan_atomic_release() to kcsan_atomic_builtin_memorder() to
  avoid confusion.
---
 include/linux/kcsan-checks.h | 71 +++++++++++++++++++++++++++++++++++-
 kernel/kcsan/core.c          | 68 +++++++++++++++++++++++++++++++++-
 2 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
index a1c6a89fde71..9d2c869167f2 100644
--- a/include/linux/kcsan-checks.h
+++ b/include/linux/kcsan-checks.h
@@ -36,6 +36,36 @@
  */
 void __kcsan_check_access(const volatile void *ptr, size_t size, int type);
 
+/*
+ * See definition of __tsan_atomic_signal_fence() in kernel/kcsan/core.c.
+ * Note: The mappings are arbitrary, and do not reflect any real mappings of C11
+ * memory orders to the LKMM memory orders and vice-versa!
+ */
+#define __KCSAN_BARRIER_TO_SIGNAL_FENCE_mb	__ATOMIC_SEQ_CST
+#define __KCSAN_BARRIER_TO_SIGNAL_FENCE_wmb	__ATOMIC_ACQ_REL
+#define __KCSAN_BARRIER_TO_SIGNAL_FENCE_rmb	__ATOMIC_ACQUIRE
+#define __KCSAN_BARRIER_TO_SIGNAL_FENCE_release	__ATOMIC_RELEASE
+
+/**
+ * __kcsan_mb - full memory barrier instrumentation
+ */
+void __kcsan_mb(void);
+
+/**
+ * __kcsan_wmb - write memory barrier instrumentation
+ */
+void __kcsan_wmb(void);
+
+/**
+ * __kcsan_rmb - read memory barrier instrumentation
+ */
+void __kcsan_rmb(void);
+
+/**
+ * __kcsan_release - release barrier instrumentation
+ */
+void __kcsan_release(void);
+
 /**
  * kcsan_disable_current - disable KCSAN for the current context
  *
@@ -159,6 +189,10 @@ void kcsan_end_scoped_access(struct kcsan_scoped_access *sa);
 static inline void __kcsan_check_access(const volatile void *ptr, size_t size,
 					int type) { }
 
+static inline void __kcsan_mb(void)			{ }
+static inline void __kcsan_wmb(void)			{ }
+static inline void __kcsan_rmb(void)			{ }
+static inline void __kcsan_release(void)		{ }
 static inline void kcsan_disable_current(void)		{ }
 static inline void kcsan_enable_current(void)		{ }
 static inline void kcsan_enable_current_nowarn(void)	{ }
@@ -191,12 +225,45 @@ static inline void kcsan_end_scoped_access(struct kcsan_scoped_access *sa) { }
  */
 #define __kcsan_disable_current kcsan_disable_current
 #define __kcsan_enable_current kcsan_enable_current_nowarn
-#else
+#else /* __SANITIZE_THREAD__ */
 static inline void kcsan_check_access(const volatile void *ptr, size_t size,
 				      int type) { }
 static inline void __kcsan_enable_current(void)  { }
 static inline void __kcsan_disable_current(void) { }
-#endif
+#endif /* __SANITIZE_THREAD__ */
+
+#if defined(CONFIG_KCSAN_WEAK_MEMORY) && defined(__SANITIZE_THREAD__)
+/*
+ * Normal barrier instrumentation is not done via explicit calls, but by mapping
+ * to a repurposed __atomic_signal_fence(), which normally does not generate any
+ * real instructions, but is still intercepted by fsanitize=thread. This means,
+ * like any other compile-time instrumentation, barrier instrumentation can be
+ * disabled with the __no_kcsan function attribute.
+ *
+ * Also see definition of __tsan_atomic_signal_fence() in kernel/kcsan/core.c.
+ */
+#define __KCSAN_BARRIER_TO_SIGNAL_FENCE(name)					\
+	static __always_inline void kcsan_##name(void)				\
+	{									\
+		barrier();							\
+		__atomic_signal_fence(__KCSAN_BARRIER_TO_SIGNAL_FENCE_##name);	\
+		barrier();							\
+	}
+__KCSAN_BARRIER_TO_SIGNAL_FENCE(mb)
+__KCSAN_BARRIER_TO_SIGNAL_FENCE(wmb)
+__KCSAN_BARRIER_TO_SIGNAL_FENCE(rmb)
+__KCSAN_BARRIER_TO_SIGNAL_FENCE(release)
+#elif defined(CONFIG_KCSAN_WEAK_MEMORY) && defined(__KCSAN_INSTRUMENT_BARRIERS__)
+#define kcsan_mb	__kcsan_mb
+#define kcsan_wmb	__kcsan_wmb
+#define kcsan_rmb	__kcsan_rmb
+#define kcsan_release	__kcsan_release
+#else /* CONFIG_KCSAN_WEAK_MEMORY && ... */
+static inline void kcsan_mb(void)		{ }
+static inline void kcsan_wmb(void)		{ }
+static inline void kcsan_rmb(void)		{ }
+static inline void kcsan_release(void)		{ }
+#endif /* CONFIG_KCSAN_WEAK_MEMORY && ... */
 
 /**
  * __kcsan_check_read - check regular read access for races
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 36a75e79a0bd..2254cb75cbb0 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -942,6 +942,22 @@ void __kcsan_check_access(const volatile void *ptr, size_t size, int type)
 }
 EXPORT_SYMBOL(__kcsan_check_access);
 
+#define DEFINE_MEMORY_BARRIER(name, order_before_cond)				\
+	void __kcsan_##name(void)						\
+	{									\
+		struct kcsan_scoped_access *sa = get_reorder_access(get_ctx());	\
+		if (!sa)							\
+			return;							\
+		if (order_before_cond)						\
+			sa->size = 0;						\
+	}									\
+	EXPORT_SYMBOL(__kcsan_##name)
+
+DEFINE_MEMORY_BARRIER(mb, true);
+DEFINE_MEMORY_BARRIER(wmb, sa->type & (KCSAN_ACCESS_WRITE | KCSAN_ACCESS_COMPOUND));
+DEFINE_MEMORY_BARRIER(rmb, !(sa->type & KCSAN_ACCESS_WRITE) || (sa->type & KCSAN_ACCESS_COMPOUND));
+DEFINE_MEMORY_BARRIER(release, true);
+
 /*
  * KCSAN uses the same instrumentation that is emitted by supported compilers
  * for ThreadSanitizer (TSAN).
@@ -1130,10 +1146,19 @@ EXPORT_SYMBOL(__tsan_init);
  * functions, whose job is to also execute the operation itself.
  */
 
+static __always_inline void kcsan_atomic_builtin_memorder(int memorder)
+{
+	if (memorder == __ATOMIC_RELEASE ||
+	    memorder == __ATOMIC_SEQ_CST ||
+	    memorder == __ATOMIC_ACQ_REL)
+		__kcsan_release();
+}
+
 #define DEFINE_TSAN_ATOMIC_LOAD_STORE(bits)                                                        \
 	u##bits __tsan_atomic##bits##_load(const u##bits *ptr, int memorder);                      \
 	u##bits __tsan_atomic##bits##_load(const u##bits *ptr, int memorder)                       \
 	{                                                                                          \
+		kcsan_atomic_builtin_memorder(memorder);                                           \
 		if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {                                    \
 			check_access(ptr, bits / BITS_PER_BYTE, KCSAN_ACCESS_ATOMIC, _RET_IP_);    \
 		}                                                                                  \
@@ -1143,6 +1168,7 @@ EXPORT_SYMBOL(__tsan_init);
 	void __tsan_atomic##bits##_store(u##bits *ptr, u##bits v, int memorder);                   \
 	void __tsan_atomic##bits##_store(u##bits *ptr, u##bits v, int memorder)                    \
 	{                                                                                          \
+		kcsan_atomic_builtin_memorder(memorder);                                           \
 		if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {                                    \
 			check_access(ptr, bits / BITS_PER_BYTE,                                    \
 				     KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC, _RET_IP_);          \
@@ -1155,6 +1181,7 @@ EXPORT_SYMBOL(__tsan_init);
 	u##bits __tsan_atomic##bits##_##op(u##bits *ptr, u##bits v, int memorder);                 \
 	u##bits __tsan_atomic##bits##_##op(u##bits *ptr, u##bits v, int memorder)                  \
 	{                                                                                          \
+		kcsan_atomic_builtin_memorder(memorder);                                           \
 		if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {                                    \
 			check_access(ptr, bits / BITS_PER_BYTE,                                    \
 				     KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE |                  \
@@ -1187,6 +1214,7 @@ EXPORT_SYMBOL(__tsan_init);
 	int __tsan_atomic##bits##_compare_exchange_##strength(u##bits *ptr, u##bits *exp,          \
 							      u##bits val, int mo, int fail_mo)    \
 	{                                                                                          \
+		kcsan_atomic_builtin_memorder(mo);                                                 \
 		if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {                                    \
 			check_access(ptr, bits / BITS_PER_BYTE,                                    \
 				     KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE |                  \
@@ -1202,6 +1230,7 @@ EXPORT_SYMBOL(__tsan_init);
 	u##bits __tsan_atomic##bits##_compare_exchange_val(u##bits *ptr, u##bits exp, u##bits val, \
 							   int mo, int fail_mo)                    \
 	{                                                                                          \
+		kcsan_atomic_builtin_memorder(mo);                                                 \
 		if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) {                                    \
 			check_access(ptr, bits / BITS_PER_BYTE,                                    \
 				     KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE |                  \
@@ -1233,10 +1262,47 @@ DEFINE_TSAN_ATOMIC_OPS(64);
 void __tsan_atomic_thread_fence(int memorder);
 void __tsan_atomic_thread_fence(int memorder)
 {
+	kcsan_atomic_builtin_memorder(memorder);
 	__atomic_thread_fence(memorder);
 }
 EXPORT_SYMBOL(__tsan_atomic_thread_fence);
 
+/*
+ * In instrumented files, we emit instrumentation for barriers by mapping the
+ * kernel barriers to an __atomic_signal_fence(), which is interpreted specially
+ * and otherwise has no relation to a real __atomic_signal_fence(). No known
+ * kernel code uses __atomic_signal_fence().
+ *
+ * Since fsanitize=thread instrumentation handles __atomic_signal_fence(), which
+ * are turned into calls to __tsan_atomic_signal_fence(), such instrumentation
+ * can be disabled via the __no_kcsan function attribute (vs. an explicit call
+ * which could not). When __no_kcsan is requested, __atomic_signal_fence()
+ * generates no code.
+ *
+ * Note: The result of using __atomic_signal_fence() with KCSAN enabled is
+ * potentially limiting the compiler's ability to reorder operations; however,
+ * if barriers were instrumented with explicit calls (without LTO), the compiler
+ * couldn't optimize much anyway. The result of a hypothetical architecture
+ * using __atomic_signal_fence() in normal code would be KCSAN false negatives.
+ */
 void __tsan_atomic_signal_fence(int memorder);
-void __tsan_atomic_signal_fence(int memorder) { }
+noinline void __tsan_atomic_signal_fence(int memorder)
+{
+	switch (memorder) {
+	case __KCSAN_BARRIER_TO_SIGNAL_FENCE_mb:
+		__kcsan_mb();
+		break;
+	case __KCSAN_BARRIER_TO_SIGNAL_FENCE_wmb:
+		__kcsan_wmb();
+		break;
+	case __KCSAN_BARRIER_TO_SIGNAL_FENCE_rmb:
+		__kcsan_rmb();
+		break;
+	case __KCSAN_BARRIER_TO_SIGNAL_FENCE_release:
+		__kcsan_release();
+		break;
+	default:
+		break;
+	}
+}
 EXPORT_SYMBOL(__tsan_atomic_signal_fence);
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 06/25] kcsan, kbuild: Add option for barrier instrumentation only
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (4 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 05/25] kcsan: Add core memory barrier instrumentation functions Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 07/25] kcsan: Call scoped accesses reordered in reports Marco Elver
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Source files that disable KCSAN via KCSAN_SANITIZE := n, remove all
instrumentation, including explicit barrier instrumentation. With
instrumentation for memory barriers, in few places it is required to
enable just the explicit instrumentation for memory barriers to avoid
false positives.

Providing the Makefile variable KCSAN_INSTRUMENT_BARRIERS_obj.o or
KCSAN_INSTRUMENT_BARRIERS (for all files) set to 'y' only enables the
explicit barrier instrumentation.

Signed-off-by: Marco Elver <elver@google.com>
---
 scripts/Makefile.lib | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index d1f865b8c0cb..ab17f7b2e33c 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -182,6 +182,11 @@ ifeq ($(CONFIG_KCSAN),y)
 _c_flags += $(if $(patsubst n%,, \
 	$(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
 	$(CFLAGS_KCSAN))
+# Some uninstrumented files provide implied barriers required to avoid false
+# positives: set KCSAN_INSTRUMENT_BARRIERS for barrier instrumentation only.
+_c_flags += $(if $(patsubst n%,, \
+	$(KCSAN_INSTRUMENT_BARRIERS_$(basetarget).o)$(KCSAN_INSTRUMENT_BARRIERS)n), \
+	-D__KCSAN_INSTRUMENT_BARRIERS__)
 endif
 
 # $(srctree)/$(src) for including checkin headers from generated source files
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 07/25] kcsan: Call scoped accesses reordered in reports
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (5 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 06/25] kcsan, kbuild: Add option for barrier instrumentation only Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 08/25] kcsan: Show location access was reordered to Marco Elver
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

The scoping of an access simply denotes the scope in which it may be
reordered. However, in reports, it'll be less confusing to say the
access is "reordered". This is more accurate when the race occurred.

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/kcsan/kcsan_test.c |  4 ++--
 kernel/kcsan/report.c     | 16 ++++++++--------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
index 660729238588..6e3c2b8bc608 100644
--- a/kernel/kcsan/kcsan_test.c
+++ b/kernel/kcsan/kcsan_test.c
@@ -213,9 +213,9 @@ static bool report_matches(const struct expect_report *r)
 		const bool is_atomic = (ty & KCSAN_ACCESS_ATOMIC);
 		const bool is_scoped = (ty & KCSAN_ACCESS_SCOPED);
 		const char *const access_type_aux =
-				(is_atomic && is_scoped)	? " (marked, scoped)"
+				(is_atomic && is_scoped)	? " (marked, reordered)"
 				: (is_atomic			? " (marked)"
-				   : (is_scoped			? " (scoped)" : ""));
+				   : (is_scoped			? " (reordered)" : ""));
 
 		if (i == 1) {
 			/* Access 2 */
diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
index fc15077991c4..1b0e050bdf6a 100644
--- a/kernel/kcsan/report.c
+++ b/kernel/kcsan/report.c
@@ -215,9 +215,9 @@ static const char *get_access_type(int type)
 	if (type & KCSAN_ACCESS_ASSERT) {
 		if (type & KCSAN_ACCESS_SCOPED) {
 			if (type & KCSAN_ACCESS_WRITE)
-				return "assert no accesses (scoped)";
+				return "assert no accesses (reordered)";
 			else
-				return "assert no writes (scoped)";
+				return "assert no writes (reordered)";
 		} else {
 			if (type & KCSAN_ACCESS_WRITE)
 				return "assert no accesses";
@@ -240,17 +240,17 @@ static const char *get_access_type(int type)
 	case KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC:
 		return "read-write (marked)";
 	case KCSAN_ACCESS_SCOPED:
-		return "read (scoped)";
+		return "read (reordered)";
 	case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_ATOMIC:
-		return "read (marked, scoped)";
+		return "read (marked, reordered)";
 	case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE:
-		return "write (scoped)";
+		return "write (reordered)";
 	case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC:
-		return "write (marked, scoped)";
+		return "write (marked, reordered)";
 	case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE:
-		return "read-write (scoped)";
+		return "read-write (reordered)";
 	case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC:
-		return "read-write (marked, scoped)";
+		return "read-write (marked, reordered)";
 	default:
 		BUG();
 	}
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (6 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 07/25] kcsan: Call scoped accesses reordered in reports Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-12-06  5:03   ` Boqun Feng
  2021-11-30 11:44 ` [PATCH v3 09/25] kcsan: Document modeling of weak memory Marco Elver
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Also show the location the access was reordered to. An example report:

| ==================================================================
| BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
|
| read-write to 0xffffffffc01e61a8 of 8 bytes by task 2311 on cpu 5:
|  test_kernel_wrong_memorder+0x57/0x90
|  access_thread+0x99/0xe0
|  kthread+0x2ba/0x2f0
|  ret_from_fork+0x22/0x30
|
| read-write (reordered) to 0xffffffffc01e61a8 of 8 bytes by task 2310 on cpu 7:
|  test_kernel_wrong_memorder+0x57/0x90
|  access_thread+0x99/0xe0
|  kthread+0x2ba/0x2f0
|  ret_from_fork+0x22/0x30
|   |
|   +-> reordered to: test_kernel_wrong_memorder+0x80/0x90
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 7 PID: 2310 Comm: access_thread Not tainted 5.14.0-rc1+ #18
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| ==================================================================

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/kcsan/report.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
index 1b0e050bdf6a..67794404042a 100644
--- a/kernel/kcsan/report.c
+++ b/kernel/kcsan/report.c
@@ -308,10 +308,12 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries
 
 /*
  * Skips to the first entry that matches the function of @ip, and then replaces
- * that entry with @ip, returning the entries to skip.
+ * that entry with @ip, returning the entries to skip with @replaced containing
+ * the replaced entry.
  */
 static int
-replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned long ip)
+replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned long ip,
+		    unsigned long *replaced)
 {
 	unsigned long symbolsize, offset;
 	unsigned long target_func;
@@ -330,6 +332,7 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
 		func -= offset;
 
 		if (func == target_func) {
+			*replaced = stack_entries[skip];
 			stack_entries[skip] = ip;
 			return skip;
 		}
@@ -342,9 +345,10 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
 }
 
 static int
-sanitize_stack_entries(unsigned long stack_entries[], int num_entries, unsigned long ip)
+sanitize_stack_entries(unsigned long stack_entries[], int num_entries, unsigned long ip,
+		       unsigned long *replaced)
 {
-	return ip ? replace_stack_entry(stack_entries, num_entries, ip) :
+	return ip ? replace_stack_entry(stack_entries, num_entries, ip, replaced) :
 			  get_stack_skipnr(stack_entries, num_entries);
 }
 
@@ -360,6 +364,14 @@ static int sym_strcmp(void *addr1, void *addr2)
 	return strncmp(buf1, buf2, sizeof(buf1));
 }
 
+static void
+print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
+{
+	stack_trace_print(stack_entries, num_entries, 0);
+	if (reordered_to)
+		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to);
+}
+
 static void print_verbose_info(struct task_struct *task)
 {
 	if (!task)
@@ -378,10 +390,12 @@ static void print_report(enum kcsan_value_change value_change,
 			 struct other_info *other_info,
 			 u64 old, u64 new, u64 mask)
 {
+	unsigned long reordered_to = 0;
 	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
 	int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
-	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip);
+	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, &reordered_to);
 	unsigned long this_frame = stack_entries[skipnr];
+	unsigned long other_reordered_to = 0;
 	unsigned long other_frame = 0;
 	int other_skipnr = 0; /* silence uninit warnings */
 
@@ -394,7 +408,7 @@ static void print_report(enum kcsan_value_change value_change,
 	if (other_info) {
 		other_skipnr = sanitize_stack_entries(other_info->stack_entries,
 						      other_info->num_stack_entries,
-						      other_info->ai.ip);
+						      other_info->ai.ip, &other_reordered_to);
 		other_frame = other_info->stack_entries[other_skipnr];
 
 		/* @value_change is only known for the other thread */
@@ -434,10 +448,9 @@ static void print_report(enum kcsan_value_change value_change,
 		       other_info->ai.cpu_id);
 
 		/* Print the other thread's stack trace. */
-		stack_trace_print(other_info->stack_entries + other_skipnr,
+		print_stack_trace(other_info->stack_entries + other_skipnr,
 				  other_info->num_stack_entries - other_skipnr,
-				  0);
-
+				  other_reordered_to);
 		if (IS_ENABLED(CONFIG_KCSAN_VERBOSE))
 			print_verbose_info(other_info->task);
 
@@ -451,9 +464,7 @@ static void print_report(enum kcsan_value_change value_change,
 		       get_thread_desc(ai->task_pid), ai->cpu_id);
 	}
 	/* Print stack trace of this thread. */
-	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
-			  0);
-
+	print_stack_trace(stack_entries + skipnr, num_stack_entries - skipnr, reordered_to);
 	if (IS_ENABLED(CONFIG_KCSAN_VERBOSE))
 		print_verbose_info(current);
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 09/25] kcsan: Document modeling of weak memory
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (7 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 08/25] kcsan: Show location access was reordered to Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 10/25] kcsan: test: Match reordered or normal accesses Marco Elver
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Document how KCSAN models a subset of weak memory and the subset of
missing memory barriers it can detect as a result.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Note the reason that address or control dependencies do not require
  special handling.
---
 Documentation/dev-tools/kcsan.rst | 76 +++++++++++++++++++++++++------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
index 7db43c7c09b8..3ae866dcc924 100644
--- a/Documentation/dev-tools/kcsan.rst
+++ b/Documentation/dev-tools/kcsan.rst
@@ -204,17 +204,17 @@ Ultimately this allows to determine the possible executions of concurrent code,
 and if that code is free from data races.
 
 KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
-``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply
-assumes that memory barriers are placed correctly. In other words, KCSAN
-assumes that as long as a plain access is not observed to race with another
-conflicting access, memory operations are correctly ordered.
-
-This means that KCSAN will not report *potential* data races due to missing
-memory ordering. Developers should therefore carefully consider the required
-memory ordering requirements that remain unchecked. If, however, missing
-memory ordering (that is observable with a particular compiler and
-architecture) leads to an observable data race (e.g. entering a critical
-section erroneously), KCSAN would report the resulting data race.
+``atomic_*``, etc.), and a subset of ordering guarantees implied by memory
+barriers. With ``CONFIG_KCSAN_WEAK_MEMORY=y``, KCSAN models load or store
+buffering, and can detect missing ``smp_mb()``, ``smp_wmb()``, ``smp_rmb()``,
+``smp_store_release()``, and all ``atomic_*`` operations with equivalent
+implied barriers.
+
+Note, KCSAN will not report all data races due to missing memory ordering,
+specifically where a memory barrier would be required to prohibit subsequent
+memory operation from reordering before the barrier. Developers should
+therefore carefully consider the required memory ordering requirements that
+remain unchecked.
 
 Race Detection Beyond Data Races
 --------------------------------
@@ -268,6 +268,56 @@ marked operations, if all accesses to a variable that is accessed concurrently
 are properly marked, KCSAN will never trigger a watchpoint and therefore never
 report the accesses.
 
+Modeling Weak Memory
+~~~~~~~~~~~~~~~~~~~~
+
+KCSAN's approach to detecting data races due to missing memory barriers is
+based on modeling access reordering (with ``CONFIG_KCSAN_WEAK_MEMORY=y``).
+Each plain memory access for which a watchpoint is set up, is also selected for
+simulated reordering within the scope of its function (at most 1 in-flight
+access).
+
+Once an access has been selected for reordering, it is checked along every
+other access until the end of the function scope. If an appropriate memory
+barrier is encountered, the access will no longer be considered for simulated
+reordering.
+
+When the result of a memory operation should be ordered by a barrier, KCSAN can
+then detect data races where the conflict only occurs as a result of a missing
+barrier. Consider the example::
+
+    int x, flag;
+    void T1(void)
+    {
+        x = 1;                  // data race!
+        WRITE_ONCE(flag, 1);    // correct: smp_store_release(&flag, 1)
+    }
+    void T2(void)
+    {
+        while (!READ_ONCE(flag));   // correct: smp_load_acquire(&flag)
+        ... = x;                    // data race!
+    }
+
+When weak memory modeling is enabled, KCSAN can consider ``x`` in ``T1`` for
+simulated reordering. After the write of ``flag``, ``x`` is again checked for
+concurrent accesses: because ``T2`` is able to proceed after the write of
+``flag``, a data race is detected. With the correct barriers in place, ``x``
+would not be considered for reordering after the proper release of ``flag``,
+and no data race would be detected.
+
+Deliberate trade-offs in complexity but also practical limitations mean only a
+subset of data races due to missing memory barriers can be detected. With
+currently available compiler support, the implementation is limited to modeling
+the effects of "buffering" (delaying accesses), since the runtime cannot
+"prefetch" accesses. Also recall that watchpoints are only set up for plain
+accesses, and the only access type for which KCSAN simulates reordering. This
+means reordering of marked accesses is not modeled.
+
+A consequence of the above is that acquire operations do not require barrier
+instrumentation (no prefetching). Furthermore, marked accesses introducing
+address or control dependencies do not require special handling (the marked
+access cannot be reordered, later dependent accesses cannot be prefetched).
+
 Key Properties
 ~~~~~~~~~~~~~~
 
@@ -290,8 +340,8 @@ Key Properties
 4. **Detects Racy Writes from Devices:** Due to checking data values upon
    setting up watchpoints, racy writes from devices can also be detected.
 
-5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
-   rules; this may result in missed data races (false negatives).
+5. **Memory Ordering:** KCSAN is aware of only a subset of LKMM ordering rules;
+   this may result in missed data races (false negatives).
 
 6. **Analysis Accuracy:** For observed executions, due to using a sampling
    strategy, the analysis is *unsound* (false negatives possible), but aims to
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 10/25] kcsan: test: Match reordered or normal accesses
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (8 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 09/25] kcsan: Document modeling of weak memory Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 11/25] kcsan: test: Add test cases for memory barrier instrumentation Marco Elver
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Due to reordering accesses with weak memory modeling, any access can now
appear as "(reordered)".

Match any permutation of accesses if CONFIG_KCSAN_WEAK_MEMORY=y, so that
we effectively match an access if it is denoted "(reordered)" or not.

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/kcsan/kcsan_test.c | 92 +++++++++++++++++++++++++++------------
 1 file changed, 63 insertions(+), 29 deletions(-)

diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
index 6e3c2b8bc608..ec054879201b 100644
--- a/kernel/kcsan/kcsan_test.c
+++ b/kernel/kcsan/kcsan_test.c
@@ -151,7 +151,7 @@ struct expect_report {
 
 /* Check observed report matches information in @r. */
 __no_kcsan
-static bool report_matches(const struct expect_report *r)
+static bool __report_matches(const struct expect_report *r)
 {
 	const bool is_assert = (r->access[0].type | r->access[1].type) & KCSAN_ACCESS_ASSERT;
 	bool ret = false;
@@ -253,6 +253,40 @@ static bool report_matches(const struct expect_report *r)
 	return ret;
 }
 
+static __always_inline const struct expect_report *
+__report_set_scoped(struct expect_report *r, int accesses)
+{
+	BUILD_BUG_ON(accesses > 3);
+
+	if (accesses & 1)
+		r->access[0].type |= KCSAN_ACCESS_SCOPED;
+	else
+		r->access[0].type &= ~KCSAN_ACCESS_SCOPED;
+
+	if (accesses & 2)
+		r->access[1].type |= KCSAN_ACCESS_SCOPED;
+	else
+		r->access[1].type &= ~KCSAN_ACCESS_SCOPED;
+
+	return r;
+}
+
+__no_kcsan
+static bool report_matches_any_reordered(struct expect_report *r)
+{
+	return __report_matches(__report_set_scoped(r, 0)) ||
+	       __report_matches(__report_set_scoped(r, 1)) ||
+	       __report_matches(__report_set_scoped(r, 2)) ||
+	       __report_matches(__report_set_scoped(r, 3));
+}
+
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+/* Due to reordering accesses, any access may appear as "(reordered)". */
+#define report_matches report_matches_any_reordered
+#else
+#define report_matches __report_matches
+#endif
+
 /* ===== Test kernels ===== */
 
 static long test_sink;
@@ -438,13 +472,13 @@ static noinline void test_kernel_xor_1bit(void)
 __no_kcsan
 static void test_basic(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 		},
 	};
-	static const struct expect_report never = {
+	struct expect_report never = {
 		.access = {
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
@@ -469,14 +503,14 @@ static void test_basic(struct kunit *test)
 __no_kcsan
 static void test_concurrent_races(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			/* NULL will match any address. */
 			{ test_kernel_rmw_array, NULL, 0, __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
 			{ test_kernel_rmw_array, NULL, 0, __KCSAN_ACCESS_RW(0) },
 		},
 	};
-	static const struct expect_report never = {
+	struct expect_report never = {
 		.access = {
 			{ test_kernel_rmw_array, NULL, 0, 0 },
 			{ test_kernel_rmw_array, NULL, 0, 0 },
@@ -498,13 +532,13 @@ static void test_concurrent_races(struct kunit *test)
 __no_kcsan
 static void test_novalue_change(struct kunit *test)
 {
-	const struct expect_report expect_rw = {
+	struct expect_report expect_rw = {
 		.access = {
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 		},
 	};
-	const struct expect_report expect_ww = {
+	struct expect_report expect_ww = {
 		.access = {
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
@@ -530,13 +564,13 @@ static void test_novalue_change(struct kunit *test)
 __no_kcsan
 static void test_novalue_change_exception(struct kunit *test)
 {
-	const struct expect_report expect_rw = {
+	struct expect_report expect_rw = {
 		.access = {
 			{ test_kernel_write_nochange_rcu, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 		},
 	};
-	const struct expect_report expect_ww = {
+	struct expect_report expect_ww = {
 		.access = {
 			{ test_kernel_write_nochange_rcu, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write_nochange_rcu, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
@@ -556,7 +590,7 @@ static void test_novalue_change_exception(struct kunit *test)
 __no_kcsan
 static void test_unknown_origin(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 			{ NULL },
@@ -578,7 +612,7 @@ static void test_unknown_origin(struct kunit *test)
 __no_kcsan
 static void test_write_write_assume_atomic(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
@@ -604,7 +638,7 @@ static void test_write_write_assume_atomic(struct kunit *test)
 __no_kcsan
 static void test_write_write_struct(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
@@ -626,7 +660,7 @@ static void test_write_write_struct(struct kunit *test)
 __no_kcsan
 static void test_write_write_struct_part(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write_struct_part, &test_struct.val[3], sizeof(test_struct.val[3]), KCSAN_ACCESS_WRITE },
@@ -658,7 +692,7 @@ static void test_read_atomic_write_atomic(struct kunit *test)
 __no_kcsan
 static void test_read_plain_atomic_write(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 			{ test_kernel_write_atomic, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC },
@@ -679,7 +713,7 @@ static void test_read_plain_atomic_write(struct kunit *test)
 __no_kcsan
 static void test_read_plain_atomic_rmw(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 			{ test_kernel_atomic_rmw, &test_var, sizeof(test_var),
@@ -701,13 +735,13 @@ static void test_read_plain_atomic_rmw(struct kunit *test)
 __no_kcsan
 static void test_zero_size_access(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
 		},
 	};
-	const struct expect_report never = {
+	struct expect_report never = {
 		.access = {
 			{ test_kernel_write_struct, &test_struct, sizeof(test_struct), KCSAN_ACCESS_WRITE },
 			{ test_kernel_read_struct_zero_size, &test_struct.val[3], 0, 0 },
@@ -741,7 +775,7 @@ static void test_data_race(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_writer(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
@@ -759,7 +793,7 @@ static void test_assert_exclusive_writer(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_access(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
@@ -777,19 +811,19 @@ static void test_assert_exclusive_access(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_access_writer(struct kunit *test)
 {
-	const struct expect_report expect_access_writer = {
+	struct expect_report expect_access_writer = {
 		.access = {
 			{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
 			{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
 		},
 	};
-	const struct expect_report expect_access_access = {
+	struct expect_report expect_access_access = {
 		.access = {
 			{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
 			{ test_kernel_assert_access, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE },
 		},
 	};
-	const struct expect_report never = {
+	struct expect_report never = {
 		.access = {
 			{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
 			{ test_kernel_assert_writer, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
@@ -813,7 +847,7 @@ static void test_assert_exclusive_access_writer(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_bits_change(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_assert_bits_change, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT },
 			{ test_kernel_change_bits, &test_var, sizeof(test_var),
@@ -844,13 +878,13 @@ static void test_assert_exclusive_bits_nochange(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_writer_scoped(struct kunit *test)
 {
-	const struct expect_report expect_start = {
+	struct expect_report expect_start = {
 		.access = {
 			{ test_kernel_assert_writer_scoped, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_SCOPED },
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
 		},
 	};
-	const struct expect_report expect_inscope = {
+	struct expect_report expect_inscope = {
 		.access = {
 			{ test_enter_scope, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_SCOPED },
 			{ test_kernel_write_nochange, &test_var, sizeof(test_var), KCSAN_ACCESS_WRITE },
@@ -871,16 +905,16 @@ static void test_assert_exclusive_writer_scoped(struct kunit *test)
 __no_kcsan
 static void test_assert_exclusive_access_scoped(struct kunit *test)
 {
-	const struct expect_report expect_start1 = {
+	struct expect_report expect_start1 = {
 		.access = {
 			{ test_kernel_assert_access_scoped, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_SCOPED },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 		},
 	};
-	const struct expect_report expect_start2 = {
+	struct expect_report expect_start2 = {
 		.access = { expect_start1.access[0], expect_start1.access[0] },
 	};
-	const struct expect_report expect_inscope = {
+	struct expect_report expect_inscope = {
 		.access = {
 			{ test_enter_scope, &test_var, sizeof(test_var), KCSAN_ACCESS_ASSERT | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_SCOPED },
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
@@ -985,7 +1019,7 @@ static void test_atomic_builtins(struct kunit *test)
 __no_kcsan
 static void test_1bit_value_change(struct kunit *test)
 {
-	const struct expect_report expect = {
+	struct expect_report expect = {
 		.access = {
 			{ test_kernel_read, &test_var, sizeof(test_var), 0 },
 			{ test_kernel_xor_1bit, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 11/25] kcsan: test: Add test cases for memory barrier instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (9 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 10/25] kcsan: test: Match reordered or normal accesses Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 12/25] kcsan: Ignore GCC 11+ warnings about TSan runtime support Marco Elver
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Adds test cases to check that memory barriers are instrumented
correctly, and detection of missing memory barriers is working as
intended if CONFIG_KCSAN_STRICT=y.

Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Remove __no_kcsan from barrier test, given __no_kcsan would now remove
  barrier instrumentation, too.
---
 kernel/kcsan/kcsan_test.c | 319 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 319 insertions(+)

diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
index ec054879201b..5bf94550bcdf 100644
--- a/kernel/kcsan/kcsan_test.c
+++ b/kernel/kcsan/kcsan_test.c
@@ -16,9 +16,12 @@
 #define pr_fmt(fmt) "kcsan_test: " fmt
 
 #include <kunit/test.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/jiffies.h>
 #include <linux/kcsan-checks.h>
 #include <linux/kernel.h>
+#include <linux/mutex.h>
 #include <linux/sched.h>
 #include <linux/seqlock.h>
 #include <linux/spinlock.h>
@@ -305,6 +308,16 @@ static DEFINE_SEQLOCK(test_seqlock);
 __no_kcsan
 static noinline void sink_value(long v) { WRITE_ONCE(test_sink, v); }
 
+/*
+ * Generates a delay and some accesses that enter the runtime but do not produce
+ * data races.
+ */
+static noinline void test_delay(int iter)
+{
+	while (iter--)
+		sink_value(READ_ONCE(test_sink));
+}
+
 static noinline void test_kernel_read(void) { sink_value(test_var); }
 
 static noinline void test_kernel_write(void)
@@ -466,8 +479,219 @@ static noinline void test_kernel_xor_1bit(void)
 	kcsan_nestable_atomic_end();
 }
 
+#define TEST_KERNEL_LOCKED(name, acquire, release)		\
+	static noinline void test_kernel_##name(void)		\
+	{							\
+		long *flag = &test_struct.val[0];		\
+		long v = 0;					\
+		if (!(acquire))					\
+			return;					\
+		while (v++ < 100) {				\
+			test_var++;				\
+			barrier();				\
+		}						\
+		release;					\
+		test_delay(10);					\
+	}
+
+TEST_KERNEL_LOCKED(with_memorder,
+		   cmpxchg_acquire(flag, 0, 1) == 0,
+		   smp_store_release(flag, 0));
+TEST_KERNEL_LOCKED(wrong_memorder,
+		   cmpxchg_relaxed(flag, 0, 1) == 0,
+		   WRITE_ONCE(*flag, 0));
+TEST_KERNEL_LOCKED(atomic_builtin_with_memorder,
+		   __atomic_compare_exchange_n(flag, &v, 1, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED),
+		   __atomic_store_n(flag, 0, __ATOMIC_RELEASE));
+TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder,
+		   __atomic_compare_exchange_n(flag, &v, 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED),
+		   __atomic_store_n(flag, 0, __ATOMIC_RELAXED));
+
 /* ===== Test cases ===== */
 
+/*
+ * Tests that various barriers have the expected effect on internal state. Not
+ * exhaustive on atomic_t operations. Unlike the selftest, also checks for
+ * too-strict barrier instrumentation; these can be tolerated, because it does
+ * not cause false positives, but at least we should be aware of such cases.
+ */
+static void test_barrier_nothreads(struct kunit *test)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	struct kcsan_scoped_access *reorder_access = &current->kcsan_ctx.reorder_access;
+#else
+	struct kcsan_scoped_access *reorder_access = NULL;
+#endif
+	arch_spinlock_t arch_spinlock = __ARCH_SPIN_LOCK_UNLOCKED;
+	DEFINE_SPINLOCK(spinlock);
+	DEFINE_MUTEX(mutex);
+	atomic_t dummy;
+
+	KCSAN_TEST_REQUIRES(test, reorder_access != NULL);
+	KCSAN_TEST_REQUIRES(test, IS_ENABLED(CONFIG_SMP));
+
+#define __KCSAN_EXPECT_BARRIER(access_type, barrier, order_before, name)			\
+	do {											\
+		reorder_access->type = (access_type) | KCSAN_ACCESS_SCOPED;			\
+		reorder_access->size = sizeof(test_var);					\
+		barrier;									\
+		KUNIT_EXPECT_EQ_MSG(test, reorder_access->size,					\
+				    order_before ? 0 : sizeof(test_var),			\
+				    "improperly instrumented type=(" #access_type "): " name);	\
+	} while (0)
+#define KCSAN_EXPECT_READ_BARRIER(b, o)  __KCSAN_EXPECT_BARRIER(0, b, o, #b)
+#define KCSAN_EXPECT_WRITE_BARRIER(b, o) __KCSAN_EXPECT_BARRIER(KCSAN_ACCESS_WRITE, b, o, #b)
+#define KCSAN_EXPECT_RW_BARRIER(b, o)    __KCSAN_EXPECT_BARRIER(KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE, b, o, #b)
+
+	/* Force creating a valid entry in reorder_access first. */
+	test_var = 0;
+	while (test_var++ < 1000000 && reorder_access->size != sizeof(test_var))
+		__kcsan_check_read(&test_var, sizeof(test_var));
+	KUNIT_ASSERT_EQ(test, reorder_access->size, sizeof(test_var));
+
+	kcsan_nestable_atomic_begin(); /* No watchpoints in called functions. */
+
+	KCSAN_EXPECT_READ_BARRIER(mb(), true);
+	KCSAN_EXPECT_READ_BARRIER(wmb(), false);
+	KCSAN_EXPECT_READ_BARRIER(rmb(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_mb(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_wmb(), false);
+	KCSAN_EXPECT_READ_BARRIER(smp_rmb(), true);
+	KCSAN_EXPECT_READ_BARRIER(dma_wmb(), false);
+	KCSAN_EXPECT_READ_BARRIER(dma_rmb(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_mb__before_atomic(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_mb__after_atomic(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_mb__after_spinlock(), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_store_mb(test_var, 0), true);
+	KCSAN_EXPECT_READ_BARRIER(smp_load_acquire(&test_var), false);
+	KCSAN_EXPECT_READ_BARRIER(smp_store_release(&test_var, 0), true);
+	KCSAN_EXPECT_READ_BARRIER(xchg(&test_var, 0), true);
+	KCSAN_EXPECT_READ_BARRIER(xchg_release(&test_var, 0), true);
+	KCSAN_EXPECT_READ_BARRIER(xchg_relaxed(&test_var, 0), false);
+	KCSAN_EXPECT_READ_BARRIER(cmpxchg(&test_var, 0,  0), true);
+	KCSAN_EXPECT_READ_BARRIER(cmpxchg_release(&test_var, 0,  0), true);
+	KCSAN_EXPECT_READ_BARRIER(cmpxchg_relaxed(&test_var, 0,  0), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_read(&dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_read_acquire(&dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_set(&dummy, 0), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_set_release(&dummy, 0), true);
+	KCSAN_EXPECT_READ_BARRIER(atomic_add(1, &dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_add_return(1, &dummy), true);
+	KCSAN_EXPECT_READ_BARRIER(atomic_add_return_acquire(1, &dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_add_return_release(1, &dummy), true);
+	KCSAN_EXPECT_READ_BARRIER(atomic_add_return_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_fetch_add(1, &dummy), true);
+	KCSAN_EXPECT_READ_BARRIER(atomic_fetch_add_acquire(1, &dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(atomic_fetch_add_release(1, &dummy), true);
+	KCSAN_EXPECT_READ_BARRIER(atomic_fetch_add_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_READ_BARRIER(test_and_set_bit(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(test_and_clear_bit(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(test_and_change_bit(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(__clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var), true);
+	KCSAN_EXPECT_READ_BARRIER(arch_spin_lock(&arch_spinlock), false);
+	KCSAN_EXPECT_READ_BARRIER(arch_spin_unlock(&arch_spinlock), true);
+	KCSAN_EXPECT_READ_BARRIER(spin_lock(&spinlock), false);
+	KCSAN_EXPECT_READ_BARRIER(spin_unlock(&spinlock), true);
+	KCSAN_EXPECT_READ_BARRIER(mutex_lock(&mutex), false);
+	KCSAN_EXPECT_READ_BARRIER(mutex_unlock(&mutex), true);
+
+	KCSAN_EXPECT_WRITE_BARRIER(mb(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(wmb(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(rmb(), false);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_mb(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_wmb(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_rmb(), false);
+	KCSAN_EXPECT_WRITE_BARRIER(dma_wmb(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(dma_rmb(), false);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_mb__before_atomic(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_mb__after_atomic(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_mb__after_spinlock(), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_store_mb(test_var, 0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_load_acquire(&test_var), false);
+	KCSAN_EXPECT_WRITE_BARRIER(smp_store_release(&test_var, 0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(xchg(&test_var, 0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(xchg_release(&test_var, 0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(xchg_relaxed(&test_var, 0), false);
+	KCSAN_EXPECT_WRITE_BARRIER(cmpxchg(&test_var, 0,  0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(cmpxchg_release(&test_var, 0,  0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(cmpxchg_relaxed(&test_var, 0,  0), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_read(&dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_read_acquire(&dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_set(&dummy, 0), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_set_release(&dummy, 0), true);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_add(1, &dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_add_return(1, &dummy), true);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_add_return_acquire(1, &dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_add_return_release(1, &dummy), true);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_add_return_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_fetch_add(1, &dummy), true);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_fetch_add_acquire(1, &dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_fetch_add_release(1, &dummy), true);
+	KCSAN_EXPECT_WRITE_BARRIER(atomic_fetch_add_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_WRITE_BARRIER(test_and_set_bit(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(test_and_clear_bit(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(test_and_change_bit(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(__clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var), true);
+	KCSAN_EXPECT_WRITE_BARRIER(arch_spin_lock(&arch_spinlock), false);
+	KCSAN_EXPECT_WRITE_BARRIER(arch_spin_unlock(&arch_spinlock), true);
+	KCSAN_EXPECT_WRITE_BARRIER(spin_lock(&spinlock), false);
+	KCSAN_EXPECT_WRITE_BARRIER(spin_unlock(&spinlock), true);
+	KCSAN_EXPECT_WRITE_BARRIER(mutex_lock(&mutex), false);
+	KCSAN_EXPECT_WRITE_BARRIER(mutex_unlock(&mutex), true);
+
+	KCSAN_EXPECT_RW_BARRIER(mb(), true);
+	KCSAN_EXPECT_RW_BARRIER(wmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(rmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_mb(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_wmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_rmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(dma_wmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(dma_rmb(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_mb__before_atomic(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_mb__after_atomic(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_mb__after_spinlock(), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_store_mb(test_var, 0), true);
+	KCSAN_EXPECT_RW_BARRIER(smp_load_acquire(&test_var), false);
+	KCSAN_EXPECT_RW_BARRIER(smp_store_release(&test_var, 0), true);
+	KCSAN_EXPECT_RW_BARRIER(xchg(&test_var, 0), true);
+	KCSAN_EXPECT_RW_BARRIER(xchg_release(&test_var, 0), true);
+	KCSAN_EXPECT_RW_BARRIER(xchg_relaxed(&test_var, 0), false);
+	KCSAN_EXPECT_RW_BARRIER(cmpxchg(&test_var, 0,  0), true);
+	KCSAN_EXPECT_RW_BARRIER(cmpxchg_release(&test_var, 0,  0), true);
+	KCSAN_EXPECT_RW_BARRIER(cmpxchg_relaxed(&test_var, 0,  0), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_read(&dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_read_acquire(&dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_set(&dummy, 0), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_set_release(&dummy, 0), true);
+	KCSAN_EXPECT_RW_BARRIER(atomic_add(1, &dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_add_return(1, &dummy), true);
+	KCSAN_EXPECT_RW_BARRIER(atomic_add_return_acquire(1, &dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_add_return_release(1, &dummy), true);
+	KCSAN_EXPECT_RW_BARRIER(atomic_add_return_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_fetch_add(1, &dummy), true);
+	KCSAN_EXPECT_RW_BARRIER(atomic_fetch_add_acquire(1, &dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(atomic_fetch_add_release(1, &dummy), true);
+	KCSAN_EXPECT_RW_BARRIER(atomic_fetch_add_relaxed(1, &dummy), false);
+	KCSAN_EXPECT_RW_BARRIER(test_and_set_bit(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(test_and_clear_bit(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(test_and_change_bit(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(__clear_bit_unlock(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var), true);
+	KCSAN_EXPECT_RW_BARRIER(arch_spin_lock(&arch_spinlock), false);
+	KCSAN_EXPECT_RW_BARRIER(arch_spin_unlock(&arch_spinlock), true);
+	KCSAN_EXPECT_RW_BARRIER(spin_lock(&spinlock), false);
+	KCSAN_EXPECT_RW_BARRIER(spin_unlock(&spinlock), true);
+	KCSAN_EXPECT_RW_BARRIER(mutex_lock(&mutex), false);
+	KCSAN_EXPECT_RW_BARRIER(mutex_unlock(&mutex), true);
+
+	kcsan_nestable_atomic_end();
+}
+
 /* Simple test with normal data race. */
 __no_kcsan
 static void test_basic(struct kunit *test)
@@ -1039,6 +1263,90 @@ static void test_1bit_value_change(struct kunit *test)
 		KUNIT_EXPECT_TRUE(test, match);
 }
 
+__no_kcsan
+static void test_correct_barrier(struct kunit *test)
+{
+	struct expect_report expect = {
+		.access = {
+			{ test_kernel_with_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
+			{ test_kernel_with_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(0) },
+		},
+	};
+	bool match_expect = false;
+
+	test_struct.val[0] = 0; /* init unlocked */
+	begin_test_checks(test_kernel_with_memorder, test_kernel_with_memorder);
+	do {
+		match_expect = report_matches_any_reordered(&expect);
+	} while (!end_test_checks(match_expect));
+	KUNIT_EXPECT_FALSE(test, match_expect);
+}
+
+__no_kcsan
+static void test_missing_barrier(struct kunit *test)
+{
+	struct expect_report expect = {
+		.access = {
+			{ test_kernel_wrong_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
+			{ test_kernel_wrong_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(0) },
+		},
+	};
+	bool match_expect = false;
+
+	test_struct.val[0] = 0; /* init unlocked */
+	begin_test_checks(test_kernel_wrong_memorder, test_kernel_wrong_memorder);
+	do {
+		match_expect = report_matches_any_reordered(&expect);
+	} while (!end_test_checks(match_expect));
+	if (IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		KUNIT_EXPECT_TRUE(test, match_expect);
+	else
+		KUNIT_EXPECT_FALSE(test, match_expect);
+}
+
+__no_kcsan
+static void test_atomic_builtins_correct_barrier(struct kunit *test)
+{
+	struct expect_report expect = {
+		.access = {
+			{ test_kernel_atomic_builtin_with_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
+			{ test_kernel_atomic_builtin_with_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(0) },
+		},
+	};
+	bool match_expect = false;
+
+	test_struct.val[0] = 0; /* init unlocked */
+	begin_test_checks(test_kernel_atomic_builtin_with_memorder,
+			  test_kernel_atomic_builtin_with_memorder);
+	do {
+		match_expect = report_matches_any_reordered(&expect);
+	} while (!end_test_checks(match_expect));
+	KUNIT_EXPECT_FALSE(test, match_expect);
+}
+
+__no_kcsan
+static void test_atomic_builtins_missing_barrier(struct kunit *test)
+{
+	struct expect_report expect = {
+		.access = {
+			{ test_kernel_atomic_builtin_wrong_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
+			{ test_kernel_atomic_builtin_wrong_memorder, &test_var, sizeof(test_var), __KCSAN_ACCESS_RW(0) },
+		},
+	};
+	bool match_expect = false;
+
+	test_struct.val[0] = 0; /* init unlocked */
+	begin_test_checks(test_kernel_atomic_builtin_wrong_memorder,
+			  test_kernel_atomic_builtin_wrong_memorder);
+	do {
+		match_expect = report_matches_any_reordered(&expect);
+	} while (!end_test_checks(match_expect));
+	if (IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		KUNIT_EXPECT_TRUE(test, match_expect);
+	else
+		KUNIT_EXPECT_FALSE(test, match_expect);
+}
+
 /*
  * Generate thread counts for all test cases. Values generated are in interval
  * [2, 5] followed by exponentially increasing thread counts from 8 to 32.
@@ -1088,6 +1396,7 @@ static const void *nthreads_gen_params(const void *prev, char *desc)
 
 #define KCSAN_KUNIT_CASE(test_name) KUNIT_CASE_PARAM(test_name, nthreads_gen_params)
 static struct kunit_case kcsan_test_cases[] = {
+	KUNIT_CASE(test_barrier_nothreads),
 	KCSAN_KUNIT_CASE(test_basic),
 	KCSAN_KUNIT_CASE(test_concurrent_races),
 	KCSAN_KUNIT_CASE(test_novalue_change),
@@ -1112,6 +1421,10 @@ static struct kunit_case kcsan_test_cases[] = {
 	KCSAN_KUNIT_CASE(test_seqlock_noreport),
 	KCSAN_KUNIT_CASE(test_atomic_builtins),
 	KCSAN_KUNIT_CASE(test_1bit_value_change),
+	KCSAN_KUNIT_CASE(test_correct_barrier),
+	KCSAN_KUNIT_CASE(test_missing_barrier),
+	KCSAN_KUNIT_CASE(test_atomic_builtins_correct_barrier),
+	KCSAN_KUNIT_CASE(test_atomic_builtins_missing_barrier),
 	{},
 };
 
@@ -1176,6 +1489,9 @@ static int test_init(struct kunit *test)
 	observed.nlines = 0;
 	spin_unlock_irqrestore(&observed.lock, flags);
 
+	if (strstr(test->name, "nothreads"))
+		return 0;
+
 	if (!torture_init_begin((char *)test->name, 1))
 		return -EBUSY;
 
@@ -1218,6 +1534,9 @@ static void test_exit(struct kunit *test)
 	struct task_struct **stop_thread;
 	int i;
 
+	if (strstr(test->name, "nothreads"))
+		return;
+
 	if (torture_cleanup_begin())
 		return;
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 12/25] kcsan: Ignore GCC 11+ warnings about TSan runtime support
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (10 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 11/25] kcsan: test: Add test cases for memory barrier instrumentation Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 13/25] kcsan: selftest: Add test case to check memory barrier instrumentation Marco Elver
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

GCC 11 has introduced a new warning option, -Wtsan [1], to warn about
unsupported operations in the TSan runtime. But KCSAN != TSan runtime,
so none of the warnings apply.

[1] https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html

Ignore the warnings.

Currently the warning only fires in the test for __atomic_thread_fence():

kernel/kcsan/kcsan_test.c: In function ‘test_atomic_builtins’:
kernel/kcsan/kcsan_test.c:1234:17: warning: ‘atomic_thread_fence’ is not supported with ‘-fsanitize=thread’ [-Wtsan]
 1234 |                 __atomic_thread_fence(__ATOMIC_SEQ_CST);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which exists to ensure the KCSAN runtime keeps supporting the builtin
instrumentation.

Signed-off-by: Marco Elver <elver@google.com>
---
 scripts/Makefile.kcsan | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
index 4c7f0d282e42..19f693b68a96 100644
--- a/scripts/Makefile.kcsan
+++ b/scripts/Makefile.kcsan
@@ -13,6 +13,12 @@ kcsan-cflags := -fsanitize=thread -fno-optimize-sibling-calls \
 	$(call cc-option,$(call cc-param,tsan-compound-read-before-write=1),$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1))) \
 	$(call cc-param,tsan-distinguish-volatile=1)
 
+ifdef CONFIG_CC_IS_GCC
+# GCC started warning about operations unsupported by the TSan runtime. But
+# KCSAN != TSan, so just ignore these warnings.
+kcsan-cflags += -Wno-tsan
+endif
+
 ifndef CONFIG_KCSAN_WEAK_MEMORY
 kcsan-cflags += $(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0))
 endif
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 13/25] kcsan: selftest: Add test case to check memory barrier instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (11 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 12/25] kcsan: Ignore GCC 11+ warnings about TSan runtime support Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 14/25] locking/barriers, kcsan: Add instrumentation for barriers Marco Elver
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Memory barrier instrumentation is crucial to avoid false positives. To
avoid surprises, run a simple test case in the boot-time selftest to
ensure memory barriers are still instrumented correctly.

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/kcsan/Makefile   |   2 +
 kernel/kcsan/selftest.c | 141 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)

diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
index c2bb07f5bcc7..ff47e896de3b 100644
--- a/kernel/kcsan/Makefile
+++ b/kernel/kcsan/Makefile
@@ -11,6 +11,8 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
 	-fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 obj-y := core.o debugfs.o report.o
+
+KCSAN_INSTRUMENT_BARRIERS_selftest.o := y
 obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
 
 CFLAGS_kcsan_test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
diff --git a/kernel/kcsan/selftest.c b/kernel/kcsan/selftest.c
index b4295a3892b7..08c6b84b9ebe 100644
--- a/kernel/kcsan/selftest.c
+++ b/kernel/kcsan/selftest.c
@@ -7,10 +7,15 @@
 
 #define pr_fmt(fmt) "kcsan: " fmt
 
+#include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/init.h>
+#include <linux/kcsan-checks.h>
 #include <linux/kernel.h>
 #include <linux/printk.h>
 #include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
 #include <linux/types.h>
 
 #include "encoding.h"
@@ -103,6 +108,141 @@ static bool __init test_matching_access(void)
 	return true;
 }
 
+/*
+ * Correct memory barrier instrumentation is critical to avoiding false
+ * positives: simple test to check at boot certain barriers are always properly
+ * instrumented. See kcsan_test for a more complete test.
+ */
+static bool __init test_barrier(void)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	struct kcsan_scoped_access *reorder_access = &current->kcsan_ctx.reorder_access;
+#else
+	struct kcsan_scoped_access *reorder_access = NULL;
+#endif
+	bool ret = true;
+	arch_spinlock_t arch_spinlock = __ARCH_SPIN_LOCK_UNLOCKED;
+	DEFINE_SPINLOCK(spinlock);
+	atomic_t dummy;
+	long test_var;
+
+	if (!reorder_access || !IS_ENABLED(CONFIG_SMP))
+		return true;
+
+#define __KCSAN_CHECK_BARRIER(access_type, barrier, name)					\
+	do {											\
+		reorder_access->type = (access_type) | KCSAN_ACCESS_SCOPED;			\
+		reorder_access->size = 1;							\
+		barrier;									\
+		if (reorder_access->size != 0) {						\
+			pr_err("improperly instrumented type=(" #access_type "): " name "\n");	\
+			ret = false;								\
+		}										\
+	} while (0)
+#define KCSAN_CHECK_READ_BARRIER(b)  __KCSAN_CHECK_BARRIER(0, b, #b)
+#define KCSAN_CHECK_WRITE_BARRIER(b) __KCSAN_CHECK_BARRIER(KCSAN_ACCESS_WRITE, b, #b)
+#define KCSAN_CHECK_RW_BARRIER(b)    __KCSAN_CHECK_BARRIER(KCSAN_ACCESS_WRITE | KCSAN_ACCESS_COMPOUND, b, #b)
+
+	kcsan_nestable_atomic_begin(); /* No watchpoints in called functions. */
+
+	KCSAN_CHECK_READ_BARRIER(mb());
+	KCSAN_CHECK_READ_BARRIER(rmb());
+	KCSAN_CHECK_READ_BARRIER(smp_mb());
+	KCSAN_CHECK_READ_BARRIER(smp_rmb());
+	KCSAN_CHECK_READ_BARRIER(dma_rmb());
+	KCSAN_CHECK_READ_BARRIER(smp_mb__before_atomic());
+	KCSAN_CHECK_READ_BARRIER(smp_mb__after_atomic());
+	KCSAN_CHECK_READ_BARRIER(smp_mb__after_spinlock());
+	KCSAN_CHECK_READ_BARRIER(smp_store_mb(test_var, 0));
+	KCSAN_CHECK_READ_BARRIER(smp_store_release(&test_var, 0));
+	KCSAN_CHECK_READ_BARRIER(xchg(&test_var, 0));
+	KCSAN_CHECK_READ_BARRIER(xchg_release(&test_var, 0));
+	KCSAN_CHECK_READ_BARRIER(cmpxchg(&test_var, 0,  0));
+	KCSAN_CHECK_READ_BARRIER(cmpxchg_release(&test_var, 0,  0));
+	KCSAN_CHECK_READ_BARRIER(atomic_set_release(&dummy, 0));
+	KCSAN_CHECK_READ_BARRIER(atomic_add_return(1, &dummy));
+	KCSAN_CHECK_READ_BARRIER(atomic_add_return_release(1, &dummy));
+	KCSAN_CHECK_READ_BARRIER(atomic_fetch_add(1, &dummy));
+	KCSAN_CHECK_READ_BARRIER(atomic_fetch_add_release(1, &dummy));
+	KCSAN_CHECK_READ_BARRIER(test_and_set_bit(0, &test_var));
+	KCSAN_CHECK_READ_BARRIER(test_and_clear_bit(0, &test_var));
+	KCSAN_CHECK_READ_BARRIER(test_and_change_bit(0, &test_var));
+	KCSAN_CHECK_READ_BARRIER(clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_READ_BARRIER(__clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_READ_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
+	arch_spin_lock(&arch_spinlock);
+	KCSAN_CHECK_READ_BARRIER(arch_spin_unlock(&arch_spinlock));
+	spin_lock(&spinlock);
+	KCSAN_CHECK_READ_BARRIER(spin_unlock(&spinlock));
+
+	KCSAN_CHECK_WRITE_BARRIER(mb());
+	KCSAN_CHECK_WRITE_BARRIER(wmb());
+	KCSAN_CHECK_WRITE_BARRIER(smp_mb());
+	KCSAN_CHECK_WRITE_BARRIER(smp_wmb());
+	KCSAN_CHECK_WRITE_BARRIER(dma_wmb());
+	KCSAN_CHECK_WRITE_BARRIER(smp_mb__before_atomic());
+	KCSAN_CHECK_WRITE_BARRIER(smp_mb__after_atomic());
+	KCSAN_CHECK_WRITE_BARRIER(smp_mb__after_spinlock());
+	KCSAN_CHECK_WRITE_BARRIER(smp_store_mb(test_var, 0));
+	KCSAN_CHECK_WRITE_BARRIER(smp_store_release(&test_var, 0));
+	KCSAN_CHECK_WRITE_BARRIER(xchg(&test_var, 0));
+	KCSAN_CHECK_WRITE_BARRIER(xchg_release(&test_var, 0));
+	KCSAN_CHECK_WRITE_BARRIER(cmpxchg(&test_var, 0,  0));
+	KCSAN_CHECK_WRITE_BARRIER(cmpxchg_release(&test_var, 0,  0));
+	KCSAN_CHECK_WRITE_BARRIER(atomic_set_release(&dummy, 0));
+	KCSAN_CHECK_WRITE_BARRIER(atomic_add_return(1, &dummy));
+	KCSAN_CHECK_WRITE_BARRIER(atomic_add_return_release(1, &dummy));
+	KCSAN_CHECK_WRITE_BARRIER(atomic_fetch_add(1, &dummy));
+	KCSAN_CHECK_WRITE_BARRIER(atomic_fetch_add_release(1, &dummy));
+	KCSAN_CHECK_WRITE_BARRIER(test_and_set_bit(0, &test_var));
+	KCSAN_CHECK_WRITE_BARRIER(test_and_clear_bit(0, &test_var));
+	KCSAN_CHECK_WRITE_BARRIER(test_and_change_bit(0, &test_var));
+	KCSAN_CHECK_WRITE_BARRIER(clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_WRITE_BARRIER(__clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_WRITE_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
+	arch_spin_lock(&arch_spinlock);
+	KCSAN_CHECK_WRITE_BARRIER(arch_spin_unlock(&arch_spinlock));
+	spin_lock(&spinlock);
+	KCSAN_CHECK_WRITE_BARRIER(spin_unlock(&spinlock));
+
+	KCSAN_CHECK_RW_BARRIER(mb());
+	KCSAN_CHECK_RW_BARRIER(wmb());
+	KCSAN_CHECK_RW_BARRIER(rmb());
+	KCSAN_CHECK_RW_BARRIER(smp_mb());
+	KCSAN_CHECK_RW_BARRIER(smp_wmb());
+	KCSAN_CHECK_RW_BARRIER(smp_rmb());
+	KCSAN_CHECK_RW_BARRIER(dma_wmb());
+	KCSAN_CHECK_RW_BARRIER(dma_rmb());
+	KCSAN_CHECK_RW_BARRIER(smp_mb__before_atomic());
+	KCSAN_CHECK_RW_BARRIER(smp_mb__after_atomic());
+	KCSAN_CHECK_RW_BARRIER(smp_mb__after_spinlock());
+	KCSAN_CHECK_RW_BARRIER(smp_store_mb(test_var, 0));
+	KCSAN_CHECK_RW_BARRIER(smp_store_release(&test_var, 0));
+	KCSAN_CHECK_RW_BARRIER(xchg(&test_var, 0));
+	KCSAN_CHECK_RW_BARRIER(xchg_release(&test_var, 0));
+	KCSAN_CHECK_RW_BARRIER(cmpxchg(&test_var, 0,  0));
+	KCSAN_CHECK_RW_BARRIER(cmpxchg_release(&test_var, 0,  0));
+	KCSAN_CHECK_RW_BARRIER(atomic_set_release(&dummy, 0));
+	KCSAN_CHECK_RW_BARRIER(atomic_add_return(1, &dummy));
+	KCSAN_CHECK_RW_BARRIER(atomic_add_return_release(1, &dummy));
+	KCSAN_CHECK_RW_BARRIER(atomic_fetch_add(1, &dummy));
+	KCSAN_CHECK_RW_BARRIER(atomic_fetch_add_release(1, &dummy));
+	KCSAN_CHECK_RW_BARRIER(test_and_set_bit(0, &test_var));
+	KCSAN_CHECK_RW_BARRIER(test_and_clear_bit(0, &test_var));
+	KCSAN_CHECK_RW_BARRIER(test_and_change_bit(0, &test_var));
+	KCSAN_CHECK_RW_BARRIER(clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_RW_BARRIER(__clear_bit_unlock(0, &test_var));
+	KCSAN_CHECK_RW_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
+	arch_spin_lock(&arch_spinlock);
+	KCSAN_CHECK_RW_BARRIER(arch_spin_unlock(&arch_spinlock));
+	spin_lock(&spinlock);
+	KCSAN_CHECK_RW_BARRIER(spin_unlock(&spinlock));
+
+	kcsan_nestable_atomic_end();
+
+	return ret;
+}
+
 static int __init kcsan_selftest(void)
 {
 	int passed = 0;
@@ -120,6 +260,7 @@ static int __init kcsan_selftest(void)
 	RUN_TEST(test_requires);
 	RUN_TEST(test_encode_decode);
 	RUN_TEST(test_matching_access);
+	RUN_TEST(test_barrier);
 
 	pr_info("selftest: %d/%d tests passed\n", passed, total);
 	if (passed != total)
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 14/25] locking/barriers, kcsan: Add instrumentation for barriers
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (12 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 13/25] kcsan: selftest: Add test case to check memory barrier instrumentation Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 15/25] locking/barriers, kcsan: Support generic instrumentation Marco Elver
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Adds the required KCSAN instrumentation for barriers if CONFIG_SMP.
KCSAN supports modeling the effects of:

	smp_mb()
	smp_rmb()
	smp_wmb()
	smp_store_release()

Signed-off-by: Marco Elver <elver@google.com>
---
 include/asm-generic/barrier.h | 29 +++++++++++++++--------------
 include/linux/spinlock.h      |  2 +-
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 640f09479bdf..27a9c9edfef6 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -14,6 +14,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/compiler.h>
+#include <linux/kcsan-checks.h>
 #include <asm/rwonce.h>
 
 #ifndef nop
@@ -62,15 +63,15 @@
 #ifdef CONFIG_SMP
 
 #ifndef smp_mb
-#define smp_mb()	__smp_mb()
+#define smp_mb()	do { kcsan_mb(); __smp_mb(); } while (0)
 #endif
 
 #ifndef smp_rmb
-#define smp_rmb()	__smp_rmb()
+#define smp_rmb()	do { kcsan_rmb(); __smp_rmb(); } while (0)
 #endif
 
 #ifndef smp_wmb
-#define smp_wmb()	__smp_wmb()
+#define smp_wmb()	do { kcsan_wmb(); __smp_wmb(); } while (0)
 #endif
 
 #else	/* !CONFIG_SMP */
@@ -123,19 +124,19 @@ do {									\
 #ifdef CONFIG_SMP
 
 #ifndef smp_store_mb
-#define smp_store_mb(var, value)  __smp_store_mb(var, value)
+#define smp_store_mb(var, value)  do { kcsan_mb(); __smp_store_mb(var, value); } while (0)
 #endif
 
 #ifndef smp_mb__before_atomic
-#define smp_mb__before_atomic()	__smp_mb__before_atomic()
+#define smp_mb__before_atomic()	do { kcsan_mb(); __smp_mb__before_atomic(); } while (0)
 #endif
 
 #ifndef smp_mb__after_atomic
-#define smp_mb__after_atomic()	__smp_mb__after_atomic()
+#define smp_mb__after_atomic()	do { kcsan_mb(); __smp_mb__after_atomic(); } while (0)
 #endif
 
 #ifndef smp_store_release
-#define smp_store_release(p, v) __smp_store_release(p, v)
+#define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
 #endif
 
 #ifndef smp_load_acquire
@@ -178,13 +179,13 @@ do {									\
 #endif	/* CONFIG_SMP */
 
 /* Barriers for virtual machine guests when talking to an SMP host */
-#define virt_mb() __smp_mb()
-#define virt_rmb() __smp_rmb()
-#define virt_wmb() __smp_wmb()
-#define virt_store_mb(var, value) __smp_store_mb(var, value)
-#define virt_mb__before_atomic() __smp_mb__before_atomic()
-#define virt_mb__after_atomic()	__smp_mb__after_atomic()
-#define virt_store_release(p, v) __smp_store_release(p, v)
+#define virt_mb() do { kcsan_mb(); __smp_mb(); } while (0)
+#define virt_rmb() do { kcsan_rmb(); __smp_rmb(); } while (0)
+#define virt_wmb() do { kcsan_wmb(); __smp_wmb(); } while (0)
+#define virt_store_mb(var, value) do { kcsan_mb(); __smp_store_mb(var, value); } while (0)
+#define virt_mb__before_atomic() do { kcsan_mb(); __smp_mb__before_atomic(); } while (0)
+#define virt_mb__after_atomic()	do { kcsan_mb(); __smp_mb__after_atomic(); } while (0)
+#define virt_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
 #define virt_load_acquire(p) __smp_load_acquire(p)
 
 /**
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index b4e5ca23f840..5c0c5174155d 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -171,7 +171,7 @@ do {									\
  * Architectures that can implement ACQUIRE better need to take care.
  */
 #ifndef smp_mb__after_spinlock
-#define smp_mb__after_spinlock()	do { } while (0)
+#define smp_mb__after_spinlock()	kcsan_mb()
 #endif
 
 #ifdef CONFIG_DEBUG_SPINLOCK
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 15/25] locking/barriers, kcsan: Support generic instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (13 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 14/25] locking/barriers, kcsan: Add instrumentation for barriers Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 16/25] locking/atomics, kcsan: Add instrumentation for barriers Marco Elver
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Thus far only smp_*() barriers had been defined by asm-generic/barrier.h
based on __smp_*() barriers, because the !SMP case is usually generic.

With the introduction of instrumentation, it also makes sense to have
asm-generic/barrier.h assist in the definition of instrumented versions
of mb(), rmb(), wmb(), dma_rmb(), and dma_wmb().

Because there is no requirement to distinguish the !SMP case, the
definition can be simpler: we can avoid also providing fallbacks for the
__ prefixed cases, and only check if `defined(__<barrier>)`, to finally
define the KCSAN-instrumented versions.

This also allows for the compiler to complain if an architecture
accidentally defines both the normal and __ prefixed variant.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/asm-generic/barrier.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 27a9c9edfef6..02c4339c8eeb 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -21,6 +21,31 @@
 #define nop()	asm volatile ("nop")
 #endif
 
+/*
+ * Architectures that want generic instrumentation can define __ prefixed
+ * variants of all barriers.
+ */
+
+#ifdef __mb
+#define mb()	do { kcsan_mb(); __mb(); } while (0)
+#endif
+
+#ifdef __rmb
+#define rmb()	do { kcsan_rmb(); __rmb(); } while (0)
+#endif
+
+#ifdef __wmb
+#define wmb()	do { kcsan_wmb(); __wmb(); } while (0)
+#endif
+
+#ifdef __dma_rmb
+#define dma_rmb()	do { kcsan_rmb(); __dma_rmb(); } while (0)
+#endif
+
+#ifdef __dma_wmb
+#define dma_wmb()	do { kcsan_wmb(); __dma_wmb(); } while (0)
+#endif
+
 /*
  * Force strict CPU ordering. And yes, this is required on UP too when we're
  * talking to devices.
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 16/25] locking/atomics, kcsan: Add instrumentation for barriers
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (14 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 15/25] locking/barriers, kcsan: Support generic instrumentation Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 17/25] asm-generic/bitops, " Marco Elver
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Adds the required KCSAN instrumentation for barriers of atomics.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/atomic/atomic-instrumented.h | 135 ++++++++++++++++++++-
 scripts/atomic/gen-atomic-instrumented.sh  |  41 +++++--
 2 files changed, 166 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index a0f654370da3..5d69b143c28e 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -45,6 +45,7 @@ atomic_set(atomic_t *v, int i)
 static __always_inline void
 atomic_set_release(atomic_t *v, int i)
 {
+	kcsan_release();
 	instrument_atomic_write(v, sizeof(*v));
 	arch_atomic_set_release(v, i);
 }
@@ -59,6 +60,7 @@ atomic_add(int i, atomic_t *v)
 static __always_inline int
 atomic_add_return(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_add_return(i, v);
 }
@@ -73,6 +75,7 @@ atomic_add_return_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_add_return_release(i, v);
 }
@@ -87,6 +90,7 @@ atomic_add_return_relaxed(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_add(i, v);
 }
@@ -101,6 +105,7 @@ atomic_fetch_add_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_release(i, v);
 }
@@ -122,6 +127,7 @@ atomic_sub(int i, atomic_t *v)
 static __always_inline int
 atomic_sub_return(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_sub_return(i, v);
 }
@@ -136,6 +142,7 @@ atomic_sub_return_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_sub_return_release(i, v);
 }
@@ -150,6 +157,7 @@ atomic_sub_return_relaxed(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub(i, v);
 }
@@ -164,6 +172,7 @@ atomic_fetch_sub_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_release(i, v);
 }
@@ -185,6 +194,7 @@ atomic_inc(atomic_t *v)
 static __always_inline int
 atomic_inc_return(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_inc_return(v);
 }
@@ -199,6 +209,7 @@ atomic_inc_return_acquire(atomic_t *v)
 static __always_inline int
 atomic_inc_return_release(atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_inc_return_release(v);
 }
@@ -213,6 +224,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 static __always_inline int
 atomic_fetch_inc(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc(v);
 }
@@ -227,6 +239,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 static __always_inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_release(v);
 }
@@ -248,6 +261,7 @@ atomic_dec(atomic_t *v)
 static __always_inline int
 atomic_dec_return(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_dec_return(v);
 }
@@ -262,6 +276,7 @@ atomic_dec_return_acquire(atomic_t *v)
 static __always_inline int
 atomic_dec_return_release(atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_dec_return_release(v);
 }
@@ -276,6 +291,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 static __always_inline int
 atomic_fetch_dec(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec(v);
 }
@@ -290,6 +306,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 static __always_inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_release(v);
 }
@@ -311,6 +328,7 @@ atomic_and(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_and(i, v);
 }
@@ -325,6 +343,7 @@ atomic_fetch_and_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_release(i, v);
 }
@@ -346,6 +365,7 @@ atomic_andnot(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot(i, v);
 }
@@ -360,6 +380,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_release(i, v);
 }
@@ -381,6 +402,7 @@ atomic_or(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_or(i, v);
 }
@@ -395,6 +417,7 @@ atomic_fetch_or_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_release(i, v);
 }
@@ -416,6 +439,7 @@ atomic_xor(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor(i, v);
 }
@@ -430,6 +454,7 @@ atomic_fetch_xor_acquire(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_release(i, v);
 }
@@ -444,6 +469,7 @@ atomic_fetch_xor_relaxed(int i, atomic_t *v)
 static __always_inline int
 atomic_xchg(atomic_t *v, int i)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_xchg(v, i);
 }
@@ -458,6 +484,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 static __always_inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_xchg_release(v, i);
 }
@@ -472,6 +499,7 @@ atomic_xchg_relaxed(atomic_t *v, int i)
 static __always_inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg(v, old, new);
 }
@@ -486,6 +514,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 static __always_inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_release(v, old, new);
 }
@@ -500,6 +529,7 @@ atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg(v, old, new);
@@ -516,6 +546,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_release(v, old, new);
@@ -532,6 +563,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 static __always_inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_sub_and_test(i, v);
 }
@@ -539,6 +571,7 @@ atomic_sub_and_test(int i, atomic_t *v)
 static __always_inline bool
 atomic_dec_and_test(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_dec_and_test(v);
 }
@@ -546,6 +579,7 @@ atomic_dec_and_test(atomic_t *v)
 static __always_inline bool
 atomic_inc_and_test(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_inc_and_test(v);
 }
@@ -553,6 +587,7 @@ atomic_inc_and_test(atomic_t *v)
 static __always_inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_add_negative(i, v);
 }
@@ -560,6 +595,7 @@ atomic_add_negative(int i, atomic_t *v)
 static __always_inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_unless(v, a, u);
 }
@@ -567,6 +603,7 @@ atomic_fetch_add_unless(atomic_t *v, int a, int u)
 static __always_inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_add_unless(v, a, u);
 }
@@ -574,6 +611,7 @@ atomic_add_unless(atomic_t *v, int a, int u)
 static __always_inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_inc_not_zero(v);
 }
@@ -581,6 +619,7 @@ atomic_inc_not_zero(atomic_t *v)
 static __always_inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_inc_unless_negative(v);
 }
@@ -588,6 +627,7 @@ atomic_inc_unless_negative(atomic_t *v)
 static __always_inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_dec_unless_positive(v);
 }
@@ -595,6 +635,7 @@ atomic_dec_unless_positive(atomic_t *v)
 static __always_inline int
 atomic_dec_if_positive(atomic_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_dec_if_positive(v);
 }
@@ -623,6 +664,7 @@ atomic64_set(atomic64_t *v, s64 i)
 static __always_inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
+	kcsan_release();
 	instrument_atomic_write(v, sizeof(*v));
 	arch_atomic64_set_release(v, i);
 }
@@ -637,6 +679,7 @@ atomic64_add(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_add_return(i, v);
 }
@@ -651,6 +694,7 @@ atomic64_add_return_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_add_return_release(i, v);
 }
@@ -665,6 +709,7 @@ atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add(i, v);
 }
@@ -679,6 +724,7 @@ atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_release(i, v);
 }
@@ -700,6 +746,7 @@ atomic64_sub(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_sub_return(i, v);
 }
@@ -714,6 +761,7 @@ atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_release(i, v);
 }
@@ -728,6 +776,7 @@ atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub(i, v);
 }
@@ -742,6 +791,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_release(i, v);
 }
@@ -763,6 +813,7 @@ atomic64_inc(atomic64_t *v)
 static __always_inline s64
 atomic64_inc_return(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_inc_return(v);
 }
@@ -777,6 +828,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
 static __always_inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_release(v);
 }
@@ -791,6 +843,7 @@ atomic64_inc_return_relaxed(atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc(v);
 }
@@ -805,6 +858,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_release(v);
 }
@@ -826,6 +880,7 @@ atomic64_dec(atomic64_t *v)
 static __always_inline s64
 atomic64_dec_return(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_dec_return(v);
 }
@@ -840,6 +895,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
 static __always_inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_release(v);
 }
@@ -854,6 +910,7 @@ atomic64_dec_return_relaxed(atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec(v);
 }
@@ -868,6 +925,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_release(v);
 }
@@ -889,6 +947,7 @@ atomic64_and(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and(i, v);
 }
@@ -903,6 +962,7 @@ atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_release(i, v);
 }
@@ -924,6 +984,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot(i, v);
 }
@@ -938,6 +999,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_release(i, v);
 }
@@ -959,6 +1021,7 @@ atomic64_or(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or(i, v);
 }
@@ -973,6 +1036,7 @@ atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_release(i, v);
 }
@@ -994,6 +1058,7 @@ atomic64_xor(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor(i, v);
 }
@@ -1008,6 +1073,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_release(i, v);
 }
@@ -1022,6 +1088,7 @@ atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_xchg(v, i);
 }
@@ -1036,6 +1103,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64 i)
 static __always_inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_xchg_release(v, i);
 }
@@ -1050,6 +1118,7 @@ atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 static __always_inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg(v, old, new);
 }
@@ -1064,6 +1133,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 static __always_inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_release(v, old, new);
 }
@@ -1078,6 +1148,7 @@ atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg(v, old, new);
@@ -1094,6 +1165,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_release(v, old, new);
@@ -1110,6 +1182,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 static __always_inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_sub_and_test(i, v);
 }
@@ -1117,6 +1190,7 @@ atomic64_sub_and_test(s64 i, atomic64_t *v)
 static __always_inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_dec_and_test(v);
 }
@@ -1124,6 +1198,7 @@ atomic64_dec_and_test(atomic64_t *v)
 static __always_inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_inc_and_test(v);
 }
@@ -1131,6 +1206,7 @@ atomic64_inc_and_test(atomic64_t *v)
 static __always_inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_add_negative(i, v);
 }
@@ -1138,6 +1214,7 @@ atomic64_add_negative(s64 i, atomic64_t *v)
 static __always_inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_unless(v, a, u);
 }
@@ -1145,6 +1222,7 @@ atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 static __always_inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_add_unless(v, a, u);
 }
@@ -1152,6 +1230,7 @@ atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 static __always_inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_inc_not_zero(v);
 }
@@ -1159,6 +1238,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 static __always_inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_inc_unless_negative(v);
 }
@@ -1166,6 +1246,7 @@ atomic64_inc_unless_negative(atomic64_t *v)
 static __always_inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_dec_unless_positive(v);
 }
@@ -1173,6 +1254,7 @@ atomic64_dec_unless_positive(atomic64_t *v)
 static __always_inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic64_dec_if_positive(v);
 }
@@ -1201,6 +1283,7 @@ atomic_long_set(atomic_long_t *v, long i)
 static __always_inline void
 atomic_long_set_release(atomic_long_t *v, long i)
 {
+	kcsan_release();
 	instrument_atomic_write(v, sizeof(*v));
 	arch_atomic_long_set_release(v, i);
 }
@@ -1215,6 +1298,7 @@ atomic_long_add(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_add_return(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_add_return(i, v);
 }
@@ -1229,6 +1313,7 @@ atomic_long_add_return_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_add_return_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_add_return_release(i, v);
 }
@@ -1243,6 +1328,7 @@ atomic_long_add_return_relaxed(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_add(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_add(i, v);
 }
@@ -1257,6 +1343,7 @@ atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_add_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_add_release(i, v);
 }
@@ -1278,6 +1365,7 @@ atomic_long_sub(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_sub_return(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_sub_return(i, v);
 }
@@ -1292,6 +1380,7 @@ atomic_long_sub_return_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_sub_return_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_sub_return_release(i, v);
 }
@@ -1306,6 +1395,7 @@ atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_sub(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_sub(i, v);
 }
@@ -1320,6 +1410,7 @@ atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_sub_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_sub_release(i, v);
 }
@@ -1341,6 +1432,7 @@ atomic_long_inc(atomic_long_t *v)
 static __always_inline long
 atomic_long_inc_return(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_inc_return(v);
 }
@@ -1355,6 +1447,7 @@ atomic_long_inc_return_acquire(atomic_long_t *v)
 static __always_inline long
 atomic_long_inc_return_release(atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_inc_return_release(v);
 }
@@ -1369,6 +1462,7 @@ atomic_long_inc_return_relaxed(atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_inc(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_inc(v);
 }
@@ -1383,6 +1477,7 @@ atomic_long_fetch_inc_acquire(atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_inc_release(atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_inc_release(v);
 }
@@ -1404,6 +1499,7 @@ atomic_long_dec(atomic_long_t *v)
 static __always_inline long
 atomic_long_dec_return(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_dec_return(v);
 }
@@ -1418,6 +1514,7 @@ atomic_long_dec_return_acquire(atomic_long_t *v)
 static __always_inline long
 atomic_long_dec_return_release(atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_dec_return_release(v);
 }
@@ -1432,6 +1529,7 @@ atomic_long_dec_return_relaxed(atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_dec(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_dec(v);
 }
@@ -1446,6 +1544,7 @@ atomic_long_fetch_dec_acquire(atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_dec_release(atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_dec_release(v);
 }
@@ -1467,6 +1566,7 @@ atomic_long_and(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_and(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_and(i, v);
 }
@@ -1481,6 +1581,7 @@ atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_and_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_and_release(i, v);
 }
@@ -1502,6 +1603,7 @@ atomic_long_andnot(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_andnot(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_andnot(i, v);
 }
@@ -1516,6 +1618,7 @@ atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_andnot_release(i, v);
 }
@@ -1537,6 +1640,7 @@ atomic_long_or(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_or(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_or(i, v);
 }
@@ -1551,6 +1655,7 @@ atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_or_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_or_release(i, v);
 }
@@ -1572,6 +1677,7 @@ atomic_long_xor(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_xor(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_xor(i, v);
 }
@@ -1586,6 +1692,7 @@ atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_xor_release(long i, atomic_long_t *v)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_xor_release(i, v);
 }
@@ -1600,6 +1707,7 @@ atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_xchg(atomic_long_t *v, long i)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_xchg(v, i);
 }
@@ -1614,6 +1722,7 @@ atomic_long_xchg_acquire(atomic_long_t *v, long i)
 static __always_inline long
 atomic_long_xchg_release(atomic_long_t *v, long i)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_xchg_release(v, i);
 }
@@ -1628,6 +1737,7 @@ atomic_long_xchg_relaxed(atomic_long_t *v, long i)
 static __always_inline long
 atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_cmpxchg(v, old, new);
 }
@@ -1642,6 +1752,7 @@ atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
 static __always_inline long
 atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_cmpxchg_release(v, old, new);
 }
@@ -1656,6 +1767,7 @@ atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
 static __always_inline bool
 atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg(v, old, new);
@@ -1672,6 +1784,7 @@ atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 static __always_inline bool
 atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
+	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
 	instrument_atomic_read_write(old, sizeof(*old));
 	return arch_atomic_long_try_cmpxchg_release(v, old, new);
@@ -1688,6 +1801,7 @@ atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 static __always_inline bool
 atomic_long_sub_and_test(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_sub_and_test(i, v);
 }
@@ -1695,6 +1809,7 @@ atomic_long_sub_and_test(long i, atomic_long_t *v)
 static __always_inline bool
 atomic_long_dec_and_test(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_dec_and_test(v);
 }
@@ -1702,6 +1817,7 @@ atomic_long_dec_and_test(atomic_long_t *v)
 static __always_inline bool
 atomic_long_inc_and_test(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_inc_and_test(v);
 }
@@ -1709,6 +1825,7 @@ atomic_long_inc_and_test(atomic_long_t *v)
 static __always_inline bool
 atomic_long_add_negative(long i, atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_add_negative(i, v);
 }
@@ -1716,6 +1833,7 @@ atomic_long_add_negative(long i, atomic_long_t *v)
 static __always_inline long
 atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_fetch_add_unless(v, a, u);
 }
@@ -1723,6 +1841,7 @@ atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 static __always_inline bool
 atomic_long_add_unless(atomic_long_t *v, long a, long u)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_add_unless(v, a, u);
 }
@@ -1730,6 +1849,7 @@ atomic_long_add_unless(atomic_long_t *v, long a, long u)
 static __always_inline bool
 atomic_long_inc_not_zero(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_inc_not_zero(v);
 }
@@ -1737,6 +1857,7 @@ atomic_long_inc_not_zero(atomic_long_t *v)
 static __always_inline bool
 atomic_long_inc_unless_negative(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_inc_unless_negative(v);
 }
@@ -1744,6 +1865,7 @@ atomic_long_inc_unless_negative(atomic_long_t *v)
 static __always_inline bool
 atomic_long_dec_unless_positive(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_dec_unless_positive(v);
 }
@@ -1751,6 +1873,7 @@ atomic_long_dec_unless_positive(atomic_long_t *v)
 static __always_inline long
 atomic_long_dec_if_positive(atomic_long_t *v)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
 	return arch_atomic_long_dec_if_positive(v);
 }
@@ -1758,6 +1881,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define xchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg(__ai_ptr, __VA_ARGS__); \
 })
@@ -1772,6 +1896,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define xchg_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_release(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_xchg_release(__ai_ptr, __VA_ARGS__); \
 })
@@ -1786,6 +1911,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define cmpxchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
@@ -1800,6 +1926,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define cmpxchg_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_release(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
 })
@@ -1814,6 +1941,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define cmpxchg64(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
 })
@@ -1828,6 +1956,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define cmpxchg64_release(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_release(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
 })
@@ -1843,6 +1972,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
@@ -1861,6 +1991,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
 	typeof(oldp) __ai_oldp = (oldp); \
+	kcsan_release(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
 	arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
@@ -1892,6 +2023,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define sync_cmpxchg(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
@@ -1899,6 +2031,7 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 #define cmpxchg_double(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
+	kcsan_mb(); \
 	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
 	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
 })
@@ -1912,4 +2045,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 2a9553f0a9d5619f19151092df5cabbbf16ce835
+// 87c974b93032afd42143613434d1a7788fa598f9
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 035ceb4ee85c..68f902731d01 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -34,6 +34,14 @@ gen_param_check()
 gen_params_checks()
 {
 	local meta="$1"; shift
+	local order="$1"; shift
+
+	if [ "${order}" = "_release" ]; then
+		printf "\tkcsan_release();\n"
+	elif [ -z "${order}" ] && ! meta_in "$meta" "slv"; then
+		# RMW with return value is fully ordered
+		printf "\tkcsan_mb();\n"
+	fi
 
 	while [ "$#" -gt 0 ]; do
 		gen_param_check "$meta" "$1"
@@ -56,7 +64,7 @@ gen_proto_order_variant()
 
 	local ret="$(gen_ret_type "${meta}" "${int}")"
 	local params="$(gen_params "${int}" "${atomic}" "$@")"
-	local checks="$(gen_params_checks "${meta}" "$@")"
+	local checks="$(gen_params_checks "${meta}" "${order}" "$@")"
 	local args="$(gen_args "$@")"
 	local retstmt="$(gen_ret_stmt "${meta}")"
 
@@ -75,29 +83,44 @@ EOF
 gen_xchg()
 {
 	local xchg="$1"; shift
+	local order="$1"; shift
 	local mult="$1"; shift
 
+	kcsan_barrier=""
+	if [ "${xchg%_local}" = "${xchg}" ]; then
+		case "$order" in
+		_release)	kcsan_barrier="kcsan_release()" ;;
+		"")			kcsan_barrier="kcsan_mb()" ;;
+		esac
+	fi
+
 	if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
 
 cat <<EOF
-#define ${xchg}(ptr, oldp, ...) \\
+#define ${xchg}${order}(ptr, oldp, ...) \\
 ({ \\
 	typeof(ptr) __ai_ptr = (ptr); \\
 	typeof(oldp) __ai_oldp = (oldp); \\
+EOF
+[ -n "$kcsan_barrier" ] && printf "\t${kcsan_barrier}; \\\\\n"
+cat <<EOF
 	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
 	instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
-	arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
+	arch_${xchg}${order}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
 })
 EOF
 
 	else
 
 cat <<EOF
-#define ${xchg}(ptr, ...) \\
+#define ${xchg}${order}(ptr, ...) \\
 ({ \\
 	typeof(ptr) __ai_ptr = (ptr); \\
+EOF
+[ -n "$kcsan_barrier" ] && printf "\t${kcsan_barrier}; \\\\\n"
+cat <<EOF
 	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
-	arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
+	arch_${xchg}${order}(__ai_ptr, __VA_ARGS__); \\
 })
 EOF
 
@@ -145,21 +168,21 @@ done
 
 for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg"; do
 	for order in "" "_acquire" "_release" "_relaxed"; do
-		gen_xchg "${xchg}${order}" ""
+		gen_xchg "${xchg}" "${order}" ""
 		printf "\n"
 	done
 done
 
 for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
-	gen_xchg "${xchg}" ""
+	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
 
-gen_xchg "cmpxchg_double" "2 * "
+gen_xchg "cmpxchg_double" "" "2 * "
 
 printf "\n\n"
 
-gen_xchg "cmpxchg_double_local" "2 * "
+gen_xchg "cmpxchg_double_local" "" "2 * "
 
 cat <<EOF
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 17/25] asm-generic/bitops, kcsan: Add instrumentation for barriers
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (15 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 16/25] locking/atomics, kcsan: Add instrumentation for barriers Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 18/25] x86/barriers, kcsan: Use generic instrumentation for non-smp barriers Marco Elver
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Adds the required KCSAN instrumentation for barriers of atomic bitops.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/asm-generic/bitops/instrumented-atomic.h | 3 +++
 include/asm-generic/bitops/instrumented-lock.h   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/asm-generic/bitops/instrumented-atomic.h b/include/asm-generic/bitops/instrumented-atomic.h
index 81915dcd4b4e..c90192b1c755 100644
--- a/include/asm-generic/bitops/instrumented-atomic.h
+++ b/include/asm-generic/bitops/instrumented-atomic.h
@@ -67,6 +67,7 @@ static inline void change_bit(long nr, volatile unsigned long *addr)
  */
 static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_set_bit(nr, addr);
 }
@@ -80,6 +81,7 @@ static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
  */
 static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_clear_bit(nr, addr);
 }
@@ -93,6 +95,7 @@ static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
  */
 static inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
 {
+	kcsan_mb();
 	instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_change_bit(nr, addr);
 }
diff --git a/include/asm-generic/bitops/instrumented-lock.h b/include/asm-generic/bitops/instrumented-lock.h
index 75ef606f7145..eb64bd4f11f3 100644
--- a/include/asm-generic/bitops/instrumented-lock.h
+++ b/include/asm-generic/bitops/instrumented-lock.h
@@ -22,6 +22,7 @@
  */
 static inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
 {
+	kcsan_release();
 	instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	arch_clear_bit_unlock(nr, addr);
 }
@@ -37,6 +38,7 @@ static inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
  */
 static inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
 {
+	kcsan_release();
 	instrument_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit_unlock(nr, addr);
 }
@@ -71,6 +73,7 @@ static inline bool test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 static inline bool
 clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
 {
+	kcsan_release();
 	instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_clear_bit_unlock_is_negative_byte(nr, addr);
 }
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 18/25] x86/barriers, kcsan: Use generic instrumentation for non-smp barriers
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (16 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 17/25] asm-generic/bitops, " Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 19/25] x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock() Marco Elver
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Prefix all barriers with __, now that asm-generic/barriers.h supports
defining the final instrumented version of these barriers. The change is
limited to barriers used by x86-64.

Signed-off-by: Marco Elver <elver@google.com>
---
 arch/x86/include/asm/barrier.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 3ba772a69cc8..35389b2af88e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -19,9 +19,9 @@
 #define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \
 				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
-#define mb() 	asm volatile("mfence":::"memory")
-#define rmb()	asm volatile("lfence":::"memory")
-#define wmb()	asm volatile("sfence" ::: "memory")
+#define __mb()	asm volatile("mfence":::"memory")
+#define __rmb()	asm volatile("lfence":::"memory")
+#define __wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
 /**
@@ -51,8 +51,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() alternative("", "lfence", X86_FEATURE_LFENCE_RDTSC)
 
-#define dma_rmb()	barrier()
-#define dma_wmb()	barrier()
+#define __dma_rmb()	barrier()
+#define __dma_wmb()	barrier()
 
 #define __smp_mb()	asm volatile("lock; addl $0,-4(%%" _ASM_SP ")" ::: "memory", "cc")
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 19/25] x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock()
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (17 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 18/25] x86/barriers, kcsan: Use generic instrumentation for non-smp barriers Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 20/25] mm, kcsan: Enable barrier instrumentation Marco Elver
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

If CONFIG_PARAVIRT_SPINLOCKS=y, queued_spin_unlock() is implemented
using pv_queued_spin_unlock() which is entirely inline asm based. As
such, we do not receive any KCSAN barrier instrumentation via regular
atomic operations.

Add the missing KCSAN barrier instrumentation for the
CONFIG_PARAVIRT_SPINLOCKS case.

Signed-off-by: Marco Elver <elver@google.com>
---
 arch/x86/include/asm/qspinlock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index d86ab942219c..d87451df480b 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -53,6 +53,7 @@ static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
 static inline void queued_spin_unlock(struct qspinlock *lock)
 {
+	kcsan_release();
 	pv_queued_spin_unlock(lock);
 }
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 20/25] mm, kcsan: Enable barrier instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (18 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 19/25] x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock() Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 21/25] sched, kcsan: Enable memory " Marco Elver
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Some memory management calls imply memory barriers that are required to
avoid false positives. For example, without the correct instrumentation,
we could observe data races of the following variant:

                   T0           |           T1
        ------------------------+------------------------
                                |
         *a = 42;    ---+       |
         kfree(a);      |       |
                        |       | b = kmalloc(..); // b == a
          <reordered> <-+       | *b = 42;         // not a data race!
                                |

Therefore, instrument memory barriers in all allocator code currently
not being instrumented in a default build.

Signed-off-by: Marco Elver <elver@google.com>
---
 mm/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/Makefile b/mm/Makefile
index d6c0042e3aa0..7919cd7f13f2 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -15,6 +15,8 @@ KCSAN_SANITIZE_slab_common.o := n
 KCSAN_SANITIZE_slab.o := n
 KCSAN_SANITIZE_slub.o := n
 KCSAN_SANITIZE_page_alloc.o := n
+# But enable explicit instrumentation for memory barriers.
+KCSAN_INSTRUMENT_BARRIERS := y
 
 # These files are disabled because they produce non-interesting and/or
 # flaky coverage that is not a function of syscall inputs. E.g. slab is out of
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 21/25] sched, kcsan: Enable memory barrier instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (19 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 20/25] mm, kcsan: Enable barrier instrumentation Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 22/25] objtool, kcsan: Add memory barrier instrumentation to whitelist Marco Elver
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

There's no fundamental reason to disable KCSAN for scheduler code,
except for excessive noise and performance concerns (instrumenting
scheduler code is usually a good way to stress test KCSAN itself).

However, several core sched functions imply memory barriers that are
invisible to KCSAN without instrumentation, but are required to avoid
false positives. Therefore, unconditionally enable instrumentation of
memory barriers in scheduler code. Also update the comment to reflect
this and be a bit more brief.

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/sched/Makefile | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index c7421f2d05e1..c83b37af155b 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,11 +11,10 @@ ccflags-y += $(call cc-disable-warning, unused-but-set-variable)
 # that is not a function of syscall inputs. E.g. involuntary context switches.
 KCOV_INSTRUMENT := n
 
-# There are numerous data races here, however, most of them are due to plain accesses.
-# This would make it even harder for syzbot to find reproducers, because these
-# bugs trigger without specific input. Disable by default, but should re-enable
-# eventually.
+# Disable KCSAN to avoid excessive noise and performance degradation. To avoid
+# false positives ensure barriers implied by sched functions are instrumented.
 KCSAN_SANITIZE := n
+KCSAN_INSTRUMENT_BARRIERS := y
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 22/25] objtool, kcsan: Add memory barrier instrumentation to whitelist
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (20 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 21/25] sched, kcsan: Enable memory " Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 23/25] objtool, kcsan: Remove memory barrier instrumentation from noinstr Marco Elver
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Adds KCSAN's memory barrier instrumentation to objtool's uaccess
whitelist.

Signed-off-by: Marco Elver <elver@google.com>
---
 tools/objtool/check.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 21735829b860..61dfb66b30b6 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -849,6 +849,10 @@ static const char *uaccess_safe_builtin[] = {
 	"__asan_report_store16_noabort",
 	/* KCSAN */
 	"__kcsan_check_access",
+	"__kcsan_mb",
+	"__kcsan_wmb",
+	"__kcsan_rmb",
+	"__kcsan_release",
 	"kcsan_found_watchpoint",
 	"kcsan_setup_watchpoint",
 	"kcsan_check_scoped_accesses",
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 23/25] objtool, kcsan: Remove memory barrier instrumentation from noinstr
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (21 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 22/25] objtool, kcsan: Add memory barrier instrumentation to whitelist Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 24/25] compiler_attributes.h: Add __disable_sanitizer_instrumentation Marco Elver
  2021-11-30 11:44 ` [PATCH v3 25/25] kcsan: Support WEAK_MEMORY with Clang where no objtool support exists Marco Elver
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86, Josh Poimboeuf

Teach objtool to turn instrumentation required for memory barrier
modeling into nops in noinstr text.

The __tsan_func_entry/exit calls are still emitted by compilers even
with the __no_sanitize_thread attribute. The memory barrier
instrumentation will be inserted explicitly (without compiler help), and
thus needs to also explicitly be removed.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
v3:
* s/removable_instr/profiling_func/ (suggested by Josh Poimboeuf)
* s/__kcsan_(mb|wmb|rmb|release)/__atomic_signal_fence/, because
  Clang < 14.0 will still emit these in noinstr even with __no_kcsan.
* Fix and add more comments.

v2:
* Rewrite after rebase to v5.16-rc1.
---
 tools/objtool/check.c               | 37 ++++++++++++++++++++++++-----
 tools/objtool/include/objtool/elf.h |  2 +-
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 61dfb66b30b6..a9a1f7259d62 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1072,11 +1072,11 @@ static void annotate_call_site(struct objtool_file *file,
 	}
 
 	/*
-	 * Many compilers cannot disable KCOV with a function attribute
-	 * so they need a little help, NOP out any KCOV calls from noinstr
-	 * text.
+	 * Many compilers cannot disable KCOV or sanitizer calls with a function
+	 * attribute so they need a little help, NOP out any such calls from
+	 * noinstr text.
 	 */
-	if (insn->sec->noinstr && sym->kcov) {
+	if (insn->sec->noinstr && sym->profiling_func) {
 		if (reloc) {
 			reloc->type = R_NONE;
 			elf_write_reloc(file->elf, reloc);
@@ -1991,6 +1991,31 @@ static int read_intra_function_calls(struct objtool_file *file)
 	return 0;
 }
 
+/*
+ * Return true if name matches an instrumentation function, where calls to that
+ * function from noinstr code can safely be removed, but compilers won't do so.
+ */
+static bool is_profiling_func(const char *name)
+{
+	/*
+	 * Many compilers cannot disable KCOV with a function attribute.
+	 */
+	if (!strncmp(name, "__sanitizer_cov_", 16))
+		return true;
+
+	/*
+	 * Some compilers currently do not remove __tsan_func_entry/exit nor
+	 * __tsan_atomic_signal_fence (used for barrier instrumentation) with
+	 * the __no_sanitize_thread attribute, remove them. Once the kernel's
+	 * minimum Clang version is 14.0, this can be removed.
+	 */
+	if (!strncmp(name, "__tsan_func_", 12) ||
+	    !strcmp(name, "__tsan_atomic_signal_fence"))
+		return true;
+
+	return false;
+}
+
 static int classify_symbols(struct objtool_file *file)
 {
 	struct section *sec;
@@ -2011,8 +2036,8 @@ static int classify_symbols(struct objtool_file *file)
 			if (!strcmp(func->name, "__fentry__"))
 				func->fentry = true;
 
-			if (!strncmp(func->name, "__sanitizer_cov_", 16))
-				func->kcov = true;
+			if (is_profiling_func(func->name))
+				func->profiling_func = true;
 		}
 	}
 
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index cdc739fa9a6f..d22336781401 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -58,7 +58,7 @@ struct symbol {
 	u8 static_call_tramp : 1;
 	u8 retpoline_thunk   : 1;
 	u8 fentry            : 1;
-	u8 kcov              : 1;
+	u8 profiling_func    : 1;
 	struct list_head pv_target;
 };
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 24/25] compiler_attributes.h: Add __disable_sanitizer_instrumentation
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (22 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 23/25] objtool, kcsan: Remove memory barrier instrumentation from noinstr Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  2021-11-30 11:44 ` [PATCH v3 25/25] kcsan: Support WEAK_MEMORY with Clang where no objtool support exists Marco Elver
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

From: Alexander Potapenko <glider@google.com>

The new attribute maps to
__attribute__((disable_sanitizer_instrumentation)), which will be
supported by Clang >= 14.0. Future support in GCC is also possible.

This attribute disables compiler instrumentation for kernel sanitizer
tools, making it easier to implement noinstr. It is different from the
existing __no_sanitize* attributes, which may still allow certain types
of instrumentation to prevent false positives.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* New patch.
---
 include/linux/compiler_attributes.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index b9121afd8733..37e260020221 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -308,6 +308,24 @@
 # define __compiletime_warning(msg)
 #endif
 
+/*
+ * Optional: only supported since clang >= 14.0
+ *
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#disable-sanitizer-instrumentation
+ *
+ * disable_sanitizer_instrumentation is not always similar to
+ * no_sanitize((<sanitizer-name>)): the latter may still let specific sanitizers
+ * insert code into functions to prevent false positives. Unlike that,
+ * disable_sanitizer_instrumentation prevents all kinds of instrumentation to
+ * functions with the attribute.
+ */
+#if __has_attribute(disable_sanitizer_instrumentation)
+# define __disable_sanitizer_instrumentation \
+	 __attribute__((disable_sanitizer_instrumentation))
+#else
+# define __disable_sanitizer_instrumentation
+#endif
+
 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-weak-function-attribute
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-weak-variable-attribute
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 25/25] kcsan: Support WEAK_MEMORY with Clang where no objtool support exists
  2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
                   ` (23 preceding siblings ...)
  2021-11-30 11:44 ` [PATCH v3 24/25] compiler_attributes.h: Add __disable_sanitizer_instrumentation Marco Elver
@ 2021-11-30 11:44 ` Marco Elver
  24 siblings, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-11-30 11:44 UTC (permalink / raw)
  To: elver, Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

Clang and GCC behave a little differently when it comes to the
__no_sanitize_thread attribute, which has valid reasons, and depending
on context either one could be right.

Traditionally, user space ThreadSanitizer [1] still expects instrumented
builtin atomics (to avoid false positives) and __tsan_func_{entry,exit}
(to generate meaningful stack traces), even if the function has the
attribute no_sanitize("thread").

[1] https://clang.llvm.org/docs/ThreadSanitizer.html#attribute-no-sanitize-thread

GCC doesn't follow the same policy (for better or worse), and removes
all kinds of instrumentation if no_sanitize is added. Arguably, since
this may be a problem for user space ThreadSanitizer, we expect this may
change in future.

Since KCSAN != ThreadSanitizer, the likelihood of false positives even
without barrier instrumentation everywhere, is much lower by design.

At least for Clang, however, to fully remove all sanitizer
instrumentation, we must add the disable_sanitizer_instrumentation
attribute, which is available since Clang 14.0.

Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* New patch.
---
 include/linux/compiler_types.h | 13 ++++++++++++-
 lib/Kconfig.kcsan              |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 1d32f4c03c9e..3c1795fdb568 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -198,9 +198,20 @@ struct ftrace_likely_data {
 # define __no_kasan_or_inline __always_inline
 #endif
 
-#define __no_kcsan __no_sanitize_thread
 #ifdef __SANITIZE_THREAD__
+/*
+ * Clang still emits instrumentation for __tsan_func_{entry,exit}() and builtin
+ * atomics even with __no_sanitize_thread (to avoid false positives in userspace
+ * ThreadSanitizer). The kernel's requirements are stricter and we really do not
+ * want any instrumentation with __no_kcsan.
+ *
+ * Therefore we add __disable_sanitizer_instrumentation where available to
+ * disable all instrumentation. See Kconfig.kcsan where this is mandatory.
+ */
+# define __no_kcsan __no_sanitize_thread __disable_sanitizer_instrumentation
 # define __no_sanitize_or_inline __no_kcsan notrace __maybe_unused
+#else
+# define __no_kcsan
 #endif
 
 #ifndef __no_sanitize_or_inline
diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
index e4394ea8068b..63b70b8c5551 100644
--- a/lib/Kconfig.kcsan
+++ b/lib/Kconfig.kcsan
@@ -198,7 +198,7 @@ config KCSAN_WEAK_MEMORY
 	# We can either let objtool nop __tsan_func_{entry,exit}() and builtin
 	# atomics instrumentation in .noinstr.text, or use a compiler that can
 	# implement __no_kcsan to really remove all instrumentation.
-	depends on STACK_VALIDATION || CC_IS_GCC
+	depends on STACK_VALIDATION || CC_IS_GCC || CLANG_VERSION >= 140000
 	help
 	  Enable support for modeling a subset of weak memory, which allows
 	  detecting a subset of data races due to missing memory barriers.
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-11-30 11:44 ` [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling Marco Elver
@ 2021-12-03  8:56   ` Marco Elver
  2021-12-03 16:50     ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-12-03  8:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

On Tue, Nov 30, 2021 at 12:44PM +0100, Marco Elver wrote:
[...]
> v3:
> * Remove kcsan_noinstr hackery, since we now try to avoid adding any
>   instrumentation to .noinstr.text in the first place.
[...]

I missed some cleanups after changes from v2 to v3 -- the below cleanup
is missing.

Full replacement patch attached.

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 2254cb75cbb0..916060913966 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -12,7 +12,6 @@
 #include <linux/delay.h>
 #include <linux/export.h>
 #include <linux/init.h>
-#include <linux/instrumentation.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/moduleparam.h>
@@ -21,8 +20,6 @@
 #include <linux/sched.h>
 #include <linux/uaccess.h>
 
-#include <asm/sections.h>
-
 #include "encoding.h"
 #include "kcsan.h"
 #include "permissive.h"
@@ -1086,9 +1083,7 @@ noinline void __tsan_func_entry(void *call_pc)
 	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
 		return;
 
-	instrumentation_begin();
 	add_kcsan_stack_depth(1);
-	instrumentation_end();
 }
 EXPORT_SYMBOL(__tsan_func_entry);
 
@@ -1100,7 +1095,6 @@ noinline void __tsan_func_exit(void)
 	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
 		return;
 
-	instrumentation_begin();
 	reorder_access = get_reorder_access(get_ctx());
 	if (!reorder_access)
 		goto out;
@@ -1120,7 +1114,6 @@ noinline void __tsan_func_exit(void)
 	}
 out:
 	add_kcsan_stack_depth(-1);
-	instrumentation_end();
 }
 EXPORT_SYMBOL(__tsan_func_exit);
 

[-- Attachment #2: v4-0004-kcsan-Add-core-support-for-a-subset-of-weak-memor.patch --]
[-- Type: text/x-diff, Size: 17002 bytes --]

From 7ac337afb7bec3cc5c5bd5e4155b08bdb554bc7d Mon Sep 17 00:00:00 2001
From: Marco Elver <elver@google.com>
Date: Thu, 5 Aug 2021 14:57:45 +0200
Subject: [PATCH v4 04/25] kcsan: Add core support for a subset of weak memory
 modeling

Add support for modeling a subset of weak memory, which will enable
detection of a subset of data races due to missing memory barriers.

KCSAN's approach to detecting missing memory barriers is based on
modeling access reordering, and enabled if `CONFIG_KCSAN_WEAK_MEMORY=y`,
which depends on `CONFIG_KCSAN_STRICT=y`. The feature can be enabled or
disabled at boot and runtime via the `kcsan.weak_memory` boot parameter.

Each memory access for which a watchpoint is set up, is also selected
for simulated reordering within the scope of its function (at most 1
in-flight access).

We are limited to modeling the effects of "buffering" (delaying the
access), since the runtime cannot "prefetch" accesses (therefore no
acquire modeling). Once an access has been selected for reordering, it
is checked along every other access until the end of the function scope.
If an appropriate memory barrier is encountered, the access will no
longer be considered for reordering.

When the result of a memory operation should be ordered by a barrier,
KCSAN can then detect data races where the conflict only occurs as a
result of a missing barrier due to reordering accesses.

Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
---
v4:
* Remove redundant instrumentation_begin/end() now that kcsan_noinstr no
  longer exists.

v3:
* Remove kcsan_noinstr hackery, since we now try to avoid adding any
  instrumentation to .noinstr.text in the first place.
* Restrict config WEAK_MEMORY to only be enabled with tooling where
  we actually remove instrumentation from noinstr.
* Don't define kcsan_weak_memory bool if !KCSAN_WEAK_MEMORY.

v2:
* Define kcsan_noinstr as noinline if we rely on objtool nop'ing out
  calls, to avoid things like LTO inlining it.
---
 include/linux/kcsan-checks.h |  10 +-
 include/linux/kcsan.h        |  10 +-
 include/linux/sched.h        |   3 +
 kernel/kcsan/core.c          | 202 ++++++++++++++++++++++++++++++++---
 lib/Kconfig.kcsan            |  20 ++++
 scripts/Makefile.kcsan       |   9 +-
 6 files changed, 235 insertions(+), 19 deletions(-)

diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
index 5f5965246877..a1c6a89fde71 100644
--- a/include/linux/kcsan-checks.h
+++ b/include/linux/kcsan-checks.h
@@ -99,7 +99,15 @@ void kcsan_set_access_mask(unsigned long mask);
 
 /* Scoped access information. */
 struct kcsan_scoped_access {
-	struct list_head list;
+	union {
+		struct list_head list; /* scoped_accesses list */
+		/*
+		 * Not an entry in scoped_accesses list; stack depth from where
+		 * the access was initialized.
+		 */
+		int stack_depth;
+	};
+
 	/* Access information. */
 	const volatile void *ptr;
 	size_t size;
diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
index 13cef3458fed..c07c71f5ba4f 100644
--- a/include/linux/kcsan.h
+++ b/include/linux/kcsan.h
@@ -49,8 +49,16 @@ struct kcsan_ctx {
 	 */
 	unsigned long access_mask;
 
-	/* List of scoped accesses. */
+	/* List of scoped accesses; likely to be empty. */
 	struct list_head scoped_accesses;
+
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	/*
+	 * Scoped access for modeling access reordering to detect missing memory
+	 * barriers; only keep 1 to keep fast-path complexity manageable.
+	 */
+	struct kcsan_scoped_access reorder_access;
+#endif
 };
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..0cd40b010487 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1339,6 +1339,9 @@ struct task_struct {
 #ifdef CONFIG_TRACE_IRQFLAGS
 	struct irqtrace_events		kcsan_save_irqtrace;
 #endif
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	int				kcsan_stack_depth;
+#endif
 #endif
 
 #if IS_ENABLED(CONFIG_KUNIT)
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index bd359f8ee63a..481f8a524089 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -40,6 +40,13 @@ module_param_named(udelay_interrupt, kcsan_udelay_interrupt, uint, 0644);
 module_param_named(skip_watch, kcsan_skip_watch, long, 0644);
 module_param_named(interrupt_watcher, kcsan_interrupt_watcher, bool, 0444);
 
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+static bool kcsan_weak_memory = true;
+module_param_named(weak_memory, kcsan_weak_memory, bool, 0644);
+#else
+#define kcsan_weak_memory false
+#endif
+
 bool kcsan_enabled;
 
 /* Per-CPU kcsan_ctx for interrupts */
@@ -351,6 +358,67 @@ void kcsan_restore_irqtrace(struct task_struct *task)
 #endif
 }
 
+static __always_inline int get_kcsan_stack_depth(void)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	return current->kcsan_stack_depth;
+#else
+	BUILD_BUG();
+	return 0;
+#endif
+}
+
+static __always_inline void add_kcsan_stack_depth(int val)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	current->kcsan_stack_depth += val;
+#else
+	BUILD_BUG();
+#endif
+}
+
+static __always_inline struct kcsan_scoped_access *get_reorder_access(struct kcsan_ctx *ctx)
+{
+#ifdef CONFIG_KCSAN_WEAK_MEMORY
+	return ctx->disable_scoped ? NULL : &ctx->reorder_access;
+#else
+	return NULL;
+#endif
+}
+
+static __always_inline bool
+find_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
+		    int type, unsigned long ip)
+{
+	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+	if (!reorder_access)
+		return false;
+
+	/*
+	 * Note: If accesses are repeated while reorder_access is identical,
+	 * never matches the new access, because !(type & KCSAN_ACCESS_SCOPED).
+	 */
+	return reorder_access->ptr == ptr && reorder_access->size == size &&
+	       reorder_access->type == type && reorder_access->ip == ip;
+}
+
+static inline void
+set_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
+		   int type, unsigned long ip)
+{
+	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+	if (!reorder_access || !kcsan_weak_memory)
+		return;
+
+	reorder_access->ptr		= ptr;
+	reorder_access->size		= size;
+	reorder_access->type		= type | KCSAN_ACCESS_SCOPED;
+	reorder_access->ip		= ip;
+	reorder_access->stack_depth	= get_kcsan_stack_depth();
+}
+
 /*
  * Pull everything together: check_access() below contains the performance
  * critical operations; the fast-path (including check_access) functions should
@@ -389,8 +457,10 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
 	 * The access_mask check relies on value-change comparison. To avoid
 	 * reporting a race where e.g. the writer set up the watchpoint, but the
 	 * reader has access_mask!=0, we have to ignore the found watchpoint.
+	 *
+	 * reorder_access is never created from an access with access_mask set.
 	 */
-	if (ctx->access_mask)
+	if (ctx->access_mask && !find_reorder_access(ctx, ptr, size, type, ip))
 		return;
 
 	/*
@@ -440,11 +510,13 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	const bool is_assert = (type & KCSAN_ACCESS_ASSERT) != 0;
 	atomic_long_t *watchpoint;
 	u64 old, new, diff;
-	unsigned long access_mask;
 	enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
+	bool interrupt_watcher = kcsan_interrupt_watcher;
 	unsigned long ua_flags = user_access_save();
 	struct kcsan_ctx *ctx = get_ctx();
+	unsigned long access_mask = ctx->access_mask;
 	unsigned long irq_flags = 0;
+	bool is_reorder_access;
 
 	/*
 	 * Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -467,6 +539,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 		goto out;
 	}
 
+	/*
+	 * The local CPU cannot observe reordering of its own accesses, and
+	 * therefore we need to take care of 2 cases to avoid false positives:
+	 *
+	 *	1. Races of the reordered access with interrupts. To avoid, if
+	 *	   the current access is reorder_access, disable interrupts.
+	 *	2. Avoid races of scoped accesses from nested interrupts (below).
+	 */
+	is_reorder_access = find_reorder_access(ctx, ptr, size, type, ip);
+	if (is_reorder_access)
+		interrupt_watcher = false;
 	/*
 	 * Avoid races of scoped accesses from nested interrupts (or scheduler).
 	 * Assume setting up a watchpoint for a non-scoped (normal) access that
@@ -482,7 +565,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * information is lost if dirtied by KCSAN.
 	 */
 	kcsan_save_irqtrace(current);
-	if (!kcsan_interrupt_watcher)
+	if (!interrupt_watcher)
 		local_irq_save(irq_flags);
 
 	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
@@ -503,7 +586,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * Read the current value, to later check and infer a race if the data
 	 * was modified via a non-instrumented access, e.g. from a device.
 	 */
-	old = read_instrumented_memory(ptr, size);
+	old = is_reorder_access ? 0 : read_instrumented_memory(ptr, size);
 
 	/*
 	 * Delay this thread, to increase probability of observing a racy
@@ -515,8 +598,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 * Re-read value, and check if it is as expected; if not, we infer a
 	 * racy access.
 	 */
-	access_mask = ctx->access_mask;
-	new = read_instrumented_memory(ptr, size);
+	if (!is_reorder_access) {
+		new = read_instrumented_memory(ptr, size);
+	} else {
+		/*
+		 * Reordered accesses cannot be used for value change detection,
+		 * because the memory location may no longer be accessible and
+		 * could result in a fault.
+		 */
+		new = 0;
+		access_mask = 0;
+	}
 
 	diff = old ^ new;
 	if (access_mask)
@@ -585,11 +677,20 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 	 */
 	remove_watchpoint(watchpoint);
 	atomic_long_dec(&kcsan_counters[KCSAN_COUNTER_USED_WATCHPOINTS]);
+
 out_unlock:
-	if (!kcsan_interrupt_watcher)
+	if (!interrupt_watcher)
 		local_irq_restore(irq_flags);
 	kcsan_restore_irqtrace(current);
 	ctx->disable_scoped--;
+
+	/*
+	 * Reordered accesses cannot be used for value change detection,
+	 * therefore never consider for reordering if access_mask is set.
+	 * ASSERT_EXCLUSIVE are not real accesses, ignore them as well.
+	 */
+	if (!access_mask && !is_assert)
+		set_reorder_access(ctx, ptr, size, type, ip);
 out:
 	user_access_restore(ua_flags);
 }
@@ -597,7 +698,6 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
 static __always_inline void
 check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 {
-	const bool is_write = (type & KCSAN_ACCESS_WRITE) != 0;
 	atomic_long_t *watchpoint;
 	long encoded_watchpoint;
 
@@ -608,12 +708,14 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 	if (unlikely(size == 0))
 		return;
 
+again:
 	/*
 	 * Avoid user_access_save in fast-path: find_watchpoint is safe without
 	 * user_access_save, as the address that ptr points to is only used to
 	 * check if a watchpoint exists; ptr is never dereferenced.
 	 */
-	watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
+	watchpoint = find_watchpoint((unsigned long)ptr, size,
+				     !(type & KCSAN_ACCESS_WRITE),
 				     &encoded_watchpoint);
 	/*
 	 * It is safe to check kcsan_is_enabled() after find_watchpoint in the
@@ -627,9 +729,42 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
 	else {
 		struct kcsan_ctx *ctx = get_ctx(); /* Call only once in fast-path. */
 
-		if (unlikely(should_watch(ctx, ptr, size, type)))
+		if (unlikely(should_watch(ctx, ptr, size, type))) {
 			kcsan_setup_watchpoint(ptr, size, type, ip);
-		else if (unlikely(ctx->scoped_accesses.prev))
+			return;
+		}
+
+		if (!(type & KCSAN_ACCESS_SCOPED)) {
+			struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
+
+			if (reorder_access) {
+				/*
+				 * reorder_access check: simulates reordering of
+				 * the access after subsequent operations.
+				 */
+				ptr = reorder_access->ptr;
+				type = reorder_access->type;
+				ip = reorder_access->ip;
+				/*
+				 * Upon a nested interrupt, this context's
+				 * reorder_access can be modified (shared ctx).
+				 * We know that upon return, reorder_access is
+				 * always invalidated by setting size to 0 via
+				 * __tsan_func_exit(). Therefore we must read
+				 * and check size after the other fields.
+				 */
+				barrier();
+				size = READ_ONCE(reorder_access->size);
+				if (size)
+					goto again;
+			}
+		}
+
+		/*
+		 * Always checked last, right before returning from runtime;
+		 * if reorder_access is valid, checked after it was checked.
+		 */
+		if (unlikely(ctx->scoped_accesses.prev))
 			kcsan_check_scoped_accesses();
 	}
 }
@@ -916,19 +1051,56 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(8);
 DEFINE_TSAN_VOLATILE_READ_WRITE(16);
 
 /*
- * The below are not required by KCSAN, but can still be emitted by the
- * compiler.
+ * Function entry and exit are used to determine the validty of reorder_access.
+ * Reordering of the access ends at the end of the function scope where the
+ * access happened. This is done for two reasons:
+ *
+ *	1. Artificially limits the scope where missing barriers are detected.
+ *	   This minimizes false positives due to uninstrumented functions that
+ *	   contain the required barriers but were missed.
+ *
+ *	2. Simplifies generating the stack trace of the access.
  */
 void __tsan_func_entry(void *call_pc);
-void __tsan_func_entry(void *call_pc)
+noinline void __tsan_func_entry(void *call_pc)
 {
+	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		return;
+
+	add_kcsan_stack_depth(1);
 }
 EXPORT_SYMBOL(__tsan_func_entry);
+
 void __tsan_func_exit(void);
-void __tsan_func_exit(void)
+noinline void __tsan_func_exit(void)
 {
+	struct kcsan_scoped_access *reorder_access;
+
+	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
+		return;
+
+	reorder_access = get_reorder_access(get_ctx());
+	if (!reorder_access)
+		goto out;
+
+	if (get_kcsan_stack_depth() <= reorder_access->stack_depth) {
+		/*
+		 * Access check to catch cases where write without a barrier
+		 * (supposed release) was last access in function: because
+		 * instrumentation is inserted before the real access, a data
+		 * race due to the write giving up a c-s would only be caught if
+		 * we do the conflicting access after.
+		 */
+		check_access(reorder_access->ptr, reorder_access->size,
+			     reorder_access->type, reorder_access->ip);
+		reorder_access->size = 0;
+		reorder_access->stack_depth = INT_MIN;
+	}
+out:
+	add_kcsan_stack_depth(-1);
 }
 EXPORT_SYMBOL(__tsan_func_exit);
+
 void __tsan_init(void);
 void __tsan_init(void)
 {
diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
index e0a93ffdef30..e4394ea8068b 100644
--- a/lib/Kconfig.kcsan
+++ b/lib/Kconfig.kcsan
@@ -191,6 +191,26 @@ config KCSAN_STRICT
 	  closely aligns with the rules defined by the Linux-kernel memory
 	  consistency model (LKMM).
 
+config KCSAN_WEAK_MEMORY
+	bool "Enable weak memory modeling to detect missing memory barriers"
+	default y
+	depends on KCSAN_STRICT
+	# We can either let objtool nop __tsan_func_{entry,exit}() and builtin
+	# atomics instrumentation in .noinstr.text, or use a compiler that can
+	# implement __no_kcsan to really remove all instrumentation.
+	depends on STACK_VALIDATION || CC_IS_GCC
+	help
+	  Enable support for modeling a subset of weak memory, which allows
+	  detecting a subset of data races due to missing memory barriers.
+
+	  Depends on KCSAN_STRICT, because the options strenghtening certain
+	  plain accesses by default (depending on !KCSAN_STRICT) reduce the
+	  ability to detect any data races invoving reordered accesses, in
+	  particular reordered writes.
+
+	  Weak memory modeling relies on additional instrumentation and may
+	  affect performance.
+
 config KCSAN_REPORT_VALUE_CHANGE_ONLY
 	bool "Only report races where watcher observed a data value change"
 	default y
diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
index 37cb504c77e1..4c7f0d282e42 100644
--- a/scripts/Makefile.kcsan
+++ b/scripts/Makefile.kcsan
@@ -9,7 +9,12 @@ endif
 
 # Keep most options here optional, to allow enabling more compilers if absence
 # of some options does not break KCSAN nor causes false positive reports.
-export CFLAGS_KCSAN := -fsanitize=thread \
-	$(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0) -fno-optimize-sibling-calls) \
+kcsan-cflags := -fsanitize=thread -fno-optimize-sibling-calls \
 	$(call cc-option,$(call cc-param,tsan-compound-read-before-write=1),$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1))) \
 	$(call cc-param,tsan-distinguish-volatile=1)
+
+ifndef CONFIG_KCSAN_WEAK_MEMORY
+kcsan-cflags += $(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0))
+endif
+
+export CFLAGS_KCSAN := $(kcsan-cflags)
-- 
2.34.0.384.gca35af8252-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03  8:56   ` Marco Elver
@ 2021-12-03 16:50     ` Paul E. McKenney
  2021-12-03 21:08       ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-12-03 16:50 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Fri, Dec 03, 2021 at 09:56:45AM +0100, Marco Elver wrote:
> On Tue, Nov 30, 2021 at 12:44PM +0100, Marco Elver wrote:
> [...]
> > v3:
> > * Remove kcsan_noinstr hackery, since we now try to avoid adding any
> >   instrumentation to .noinstr.text in the first place.
> [...]
> 
> I missed some cleanups after changes from v2 to v3 -- the below cleanup
> is missing.
> 
> Full replacement patch attached.

I pulled this into -rcu with the other patches from your v3 post, thank
you all!

							Thanx, Paul

> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> index 2254cb75cbb0..916060913966 100644
> --- a/kernel/kcsan/core.c
> +++ b/kernel/kcsan/core.c
> @@ -12,7 +12,6 @@
>  #include <linux/delay.h>
>  #include <linux/export.h>
>  #include <linux/init.h>
> -#include <linux/instrumentation.h>
>  #include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <linux/moduleparam.h>
> @@ -21,8 +20,6 @@
>  #include <linux/sched.h>
>  #include <linux/uaccess.h>
>  
> -#include <asm/sections.h>
> -
>  #include "encoding.h"
>  #include "kcsan.h"
>  #include "permissive.h"
> @@ -1086,9 +1083,7 @@ noinline void __tsan_func_entry(void *call_pc)
>  	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
>  		return;
>  
> -	instrumentation_begin();
>  	add_kcsan_stack_depth(1);
> -	instrumentation_end();
>  }
>  EXPORT_SYMBOL(__tsan_func_entry);
>  
> @@ -1100,7 +1095,6 @@ noinline void __tsan_func_exit(void)
>  	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
>  		return;
>  
> -	instrumentation_begin();
>  	reorder_access = get_reorder_access(get_ctx());
>  	if (!reorder_access)
>  		goto out;
> @@ -1120,7 +1114,6 @@ noinline void __tsan_func_exit(void)
>  	}
>  out:
>  	add_kcsan_stack_depth(-1);
> -	instrumentation_end();
>  }
>  EXPORT_SYMBOL(__tsan_func_exit);
>  

> >From 7ac337afb7bec3cc5c5bd5e4155b08bdb554bc7d Mon Sep 17 00:00:00 2001
> From: Marco Elver <elver@google.com>
> Date: Thu, 5 Aug 2021 14:57:45 +0200
> Subject: [PATCH v4 04/25] kcsan: Add core support for a subset of weak memory
>  modeling
> 
> Add support for modeling a subset of weak memory, which will enable
> detection of a subset of data races due to missing memory barriers.
> 
> KCSAN's approach to detecting missing memory barriers is based on
> modeling access reordering, and enabled if `CONFIG_KCSAN_WEAK_MEMORY=y`,
> which depends on `CONFIG_KCSAN_STRICT=y`. The feature can be enabled or
> disabled at boot and runtime via the `kcsan.weak_memory` boot parameter.
> 
> Each memory access for which a watchpoint is set up, is also selected
> for simulated reordering within the scope of its function (at most 1
> in-flight access).
> 
> We are limited to modeling the effects of "buffering" (delaying the
> access), since the runtime cannot "prefetch" accesses (therefore no
> acquire modeling). Once an access has been selected for reordering, it
> is checked along every other access until the end of the function scope.
> If an appropriate memory barrier is encountered, the access will no
> longer be considered for reordering.
> 
> When the result of a memory operation should be ordered by a barrier,
> KCSAN can then detect data races where the conflict only occurs as a
> result of a missing barrier due to reordering accesses.
> 
> Suggested-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v4:
> * Remove redundant instrumentation_begin/end() now that kcsan_noinstr no
>   longer exists.
> 
> v3:
> * Remove kcsan_noinstr hackery, since we now try to avoid adding any
>   instrumentation to .noinstr.text in the first place.
> * Restrict config WEAK_MEMORY to only be enabled with tooling where
>   we actually remove instrumentation from noinstr.
> * Don't define kcsan_weak_memory bool if !KCSAN_WEAK_MEMORY.
> 
> v2:
> * Define kcsan_noinstr as noinline if we rely on objtool nop'ing out
>   calls, to avoid things like LTO inlining it.
> ---
>  include/linux/kcsan-checks.h |  10 +-
>  include/linux/kcsan.h        |  10 +-
>  include/linux/sched.h        |   3 +
>  kernel/kcsan/core.c          | 202 ++++++++++++++++++++++++++++++++---
>  lib/Kconfig.kcsan            |  20 ++++
>  scripts/Makefile.kcsan       |   9 +-
>  6 files changed, 235 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> index 5f5965246877..a1c6a89fde71 100644
> --- a/include/linux/kcsan-checks.h
> +++ b/include/linux/kcsan-checks.h
> @@ -99,7 +99,15 @@ void kcsan_set_access_mask(unsigned long mask);
>  
>  /* Scoped access information. */
>  struct kcsan_scoped_access {
> -	struct list_head list;
> +	union {
> +		struct list_head list; /* scoped_accesses list */
> +		/*
> +		 * Not an entry in scoped_accesses list; stack depth from where
> +		 * the access was initialized.
> +		 */
> +		int stack_depth;
> +	};
> +
>  	/* Access information. */
>  	const volatile void *ptr;
>  	size_t size;
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> index 13cef3458fed..c07c71f5ba4f 100644
> --- a/include/linux/kcsan.h
> +++ b/include/linux/kcsan.h
> @@ -49,8 +49,16 @@ struct kcsan_ctx {
>  	 */
>  	unsigned long access_mask;
>  
> -	/* List of scoped accesses. */
> +	/* List of scoped accesses; likely to be empty. */
>  	struct list_head scoped_accesses;
> +
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +	/*
> +	 * Scoped access for modeling access reordering to detect missing memory
> +	 * barriers; only keep 1 to keep fast-path complexity manageable.
> +	 */
> +	struct kcsan_scoped_access reorder_access;
> +#endif
>  };
>  
>  /**
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 78c351e35fec..0cd40b010487 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1339,6 +1339,9 @@ struct task_struct {
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	struct irqtrace_events		kcsan_save_irqtrace;
>  #endif
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +	int				kcsan_stack_depth;
> +#endif
>  #endif
>  
>  #if IS_ENABLED(CONFIG_KUNIT)
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> index bd359f8ee63a..481f8a524089 100644
> --- a/kernel/kcsan/core.c
> +++ b/kernel/kcsan/core.c
> @@ -40,6 +40,13 @@ module_param_named(udelay_interrupt, kcsan_udelay_interrupt, uint, 0644);
>  module_param_named(skip_watch, kcsan_skip_watch, long, 0644);
>  module_param_named(interrupt_watcher, kcsan_interrupt_watcher, bool, 0444);
>  
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +static bool kcsan_weak_memory = true;
> +module_param_named(weak_memory, kcsan_weak_memory, bool, 0644);
> +#else
> +#define kcsan_weak_memory false
> +#endif
> +
>  bool kcsan_enabled;
>  
>  /* Per-CPU kcsan_ctx for interrupts */
> @@ -351,6 +358,67 @@ void kcsan_restore_irqtrace(struct task_struct *task)
>  #endif
>  }
>  
> +static __always_inline int get_kcsan_stack_depth(void)
> +{
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +	return current->kcsan_stack_depth;
> +#else
> +	BUILD_BUG();
> +	return 0;
> +#endif
> +}
> +
> +static __always_inline void add_kcsan_stack_depth(int val)
> +{
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +	current->kcsan_stack_depth += val;
> +#else
> +	BUILD_BUG();
> +#endif
> +}
> +
> +static __always_inline struct kcsan_scoped_access *get_reorder_access(struct kcsan_ctx *ctx)
> +{
> +#ifdef CONFIG_KCSAN_WEAK_MEMORY
> +	return ctx->disable_scoped ? NULL : &ctx->reorder_access;
> +#else
> +	return NULL;
> +#endif
> +}
> +
> +static __always_inline bool
> +find_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
> +		    int type, unsigned long ip)
> +{
> +	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
> +
> +	if (!reorder_access)
> +		return false;
> +
> +	/*
> +	 * Note: If accesses are repeated while reorder_access is identical,
> +	 * never matches the new access, because !(type & KCSAN_ACCESS_SCOPED).
> +	 */
> +	return reorder_access->ptr == ptr && reorder_access->size == size &&
> +	       reorder_access->type == type && reorder_access->ip == ip;
> +}
> +
> +static inline void
> +set_reorder_access(struct kcsan_ctx *ctx, const volatile void *ptr, size_t size,
> +		   int type, unsigned long ip)
> +{
> +	struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
> +
> +	if (!reorder_access || !kcsan_weak_memory)
> +		return;
> +
> +	reorder_access->ptr		= ptr;
> +	reorder_access->size		= size;
> +	reorder_access->type		= type | KCSAN_ACCESS_SCOPED;
> +	reorder_access->ip		= ip;
> +	reorder_access->stack_depth	= get_kcsan_stack_depth();
> +}
> +
>  /*
>   * Pull everything together: check_access() below contains the performance
>   * critical operations; the fast-path (including check_access) functions should
> @@ -389,8 +457,10 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
>  	 * The access_mask check relies on value-change comparison. To avoid
>  	 * reporting a race where e.g. the writer set up the watchpoint, but the
>  	 * reader has access_mask!=0, we have to ignore the found watchpoint.
> +	 *
> +	 * reorder_access is never created from an access with access_mask set.
>  	 */
> -	if (ctx->access_mask)
> +	if (ctx->access_mask && !find_reorder_access(ctx, ptr, size, type, ip))
>  		return;
>  
>  	/*
> @@ -440,11 +510,13 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  	const bool is_assert = (type & KCSAN_ACCESS_ASSERT) != 0;
>  	atomic_long_t *watchpoint;
>  	u64 old, new, diff;
> -	unsigned long access_mask;
>  	enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
> +	bool interrupt_watcher = kcsan_interrupt_watcher;
>  	unsigned long ua_flags = user_access_save();
>  	struct kcsan_ctx *ctx = get_ctx();
> +	unsigned long access_mask = ctx->access_mask;
>  	unsigned long irq_flags = 0;
> +	bool is_reorder_access;
>  
>  	/*
>  	 * Always reset kcsan_skip counter in slow-path to avoid underflow; see
> @@ -467,6 +539,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  		goto out;
>  	}
>  
> +	/*
> +	 * The local CPU cannot observe reordering of its own accesses, and
> +	 * therefore we need to take care of 2 cases to avoid false positives:
> +	 *
> +	 *	1. Races of the reordered access with interrupts. To avoid, if
> +	 *	   the current access is reorder_access, disable interrupts.
> +	 *	2. Avoid races of scoped accesses from nested interrupts (below).
> +	 */
> +	is_reorder_access = find_reorder_access(ctx, ptr, size, type, ip);
> +	if (is_reorder_access)
> +		interrupt_watcher = false;
>  	/*
>  	 * Avoid races of scoped accesses from nested interrupts (or scheduler).
>  	 * Assume setting up a watchpoint for a non-scoped (normal) access that
> @@ -482,7 +565,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  	 * information is lost if dirtied by KCSAN.
>  	 */
>  	kcsan_save_irqtrace(current);
> -	if (!kcsan_interrupt_watcher)
> +	if (!interrupt_watcher)
>  		local_irq_save(irq_flags);
>  
>  	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> @@ -503,7 +586,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  	 * Read the current value, to later check and infer a race if the data
>  	 * was modified via a non-instrumented access, e.g. from a device.
>  	 */
> -	old = read_instrumented_memory(ptr, size);
> +	old = is_reorder_access ? 0 : read_instrumented_memory(ptr, size);
>  
>  	/*
>  	 * Delay this thread, to increase probability of observing a racy
> @@ -515,8 +598,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  	 * Re-read value, and check if it is as expected; if not, we infer a
>  	 * racy access.
>  	 */
> -	access_mask = ctx->access_mask;
> -	new = read_instrumented_memory(ptr, size);
> +	if (!is_reorder_access) {
> +		new = read_instrumented_memory(ptr, size);
> +	} else {
> +		/*
> +		 * Reordered accesses cannot be used for value change detection,
> +		 * because the memory location may no longer be accessible and
> +		 * could result in a fault.
> +		 */
> +		new = 0;
> +		access_mask = 0;
> +	}
>  
>  	diff = old ^ new;
>  	if (access_mask)
> @@ -585,11 +677,20 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  	 */
>  	remove_watchpoint(watchpoint);
>  	atomic_long_dec(&kcsan_counters[KCSAN_COUNTER_USED_WATCHPOINTS]);
> +
>  out_unlock:
> -	if (!kcsan_interrupt_watcher)
> +	if (!interrupt_watcher)
>  		local_irq_restore(irq_flags);
>  	kcsan_restore_irqtrace(current);
>  	ctx->disable_scoped--;
> +
> +	/*
> +	 * Reordered accesses cannot be used for value change detection,
> +	 * therefore never consider for reordering if access_mask is set.
> +	 * ASSERT_EXCLUSIVE are not real accesses, ignore them as well.
> +	 */
> +	if (!access_mask && !is_assert)
> +		set_reorder_access(ctx, ptr, size, type, ip);
>  out:
>  	user_access_restore(ua_flags);
>  }
> @@ -597,7 +698,6 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type, unsigned
>  static __always_inline void
>  check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
>  {
> -	const bool is_write = (type & KCSAN_ACCESS_WRITE) != 0;
>  	atomic_long_t *watchpoint;
>  	long encoded_watchpoint;
>  
> @@ -608,12 +708,14 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
>  	if (unlikely(size == 0))
>  		return;
>  
> +again:
>  	/*
>  	 * Avoid user_access_save in fast-path: find_watchpoint is safe without
>  	 * user_access_save, as the address that ptr points to is only used to
>  	 * check if a watchpoint exists; ptr is never dereferenced.
>  	 */
> -	watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +	watchpoint = find_watchpoint((unsigned long)ptr, size,
> +				     !(type & KCSAN_ACCESS_WRITE),
>  				     &encoded_watchpoint);
>  	/*
>  	 * It is safe to check kcsan_is_enabled() after find_watchpoint in the
> @@ -627,9 +729,42 @@ check_access(const volatile void *ptr, size_t size, int type, unsigned long ip)
>  	else {
>  		struct kcsan_ctx *ctx = get_ctx(); /* Call only once in fast-path. */
>  
> -		if (unlikely(should_watch(ctx, ptr, size, type)))
> +		if (unlikely(should_watch(ctx, ptr, size, type))) {
>  			kcsan_setup_watchpoint(ptr, size, type, ip);
> -		else if (unlikely(ctx->scoped_accesses.prev))
> +			return;
> +		}
> +
> +		if (!(type & KCSAN_ACCESS_SCOPED)) {
> +			struct kcsan_scoped_access *reorder_access = get_reorder_access(ctx);
> +
> +			if (reorder_access) {
> +				/*
> +				 * reorder_access check: simulates reordering of
> +				 * the access after subsequent operations.
> +				 */
> +				ptr = reorder_access->ptr;
> +				type = reorder_access->type;
> +				ip = reorder_access->ip;
> +				/*
> +				 * Upon a nested interrupt, this context's
> +				 * reorder_access can be modified (shared ctx).
> +				 * We know that upon return, reorder_access is
> +				 * always invalidated by setting size to 0 via
> +				 * __tsan_func_exit(). Therefore we must read
> +				 * and check size after the other fields.
> +				 */
> +				barrier();
> +				size = READ_ONCE(reorder_access->size);
> +				if (size)
> +					goto again;
> +			}
> +		}
> +
> +		/*
> +		 * Always checked last, right before returning from runtime;
> +		 * if reorder_access is valid, checked after it was checked.
> +		 */
> +		if (unlikely(ctx->scoped_accesses.prev))
>  			kcsan_check_scoped_accesses();
>  	}
>  }
> @@ -916,19 +1051,56 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(8);
>  DEFINE_TSAN_VOLATILE_READ_WRITE(16);
>  
>  /*
> - * The below are not required by KCSAN, but can still be emitted by the
> - * compiler.
> + * Function entry and exit are used to determine the validty of reorder_access.
> + * Reordering of the access ends at the end of the function scope where the
> + * access happened. This is done for two reasons:
> + *
> + *	1. Artificially limits the scope where missing barriers are detected.
> + *	   This minimizes false positives due to uninstrumented functions that
> + *	   contain the required barriers but were missed.
> + *
> + *	2. Simplifies generating the stack trace of the access.
>   */
>  void __tsan_func_entry(void *call_pc);
> -void __tsan_func_entry(void *call_pc)
> +noinline void __tsan_func_entry(void *call_pc)
>  {
> +	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
> +		return;
> +
> +	add_kcsan_stack_depth(1);
>  }
>  EXPORT_SYMBOL(__tsan_func_entry);
> +
>  void __tsan_func_exit(void);
> -void __tsan_func_exit(void)
> +noinline void __tsan_func_exit(void)
>  {
> +	struct kcsan_scoped_access *reorder_access;
> +
> +	if (!IS_ENABLED(CONFIG_KCSAN_WEAK_MEMORY))
> +		return;
> +
> +	reorder_access = get_reorder_access(get_ctx());
> +	if (!reorder_access)
> +		goto out;
> +
> +	if (get_kcsan_stack_depth() <= reorder_access->stack_depth) {
> +		/*
> +		 * Access check to catch cases where write without a barrier
> +		 * (supposed release) was last access in function: because
> +		 * instrumentation is inserted before the real access, a data
> +		 * race due to the write giving up a c-s would only be caught if
> +		 * we do the conflicting access after.
> +		 */
> +		check_access(reorder_access->ptr, reorder_access->size,
> +			     reorder_access->type, reorder_access->ip);
> +		reorder_access->size = 0;
> +		reorder_access->stack_depth = INT_MIN;
> +	}
> +out:
> +	add_kcsan_stack_depth(-1);
>  }
>  EXPORT_SYMBOL(__tsan_func_exit);
> +
>  void __tsan_init(void);
>  void __tsan_init(void)
>  {
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> index e0a93ffdef30..e4394ea8068b 100644
> --- a/lib/Kconfig.kcsan
> +++ b/lib/Kconfig.kcsan
> @@ -191,6 +191,26 @@ config KCSAN_STRICT
>  	  closely aligns with the rules defined by the Linux-kernel memory
>  	  consistency model (LKMM).
>  
> +config KCSAN_WEAK_MEMORY
> +	bool "Enable weak memory modeling to detect missing memory barriers"
> +	default y
> +	depends on KCSAN_STRICT
> +	# We can either let objtool nop __tsan_func_{entry,exit}() and builtin
> +	# atomics instrumentation in .noinstr.text, or use a compiler that can
> +	# implement __no_kcsan to really remove all instrumentation.
> +	depends on STACK_VALIDATION || CC_IS_GCC
> +	help
> +	  Enable support for modeling a subset of weak memory, which allows
> +	  detecting a subset of data races due to missing memory barriers.
> +
> +	  Depends on KCSAN_STRICT, because the options strenghtening certain
> +	  plain accesses by default (depending on !KCSAN_STRICT) reduce the
> +	  ability to detect any data races invoving reordered accesses, in
> +	  particular reordered writes.
> +
> +	  Weak memory modeling relies on additional instrumentation and may
> +	  affect performance.
> +
>  config KCSAN_REPORT_VALUE_CHANGE_ONLY
>  	bool "Only report races where watcher observed a data value change"
>  	default y
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> index 37cb504c77e1..4c7f0d282e42 100644
> --- a/scripts/Makefile.kcsan
> +++ b/scripts/Makefile.kcsan
> @@ -9,7 +9,12 @@ endif
>  
>  # Keep most options here optional, to allow enabling more compilers if absence
>  # of some options does not break KCSAN nor causes false positive reports.
> -export CFLAGS_KCSAN := -fsanitize=thread \
> -	$(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0) -fno-optimize-sibling-calls) \
> +kcsan-cflags := -fsanitize=thread -fno-optimize-sibling-calls \
>  	$(call cc-option,$(call cc-param,tsan-compound-read-before-write=1),$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1))) \
>  	$(call cc-param,tsan-distinguish-volatile=1)
> +
> +ifndef CONFIG_KCSAN_WEAK_MEMORY
> +kcsan-cflags += $(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0))
> +endif
> +
> +export CFLAGS_KCSAN := $(kcsan-cflags)
> -- 
> 2.34.0.384.gca35af8252-goog
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03 16:50     ` Paul E. McKenney
@ 2021-12-03 21:08       ` Paul E. McKenney
  2021-12-03 23:42         ` Marco Elver
  2021-12-03 23:42         ` Paul E. McKenney
  0 siblings, 2 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-12-03 21:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Fri, Dec 03, 2021 at 08:50:20AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 03, 2021 at 09:56:45AM +0100, Marco Elver wrote:
> > On Tue, Nov 30, 2021 at 12:44PM +0100, Marco Elver wrote:
> > [...]
> > > v3:
> > > * Remove kcsan_noinstr hackery, since we now try to avoid adding any
> > >   instrumentation to .noinstr.text in the first place.
> > [...]
> > 
> > I missed some cleanups after changes from v2 to v3 -- the below cleanup
> > is missing.
> > 
> > Full replacement patch attached.
> 
> I pulled this into -rcu with the other patches from your v3 post, thank
> you all!

A few quick tests located the following:

[    0.635383] INFO: trying to register non-static key.
[    0.635804] The code is fine but needs lockdep annotation, or maybe
[    0.636194] you didn't initialize this object before use?
[    0.636194] turning off the locking correctness validator.
[    0.636194] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc1+ #3208
[    0.636194] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    0.636194] Call Trace:
[    0.636194]  <TASK>
[    0.636194]  dump_stack_lvl+0x88/0xd8
[    0.636194]  dump_stack+0x15/0x1b
[    0.636194]  register_lock_class+0x6b3/0x840
[    0.636194]  ? __this_cpu_preempt_check+0x1d/0x30
[    0.636194]  __lock_acquire+0x81/0xee0
[    0.636194]  ? lock_is_held_type+0xf1/0x160
[    0.636194]  lock_acquire+0xce/0x230
[    0.636194]  ? test_barrier+0x490/0x14c7
[    0.636194]  ? lock_is_held_type+0xf1/0x160
[    0.636194]  ? test_barrier+0x490/0x14c7
[    0.636194]  _raw_spin_lock+0x36/0x50
[    0.636194]  ? test_barrier+0x490/0x14c7
[    0.636194]  ? kcsan_init+0xf/0x80
[    0.636194]  test_barrier+0x490/0x14c7
[    0.636194]  ? kcsan_debugfs_init+0x1f/0x1f
[    0.636194]  kcsan_selftest+0x47/0xa0
[    0.636194]  do_one_initcall+0x104/0x230
[    0.636194]  ? rcu_read_lock_sched_held+0x5b/0xc0
[    0.636194]  ? kernel_init+0x1c/0x200
[    0.636194]  do_initcall_level+0xa5/0xb6
[    0.636194]  do_initcalls+0x66/0x95
[    0.636194]  do_basic_setup+0x1d/0x23
[    0.636194]  kernel_init_freeable+0x254/0x2ed
[    0.636194]  ? rest_init+0x290/0x290
[    0.636194]  kernel_init+0x1c/0x200
[    0.636194]  ? rest_init+0x290/0x290
[    0.636194]  ret_from_fork+0x22/0x30
[    0.636194]  </TASK>

When running without the new patch series, this splat does not appear.

Do I need a toolchain upgrade?  I see the Clang 14.0 in the cover letter,
but that seems to apply only to non-x86 architectures.

$ clang-11 -v
Ubuntu clang version 11.1.0-++20210805102428+1fdec59bffc1-1~exp1~20210805203044.169

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03 21:08       ` Paul E. McKenney
@ 2021-12-03 23:42         ` Marco Elver
  2021-12-03 23:42         ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Marco Elver @ 2021-12-03 23:42 UTC (permalink / raw)
  To: paulmck
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Fri, 3 Dec 2021 at 22:09, Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> A few quick tests located the following:
>
> [    0.635383] INFO: trying to register non-static key.
> [    0.635804] The code is fine but needs lockdep annotation, or maybe
> [    0.636194] you didn't initialize this object before use?
> [    0.636194] turning off the locking correctness validator.
> [    0.636194] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc1+ #3208
[...]
> When running without the new patch series, this splat does not appear.
>
> Do I need a toolchain upgrade?  I see the Clang 14.0 in the cover letter,
> but that seems to apply only to non-x86 architectures.
>
> $ clang-11 -v
> Ubuntu clang version 11.1.0-++20210805102428+1fdec59bffc1-1~exp1~20210805203044.169

Good catch! That would be lockdep telling me off for putting test
locks on the stack. :-/

I thought I had tested this with lockdep, but it seems the set of
semi-automated tests I run didn't (yet) generate a config with
KCSAN_WEAK_MEMORY + LOCKDEP.

This should be fixed by:
https://lkml.kernel.org/r/20211203233817.2815340-1-elver@google.com

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03 21:08       ` Paul E. McKenney
  2021-12-03 23:42         ` Marco Elver
@ 2021-12-03 23:42         ` Paul E. McKenney
  2021-12-03 23:45           ` Marco Elver
  1 sibling, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-12-03 23:42 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Fri, Dec 03, 2021 at 01:08:56PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 03, 2021 at 08:50:20AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 03, 2021 at 09:56:45AM +0100, Marco Elver wrote:
> > > On Tue, Nov 30, 2021 at 12:44PM +0100, Marco Elver wrote:
> > > [...]
> > > > v3:
> > > > * Remove kcsan_noinstr hackery, since we now try to avoid adding any
> > > >   instrumentation to .noinstr.text in the first place.
> > > [...]
> > > 
> > > I missed some cleanups after changes from v2 to v3 -- the below cleanup
> > > is missing.
> > > 
> > > Full replacement patch attached.
> > 
> > I pulled this into -rcu with the other patches from your v3 post, thank
> > you all!
> 
> A few quick tests located the following:
> 
> [    0.635383] INFO: trying to register non-static key.
> [    0.635804] The code is fine but needs lockdep annotation, or maybe
> [    0.636194] you didn't initialize this object before use?
> [    0.636194] turning off the locking correctness validator.
> [    0.636194] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc1+ #3208
> [    0.636194] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [    0.636194] Call Trace:
> [    0.636194]  <TASK>
> [    0.636194]  dump_stack_lvl+0x88/0xd8
> [    0.636194]  dump_stack+0x15/0x1b
> [    0.636194]  register_lock_class+0x6b3/0x840
> [    0.636194]  ? __this_cpu_preempt_check+0x1d/0x30
> [    0.636194]  __lock_acquire+0x81/0xee0
> [    0.636194]  ? lock_is_held_type+0xf1/0x160
> [    0.636194]  lock_acquire+0xce/0x230
> [    0.636194]  ? test_barrier+0x490/0x14c7
> [    0.636194]  ? lock_is_held_type+0xf1/0x160
> [    0.636194]  ? test_barrier+0x490/0x14c7
> [    0.636194]  _raw_spin_lock+0x36/0x50
> [    0.636194]  ? test_barrier+0x490/0x14c7
> [    0.636194]  ? kcsan_init+0xf/0x80
> [    0.636194]  test_barrier+0x490/0x14c7
> [    0.636194]  ? kcsan_debugfs_init+0x1f/0x1f
> [    0.636194]  kcsan_selftest+0x47/0xa0
> [    0.636194]  do_one_initcall+0x104/0x230
> [    0.636194]  ? rcu_read_lock_sched_held+0x5b/0xc0
> [    0.636194]  ? kernel_init+0x1c/0x200
> [    0.636194]  do_initcall_level+0xa5/0xb6
> [    0.636194]  do_initcalls+0x66/0x95
> [    0.636194]  do_basic_setup+0x1d/0x23
> [    0.636194]  kernel_init_freeable+0x254/0x2ed
> [    0.636194]  ? rest_init+0x290/0x290
> [    0.636194]  kernel_init+0x1c/0x200
> [    0.636194]  ? rest_init+0x290/0x290
> [    0.636194]  ret_from_fork+0x22/0x30
> [    0.636194]  </TASK>
> 
> When running without the new patch series, this splat does not appear.
> 
> Do I need a toolchain upgrade?  I see the Clang 14.0 in the cover letter,
> but that seems to apply only to non-x86 architectures.
> 
> $ clang-11 -v
> Ubuntu clang version 11.1.0-++20210805102428+1fdec59bffc1-1~exp1~20210805203044.169

And to further extend this bug report, the following patch suppresses
the error.

							Thanx, Paul

------------------------------------------------------------------------

commit d157b802f05bd12cf40bef7a73ca6914b85c865e
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Dec 3 15:35:29 2021 -0800

    kcsan: selftest: Move test spinlock to static global
    
    Running the TREE01 or TREE02 rcutorture scenarios results in the
    following splat:
    
    ------------------------------------------------------------------------
    
     INFO: trying to register non-static key.
     The code is fine but needs lockdep annotation, or maybe
     you didn't initialize this object before use?
     turning off the locking correctness validator.
     CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc1+ #3208
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
     Call Trace:
      <TASK>
      dump_stack_lvl+0x88/0xd8
      dump_stack+0x15/0x1b
      register_lock_class+0x6b3/0x840
      ? __this_cpu_preempt_check+0x1d/0x30
      __lock_acquire+0x81/0xee0
      ? lock_is_held_type+0xf1/0x160
      lock_acquire+0xce/0x230
      ? test_barrier+0x490/0x14c7
      ? lock_is_held_type+0xf1/0x160
      ? test_barrier+0x490/0x14c7
      _raw_spin_lock+0x36/0x50
      ? test_barrier+0x490/0x14c7
      ? kcsan_init+0xf/0x80
      test_barrier+0x490/0x14c7
      ? kcsan_debugfs_init+0x1f/0x1f
      kcsan_selftest+0x47/0xa0
      do_one_initcall+0x104/0x230
      ? rcu_read_lock_sched_held+0x5b/0xc0
      ? kernel_init+0x1c/0x200
      do_initcall_level+0xa5/0xb6
      do_initcalls+0x66/0x95
      do_basic_setup+0x1d/0x23
      kernel_init_freeable+0x254/0x2ed
      ? rest_init+0x290/0x290
      kernel_init+0x1c/0x200
      ? rest_init+0x290/0x290
      ret_from_fork+0x22/0x30
      </TASK>
    
    ------------------------------------------------------------------------
    
    This appears to be due to this line of code in kernel/kcsan/selftest.c:
    KCSAN_CHECK_READ_BARRIER(spin_unlock(&spinlock)), which operates on a
    spinlock allocated on the stack.  This shot-in-the-dark patch makes the
    spinlock instead be a static global, which suppresses the above splat.
    
    Fixes: 510b49b8d4c9 ("kcsan: selftest: Add test case to check memory barrier instrumentation")
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/kcsan/selftest.c b/kernel/kcsan/selftest.c
index 08c6b84b9ebed..05d772c9fe933 100644
--- a/kernel/kcsan/selftest.c
+++ b/kernel/kcsan/selftest.c
@@ -108,6 +108,8 @@ static bool __init test_matching_access(void)
 	return true;
 }
 
+static DEFINE_SPINLOCK(test_barrier_spinlock);
+
 /*
  * Correct memory barrier instrumentation is critical to avoiding false
  * positives: simple test to check at boot certain barriers are always properly
@@ -122,7 +124,6 @@ static bool __init test_barrier(void)
 #endif
 	bool ret = true;
 	arch_spinlock_t arch_spinlock = __ARCH_SPIN_LOCK_UNLOCKED;
-	DEFINE_SPINLOCK(spinlock);
 	atomic_t dummy;
 	long test_var;
 
@@ -172,8 +173,8 @@ static bool __init test_barrier(void)
 	KCSAN_CHECK_READ_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
 	arch_spin_lock(&arch_spinlock);
 	KCSAN_CHECK_READ_BARRIER(arch_spin_unlock(&arch_spinlock));
-	spin_lock(&spinlock);
-	KCSAN_CHECK_READ_BARRIER(spin_unlock(&spinlock));
+	spin_lock(&test_barrier_spinlock);
+	KCSAN_CHECK_READ_BARRIER(spin_unlock(&test_barrier_spinlock));
 
 	KCSAN_CHECK_WRITE_BARRIER(mb());
 	KCSAN_CHECK_WRITE_BARRIER(wmb());
@@ -202,8 +203,8 @@ static bool __init test_barrier(void)
 	KCSAN_CHECK_WRITE_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
 	arch_spin_lock(&arch_spinlock);
 	KCSAN_CHECK_WRITE_BARRIER(arch_spin_unlock(&arch_spinlock));
-	spin_lock(&spinlock);
-	KCSAN_CHECK_WRITE_BARRIER(spin_unlock(&spinlock));
+	spin_lock(&test_barrier_spinlock);
+	KCSAN_CHECK_WRITE_BARRIER(spin_unlock(&test_barrier_spinlock));
 
 	KCSAN_CHECK_RW_BARRIER(mb());
 	KCSAN_CHECK_RW_BARRIER(wmb());
@@ -235,8 +236,8 @@ static bool __init test_barrier(void)
 	KCSAN_CHECK_RW_BARRIER(clear_bit_unlock_is_negative_byte(0, &test_var));
 	arch_spin_lock(&arch_spinlock);
 	KCSAN_CHECK_RW_BARRIER(arch_spin_unlock(&arch_spinlock));
-	spin_lock(&spinlock);
-	KCSAN_CHECK_RW_BARRIER(spin_unlock(&spinlock));
+	spin_lock(&test_barrier_spinlock);
+	KCSAN_CHECK_RW_BARRIER(spin_unlock(&test_barrier_spinlock));
 
 	kcsan_nestable_atomic_end();
 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03 23:42         ` Paul E. McKenney
@ 2021-12-03 23:45           ` Marco Elver
  2021-12-04  1:14             ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-12-03 23:45 UTC (permalink / raw)
  To: paulmck
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Sat, 4 Dec 2021 at 00:42, Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> And to further extend this bug report, the following patch suppresses
> the error.
>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit d157b802f05bd12cf40bef7a73ca6914b85c865e
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Fri Dec 3 15:35:29 2021 -0800
>
>     kcsan: selftest: Move test spinlock to static global

Indeed, that will fix the selftest. The kcsan_test has the same
problem (+1 extra problem).

We raced sending the fix. :-)
I hope this patch works for you:
https://lkml.kernel.org/r/20211203233817.2815340-1-elver@google.com

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling
  2021-12-03 23:45           ` Marco Elver
@ 2021-12-04  1:14             ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-12-04  1:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Boqun Feng, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Sat, Dec 04, 2021 at 12:45:30AM +0100, Marco Elver wrote:
> On Sat, 4 Dec 2021 at 00:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > And to further extend this bug report, the following patch suppresses
> > the error.
> >
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit d157b802f05bd12cf40bef7a73ca6914b85c865e
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Fri Dec 3 15:35:29 2021 -0800
> >
> >     kcsan: selftest: Move test spinlock to static global
> 
> Indeed, that will fix the selftest. The kcsan_test has the same
> problem (+1 extra problem).
> 
> We raced sending the fix. :-)
> I hope this patch works for you:
> https://lkml.kernel.org/r/20211203233817.2815340-1-elver@google.com

I replaced my patch with yours and am starting up testing, thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-11-30 11:44 ` [PATCH v3 08/25] kcsan: Show location access was reordered to Marco Elver
@ 2021-12-06  5:03   ` Boqun Feng
  2021-12-06  7:16     ` Marco Elver
  0 siblings, 1 reply; 39+ messages in thread
From: Boqun Feng @ 2021-12-06  5:03 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Alexander Potapenko, Borislav Petkov,
	Dmitry Vyukov, Ingo Molnar, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Waiman Long, Will Deacon, kasan-dev, linux-arch,
	linux-doc, linux-kbuild, linux-kernel, linux-mm, llvm, x86

Hi,

On Tue, Nov 30, 2021 at 12:44:16PM +0100, Marco Elver wrote:
> Also show the location the access was reordered to. An example report:
> 
> | ==================================================================
> | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
> |
> | read-write to 0xffffffffc01e61a8 of 8 bytes by task 2311 on cpu 5:
> |  test_kernel_wrong_memorder+0x57/0x90
> |  access_thread+0x99/0xe0
> |  kthread+0x2ba/0x2f0
> |  ret_from_fork+0x22/0x30
> |
> | read-write (reordered) to 0xffffffffc01e61a8 of 8 bytes by task 2310 on cpu 7:
> |  test_kernel_wrong_memorder+0x57/0x90
> |  access_thread+0x99/0xe0
> |  kthread+0x2ba/0x2f0
> |  ret_from_fork+0x22/0x30
> |   |
> |   +-> reordered to: test_kernel_wrong_memorder+0x80/0x90
> |

Should this be "reordered from" instead of "reordered to"? For example,
if the following case needs a smp_mb() between write to A and write to
B, I think currently it will report as follow:

	foo() {
		WRITE_ONCE(A, 1); // let's say A's address is 0xaaaa
		bar() {
			WRITE_ONCE(B, 1); // Assume B's address is 0xbbbb
					  // KCSAN find the problem here
		}
	}

	<report>
	| write (reordered) to 0xaaaa of ...:
	| bar+0x... // address of the write to B
	| foo+0x... // address of the callsite to bar()
	| ...
	|  |
	|  +-> reordered to: foo+0x... // address of the write to A

But since the access reported here is the write to A, so it's a
"reordered from" instead of "reordered to"?

Regards,
Boqun

> | Reported by Kernel Concurrency Sanitizer on:
> | CPU: 7 PID: 2310 Comm: access_thread Not tainted 5.14.0-rc1+ #18
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> | ==================================================================
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  kernel/kcsan/report.c | 35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> index 1b0e050bdf6a..67794404042a 100644
> --- a/kernel/kcsan/report.c
> +++ b/kernel/kcsan/report.c
> @@ -308,10 +308,12 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries
>  
>  /*
>   * Skips to the first entry that matches the function of @ip, and then replaces
> - * that entry with @ip, returning the entries to skip.
> + * that entry with @ip, returning the entries to skip with @replaced containing
> + * the replaced entry.
>   */
>  static int
> -replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned long ip)
> +replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned long ip,
> +		    unsigned long *replaced)
>  {
>  	unsigned long symbolsize, offset;
>  	unsigned long target_func;
> @@ -330,6 +332,7 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
>  		func -= offset;
>  
>  		if (func == target_func) {
> +			*replaced = stack_entries[skip];
>  			stack_entries[skip] = ip;
>  			return skip;
>  		}
> @@ -342,9 +345,10 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
>  }
>  
>  static int
> -sanitize_stack_entries(unsigned long stack_entries[], int num_entries, unsigned long ip)
> +sanitize_stack_entries(unsigned long stack_entries[], int num_entries, unsigned long ip,
> +		       unsigned long *replaced)
>  {
> -	return ip ? replace_stack_entry(stack_entries, num_entries, ip) :
> +	return ip ? replace_stack_entry(stack_entries, num_entries, ip, replaced) :
>  			  get_stack_skipnr(stack_entries, num_entries);
>  }
>  
> @@ -360,6 +364,14 @@ static int sym_strcmp(void *addr1, void *addr2)
>  	return strncmp(buf1, buf2, sizeof(buf1));
>  }
>  
> +static void
> +print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
> +{
> +	stack_trace_print(stack_entries, num_entries, 0);
> +	if (reordered_to)
> +		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to);
> +}
> +
>  static void print_verbose_info(struct task_struct *task)
>  {
>  	if (!task)
> @@ -378,10 +390,12 @@ static void print_report(enum kcsan_value_change value_change,
>  			 struct other_info *other_info,
>  			 u64 old, u64 new, u64 mask)
>  {
> +	unsigned long reordered_to = 0;
>  	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
>  	int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> -	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip);
> +	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, &reordered_to);
>  	unsigned long this_frame = stack_entries[skipnr];
> +	unsigned long other_reordered_to = 0;
>  	unsigned long other_frame = 0;
>  	int other_skipnr = 0; /* silence uninit warnings */
>  
> @@ -394,7 +408,7 @@ static void print_report(enum kcsan_value_change value_change,
>  	if (other_info) {
>  		other_skipnr = sanitize_stack_entries(other_info->stack_entries,
>  						      other_info->num_stack_entries,
> -						      other_info->ai.ip);
> +						      other_info->ai.ip, &other_reordered_to);
>  		other_frame = other_info->stack_entries[other_skipnr];
>  
>  		/* @value_change is only known for the other thread */
> @@ -434,10 +448,9 @@ static void print_report(enum kcsan_value_change value_change,
>  		       other_info->ai.cpu_id);
>  
>  		/* Print the other thread's stack trace. */
> -		stack_trace_print(other_info->stack_entries + other_skipnr,
> +		print_stack_trace(other_info->stack_entries + other_skipnr,
>  				  other_info->num_stack_entries - other_skipnr,
> -				  0);
> -
> +				  other_reordered_to);
>  		if (IS_ENABLED(CONFIG_KCSAN_VERBOSE))
>  			print_verbose_info(other_info->task);
>  
> @@ -451,9 +464,7 @@ static void print_report(enum kcsan_value_change value_change,
>  		       get_thread_desc(ai->task_pid), ai->cpu_id);
>  	}
>  	/* Print stack trace of this thread. */
> -	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> -			  0);
> -
> +	print_stack_trace(stack_entries + skipnr, num_stack_entries - skipnr, reordered_to);
>  	if (IS_ENABLED(CONFIG_KCSAN_VERBOSE))
>  		print_verbose_info(current);
>  
> -- 
> 2.34.0.rc2.393.gf8c9666880-goog
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-12-06  5:03   ` Boqun Feng
@ 2021-12-06  7:16     ` Marco Elver
  2021-12-06 14:31       ` Boqun Feng
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-12-06  7:16 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Alexander Potapenko, Borislav Petkov,
	Dmitry Vyukov, Ingo Molnar, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Waiman Long, Will Deacon, kasan-dev, linux-arch,
	linux-doc, linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Mon, 6 Dec 2021 at 06:04, Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi,
>
> On Tue, Nov 30, 2021 at 12:44:16PM +0100, Marco Elver wrote:
> > Also show the location the access was reordered to. An example report:
> >
> > | ==================================================================
> > | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
> > |
> > | read-write to 0xffffffffc01e61a8 of 8 bytes by task 2311 on cpu 5:
> > |  test_kernel_wrong_memorder+0x57/0x90
> > |  access_thread+0x99/0xe0
> > |  kthread+0x2ba/0x2f0
> > |  ret_from_fork+0x22/0x30
> > |
> > | read-write (reordered) to 0xffffffffc01e61a8 of 8 bytes by task 2310 on cpu 7:
> > |  test_kernel_wrong_memorder+0x57/0x90
> > |  access_thread+0x99/0xe0
> > |  kthread+0x2ba/0x2f0
> > |  ret_from_fork+0x22/0x30
> > |   |
> > |   +-> reordered to: test_kernel_wrong_memorder+0x80/0x90
> > |
>
> Should this be "reordered from" instead of "reordered to"? For example,
> if the following case needs a smp_mb() between write to A and write to
> B, I think currently it will report as follow:
>
>         foo() {
>                 WRITE_ONCE(A, 1); // let's say A's address is 0xaaaa
>                 bar() {
>                         WRITE_ONCE(B, 1); // Assume B's address is 0xbbbb
>                                           // KCSAN find the problem here
>                 }
>         }
>
>         <report>
>         | write (reordered) to 0xaaaa of ...:
>         | bar+0x... // address of the write to B
>         | foo+0x... // address of the callsite to bar()
>         | ...
>         |  |
>         |  +-> reordered to: foo+0x... // address of the write to A
>
> But since the access reported here is the write to A, so it's a
> "reordered from" instead of "reordered to"?

Perhaps I could have commented on this in the commit message to avoid
the confusion, but per its updated comment replace_stack_entry()
"skips to the first entry that matches the function of @ip, and then
replaces that entry with @ip, returning the entries to skip with
@replaced containing the replaced entry."

When a reorder_access is set up, the ip to it is stored, which is
what's passed to @ip of replace_stack_entry(). It effectively swaps
the top frame where the race occurred with where the original access
happened. This all works because the runtime is careful to only keep
reorder_accesses valid until the original function where it occurred
is left.

So in your above example you need to swap "reordered to" and the top
frame of the stack trace.

The implementation is a little trickier of course, but I really wanted
the main stack trace to look like any other non-reordered access,
which starts from the original access, and only have the "reordered
to" location be secondary information.

The foundation for doing this this was put in place here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c65eb75686f

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-12-06  7:16     ` Marco Elver
@ 2021-12-06 14:31       ` Boqun Feng
  2021-12-06 16:04         ` Marco Elver
  0 siblings, 1 reply; 39+ messages in thread
From: Boqun Feng @ 2021-12-06 14:31 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Alexander Potapenko, Borislav Petkov,
	Dmitry Vyukov, Ingo Molnar, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Waiman Long, Will Deacon, kasan-dev, linux-arch,
	linux-doc, linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Mon, Dec 06, 2021 at 08:16:11AM +0100, Marco Elver wrote:
> On Mon, 6 Dec 2021 at 06:04, Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Nov 30, 2021 at 12:44:16PM +0100, Marco Elver wrote:
> > > Also show the location the access was reordered to. An example report:
> > >
> > > | ==================================================================
> > > | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
> > > |
> > > | read-write to 0xffffffffc01e61a8 of 8 bytes by task 2311 on cpu 5:
> > > |  test_kernel_wrong_memorder+0x57/0x90
> > > |  access_thread+0x99/0xe0
> > > |  kthread+0x2ba/0x2f0
> > > |  ret_from_fork+0x22/0x30
> > > |
> > > | read-write (reordered) to 0xffffffffc01e61a8 of 8 bytes by task 2310 on cpu 7:
> > > |  test_kernel_wrong_memorder+0x57/0x90
> > > |  access_thread+0x99/0xe0
> > > |  kthread+0x2ba/0x2f0
> > > |  ret_from_fork+0x22/0x30
> > > |   |
> > > |   +-> reordered to: test_kernel_wrong_memorder+0x80/0x90
> > > |
> >
> > Should this be "reordered from" instead of "reordered to"? For example,
> > if the following case needs a smp_mb() between write to A and write to
> > B, I think currently it will report as follow:
> >
> >         foo() {
> >                 WRITE_ONCE(A, 1); // let's say A's address is 0xaaaa
> >                 bar() {
> >                         WRITE_ONCE(B, 1); // Assume B's address is 0xbbbb
> >                                           // KCSAN find the problem here
> >                 }
> >         }
> >
> >         <report>
> >         | write (reordered) to 0xaaaa of ...:
> >         | bar+0x... // address of the write to B
> >         | foo+0x... // address of the callsite to bar()
> >         | ...
> >         |  |
> >         |  +-> reordered to: foo+0x... // address of the write to A
> >
> > But since the access reported here is the write to A, so it's a
> > "reordered from" instead of "reordered to"?
> 
> Perhaps I could have commented on this in the commit message to avoid
> the confusion, but per its updated comment replace_stack_entry()
> "skips to the first entry that matches the function of @ip, and then
> replaces that entry with @ip, returning the entries to skip with
> @replaced containing the replaced entry."
> 
> When a reorder_access is set up, the ip to it is stored, which is
> what's passed to @ip of replace_stack_entry(). It effectively swaps
> the top frame where the race occurred with where the original access
> happened. This all works because the runtime is careful to only keep
> reorder_accesses valid until the original function where it occurred
> is left.
> 

Thanks for the explanation, I was missing the swap here. However...

> So in your above example you need to swap "reordered to" and the top
> frame of the stack trace.
> 

IIUC, the report for my above example will be:

         | write (reordered) to 0xaaaa of ...:
         | foo+0x... // address of the write to A
         | ...
         |  |
         |  +-> reordered to: foo+0x... // address of the callsite to bar() in foo()

, right? Because in replace_stack_entry(), it's not the top frame where
the race occurred that gets swapped, it's the frame which belongs to the
same function as the original access that gets swapped. In other words,
when KCSAN finds the problem, top entries of the calling stack are:

	[0] bar+0x.. // address of the write to B
	[1] foo+0x.. // address of the callsite to bar() in foo()

after replace_stack_entry(), they changes to:

	[0] bar+0x.. // address of the write to B
skip  ->[1] foo+0x.. // address of the write to A

, as a result the report won't mention bar() at all.

And I think a better report will be:

         | write (reordered) to 0xaaaa of ...:
         | foo+0x... // address of the write to A
         | ...
         |  |
         |  +-> reordered to: bar+0x... // address of the write to B in bar()

because it tells users the exact place the accesses get reordered. That
means maybe we want something as below? Not completely tested, but I
play with scope checking a bit, seems it gives what I want. Thoughts?

Regards,
Boqun

diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
index 67794404042a..b495ed3aa637 100644
--- a/kernel/kcsan/report.c
+++ b/kernel/kcsan/report.c
@@ -324,7 +324,10 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
        else
                goto fallback;

-       for (skip = 0; skip < num_entries; ++skip) {
+       skip = get_stack_skipnr(stack_entries, num_entries);
+       *replaced = stack_entries[skip];
+
+       for (;skip < num_entries; ++skip) {
                unsigned long func = stack_entries[skip];

                if (!kallsyms_lookup_size_offset(func, &symbolsize, &offset))
@@ -332,7 +335,6 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
                func -= offset;

                if (func == target_func) {
-                       *replaced = stack_entries[skip];
                        stack_entries[skip] = ip;
                        return skip;
                }

> The implementation is a little trickier of course, but I really wanted
> the main stack trace to look like any other non-reordered access,
> which starts from the original access, and only have the "reordered
> to" location be secondary information.
> 
> The foundation for doing this this was put in place here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c65eb75686f
> 
> Thanks,
> -- Marco

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-12-06 14:31       ` Boqun Feng
@ 2021-12-06 16:04         ` Marco Elver
  2021-12-06 17:16           ` Boqun Feng
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Elver @ 2021-12-06 16:04 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Alexander Potapenko, Borislav Petkov,
	Dmitry Vyukov, Ingo Molnar, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Waiman Long, Will Deacon, kasan-dev, linux-arch,
	linux-doc, linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Mon, Dec 06, 2021 at 10:31PM +0800, Boqun Feng wrote:
[...]
> Thanks for the explanation, I was missing the swap here. However...
> 
> > So in your above example you need to swap "reordered to" and the top
> > frame of the stack trace.
> > 

Apologies, I wasn't entirely precise ... what you say below is correct.

> IIUC, the report for my above example will be:
> 
>          | write (reordered) to 0xaaaa of ...:
>          | foo+0x... // address of the write to A
>          | ...
>          |  |
>          |  +-> reordered to: foo+0x... // address of the callsite to bar() in foo()
> 
> , right? Because in replace_stack_entry(), it's not the top frame where
> the race occurred that gets swapped, it's the frame which belongs to the
> same function as the original access that gets swapped. In other words,
> when KCSAN finds the problem, top entries of the calling stack are:
> 
> 	[0] bar+0x.. // address of the write to B
> 	[1] foo+0x.. // address of the callsite to bar() in foo()
> 
> after replace_stack_entry(), they changes to:
> 
> 	[0] bar+0x.. // address of the write to B
> skip  ->[1] foo+0x.. // address of the write to A
> 
> , as a result the report won't mention bar() at all.

Correct.

> And I think a better report will be:
> 
>          | write (reordered) to 0xaaaa of ...:
>          | foo+0x... // address of the write to A
>          | ...
>          |  |
>          |  +-> reordered to: bar+0x... // address of the write to B in bar()
> 
> because it tells users the exact place the accesses get reordered. That
> means maybe we want something as below? Not completely tested, but I
> play with scope checking a bit, seems it gives what I want. Thoughts?

This is problematic because it makes it much harder to actually figure
out what's going on, given "reordered to" isn't a full stack trace. So
if you're deep in some call hierarchy, seeing a random "reordered to"
line is quite useless. What I want to see, at the very least, is the ip
to the same function where the original access happened.

We could of course try and generate a full stack trace at "reordered
to", but this would entail

	a) allocating 2x unsigned long[64] on the stack (or moving to
	   static storage),
	b) further increasing the report length,
	c) an even larger number of possibly distinct reports for the
	   same issue; this makes deduplication even harder.

The reason I couldn't justify all that is that when I looked through
several dozen "reordered to" reports, I never found anything other than
the ip in the function frame of the original access useful. That, and in
most cases the "reordered to" location was in the same function or in an
inlined function.

The below patch would do what you'd want I think.

My opinion is to err on the side of simplicity until there is evidence
we need it. Of course, if you have a compelling reason that we need it
from the beginning, happy to send it as a separate patch on top.

What do you think?

Thanks,
-- Marco

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Mon, 6 Dec 2021 16:35:02 +0100
Subject: [PATCH] kcsan: Show full stack trace of reordered-to accesses

Change reports involving reordered accesses to show the full stack trace
of "reordered to" accesses. For example:

 | ==================================================================
 | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
 |
 | read-write to 0xffffffffc02d01e8 of 8 bytes by task 2481 on cpu 2:
 |  test_kernel_wrong_memorder+0x57/0x90
 |  access_thread+0xb7/0x100
 |  kthread+0x2ed/0x320
 |  ret_from_fork+0x22/0x30
 |
 | read-write (reordered) to 0xffffffffc02d01e8 of 8 bytes by task 2480 on cpu 0:
 |  test_kernel_wrong_memorder+0x57/0x90
 |  access_thread+0xb7/0x100
 |  kthread+0x2ed/0x320
 |  ret_from_fork+0x22/0x30
 |   |
 |   +-> reordered to: test_delay+0x31/0x110
 |                     test_kernel_wrong_memorder+0x80/0x90
 |
 | Reported by Kernel Concurrency Sanitizer on:
 | CPU: 0 PID: 2480 Comm: access_thread Not tainted 5.16.0-rc1+ #2
 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 | ==================================================================

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/kcsan/report.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
index 67794404042a..a8317d5f5123 100644
--- a/kernel/kcsan/report.c
+++ b/kernel/kcsan/report.c
@@ -317,22 +317,29 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
 {
 	unsigned long symbolsize, offset;
 	unsigned long target_func;
-	int skip;
+	int skip, i;
 
 	if (kallsyms_lookup_size_offset(ip, &symbolsize, &offset))
 		target_func = ip - offset;
 	else
 		goto fallback;
 
-	for (skip = 0; skip < num_entries; ++skip) {
+	skip = get_stack_skipnr(stack_entries, num_entries);
+	for (i = 0; skip < num_entries; ++skip, ++i) {
 		unsigned long func = stack_entries[skip];
 
 		if (!kallsyms_lookup_size_offset(func, &symbolsize, &offset))
 			goto fallback;
 		func -= offset;
 
+		replaced[i] = stack_entries[skip];
 		if (func == target_func) {
-			*replaced = stack_entries[skip];
+			/*
+			 * There must be at least 1 entry left in the original
+			 * @stack_entries, so we know that we will never occupy
+			 * more than @num_entries - 1 of @replaced.
+			 */
+			replaced[i + 1] = 0;
 			stack_entries[skip] = ip;
 			return skip;
 		}
@@ -341,6 +348,7 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
 fallback:
 	/* Should not happen; the resulting stack trace is likely misleading. */
 	WARN_ONCE(1, "Cannot find frame for %pS in stack trace", (void *)ip);
+	replaced[0] = 0;
 	return get_stack_skipnr(stack_entries, num_entries);
 }
 
@@ -365,11 +373,16 @@ static int sym_strcmp(void *addr1, void *addr2)
 }
 
 static void
-print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
+print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long *reordered_to)
 {
 	stack_trace_print(stack_entries, num_entries, 0);
-	if (reordered_to)
-		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to);
+	if (reordered_to[0]) {
+		int i;
+
+		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to[0]);
+		for (i = 1; i < NUM_STACK_ENTRIES && reordered_to[i]; ++i)
+			pr_err("                    %pS\n", (void *)reordered_to[i]);
+	}
 }
 
 static void print_verbose_info(struct task_struct *task)
@@ -390,12 +403,12 @@ static void print_report(enum kcsan_value_change value_change,
 			 struct other_info *other_info,
 			 u64 old, u64 new, u64 mask)
 {
-	unsigned long reordered_to = 0;
+	unsigned long reordered_to[NUM_STACK_ENTRIES] = { 0 };
 	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
 	int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
-	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, &reordered_to);
+	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, reordered_to);
 	unsigned long this_frame = stack_entries[skipnr];
-	unsigned long other_reordered_to = 0;
+	unsigned long other_reordered_to[NUM_STACK_ENTRIES] = { 0 };
 	unsigned long other_frame = 0;
 	int other_skipnr = 0; /* silence uninit warnings */
 
@@ -408,7 +421,7 @@ static void print_report(enum kcsan_value_change value_change,
 	if (other_info) {
 		other_skipnr = sanitize_stack_entries(other_info->stack_entries,
 						      other_info->num_stack_entries,
-						      other_info->ai.ip, &other_reordered_to);
+						      other_info->ai.ip, other_reordered_to);
 		other_frame = other_info->stack_entries[other_skipnr];
 
 		/* @value_change is only known for the other thread */
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-12-06 16:04         ` Marco Elver
@ 2021-12-06 17:16           ` Boqun Feng
  2021-12-06 17:38             ` Paul E. McKenney
  0 siblings, 1 reply; 39+ messages in thread
From: Boqun Feng @ 2021-12-06 17:16 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Alexander Potapenko, Borislav Petkov,
	Dmitry Vyukov, Ingo Molnar, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner, Waiman Long, Will Deacon, kasan-dev, linux-arch,
	linux-doc, linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Mon, Dec 06, 2021 at 05:04:20PM +0100, Marco Elver wrote:
> On Mon, Dec 06, 2021 at 10:31PM +0800, Boqun Feng wrote:
> [...]
> > Thanks for the explanation, I was missing the swap here. However...
> > 
> > > So in your above example you need to swap "reordered to" and the top
> > > frame of the stack trace.
> > > 
> 
> Apologies, I wasn't entirely precise ... what you say below is correct.
> 
> > IIUC, the report for my above example will be:
> > 
> >          | write (reordered) to 0xaaaa of ...:
> >          | foo+0x... // address of the write to A
> >          | ...
> >          |  |
> >          |  +-> reordered to: foo+0x... // address of the callsite to bar() in foo()
> > 
> > , right? Because in replace_stack_entry(), it's not the top frame where
> > the race occurred that gets swapped, it's the frame which belongs to the
> > same function as the original access that gets swapped. In other words,
> > when KCSAN finds the problem, top entries of the calling stack are:
> > 
> > 	[0] bar+0x.. // address of the write to B
> > 	[1] foo+0x.. // address of the callsite to bar() in foo()
> > 
> > after replace_stack_entry(), they changes to:
> > 
> > 	[0] bar+0x.. // address of the write to B
> > skip  ->[1] foo+0x.. // address of the write to A
> > 
> > , as a result the report won't mention bar() at all.
> 
> Correct.
> 
> > And I think a better report will be:
> > 
> >          | write (reordered) to 0xaaaa of ...:
> >          | foo+0x... // address of the write to A
> >          | ...
> >          |  |
> >          |  +-> reordered to: bar+0x... // address of the write to B in bar()
> > 
> > because it tells users the exact place the accesses get reordered. That
> > means maybe we want something as below? Not completely tested, but I
> > play with scope checking a bit, seems it gives what I want. Thoughts?
> 
> This is problematic because it makes it much harder to actually figure
> out what's going on, given "reordered to" isn't a full stack trace. So
> if you're deep in some call hierarchy, seeing a random "reordered to"
> line is quite useless. What I want to see, at the very least, is the ip
> to the same function where the original access happened.
> 
> We could of course try and generate a full stack trace at "reordered
> to", but this would entail
> 
> 	a) allocating 2x unsigned long[64] on the stack (or moving to
> 	   static storage),
> 	b) further increasing the report length,
> 	c) an even larger number of possibly distinct reports for the
> 	   same issue; this makes deduplication even harder.
> 
> The reason I couldn't justify all that is that when I looked through
> several dozen "reordered to" reports, I never found anything other than
> the ip in the function frame of the original access useful. That, and in
> most cases the "reordered to" location was in the same function or in an
> inlined function.
> 
> The below patch would do what you'd want I think.
> 
> My opinion is to err on the side of simplicity until there is evidence
> we need it. Of course, if you have a compelling reason that we need it
> from the beginning, happy to send it as a separate patch on top.
> 
> What do you think?
> 

Totally agreed. It's better to keep it simple until people report that
they want to see more information to resolve the issues. And thanks for
looking into the "double stack traces", that looks good to me too.

For the original patch, feel free to add:

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> From: Marco Elver <elver@google.com>
> Date: Mon, 6 Dec 2021 16:35:02 +0100
> Subject: [PATCH] kcsan: Show full stack trace of reordered-to accesses
> 
> Change reports involving reordered accesses to show the full stack trace
> of "reordered to" accesses. For example:
> 
>  | ==================================================================
>  | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
>  |
>  | read-write to 0xffffffffc02d01e8 of 8 bytes by task 2481 on cpu 2:
>  |  test_kernel_wrong_memorder+0x57/0x90
>  |  access_thread+0xb7/0x100
>  |  kthread+0x2ed/0x320
>  |  ret_from_fork+0x22/0x30
>  |
>  | read-write (reordered) to 0xffffffffc02d01e8 of 8 bytes by task 2480 on cpu 0:
>  |  test_kernel_wrong_memorder+0x57/0x90
>  |  access_thread+0xb7/0x100
>  |  kthread+0x2ed/0x320
>  |  ret_from_fork+0x22/0x30
>  |   |
>  |   +-> reordered to: test_delay+0x31/0x110
>  |                     test_kernel_wrong_memorder+0x80/0x90
>  |
>  | Reported by Kernel Concurrency Sanitizer on:
>  | CPU: 0 PID: 2480 Comm: access_thread Not tainted 5.16.0-rc1+ #2
>  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
>  | ==================================================================
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  kernel/kcsan/report.c | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> index 67794404042a..a8317d5f5123 100644
> --- a/kernel/kcsan/report.c
> +++ b/kernel/kcsan/report.c
> @@ -317,22 +317,29 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
>  {
>  	unsigned long symbolsize, offset;
>  	unsigned long target_func;
> -	int skip;
> +	int skip, i;
>  
>  	if (kallsyms_lookup_size_offset(ip, &symbolsize, &offset))
>  		target_func = ip - offset;
>  	else
>  		goto fallback;
>  
> -	for (skip = 0; skip < num_entries; ++skip) {
> +	skip = get_stack_skipnr(stack_entries, num_entries);
> +	for (i = 0; skip < num_entries; ++skip, ++i) {
>  		unsigned long func = stack_entries[skip];
>  
>  		if (!kallsyms_lookup_size_offset(func, &symbolsize, &offset))
>  			goto fallback;
>  		func -= offset;
>  
> +		replaced[i] = stack_entries[skip];
>  		if (func == target_func) {
> -			*replaced = stack_entries[skip];
> +			/*
> +			 * There must be at least 1 entry left in the original
> +			 * @stack_entries, so we know that we will never occupy
> +			 * more than @num_entries - 1 of @replaced.
> +			 */
> +			replaced[i + 1] = 0;
>  			stack_entries[skip] = ip;
>  			return skip;
>  		}
> @@ -341,6 +348,7 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
>  fallback:
>  	/* Should not happen; the resulting stack trace is likely misleading. */
>  	WARN_ONCE(1, "Cannot find frame for %pS in stack trace", (void *)ip);
> +	replaced[0] = 0;
>  	return get_stack_skipnr(stack_entries, num_entries);
>  }
>  
> @@ -365,11 +373,16 @@ static int sym_strcmp(void *addr1, void *addr2)
>  }
>  
>  static void
> -print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
> +print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long *reordered_to)
>  {
>  	stack_trace_print(stack_entries, num_entries, 0);
> -	if (reordered_to)
> -		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to);
> +	if (reordered_to[0]) {
> +		int i;
> +
> +		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to[0]);
> +		for (i = 1; i < NUM_STACK_ENTRIES && reordered_to[i]; ++i)
> +			pr_err("                    %pS\n", (void *)reordered_to[i]);
> +	}
>  }
>  
>  static void print_verbose_info(struct task_struct *task)
> @@ -390,12 +403,12 @@ static void print_report(enum kcsan_value_change value_change,
>  			 struct other_info *other_info,
>  			 u64 old, u64 new, u64 mask)
>  {
> -	unsigned long reordered_to = 0;
> +	unsigned long reordered_to[NUM_STACK_ENTRIES] = { 0 };
>  	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
>  	int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> -	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, &reordered_to);
> +	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, reordered_to);
>  	unsigned long this_frame = stack_entries[skipnr];
> -	unsigned long other_reordered_to = 0;
> +	unsigned long other_reordered_to[NUM_STACK_ENTRIES] = { 0 };
>  	unsigned long other_frame = 0;
>  	int other_skipnr = 0; /* silence uninit warnings */
>  
> @@ -408,7 +421,7 @@ static void print_report(enum kcsan_value_change value_change,
>  	if (other_info) {
>  		other_skipnr = sanitize_stack_entries(other_info->stack_entries,
>  						      other_info->num_stack_entries,
> -						      other_info->ai.ip, &other_reordered_to);
> +						      other_info->ai.ip, other_reordered_to);
>  		other_frame = other_info->stack_entries[other_skipnr];
>  
>  		/* @value_change is only known for the other thread */
> -- 
> 2.34.1.400.ga245620fadb-goog
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 08/25] kcsan: Show location access was reordered to
  2021-12-06 17:16           ` Boqun Feng
@ 2021-12-06 17:38             ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-12-06 17:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Marco Elver, Alexander Potapenko, Borislav Petkov, Dmitry Vyukov,
	Ingo Molnar, Mark Rutland, Peter Zijlstra, Thomas Gleixner,
	Waiman Long, Will Deacon, kasan-dev, linux-arch, linux-doc,
	linux-kbuild, linux-kernel, linux-mm, llvm, x86

On Tue, Dec 07, 2021 at 01:16:25AM +0800, Boqun Feng wrote:
> On Mon, Dec 06, 2021 at 05:04:20PM +0100, Marco Elver wrote:
> > On Mon, Dec 06, 2021 at 10:31PM +0800, Boqun Feng wrote:
> > [...]
> > > Thanks for the explanation, I was missing the swap here. However...
> > > 
> > > > So in your above example you need to swap "reordered to" and the top
> > > > frame of the stack trace.
> > > > 
> > 
> > Apologies, I wasn't entirely precise ... what you say below is correct.
> > 
> > > IIUC, the report for my above example will be:
> > > 
> > >          | write (reordered) to 0xaaaa of ...:
> > >          | foo+0x... // address of the write to A
> > >          | ...
> > >          |  |
> > >          |  +-> reordered to: foo+0x... // address of the callsite to bar() in foo()
> > > 
> > > , right? Because in replace_stack_entry(), it's not the top frame where
> > > the race occurred that gets swapped, it's the frame which belongs to the
> > > same function as the original access that gets swapped. In other words,
> > > when KCSAN finds the problem, top entries of the calling stack are:
> > > 
> > > 	[0] bar+0x.. // address of the write to B
> > > 	[1] foo+0x.. // address of the callsite to bar() in foo()
> > > 
> > > after replace_stack_entry(), they changes to:
> > > 
> > > 	[0] bar+0x.. // address of the write to B
> > > skip  ->[1] foo+0x.. // address of the write to A
> > > 
> > > , as a result the report won't mention bar() at all.
> > 
> > Correct.
> > 
> > > And I think a better report will be:
> > > 
> > >          | write (reordered) to 0xaaaa of ...:
> > >          | foo+0x... // address of the write to A
> > >          | ...
> > >          |  |
> > >          |  +-> reordered to: bar+0x... // address of the write to B in bar()
> > > 
> > > because it tells users the exact place the accesses get reordered. That
> > > means maybe we want something as below? Not completely tested, but I
> > > play with scope checking a bit, seems it gives what I want. Thoughts?
> > 
> > This is problematic because it makes it much harder to actually figure
> > out what's going on, given "reordered to" isn't a full stack trace. So
> > if you're deep in some call hierarchy, seeing a random "reordered to"
> > line is quite useless. What I want to see, at the very least, is the ip
> > to the same function where the original access happened.
> > 
> > We could of course try and generate a full stack trace at "reordered
> > to", but this would entail
> > 
> > 	a) allocating 2x unsigned long[64] on the stack (or moving to
> > 	   static storage),
> > 	b) further increasing the report length,
> > 	c) an even larger number of possibly distinct reports for the
> > 	   same issue; this makes deduplication even harder.
> > 
> > The reason I couldn't justify all that is that when I looked through
> > several dozen "reordered to" reports, I never found anything other than
> > the ip in the function frame of the original access useful. That, and in
> > most cases the "reordered to" location was in the same function or in an
> > inlined function.
> > 
> > The below patch would do what you'd want I think.
> > 
> > My opinion is to err on the side of simplicity until there is evidence
> > we need it. Of course, if you have a compelling reason that we need it
> > from the beginning, happy to send it as a separate patch on top.
> > 
> > What do you think?
> > 
> 
> Totally agreed. It's better to keep it simple until people report that
> they want to see more information to resolve the issues. And thanks for
> looking into the "double stack traces", that looks good to me too.
> 
> For the original patch, feel free to add:
> 
> Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Thank you both!  I will add Boqun's Reviewed-by on the next rebase.

							Thanx, Paul

> Regards,
> Boqun
> 
> > Thanks,
> > -- Marco
> > 
> > ------ >8 ------
> > 
> > From: Marco Elver <elver@google.com>
> > Date: Mon, 6 Dec 2021 16:35:02 +0100
> > Subject: [PATCH] kcsan: Show full stack trace of reordered-to accesses
> > 
> > Change reports involving reordered accesses to show the full stack trace
> > of "reordered to" accesses. For example:
> > 
> >  | ==================================================================
> >  | BUG: KCSAN: data-race in test_kernel_wrong_memorder / test_kernel_wrong_memorder
> >  |
> >  | read-write to 0xffffffffc02d01e8 of 8 bytes by task 2481 on cpu 2:
> >  |  test_kernel_wrong_memorder+0x57/0x90
> >  |  access_thread+0xb7/0x100
> >  |  kthread+0x2ed/0x320
> >  |  ret_from_fork+0x22/0x30
> >  |
> >  | read-write (reordered) to 0xffffffffc02d01e8 of 8 bytes by task 2480 on cpu 0:
> >  |  test_kernel_wrong_memorder+0x57/0x90
> >  |  access_thread+0xb7/0x100
> >  |  kthread+0x2ed/0x320
> >  |  ret_from_fork+0x22/0x30
> >  |   |
> >  |   +-> reordered to: test_delay+0x31/0x110
> >  |                     test_kernel_wrong_memorder+0x80/0x90
> >  |
> >  | Reported by Kernel Concurrency Sanitizer on:
> >  | CPU: 0 PID: 2480 Comm: access_thread Not tainted 5.16.0-rc1+ #2
> >  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> >  | ==================================================================
> > 
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  kernel/kcsan/report.c | 33 +++++++++++++++++++++++----------
> >  1 file changed, 23 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> > index 67794404042a..a8317d5f5123 100644
> > --- a/kernel/kcsan/report.c
> > +++ b/kernel/kcsan/report.c
> > @@ -317,22 +317,29 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
> >  {
> >  	unsigned long symbolsize, offset;
> >  	unsigned long target_func;
> > -	int skip;
> > +	int skip, i;
> >  
> >  	if (kallsyms_lookup_size_offset(ip, &symbolsize, &offset))
> >  		target_func = ip - offset;
> >  	else
> >  		goto fallback;
> >  
> > -	for (skip = 0; skip < num_entries; ++skip) {
> > +	skip = get_stack_skipnr(stack_entries, num_entries);
> > +	for (i = 0; skip < num_entries; ++skip, ++i) {
> >  		unsigned long func = stack_entries[skip];
> >  
> >  		if (!kallsyms_lookup_size_offset(func, &symbolsize, &offset))
> >  			goto fallback;
> >  		func -= offset;
> >  
> > +		replaced[i] = stack_entries[skip];
> >  		if (func == target_func) {
> > -			*replaced = stack_entries[skip];
> > +			/*
> > +			 * There must be at least 1 entry left in the original
> > +			 * @stack_entries, so we know that we will never occupy
> > +			 * more than @num_entries - 1 of @replaced.
> > +			 */
> > +			replaced[i + 1] = 0;
> >  			stack_entries[skip] = ip;
> >  			return skip;
> >  		}
> > @@ -341,6 +348,7 @@ replace_stack_entry(unsigned long stack_entries[], int num_entries, unsigned lon
> >  fallback:
> >  	/* Should not happen; the resulting stack trace is likely misleading. */
> >  	WARN_ONCE(1, "Cannot find frame for %pS in stack trace", (void *)ip);
> > +	replaced[0] = 0;
> >  	return get_stack_skipnr(stack_entries, num_entries);
> >  }
> >  
> > @@ -365,11 +373,16 @@ static int sym_strcmp(void *addr1, void *addr2)
> >  }
> >  
> >  static void
> > -print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long reordered_to)
> > +print_stack_trace(unsigned long stack_entries[], int num_entries, unsigned long *reordered_to)
> >  {
> >  	stack_trace_print(stack_entries, num_entries, 0);
> > -	if (reordered_to)
> > -		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to);
> > +	if (reordered_to[0]) {
> > +		int i;
> > +
> > +		pr_err("  |\n  +-> reordered to: %pS\n", (void *)reordered_to[0]);
> > +		for (i = 1; i < NUM_STACK_ENTRIES && reordered_to[i]; ++i)
> > +			pr_err("                    %pS\n", (void *)reordered_to[i]);
> > +	}
> >  }
> >  
> >  static void print_verbose_info(struct task_struct *task)
> > @@ -390,12 +403,12 @@ static void print_report(enum kcsan_value_change value_change,
> >  			 struct other_info *other_info,
> >  			 u64 old, u64 new, u64 mask)
> >  {
> > -	unsigned long reordered_to = 0;
> > +	unsigned long reordered_to[NUM_STACK_ENTRIES] = { 0 };
> >  	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> >  	int num_stack_entries = stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> > -	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, &reordered_to);
> > +	int skipnr = sanitize_stack_entries(stack_entries, num_stack_entries, ai->ip, reordered_to);
> >  	unsigned long this_frame = stack_entries[skipnr];
> > -	unsigned long other_reordered_to = 0;
> > +	unsigned long other_reordered_to[NUM_STACK_ENTRIES] = { 0 };
> >  	unsigned long other_frame = 0;
> >  	int other_skipnr = 0; /* silence uninit warnings */
> >  
> > @@ -408,7 +421,7 @@ static void print_report(enum kcsan_value_change value_change,
> >  	if (other_info) {
> >  		other_skipnr = sanitize_stack_entries(other_info->stack_entries,
> >  						      other_info->num_stack_entries,
> > -						      other_info->ai.ip, &other_reordered_to);
> > +						      other_info->ai.ip, other_reordered_to);
> >  		other_frame = other_info->stack_entries[other_skipnr];
> >  
> >  		/* @value_change is only known for the other thread */
> > -- 
> > 2.34.1.400.ga245620fadb-goog
> > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-12-06 17:39 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-30 11:44 [PATCH v3 00/25] kcsan: Support detecting a subset of missing memory barriers Marco Elver
2021-11-30 11:44 ` [PATCH v3 01/25] kcsan: Refactor reading of instrumented memory Marco Elver
2021-11-30 11:44 ` [PATCH v3 02/25] kcsan: Remove redundant zero-initialization of globals Marco Elver
2021-11-30 11:44 ` [PATCH v3 03/25] kcsan: Avoid checking scoped accesses from nested contexts Marco Elver
2021-11-30 11:44 ` [PATCH v3 04/25] kcsan: Add core support for a subset of weak memory modeling Marco Elver
2021-12-03  8:56   ` Marco Elver
2021-12-03 16:50     ` Paul E. McKenney
2021-12-03 21:08       ` Paul E. McKenney
2021-12-03 23:42         ` Marco Elver
2021-12-03 23:42         ` Paul E. McKenney
2021-12-03 23:45           ` Marco Elver
2021-12-04  1:14             ` Paul E. McKenney
2021-11-30 11:44 ` [PATCH v3 05/25] kcsan: Add core memory barrier instrumentation functions Marco Elver
2021-11-30 11:44 ` [PATCH v3 06/25] kcsan, kbuild: Add option for barrier instrumentation only Marco Elver
2021-11-30 11:44 ` [PATCH v3 07/25] kcsan: Call scoped accesses reordered in reports Marco Elver
2021-11-30 11:44 ` [PATCH v3 08/25] kcsan: Show location access was reordered to Marco Elver
2021-12-06  5:03   ` Boqun Feng
2021-12-06  7:16     ` Marco Elver
2021-12-06 14:31       ` Boqun Feng
2021-12-06 16:04         ` Marco Elver
2021-12-06 17:16           ` Boqun Feng
2021-12-06 17:38             ` Paul E. McKenney
2021-11-30 11:44 ` [PATCH v3 09/25] kcsan: Document modeling of weak memory Marco Elver
2021-11-30 11:44 ` [PATCH v3 10/25] kcsan: test: Match reordered or normal accesses Marco Elver
2021-11-30 11:44 ` [PATCH v3 11/25] kcsan: test: Add test cases for memory barrier instrumentation Marco Elver
2021-11-30 11:44 ` [PATCH v3 12/25] kcsan: Ignore GCC 11+ warnings about TSan runtime support Marco Elver
2021-11-30 11:44 ` [PATCH v3 13/25] kcsan: selftest: Add test case to check memory barrier instrumentation Marco Elver
2021-11-30 11:44 ` [PATCH v3 14/25] locking/barriers, kcsan: Add instrumentation for barriers Marco Elver
2021-11-30 11:44 ` [PATCH v3 15/25] locking/barriers, kcsan: Support generic instrumentation Marco Elver
2021-11-30 11:44 ` [PATCH v3 16/25] locking/atomics, kcsan: Add instrumentation for barriers Marco Elver
2021-11-30 11:44 ` [PATCH v3 17/25] asm-generic/bitops, " Marco Elver
2021-11-30 11:44 ` [PATCH v3 18/25] x86/barriers, kcsan: Use generic instrumentation for non-smp barriers Marco Elver
2021-11-30 11:44 ` [PATCH v3 19/25] x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock() Marco Elver
2021-11-30 11:44 ` [PATCH v3 20/25] mm, kcsan: Enable barrier instrumentation Marco Elver
2021-11-30 11:44 ` [PATCH v3 21/25] sched, kcsan: Enable memory " Marco Elver
2021-11-30 11:44 ` [PATCH v3 22/25] objtool, kcsan: Add memory barrier instrumentation to whitelist Marco Elver
2021-11-30 11:44 ` [PATCH v3 23/25] objtool, kcsan: Remove memory barrier instrumentation from noinstr Marco Elver
2021-11-30 11:44 ` [PATCH v3 24/25] compiler_attributes.h: Add __disable_sanitizer_instrumentation Marco Elver
2021-11-30 11:44 ` [PATCH v3 25/25] kcsan: Support WEAK_MEMORY with Clang where no objtool support exists Marco Elver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).