LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/4] locking/rwbase: Assorted fixes
@ 2021-09-09 10:59 Peter Zijlstra
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 10:59 UTC (permalink / raw)
  To: tglx, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira


Hai, these here patches are what I've got based on Boqun's review.
The plan is to stuff in them in locking/urgent somewhere tomorrow.

Please holler if you see anything that ought to make me reconsider this :-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 10:59 [PATCH 0/4] locking/rwbase: Assorted fixes Peter Zijlstra
@ 2021-09-09 10:59 ` Peter Zijlstra
  2021-09-09 13:45   ` Will Deacon
                     ` (2 more replies)
  2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 10:59 UTC (permalink / raw)
  To: tglx, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

While looking at current_save_and_set_rtlock_wait_state() I'm thinking
it really ought to use smp_store_mb(), because something like:

	current_save_and_set_rtlock_wait_state();
	for (;;) {
		if (try_lock())
			break;

		raw_spin_unlock_irq(&lock->wait_lock);
		schedule();
		raw_spin_lock_irq(&lock->wait_lock);

		set_current_state(TASK_RTLOCK_WAIT);
	}
	current_restore_rtlock_saved_state();

which is the advertised usage in the comment, is actually broken,
since trylock() will only need a load-acquire in general and that
could be re-ordered against the state store, which could lead to a
missed wakeup -> BAD (tm).

While there, make them consistent with the IRQ usage in
set_special_state().

Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -245,7 +245,8 @@ struct task_group;
  *		if (try_lock())
  *			break;
  *		raw_spin_unlock_irq(&lock->wait_lock);
- *		schedule_rtlock();
+ *		if (!cond)
+ *			schedule_rtlock();
  *		raw_spin_lock_irq(&lock->wait_lock);
  *		set_current_state(TASK_RTLOCK_WAIT);
  *	}
@@ -253,22 +254,24 @@ struct task_group;
  */
 #define current_save_and_set_rtlock_wait_state()			\
 	do {								\
-		lockdep_assert_irqs_disabled();				\
-		raw_spin_lock(&current->pi_lock);			\
+		unsigned long flags; /* may shadow */			\
+									\
+		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		current->saved_state = current->__state;		\
 		debug_rtlock_wait_set_state();				\
-		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
-		raw_spin_unlock(&current->pi_lock);			\
+		smp_store_mb(current->__state, TASK_RTLOCK_WAIT);	\
+		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0);
 
 #define current_restore_rtlock_saved_state()				\
 	do {								\
-		lockdep_assert_irqs_disabled();				\
-		raw_spin_lock(&current->pi_lock);			\
+		unsigned long flags; /* may shadow */			\
+									\
+		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		debug_rtlock_wait_restore_state();			\
 		WRITE_ONCE(current->__state, current->saved_state);	\
 		current->saved_state = TASK_RUNNING;			\
-		raw_spin_unlock(&current->pi_lock);			\
+		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0);
 
 #define get_current_state()	READ_ONCE(current->__state)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state()
  2021-09-09 10:59 [PATCH 0/4] locking/rwbase: Assorted fixes Peter Zijlstra
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
@ 2021-09-09 10:59 ` Peter Zijlstra
  2021-09-09 13:53   ` Will Deacon
                     ` (2 more replies)
  2021-09-09 10:59 ` [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock() Peter Zijlstra
  2021-09-09 10:59 ` [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader Peter Zijlstra
  3 siblings, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 10:59 UTC (permalink / raw)
  To: tglx, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

Noticed while looking at the readers race.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwbase_rt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -220,7 +220,7 @@ static int __sched rwbase_write_lock(str
 	for (; atomic_read(&rwb->readers);) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
-			__set_current_state(TASK_RUNNING);
+			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			return -EINTR;
 		}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock()
  2021-09-09 10:59 [PATCH 0/4] locking/rwbase: Assorted fixes Peter Zijlstra
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
  2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
@ 2021-09-09 10:59 ` Peter Zijlstra
  2021-09-14  7:45   ` Thomas Gleixner
  2021-09-09 10:59 ` [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader Peter Zijlstra
  3 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 10:59 UTC (permalink / raw)
  To: tglx, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

Boqun noticed that the write-trylock sequence of load+set is broken in
rwbase_write_lock()'s wait-loop since they're not both under the same
wait_lock instance.

Restructure the code to make this more obvious and correct.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwbase_rt.c |   44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -196,6 +196,19 @@ static inline void rwbase_write_downgrad
 	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
 }
 
+static inline bool __rwbase_write_trylock(struct rwbase_rt *rwb)
+{
+	/* Can do without CAS because we're serialized by wait_lock. */
+	lockdep_assert_held(&rwb->rtmutex.wait_lock);
+
+	if (!atomic_read(&rwb->readers)) {
+		atomic_set(&rwb->readers, WRITER_BIAS);
+		return 1;
+	}
+
+	return 0;
+}
+
 static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 				     unsigned int state)
 {
@@ -210,34 +223,30 @@ static int __sched rwbase_write_lock(str
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	/*
-	 * set_current_state() for rw_semaphore
-	 * current_save_and_set_rtlock_wait_state() for rwlock
-	 */
-	rwbase_set_and_save_current_state(state);
+	if (__rwbase_write_trylock(rwb))
+		goto out_unlock;
 
-	/* Block until all readers have left the critical section. */
-	for (; atomic_read(&rwb->readers);) {
+	rwbase_set_and_save_current_state(state);
+	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			return -EINTR;
 		}
+
+		if (__rwbase_write_trylock(rwb))
+			break;
+
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+		rwbase_schedule();
+		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 
-		/*
-		 * Schedule and wait for the readers to leave the critical
-		 * section. The last reader leaving it wakes the waiter.
-		 */
-		if (atomic_read(&rwb->readers) != 0)
-			rwbase_schedule();
 		set_current_state(state);
-		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	}
-
-	atomic_set(&rwb->readers, WRITER_BIAS);
 	rwbase_restore_current_state();
+
+out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	return 0;
 }
@@ -253,8 +262,7 @@ static inline int rwbase_write_trylock(s
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	if (!atomic_read(&rwb->readers)) {
-		atomic_set(&rwb->readers, WRITER_BIAS);
+	if (__rwbase_write_trylock(rwb)) {
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 		return 1;
 	}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader
  2021-09-09 10:59 [PATCH 0/4] locking/rwbase: Assorted fixes Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-09-09 10:59 ` [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock() Peter Zijlstra
@ 2021-09-09 10:59 ` Peter Zijlstra
  2021-09-14  7:46   ` Thomas Gleixner
  2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Boqun Feng
  3 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 10:59 UTC (permalink / raw)
  To: tglx, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

From: Boqun Feng <boqun.feng@gmail.com>

Readers of rwbase can lock and unlock without taking any inner lock, if
that happens, we need the ordering provided by atomic operations to
satisfy the ordering semantics of lock/unlock. Without that, considering
the follow case:

	{ X = 0 initially }

	CPU 0			CPU 1
	=====			=====
				rt_write_lock();
				X = 1
				rt_write_unlock():
				  atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
				  // ->readers is READER_BIAS.
	rt_read_lock():
	  if ((r = atomic_read(->readers)) < 0) // True
	    atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
	  <acquire the read lock via fast path>

	r1 = X;	// r1 may be 0, because nothing prevent the reordering
	        // of "X=1" and atomic_add() on CPU 1.

Therefore audit every usage of atomic operations that may happen in a
fast path, and add necessary barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwbase_rt.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -41,6 +41,12 @@
  * The risk of writer starvation is there, but the pathological use cases
  * which trigger it are not necessarily the typical RT workloads.
  *
+ * Fast-path orderings:
+ * The lock/unlock of readers can run in fast paths: lock and unlock are only
+ * atomic ops, and there is no inner lock to provide ACQUIRE and RELEASE
+ * semantics of rwbase_rt. Atomic ops should thus provide _acquire()
+ * and _release() (or stronger).
+ *
  * Common code shared between RT rw_semaphore and rwlock
  */
 
@@ -53,6 +59,7 @@ static __always_inline int rwbase_read_t
 	 * set.
 	 */
 	for (r = atomic_read(&rwb->readers); r < 0;) {
+		/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
 		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
 			return 1;
 	}
@@ -162,6 +169,8 @@ static __always_inline void rwbase_read_
 	/*
 	 * rwb->readers can only hit 0 when a writer is waiting for the
 	 * active readers to leave the critical section.
+	 *
+	 * dec_and_test() is fully ordered, provides RELEASE.
 	 */
 	if (unlikely(atomic_dec_and_test(&rwb->readers)))
 		__rwbase_read_unlock(rwb, state);
@@ -172,7 +181,11 @@ static inline void __rwbase_write_unlock
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 
-	atomic_add(READER_BIAS - bias, &rwb->readers);
+	/*
+	 * _release() is needed in case that reader is in fast path, pairing
+	 * with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
+	 */
+	(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	rwbase_rtmutex_unlock(rtm);
 }
@@ -201,7 +214,11 @@ static inline bool __rwbase_write_tryloc
 	/* Can do without CAS because we're serialized by wait_lock. */
 	lockdep_assert_held(&rwb->rtmutex.wait_lock);
 
-	if (!atomic_read(&rwb->readers)) {
+ 	/*
+	 * _acquire is needed in case the reader is in the fast path, pairing
+	 * with rwbase_read_unlock(), provides ACQUIRE.
+	 */
+	if (!atomic_read_acquire(&rwb->readers)) {
 		atomic_set(&rwb->readers, WRITER_BIAS);
 		return 1;
 	}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
@ 2021-09-09 13:45   ` Will Deacon
  2021-09-09 14:27     ` Peter Zijlstra
  2021-09-10 12:45   ` Sebastian Andrzej Siewior
  2021-09-13 22:08   ` Thomas Gleixner
  2 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2021-09-09 13:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because something like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
> 
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> which is the advertised usage in the comment, is actually broken,
> since trylock() will only need a load-acquire in general and that
> could be re-ordered against the state store, which could lead to a
> missed wakeup -> BAD (tm).

Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
order the state change before the successful try_lock? I'm just struggling
to envisage how this actually goes wrong.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state()
  2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
@ 2021-09-09 13:53   ` Will Deacon
  2021-09-14  7:31   ` Thomas Gleixner
  2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2021-09-09 13:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09, 2021 at 12:59:17PM +0200, Peter Zijlstra wrote:
> Noticed while looking at the readers race.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/locking/rwbase_rt.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -220,7 +220,7 @@ static int __sched rwbase_write_lock(str
>  	for (; atomic_read(&rwb->readers);) {
>  		/* Optimized out for rwlocks */
>  		if (rwbase_signal_pending_state(state, current)) {
> -			__set_current_state(TASK_RUNNING);
> +			rwbase_restore_current_state();
>  			__rwbase_write_unlock(rwb, 0, flags);
>  			return -EINTR;

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 13:45   ` Will Deacon
@ 2021-09-09 14:27     ` Peter Zijlstra
  2021-09-10 12:57       ` Will Deacon
  2021-09-12  3:57       ` Boqun Feng
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-09 14:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > it really ought to use smp_store_mb(), because something like:
> > 
> > 	current_save_and_set_rtlock_wait_state();
> > 	for (;;) {
> > 		if (try_lock())
> > 			break;
> > 
> > 		raw_spin_unlock_irq(&lock->wait_lock);
> > 		schedule();
> > 		raw_spin_lock_irq(&lock->wait_lock);
> > 
> > 		set_current_state(TASK_RTLOCK_WAIT);
> > 	}
> > 	current_restore_rtlock_saved_state();
> > 
> > which is the advertised usage in the comment, is actually broken,
> > since trylock() will only need a load-acquire in general and that
> > could be re-ordered against the state store, which could lead to a
> > missed wakeup -> BAD (tm).
> 
> Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> order the state change before the successful try_lock? I'm just struggling
> to envisage how this actually goes wrong.

Moo yes, so the earlier changelog I wrote was something like:

	current_save_and_set_rtlock_wait_state();
	for (;;) {
		if (try_lock())
			break;

		raw_spin_unlock_irq(&lock->wait_lock);
		if (!cond)
			schedule();
		raw_spin_lock_irq(&lock->wait_lock);

		set_current_state(TASK_RTLOCK_WAIT);
	}
	current_restore_rtlock_saved_state();

which is more what the code looks like before these patches, and in that
case the @cond load can be lifted before __state.

It all sorta works in the current application because most things are
serialized by ->wait_lock, but given the 'normal' wait pattern I got
highly suspicious of there not being a full barrier around.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
  2021-09-09 13:45   ` Will Deacon
@ 2021-09-10 12:45   ` Sebastian Andrzej Siewior
  2021-09-13 22:08   ` Thomas Gleixner
  2 siblings, 0 replies; 27+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-10 12:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Will Deacon, Waiman Long,
	Mike Galbraith, Daniel Bristot de Oliveira

On 2021-09-09 12:59:16 [+0200], Peter Zijlstra wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -253,22 +254,24 @@ struct task_group;
>   */
>  #define current_save_and_set_rtlock_wait_state()			\
>  	do {								\
> -		lockdep_assert_irqs_disabled();				\
> -		raw_spin_lock(&current->pi_lock);			\
> +		unsigned long flags; /* may shadow */			\
could we haz __flags so no shadow?

> +									\
> +		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
>  		current->saved_state = current->__state;		\
>  		debug_rtlock_wait_set_state();				\
> -		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
> -		raw_spin_unlock(&current->pi_lock);			\
> +		smp_store_mb(current->__state, TASK_RTLOCK_WAIT);	\
> +		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\

Sebastian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 14:27     ` Peter Zijlstra
@ 2021-09-10 12:57       ` Will Deacon
  2021-09-10 13:17         ` Peter Zijlstra
  2021-09-12  3:57       ` Boqun Feng
  1 sibling, 1 reply; 27+ messages in thread
From: Will Deacon @ 2021-09-10 12:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > > 
> > > 	current_save_and_set_rtlock_wait_state();
> > > 	for (;;) {
> > > 		if (try_lock())
> > > 			break;
> > > 
> > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > 		schedule();
> > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > 
> > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > 	}
> > > 	current_restore_rtlock_saved_state();
> > > 
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> > 
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
> 
> Moo yes, so the earlier changelog I wrote was something like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
> 
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		if (!cond)
> 			schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.

Ah, so that makes more sense, thanks. I can't see how the try_lock() could
be reordered though, as it's going to have to do an atomic rmw.

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-10 12:57       ` Will Deacon
@ 2021-09-10 13:17         ` Peter Zijlstra
  2021-09-10 14:01           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-10 13:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > Moo yes, so the earlier changelog I wrote was something like:
> > 
> > 	current_save_and_set_rtlock_wait_state();
> > 	for (;;) {
> > 		if (try_lock())
> > 			break;
> > 
> > 		raw_spin_unlock_irq(&lock->wait_lock);
> > 		if (!cond)
> > 			schedule();
> > 		raw_spin_lock_irq(&lock->wait_lock);
> > 
> > 		set_current_state(TASK_RTLOCK_WAIT);
> > 	}
> > 	current_restore_rtlock_saved_state();
> > 
> > which is more what the code looks like before these patches, and in that
> > case the @cond load can be lifted before __state.
> 
> Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> be reordered though, as it's going to have to do an atomic rmw.

OK, lemme go update the Changelog and make it __flags for bigeasy :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-10 13:17         ` Peter Zijlstra
@ 2021-09-10 14:01           ` Peter Zijlstra
  2021-09-10 15:06             ` Will Deacon
  2021-09-10 16:07             ` Waiman Long
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-10 14:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > > Moo yes, so the earlier changelog I wrote was something like:
> > > 
> > > 	current_save_and_set_rtlock_wait_state();
> > > 	for (;;) {
> > > 		if (try_lock())
> > > 			break;
> > > 
> > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > 		if (!cond)
> > > 			schedule();
> > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > 
> > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > 	}
> > > 	current_restore_rtlock_saved_state();
> > > 
> > > which is more what the code looks like before these patches, and in that
> > > case the @cond load can be lifted before __state.
> > 
> > Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> > be reordered though, as it's going to have to do an atomic rmw.
> 
> OK, lemme go update the Changelog and make it __flags for bigeasy :-)

The patch now reads:

---
Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 09 Sep 2021 12:59:16 +0200

While looking at current_save_and_set_rtlock_wait_state() I'm thinking
it really ought to use smp_store_mb(), because using it for a more
traditional wait loop like:

	current_save_and_set_rtlock_wait_state();
	for (;;) {
		if (cond)
			schedule();

		set_current_state(TASK_RTLOCK_WAIT);
	}
	current_restore_rtlock_saved_state();

is actually broken, since the cond load could be re-ordered against
the state store, which could lead to a missed wakeup -> BAD (tm).

While there, make them consistent with the IRQ usage in
set_special_state().

Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210909110203.767330253@infradead.org
---
 include/linux/sched.h |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -218,7 +218,7 @@ struct task_group;
  */
 #define set_special_state(state_value)					\
 	do {								\
-		unsigned long flags; /* may shadow */			\
+		unsigned long __flags; /* may shadow */			\
 									\
 		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		debug_special_state_change((state_value));		\
@@ -245,7 +245,8 @@ struct task_group;
  *		if (try_lock())
  *			break;
  *		raw_spin_unlock_irq(&lock->wait_lock);
- *		schedule_rtlock();
+ *		if (!cond)
+ *			schedule_rtlock();
  *		raw_spin_lock_irq(&lock->wait_lock);
  *		set_current_state(TASK_RTLOCK_WAIT);
  *	}
@@ -253,22 +254,24 @@ struct task_group;
  */
 #define current_save_and_set_rtlock_wait_state()			\
 	do {								\
-		lockdep_assert_irqs_disabled();				\
-		raw_spin_lock(&current->pi_lock);			\
+		unsigned long __flags; /* may shadow */			\
+									\
+		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		current->saved_state = current->__state;		\
 		debug_rtlock_wait_set_state();				\
-		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
-		raw_spin_unlock(&current->pi_lock);			\
+		smp_store_mb(current->__state, TASK_RTLOCK_WAIT);	\
+		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0);
 
 #define current_restore_rtlock_saved_state()				\
 	do {								\
-		lockdep_assert_irqs_disabled();				\
-		raw_spin_lock(&current->pi_lock);			\
+		unsigned long __flags; /* may shadow */			\
+									\
+		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
 		debug_rtlock_wait_restore_state();			\
 		WRITE_ONCE(current->__state, current->saved_state);	\
 		current->saved_state = TASK_RUNNING;			\
-		raw_spin_unlock(&current->pi_lock);			\
+		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
 	} while (0);
 
 #define get_current_state()	READ_ONCE(current->__state)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-10 14:01           ` Peter Zijlstra
@ 2021-09-10 15:06             ` Will Deacon
  2021-09-10 16:07             ` Waiman Long
  1 sibling, 0 replies; 27+ messages in thread
From: Will Deacon @ 2021-09-10 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Fri, Sep 10, 2021 at 04:01:29PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> > > On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > > > Moo yes, so the earlier changelog I wrote was something like:
> > > > 
> > > > 	current_save_and_set_rtlock_wait_state();
> > > > 	for (;;) {
> > > > 		if (try_lock())
> > > > 			break;
> > > > 
> > > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > > 		if (!cond)
> > > > 			schedule();
> > > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > > 
> > > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > > 	}
> > > > 	current_restore_rtlock_saved_state();
> > > > 
> > > > which is more what the code looks like before these patches, and in that
> > > > case the @cond load can be lifted before __state.
> > > 
> > > Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> > > be reordered though, as it's going to have to do an atomic rmw.
> > 
> > OK, lemme go update the Changelog and make it __flags for bigeasy :-)
> 
> The patch now reads:
> 
> ---
> Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 09 Sep 2021 12:59:16 +0200
> 
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because using it for a more
> traditional wait loop like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (cond)
> 			schedule();
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> is actually broken, since the cond load could be re-ordered against
> the state store, which could lead to a missed wakeup -> BAD (tm).
> 
> While there, make them consistent with the IRQ usage in
> set_special_state().

Cheers, that's much better:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-10 14:01           ` Peter Zijlstra
  2021-09-10 15:06             ` Will Deacon
@ 2021-09-10 16:07             ` Waiman Long
  2021-09-10 17:09               ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Waiman Long @ 2021-09-10 16:07 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: tglx, boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Sebastian Andrzej Siewior,
	Mike Galbraith, Daniel Bristot de Oliveira

On 9/10/21 10:01 AM, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
>> On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
>>> On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
>>>> Moo yes, so the earlier changelog I wrote was something like:
>>>>
>>>> 	current_save_and_set_rtlock_wait_state();
>>>> 	for (;;) {
>>>> 		if (try_lock())
>>>> 			break;
>>>>
>>>> 		raw_spin_unlock_irq(&lock->wait_lock);
>>>> 		if (!cond)
>>>> 			schedule();
>>>> 		raw_spin_lock_irq(&lock->wait_lock);
>>>>
>>>> 		set_current_state(TASK_RTLOCK_WAIT);
>>>> 	}
>>>> 	current_restore_rtlock_saved_state();
>>>>
>>>> which is more what the code looks like before these patches, and in that
>>>> case the @cond load can be lifted before __state.
>>> Ah, so that makes more sense, thanks. I can't see how the try_lock() could
>>> be reordered though, as it's going to have to do an atomic rmw.
>> OK, lemme go update the Changelog and make it __flags for bigeasy :-)
> The patch now reads:
>
> ---
> Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 09 Sep 2021 12:59:16 +0200
>
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because using it for a more
> traditional wait loop like:
>
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (cond)
> 			schedule();
>
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
>
> is actually broken, since the cond load could be re-ordered against
> the state store, which could lead to a missed wakeup -> BAD (tm).
>
> While there, make them consistent with the IRQ usage in
> set_special_state().
>
> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20210909110203.767330253@infradead.org
> ---
>   include/linux/sched.h |   21 ++++++++++++---------
>   1 file changed, 12 insertions(+), 9 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -218,7 +218,7 @@ struct task_group;
>    */
>   #define set_special_state(state_value)					\
>   	do {								\
> -		unsigned long flags; /* may shadow */			\
> +		unsigned long __flags; /* may shadow */			\

Do you still need the "may shadow" comment?

Cheers,
Longman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-10 16:07             ` Waiman Long
@ 2021-09-10 17:09               ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-10 17:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, tglx, boqun.feng, linux-kernel, Ingo Molnar,
	Juri Lelli, Steven Rostedt, Davidlohr Bueso,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Fri, Sep 10, 2021 at 12:07:40PM -0400, Waiman Long wrote:
> >   #define set_special_state(state_value)					\
> >   	do {								\
> > -		unsigned long flags; /* may shadow */			\
> > +		unsigned long __flags; /* may shadow */			\
> 
> Do you still need the "may shadow" comment?

Strictly speaking yes, there _could_ be a local variable called __flags,
however unlikely.

At some point someone was going around and doing dodgy patches based on
compiler warnings about variables being shadowed, this it to preempt
anybody trying to 'fix' this (again).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 14:27     ` Peter Zijlstra
  2021-09-10 12:57       ` Will Deacon
@ 2021-09-12  3:57       ` Boqun Feng
  1 sibling, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2021-09-12  3:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, tglx, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > > 
> > > 	current_save_and_set_rtlock_wait_state();
> > > 	for (;;) {
> > > 		if (try_lock())
> > > 			break;
> > > 
> > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > 		schedule();
> > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > 
> > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > 	}
> > > 	current_restore_rtlock_saved_state();
> > > 
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> > 
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
> 
> Moo yes, so the earlier changelog I wrote was something like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
> 
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		if (!cond)
> 			schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.
> 
> It all sorta works in the current application because most things are
> serialized by ->wait_lock, but given the 'normal' wait pattern I got
> highly suspicious of there not being a full barrier around.

Hmm.. I think ->pi_lock actually protects us here. IIUC, a mising
wake-up would happen if try_to_wake_up() failed to observe the __state
change by the about-to-wait task, and the about-to-wait task didn't
observe the condition set by the waker task, for example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
	...
	current->__state = TASK_RTLOCK_WAIT
	...
	if (!cond) // !cond is true because of memory reordering
	  schedule(); // sleep, and may not be waken up again.

But let's add ->pi_lock critical sections into the example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  raw_spin_lock_irqsave(->pi_lock,...);
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
					  raw_spin_unlock_irqrestore(->pi_lock,...); // A
	...
	raw_spin_lock_irqsave(->pi_lock, ...); // B
	current->__state = TASK_RTLOCK_WAIT
	raw_spin_unlock_irqrestore(->pi_lock, ...);
	if (!cond)
	  schedule();

Now the read of cond on TASK0 must observe the store of cond on TASK1,
because accesses to __state is serialized by ->pi_lock, so if TASK1's
read to __state didn't observe the write of TASK0 to __state, then the
lock B must read from the unlock A (or another unlock co-after A),
then we have a release-acquire pair to guarantee that the read of cond
on TASK0 sees the write of cond on TASK1. Simplify this by a litmus
test below:

	C unlock-lock
	{
	}

	P0(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		spin_lock(s);
		WRITE_ONCE(*state, 1);
		spin_unlock(s);
		r1 = READ_ONCE(*cond);
	}

	P1(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		WRITE_ONCE(*cond, 1);
		spin_lock(s);
		r1 = READ_ONCE(*state);
		spin_unlock(s);
	}

	exists (0:r1=0 /\ 1:r1=0)

and result is:

	Test unlock-lock Allowed
	States 3
	0:r1=0; 1:r1=1;
	0:r1=1; 1:r1=0;
	0:r1=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r1=0 /\ 1:r1=0)
	Observation unlock-lock Never 0 3
	Time unlock-lock 0.01
	Hash=e1f914505f07e380405f65d3b0fb6940

In short, since we write to the __state with ->pi_lock held, I don't
think we need to smp_store_mb() for __state. But maybe I'm missing
something subtle here ;-)

Regards,
Boqun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
  2021-09-09 13:45   ` Will Deacon
  2021-09-10 12:45   ` Sebastian Andrzej Siewior
@ 2021-09-13 22:08   ` Thomas Gleixner
  2021-09-13 22:52     ` Thomas Gleixner
  2 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-13 22:08 UTC (permalink / raw)
  To: Peter Zijlstra, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because something like:
>
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
>
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
>
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
>
> which is the advertised usage in the comment, is actually broken,
> since trylock() will only need a load-acquire in general and that
> could be re-ordered against the state store, which could lead to a
> missed wakeup -> BAD (tm).

I don't think so because both the state store and the wakeup are
serialized via tsk->pi_lock.

> While there, make them consistent with the IRQ usage in
> set_special_state().
>
> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/sched.h |   19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -245,7 +245,8 @@ struct task_group;
>   *		if (try_lock())
>   *			break;
>   *		raw_spin_unlock_irq(&lock->wait_lock);
> - *		schedule_rtlock();
> + *		if (!cond)
> + *			schedule_rtlock();

cond is not really relevant here.

>   *		raw_spin_lock_irq(&lock->wait_lock);
>   *		set_current_state(TASK_RTLOCK_WAIT);
>   *	}
> @@ -253,22 +254,24 @@ struct task_group;
>   */
>  #define current_save_and_set_rtlock_wait_state()			\
>  	do {								\
> -		lockdep_assert_irqs_disabled();				\
> -		raw_spin_lock(&current->pi_lock);			\
> +		unsigned long flags; /* may shadow */			\
> +									\
> +		raw_spin_lock_irqsave(&current->pi_lock, flags);	\

Why? This is solely for the rtlock use case which invokes this with
interrupts disabled. So why do we need that irqsave() overhead here?

>  		current->saved_state = current->__state;		\
>  		debug_rtlock_wait_set_state();				\
> -		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
> -		raw_spin_unlock(&current->pi_lock);			\
> +		smp_store_mb(current->__state, TASK_RTLOCK_WAIT);	\

The try_lock() does not matter at all here, really. All what matters is
that the unlocker cannot observe the wrong state and that's fully
serialized via tsk::pi_lock.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-13 22:08   ` Thomas Gleixner
@ 2021-09-13 22:52     ` Thomas Gleixner
  2021-09-14  6:45       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-13 22:52 UTC (permalink / raw)
  To: Peter Zijlstra, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Tue, Sep 14 2021 at 00:08, Thomas Gleixner wrote:

> On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
>> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
>> it really ought to use smp_store_mb(), because something like:
>>
>> 	current_save_and_set_rtlock_wait_state();
>> 	for (;;) {
>> 		if (try_lock())
>> 			break;
>>
>> 		raw_spin_unlock_irq(&lock->wait_lock);
>> 		schedule();
>> 		raw_spin_lock_irq(&lock->wait_lock);
>>
>> 		set_current_state(TASK_RTLOCK_WAIT);
>> 	}
>> 	current_restore_rtlock_saved_state();
>>
>> which is the advertised usage in the comment, is actually broken,
>> since trylock() will only need a load-acquire in general and that
>> could be re-ordered against the state store, which could lead to a
>> missed wakeup -> BAD (tm).
>
> I don't think so because both the state store and the wakeup are
> serialized via tsk->pi_lock.
>
>> While there, make them consistent with the IRQ usage in
>> set_special_state().
>>
>> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> ---
>>  include/linux/sched.h |   19 +++++++++++--------
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -245,7 +245,8 @@ struct task_group;
>>   *		if (try_lock())
>>   *			break;
>>   *		raw_spin_unlock_irq(&lock->wait_lock);
>> - *		schedule_rtlock();
>> + *		if (!cond)
>> + *			schedule_rtlock();
>
> cond is not really relevant here.
>
>>   *		raw_spin_lock_irq(&lock->wait_lock);
>>   *		set_current_state(TASK_RTLOCK_WAIT);
>>   *	}
>> @@ -253,22 +254,24 @@ struct task_group;
>>   */
>>  #define current_save_and_set_rtlock_wait_state()			\
>>  	do {								\
>> -		lockdep_assert_irqs_disabled();				\
>> -		raw_spin_lock(&current->pi_lock);			\
>> +		unsigned long flags; /* may shadow */			\
>> +									\
>> +		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
>
> Why? This is solely for the rtlock use case which invokes this with
> interrupts disabled. So why do we need that irqsave() overhead here?
>
>>  		current->saved_state = current->__state;		\
>>  		debug_rtlock_wait_set_state();				\
>> -		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
>> -		raw_spin_unlock(&current->pi_lock);			\
>> +		smp_store_mb(current->__state, TASK_RTLOCK_WAIT);	\
>
> The try_lock() does not matter at all here, really. All what matters is
> that the unlocker cannot observe the wrong state and that's fully
> serialized via tsk::pi_lock.

If your reasoning would be correct, then set_special_state() would be
broken as well.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
  2021-09-13 22:52     ` Thomas Gleixner
@ 2021-09-14  6:45       ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-14  6:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Tue, Sep 14, 2021 at 12:52:29AM +0200, Thomas Gleixner wrote:
> On Tue, Sep 14 2021 at 00:08, Thomas Gleixner wrote:

> If your reasoning would be correct, then set_special_state() would be
> broken as well.

I've dropped the patch.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state()
  2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
  2021-09-09 13:53   ` Will Deacon
@ 2021-09-14  7:31   ` Thomas Gleixner
  2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-14  7:31 UTC (permalink / raw)
  To: Peter Zijlstra, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
> Noticed while looking at the readers race.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/locking/rwbase_rt.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -220,7 +220,7 @@ static int __sched rwbase_write_lock(str
>  	for (; atomic_read(&rwb->readers);) {
>  		/* Optimized out for rwlocks */
>  		if (rwbase_signal_pending_state(state, current)) {
> -			__set_current_state(TASK_RUNNING);
> +			rwbase_restore_current_state();

Right, that's functionally equivalent and makes the code more consistent.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock()
  2021-09-09 10:59 ` [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock() Peter Zijlstra
@ 2021-09-14  7:45   ` Thomas Gleixner
  2021-09-14 13:59     ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-14  7:45 UTC (permalink / raw)
  To: Peter Zijlstra, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:

> Boqun noticed that the write-trylock sequence of load+set is broken in
> rwbase_write_lock()'s wait-loop since they're not both under the same
> wait_lock instance.

Confused.

lock(); A

for (; atomic_read(readers);) {
   ...
   unlock();
   ..
   lock(); B
}

atomic_set();
unlock(); A or B

The read/set is always in the same lock instance.

> Restructure the code to make this more obvious and correct.

I agree that it's easier to read, but I disagree that it makes the code
more correct.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader
  2021-09-09 10:59 ` [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader Peter Zijlstra
@ 2021-09-14  7:46   ` Thomas Gleixner
  2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Boqun Feng
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-14  7:46 UTC (permalink / raw)
  To: Peter Zijlstra, boqun.feng
  Cc: linux-kernel, peterz, Ingo Molnar, Juri Lelli, Steven Rostedt,
	Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
>
> Readers of rwbase can lock and unlock without taking any inner lock, if
> that happens, we need the ordering provided by atomic operations to
> satisfy the ordering semantics of lock/unlock. Without that, considering
> the follow case:
>
> 	{ X = 0 initially }
>
> 	CPU 0			CPU 1
> 	=====			=====
> 				rt_write_lock();
> 				X = 1
> 				rt_write_unlock():
> 				  atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
> 				  // ->readers is READER_BIAS.
> 	rt_read_lock():
> 	  if ((r = atomic_read(->readers)) < 0) // True
> 	    atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
> 	  <acquire the read lock via fast path>
>
> 	r1 = X;	// r1 may be 0, because nothing prevent the reordering
> 	        // of "X=1" and atomic_add() on CPU 1.
>
> Therefore audit every usage of atomic operations that may happen in a
> fast path, and add necessary barriers.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/locking/rwbase_rt.c |   21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -41,6 +41,12 @@
>   * The risk of writer starvation is there, but the pathological use cases
>   * which trigger it are not necessarily the typical RT workloads.
>   *
> + * Fast-path orderings:
> + * The lock/unlock of readers can run in fast paths: lock and unlock are only
> + * atomic ops, and there is no inner lock to provide ACQUIRE and RELEASE
> + * semantics of rwbase_rt. Atomic ops should thus provide _acquire()
> + * and _release() (or stronger).
> + *
>   * Common code shared between RT rw_semaphore and rwlock
>   */
>  
> @@ -53,6 +59,7 @@ static __always_inline int rwbase_read_t
>  	 * set.
>  	 */
>  	for (r = atomic_read(&rwb->readers); r < 0;) {
> +		/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
>  		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
>  			return 1;
>  	}
> @@ -162,6 +169,8 @@ static __always_inline void rwbase_read_
>  	/*
>  	 * rwb->readers can only hit 0 when a writer is waiting for the
>  	 * active readers to leave the critical section.
> +	 *
> +	 * dec_and_test() is fully ordered, provides RELEASE.
>  	 */
>  	if (unlikely(atomic_dec_and_test(&rwb->readers)))
>  		__rwbase_read_unlock(rwb, state);
> @@ -172,7 +181,11 @@ static inline void __rwbase_write_unlock
>  {
>  	struct rt_mutex_base *rtm = &rwb->rtmutex;
>  
> -	atomic_add(READER_BIAS - bias, &rwb->readers);
> +	/*
> +	 * _release() is needed in case that reader is in fast path, pairing
> +	 * with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
> +	 */
> +	(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
>  	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
>  	rwbase_rtmutex_unlock(rtm);
>  }
> @@ -201,7 +214,11 @@ static inline bool __rwbase_write_tryloc
>  	/* Can do without CAS because we're serialized by wait_lock. */
>  	lockdep_assert_held(&rwb->rtmutex.wait_lock);
>  
> -	if (!atomic_read(&rwb->readers)) {
> + 	/*
> +	 * _acquire is needed in case the reader is in the fast path, pairing
> +	 * with rwbase_read_unlock(), provides ACQUIRE.
> +	 */
> +	if (!atomic_read_acquire(&rwb->readers)) {
>  		atomic_set(&rwb->readers, WRITER_BIAS);
>  		return 1;
>  	}

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock()
  2021-09-14  7:45   ` Thomas Gleixner
@ 2021-09-14 13:59     ` Peter Zijlstra
  2021-09-14 15:00       ` Thomas Gleixner
  2021-09-16 11:59       ` [tip: locking/urgent] locking/rwbase: Extract __rwbase_write_trylock() tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-09-14 13:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Tue, Sep 14, 2021 at 09:45:12AM +0200, Thomas Gleixner wrote:
> On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
> 
> > Boqun noticed that the write-trylock sequence of load+set is broken in
> > rwbase_write_lock()'s wait-loop since they're not both under the same
> > wait_lock instance.
> 
> Confused.
> 
> lock(); A
> 
> for (; atomic_read(readers);) {
>    ...
>    unlock();
>    ..
>    lock(); B
> }
> 
> atomic_set();
> unlock(); A or B
> 
> The read/set is always in the same lock instance.

I really did make a mess of things didn't I :-/ It was some intermediate
state that was broken.

How's this then?

---
Subject: locking/rwbase: Extract __rwbase_write_trylock()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 09 Sep 2021 12:59:18 +0200

The code in rwbase_write_lock() is a little non-obvious vs the
read+set 'trylock', extract the sequence into a helper function to
clarify the code.

This also provides a single site to fix fast-path ordering.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rwbase_rt.c |   44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -196,6 +196,19 @@ static inline void rwbase_write_downgrad
 	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
 }
 
+static inline bool __rwbase_write_trylock(struct rwbase_rt *rwb)
+{
+	/* Can do without CAS because we're serialized by wait_lock. */
+	lockdep_assert_held(&rwb->rtmutex.wait_lock);
+
+	if (!atomic_read(&rwb->readers)) {
+		atomic_set(&rwb->readers, WRITER_BIAS);
+		return 1;
+	}
+
+	return 0;
+}
+
 static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 				     unsigned int state)
 {
@@ -210,34 +223,30 @@ static int __sched rwbase_write_lock(str
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	/*
-	 * set_current_state() for rw_semaphore
-	 * current_save_and_set_rtlock_wait_state() for rwlock
-	 */
-	rwbase_set_and_save_current_state(state);
+	if (__rwbase_write_trylock(rwb))
+		goto out_unlock;
 
-	/* Block until all readers have left the critical section. */
-	for (; atomic_read(&rwb->readers);) {
+	rwbase_set_and_save_current_state(state);
+	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			return -EINTR;
 		}
+
+		if (__rwbase_write_trylock(rwb))
+			break;
+
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+		rwbase_schedule();
+		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 
-		/*
-		 * Schedule and wait for the readers to leave the critical
-		 * section. The last reader leaving it wakes the waiter.
-		 */
-		if (atomic_read(&rwb->readers) != 0)
-			rwbase_schedule();
 		set_current_state(state);
-		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	}
-
-	atomic_set(&rwb->readers, WRITER_BIAS);
 	rwbase_restore_current_state();
+
+out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	return 0;
 }
@@ -253,8 +262,7 @@ static inline int rwbase_write_trylock(s
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	if (!atomic_read(&rwb->readers)) {
-		atomic_set(&rwb->readers, WRITER_BIAS);
+	if (__rwbase_write_trylock(rwb)) {
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 		return 1;
 	}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock()
  2021-09-14 13:59     ` Peter Zijlstra
@ 2021-09-14 15:00       ` Thomas Gleixner
  2021-09-16 11:59       ` [tip: locking/urgent] locking/rwbase: Extract __rwbase_write_trylock() tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2021-09-14 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: boqun.feng, linux-kernel, Ingo Molnar, Juri Lelli,
	Steven Rostedt, Davidlohr Bueso, Will Deacon, Waiman Long,
	Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira

On Tue, Sep 14 2021 at 15:59, Peter Zijlstra wrote:
> On Tue, Sep 14, 2021 at 09:45:12AM +0200, Thomas Gleixner wrote:
>> The read/set is always in the same lock instance.
>
> I really did make a mess of things didn't I :-/ It was some intermediate
> state that was broken.

Thinking about memory ordering can reorder your memory :)

> How's this then?
>
> ---
> Subject: locking/rwbase: Extract __rwbase_write_trylock()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 09 Sep 2021 12:59:18 +0200
>
> The code in rwbase_write_lock() is a little non-obvious vs the
> read+set 'trylock', extract the sequence into a helper function to
> clarify the code.
>
> This also provides a single site to fix fast-path ordering.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip: locking/urgent] locking/rwbase: Extract __rwbase_write_trylock()
  2021-09-14 13:59     ` Peter Zijlstra
  2021-09-14 15:00       ` Thomas Gleixner
@ 2021-09-16 11:59       ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-09-16 11:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     616be87eac9fa2ab2dca1069712f7236e50f3bf6
Gitweb:        https://git.kernel.org/tip/616be87eac9fa2ab2dca1069712f7236e50f3bf6
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 09 Sep 2021 12:59:18 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 15 Sep 2021 17:49:15 +02:00

locking/rwbase: Extract __rwbase_write_trylock()

The code in rwbase_write_lock() is a little non-obvious vs the
read+set 'trylock', extract the sequence into a helper function to
clarify the code.

This also provides a single site to fix fast-path ordering.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/YUCq3L+u44NDieEJ@hirez.programming.kicks-ass.net
---
 kernel/locking/rwbase_rt.c | 44 +++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 542b017..48fbbcc 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -196,6 +196,19 @@ static inline void rwbase_write_downgrade(struct rwbase_rt *rwb)
 	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
 }
 
+static inline bool __rwbase_write_trylock(struct rwbase_rt *rwb)
+{
+	/* Can do without CAS because we're serialized by wait_lock. */
+	lockdep_assert_held(&rwb->rtmutex.wait_lock);
+
+	if (!atomic_read(&rwb->readers)) {
+		atomic_set(&rwb->readers, WRITER_BIAS);
+		return 1;
+	}
+
+	return 0;
+}
+
 static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 				     unsigned int state)
 {
@@ -210,34 +223,30 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	/*
-	 * set_current_state() for rw_semaphore
-	 * current_save_and_set_rtlock_wait_state() for rwlock
-	 */
-	rwbase_set_and_save_current_state(state);
+	if (__rwbase_write_trylock(rwb))
+		goto out_unlock;
 
-	/* Block until all readers have left the critical section. */
-	for (; atomic_read(&rwb->readers);) {
+	rwbase_set_and_save_current_state(state);
+	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			return -EINTR;
 		}
+
+		if (__rwbase_write_trylock(rwb))
+			break;
+
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+		rwbase_schedule();
+		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 
-		/*
-		 * Schedule and wait for the readers to leave the critical
-		 * section. The last reader leaving it wakes the waiter.
-		 */
-		if (atomic_read(&rwb->readers) != 0)
-			rwbase_schedule();
 		set_current_state(state);
-		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	}
-
-	atomic_set(&rwb->readers, WRITER_BIAS);
 	rwbase_restore_current_state();
+
+out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	return 0;
 }
@@ -253,8 +262,7 @@ static inline int rwbase_write_trylock(struct rwbase_rt *rwb)
 	atomic_sub(READER_BIAS, &rwb->readers);
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
-	if (!atomic_read(&rwb->readers)) {
-		atomic_set(&rwb->readers, WRITER_BIAS);
+	if (__rwbase_write_trylock(rwb)) {
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 		return 1;
 	}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip: locking/urgent] locking/rwbase: Take care of ordering guarantee for fastpath reader
  2021-09-09 10:59 ` [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader Peter Zijlstra
  2021-09-14  7:46   ` Thomas Gleixner
@ 2021-09-16 11:59   ` tip-bot2 for Boqun Feng
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Boqun Feng @ 2021-09-16 11:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Boqun Feng, Peter Zijlstra (Intel), Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     81121524f1c798c9481bd7900450b72ee7ac2eef
Gitweb:        https://git.kernel.org/tip/81121524f1c798c9481bd7900450b72ee7ac2eef
Author:        Boqun Feng <boqun.feng@gmail.com>
AuthorDate:    Thu, 09 Sep 2021 12:59:19 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 15 Sep 2021 17:49:16 +02:00

locking/rwbase: Take care of ordering guarantee for fastpath reader

Readers of rwbase can lock and unlock without taking any inner lock, if
that happens, we need the ordering provided by atomic operations to
satisfy the ordering semantics of lock/unlock. Without that, considering
the follow case:

	{ X = 0 initially }

	CPU 0			CPU 1
	=====			=====
				rt_write_lock();
				X = 1
				rt_write_unlock():
				  atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
				  // ->readers is READER_BIAS.
	rt_read_lock():
	  if ((r = atomic_read(->readers)) < 0) // True
	    atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
	  <acquire the read lock via fast path>

	r1 = X;	// r1 may be 0, because nothing prevent the reordering
	        // of "X=1" and atomic_add() on CPU 1.

Therefore audit every usage of atomic operations that may happen in a
fast path, and add necessary barriers.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210909110203.953991276@infradead.org
---
 kernel/locking/rwbase_rt.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 48fbbcc..88191f6 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -41,6 +41,12 @@
  * The risk of writer starvation is there, but the pathological use cases
  * which trigger it are not necessarily the typical RT workloads.
  *
+ * Fast-path orderings:
+ * The lock/unlock of readers can run in fast paths: lock and unlock are only
+ * atomic ops, and there is no inner lock to provide ACQUIRE and RELEASE
+ * semantics of rwbase_rt. Atomic ops should thus provide _acquire()
+ * and _release() (or stronger).
+ *
  * Common code shared between RT rw_semaphore and rwlock
  */
 
@@ -53,6 +59,7 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
 	 * set.
 	 */
 	for (r = atomic_read(&rwb->readers); r < 0;) {
+		/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
 		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
 			return 1;
 	}
@@ -162,6 +169,8 @@ static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
 	/*
 	 * rwb->readers can only hit 0 when a writer is waiting for the
 	 * active readers to leave the critical section.
+	 *
+	 * dec_and_test() is fully ordered, provides RELEASE.
 	 */
 	if (unlikely(atomic_dec_and_test(&rwb->readers)))
 		__rwbase_read_unlock(rwb, state);
@@ -172,7 +181,11 @@ static inline void __rwbase_write_unlock(struct rwbase_rt *rwb, int bias,
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 
-	atomic_add(READER_BIAS - bias, &rwb->readers);
+	/*
+	 * _release() is needed in case that reader is in fast path, pairing
+	 * with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
+	 */
+	(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	rwbase_rtmutex_unlock(rtm);
 }
@@ -201,7 +214,11 @@ static inline bool __rwbase_write_trylock(struct rwbase_rt *rwb)
 	/* Can do without CAS because we're serialized by wait_lock. */
 	lockdep_assert_held(&rwb->rtmutex.wait_lock);
 
-	if (!atomic_read(&rwb->readers)) {
+	/*
+	 * _acquire is needed in case the reader is in the fast path, pairing
+	 * with rwbase_read_unlock(), provides ACQUIRE.
+	 */
+	if (!atomic_read_acquire(&rwb->readers)) {
 		atomic_set(&rwb->readers, WRITER_BIAS);
 		return 1;
 	}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip: locking/urgent] locking/rwbase: Properly match set_and_save_state() to restore_state()
  2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
  2021-09-09 13:53   ` Will Deacon
  2021-09-14  7:31   ` Thomas Gleixner
@ 2021-09-16 11:59   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-09-16 11:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, Will Deacon, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     7687201e37fabf2b7cf2b828f7ca46bf30e2948f
Gitweb:        https://git.kernel.org/tip/7687201e37fabf2b7cf2b828f7ca46bf30e2948f
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 09 Sep 2021 12:59:17 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 15 Sep 2021 17:49:15 +02:00

locking/rwbase: Properly match set_and_save_state() to restore_state()

Noticed while looking at the readers race.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20210909110203.828203010@infradead.org
---
 kernel/locking/rwbase_rt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 4ba1508..542b017 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -220,7 +220,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	for (; atomic_read(&rwb->readers);) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
-			__set_current_state(TASK_RUNNING);
+			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			return -EINTR;
 		}

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-09-16 11:59 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-09 10:59 [PATCH 0/4] locking/rwbase: Assorted fixes Peter Zijlstra
2021-09-09 10:59 ` [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state() Peter Zijlstra
2021-09-09 13:45   ` Will Deacon
2021-09-09 14:27     ` Peter Zijlstra
2021-09-10 12:57       ` Will Deacon
2021-09-10 13:17         ` Peter Zijlstra
2021-09-10 14:01           ` Peter Zijlstra
2021-09-10 15:06             ` Will Deacon
2021-09-10 16:07             ` Waiman Long
2021-09-10 17:09               ` Peter Zijlstra
2021-09-12  3:57       ` Boqun Feng
2021-09-10 12:45   ` Sebastian Andrzej Siewior
2021-09-13 22:08   ` Thomas Gleixner
2021-09-13 22:52     ` Thomas Gleixner
2021-09-14  6:45       ` Peter Zijlstra
2021-09-09 10:59 ` [PATCH 2/4] locking/rwbase: Properly match set_and_save_state() to restore_state() Peter Zijlstra
2021-09-09 13:53   ` Will Deacon
2021-09-14  7:31   ` Thomas Gleixner
2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra
2021-09-09 10:59 ` [PATCH 3/4] locking/rwbase: Fix rwbase_write_lock() vs __rwbase_read_lock() Peter Zijlstra
2021-09-14  7:45   ` Thomas Gleixner
2021-09-14 13:59     ` Peter Zijlstra
2021-09-14 15:00       ` Thomas Gleixner
2021-09-16 11:59       ` [tip: locking/urgent] locking/rwbase: Extract __rwbase_write_trylock() tip-bot2 for Peter Zijlstra
2021-09-09 10:59 ` [PATCH 4/4] locking/rwbase: Take care of ordering guarantee for fastpath reader Peter Zijlstra
2021-09-14  7:46   ` Thomas Gleixner
2021-09-16 11:59   ` [tip: locking/urgent] " tip-bot2 for Boqun Feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).