LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 3.14.25-rt22 0/2] rtmutex Real-Time Linux: fix kernel BUG at kernel/locking/rtmutex.c:997! and some optimization
@ 2015-02-20  1:31 Thavatchai Makphaibulchoke
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-02-20  1:31 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, Thavatchai Makphaibulchoke

This patch series compose of 2 patches.

First patch, fixing kernel BUG at kernel/locking/rtmutex.c:997!

Second patch, some code optimation in kernel/locking/rtmutex.c

Thavatchai Makphaibulchoke (2):
  rtmutex Real-Time Linux: Fixing kernel BUG at
    kernel/locking/rtmutex.c:997!
  kernel/locking/rtmutex.c: some code optimization

 include/linux/spinlock_rt.h     |   4 +
 kernel/locking/rtmutex-debug.c  |  15 ++-
 kernel/locking/rtmutex.c        | 248 ++++++++++++++++++++++++++++------------
 kernel/locking/rtmutex_common.h |  21 ++++
 kernel/timer.c                  |   4 +-
 5 files changed, 214 insertions(+), 78 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-20  1:31 [PATCH 3.14.25-rt22 0/2] rtmutex Real-Time Linux: fix kernel BUG at kernel/locking/rtmutex.c:997! and some optimization Thavatchai Makphaibulchoke
@ 2015-02-20  1:31 ` Thavatchai Makphaibulchoke
  2015-02-20  4:53   ` Steven Rostedt
  2015-02-23 18:37   ` Steven Rostedt
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke
  2015-04-07  1:26 ` [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
  2 siblings, 2 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-02-20  1:31 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, Thavatchai Makphaibulchoke

This patch fixes the problem that the ownership of a mutex acquired by an
interrupt handler(IH) gets incorrectly attributed to the interrupted thread.

This could result in an incorrect deadlock detection in function
rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
up to a system hang.

Here is the approach taken: when calling from an interrupt handler, instead of
attributing ownership to the interrupted task, use a reserved task_struct value
to indicate that the owner is a interrupt handler.  This approach avoids the
incorrect deadlock detection.

This also includes changes in several function in rtmutex.c now that the lock's
requester may be a interrupt handler, not a real task struct.  This impacts
the way how the lock is acquired and prioritized and decision whether to do
the house keeping functions required for a real task struct.

The reserved task_struct values for interrupt handler are

	current | 0x2

where current is the task_struct value of the interrupted task.

Since IH will both acquire and release the lock only during an interrupt
handling, during which current is not changed, the reserved task_struct value
for an IH should be distinct from another instances of IH on a different cpu.

Kernel version 3.14.25 + patch-3.14.25-rt22

Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
---
 include/linux/spinlock_rt.h     |   4 +
 kernel/locking/rtmutex-debug.c  |  15 ++-
 kernel/locking/rtmutex.c        | 212 ++++++++++++++++++++++++++++------------
 kernel/locking/rtmutex_common.h |  21 ++++
 kernel/timer.c                  |   4 +-
 5 files changed, 188 insertions(+), 68 deletions(-)

diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
index c0d1367..eeb4188 100644
--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -27,6 +27,7 @@ extern void __lockfunc rt_spin_unlock_wait(spinlock_t *lock);
 extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
 extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock);
 extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
+extern int __lockfunc rt_spin_trylock_in_interrupt(spinlock_t *lock);
 extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
 
 /*
@@ -52,6 +53,9 @@ extern int __lockfunc __rt_spin_trylock(struct rt_mutex *lock);
 
 #define spin_lock_irq(lock)		spin_lock(lock)
 
+#define spin_do_trylock_in_interrupt(lock)	\
+		__cond_lock(lock, rt_spin_trylock_in_interrupt(lock))
+
 #define spin_do_trylock(lock)		__cond_lock(lock, rt_spin_trylock(lock))
 
 #define spin_trylock(lock)			\
diff --git a/kernel/locking/rtmutex-debug.c b/kernel/locking/rtmutex-debug.c
index 49b2ed3..c36d629 100644
--- a/kernel/locking/rtmutex-debug.c
+++ b/kernel/locking/rtmutex-debug.c
@@ -40,6 +40,8 @@ static void printk_task(struct task_struct *p)
 
 static void printk_lock(struct rt_mutex *lock, int print_owner)
 {
+	struct task_struct *owner = rt_mutex_owner(lock);
+
 	if (lock->name)
 		printk(" [%p] {%s}\n",
 			lock, lock->name);
@@ -47,10 +49,13 @@ static void printk_lock(struct rt_mutex *lock, int print_owner)
 		printk(" [%p] {%s:%d}\n",
 			lock, lock->file, lock->line);
 
-	if (print_owner && rt_mutex_owner(lock)) {
+	if (print_owner && owner) {
 		printk(".. ->owner: %p\n", lock->owner);
 		printk(".. held by:  ");
-		printk_task(rt_mutex_owner(lock));
+		if (rt_mutex_owner_is_task(owner))
+			printk_task(owner);
+		else
+			printk(" an interrupt handler.");
 		printk("\n");
 	}
 }
@@ -76,6 +81,8 @@ void debug_rt_mutex_deadlock(int detect, struct rt_mutex_waiter *act_waiter,
 
 	task = rt_mutex_owner(act_waiter->lock);
 	if (task && task != current) {
+		/* Interrupt handler should not be deadlocking. */
+		BUG_ON(!rt_mutex_owner_is_task(task));
 		act_waiter->deadlock_task_pid = get_pid(task_pid(task));
 		act_waiter->deadlock_lock = lock;
 	}
@@ -138,7 +145,9 @@ void debug_rt_mutex_lock(struct rt_mutex *lock)
 
 void debug_rt_mutex_unlock(struct rt_mutex *lock)
 {
-	DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current);
+	DEBUG_LOCKS_WARN_ON(in_interrupt() ?
+		!rt_mutex_owner_is_task(rt_mutex_owner(lock)) :
+		rt_mutex_owner(lock) != current);
 }
 
 void
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6c40660..8b66f81 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -51,6 +51,9 @@
  * waiters. This can happen when grabbing the lock in the slow path.
  * To prevent a cmpxchg of the owner releasing the lock, we need to
  * set this bit before looking at the lock.
+ *
+ * owner can also be reserved value, INTERRUPT_HANDLER, in case the mutex
+ * is owned by an interrupt handler.
  */
 
 static void
@@ -366,10 +369,11 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 				      struct rt_mutex_waiter *orig_waiter,
 				      struct task_struct *top_task)
 {
-	struct rt_mutex *lock;
+	struct rt_mutex *lock = NULL;
 	struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
 	int detect_deadlock, ret = 0, depth = 0;
 	unsigned long flags;
+	struct task_struct *owner;
 
 	detect_deadlock = debug_rt_mutex_detect_deadlock(orig_waiter,
 							 deadlock_detect);
@@ -417,8 +421,14 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 	 * Check the orig_waiter state. After we dropped the locks,
 	 * the previous owner of the lock might have released the lock.
 	 */
-	if (orig_waiter && !rt_mutex_owner(orig_lock))
-		goto out_unlock_pi;
+	if (orig_waiter) {
+		struct task_struct *orig_owner;
+
+		WARN_ON(!orig_lock);
+		orig_owner = rt_mutex_owner(orig_lock);
+		if (!orig_owner || !rt_mutex_owner_is_task(orig_owner))
+			goto out_unlock_pi;
+	}
 
 	/*
 	 * We dropped all locks after taking a refcount on @task, so
@@ -484,16 +494,24 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 
 	/* Release the task */
 	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
-	if (!rt_mutex_owner(lock)) {
-		struct rt_mutex_waiter *lock_top_waiter;
-
-		/*
-		 * If the requeue above changed the top waiter, then we need
-		 * to wake the new top waiter up to try to get the lock.
-		 */
-		lock_top_waiter = rt_mutex_top_waiter(lock);
-		if (top_waiter != lock_top_waiter)
-			rt_mutex_wake_waiter(lock_top_waiter);
+	owner = rt_mutex_owner(lock);
+	/*
+	 * No need to continue if lock is either free or
+	 * owned by an interrupt handler.
+	 */
+	if (!owner || !rt_mutex_owner_is_task(owner)) {
+		if (!owner) {
+			struct rt_mutex_waiter *lock_top_waiter;
+
+			/*
+			 * If the lock is free and the requeue above changed the
+			 * top waiter, then we need to wake the new top waiter
+			 * up to try to get the lock.
+			 */
+			lock_top_waiter = rt_mutex_top_waiter(lock);
+			if (top_waiter != lock_top_waiter)
+				rt_mutex_wake_waiter(lock_top_waiter);
+		}
 		raw_spin_unlock(&lock->wait_lock);
 		goto out_put_task;
 	}
@@ -583,6 +601,8 @@ static int
 __try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 		       struct rt_mutex_waiter *waiter, int mode)
 {
+	int caller_is_task = rt_mutex_owner_is_task(task);
+	int has_waiters;
 	/*
 	 * We have to be careful here if the atomic speedups are
 	 * enabled, such that, when
@@ -613,43 +633,47 @@ __try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 	 * 2) higher priority than waiters
 	 * 3) it is top waiter
 	 */
-	if (rt_mutex_has_waiters(lock)) {
+	has_waiters = rt_mutex_has_waiters(lock);
+	if (has_waiters) {
 		struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
 
-		if (task != pown && !lock_is_stealable(task, pown, mode))
+		if (!caller_is_task || (task != pown &&
+			!lock_is_stealable(task, pown, mode)))
 			return 0;
 	}
 
 	/* We got the lock. */
 
-	if (waiter || rt_mutex_has_waiters(lock)) {
+	if (waiter || has_waiters) {
 		unsigned long flags;
 		struct rt_mutex_waiter *top;
 
-		raw_spin_lock_irqsave(&task->pi_lock, flags);
-
 		/* remove the queued waiter. */
-		if (waiter) {
+		if (waiter)
 			rt_mutex_dequeue(lock, waiter);
-			task->pi_blocked_on = NULL;
-		}
 
-		/*
-		 * We have to enqueue the top waiter(if it exists) into
-		 * task->pi_waiters list.
-		 */
-		if (rt_mutex_has_waiters(lock)) {
-			top = rt_mutex_top_waiter(lock);
-			rt_mutex_enqueue_pi(task, top);
+		if (caller_is_task) {
+			raw_spin_lock_irqsave(&task->pi_lock, flags);
+			if (waiter)
+				task->pi_blocked_on = NULL;
+			/*
+			 * We have to enqueue the top waiter(if it exists) into
+			 * task->pi_waiters list.
+			 */
+			if (rt_mutex_has_waiters(lock)) {
+				top = rt_mutex_top_waiter(lock);
+				rt_mutex_enqueue_pi(task, top);
+			}
+			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
 		}
-		raw_spin_unlock_irqrestore(&task->pi_lock, flags);
 	}
 
 	debug_rt_mutex_lock(lock);
 
 	rt_mutex_set_owner(lock, task);
 
-	rt_mutex_deadlock_account_lock(lock, task);
+	if (caller_is_task)
+		rt_mutex_deadlock_account_lock(lock, task);
 
 	return 1;
 }
@@ -723,14 +747,13 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 
 	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
 
-	if (!owner)
+	if (!owner || !rt_mutex_owner_is_task(owner))
 		return 0;
 
 	raw_spin_lock_irqsave(&owner->pi_lock, flags);
 	if (waiter == rt_mutex_top_waiter(lock)) {
 		rt_mutex_dequeue_pi(owner, top_waiter);
 		rt_mutex_enqueue_pi(owner, waiter);
-
 		__rt_mutex_adjust_prio(owner);
 		if (rt_mutex_real_waiter(owner->pi_blocked_on))
 			chain_walk = 1;
@@ -777,21 +800,27 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
  */
 static void wakeup_next_waiter(struct rt_mutex *lock)
 {
+	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *waiter;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&current->pi_lock, flags);
-
 	waiter = rt_mutex_top_waiter(lock);
 
-	/*
-	 * Remove it from current->pi_waiters. We do not adjust a
-	 * possible priority boost right now. We execute wakeup in the
-	 * boosted mode and go back to normal after releasing
-	 * lock->wait_lock.
-	 */
-	rt_mutex_dequeue_pi(current, waiter);
+	/* Check to make sure caller is not interrupt handler */
+	if (rt_mutex_owner_is_task(owner)) {
 
+		raw_spin_lock_irqsave(&owner->pi_lock, flags);
+
+		/*
+		 * Remove it from current->pi_waiters. We do not adjust a
+		 * possible priority boost right now. We execute wakeup in the
+		 * boosted mode and go back to normal after releasing
+		 * lock->wait_lock.
+		 */
+		rt_mutex_dequeue_pi(owner, waiter);
+
+		raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
+	}
 	/*
 	 * As we are waking up the top waiter, and the waiter stays
 	 * queued on the lock until it gets the lock, this lock
@@ -802,7 +831,6 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 	 */
 	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
 
-	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
 	/*
 	 * It's safe to dereference waiter as it cannot go away as
@@ -831,7 +859,8 @@ static void remove_waiter(struct rt_mutex *lock,
 	current->pi_blocked_on = NULL;
 	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
-	if (!owner)
+	/* Return if no owner or owned by interrupt handler */
+	if (!owner || !rt_mutex_owner_is_task(owner))
 		return;
 
 	if (first) {
@@ -902,6 +931,8 @@ void rt_mutex_adjust_pi(struct task_struct *task)
 static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 					 void  (*slowfn)(struct rt_mutex *lock))
 {
+	/* Might sleep, should not be called in interrupt context. */
+	BUG_ON(in_interrupt());
 	might_sleep();
 
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
@@ -911,12 +942,12 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
-					   void  (*slowfn)(struct rt_mutex *lock))
+	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
 	if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
 		rt_mutex_deadlock_account_unlock(current);
 	else
-		slowfn(lock);
+		slowfn(lock, current);
 }
 
 #ifdef CONFIG_SMP
@@ -937,10 +968,12 @@ static int adaptive_wait(struct rt_mutex *lock,
 		 * Ensure that owner->on_cpu is dereferenced _after_
 		 * checking the above to be valid.
 		 */
-		barrier();
-		if (!owner->on_cpu) {
-			res = 1;
-			break;
+		if (rt_mutex_owner_is_task(owner)) {
+			barrier();
+			if (!owner->on_cpu) {
+				res = 1;
+				break;
+			}
 		}
 		cpu_relax();
 	}
@@ -971,6 +1004,8 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	struct rt_mutex_waiter waiter, *top_waiter;
 	int ret;
 
+	/* Might sleep, should not be called in interrupt context. */
+	BUG_ON(in_interrupt());
 	rt_mutex_init_waiter(&waiter, true);
 
 	raw_spin_lock(&lock->wait_lock);
@@ -1008,6 +1043,10 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 
 		debug_rt_mutex_print_deadlock(&waiter);
 
+		/*
+		 * If lock is owned by interrupt handler, go ahead and
+		 * retry. Interrupt handler should complete soon.
+		 */
 		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
 			schedule_rt_mutex(lock);
 
@@ -1047,11 +1086,15 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 /*
  * Slow path to release a rt_mutex spin_lock style
  */
-static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
+static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock,
+	struct task_struct *task)
 {
+	int caller_is_task = rt_mutex_owner_is_task(task);
+
 	debug_rt_mutex_unlock(lock);
 
-	rt_mutex_deadlock_account_unlock(current);
+	if (caller_is_task)
+		rt_mutex_deadlock_account_unlock(task);
 
 	if (!rt_mutex_has_waiters(lock)) {
 		lock->owner = NULL;
@@ -1064,24 +1107,30 @@ static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
 	raw_spin_unlock(&lock->wait_lock);
 
 	/* Undo pi boosting.when necessary */
-	rt_mutex_adjust_prio(current);
+	if (caller_is_task)
+		rt_mutex_adjust_prio(task);
 }
 
-static void  noinline __sched rt_spin_lock_slowunlock(struct rt_mutex *lock)
+static noinline void __sched rt_spin_lock_slowunlock(struct rt_mutex *lock,
+	struct task_struct *task)
 {
 	raw_spin_lock(&lock->wait_lock);
-	__rt_spin_lock_slowunlock(lock);
+	__rt_spin_lock_slowunlock(lock, task);
 }
 
-static void  noinline __sched rt_spin_lock_slowunlock_hirq(struct rt_mutex *lock)
+static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,
+	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
 	int ret;
+	struct task_struct *intr_owner = rt_mutex_owner_intr_handler(current);
 
+	if (likely(rt_mutex_cmpxchg(lock, intr_owner, NULL)))
+		return;
 	do {
 		ret = raw_spin_trylock(&lock->wait_lock);
 	} while (!ret);
 
-	__rt_spin_lock_slowunlock(lock);
+	slowfn(lock, intr_owner);
 }
 
 void __lockfunc rt_spin_lock(spinlock_t *lock)
@@ -1100,6 +1149,8 @@ EXPORT_SYMBOL(__rt_spin_lock);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass)
 {
+	/* Might sleep, should not be called in interrupt context. */
+	BUG_ON(in_interrupt());
 	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
 	spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
 }
@@ -1118,7 +1169,7 @@ void __lockfunc rt_spin_unlock_after_trylock_in_irq(spinlock_t *lock)
 {
 	/* NOTE: we always pass in '1' for nested, for simplicity */
 	spin_release(&lock->dep_map, 1, _RET_IP_);
-	rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock_hirq);
+	rt_spin_lock_fastunlock_in_irq(&lock->lock, __rt_spin_lock_slowunlock);
 }
 
 void __lockfunc __rt_spin_unlock(struct rt_mutex *lock)
@@ -1266,6 +1317,8 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	int ret = 0;
 
 	for (;;) {
+		struct task_struct *owner;
+
 		/* Try to acquire the lock: */
 		if (try_to_take_rt_mutex(lock, current, waiter))
 			break;
@@ -1290,11 +1343,17 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
 				break;
 		}
 
+		owner = rt_mutex_owner(lock);
 		raw_spin_unlock(&lock->wait_lock);
 
 		debug_rt_mutex_print_deadlock(waiter);
 
-		schedule_rt_mutex(lock);
+		/*
+		 * Only try to reschedule, if owner is a real task.
+		 * Interrupt handler should complete soon.
+		 */
+		if (!owner || rt_mutex_owner_is_task(owner))
+			schedule_rt_mutex(lock);
 
 		raw_spin_lock(&lock->wait_lock);
 		set_current_state(state);
@@ -1410,6 +1469,8 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	struct rt_mutex_waiter waiter;
 	int ret = 0;
 
+	/* Might sleep, should not be called in interrupt context. */
+	BUG_ON(in_interrupt());
 	rt_mutex_init_waiter(&waiter, false);
 
 	raw_spin_lock(&lock->wait_lock);
@@ -1466,16 +1527,16 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
  * Slow path try-lock function:
  */
 static inline int
-rt_mutex_slowtrylock(struct rt_mutex *lock)
+rt_mutex_slowtrylock(struct rt_mutex *lock, struct task_struct *task)
 {
 	int ret = 0;
 
 	if (!raw_spin_trylock(&lock->wait_lock))
 		return ret;
 
-	if (likely(rt_mutex_owner(lock) != current)) {
+	if (likely(rt_mutex_owner(lock) != task)) {
 
-		ret = try_to_take_rt_mutex(lock, current, NULL);
+		ret = try_to_take_rt_mutex(lock, task, NULL);
 		/*
 		 * try_to_take_rt_mutex() sets the lock waiters
 		 * bit unconditionally. Clean this up.
@@ -1590,13 +1651,25 @@ rt_mutex_timed_fastlock(struct rt_mutex *lock, int state,
 
 static inline int
 rt_mutex_fasttrylock(struct rt_mutex *lock,
-		     int (*slowfn)(struct rt_mutex *lock))
+	int (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current))) {
 		rt_mutex_deadlock_account_lock(lock, current);
 		return 1;
 	}
-	return slowfn(lock);
+	return slowfn(lock, current);
+}
+
+static inline int
+rt_mutex_fasttrylock_in_irq(struct rt_mutex *lock,
+	int (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
+{
+	struct task_struct *intr_owner = rt_mutex_owner_intr_handler(current);
+
+	/* Called by interrupt handler, use reservered task_strcut */
+	if (likely(rt_mutex_cmpxchg(lock, NULL, intr_owner)))
+		return 1;
+	return slowfn(lock, intr_owner);
 }
 
 static inline void
@@ -1609,6 +1682,19 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
 		slowfn(lock);
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+int __lockfunc rt_spin_trylock_in_interrupt(spinlock_t *lock)
+{
+	int ret = rt_mutex_fasttrylock_in_irq(&lock->lock,
+			rt_mutex_slowtrylock);
+
+	if (ret)
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+	return ret;
+}
+EXPORT_SYMBOL(rt_spin_trylock_in_interrupt);
+#endif /* PREEMPT_RT_FULL */
+
 /**
  * rt_mutex_lock - lock a rt_mutex
  *
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index ac636d3..fc0ad46 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -95,6 +95,8 @@ task_top_pi_waiter(struct task_struct *p)
  */
 #define RT_MUTEX_HAS_WAITERS	1UL
 #define RT_MUTEX_OWNER_MASKALL	1UL
+#define RT_MUTEX_OWNER_MASKBITS	1UL
+#define RT_MUTEX_INTR_HDLR_BITS	2UL
 
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
@@ -103,6 +105,25 @@ static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 }
 
 /*
+ * Function to determine if a lock->owner task_struct is a real task.
+ */
+static inline int rt_mutex_owner_is_task(struct task_struct *task)
+{
+	return !((unsigned long)task & RT_MUTEX_INTR_HDLR_BITS);
+}
+
+/*
+ * Function to generate rtmutex's owner task_struct for interrupt handler from
+ * a given interrupted task.
+ */
+static inline struct task_struct *rt_mutex_owner_intr_handler(struct task_struct
+	*task)
+{
+	return (struct task_struct *)((unsigned long)task |
+		RT_MUTEX_INTR_HDLR_BITS);
+}
+
+/*
  * PI-futex support (proxy locking functions, etc.):
  */
 #define PI_WAKEUP_INPROGRESS	((struct rt_mutex_waiter *) 1)
diff --git a/kernel/timer.c b/kernel/timer.c
index 8f1687a..b34efb6 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1389,7 +1389,7 @@ unsigned long get_next_timer_interrupt(unsigned long now)
 	 * value.  We use the rt functions here directly to avoid a
 	 * migrate_disable() call.
 	 */
-	if (!spin_do_trylock(&base->lock))
+	if (!spin_do_trylock_in_interrupt(&base->lock))
 		return  now + 1;
 #else
 	spin_lock(&base->lock);
@@ -1480,7 +1480,7 @@ void run_local_timers(void)
 		return;
 	}
 
-	if (!spin_do_trylock(&base->lock)) {
+	if (!spin_do_trylock_in_interrupt(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3.14.25-rt22 2/2] kernel/locking/rtmutex.c: some code optimization
  2015-02-20  1:31 [PATCH 3.14.25-rt22 0/2] rtmutex Real-Time Linux: fix kernel BUG at kernel/locking/rtmutex.c:997! and some optimization Thavatchai Makphaibulchoke
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
@ 2015-02-20  1:31 ` Thavatchai Makphaibulchoke
  2015-04-07  1:26 ` [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
  2 siblings, 0 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-02-20  1:31 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, Thavatchai Makphaibulchoke

Adding the following code optimization,

- Reducing the number of cmpxchgs.  Only call mark_rt_mutex_waiters() when
  needed, waiters bit is not set.
- Reducing the hold time of wait_lock lock.
- Calling fixup_rt_mutex_waiters() only when needed.

Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
---
 kernel/locking/rtmutex.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8b66f81..2600026 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -622,7 +622,8 @@ __try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 	 * any more. This is fixed up when we take the ownership.
 	 * This is the transitional state explained at the top of this file.
 	 */
-	mark_rt_mutex_waiters(lock);
+	if (!((unsigned long)lock->owner & RT_MUTEX_HAS_WAITERS))
+		mark_rt_mutex_waiters(lock);
 
 	if (rt_mutex_owner(lock))
 		return 0;
@@ -854,8 +855,8 @@ static void remove_waiter(struct rt_mutex *lock,
 	struct rt_mutex *next_lock = NULL;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	rt_mutex_dequeue(lock, waiter);
+	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	current->pi_blocked_on = NULL;
 	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
@@ -960,6 +961,9 @@ static int adaptive_wait(struct rt_mutex *lock,
 {
 	int res = 0;
 
+	if (!owner)
+		return res;
+
 	rcu_read_lock();
 	for (;;) {
 		if (owner != rt_mutex_owner(lock))
@@ -1050,11 +1054,11 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
 			schedule_rt_mutex(lock);
 
-		raw_spin_lock(&lock->wait_lock);
-
 		pi_lock(&self->pi_lock);
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		pi_unlock(&self->pi_lock);
+
+		raw_spin_lock(&lock->wait_lock);
 	}
 
 	/*
@@ -1070,10 +1074,11 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	pi_unlock(&self->pi_lock);
 
 	/*
-	 * try_to_take_rt_mutex() sets the waiter bit
-	 * unconditionally. We might have to fix that up:
+	 * No need to call fixup_rt_mutex_waiters(), as we only get
+	 * here when __try_to_take_rt_mutex() returns TRUE.
+	 * In this case, rt_mutex_set_owner() has already take care of the
+	 * waiter bit.
 	 */
-	fixup_rt_mutex_waiters(lock);
 
 	BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
 	BUG_ON(!RB_EMPTY_NODE(&waiter.tree_entry));
@@ -1127,7 +1132,16 @@ static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,
 	if (likely(rt_mutex_cmpxchg(lock, intr_owner, NULL)))
 		return;
 	do {
+		/*
+		 * Alternate between fast acquire and try lock and proceed
+		 * to slow lock whichever succeeds first.
+		 *
+		 * Also use reserved INTERRUPT_HANDLER task_strcut.
+		 */
 		ret = raw_spin_trylock(&lock->wait_lock);
+		if (!ret && unlikely(rt_mutex_cmpxchg(lock, intr_owner,
+			NULL)))
+			return;
 	} while (!ret);
 
 	slowfn(lock, intr_owner);
@@ -1538,10 +1552,12 @@ rt_mutex_slowtrylock(struct rt_mutex *lock, struct task_struct *task)
 
 		ret = try_to_take_rt_mutex(lock, task, NULL);
 		/*
-		 * try_to_take_rt_mutex() sets the lock waiters
-		 * bit unconditionally. Clean this up.
+		 * try_to_take_rt_mutex() leaves the lock waiters bit set
+		 * if we fail to take the lock.  Clean this up if we
+		 * don't get the lock.
 		 */
-		fixup_rt_mutex_waiters(lock);
+		if (!ret)
+			fixup_rt_mutex_waiters(lock);
 	}
 
 	raw_spin_unlock(&lock->wait_lock);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
@ 2015-02-20  4:53   ` Steven Rostedt
  2015-02-20 18:54     ` Thavatchai Makphaibulchoke
  2015-02-23 18:37   ` Steven Rostedt
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-02-20  4:53 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke; +Cc: linux-kernel, mingo, tglx, linux-rt-users

On Thu, 19 Feb 2015 18:31:05 -0700
Thavatchai Makphaibulchoke <tmac@hp.com> wrote:

> This patch fixes the problem that the ownership of a mutex acquired by an
> interrupt handler(IH) gets incorrectly attributed to the interrupted thread.

*blink*

> 
> This could result in an incorrect deadlock detection in function
> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
> up to a system hang.

I highly doubt this is an incorrect deadlock that was detected. My
money is on a real deadlock that happened.

> 
> Here is the approach taken: when calling from an interrupt handler, instead of
> attributing ownership to the interrupted task, use a reserved task_struct value
> to indicate that the owner is a interrupt handler.  This approach avoids the
> incorrect deadlock detection.

How is this an incorrect deadlock? Please explain.

> 
> This also includes changes in several function in rtmutex.c now that the lock's
> requester may be a interrupt handler, not a real task struct.  This impacts
> the way how the lock is acquired and prioritized and decision whether to do
> the house keeping functions required for a real task struct.
> 
> The reserved task_struct values for interrupt handler are
> 
> 	current | 0x2
> 
> where current is the task_struct value of the interrupted task.
> 
> Since IH will both acquire and release the lock only during an interrupt
> handling, during which current is not changed, the reserved task_struct value
> for an IH should be distinct from another instances of IH on a different cpu.
> 

The interrupt handler is a thread just like any other task. It's not
special. If there was a deadlock detected, it most likely means that a
deadlock exists.

-- Steve


> Kernel version 3.14.25 + patch-3.14.25-rt22
> 
> Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
> ---
>  include/linux/spinlock_rt.h     |   4 +
>  kernel/locking/rtmutex-debug.c  |  15 ++-
>  kernel/locking/rtmutex.c        | 212 ++++++++++++++++++++++++++++------------
>  kernel/locking/rtmutex_common.h |  21 ++++
>  kernel/timer.c                  |   4 +-
>  5 files changed, 188 insertions(+), 68 deletions(-)
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-20  4:53   ` Steven Rostedt
@ 2015-02-20 18:54     ` Thavatchai Makphaibulchoke
  2015-02-21  1:49       ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-02-20 18:54 UTC (permalink / raw)
  To: Steven Rostedt, Thavatchai Makphaibulchoke
  Cc: linux-kernel, mingo, tglx, linux-rt-users


On 02/19/2015 09:53 PM, Steven Rostedt wrote:
> On Thu, 19 Feb 2015 18:31:05 -0700
> Thavatchai Makphaibulchoke <tmac@hp.com> wrote:
> 
>> This patch fixes the problem that the ownership of a mutex acquired by an
>> interrupt handler(IH) gets incorrectly attributed to the interrupted thread.
> 
> *blink*
> 
>>
>> This could result in an incorrect deadlock detection in function
>> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
>> up to a system hang.
> 
> I highly doubt this is an incorrect deadlock that was detected. My
> money is on a real deadlock that happened.
> 
>>
>> Here is the approach taken: when calling from an interrupt handler, instead of
>> attributing ownership to the interrupted task, use a reserved task_struct value
>> to indicate that the owner is a interrupt handler.  This approach avoids the
>> incorrect deadlock detection.
> 
> How is this an incorrect deadlock? Please explain.
> 

Thanks for the comments.

Sorry for not explaining the problem in more details.

IH here means the bottom half of interrupt handler, executing in the
interrupt context (IC), not the preemptible interrupt kernel thread.
interrupt.

Here is the problem we encountered.

An smp_apic_timer_interrupt comes in while task X is in the process of
waiting for mutex A .  The IH successfully locks mutex B (in this case
run_local_timers() gets the timer base's lock, base->lock, via
spin_trylock()).

At the same time, task Y holding mutex A requests mutex B.

With current rtmutex code, mutex B ownership is incorrectly attributed
to task X (using current, which is inaccurate in the IC). To task Y the
situation effectively looks like it is holding mutex A and reuqesting B,
which is held by task X holding mutex B and is now waiting for mutex A.
 The deadlock detection is correct, a classic potential circular mutex
deadlock.

In reality, it is not.  The IH the actual owner of mutex B will
eventually completes and releases mutex B and task Y will eventually get
mutex B and proceed and so will task X.  Actually either deleting or
changing BUG_ON(ret) to WARN_ON(ret) in line 997 in fucntion
rt_spin_lock_slowlock(), the test ran fine without any problem.

A more detailed description of the problem could also be found at,

http://markmail.org/message/np33it233hoot4b2#query:+page:1+mid:np33it233hoot4b2+state:results


Please let me know what you think or need any additional info.

Thanks,
Mak.

>>
>> This also includes changes in several function in rtmutex.c now that the lock's
>> requester may be a interrupt handler, not a real task struct.  This impacts
>> the way how the lock is acquired and prioritized and decision whether to do
>> the house keeping functions required for a real task struct.
>>
>> The reserved task_struct values for interrupt handler are
>>
>> 	current | 0x2
>>
>> where current is the task_struct value of the interrupted task.
>>
>> Since IH will both acquire and release the lock only during an interrupt
>> handling, during which current is not changed, the reserved task_struct value
>> for an IH should be distinct from another instances of IH on a different cpu.
>>
> 
> The interrupt handler is a thread just like any other task. It's not
> special. If there was a deadlock detected, it most likely means that a
> deadlock exists.
> 
> -- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-20 18:54     ` Thavatchai Makphaibulchoke
@ 2015-02-21  1:49       ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2015-02-21  1:49 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke
  Cc: Thavatchai Makphaibulchoke, linux-kernel, mingo, tglx, linux-rt-users

On Fri, 20 Feb 2015 11:54:45 -0700
Thavatchai Makphaibulchoke <thavatchai.makpahibulchoke@hp.com> wrote:

> Sorry for not explaining the problem in more details.
> 
> IH here means the bottom half of interrupt handler, executing in the
> interrupt context (IC), not the preemptible interrupt kernel thread.
> interrupt.
> 
> Here is the problem we encountered.
> 
> An smp_apic_timer_interrupt comes in while task X is in the process of
> waiting for mutex A .  The IH successfully locks mutex B (in this case
> run_local_timers() gets the timer base's lock, base->lock, via
> spin_trylock()).
> 
> At the same time, task Y holding mutex A requests mutex B.
> 
> With current rtmutex code, mutex B ownership is incorrectly attributed
> to task X (using current, which is inaccurate in the IC). To task Y the
> situation effectively looks like it is holding mutex A and reuqesting B,
> which is held by task X holding mutex B and is now waiting for mutex A.
>  The deadlock detection is correct, a classic potential circular mutex
> deadlock.
> 
> In reality, it is not.  The IH the actual owner of mutex B will
> eventually completes and releases mutex B and task Y will eventually get
> mutex B and proceed and so will task X.  Actually either deleting or
> changing BUG_ON(ret) to WARN_ON(ret) in line 997 in fucntion
> rt_spin_lock_slowlock(), the test ran fine without any problem.
> 

Ah, I see the problem you have. Let me explain it in my own words to
make sure that you and I are on the same page.

Task X tries to grab rt_mutex A, but it's held by task Y. But as Y is
still running, the adaptive mutex code is in play and task X is
spinning (with it's blocked on A set). An interrupt comes in,
preempting task X and does a trylock on rt_mutex B, and succeeds. Now it
looks like task X has mutex B and is blocked on mutex A.

Task Y tries to take mutex B and sees that is held by task X which is
blocked on mutex A which Y owns. Thus you get a false deadlock detect.

I'm I correct?

Now, the question is, can we safely change the ownership of mutex B in
the interrupt context where it wont cause another side effect?


> A more detailed description of the problem could also be found at,
> 
> http://markmail.org/message/np33it233hoot4b2#query:+page:1+mid:np33it233hoot4b2+state:results
> 
> 
> Please let me know what you think or need any additional info.
> 

I haven't looked at the above link. I'll have to think about this some
more, but as I'm currently traveling, it will have to be done sometime
next week. Feel free top ping me on Monday or Tuesday. Tuesday would
probably be better, as I'm sure I'm going to be overloaded with other
work when I get back to my office on Monday.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
  2015-02-20  4:53   ` Steven Rostedt
@ 2015-02-23 18:37   ` Steven Rostedt
  2015-02-24  0:16     ` Thavatchai Makphaibulchoke
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-02-23 18:37 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke; +Cc: linux-kernel, mingo, tglx, linux-rt-users

On Thu, 19 Feb 2015 18:31:05 -0700
Thavatchai Makphaibulchoke <tmac@hp.com> wrote:

> This patch fixes the problem that the ownership of a mutex acquired by an
> interrupt handler(IH) gets incorrectly attributed to the interrupted thread.
> 
> This could result in an incorrect deadlock detection in function
> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
> up to a system hang.
> 
> Here is the approach taken: when calling from an interrupt handler, instead of
> attributing ownership to the interrupted task, use a reserved task_struct value
> to indicate that the owner is a interrupt handler.  This approach avoids the
> incorrect deadlock detection.
> 
> This also includes changes in several function in rtmutex.c now that the lock's
> requester may be a interrupt handler, not a real task struct.  This impacts
> the way how the lock is acquired and prioritized and decision whether to do
> the house keeping functions required for a real task struct.
> 
> The reserved task_struct values for interrupt handler are
> 
> 	current | 0x2
> 
> where current is the task_struct value of the interrupted task.
> 
> Since IH will both acquire and release the lock only during an interrupt
> handling, during which current is not changed, the reserved task_struct value
> for an IH should be distinct from another instances of IH on a different cpu.
> 
> Kernel version 3.14.25 + patch-3.14.25-rt22
> 
> Signed-off-by: T. Makphaibulchoke <tmac@hp.com>

OK, I believe I understand the issue. Perhaps it would be much better
to create a fake task per CPU that we use when grabbing locks in
interrupt mode. And make these have a priority of 0 (highest), since
they can not be preempted, they do have such a priority.

Then in the fast trylock and unlock code, we can add:

	struct task_struct *curr = current;

	if (unlikely(in_irq()))
		curr = this_cpu_read(irq_task);

This way the priority inheritance will stop when it hits this task (no
need to boost a task of highest priority), and we can leave that code
alone.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-23 18:37   ` Steven Rostedt
@ 2015-02-24  0:16     ` Thavatchai Makphaibulchoke
  2015-02-24  0:57       ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-02-24  0:16 UTC (permalink / raw)
  To: Steven Rostedt, Thavatchai Makphaibulchoke
  Cc: linux-kernel, mingo, tglx, linux-rt-users



On 02/23/2015 11:37 AM, Steven Rostedt wrote:
> 
> OK, I believe I understand the issue. Perhaps it would be much better
> to create a fake task per CPU that we use when grabbing locks in
> interrupt mode. And make these have a priority of 0 (highest), since
> they can not be preempted, they do have such a priority.
> 
> Then in the fast trylock and unlock code, we can add:
> 
> 	struct task_struct *curr = current;
> 
> 	if (unlikely(in_irq()))
> 		curr = this_cpu_read(irq_task);
> 
> This way the priority inheritance will stop when it hits this task (no
> need to boost a task of highest priority), and we can leave that code
> alone.
> 

Thanks again for the comments and suggestion.

Yes, creating a per cpu fake task was one of the alternative considered.
 I believe one of the reasons I did not purse is the amount of extra
storage it requires (sizeof(struct task_struct) * number of cpus.
Though the changes may not be as intrusive as the one I sent, some are
still required, mainly with current (one in particular came to mind is
in wakeup_next-watier()).

If I'm not mistaken, another reason could also be due to the rate of the
timer interrupt, in the case that the mutex is highly contested IH could
stall the non-real-time requester for a long time, even to the point of
the cpu is perceived as hung.

Anyway, I'll retry the fake task approach a try and report back if there
is any issue.

Thanks,
Mak.


> -- Steve
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-24  0:16     ` Thavatchai Makphaibulchoke
@ 2015-02-24  0:57       ` Steven Rostedt
  2015-02-26 13:56         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-02-24  0:57 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke
  Cc: Thavatchai Makphaibulchoke, linux-kernel, mingo, tglx, linux-rt-users

On Mon, 23 Feb 2015 17:16:27 -0700
Thavatchai Makphaibulchoke <thavatchai.makpahibulchoke@hp.com> wrote:


> Thanks again for the comments and suggestion.
> 
> Yes, creating a per cpu fake task was one of the alternative considered.
>  I believe one of the reasons I did not purse is the amount of extra
> storage it requires (sizeof(struct task_struct) * number of cpus.
> Though the changes may not be as intrusive as the one I sent, some are
> still required, mainly with current (one in particular came to mind is
> in wakeup_next-watier()).

Yeah, that's a draw back, but still. Big CPU machines should have big
memory. If we really are concerned, we could tidy up task_struct a bit,
rearranging the fields to make sure like fields are together (helps
with cache too) and move the RT stuff up further. Then we could
allocate just enough to cover all the task struct fields that are
accessed in the rtmutex code.


> 
> If I'm not mistaken, another reason could also be due to the rate of the
> timer interrupt, in the case that the mutex is highly contested IH could
> stall the non-real-time requester for a long time, even to the point of
> the cpu is perceived as hung.

Perhaps we should just have trylocks fail if there are other waiters.
As it wont slow down the high priority task. And doing that would
probably even help other rt tasks that are blocked on the lock waiting
for a release. Why make those tasks wait more if even a higher priority
task is simply doing a trylock and can safely fail it. At least we
could do that if the task trying to get the lock is a interrupt.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-24  0:57       ` Steven Rostedt
@ 2015-02-26 13:56         ` Sebastian Andrzej Siewior
  2015-02-26 14:06           ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-26 13:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thavatchai Makphaibulchoke, Thavatchai Makphaibulchoke,
	linux-kernel, mingo, tglx, linux-rt-users

* Steven Rostedt | 2015-02-23 19:57:43 [-0500]:

>On Mon, 23 Feb 2015 17:16:27 -0700
>Thavatchai Makphaibulchoke <thavatchai.makpahibulchoke@hp.com> wrote:
>> If I'm not mistaken, another reason could also be due to the rate of the
>> timer interrupt, in the case that the mutex is highly contested IH could
>> stall the non-real-time requester for a long time, even to the point of
>> the cpu is perceived as hung.
>
>Perhaps we should just have trylocks fail if there are other waiters.
>As it wont slow down the high priority task. And doing that would
>probably even help other rt tasks that are blocked on the lock waiting
>for a release. Why make those tasks wait more if even a higher priority
>task is simply doing a trylock and can safely fail it. At least we
>could do that if the task trying to get the lock is a interrupt.

What happened so far? The events I remember:
- we gained FULL_NO_HZ
- people started to isolate CPUs and push their work / RT tasks there
- it has been noticed that the kernel is raising the timer softirq even
  there is nothing going on once the softirq was started.
- tglx came with a patch which could go mainline if solves the problem.
- this patch did not make its way upstream (yet) and I pulled it into
  -RT. Isn't this a problem in mainline, too? Why isn't there anyone
  screaming?
- we had dead locks because the inner-lock of the sleeping was not safe
  to be used from hardirq context. #1
- we had boxes freezing on startup and not making progress due to missed
  irq_work wakeups. #2
- we had a deadlock splat on UP because the trylock failed. This was
  fixed by only allowing this feature on SMP (since it only makes sense
  with isolated CPUs). #3
- Now since the rtmutex rework we have dead lock splats which BUG()s the
  systems. #4

The four problems we had/have so far are -RT specific but still
plainfull when I think back.
rtmutex wasn't made to be accessed from hardirq context. That is where we
use the rawlocks. One problem that we still have and Peter pointer out
around #1 is about owner boosting if the lock is held in hardirq context
and the wrong owner is recorded. This problem was ignored so far.

Using a fake task as you suggest in irq context and ignoring would
somehow fix the boosting problem and avoid the deadlock we see now.

I am not sure if we want keep doing that. The only reason why we grab
the lock in the first place was to check if there is a timer pending
and we run on the isolated CPU. It should not matter for the other CPUs,
right?
So instead going further that road, what about storing base->next_timer
someplace so it can be obtained via atomic_read() for the isolated CPUs?

>-- Steve

Sebastian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-26 13:56         ` Sebastian Andrzej Siewior
@ 2015-02-26 14:06           ` Steven Rostedt
  2015-03-06 12:19             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-02-26 14:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thavatchai Makphaibulchoke, Thavatchai Makphaibulchoke,
	linux-kernel, mingo, tglx, linux-rt-users

On Thu, 26 Feb 2015 14:56:30 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> I am not sure if we want keep doing that. The only reason why we grab
> the lock in the first place was to check if there is a timer pending
> and we run on the isolated CPU. It should not matter for the other CPUs,
> right?
> So instead going further that road, what about storing base->next_timer
> someplace so it can be obtained via atomic_read() for the isolated CPUs?
> 

If we can pull that off and remove all rtmutex trylocks from hardirq
context, I would much rather do that.

This hocus pocus coding is just going to lead us down the path of the
black arts. I already have a black cat, so I'm good to go.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-02-26 14:06           ` Steven Rostedt
@ 2015-03-06 12:19             ` Sebastian Andrzej Siewior
  2015-03-09 16:36               ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-06 12:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thavatchai Makphaibulchoke, Thavatchai Makphaibulchoke,
	linux-kernel, mingo, tglx, linux-rt-users

* Steven Rostedt | 2015-02-26 09:06:10 [-0500]:

>If we can pull that off and remove all rtmutex trylocks from hardirq
>context, I would much rather do that.
>
>This hocus pocus coding is just going to lead us down the path of the
>black arts. I already have a black cat, so I'm good to go.

Okay. So what is the fix for v3.14 and other trees where the rtmutex
reworked has been integrated? Revert the optimization like I did in
v3.18?

Sebastian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-03-06 12:19             ` Sebastian Andrzej Siewior
@ 2015-03-09 16:36               ` Steven Rostedt
  2015-03-09 16:49                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-03-09 16:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thavatchai Makphaibulchoke, Thavatchai Makphaibulchoke,
	linux-kernel, mingo, tglx, linux-rt-users

On Fri, 6 Mar 2015 13:19:21 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> * Steven Rostedt | 2015-02-26 09:06:10 [-0500]:
> 
> >If we can pull that off and remove all rtmutex trylocks from hardirq
> >context, I would much rather do that.
> >
> >This hocus pocus coding is just going to lead us down the path of the
> >black arts. I already have a black cat, so I'm good to go.
> 
> Okay. So what is the fix for v3.14 and other trees where the rtmutex
> reworked has been integrated? Revert the optimization like I did in
> v3.18?

Probably.

I'm going to start looking at what's in 3.18 and start porting changes
to 3.14 and earlier. I've only been updating the stable kernels to pull
in the mainline stable updates. I haven't reverted anything (unless
mainline stable caused things to be obsolete), nor have I added any
updates.

BTW, I'm going to start with 3.18-rt1 and see what's been added to the
other -rt updates. If there's something I need that was added to
3.18-rt1 can you let me know. That is, if it wasn't marked with a
stable-rt tag. My scripts will find those.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-03-09 16:36               ` Steven Rostedt
@ 2015-03-09 16:49                 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 37+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-09 16:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thavatchai Makphaibulchoke, Thavatchai Makphaibulchoke,
	linux-kernel, mingo, tglx, linux-rt-users

On 03/09/2015 05:36 PM, Steven Rostedt wrote:
> BTW, I'm going to start with 3.18-rt1 and see what's been added to the
> other -rt updates. If there's something I need that was added to
> 3.18-rt1 can you let me know. That is, if it wasn't marked with a
> stable-rt tag. My scripts will find those.

I tried to create new patch files for new things and mark them stable
where possible.
I didn't mark the "simple-work" (a workqueue based on swait to avoid a
new kernel thread for each of those things where we need to schedule a
workqueue or something like that from atomic context, like MCE) with cc
stable.

The patch where I reverted this timer thingy is
  Revert-timers-do-not-raise-softirq-unconditionally.patch

and it was due to hrtimer beeing broken on 3.18 and has no stable tag.

I'm not sure about the multi-queue block stuff (*-mq-* in the queue).
You might want take a look. I updated a few patches because I managed
to create deadlocks. I'm not sure if it was possible deadlock in v3.14
and I simply didn't trigger if the code changed and it become possible.
I remember that the multi queue block code entered v3.12 but was unused
and started with v3.14 it gained a user.

> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997!
  2015-02-20  1:31 [PATCH 3.14.25-rt22 0/2] rtmutex Real-Time Linux: fix kernel BUG at kernel/locking/rtmutex.c:997! and some optimization Thavatchai Makphaibulchoke
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
  2015-02-20  1:31 ` [PATCH 3.14.25-rt22 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke
@ 2015-04-07  1:26 ` Thavatchai Makphaibulchoke
  2015-04-07  1:26   ` [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel " Thavatchai Makphaibulchoke
  2015-04-07  1:26   ` [PATCH v2 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke
  2 siblings, 2 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-04-07  1:26 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, umgwanakikbuti, Thavatchai Makphaibulchoke

This patch series compose of 2 patches.

First patch, fixing kernel BUG at kernel/locking/rtmutex.c:997!

Second patch, some code optimation in kernel/locking/rtmutex.c

Changed in v2:
        - Use idle_task on the processor as rtmutex's owner instead of the
          reserved interrupt handler task value.
        - Removed code to hadle the reserved interrupt handler's task value.

Thavatchai Makphaibulchoke (2):
  rtmutex Real-Time Linux: Fixing kernel BUG at
    kernel/locking/rtmutex.c:997!
  kernel/locking/rtmutex.c: some code optimization

 kernel/locking/rtmutex.c | 107 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 38 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  1:26 ` [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
@ 2015-04-07  1:26   ` Thavatchai Makphaibulchoke
  2015-04-07  1:59     ` Steven Rostedt
  2015-04-07 11:23     ` Thomas Gleixner
  2015-04-07  1:26   ` [PATCH v2 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke
  1 sibling, 2 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-04-07  1:26 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, umgwanakikbuti, Thavatchai Makphaibulchoke

This patch fixes the problem that the ownership of a mutex acquired by an
interrupt handler(IH) gets incorrectly attributed to the interrupted thread.

This could result in an incorrect deadlock detection in function
rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
up to a system hang.

Here is the approach taken: when calling from an interrupt handler, instead of
attributing ownership to the interrupted task, use the idle task on the processor
to indicate that the owner is a interrupt handler.  This approach avoids the
above incorrect deadlock detection.

This also includes changes to disable priority boosting when lock owner is
the idle_task, as it is not allowed.

Kernel version 3.14.25 + patch-3.14.25-rt22

Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
---
Changed in v2:
        - Use idle_task on the processor as rtmutex's owner instead of the
          reserved interrupt handler task value.
        - Removed code to hadle the reserved interrupt handler's task value.
 kernel/locking/rtmutex.c | 77 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6c40660..ae5c13f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -26,6 +26,9 @@
 
 #include "rtmutex_common.h"
 
+static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
+		struct task_struct *caller);
+
 /*
  * lock->owner state tracking:
  *
@@ -51,6 +54,9 @@
  * waiters. This can happen when grabbing the lock in the slow path.
  * To prevent a cmpxchg of the owner releasing the lock, we need to
  * set this bit before looking at the lock.
+ *
+ * Owner can also be reserved value, INTERRUPT_HANDLER. In this case the mutex
+ * is owned by idle_task on the processor.
  */
 
 static void
@@ -298,7 +304,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task)
 {
 	int prio = rt_mutex_getprio(task);
 
-	if (task->prio != prio || dl_prio(prio))
+	if (!is_idle_task(task) && (task->prio != prio || dl_prio(prio)))
 		rt_mutex_setprio(task, prio);
 }
 
@@ -730,7 +736,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 	if (waiter == rt_mutex_top_waiter(lock)) {
 		rt_mutex_dequeue_pi(owner, top_waiter);
 		rt_mutex_enqueue_pi(owner, waiter);
-
 		__rt_mutex_adjust_prio(owner);
 		if (rt_mutex_real_waiter(owner->pi_blocked_on))
 			chain_walk = 1;
@@ -777,10 +782,11 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
  */
 static void wakeup_next_waiter(struct rt_mutex *lock)
 {
+	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *waiter;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&current->pi_lock, flags);
+	raw_spin_lock_irqsave(&owner->pi_lock, flags);
 
 	waiter = rt_mutex_top_waiter(lock);
 
@@ -790,7 +796,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 	 * boosted mode and go back to normal after releasing
 	 * lock->wait_lock.
 	 */
-	rt_mutex_dequeue_pi(current, waiter);
+	rt_mutex_dequeue_pi(owner, waiter);
 
 	/*
 	 * As we are waking up the top waiter, and the waiter stays
@@ -802,7 +808,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 	 */
 	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
 
-	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
+	raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
 
 	/*
 	 * It's safe to dereference waiter as it cannot go away as
@@ -902,6 +908,8 @@ void rt_mutex_adjust_pi(struct task_struct *task)
 static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 					 void  (*slowfn)(struct rt_mutex *lock))
 {
+	/* Might sleep, should not be called in interrupt context. */
+	BUG_ON(in_interrupt());
 	might_sleep();
 
 	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
@@ -911,12 +919,12 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
-					   void  (*slowfn)(struct rt_mutex *lock))
+	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
 	if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
 		rt_mutex_deadlock_account_unlock(current);
 	else
-		slowfn(lock);
+		slowfn(lock, current);
 }
 
 #ifdef CONFIG_SMP
@@ -1047,11 +1055,12 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 /*
  * Slow path to release a rt_mutex spin_lock style
  */
-static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
+static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock,
+	struct task_struct *task)
 {
 	debug_rt_mutex_unlock(lock);
 
-	rt_mutex_deadlock_account_unlock(current);
+	rt_mutex_deadlock_account_unlock(task);
 
 	if (!rt_mutex_has_waiters(lock)) {
 		lock->owner = NULL;
@@ -1064,24 +1073,33 @@ static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
 	raw_spin_unlock(&lock->wait_lock);
 
 	/* Undo pi boosting.when necessary */
-	rt_mutex_adjust_prio(current);
+	rt_mutex_adjust_prio(task);
 }
 
-static void  noinline __sched rt_spin_lock_slowunlock(struct rt_mutex *lock)
+static noinline void __sched rt_spin_lock_slowunlock(struct rt_mutex *lock,
+	struct task_struct *task)
 {
 	raw_spin_lock(&lock->wait_lock);
-	__rt_spin_lock_slowunlock(lock);
+	__rt_spin_lock_slowunlock(lock, task);
 }
 
-static void  noinline __sched rt_spin_lock_slowunlock_hirq(struct rt_mutex *lock)
+static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,
+	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
 	int ret;
+	struct task_struct *intr_owner = current;
 
+	if (unlikely(in_irq()))
+		intr_owner = idle_task(smp_processor_id());
+	if (likely(rt_mutex_cmpxchg(lock, intr_owner, NULL))) {
+		rt_mutex_deadlock_account_unlock(intr_owner);
+		return;
+	}
 	do {
 		ret = raw_spin_trylock(&lock->wait_lock);
 	} while (!ret);
 
-	__rt_spin_lock_slowunlock(lock);
+	slowfn(lock, intr_owner);
 }
 
 void __lockfunc rt_spin_lock(spinlock_t *lock)
@@ -1118,7 +1136,7 @@ void __lockfunc rt_spin_unlock_after_trylock_in_irq(spinlock_t *lock)
 {
 	/* NOTE: we always pass in '1' for nested, for simplicity */
 	spin_release(&lock->dep_map, 1, _RET_IP_);
-	rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock_hirq);
+	rt_spin_lock_fastunlock_in_irq(&lock->lock, __rt_spin_lock_slowunlock);
 }
 
 void __lockfunc __rt_spin_unlock(struct rt_mutex *lock)
@@ -1146,8 +1164,12 @@ int __lockfunc __rt_spin_trylock(struct rt_mutex *lock)
 
 int __lockfunc rt_spin_trylock(spinlock_t *lock)
 {
-	int ret = rt_mutex_trylock(&lock->lock);
+	struct task_struct *intr_owner = current;
+	int ret;
 
+	if (unlikely(in_irq()))
+		intr_owner = idle_task(smp_processor_id());
+	ret = __rt_mutex_trylock(&lock->lock, intr_owner);
 	if (ret)
 		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 	return ret;
@@ -1466,16 +1488,16 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
  * Slow path try-lock function:
  */
 static inline int
-rt_mutex_slowtrylock(struct rt_mutex *lock)
+rt_mutex_slowtrylock(struct rt_mutex *lock, struct task_struct *task)
 {
 	int ret = 0;
 
 	if (!raw_spin_trylock(&lock->wait_lock))
 		return ret;
 
-	if (likely(rt_mutex_owner(lock) != current)) {
+	if (likely(rt_mutex_owner(lock) != task)) {
 
-		ret = try_to_take_rt_mutex(lock, current, NULL);
+		ret = try_to_take_rt_mutex(lock, task, NULL);
 		/*
 		 * try_to_take_rt_mutex() sets the lock waiters
 		 * bit unconditionally. Clean this up.
@@ -1589,14 +1611,14 @@ rt_mutex_timed_fastlock(struct rt_mutex *lock, int state,
 }
 
 static inline int
-rt_mutex_fasttrylock(struct rt_mutex *lock,
-		     int (*slowfn)(struct rt_mutex *lock))
+rt_mutex_fasttrylock(struct rt_mutex *lock, struct task_struct *caller,
+	int (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
 {
-	if (likely(rt_mutex_cmpxchg(lock, NULL, current))) {
-		rt_mutex_deadlock_account_lock(lock, current);
+	if (likely(rt_mutex_cmpxchg(lock, NULL, caller))) {
+		rt_mutex_deadlock_account_lock(lock, caller);
 		return 1;
 	}
-	return slowfn(lock);
+	return slowfn(lock, caller);
 }
 
 static inline void
@@ -1690,6 +1712,11 @@ rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout,
 }
 EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
 
+static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
+		struct task_struct *caller)
+{
+	return rt_mutex_fasttrylock(lock, caller, rt_mutex_slowtrylock);
+}
 /**
  * rt_mutex_trylock - try to lock a rt_mutex
  *
@@ -1699,7 +1726,7 @@ EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
  */
 int __sched rt_mutex_trylock(struct rt_mutex *lock)
 {
-	return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
+	return __rt_mutex_trylock(lock, current);
 }
 EXPORT_SYMBOL_GPL(rt_mutex_trylock);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 2/2] kernel/locking/rtmutex.c: some code optimization
  2015-04-07  1:26 ` [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
  2015-04-07  1:26   ` [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel " Thavatchai Makphaibulchoke
@ 2015-04-07  1:26   ` Thavatchai Makphaibulchoke
  1 sibling, 0 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-04-07  1:26 UTC (permalink / raw)
  To: rostedt, linux-kernel
  Cc: mingo, tglx, linux-rt-users, umgwanakikbuti, Thavatchai Makphaibulchoke

Adding the following code optimization,

- Reducing the number of cmpxchgs.  Only call mark_rt_mutex_waiters() when
  needed, waiters bit is not set.
- Reducing the hold time of wait_lock lock.
- Calling fixup_rt_mutex_waiters() only when needed.
- When unlocking rt_spin_lock in IRQ, alternate between attempting fast
  unlocking and attempting to lock mutex's wait_lock.

Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
---
 kernel/locking/rtmutex.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ae5c13f..cadba20 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -608,7 +608,8 @@ __try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 	 * any more. This is fixed up when we take the ownership.
 	 * This is the transitional state explained at the top of this file.
 	 */
-	mark_rt_mutex_waiters(lock);
+	if (!((unsigned long)lock->owner & RT_MUTEX_HAS_WAITERS))
+		mark_rt_mutex_waiters(lock);
 
 	if (rt_mutex_owner(lock))
 		return 0;
@@ -832,8 +833,8 @@ static void remove_waiter(struct rt_mutex *lock,
 	struct rt_mutex *next_lock = NULL;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	rt_mutex_dequeue(lock, waiter);
+	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	current->pi_blocked_on = NULL;
 	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
@@ -1019,11 +1020,11 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
 			schedule_rt_mutex(lock);
 
-		raw_spin_lock(&lock->wait_lock);
-
 		pi_lock(&self->pi_lock);
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		pi_unlock(&self->pi_lock);
+
+		raw_spin_lock(&lock->wait_lock);
 	}
 
 	/*
@@ -1038,12 +1039,6 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 	self->saved_state = TASK_RUNNING;
 	pi_unlock(&self->pi_lock);
 
-	/*
-	 * try_to_take_rt_mutex() sets the waiter bit
-	 * unconditionally. We might have to fix that up:
-	 */
-	fixup_rt_mutex_waiters(lock);
-
 	BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
 	BUG_ON(!RB_EMPTY_NODE(&waiter.tree_entry));
 
@@ -1096,7 +1091,14 @@ static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,
 		return;
 	}
 	do {
+		/*
+		 * Alternate between fast acquire and try lock and proceed
+		 * to slow lock whichever succeeds first.
+		 */
 		ret = raw_spin_trylock(&lock->wait_lock);
+		if (!ret && unlikely(rt_mutex_cmpxchg(lock, intr_owner,
+			NULL)))
+			return;
 	} while (!ret);
 
 	slowfn(lock, intr_owner);
@@ -1499,10 +1501,12 @@ rt_mutex_slowtrylock(struct rt_mutex *lock, struct task_struct *task)
 
 		ret = try_to_take_rt_mutex(lock, task, NULL);
 		/*
-		 * try_to_take_rt_mutex() sets the lock waiters
-		 * bit unconditionally. Clean this up.
+		 * try_to_take_rt_mutex() keeps the lock waiters
+		 * bit set when failed to grab lock. Clean this
+		 * in case of a failure.
 		 */
-		fixup_rt_mutex_waiters(lock);
+		if (!ret)
+			fixup_rt_mutex_waiters(lock);
 	}
 
 	raw_spin_unlock(&lock->wait_lock);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  1:26   ` [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel " Thavatchai Makphaibulchoke
@ 2015-04-07  1:59     ` Steven Rostedt
  2015-04-07  5:09       ` Mike Galbraith
  2015-04-08  0:55       ` Thavatchai Makphaibulchoke
  2015-04-07 11:23     ` Thomas Gleixner
  1 sibling, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2015-04-07  1:59 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke
  Cc: linux-kernel, mingo, tglx, linux-rt-users, umgwanakikbuti,
	Peter Zijlstra, Sebastian Andrzej Siewior

On Mon,  6 Apr 2015 19:26:01 -0600
Thavatchai Makphaibulchoke <tmac@hp.com> wrote:

> This patch fixes the problem that the ownership of a mutex acquired by an
> interrupt handler(IH) gets incorrectly attributed to the interrupted thread.
> 
> This could result in an incorrect deadlock detection in function
> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly leading
> up to a system hang.
> 
> Here is the approach taken: when calling from an interrupt handler, instead of
> attributing ownership to the interrupted task, use the idle task on the processor
> to indicate that the owner is a interrupt handler.  This approach avoids the
> above incorrect deadlock detection.
> 
> This also includes changes to disable priority boosting when lock owner is
> the idle_task, as it is not allowed.

Hmm, why is it not allowed?

If we just let it boost it, it will cut down on the code changes and
checks that add to the hot paths.

> 
> Kernel version 3.14.25 + patch-3.14.25-rt22
> 
> Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
> ---
> Changed in v2:
>         - Use idle_task on the processor as rtmutex's owner instead of the
>           reserved interrupt handler task value.
>         - Removed code to hadle the reserved interrupt handler's task value.
>  kernel/locking/rtmutex.c | 77 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 6c40660..ae5c13f 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -26,6 +26,9 @@
>  
>  #include "rtmutex_common.h"
>  
> +static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
> +		struct task_struct *caller);
> +
>  /*
>   * lock->owner state tracking:
>   *
> @@ -51,6 +54,9 @@
>   * waiters. This can happen when grabbing the lock in the slow path.
>   * To prevent a cmpxchg of the owner releasing the lock, we need to
>   * set this bit before looking at the lock.
> + *
> + * Owner can also be reserved value, INTERRUPT_HANDLER. In this case the mutex
> + * is owned by idle_task on the processor.
>   */
>  
>  static void
> @@ -298,7 +304,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task)
>  {
>  	int prio = rt_mutex_getprio(task);
>  
> -	if (task->prio != prio || dl_prio(prio))
> +	if (!is_idle_task(task) && (task->prio != prio || dl_prio(prio)))
>  		rt_mutex_setprio(task, prio);
>  }
>  
> @@ -730,7 +736,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
>  	if (waiter == rt_mutex_top_waiter(lock)) {
>  		rt_mutex_dequeue_pi(owner, top_waiter);
>  		rt_mutex_enqueue_pi(owner, waiter);
> -

I don't think this whitespace change needs to be done. The space does
split up the dequeue and enqueue from the rest.

>  		__rt_mutex_adjust_prio(owner);
>  		if (rt_mutex_real_waiter(owner->pi_blocked_on))
>  			chain_walk = 1;
> @@ -777,10 +782,11 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
>   */
>  static void wakeup_next_waiter(struct rt_mutex *lock)
>  {
> +	struct task_struct *owner = rt_mutex_owner(lock);
>  	struct rt_mutex_waiter *waiter;
>  	unsigned long flags;
>  
> -	raw_spin_lock_irqsave(&current->pi_lock, flags);
> +	raw_spin_lock_irqsave(&owner->pi_lock, flags);
>  
>  	waiter = rt_mutex_top_waiter(lock);
>  
> @@ -790,7 +796,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>  	 * boosted mode and go back to normal after releasing
>  	 * lock->wait_lock.
>  	 */
> -	rt_mutex_dequeue_pi(current, waiter);
> +	rt_mutex_dequeue_pi(owner, waiter);
>  
>  	/*
>  	 * As we are waking up the top waiter, and the waiter stays
> @@ -802,7 +808,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>  	 */
>  	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
>  
> -	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
> +	raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
>  
>  	/*
>  	 * It's safe to dereference waiter as it cannot go away as
> @@ -902,6 +908,8 @@ void rt_mutex_adjust_pi(struct task_struct *task)
>  static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
>  					 void  (*slowfn)(struct rt_mutex *lock))
>  {
> +	/* Might sleep, should not be called in interrupt context. */
> +	BUG_ON(in_interrupt());

You're right it shouldn't. But that's why might_sleep() will give us a
nice big warning if it is. Don't add the BUG_ON().

>  	might_sleep();
>  
>  	if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
> @@ -911,12 +919,12 @@ static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
>  }
>  
>  static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
> -					   void  (*slowfn)(struct rt_mutex *lock))
> +	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
>  {
>  	if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
>  		rt_mutex_deadlock_account_unlock(current);
>  	else
> -		slowfn(lock);
> +		slowfn(lock, current);
>  }
>  
>  #ifdef CONFIG_SMP
> @@ -1047,11 +1055,12 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
>  /*
>   * Slow path to release a rt_mutex spin_lock style
>   */
> -static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
> +static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock,
> +	struct task_struct *task)
>  {
>  	debug_rt_mutex_unlock(lock);
>  
> -	rt_mutex_deadlock_account_unlock(current);
> +	rt_mutex_deadlock_account_unlock(task);
>  
>  	if (!rt_mutex_has_waiters(lock)) {
>  		lock->owner = NULL;
> @@ -1064,24 +1073,33 @@ static void __sched __rt_spin_lock_slowunlock(struct rt_mutex *lock)
>  	raw_spin_unlock(&lock->wait_lock);
>  
>  	/* Undo pi boosting.when necessary */
> -	rt_mutex_adjust_prio(current);
> +	rt_mutex_adjust_prio(task);
>  }
>  
> -static void  noinline __sched rt_spin_lock_slowunlock(struct rt_mutex *lock)
> +static noinline void __sched rt_spin_lock_slowunlock(struct rt_mutex *lock,
> +	struct task_struct *task)
>  {
>  	raw_spin_lock(&lock->wait_lock);
> -	__rt_spin_lock_slowunlock(lock);
> +	__rt_spin_lock_slowunlock(lock, task);
>  }
>  
> -static void  noinline __sched rt_spin_lock_slowunlock_hirq(struct rt_mutex *lock)
> +static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,

Why the name change?

> +	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
>  {
>  	int ret;
> +	struct task_struct *intr_owner = current;
>  
> +	if (unlikely(in_irq()))

Why unlikely? This should only be called in interrupt context.

In fact, perhaps we should have a:

	WARN_ON(!in_irq());

Then we don't need this test at all, and just assign the owner the idle
task.

> +		intr_owner = idle_task(smp_processor_id());

Also, never butt a single if statement up against another if statement.
Add a space, otherwise it gives the impression of being an
  if () else if ()

> +	if (likely(rt_mutex_cmpxchg(lock, intr_owner, NULL))) {
> +		rt_mutex_deadlock_account_unlock(intr_owner);
> +		return;
> +	}

And add a space here. Don't butt conditionals together unless they are
related (if else if, etc)

>  	do {
>  		ret = raw_spin_trylock(&lock->wait_lock);
>  	} while (!ret);

I know this isn't part of the patch, but that do loop needs a comment
(this is more toward Sebastian, and not you). It looks buggy, and I
think we do it this way just so that lockdep doesn't complain. We need
a comment here that states something like:

	/*
	 * To get this rt_mutex from interrupt context, we had to have
	 * taken the wait_lock once before. Thus, nothing can deadlock
	 * us now. The wait_lock is internal to the rt_mutex, and
	 * anything that may have it now, will soon release it, because
	 * we own the rt_mutex but do not hold anything that the owner
	 * of the wait_lock would need to grab.
	 *
	 * The do { } while() is to keep lockdep from complaining.
	 */

I wonder if there's another way to just take the wait_lock and tell
lockdep not to complain?

Peter?

>  
> -	__rt_spin_lock_slowunlock(lock);
> +	slowfn(lock, intr_owner);
>  }
>  
>  void __lockfunc rt_spin_lock(spinlock_t *lock)
> @@ -1118,7 +1136,7 @@ void __lockfunc rt_spin_unlock_after_trylock_in_irq(spinlock_t *lock)
>  {
>  	/* NOTE: we always pass in '1' for nested, for simplicity */
>  	spin_release(&lock->dep_map, 1, _RET_IP_);
> -	rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock_hirq);
> +	rt_spin_lock_fastunlock_in_irq(&lock->lock, __rt_spin_lock_slowunlock);
>  }
>  
>  void __lockfunc __rt_spin_unlock(struct rt_mutex *lock)
> @@ -1146,8 +1164,12 @@ int __lockfunc __rt_spin_trylock(struct rt_mutex *lock)
>  
>  int __lockfunc rt_spin_trylock(spinlock_t *lock)

We really should have a rt_spin_trylock_in_irq() and not have the
below if conditional.

The paths that will be executed in hard irq context are static. They
should be labeled as such.

-- Steve

>  {
> -	int ret = rt_mutex_trylock(&lock->lock);
> +	struct task_struct *intr_owner = current;
> +	int ret;
>  
> +	if (unlikely(in_irq()))
> +		intr_owner = idle_task(smp_processor_id());
> +	ret = __rt_mutex_trylock(&lock->lock, intr_owner);
>  	if (ret)
>  		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
>  	return ret;
> @@ -1466,16 +1488,16 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
>   * Slow path try-lock function:
>   */
>  static inline int
> -rt_mutex_slowtrylock(struct rt_mutex *lock)
> +rt_mutex_slowtrylock(struct rt_mutex *lock, struct task_struct *task)
>  {
>  	int ret = 0;
>  
>  	if (!raw_spin_trylock(&lock->wait_lock))
>  		return ret;
>  
> -	if (likely(rt_mutex_owner(lock) != current)) {
> +	if (likely(rt_mutex_owner(lock) != task)) {
>  
> -		ret = try_to_take_rt_mutex(lock, current, NULL);
> +		ret = try_to_take_rt_mutex(lock, task, NULL);
>  		/*
>  		 * try_to_take_rt_mutex() sets the lock waiters
>  		 * bit unconditionally. Clean this up.
> @@ -1589,14 +1611,14 @@ rt_mutex_timed_fastlock(struct rt_mutex *lock, int state,
>  }
>  
>  static inline int
> -rt_mutex_fasttrylock(struct rt_mutex *lock,
> -		     int (*slowfn)(struct rt_mutex *lock))
> +rt_mutex_fasttrylock(struct rt_mutex *lock, struct task_struct *caller,
> +	int (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
>  {
> -	if (likely(rt_mutex_cmpxchg(lock, NULL, current))) {
> -		rt_mutex_deadlock_account_lock(lock, current);
> +	if (likely(rt_mutex_cmpxchg(lock, NULL, caller))) {
> +		rt_mutex_deadlock_account_lock(lock, caller);
>  		return 1;
>  	}
> -	return slowfn(lock);
> +	return slowfn(lock, caller);
>  }
>  
>  static inline void
> @@ -1690,6 +1712,11 @@ rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout,
>  }
>  EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
>  
> +static int __sched __rt_mutex_trylock(struct rt_mutex *lock,
> +		struct task_struct *caller)
> +{
> +	return rt_mutex_fasttrylock(lock, caller, rt_mutex_slowtrylock);
> +}
>  /**
>   * rt_mutex_trylock - try to lock a rt_mutex
>   *
> @@ -1699,7 +1726,7 @@ EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
>   */
>  int __sched rt_mutex_trylock(struct rt_mutex *lock)
>  {
> -	return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
> +	return __rt_mutex_trylock(lock, current);
>  }
>  EXPORT_SYMBOL_GPL(rt_mutex_trylock);
>  


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  1:59     ` Steven Rostedt
@ 2015-04-07  5:09       ` Mike Galbraith
  2015-04-07 10:29         ` Peter Zijlstra
  2015-04-08  0:55       ` Thavatchai Makphaibulchoke
  1 sibling, 1 reply; 37+ messages in thread
From: Mike Galbraith @ 2015-04-07  5:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thavatchai Makphaibulchoke, linux-kernel, mingo, tglx,
	linux-rt-users, Peter Zijlstra, Sebastian Andrzej Siewior

On Mon, 2015-04-06 at 21:59 -0400, Steven Rostedt wrote:
> 
> We really should have a rt_spin_trylock_in_irq() and not have the
> below if conditional.
> 
> The paths that will be executed in hard irq context are static. They
> should be labeled as such.

I did it as an explicitly labeled special purpose (naughty) pair.

---
 include/linux/spinlock_rt.h |    2 ++
 kernel/locking/rtmutex.c    |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -27,6 +27,8 @@ extern void __lockfunc rt_spin_unlock_wa
 extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
 extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock);
 extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
+extern int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock);
+extern void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock);
 extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
 
 /*
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -87,7 +87,7 @@ static int rt_mutex_real_waiter(struct r
  * supports cmpxchg and if there's no debugging state to be set up
  */
 #if defined(__HAVE_ARCH_CMPXCHG) && !defined(CONFIG_DEBUG_RT_MUTEXES)
-# define rt_mutex_cmpxchg(l,c,n)       (cmpxchg(&l->owner, c, n) == c)
+# define rt_mutex_cmpxchg(l,c,n)       (cmpxchg(&(l)->owner, (c), (n)) == (c))
 static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
 {
        unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -1208,6 +1208,35 @@ int __lockfunc rt_spin_trylock_irqsave(s
 }
 EXPORT_SYMBOL(rt_spin_trylock_irqsave);
 
+/*
+ * Special purpose for locks taken in interrupt context: Take and hold
+ * ->wait_lock lest PI catching us with our fingers in the cookie jar.
+ * Do NOT abuse.
+ */
+int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock)
+{
+       struct task_struct *owner;
+       if (!raw_spin_trylock(&lock->lock.wait_lock))
+               return 0;
+       owner = idle_task(raw_smp_processor_id());
+       if (!(rt_mutex_cmpxchg(&lock->lock, NULL, owner))) {
+               raw_spin_unlock(&lock->lock.wait_lock);
+               return 0;
+       }
+       spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+       return 1;
+}
+
+/* ONLY for use with rt_spin_trylock_in_irq(), do NOT abuse. */
+void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock)
+{
+       struct task_struct *owner = idle_task(raw_smp_processor_id());
+       /* NOTE: we always pass in '1' for nested, for simplicity */
+       spin_release(&lock->dep_map, 1, _RET_IP_);
+       BUG_ON(!(rt_mutex_cmpxchg(&lock->lock, owner, NULL)));
+       raw_spin_unlock(&lock->lock.wait_lock);
+}
+
 int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock)
 {
        /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  5:09       ` Mike Galbraith
@ 2015-04-07 10:29         ` Peter Zijlstra
  2015-04-07 10:49           ` Mike Galbraith
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-07 10:29 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	tglx, linux-rt-users, Sebastian Andrzej Siewior

On Tue, Apr 07, 2015 at 07:09:43AM +0200, Mike Galbraith wrote:
> On Mon, 2015-04-06 at 21:59 -0400, Steven Rostedt wrote:
> > 
> > We really should have a rt_spin_trylock_in_irq() and not have the
> > below if conditional.
> > 
> > The paths that will be executed in hard irq context are static. They
> > should be labeled as such.
> 
> +/*
> + * Special purpose for locks taken in interrupt context: Take and hold
> + * ->wait_lock lest PI catching us with our fingers in the cookie jar.
> + * Do NOT abuse.
> + */
> +int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock)
> +{
> +       struct task_struct *owner;
> +       if (!raw_spin_trylock(&lock->lock.wait_lock))
> +               return 0;
> +       owner = idle_task(raw_smp_processor_id());
> +       if (!(rt_mutex_cmpxchg(&lock->lock, NULL, owner))) {
> +               raw_spin_unlock(&lock->lock.wait_lock);
> +               return 0;
> +       }
> +       spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
> +       return 1;
> +}
> +
> +/* ONLY for use with rt_spin_trylock_in_irq(), do NOT abuse. */
> +void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock)
> +{
> +       struct task_struct *owner = idle_task(raw_smp_processor_id());
> +       /* NOTE: we always pass in '1' for nested, for simplicity */
> +       spin_release(&lock->dep_map, 1, _RET_IP_);
> +       BUG_ON(!(rt_mutex_cmpxchg(&lock->lock, owner, NULL)));
> +       raw_spin_unlock(&lock->lock.wait_lock);
> +}
> +

Can someone explain this braindamage? You should _NOT_ take mutexes in
hardirq context.

And if its an irq thread, then the irq thread _IS_ the right owner, the
thread needs to be boosted by waiters.

The idle thread cannot ever be owner of a mutex, that's complete and
utter bullshit.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 10:29         ` Peter Zijlstra
@ 2015-04-07 10:49           ` Mike Galbraith
  2015-04-07 10:56             ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Galbraith @ 2015-04-07 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	tglx, linux-rt-users, Sebastian Andrzej Siewior

On Tue, 2015-04-07 at 12:29 +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2015 at 07:09:43AM +0200, Mike Galbraith wrote:
> > On Mon, 2015-04-06 at 21:59 -0400, Steven Rostedt wrote:
> > > 
> > > We really should have a rt_spin_trylock_in_irq() and not have the
> > > below if conditional.
> > > 
> > > The paths that will be executed in hard irq context are static. 
> > > They
> > > should be labeled as such.
> > 
> > +/*
> > + * Special purpose for locks taken in interrupt context: Take and 
> > hold
> > + * ->wait_lock lest PI catching us with our fingers in the cookie 
> > jar.
> > + * Do NOT abuse.
> > + */
> > +int __lockfunc rt_spin_trylock_in_irq(spinlock_t *lock)
> > +{
> > +       struct task_struct *owner;
> > +       if (!raw_spin_trylock(&lock->lock.wait_lock))
> > +               return 0;
> > +       owner = idle_task(raw_smp_processor_id());
> > +       if (!(rt_mutex_cmpxchg(&lock->lock, NULL, owner))) {
> > +               raw_spin_unlock(&lock->lock.wait_lock);
> > +               return 0;
> > +       }
> > +       spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
> > +       return 1;
> > +}
> > +
> > +/* ONLY for use with rt_spin_trylock_in_irq(), do NOT abuse. */
> > +void __lockfunc rt_spin_trylock_in_irq_unlock(spinlock_t *lock)
> > +{
> > +       struct task_struct *owner = 
> > idle_task(raw_smp_processor_id());
> > +       /* NOTE: we always pass in '1' for nested, for simplicity 
> > */
> > +       spin_release(&lock->dep_map, 1, _RET_IP_);
> > +       BUG_ON(!(rt_mutex_cmpxchg(&lock->lock, owner, NULL)));
> > +       raw_spin_unlock(&lock->lock.wait_lock);
> > +}
> > +
> 
> Can someone explain this braindamage? You should _NOT_ take mutexes 
> in
> hardirq context.

No.. really? ;-)

If you have a spot where it'd be nice to do that despite it being 
somewhat (koff).. discouraged shall we say, you have to do something 
funky.  Thomas had a patch to not raise sirq unconditionally for -rt 
to let nohz_full work, but it needs a lock that's converted to an 
rtmutex in -rt, and which doesn't want to be un-converted.  Ergo, get 
funky.

> And if its an irq thread, then the irq thread _IS_ the right owner, 
> the
> thread needs to be boosted by waiters.

No irq thread.

> The idle thread cannot ever be owner of a mutex, that's complete and
> utter bullshit.

Not if you want to hide current from the deadlock detector lest it get 
upset and make box go boom.

        -Mike

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 10:49           ` Mike Galbraith
@ 2015-04-07 10:56             ` Peter Zijlstra
  2015-04-07 11:01               ` Mike Galbraith
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-07 10:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	tglx, linux-rt-users, Sebastian Andrzej Siewior

On Tue, Apr 07, 2015 at 12:49:16PM +0200, Mike Galbraith wrote:
> If you have a spot where it'd be nice to do that despite it being 
> somewhat (koff).. discouraged shall we say, you have to do something 
> funky.  Thomas had a patch to not raise sirq unconditionally for -rt 
> to let nohz_full work, but it needs a lock that's converted to an 
> rtmutex in -rt, and which doesn't want to be un-converted.  Ergo, get 
> funky.

I should probably back away slowly at this point.. but got a reference
to a thread that explains the 'problem'?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 10:56             ` Peter Zijlstra
@ 2015-04-07 11:01               ` Mike Galbraith
  0 siblings, 0 replies; 37+ messages in thread
From: Mike Galbraith @ 2015-04-07 11:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	tglx, linux-rt-users, Sebastian Andrzej Siewior

On Tue, 2015-04-07 at 12:56 +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2015 at 12:49:16PM +0200, Mike Galbraith wrote:
> > If you have a spot where it'd be nice to do that despite it being 
> > somewhat (koff).. discouraged shall we say, you have to do 
> > something 
> > funky.  Thomas had a patch to not raise sirq unconditionally for -
> > rt 
> > to let nohz_full work, but it needs a lock that's converted to an 
> > rtmutex in -rt, and which doesn't want to be un-converted.  Ergo, 
> > get 
> > funky.
> 
> I should probably back away slowly at this point.. but got a 
> reference
> to a thread that explains the 'problem'?

https://lkml.org/lkml/2015/4/6/434

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  1:26   ` [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel " Thavatchai Makphaibulchoke
  2015-04-07  1:59     ` Steven Rostedt
@ 2015-04-07 11:23     ` Thomas Gleixner
  2015-04-07 11:47       ` Mike Galbraith
  1 sibling, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2015-04-07 11:23 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke
  Cc: Steven Rostedt, LKML, mingo, linux-rt-users, umgwanakikbuti,
	Peter Zijlstra

On Mon, 6 Apr 2015, Thavatchai Makphaibulchoke wrote:

> This patch fixes the problem that the ownership of a mutex acquired
> by an interrupt handler(IH) gets incorrectly attributed to the
> interrupted thread.

An hard interrupt handler is not allowed to take a mutex. End of
story, nothing to fix here.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 11:23     ` Thomas Gleixner
@ 2015-04-07 11:47       ` Mike Galbraith
  2015-04-07 12:04         ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Galbraith @ 2015-04-07 11:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thavatchai Makphaibulchoke, Steven Rostedt, LKML, mingo,
	linux-rt-users, Peter Zijlstra

On Tue, 2015-04-07 at 13:23 +0200, Thomas Gleixner wrote:
> On Mon, 6 Apr 2015, Thavatchai Makphaibulchoke wrote:
> 
> > This patch fixes the problem that the ownership of a mutex acquired
> > by an interrupt handler(IH) gets incorrectly attributed to the
> > interrupted thread.
> 
> An hard interrupt handler is not allowed to take a mutex. End of
> story, nothing to fix here.

Well, the patch that started this thread..

timers-do-not-raise-softirq-unconditionally.patch

..(attributed to you) was picked up in -rt (perhaps erroneously) to 
get nohz_full working, and then reverted due to the deadlock detector 
getting properly angry.  All of this is about reinstating it.

I posted a patchlet to simply subtract softirqd from the ->nr_running 
check, which gets nohz_full working in -rt sans illegal activity.

        -Mike

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 11:47       ` Mike Galbraith
@ 2015-04-07 12:04         ` Peter Zijlstra
  2015-04-07 12:07           ` Peter Zijlstra
                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-07 12:04 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Thomas Gleixner, Thavatchai Makphaibulchoke, Steven Rostedt,
	LKML, mingo, linux-rt-users

On Tue, Apr 07, 2015 at 01:47:16PM +0200, Mike Galbraith wrote:
> On Tue, 2015-04-07 at 13:23 +0200, Thomas Gleixner wrote:
> > On Mon, 6 Apr 2015, Thavatchai Makphaibulchoke wrote:
> > 
> > > This patch fixes the problem that the ownership of a mutex acquired
> > > by an interrupt handler(IH) gets incorrectly attributed to the
> > > interrupted thread.
> > 
> > An hard interrupt handler is not allowed to take a mutex. End of
> > story, nothing to fix here.
> 
> Well, the patch that started this thread..
> 
> timers-do-not-raise-softirq-unconditionally.patch

Aah, that is the problem..

@@ -1454,8 +1452,32 @@ static void run_timer_softirq(struct softirq_action *h)
  */
 void run_local_timers(void)
 {
+	struct tvec_base *base = __this_cpu_read(tvec_bases);
+
 	hrtimer_run_queues();
-	raise_softirq(TIMER_SOFTIRQ);
+	/*
+	 * We can access this lockless as we are in the timer
+	 * interrupt. If there are no timers queued, nothing to do in
+	 * the timer softirq.
+	 */
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (!spin_do_trylock(&base->lock)) {
+		raise_softirq(TIMER_SOFTIRQ);
+		return;
+	}
+#endif
+	if (!base->active_timers)
+		goto out;
+
+	/* Check whether the next pending timer has expired */
+	if (time_before_eq(base->next_timer, jiffies))
+		raise_softirq(TIMER_SOFTIRQ);
+out:
+#ifdef CONFIG_PREEMPT_RT_FULL
+	rt_spin_unlock_after_trylock_in_irq(&base->lock);
+#endif
+	/* The ; ensures that gcc won't complain in the !RT case */
+	;
 }

That smells like something we should be able to do without a lock.

If we use {READ,WRITE}_ONCE() on those two fields (->active_timers and
->next_timer) we should be able to do this without the spinlock.

Races here aren't really a problem I think, if you manage to install a
timer at the current jiffy and have already missed the tick you're in
the same boat. You get to wait for the next tick.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 12:04         ` Peter Zijlstra
@ 2015-04-07 12:07           ` Peter Zijlstra
  2015-04-07 12:41           ` Steven Rostedt
  2015-04-07 18:12           ` Jason Low
  2 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-07 12:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Thomas Gleixner, Thavatchai Makphaibulchoke, Steven Rostedt,
	LKML, mingo, linux-rt-users

On Tue, Apr 07, 2015 at 02:04:03PM +0200, Peter Zijlstra wrote:
> +out:
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +	rt_spin_unlock_after_trylock_in_irq(&base->lock);
> +#endif
> +	/* The ; ensures that gcc won't complain in the !RT case */
> +	;
>  }

Fwiw, I typically use a 'return' in that case. But yes, gcc is a bit of
a whiny arse when it comes to labels to the end of the block.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 12:04         ` Peter Zijlstra
  2015-04-07 12:07           ` Peter Zijlstra
@ 2015-04-07 12:41           ` Steven Rostedt
  2015-04-07 12:54             ` Peter Zijlstra
  2015-04-07 18:12           ` Jason Low
  2 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2015-04-07 12:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Thomas Gleixner, Thavatchai Makphaibulchoke,
	LKML, mingo, linux-rt-users

On Tue, 7 Apr 2015 14:04:03 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Apr 07, 2015 at 01:47:16PM +0200, Mike Galbraith wrote:
> > On Tue, 2015-04-07 at 13:23 +0200, Thomas Gleixner wrote:
> > > On Mon, 6 Apr 2015, Thavatchai Makphaibulchoke wrote:
> > > 
> > > > This patch fixes the problem that the ownership of a mutex acquired
> > > > by an interrupt handler(IH) gets incorrectly attributed to the
> > > > interrupted thread.
> > > 
> > > An hard interrupt handler is not allowed to take a mutex. End of
> > > story, nothing to fix here.
> > 
> > Well, the patch that started this thread..
> > 
> > timers-do-not-raise-softirq-unconditionally.patch
> 
> Aah, that is the problem..
> 

Yep, all this nonsense came from that patch and trying to get
NO_HZ_FULL working with -rt. It's a bit ironic that the push to get
NO_HZ_FULL into mainline came from our RT mini summit, but its
implementation is broken on -rt :-p

Ideally, we don't want to take mutexes in hard interrupt context.


> @@ -1454,8 +1452,32 @@ static void run_timer_softirq(struct softirq_action *h)
>   */
>  void run_local_timers(void)
>  {
> +	struct tvec_base *base = __this_cpu_read(tvec_bases);
> +
>  	hrtimer_run_queues();
> -	raise_softirq(TIMER_SOFTIRQ);
> +	/*
> +	 * We can access this lockless as we are in the timer
> +	 * interrupt. If there are no timers queued, nothing to do in
> +	 * the timer softirq.
> +	 */
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +	if (!spin_do_trylock(&base->lock)) {
> +		raise_softirq(TIMER_SOFTIRQ);
> +		return;
> +	}
> +#endif
> +	if (!base->active_timers)
> +		goto out;
> +
> +	/* Check whether the next pending timer has expired */
> +	if (time_before_eq(base->next_timer, jiffies))
> +		raise_softirq(TIMER_SOFTIRQ);
> +out:
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +	rt_spin_unlock_after_trylock_in_irq(&base->lock);
> +#endif
> +	/* The ; ensures that gcc won't complain in the !RT case */
> +	;
>  }
> 
> That smells like something we should be able to do without a lock.
> 
> If we use {READ,WRITE}_ONCE() on those two fields (->active_timers and
> ->next_timer) we should be able to do this without the spinlock.
> 
> Races here aren't really a problem I think, if you manage to install a
> timer at the current jiffy and have already missed the tick you're in
> the same boat. You get to wait for the next tick.

I'll take a deeper look at this code too. If we can get rid of this
hack, then we don't need the mutex-in-irq hack either.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 12:41           ` Steven Rostedt
@ 2015-04-07 12:54             ` Peter Zijlstra
  2015-04-07 13:58               ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2015-04-07 12:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mike Galbraith, Thomas Gleixner, Thavatchai Makphaibulchoke,
	LKML, mingo, linux-rt-users

On Tue, Apr 07, 2015 at 08:41:54AM -0400, Steven Rostedt wrote:
> Ideally, we don't want to take mutexes in hard interrupt context.

Nothing ideal about it. Hard requirement, anything else is plain
insanity.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 12:54             ` Peter Zijlstra
@ 2015-04-07 13:58               ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2015-04-07 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Thomas Gleixner, Thavatchai Makphaibulchoke,
	LKML, mingo, linux-rt-users

On Tue, 7 Apr 2015 14:54:16 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Apr 07, 2015 at 08:41:54AM -0400, Steven Rostedt wrote:
> > Ideally, we don't want to take mutexes in hard interrupt context.
> 
> Nothing ideal about it. Hard requirement, anything else is plain
> insanity.

Um, we're talking about the -rt folks. Everything we do is plain
insanity.

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 12:04         ` Peter Zijlstra
  2015-04-07 12:07           ` Peter Zijlstra
  2015-04-07 12:41           ` Steven Rostedt
@ 2015-04-07 18:12           ` Jason Low
  2015-04-07 19:17             ` Thomas Gleixner
  2 siblings, 1 reply; 37+ messages in thread
From: Jason Low @ 2015-04-07 18:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Thomas Gleixner, Thavatchai Makphaibulchoke,
	Steven Rostedt, LKML, Ingo Molnar, linux-rt-users, Jason Low

On Tue, Apr 7, 2015 at 5:04 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> That smells like something we should be able to do without a lock.
>
> If we use {READ,WRITE}_ONCE() on those two fields (->active_timers and
> ->next_timer) we should be able to do this without the spinlock.

Yeah, when atomics were suggested earlier, I was wondering if we could
just use READ_ONCE/WRITE_ONCE.

> Races here aren't really a problem I think, if you manage to install a
> timer at the current jiffy and have already missed the tick you're in
> the same boat. You get to wait for the next tick.

The lock shouldn't be used in get_next_timer_interrupt() either right?

unsigned long get_next_timer_interrupt(unsigned long now)
{
...

#ifdef CONFIG_PREEMPT_RT_FULL
        /*
         * On PREEMPT_RT we cannot sleep here. If the trylock does not
         * succeed then we return the worst-case 'expires in 1 tick'
         * value.  We use the rt functions here directly to avoid a
         * migrate_disable() call.
         */
        if (!spin_do_trylock(&base->lock))
                return  now + 1;
#else

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 18:12           ` Jason Low
@ 2015-04-07 19:17             ` Thomas Gleixner
  2015-04-07 19:57               ` Jason Low
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2015-04-07 19:17 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Mike Galbraith, Thavatchai Makphaibulchoke,
	Steven Rostedt, LKML, Ingo Molnar, linux-rt-users

On Tue, 7 Apr 2015, Jason Low wrote:
> The lock shouldn't be used in get_next_timer_interrupt() either right?
> 
> unsigned long get_next_timer_interrupt(unsigned long now)
> {
> ...
> 
> #ifdef CONFIG_PREEMPT_RT_FULL
>         /*
>          * On PREEMPT_RT we cannot sleep here. If the trylock does not
>          * succeed then we return the worst-case 'expires in 1 tick'
>          * value.  We use the rt functions here directly to avoid a
>          * migrate_disable() call.
>          */
>         if (!spin_do_trylock(&base->lock))
>                 return  now + 1;
> #else

And how do you protect the walk of the timer wheel against a
concurrent insertion/removal?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 19:17             ` Thomas Gleixner
@ 2015-04-07 19:57               ` Jason Low
  2015-04-07 21:38                 ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Low @ 2015-04-07 19:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Mike Galbraith, Thavatchai Makphaibulchoke,
	Steven Rostedt, LKML, Ingo Molnar, linux-rt-users, jason.low2

On Tue, 2015-04-07 at 21:17 +0200, Thomas Gleixner wrote:
> On Tue, 7 Apr 2015, Jason Low wrote:
> > The lock shouldn't be used in get_next_timer_interrupt() either right?
> > 
> > unsigned long get_next_timer_interrupt(unsigned long now)
> > {
> > ...
> > 
> > #ifdef CONFIG_PREEMPT_RT_FULL
> >         /*
> >          * On PREEMPT_RT we cannot sleep here. If the trylock does not
> >          * succeed then we return the worst-case 'expires in 1 tick'
> >          * value.  We use the rt functions here directly to avoid a
> >          * migrate_disable() call.
> >          */
> >         if (!spin_do_trylock(&base->lock))
> >                 return  now + 1;
> > #else
> 
> And how do you protect the walk of the timer wheel against a
> concurrent insertion/removal?

So I just wanted to mention that the issue also applies to
get_next_timer_interrupt(), in addition to run_local_timers(), but if we
really want to remove the lock there, can we always return "now + 1" for
PREEMPT_RT_FULL?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07 19:57               ` Jason Low
@ 2015-04-07 21:38                 ` Thomas Gleixner
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Gleixner @ 2015-04-07 21:38 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Mike Galbraith, Thavatchai Makphaibulchoke,
	Steven Rostedt, LKML, Ingo Molnar, linux-rt-users

On Tue, 7 Apr 2015, Jason Low wrote:
> On Tue, 2015-04-07 at 21:17 +0200, Thomas Gleixner wrote:
> > On Tue, 7 Apr 2015, Jason Low wrote:
> > > The lock shouldn't be used in get_next_timer_interrupt() either right?
> > > 
> > > unsigned long get_next_timer_interrupt(unsigned long now)
> > > {
> > > ...
> > > 
> > > #ifdef CONFIG_PREEMPT_RT_FULL
> > >         /*
> > >          * On PREEMPT_RT we cannot sleep here. If the trylock does not
> > >          * succeed then we return the worst-case 'expires in 1 tick'
> > >          * value.  We use the rt functions here directly to avoid a
> > >          * migrate_disable() call.
> > >          */
> > >         if (!spin_do_trylock(&base->lock))
> > >                 return  now + 1;
> > > #else
> > 
> > And how do you protect the walk of the timer wheel against a
> > concurrent insertion/removal?
> 
> So I just wanted to mention that the issue also applies to
> get_next_timer_interrupt(), in addition to run_local_timers(), but if we
> really want to remove the lock there, can we always return "now + 1" for
> PREEMPT_RT_FULL?
 
There is a simpler solution for this: Disable NOHZ for RT. Because
that's what 'always return "now + 1"' does at runtime just with more
overhead.

We need a smarter solution for RT to avoid the lock completely.

Thanks,

	tglx











^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-07  1:59     ` Steven Rostedt
  2015-04-07  5:09       ` Mike Galbraith
@ 2015-04-08  0:55       ` Thavatchai Makphaibulchoke
  2015-04-08  8:50         ` Thomas Gleixner
  1 sibling, 1 reply; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-04-08  0:55 UTC (permalink / raw)
  To: Steven Rostedt, Thavatchai Makphaibulchoke
  Cc: linux-kernel, mingo, tglx, linux-rt-users, umgwanakikbuti,
	Peter Zijlstra, Sebastian Andrzej Siewior



On 04/06/2015 07:59 PM, Steven Rostedt wrote:
> 

Thanks for the comments.

> Hmm, why is it not allowed?
> 
> If we just let it boost it, it will cut down on the code changes and
> checks that add to the hot paths.
> 

There is a WARN_ON() at line 3150 in sched/core.c to warn against
boosting idle_task priority.

In this case we are not actually boosting the idle_task priority, which
should be OK.  But the warning could be very overwhelming on some
platforms. TO keep the warning, I decided not to boots priority.  Please
let me know if you have any suggestion.

>>  		rt_mutex_enqueue_pi(owner, waiter);
>> -
> 
> I don't think this whitespace change needs to be done. The space does
> split up the dequeue and enqueue from the rest.
> 

Will restore it.

>> +	/* Might sleep, should not be called in interrupt context. */
>> +	BUG_ON(in_interrupt());
> 
> You're right it shouldn't. But that's why might_sleep() will give us a
> nice big warning if it is. Don't add the BUG_ON().
> 

Will remove it.

>> -static void  noinline __sched rt_spin_lock_slowunlock_hirq(struct rt_mutex *lock)
>> +static inline void rt_spin_lock_fastunlock_in_irq(struct rt_mutex *lock,
> 
> Why the name change?
> 

Instead of adding a new task_struct *caller parameter to
rt_spin_lock_fastUnlock() and make all other invocations of it to supply
the additional parameter, a simpler change would be to add a new
function rt_spin_lock_fastunlock_in_irq(), similar to the original
rt_spin_lock_slowunlock_hirq(), but first do fast mutex acquire attempt
with idle_task as owner and attempt the slow path if required and leave
the rt_spin_lock_fast_unlock() as it is.

>> +	void (*slowfn)(struct rt_mutex *lock, struct task_struct *task))
>>  {
>>  	int ret;
>> +	struct task_struct *intr_owner = current;
>>  
>> +	if (unlikely(in_irq()))
> 
> Why unlikely? This should only be called in interrupt context.
> 
> In fact, perhaps we should have a:
> 
> 	WARN_ON(!in_irq());
> 
> Then we don't need this test at all, and just assign the owner the idle
> task.
> 

You are right.  Sorry I guess I did not pay enough attention here. Will
do that.

>> +		intr_owner = idle_task(smp_processor_id());
> 
> Also, never butt a single if statement up against another if statement.
> Add a space, otherwise it gives the impression of being an
>   if () else if ()
> 

OK thanks.

>> +	if (likely(rt_mutex_cmpxchg(lock, intr_owner, NULL))) {
>> +		rt_mutex_deadlock_account_unlock(intr_owner);
>> +		return;
>> +	}
> 
> And add a space here. Don't butt conditionals together unless they are
> related (if else if, etc)
> 

Will do.

>>  	do {
>>  		ret = raw_spin_trylock(&lock->wait_lock);
>>  	} while (!ret);
> 
> I know this isn't part of the patch, but that do loop needs a comment
> (this is more toward Sebastian, and not you). It looks buggy, and I
> think we do it this way just so that lockdep doesn't complain. We need
> a comment here that states something like:
> 
> 	/*
> 	 * To get this rt_mutex from interrupt context, we had to have
> 	 * taken the wait_lock once before. Thus, nothing can deadlock
> 	 * us now. The wait_lock is internal to the rt_mutex, and
> 	 * anything that may have it now, will soon release it, because
> 	 * we own the rt_mutex but do not hold anything that the owner
> 	 * of the wait_lock would need to grab.
> 	 *
> 	 * The do { } while() is to keep lockdep from complaining.
> 	 */
> 

Will do.

> I wonder if there's another way to just take the wait_lock and tell
> lockdep not to complain?
> 
> Peter?
> 
>>  
>> -	__rt_spin_lock_slowunlock(lock);
>> +	slowfn(lock, intr_owner);
>>  }
>>  
>>  void __lockfunc rt_spin_lock(spinlock_t *lock)
>> @@ -1118,7 +1136,7 @@ void __lockfunc rt_spin_unlock_after_trylock_in_irq(spinlock_t *lock)
>>  {
>>  	/* NOTE: we always pass in '1' for nested, for simplicity */
>>  	spin_release(&lock->dep_map, 1, _RET_IP_);
>> -	rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock_hirq);
>> +	rt_spin_lock_fastunlock_in_irq(&lock->lock, __rt_spin_lock_slowunlock);
>>  }
>>  
>>  void __lockfunc __rt_spin_unlock(struct rt_mutex *lock)
>> @@ -1146,8 +1164,12 @@ int __lockfunc __rt_spin_trylock(struct rt_mutex *lock)
>>  
>>  int __lockfunc rt_spin_trylock(spinlock_t *lock)
> 
> We really should have a rt_spin_trylock_in_irq() and not have the
> below if conditional.
> 
> The paths that will be executed in hard irq context are static. They
> should be labeled as such.
> 

Are you talking about having a new function spin_trylock_in_irq() that
is turned into rt_spin-trylock_in_irq() that is called only in the
interrupt context?

That was part of my originally changes.  But that also require change in
kernel/timer.c and include/linux/spinlock_rt.h.  Since it involves
changes in 2 additional files, I backed out.  BTW, with that we could
also add a WAR_ON(in_irq()) in rt_spin_trylock().

> -- Steve

Thanks,
Mak.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-08  0:55       ` Thavatchai Makphaibulchoke
@ 2015-04-08  8:50         ` Thomas Gleixner
  2015-04-09 22:56           ` Thavatchai Makphaibulchoke
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2015-04-08  8:50 UTC (permalink / raw)
  To: Thavatchai Makphaibulchoke
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	linux-rt-users, umgwanakikbuti, Peter Zijlstra,
	Sebastian Andrzej Siewior

On Tue, 7 Apr 2015, Thavatchai Makphaibulchoke wrote:
> On 04/06/2015 07:59 PM, Steven Rostedt wrote:
> In this case we are not actually boosting the idle_task priority, which
> should be OK.  But the warning could be very overwhelming on some
> platforms. TO keep the warning, I decided not to boots priority.  Please
> let me know if you have any suggestion.

Don't even try to get this working. Taking the lock in idle/interrupt
context is just plain wrong.

The proper solution is to get rid of the locking requirement and that
needs some thought on the timer wheel code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
  2015-04-08  8:50         ` Thomas Gleixner
@ 2015-04-09 22:56           ` Thavatchai Makphaibulchoke
  0 siblings, 0 replies; 37+ messages in thread
From: Thavatchai Makphaibulchoke @ 2015-04-09 22:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Thavatchai Makphaibulchoke, linux-kernel, mingo,
	linux-rt-users, umgwanakikbuti, Peter Zijlstra,
	Sebastian Andrzej Siewior



On 04/08/2015 02:50 AM, Thomas Gleixner wrote:
> 
> Don't even try to get this working. Taking the lock in idle/interrupt
> context is just plain wrong.
> 

Thanks for the comments.

Could you please confirm you meant rt_mutex as it can go to sleep.

> The proper solution is to get rid of the locking requirement and that
> needs some thought on the timer wheel code.
> 

Yes, I agree fixing the problem in the timer code is the proper
solution.  BTW, once the code is fixed, I think we should add
WARN_ON(in_irq()); to the function spin_do_trylock() to prevent any
invalid future usage.

Thanks,
Mak.

> Thanks,
> 
> 	tglx
> 




^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-04-09 22:57 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-20  1:31 [PATCH 3.14.25-rt22 0/2] rtmutex Real-Time Linux: fix kernel BUG at kernel/locking/rtmutex.c:997! and some optimization Thavatchai Makphaibulchoke
2015-02-20  1:31 ` [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
2015-02-20  4:53   ` Steven Rostedt
2015-02-20 18:54     ` Thavatchai Makphaibulchoke
2015-02-21  1:49       ` Steven Rostedt
2015-02-23 18:37   ` Steven Rostedt
2015-02-24  0:16     ` Thavatchai Makphaibulchoke
2015-02-24  0:57       ` Steven Rostedt
2015-02-26 13:56         ` Sebastian Andrzej Siewior
2015-02-26 14:06           ` Steven Rostedt
2015-03-06 12:19             ` Sebastian Andrzej Siewior
2015-03-09 16:36               ` Steven Rostedt
2015-03-09 16:49                 ` Sebastian Andrzej Siewior
2015-02-20  1:31 ` [PATCH 3.14.25-rt22 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke
2015-04-07  1:26 ` [PATCH v2 0/2] rtmutex Real-Time Linux: fix BUG at kernel/locking/rtmutex.c:997! Thavatchai Makphaibulchoke
2015-04-07  1:26   ` [PATCH v2 1/2] rtmutex Real-Time Linux: Fixing kernel " Thavatchai Makphaibulchoke
2015-04-07  1:59     ` Steven Rostedt
2015-04-07  5:09       ` Mike Galbraith
2015-04-07 10:29         ` Peter Zijlstra
2015-04-07 10:49           ` Mike Galbraith
2015-04-07 10:56             ` Peter Zijlstra
2015-04-07 11:01               ` Mike Galbraith
2015-04-08  0:55       ` Thavatchai Makphaibulchoke
2015-04-08  8:50         ` Thomas Gleixner
2015-04-09 22:56           ` Thavatchai Makphaibulchoke
2015-04-07 11:23     ` Thomas Gleixner
2015-04-07 11:47       ` Mike Galbraith
2015-04-07 12:04         ` Peter Zijlstra
2015-04-07 12:07           ` Peter Zijlstra
2015-04-07 12:41           ` Steven Rostedt
2015-04-07 12:54             ` Peter Zijlstra
2015-04-07 13:58               ` Steven Rostedt
2015-04-07 18:12           ` Jason Low
2015-04-07 19:17             ` Thomas Gleixner
2015-04-07 19:57               ` Jason Low
2015-04-07 21:38                 ` Thomas Gleixner
2015-04-07  1:26   ` [PATCH v2 2/2] kernel/locking/rtmutex.c: some code optimization Thavatchai Makphaibulchoke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).