LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Gregory Haskins <ghaskins@novell.com>
To: mingo@elte.hu, a.p.zijlstra@chello.nl, tglx@linutronix.de,
	rostedt@goodmis.org, linux-rt-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, bill.huey@gmail.com,
	kevin@hilman.org, cminyard@mvista.com, dsingleton@mvista.com,
	dwalker@mvista.com, npiggin@suse.de, dsaxena@plexity.net,
	ak@suse.de, pavel@ucw.cz, acme@redhat.com, gregkh@suse.de,
	sdietrich@novell.com, pmorreale@novell.com, mkohari@novell.com,
	ghaskins@novell.com
Subject: [(RT RFC) PATCH v2 5/9] adaptive real-time lock support
Date: Mon, 25 Feb 2008 11:01:03 -0500	[thread overview]
Message-ID: <20080225160103.11268.25579.stgit@novell1.haskins.net> (raw)
In-Reply-To: <20080225155959.11268.35541.stgit@novell1.haskins.net>

There are pros and cons when deciding between the two basic forms of
locking primitives (spinning vs sleeping).  Without going into great
detail on either one, we note that spinlocks have the advantage of
lower overhead for short hold locks.  However, they also have a
con in that they create indeterminate latencies since preemption
must traditionally be disabled while the lock is held (to prevent deadlock).

We want to avoid non-deterministic critical sections in -rt. Therefore,
when realtime is enabled, most contexts are converted to threads, and
likewise most spinlock_ts are converted to sleepable rt-mutex derived
locks.  This allows the holder of the lock to remain fully preemptible,
thus reducing a major source of latencies in the kernel.

However, converting what was once a true spinlock into a sleeping lock
may also decrease performance since the locks will now sleep under
contention.  Since the fundamental lock used to be a spinlock, it is
highly likely that it was used in a short-hold path and that release
is imminent.  Therefore sleeping only serves to cause context-thrashing.

Adaptive RT locks use a hybrid approach to solve the problem.  They
spin when possible, and sleep when necessary (to avoid deadlock, etc).
This significantly improves many areas of the performance of the -rt
kernel.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Morreale <pmorreale@novell.com>
Signed-off-by: Sven Dietrich <sdietrich@novell.com>
---

 kernel/Kconfig.preempt    |   20 +++++++
 kernel/rtmutex.c          |   30 +++++++---
 kernel/rtmutex_adaptive.h |  138 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 178 insertions(+), 10 deletions(-)

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index e493257..d2432fa 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -183,6 +183,26 @@ config RCU_TRACE
 	  Say Y/M here if you want to enable RCU tracing in-kernel/module.
 	  Say N if you are unsure.
 
+config ADAPTIVE_RTLOCK
+        bool "Adaptive real-time locks"
+        default y
+        depends on PREEMPT_RT && SMP
+        help
+          PREEMPT_RT allows for greater determinism by transparently
+          converting normal spinlock_ts into preemptible rtmutexes which
+          sleep any waiters under contention.  However, in many cases the
+          lock will be released in less time than it takes to context
+          switch.  Therefore, the "sleep under contention" policy may also
+          degrade throughput performance due to the extra context switches.
+
+          This option alters the rtmutex derived spinlock_t replacement
+          code to use an adaptive spin/sleep algorithm.  It will spin
+          unless it determines it must sleep to avoid deadlock.  This
+          offers a best of both worlds solution since we achieve both
+          high-throughput and low-latency.
+
+          If unsure, say Y.
+
 config SPINLOCK_BKL
 	bool "Old-Style Big Kernel Lock"
 	depends on (PREEMPT || SMP) && !PREEMPT_RT
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index bf9e230..3802ef8 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -7,6 +7,8 @@
  *  Copyright (C) 2005-2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com>
  *  Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt
  *  Copyright (C) 2006 Esben Nielsen
+ *  Copyright (C) 2008 Novell, Inc., Sven Dietrich, Peter Morreale,
+ *                                   and Gregory Haskins
  *
  *  See Documentation/rt-mutex-design.txt for details.
  */
@@ -17,6 +19,7 @@
 #include <linux/hardirq.h>
 
 #include "rtmutex_common.h"
+#include "rtmutex_adaptive.h"
 
 #ifdef CONFIG_RTLOCK_LATERAL_STEAL
 int rtmutex_lateral_steal __read_mostly = 1;
@@ -734,6 +737,7 @@ rt_spin_lock_slowlock(struct rt_mutex *lock)
 {
 	struct rt_mutex_waiter waiter;
 	unsigned long saved_state, state, flags;
+	DECLARE_ADAPTIVE_WAITER(adaptive);
 
 	debug_rt_mutex_init_waiter(&waiter);
 	waiter.task = NULL;
@@ -780,6 +784,8 @@ rt_spin_lock_slowlock(struct rt_mutex *lock)
 				continue;
 		}
 
+		prepare_adaptive_wait(lock, &adaptive);
+
 		/*
 		 * Prevent schedule() to drop BKL, while waiting for
 		 * the lock ! We restore lock_depth when we come back.
@@ -791,16 +797,20 @@ rt_spin_lock_slowlock(struct rt_mutex *lock)
 
 		debug_rt_mutex_print_deadlock(&waiter);
 
-		update_current(TASK_UNINTERRUPTIBLE, &saved_state);
-		/*
-		 * The xchg() in update_current() is an implicit barrier
-		 * which we rely upon to ensure current->state is visible
-		 * before we test waiter.task.
-		 */
-		if (waiter.task)
-			schedule_rt_mutex(lock);
-		else
-			update_current(TASK_RUNNING_MUTEX, &saved_state);
+		/* adaptive_wait() returns 1 if we need to sleep */
+		if (adaptive_wait(lock, &waiter, &adaptive)) {
+			update_current(TASK_UNINTERRUPTIBLE, &saved_state);
+			/*
+			 * The xchg() in update_current() is an implicit
+			 * barrier which we rely upon to ensure current->state
+			 * is visible before we test waiter.task.
+			 */
+			if (waiter.task)
+				schedule_rt_mutex(lock);
+			else
+				update_current(TASK_RUNNING_MUTEX,
+					       &saved_state);
+		}
 
 		spin_lock_irqsave(&lock->wait_lock, flags);
 		current->flags |= saved_flags;
diff --git a/kernel/rtmutex_adaptive.h b/kernel/rtmutex_adaptive.h
new file mode 100644
index 0000000..862c088
--- /dev/null
+++ b/kernel/rtmutex_adaptive.h
@@ -0,0 +1,138 @@
+/*
+ * Adaptive RT lock support
+ *
+ * There are pros and cons when deciding between the two basic forms of
+ * locking primitives (spinning vs sleeping).  Without going into great
+ * detail on either one, we note that spinlocks have the advantage of
+ * lower overhead for short hold locks.  However, they also have a
+ * con in that they create indeterminate latencies since preemption
+ * must traditionally be disabled while the lock is held (to prevent deadlock).
+ *
+ * We want to avoid non-deterministic critical sections in -rt. Therefore,
+ * when realtime is enabled, most contexts are converted to threads, and
+ * likewise most spinlock_ts are converted to sleepable rt-mutex derived
+ * locks.  This allows the holder of the lock to remain fully preemptible,
+ * thus reducing a major source of latencies in the kernel.
+ *
+ * However, converting what was once a true spinlock into a sleeping lock
+ * may also decrease performance since the locks will now sleep under
+ * contention.  Since the fundamental lock used to be a spinlock, it is
+ * highly likely that it was used in a short-hold path and that release
+ * is imminent.  Therefore sleeping only serves to cause context-thrashing.
+ *
+ * Adaptive RT locks use a hybrid approach to solve the problem.  They
+ * spin when possible, and sleep when necessary (to avoid deadlock, etc).
+ * This significantly improves many areas of the performance of the -rt
+ * kernel.
+ *
+ * Copyright (C) 2008 Novell, Inc.,
+ *          Sven Dietrich, Peter Morreale, and Gregory Haskins
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; version 2
+ *  of the License.
+ */
+
+#ifndef __KERNEL_RTMUTEX_ADAPTIVE_H
+#define __KERNEL_RTMUTEX_ADAPTIVE_H
+
+#include "rtmutex_common.h"
+
+
+#ifdef CONFIG_ADAPTIVE_RTLOCK
+struct adaptive_waiter {
+	struct task_struct *owner;
+};
+
+/*
+ * Adaptive-rtlocks will busywait when possible, and sleep only if
+ * necessary. Note that the busyloop looks racy, and it is....but we do
+ * not care. If we lose any races it simply means that we spin one more
+ * time before seeing that we need to break-out on the next iteration.
+ *
+ * We realize this is a relatively large function to inline, but note that
+ * it is only instantiated 1 or 2 times max, and it makes a measurable
+ * performance different to avoid the call.
+ *
+ * Returns 1 if we should sleep
+ *
+ */
+static inline int
+adaptive_wait(struct rt_mutex *lock, struct rt_mutex_waiter *waiter,
+	      struct adaptive_waiter *adaptive)
+{
+	int sleep = 0;
+
+	for (;;) {
+		/*
+		 * If the task was re-awoken, break out completely so we can
+		 * reloop through the lock-acquisition code.
+		 */
+		if (!waiter->task)
+			break;
+
+		/*
+		 * We need to break if the owner changed so we can reloop
+		 * and safely acquire the owner-pointer again with the
+		 * wait_lock held.
+		 */
+		if (adaptive->owner != rt_mutex_owner(lock))
+			break;
+
+		/*
+		 * If we got here, presumably the lock ownership is still
+		 * current.  We will use it to our advantage to be able to
+		 * spin without disabling preemption...
+		 */
+
+		/*
+		 * .. sleep if the owner is not running..
+		 */
+		if (!adaptive->owner->se.on_rq) {
+			sleep = 1;
+			break;
+		}
+
+		/*
+		 * .. or is running on our own cpu (to prevent deadlock)
+		 */
+		if (task_cpu(adaptive->owner) == task_cpu(current)) {
+			sleep = 1;
+			break;
+		}
+
+		cpu_relax();
+	}
+
+	put_task_struct(adaptive->owner);
+
+	return sleep;
+}
+
+static inline void
+prepare_adaptive_wait(struct rt_mutex *lock, struct adaptive_waiter *adaptive)
+{
+	/*
+	 * We must acquire/lock the owner pointer while holding
+	 * the wait_lock, or we risk racing against the owner
+	 * exiting.
+	 */
+	adaptive->owner = rt_mutex_owner(lock);
+	get_task_struct(adaptive->owner);
+}
+
+#define DECLARE_ADAPTIVE_WAITER(name) \
+     struct adaptive_waiter name = { .owner = NULL, }
+
+#else
+
+#define DECLARE_ADAPTIVE_WAITER(name)
+
+#define adaptive_wait(lock, waiter, busy) 1
+#define prepare_adaptive_wait(lock, busy) {}
+
+#endif /* CONFIG_ADAPTIVE_RTLOCK */
+
+
+#endif /* __KERNEL_RTMUTEX_ADAPTIVE_H */


  parent reply	other threads:[~2008-02-25 16:31 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-25 16:00 [(RT RFC) PATCH v2 0/9] adaptive real-time locks Gregory Haskins
2008-02-25 16:00 ` [(RT RFC) PATCH v2 1/9] allow rt-mutex lock-stealing to include lateral priority Gregory Haskins
2008-03-03 15:13   ` Steven Rostedt
2008-03-03 15:41     ` [(RT RFC) PATCH v2 1/9] allow rt-mutex lock-stealing to includelateral priority Gregory Haskins
2008-03-03 15:55       ` Steven Rostedt
2008-03-03 15:55         ` [(RT RFC) PATCH v2 1/9] allow rt-mutex lock-stealing toincludelateral priority Gregory Haskins
2008-02-25 16:00 ` [(RT RFC) PATCH v2 2/9] sysctl for runtime-control of lateral mutex stealing Gregory Haskins
2008-02-25 21:53   ` Pavel Machek
2008-02-25 22:57     ` Sven-Thorsten Dietrich
2008-02-25 23:00       ` Pavel Machek
2008-02-25 23:40         ` Sven-Thorsten Dietrich
2008-02-26  1:15       ` Gregory Haskins
2008-02-25 16:00 ` [(RT RFC) PATCH v2 3/9] rearrange rt_spin_lock sleep Gregory Haskins
2008-02-25 21:54   ` Pavel Machek
2008-02-26  0:45     ` Gregory Haskins
2008-02-25 16:00 ` [(RT RFC) PATCH v2 4/9] optimize rt lock wakeup Gregory Haskins
2008-03-03 15:37   ` Steven Rostedt
2008-03-03 15:41     ` Gregory Haskins
2008-02-25 16:01 ` Gregory Haskins [this message]
2008-02-25 22:03   ` [(RT RFC) PATCH v2 5/9] adaptive real-time lock support Pavel Machek
2008-02-26  0:48     ` Gregory Haskins
2008-02-26 15:03     ` Gregory Haskins
2008-02-26 18:06       ` Pavel Machek
2008-02-26 18:01         ` Gregory Haskins
2008-02-25 16:01 ` [(RT RFC) PATCH v2 6/9] add a loop counter based timeout mechanism Gregory Haskins
2008-02-25 22:06   ` Pavel Machek
2008-02-25 22:19     ` Greg KH
2008-02-25 22:21       ` Pavel Machek
2008-02-25 22:39     ` Sven-Thorsten Dietrich
2008-02-26 15:09     ` Gregory Haskins
2008-02-25 16:01 ` [(RT RFC) PATCH v2 7/9] adaptive mutexes Gregory Haskins
2008-02-25 22:09   ` Pavel Machek
2008-02-26  0:52     ` Gregory Haskins
2008-02-25 16:01 ` [(RT RFC) PATCH v2 8/9] adjust pi_lock usage in wakeup Gregory Haskins
2008-02-25 22:10   ` Pavel Machek
2008-02-25 16:01 ` [(RT RFC) PATCH v2 9/9] remove the extra call to try_to_take_lock Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080225160103.11268.25579.stgit@novell1.haskins.net \
    --to=ghaskins@novell.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@redhat.com \
    --cc=ak@suse.de \
    --cc=bill.huey@gmail.com \
    --cc=cminyard@mvista.com \
    --cc=dsaxena@plexity.net \
    --cc=dsingleton@mvista.com \
    --cc=dwalker@mvista.com \
    --cc=gregkh@suse.de \
    --cc=kevin@hilman.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mkohari@novell.com \
    --cc=npiggin@suse.de \
    --cc=pavel@ucw.cz \
    --cc=pmorreale@novell.com \
    --cc=rostedt@goodmis.org \
    --cc=sdietrich@novell.com \
    --cc=tglx@linutronix.de \
    --subject='Re: [(RT RFC) PATCH v2 5/9] adaptive real-time lock support' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).