LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Viresh Kumar <viresh.kumar@linaro.org>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Steven Miao <realmz6@gmail.com>
Subject: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer
Date: Tue, 31 Mar 2015 12:25:16 +0530	[thread overview]
Message-ID: <80182e47a7103608d2ddab7f62c0c3dffc99fdcc.1427782893.git.viresh.kumar@linaro.org> (raw)
In-Reply-To: <cover.1427782893.git.viresh.kumar@linaro.org>
In-Reply-To: <cover.1427782893.git.viresh.kumar@linaro.org>

While queuing a timer, we try to migrate it to a non-idle core if the local core
is idle, but we don't try that if the timer is re-armed from its handler.

There were few unsolved problems due to which it was avoided until now. But
there are cases where solving these problems can be useful. When the timer is
always re-armed from its handler, it never migrates to other cores. And many a
times, it ends up waking an idle core to just service the timer, which could
have been handled by a non-idle core.

Problems to solve, if we allow such migrations:

- Serialization of timer with itself. Its possible that timer may fire on the
  new base, before the timer handler finishes on old base.

- del_timer_sync() can't detect that the timer's handler has not finished.

__mod_timer migrates the timer with following code:

	spin_lock(&old_base->lock);
	timer_set_base(timer, NULL);
	spin_unlock(&old_base->lock);

	spin_lock(&new_base->lock);
	timer_set_base(timer, new_base);
	spin_unlock(&new_base->lock);

Once the new_base->lock is released, del_timer_sync() can take the
new_base->lock and will get new_base->running_timer != timer. del_timer_sync()
will then remove the timer and return while timer's handler hasn't finished yet
on the old base.

To fix these issues, lets use another bit from base pointer to mark if a timer's
handler is currently running or not. This can be used to verify timer's state in
del_timer_sync().

Before running timer's handler we must ensure timer isn't running on any other
CPU. If it is, then process other expired timers first, if any, and then wait
until the handler finishes.

Cc: Steven Miao <realmz6@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/timer.h |  3 +-
 kernel/time/timer.c   | 93 ++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 76 insertions(+), 20 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE		0x1LU
 #define TIMER_IRQSAFE			0x2LU
+#define TIMER_RUNNING			0x4LU
 
-#define TIMER_FLAG_MASK			0x3LU
+#define TIMER_FLAG_MASK			0x7LU
 
 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..364644811485 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
 	return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }
 
+static inline unsigned int timer_running(struct timer_list *timer)
+{
+	return ((unsigned int)(unsigned long)timer->base & TIMER_RUNNING);
+}
+
+static inline void timer_set_running(struct timer_list *timer)
+{
+	timer->base = (struct tvec_base *)((unsigned long)timer->base | TIMER_RUNNING);
+}
+
+static inline void timer_clear_running(struct timer_list *timer)
+{
+	timer->base = (struct tvec_base *)((unsigned long)timer->base & ~TIMER_RUNNING);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 	new_base = per_cpu(tvec_bases, cpu);
 
 	if (base != new_base) {
-		/*
-		 * We are trying to schedule the timer on the local CPU.
-		 * However we can't change timer's base while it is running,
-		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
-		 */
-		if (likely(base->running_timer != timer)) {
-			/* See the comment in lock_timer_base() */
-			timer_set_base(timer, NULL);
-			spin_unlock(&base->lock);
-			base = new_base;
-			spin_lock(&base->lock);
-			timer_set_base(timer, base);
-		}
+		/* See the comment in lock_timer_base() */
+		timer_set_base(timer, NULL);
+		spin_unlock(&base->lock);
+		base = new_base;
+		spin_lock(&base->lock);
+		timer_set_base(timer, base);
 	}
 
 	timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)
 
 	base = lock_timer_base(timer, &flags);
 
-	if (base->running_timer != timer) {
+	if (!timer_running(timer)) {
 		timer_stats_timer_clear_start_info(timer);
 		ret = detach_if_pending(timer, base, true);
 	}
@@ -1050,12 +1056,12 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  *    ----                             ----
  *                                   <SOFTIRQ>
  *                                   call_timer_fn();
- *                                     base->running_timer = mytimer;
+ *                                     timer_set_running(mytimer);
  *  spin_lock_irq(somelock);
  *                                     <IRQ>
  *                                        spin_lock(somelock);
  *  del_timer_sync(mytimer);
- *   while (base->running_timer == mytimer);
+ *   while (timer_running(mytimer));
  *
  * Now del_timer_sync() will never return and never release somelock.
  * The interrupt on the other CPU is waiting to grab somelock but
@@ -1189,12 +1195,41 @@ static inline void __run_timers(struct tvec_base *base)
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, head);
+
+again:
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
 			bool irqsafe;
 
-			timer = list_first_entry(head, struct timer_list,entry);
+			timer = list_first_entry(head, struct timer_list, entry);
+
+			if (unlikely(timer_running(timer))) {
+				/*
+				 * The only way to get here is if the handler,
+				 * running on another base, re-queued itself on
+				 * this base, and the handler hasn't finished
+				 * yet.
+				 */
+
+				if (list_is_last(&timer->entry, head)) {
+					/*
+					 * Drop lock, so that TIMER_RUNNING can
+					 * be cleared on another base.
+					 */
+					spin_unlock(&base->lock);
+
+					while (timer_running(timer))
+						cpu_relax();
+
+					spin_lock(&base->lock);
+				} else {
+					/* Rotate the list and try someone else */
+					list_move_tail(&timer->entry, head);
+				}
+				goto again;
+			}
+
 			fn = timer->function;
 			data = timer->data;
 			irqsafe = tbase_get_irqsafe(timer->base);
@@ -1202,6 +1237,7 @@ static inline void __run_timers(struct tvec_base *base)
 			timer_stats_account_timer(timer);
 
 			base->running_timer = timer;
+			timer_set_running(timer);
 			detach_expired_timer(timer, base);
 
 			if (irqsafe) {
@@ -1213,6 +1249,25 @@ static inline void __run_timers(struct tvec_base *base)
 				call_timer_fn(timer, fn, data);
 				spin_lock_irq(&base->lock);
 			}
+
+			/*
+			 * Handler running on this base, re-queued itself on
+			 * another base ?
+			 */
+			if (unlikely(timer->base != base)) {
+				unsigned long flags;
+				struct tvec_base *tbase;
+
+				spin_unlock(&base->lock);
+
+				tbase = lock_timer_base(timer, &flags);
+				timer_clear_running(timer);
+				spin_unlock(&tbase->lock);
+
+				spin_lock(&base->lock);
+			} else {
+				timer_clear_running(timer);
+			}
 		}
 	}
 	base->running_timer = NULL;
-- 
2.3.0.rc0.44.ga94655d


  reply	other threads:[~2015-03-31  6:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-31  6:55 [PATCH 0/2] timer: Migrate running timers Viresh Kumar
2015-03-31  6:55 ` Viresh Kumar [this message]
2015-03-31 14:53   ` [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer Peter Zijlstra
2015-04-14 23:13   ` Thomas Gleixner
2015-04-17  8:12     ` viresh kumar
2015-04-17  8:32       ` Ingo Molnar
2015-04-21 21:32       ` Thomas Gleixner
2015-04-21 21:54         ` Eric Dumazet
2015-04-22 15:29           ` Peter Zijlstra
2015-04-22 16:02             ` Eric Dumazet
2015-04-22 18:56               ` Thomas Gleixner
2015-04-22 19:59                 ` Eric Dumazet
2015-04-22 21:56                   ` Thomas Gleixner
2015-04-23  6:57                     ` Eric Dumazet
2015-04-23 12:45                       ` Thomas Gleixner
2015-04-25 18:37                         ` Eric Dumazet
2015-05-05 13:00                           ` Thomas Gleixner
2015-05-06 16:33                             ` Eric Dumazet
2015-04-15 15:54   ` Thomas Gleixner
2015-03-31  6:55 ` [PATCH 2/2] timer: Replace base-> 'running_timer' with 'busy' Viresh Kumar
2015-03-31 15:01 ` [PATCH 0/2] timer: Migrate running timers Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80182e47a7103608d2ddab7f62c0c3dffc99fdcc.1427782893.git.viresh.kumar@linaro.org \
    --to=viresh.kumar@linaro.org \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=realmz6@gmail.com \
    --cc=tglx@linutronix.de \
    --subject='Re: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).