From: Ingo Molnar <mingo@elte.hu>
To: huggie@earth.li, linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Scheduler broken? sdhci issues with scheduling
Date: Fri, 29 Feb 2008 14:51:37 +0100
Message-ID: <20080229135137.GA23781@elte.hu>
In-Reply-To: <20080229133433.GA15532@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> does the patch below help, even if you keep HZ=100? This doesn't look 
> like a scheduler issue, it's more of a timer/timing issue. Different 
> HZ means different msleep() results - and the mmc code does a loop of 
> small msleep delays.

alternatively, instead of applying the first patch, could you apply the 
patch below instead? This makes "msleep()" much more precise. Does this 
help in the HZ != 1000 case?

	Ingo

----------------------->
Subject: hrtimers: use hrtimers so that msleep() sleeps for the requested time
From: Jonathan Corbet <corbet@lwn.net>

The problem being addressed here is that the current msleep() will stop
for a minimum of two jiffies, meaning that, on a HZ=100 system,
msleep(1) delays for about 20ms.  In a driver with one such delay
for each of 150 or so register setting operations, the extra time adds
up to a few seconds.
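
The arithmetic behind that figure can be sketched in user space.  This is
only a simplified model, not kernel code: model_msecs_to_jiffies() is a
hypothetical stand-in that rounds up to whole ticks (the real
msecs_to_jiffies() has more cases), and the "+ 1" mirrors the
msecs_to_jiffies(msecs) + 1 in the msleep() being replaced.

```c
/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks.
 * Assumption: hz divides into 1000 cleanly enough for this model. */
static unsigned long model_msecs_to_jiffies(unsigned int msecs, unsigned int hz)
{
	return ((unsigned long)msecs * hz + 999) / 1000;
}

/* Ticks requested by the old msleep(): the converted value plus one. */
static unsigned long old_msleep_ticks(unsigned int msecs, unsigned int hz)
{
	return model_msecs_to_jiffies(msecs, hz) + 1;
}

/* Nominal length of that request in milliseconds. */
static unsigned long old_msleep_nominal_ms(unsigned int msecs, unsigned int hz)
{
	return old_msleep_ticks(msecs, hz) * 1000 / hz;
}
```

Under this model, old_msleep_nominal_ms(1, 100) is 20 and
old_msleep_nominal_ms(1, 1000) is 2, so 150 such calls at HZ=100 come
to roughly 3 seconds.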

This patch addresses the situation by using hrtimers.  On tickless
systems with working timers, msleep(1) now sleeps for 1ms, even with
HZ=100.

Most comments last time were favorable.  The one dissenter was Roman,
who worries about the overhead of using hrtimers for this operation; my
understanding is that he would rather see a really_msleep() function for
those who actually want millisecond resolution.  I'm not sure how to
characterize what the cost could be, but it can only be buried by the
fact that every call sleeps for some number of milliseconds.  On my
system, the several hundred total msleep() calls can't cause any real
overhead, and almost all happen at initialization time.

I still think it would be useful for msleep() to do what it says it does
and not vastly oversleep with small arguments.  A quick grep turns up
450 msleep(1) calls in the current mainline.  Andrew, if you agree, can
you drop this into -mm?  If not, I guess I'll let it go.

Current msleep() snoozes for at least two jiffies, causing msleep(1) to
sleep for at least 20ms on HZ=100 systems.  Using hrtimers allows
msleep() to sleep for something much closer to the requested time.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 fs/proc/proc_misc.c  |    4 ++--
 fs/select.c          |    6 ++++--
 include/linux/time.h |    8 ++++----
 kernel/timer.c       |   47 ++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 48 insertions(+), 17 deletions(-)

Index: linux/fs/proc/proc_misc.c
===================================================================
--- linux.orig/fs/proc/proc_misc.c
+++ linux/fs/proc/proc_misc.c
@@ -115,9 +115,9 @@ static int uptime_read_proc(char *page, 
 	cputime_to_timespec(idletime, &idle);
 	len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
 			(unsigned long) uptime.tv_sec,
-			(uptime.tv_nsec / (NSEC_PER_SEC / 100)),
+			(uptime.tv_nsec / (unsigned long)(NSEC_PER_SEC / 100)),
 			(unsigned long) idle.tv_sec,
-			(idle.tv_nsec / (NSEC_PER_SEC / 100)));
+			(idle.tv_nsec / (unsigned long)(NSEC_PER_SEC / 100)));
 
 	return proc_calc_metrics(page, start, off, count, eof, len);
 }
Index: linux/fs/select.c
===================================================================
--- linux.orig/fs/select.c
+++ linux/fs/select.c
@@ -446,7 +446,8 @@ asmlinkage long sys_pselect7(int n, fd_s
 		if ((u64)ts.tv_sec >= (u64)MAX_INT64_SECONDS)
 			timeout = -1;	/* infinite */
 		else {
-			timeout = DIV_ROUND_UP(ts.tv_nsec, NSEC_PER_SEC/HZ);
+			timeout = DIV_ROUND_UP(ts.tv_nsec,
+					       (unsigned long)NSEC_PER_SEC/HZ);
 			timeout += ts.tv_sec * HZ;
 		}
 	}
@@ -777,7 +778,8 @@ asmlinkage long sys_ppoll(struct pollfd 
 		if ((u64)ts.tv_sec >= (u64)MAX_INT64_SECONDS)
 			timeout = -1;	/* infinite */
 		else {
-			timeout = DIV_ROUND_UP(ts.tv_nsec, NSEC_PER_SEC/HZ);
+			timeout = DIV_ROUND_UP(ts.tv_nsec,
+					       (unsigned long)NSEC_PER_SEC/HZ);
 			timeout += ts.tv_sec * HZ;
 		}
 	}
Index: linux/include/linux/time.h
===================================================================
--- linux.orig/include/linux/time.h
+++ linux/include/linux/time.h
@@ -31,11 +31,11 @@ struct timezone {
 /* Parameters used to convert the timespec values: */
 #define MSEC_PER_SEC	1000L
 #define USEC_PER_MSEC	1000L
-#define NSEC_PER_USEC	1000L
-#define NSEC_PER_MSEC	1000000L
+#define NSEC_PER_USEC	1000LL
+#define NSEC_PER_MSEC	1000000LL
 #define USEC_PER_SEC	1000000L
-#define NSEC_PER_SEC	1000000000L
-#define FSEC_PER_SEC	1000000000000000L
+#define NSEC_PER_SEC	1000000000LL
+#define FSEC_PER_SEC	1000000000000000LL
 
 static inline int timespec_equal(const struct timespec *a,
                                  const struct timespec *b)
Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -1368,18 +1368,43 @@ void __init init_timers(void)
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
 }
 
+
+
+
+static void do_msleep(unsigned int msecs, struct hrtimer_sleeper *sleeper,
+	int sigs)
+{
+	enum hrtimer_mode mode = HRTIMER_MODE_REL;
+	int state = sigs ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+
+	/*
+	 * This is really just a reworked and simplified version
+	 * of do_nanosleep().
+	 */
+	hrtimer_init(&sleeper->timer, CLOCK_MONOTONIC, mode);
+	sleeper->timer.expires = ktime_set(0, msecs*NSEC_PER_MSEC);
+	hrtimer_init_sleeper(sleeper, current);
+
+	do {
+		set_current_state(state);
+		hrtimer_start(&sleeper->timer, sleeper->timer.expires, mode);
+		if (sleeper->task)
+			schedule();
+		hrtimer_cancel(&sleeper->timer);
+		mode = HRTIMER_MODE_ABS;
+	} while (sleeper->task && !(sigs && signal_pending(current)));
+}
+
 /**
  * msleep - sleep safely even with waitqueue interruptions
  * @msecs: Time in milliseconds to sleep for
  */
 void msleep(unsigned int msecs)
 {
-	unsigned long timeout = msecs_to_jiffies(msecs) + 1;
+	struct hrtimer_sleeper sleeper;
 
-	while (timeout)
-		timeout = schedule_timeout_uninterruptible(timeout);
+	do_msleep(msecs, &sleeper, 0);
 }
-
 EXPORT_SYMBOL(msleep);
 
 /**
@@ -1388,11 +1413,15 @@ EXPORT_SYMBOL(msleep);
  */
 unsigned long msleep_interruptible(unsigned int msecs)
 {
-	unsigned long timeout = msecs_to_jiffies(msecs) + 1;
+	struct hrtimer_sleeper sleeper;
+	ktime_t left;
 
-	while (timeout && !signal_pending(current))
-		timeout = schedule_timeout_interruptible(timeout);
-	return jiffies_to_msecs(timeout);
-}
+	do_msleep(msecs, &sleeper, 1);
 
+	if (!sleeper.task)
+		return 0;
+	left = ktime_sub(sleeper.timer.expires,
+			 sleeper.timer.base->get_time());
+	return max(((long) ktime_to_ns(left))/(long)NSEC_PER_MSEC, 1L);
+}
 EXPORT_SYMBOL(msleep_interruptible);
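
The return value of the new msleep_interruptible() can be modelled
outside the kernel as below.  This is a sketch under assumptions: the
function name, the timer_fired flag (standing in for !sleeper.task) and
the plain nanosecond arguments (standing in for ktime_t values) are all
hypothetical, and the model divides before narrowing to long, whereas
the patch casts ktime_to_ns(left) to long first.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_MSEC 1000000LL

/*
 * timer_fired: nonzero if the sleep ran to completion.
 * expires_ns:  absolute expiry time of the timer, in nanoseconds.
 * now_ns:      current time on the same clock, in nanoseconds.
 */
static long model_remaining_msecs(int timer_fired, int64_t expires_ns,
				  int64_t now_ns)
{
	int64_t left_ns;
	long left_ms;

	/* Full sleep completed: nothing left to report. */
	if (timer_fired)
		return 0;

	/* Interrupted by a signal: report the time left, at least 1ms. */
	left_ns = expires_ns - now_ns;
	left_ms = (long)(left_ns / MODEL_NSEC_PER_MSEC);
	return left_ms > 1 ? left_ms : 1;
}
```

The clamp to 1 means an interrupted sleep never reports 0, so a caller
can distinguish "interrupted" from "slept the full time" even when less
than a millisecond remained.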
