LKML Archive on lore.kernel.org
* sched_clock() on i386
@ 2006-12-22 10:43 Stephane Eranian
2006-12-22 12:19 ` [patch] sched: improve sched_clock() on i686 Ingo Molnar
2006-12-23 15:53 ` sched_clock() on i386 Daniel Walker
0 siblings, 2 replies; 6+ messages in thread
From: Stephane Eranian @ 2006-12-22 10:43 UTC (permalink / raw)
To: linux-kernel
Cc: venkatesh.pallipadi, suresh.b.siddha, kenneth.w.chen, tony.luck,
Stephane Eranian
Hello,
The perfmon subsystem needs to compute per-CPU duration. It is using
sched_clock() to provide this information. However, it seems there are
big variations in the way sched_clock() is implemented for each architecture,
especially in the accuracy of the returned value (going from TSC to jiffies).
Looking at the i386 implementation, it is not so clear to me what the
actual goal of the function is. I was under the impression that this
function was meant to compute per-CPU time deltas. This is how the
scheduler seems to use it.
The x86-64 and i386 implementations are quite different. The i386 comment
about NUMA seems to contradict the initial goal of the function.
Why is that?
Does this come from the fact that sched_clock() is used for time-stamping
printk()? But in this case, like on IA-64, couldn't we define a specific
timing function for printk?
Excerpt from arch/i386/kernel/tsc.c:
unsigned long long sched_clock(void)
{
	unsigned long long this_offset;

	/*
	 * in the NUMA case we dont use the TSC as they are not
	 * synchronized across all CPUs.
	 */
#ifndef CONFIG_NUMA
	if (!cpu_khz || check_tsc_unstable())
#endif
		/* no locking but a rare wrong value is not a big deal */
		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);

	/* read the Time Stamp Counter: */
	rdtscll(this_offset);

	/* return the value in ns */
	return cycles_2_ns(this_offset);
}
--
-Stephane
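
The excerpt above has two properties that are easy to miss: with CONFIG_NUMA defined the `if` line is compiled out, so the jiffies `return` becomes unconditional, and the jiffies path can only advance in steps of 1000000000 / HZ nanoseconds. The following is a minimal user-space sketch of both points, not kernel code. The function names, the `config_numa` parameter (standing in for the compile-time option), and the `hz` parameter (standing in for the kernel's HZ) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Hypothetical model of the jiffies fallback: scale a tick count to
 * nanoseconds the same way the excerpt does, ticks * (1e9 / HZ).
 */
static uint64_t jiffies_to_ns(uint64_t ticks, unsigned int hz)
{
	assert(hz > 0 && hz <= NSEC_PER_SEC);
	return ticks * (NSEC_PER_SEC / hz);
}

/*
 * Hypothetical model of the branch selection in the excerpt.
 * config_numa stands in for the CONFIG_NUMA build option: when it is
 * set, the "if" is preprocessed away and the jiffies return always runs.
 */
static bool takes_jiffies_path(bool config_numa, unsigned long cpu_khz,
			       bool tsc_unstable)
{
	if (config_numa)
		return true;
	return cpu_khz == 0 || tsc_unstable;
}
```

Under these assumptions, jiffies_to_ns(1, 250) gives the smallest non-zero delta a HZ=250 kernel could report on this path (4 ms), which is far coarser than a TSC-derived value.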
* [patch] sched: improve sched_clock() on i686
2006-12-22 10:43 sched_clock() on i386 Stephane Eranian
@ 2006-12-22 12:19 ` Ingo Molnar
2006-12-23 10:27 ` Ingo Molnar
2007-01-17 11:05 ` Stephane Eranian
2006-12-23 15:53 ` sched_clock() on i386 Daniel Walker
1 sibling, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2006-12-22 12:19 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, venkatesh.pallipadi, suresh.b.siddha,
kenneth.w.chen, tony.luck, Andrew Morton
* Stephane Eranian <eranian@hpl.hp.com> wrote:
> The perfmon subsystem needs to compute per-CPU duration. It is using
> sched_clock() to provide this information. However, it seems there are
> big variations in the way sched_clock() is implemented for each
> architecture, especially in the accuracy of the returned value (going
> from TSC to jiffies).
>
> Looking at the i386 implementation, it is not so clear to me what the
> actual goal of the function is. I was under the impression that this
> function was meant to compute per-CPU time deltas. This is how the
> scheduler seems to use it.
>
> The x86-64 and i386 implementations are quite different. The i386
> comment about NUMA seems to contradict the initial goal of the
> function. Why is that?
it's purely historic - the i686 sched_clock() implementation predates
the scheduler's ability to deal with non-synchronous per-CPU clocks. I
tried to fix that (a year ago) and it didn't work out - but i've reviewed
my old patch and now realize what the mistake was - the patch below
should work better.
Ingo
---------------------->
Subject: [patch] sched: improve sched_clock() on i686
From: Ingo Molnar <mingo@elte.hu>
this patch cleans up sched_clock() on i686: it will use the TSC if
available and fall back to jiffies only if the user asked for it to be
disabled via notsc or the CPU calibration code didn't figure out the
right cpu_khz.
this generally makes the scheduler timestamps more fine-grained, on all
hardware. (the current scheduler is pretty resistant against
asynchronous sched_clock() values on different CPUs, it will allow at
most up to a jiffy of jitter.)
also simplify sched_clock()'s check for TSC availability: propagate the
desire and ability to use the TSC into the tsc_disable flag, previously
this flag only indicated whether the notsc option was passed. This makes
the rare low-res sched_clock() codepath a single branch off a
read-mostly flag.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/i386/kernel/tsc.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
Index: linux/arch/i386/kernel/tsc.c
===================================================================
--- linux.orig/arch/i386/kernel/tsc.c
+++ linux/arch/i386/kernel/tsc.c
@@ -108,13 +108,10 @@ unsigned long long sched_clock(void)
 	unsigned long long this_offset;

 	/*
-	 * in the NUMA case we dont use the TSC as they are not
-	 * synchronized across all CPUs.
+	 * Fall back to jiffies if there's no TSC available:
 	 */
-#ifndef CONFIG_NUMA
-	if (!cpu_khz || check_tsc_unstable())
-#endif
-		/* no locking but a rare wrong value is not a big deal */
+	if (unlikely(tsc_disable))
+		/* No locking but a rare wrong value is not a big deal: */
 		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);

 	/* read the Time Stamp Counter: */
@@ -194,13 +191,13 @@ EXPORT_SYMBOL(recalibrate_cpu_khz);
 void __init tsc_init(void)
 {
 	if (!cpu_has_tsc || tsc_disable)
-		return;
+		goto out_no_tsc;

 	cpu_khz = calculate_cpu_khz();
 	tsc_khz = cpu_khz;

 	if (!cpu_khz)
-		return;
+		goto out_no_tsc;

 	printk("Detected %lu.%03lu MHz processor.\n",
 				(unsigned long)cpu_khz / 1000,
@@ -208,6 +205,15 @@ void __init tsc_init(void)

 	set_cyc2ns_scale(cpu_khz);
 	use_tsc_delay();
+	return;
+
+out_no_tsc:
+	/*
+	 * Set the tsc_disable flag if there's no TSC support, this
+	 * makes it a fast flag for the kernel to see whether it
+	 * should be using the TSC.
+	 */
+	tsc_disable = 1;
 }

 #ifdef CONFIG_CPU_FREQ
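
For readers unfamiliar with the helpers the patch touches, set_cyc2ns_scale() and cycles_2_ns() turn a raw TSC count into nanoseconds with integer fixed-point math, so no division is needed on the hot path. The sketch below is a user-space illustration of that technique, not the code from tsc.c. The 10-bit shift, the function names, and the `tsc_disable`-style parameter are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCALE_SHIFT 10	/* assumed fixed-point precision: 2^10 */

/*
 * Precompute ns-per-cycle as a fixed-point number.
 * One cycle lasts 1e6 / cpu_khz nanoseconds, stored scaled by 2^SCALE_SHIFT.
 */
static uint64_t make_cyc2ns_scale(uint32_t cpu_khz)
{
	assert(cpu_khz != 0);
	return ((uint64_t)1000000 << SCALE_SHIFT) / cpu_khz;
}

/* Convert a cycle count to nanoseconds with one multiply and one shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> SCALE_SHIFT;
}

/*
 * Hypothetical model of the patched sched_clock(): a single test of a
 * rarely-set flag selects the coarse fallback, otherwise the cycle
 * counter is scaled to nanoseconds.
 */
static uint64_t model_sched_clock(bool tsc_disabled, uint64_t fallback_ns,
				  uint64_t cycles, uint64_t scale)
{
	if (tsc_disabled)
		return fallback_ns;
	return cycles_to_ns(cycles, scale);
}
```

With a 2 GHz clock (cpu_khz = 2000000) the scale is 512, so 2000 cycles map to 1000 ns. The multiply can overflow 64 bits for very large cycle counts, which is one reason a real implementation would need more care than this sketch takes.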
* Re: [patch] sched: improve sched_clock() on i686
2006-12-22 12:19 ` [patch] sched: improve sched_clock() on i686 Ingo Molnar
@ 2006-12-23 10:27 ` Ingo Molnar
2007-01-17 11:05 ` Stephane Eranian
1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2006-12-23 10:27 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, venkatesh.pallipadi, suresh.b.siddha,
kenneth.w.chen, tony.luck, Andrew Morton
* Ingo Molnar <mingo@elte.hu> wrote:
> it's purely historic - the i686 sched_clock() implementation predates
> the scheduler's ability to deal with non-synchronous per-CPU clocks. I
> tried to fix that (a year ago) and it didn't work out - but i've
> reviewed my old patch and now realize what the mistake was - the patch
> below should work better.
that patch needs the small fix below as well.
Ingo
Index: linux/include/asm-i386/bugs.h
===================================================================
--- linux.orig/include/asm-i386/bugs.h
+++ linux/include/asm-i386/bugs.h
@@ -160,7 +160,7 @@ static void __init check_config(void)
  * If we configured ourselves for a TSC, we'd better have one!
  */
 #ifdef CONFIG_X86_TSC
-	if (!cpu_has_tsc)
+	if (!cpu_has_tsc && !tsc_disable)
 		panic("Kernel compiled for Pentium+, requires TSC feature!");
 #endif
* Re: sched_clock() on i386
2006-12-22 10:43 sched_clock() on i386 Stephane Eranian
2006-12-22 12:19 ` [patch] sched: improve sched_clock() on i686 Ingo Molnar
@ 2006-12-23 15:53 ` Daniel Walker
2007-01-03 14:36 ` Stephane Eranian
1 sibling, 1 reply; 6+ messages in thread
From: Daniel Walker @ 2006-12-23 15:53 UTC (permalink / raw)
To: eranian
Cc: linux-kernel, venkatesh.pallipadi, suresh.b.siddha,
kenneth.w.chen, tony.luck
On Fri, 2006-12-22 at 02:43 -0800, Stephane Eranian wrote:
> Hello,
>
>
> The perfmon subsystem needs to compute per-CPU duration. It is using
> sched_clock() to provide this information. However, it seems there are
> big variations in the way sched_clock() is implemented for each architecture,
> especially in the accuracy of the returned value (going from TSC to jiffies).
>
The vast majority of architectures return a scaled jiffies value for
sched_clock(). MIPS and ARM, for instance, are two, and i386 does
sometimes. The function isn't very predictable in terms of what you'll
get as output.
The most reliable way to get timing is to use gettimeofday() which in
turn uses a lowlevel clock. I'm not sure exactly what your application
is, but sometimes gettimeofday() can be a little complicated to use.
Which is why I created the following clocksource changes,
ftp://source.mvista.com/pub/dwalker/clocksource/
the purpose of which is to allow generic access to suitable lowlevel
clocks. It just extends the mechanism already used by gettimeofday().
Daniel
* Re: sched_clock() on i386
2006-12-23 15:53 ` sched_clock() on i386 Daniel Walker
@ 2007-01-03 14:36 ` Stephane Eranian
0 siblings, 0 replies; 6+ messages in thread
From: Stephane Eranian @ 2007-01-03 14:36 UTC (permalink / raw)
To: Daniel Walker
Cc: linux-kernel, venkatesh.pallipadi, suresh.b.siddha,
kenneth.w.chen, tony.luck
Daniel,
On Sat, Dec 23, 2006 at 07:53:47AM -0800, Daniel Walker wrote:
> On Fri, 2006-12-22 at 02:43 -0800, Stephane Eranian wrote:
> > Hello,
> >
> >
> > The perfmon subsystem needs to compute per-CPU duration. It is using
> > sched_clock() to provide this information. However, it seems there are
> > big variations in the way sched_clock() is implemented for each architecture,
> > especially in the accuracy of the returned value (going from TSC to jiffies).
> >
>
> The vast majority of architectures return a scaled jiffies value for
> sched_clock(). MIPS and ARM, for instance, are two, and i386 does
> sometimes. The function isn't very predictable in terms of what you'll
> get as output.
>
My understanding is that you'll get a per-CPU timestamp expressed in nanoseconds.
The granularity of the returned value is highly dependent on the CPU
architecture (and apparently on how you've compiled your kernel).
> The most reliable way to get timing is to use gettimeofday() which in
> turn uses a lowlevel clock. I'm not sure exactly what your application
> is, but sometimes gettimeofday() can be a little complicated to use.
> Which is why I created the following clocksource changes,
>
I do NOT need a wall-clock time. I am looking for a simple per-CPU clock
source with the best possible granularity. I use the clock to compute elapsed
execution duration. I initially was using TSC, then during the code review
someone suggested I use sched_clock(). Using gettimeofday() can be fairly
expensive and I need to compute the duration in the context switch path.
Now my understanding is that on some processors with frequency scaling,
using TSC may not easily allow computing elapsed time. So there may not
be any cheap solution to my problem.
> ftp://source.mvista.com/pub/dwalker/clocksource/
>
> the purpose of which is to allow generic access to suitable lowlevel
> clocks. It just extends the mechanism already used by gettimeofday().
>
--
-Stephane
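
The use case described in this message, accumulating elapsed execution time from timestamps taken in the context switch path, can be sketched as follows. This is a user-space illustration under stated assumptions, not perfmon code: `struct cpu_timer` and both function names are hypothetical, and the `now_ns` argument stands for whatever per-CPU nanosecond clock is available (for example a sched_clock()-style value read on the same CPU for both samples).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU accounting state. */
struct cpu_timer {
	uint64_t start_ns;	/* timestamp taken at switch-in */
	uint64_t total_ns;	/* accumulated execution time */
	bool running;		/* true between switch-in and switch-out */
};

/* Record the timestamp when the monitored context is switched in. */
static void timer_switch_in(struct cpu_timer *t, uint64_t now_ns)
{
	assert(t);
	t->start_ns = now_ns;
	t->running = true;
}

/*
 * Add the elapsed delta when the context is switched out. Only deltas
 * between two reads on the same CPU are meaningful, and a reading that
 * appears to go backwards is ignored instead of wrapping around.
 */
static void timer_switch_out(struct cpu_timer *t, uint64_t now_ns)
{
	assert(t);
	if (!t->running)
		return;
	if (now_ns > t->start_ns)
		t->total_ns += now_ns - t->start_ns;
	t->running = false;
}
```

Because only differences are used, the clock does not need to be synchronized across CPUs or tied to wall-clock time. It does need a stable rate, which is the concern raised above about frequency scaling.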
* Re: [patch] sched: improve sched_clock() on i686
2006-12-22 12:19 ` [patch] sched: improve sched_clock() on i686 Ingo Molnar
2006-12-23 10:27 ` Ingo Molnar
@ 2007-01-17 11:05 ` Stephane Eranian
1 sibling, 0 replies; 6+ messages in thread
From: Stephane Eranian @ 2007-01-17 11:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, venkatesh.pallipadi, suresh.b.siddha,
kenneth.w.chen, tony.luck, Andrew Morton
Ingo,
I see that this patch made it into -mm, so I am hoping it will
also show up in mainline fairly soon.
Thanks.
On Fri, Dec 22, 2006 at 01:19:20PM +0100, Ingo Molnar wrote:
>
> * Stephane Eranian <eranian@hpl.hp.com> wrote:
>
> > The perfmon subsystem needs to compute per-CPU duration. It is using
> > sched_clock() to provide this information. However, it seems there are
> > big variations in the way sched_clock() is implemented for each
> > architecture, especially in the accuracy of the returned value (going
> > from TSC to jiffies).
> >
> > Looking at the i386 implementation, it is not so clear to me what the
> > actual goal of the function is. I was under the impression that this
> > function was meant to compute per-CPU time deltas. This is how the
> > scheduler seems to use it.
> >
> > The x86-64 and i386 implementations are quite different. The i386
> > comment about NUMA seems to contradict the initial goal of the
> > function. Why is that?
>
> it's purely historic - the i686 sched_clock() implementation predates
> the scheduler's ability to deal with non-synchronous per-CPU clocks. I
> tried to fix that (a year ago) and it didn't work out - but i've reviewed
> my old patch and now realize what the mistake was - the patch below
> should work better.
>
> Ingo
>
> ---------------------->
> Subject: [patch] sched: improve sched_clock() on i686
> From: Ingo Molnar <mingo@elte.hu>
>
> this patch cleans up sched_clock() on i686: it will use the TSC if
> available and fall back to jiffies only if the user asked for it to be
> disabled via notsc or the CPU calibration code didn't figure out the
> right cpu_khz.
>
> this generally makes the scheduler timestamps more fine-grained, on all
> hardware. (the current scheduler is pretty resistant against
> asynchronous sched_clock() values on different CPUs, it will allow at
> most up to a jiffy of jitter.)
>
> also simplify sched_clock()'s check for TSC availability: propagate the
> desire and ability to use the TSC into the tsc_disable flag, previously
> this flag only indicated whether the notsc option was passed. This makes
> the rare low-res sched_clock() codepath a single branch off a
> read-mostly flag.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> arch/i386/kernel/tsc.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> Index: linux/arch/i386/kernel/tsc.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/tsc.c
> +++ linux/arch/i386/kernel/tsc.c
> @@ -108,13 +108,10 @@ unsigned long long sched_clock(void)
>  	unsigned long long this_offset;
>
>  	/*
> -	 * in the NUMA case we dont use the TSC as they are not
> -	 * synchronized across all CPUs.
> +	 * Fall back to jiffies if there's no TSC available:
>  	 */
> -#ifndef CONFIG_NUMA
> -	if (!cpu_khz || check_tsc_unstable())
> -#endif
> -		/* no locking but a rare wrong value is not a big deal */
> +	if (unlikely(tsc_disable))
> +		/* No locking but a rare wrong value is not a big deal: */
>  		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
>
>  	/* read the Time Stamp Counter: */
> @@ -194,13 +191,13 @@ EXPORT_SYMBOL(recalibrate_cpu_khz);
>  void __init tsc_init(void)
>  {
>  	if (!cpu_has_tsc || tsc_disable)
> -		return;
> +		goto out_no_tsc;
>
>  	cpu_khz = calculate_cpu_khz();
>  	tsc_khz = cpu_khz;
>
>  	if (!cpu_khz)
> -		return;
> +		goto out_no_tsc;
>
>  	printk("Detected %lu.%03lu MHz processor.\n",
>  				(unsigned long)cpu_khz / 1000,
> @@ -208,6 +205,15 @@ void __init tsc_init(void)
>
>  	set_cyc2ns_scale(cpu_khz);
>  	use_tsc_delay();
> +	return;
> +
> +out_no_tsc:
> +	/*
> +	 * Set the tsc_disable flag if there's no TSC support, this
> +	 * makes it a fast flag for the kernel to see whether it
> +	 * should be using the TSC.
> +	 */
> +	tsc_disable = 1;
>  }
>
>  #ifdef CONFIG_CPU_FREQ
--
-Stephane