LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 00/12] Increased clocksource validation and cleanups (v3)
@ 2015-03-07  2:49 John Stultz
  2015-03-07  2:49 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

So Ingo asked me to resend this series, which is the result of
earlier discussions with Linus and his suggestions around
improvements to clocksource validation in the hope we can more
easily catch bad hardware.

There's also a few cleanups Linus suggested as well as a few I've been
meaning to get to for awhile.

I tried in address all the feedback that had been given, adding
the checks behind CONFIG_DEBUG_TIMEKEEPING.  I also sorted out a
sane way to print rate-limited warnings if we see cycle deltas that
are too large, or if they look like small underflows.

I'd like to get this queued into -tip soon so it can get as much
testing in -next as possible.

If there are any objections or feedback, I'd love to hear it!

New in v3:
* Its been awhile since v2, and I don't recall any substantial changes,
  but I did change a function macro into a static inline per PeterZ's
  request.

thanks
-john

The patches are also available via a git pull:

The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:

  Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)

are available in the git repository at:

  https://git.linaro.org/people/john.stultz/linux.git fortglx/4.1/time

for you to fetch changes up to 904d6befdcab1b9947c14dd517977782216daafa:

  clocksource: Add some debug info about clocksources being registered
(2015-02-25 20:58:30 -0800)


Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>

John Stultz (12):
  clocksource: Simplify clocks_calc_max_nsecs logic
  clocksource: Simplify logic around clocksource wrapping saftey margins
  clocksource: Remove clocksource_max_deferment()
  clocksource: Add max_cycles to clocksource structure
  time: Add debugging checks to warn if we see delays
  time: Add infrastructure to cap clocksource reads to the max_cycles
    value
  time: Try to catch clocksource delta underflows
  time: Add warnings when overflows or underflows are observed
  clocksource: Improve clocksource watchdog reporting
  clocksource: Mostly kill clocksource_register()
  sparc: Convert to using clocksource_register_hz()
  clocksource: Add some debug info about clocksources being registered

 arch/s390/kernel/time.c     |   2 +-
 arch/sparc/kernel/time_32.c |   6 +-
 include/linux/clocksource.h |  16 ++++-
 kernel/time/clocksource.c   | 164 +++++++++++++++++++-------------------------
 kernel/time/jiffies.c       |   5 +-
 kernel/time/sched_clock.c   |   6 +-
 kernel/time/timekeeping.c   | 116 +++++++++++++++++++++++++++----
 lib/Kconfig.debug           |  12 ++++
 8 files changed, 208 insertions(+), 119 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:20   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The previous clocks_calc_max_nsecs had some unecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.

The previous implementation also had a subtle issue
that it avoided overflows into signed 64bit values, where
as the intervals are always unsigned. This resulted in
overly conservative intervals, which other saftey margins
were then added to, reducing the intended interval length.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4892352..11323f4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -476,19 +476,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
-	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-	 * which is equivalent to the below.
-	 * max_cycles < (2^63)/(mult + maxadj)
-	 * max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-	 * max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-	 * max_cycles < 2^(63 - log2(mult + maxadj))
-	 * max_cycles < 1 << (63 - log2(mult + maxadj))
-	 * Please note that we add 1 to the result of the log2 to account for
-	 * any rounding errors, ensure the above inequality is satisfied and
-	 * no overflow will occur.
+	 * cyc2ns function without overflowing a 64-bit result.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+	max_cycles = ULLONG_MAX;
+	do_div(max_cycles, mult+maxadj);
 
 	/*
 	 * The actual maximum number of cycles we can defer the clocksource is
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
  2015-03-07  2:49 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:22   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.

Additionally, in the previous patch, we corrected an issue
where we unintentionally in effect created a 50% saftey margin,
which these 12.5% margins where then added to.

So to siplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Addtionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be more easily caught. This should really
have no net effect, due to the corrected issue earlier which
caused greater then 50% margins to be used w/o issue.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for sched_clock.c bit)
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 26 ++++++++++++--------------
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 11323f4..e5d00e6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a saftey margin of 50%, so that bad clock values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -490,11 +493,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;
+
 	return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be deferred
  * @cs:         Pointer to clocksource
  *
  */
@@ -504,13 +510,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 
 	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
 					  cs->mask);
-	/*
-	 * To ensure that the clocksource does not wrap whilst we are idle,
-	 * limit the time the clocksource can be deferred by 12.5%. Please
-	 * note a margin of 12.5% is used because this can be computed with
-	 * a shift, versus say 10% which would require division.
-	 */
-	return max_nsecs - (max_nsecs >> 3);
+	return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -659,10 +659,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * conversion precision. 10 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-	 * margin as we do in clocksource_max_deferment()
+	 * ~ 0.06ppm granularity for NTP.
 	 */
-	sec = (cs->mask - (cs->mask >> 3));
+	sec = cs->mask;
 	do_div(sec, freq);
 	do_div(sec, scale);
 	if (!sec)
@@ -674,9 +673,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 			       NSEC_PER_SEC / scale, sec * scale);
 
 	/*
-	 * for clocksources that have large mults, to avoid overflow.
-	 * Since mult may be adjusted by ntp, add an safety extra margin
-	 *
+	 * Ensure clocksources that have large mults don't overflow
+	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
 	while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..c794b84 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
 	new_mask = CLOCKSOURCE_MASK(bits);
 
-	/* calculate how many ns until we wrap */
+	/* calculate how many ns until we risk wrapping */
 	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
-	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
 	new_epoch = read();
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 03/12] clocksource: Remove clocksource_max_deferment()
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
  2015-03-07  2:49 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
  2015-03-07  2:49 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:18   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

clocksource_max_deferment() doesn't do anything useful
anymore, so zap it.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e5d00e6..4988411 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -499,20 +499,6 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	return max_nsecs;
 }
 
-/**
- * clocksource_max_deferment - Returns max time the clocksource should be deferred
- * @cs:         Pointer to clocksource
- *
- */
-static u64 clocksource_max_deferment(struct clocksource *cs)
-{
-	u64 max_nsecs;
-
-	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
-					  cs->mask);
-	return max_nsecs;
-}
-
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
 
 static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
@@ -684,7 +670,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -731,7 +718,8 @@ int clocksource_register(struct clocksource *cs)
 		cs->name);
 
 	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 04/12] clocksource: Add max_cycles to clocksource structure
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (2 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:54   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In order to facilitate some clocksource validation,
add a max_cycles entry to the structure which will
hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/clocksource.h |  6 ++++--
 kernel/time/clocksource.c   | 15 ++++++++++++---
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9c78d15..63fe52f 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -56,6 +56,7 @@ struct module;
  * @shift:		cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
  * @maxadj:		maximum adjustment value to mult (~11%)
+ * @max_cycles:		maximum safe cycle value which won't overflow on mult
  * @flags:		flags describing special properties
  * @archdata:		arch-specific data
  * @suspend:		suspend function for the clocksource, if necessary
@@ -76,7 +77,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
-
+	u64 max_cycles;
 	const char *name;
 	struct list_head list;
 	int rating;
@@ -189,7 +190,8 @@ extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+							u64 *max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4988411..e6c752b 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -469,11 +469,14 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:    maximum cycle value before potential overflow (does not include
+ *		any saftey margin)
  *
  * NOTE: This function includes a saftey margin of 50%, so that bad clock values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+								u64 *max_cyc)
 {
 	u64 max_nsecs, max_cycles;
 
@@ -493,6 +496,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* return the max_cycles value as well if requested */
+	if (max_cyc)
+		*max_cyc = max_cycles;
+
 	/* Return 50% of the actual maximum, so we can detect bad values */
 	max_nsecs >>= 1;
 
@@ -671,7 +678,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	}
 
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -719,7 +727,8 @@ int clocksource_register(struct clocksource *cs)
 
 	/* calculate max idle time permitted for this clocksource */
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index c794b84..d43855b 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -126,7 +126,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	new_mask = CLOCKSOURCE_MASK(bits);
 
 	/* calculate how many ns until we risk wrapping */
-	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask, NULL);
 	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 05/12] time: Add debugging checks to warn if we see delays
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (3 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:22   ` Paul Bolle
  2015-03-07  9:29   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Recently there's been some request for better sanity
checking in the time code, so that its more clear
when something is going wrong since timekeeping issues
could manifest in a large number of strange ways with
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check update_wall_time to print warnings if we
see the call delayed beyond the max_cycles overflow
point, or beyond the clocksource max_idle_ns value
which is currently 50% of the overflow point.

This extra infrastructure is conditionalized
behind a new CONFIG_DEBUG_TIMEKEEPING option
also added in this patch.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/jiffies.c     |  1 +
 kernel/time/timekeeping.c | 26 ++++++++++++++++++++++++++
 lib/Kconfig.debug         | 12 ++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
 	.mask		= 0xffffffff, /*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
 	.shift		= JIFFIES_SHIFT,
+	.max_cycles	= 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 91db941..7e9d433 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,29 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+
+	cycle_t max_cycles = tk->tkr.clock->max_cycles;
+	const char *name = tk->tkr.clock->name;
+
+	if (offset > max_cycles)
+		printk_deferred("ERROR: cycle offset (%lld) is larger then"
+			" allowed %s max_cycles (%lld)\n",
+			offset, name, max_cycles);
+	else if (offset > (max_cycles >> 1))
+		printk_deferred("WARNING: cycle offset (%lld) is past"
+			" the %s 50%% safety margin (%lld)\n",
+			offset, name, max_cycles>>1);
+}
+#else
+static inline
+void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+}
+#endif
+
 /**
  * tk_setup_internals - Set up internals to use clocksource clock.
  *
@@ -1630,6 +1653,9 @@ void update_wall_time(void)
 	if (offset < real_tk->cycle_interval)
 		goto out;
 
+	/* Do some additional sanity checking */
+	timekeeping_check_update(real_tk, offset);
+
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
 	 * (think "ticks") worth of time at once. To do this efficiently,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..32065f6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -865,6 +865,18 @@ config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config DEBUG_TIMEKEEPING
+	bool "Enable extra timekeeping sanity checking"
+	help
+	  This option will enable additional timekeeping sanity checks
+	  which may be helpful when diagnoising issues where timekeeping
+	  problems are suspected.
+
+	  This may include checks in the timekeeping hotpaths, so this
+	  option may have a performance impact to some workloads.
+
+	  If unsure, say N.
+
 config TIMER_STATS
 	bool "Collect kernel timers statistics"
 	depends on DEBUG_KERNEL && PROC_FS
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (4 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:32   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplciation
overflow from ocurring.

This patch introduces infrastructure to allow a capp that
limits the read delta value to the max_cycles value, which is
where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were misscheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7e9d433..8b9e328 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -134,11 +134,34 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	cycle_t cycle_now, delta;
+
+	/* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	if (unlikely(delta > tkr->clock->max_cycles))
+		delta = tkr->clock->max_cycles;
+
+	return delta;
+}
 #else
 static inline
 void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 }
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	/* calculate the delta since the last update_wall_time */
+	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
+								tkr->mask);
+}
 #endif
 
 /**
@@ -216,14 +239,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = timekeeping_get_delta(tkr);
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
@@ -235,14 +254,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
 	struct clocksource *clock = tk->tkr.clock;
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tk->tkr.read(clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
+	delta = timekeeping_get_delta(&tk->tkr);
 
 	/* convert delta to nanoseconds. */
 	nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 07/12] time: Try to catch clocksource delta underflows
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (5 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:34   ` Ingo Molnar
  2015-03-07  2:49 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small "negative"
deltas when we subtract now from cycle_last.

The values are actually negative with respect to the
clocksource mask value, not necessarily negative
if cast to a s64, but we can check by checking the
delta see if it is a small (relative to the mask)
negative value (again negative relative to the mask).

If so, we assume we jumped backwards somehow and
instead use zero for our delta.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8b9e328..4e8ccde 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -145,6 +145,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/*
+	 * Try to catch underflows by checking if we are seeing small
+	 * mask-relative negative values.
+	 */
+	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+		delta = 0;
+
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
 	if (unlikely(delta > tkr->clock->max_cycles))
 		delta = tkr->clock->max_cycles;
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (6 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:40   ` Ingo Molnar
  2015-03-07  2:50 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
then just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequnece.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calcualtion to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.

I also added some basic ratelimiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 58 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4e8ccde..5f62308 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,11 +119,23 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
 	cycle_t max_cycles = tk->tkr.clock->max_cycles;
 	const char *name = tk->tkr.clock->name;
+	static long last_warning; /* we always hold write on timekeeper lock */
 
 	if (offset > max_cycles)
 		printk_deferred("ERROR: cycle offset (%lld) is larger then"
@@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 		printk_deferred("WARNING: cycle offset (%lld) is past"
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
+
+	if (timekeeping_underflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource underflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_underflow_seen = 0;
+	}
+	if (timekeeping_overflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource overflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_overflow_seen = 0;
+	}
+
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t now, last, mask, max, delta;
+	unsigned int seq;
 
-	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	/*
+	 * Since we're called holding a seqlock, the data may shift
+	 * under us while we're doign the calculation. This can cause
+	 * false positives, since we'd note a problem but throw the
+	 * results away. So nest  another seqlock here to atomically
+	 * grab the points we are checking with.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		now = tkr->read(tkr->clock);
+		last = tkr->cycle_last;
+		mask = tkr->mask;
+		max = tkr->clock->max_cycles;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	/* calculate the delta since the last update_wall_time */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = clocksource_delta(now, last, mask);
 
 	/*
 	 * Try to catch underflows by checking if we are seeing small
 	 * mask-relative negative values.
 	 */
-	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+	if (unlikely((~delta & mask) < (mask >> 3))) {
+		timekeeping_underflow_seen = 1;
 		delta = 0;
+	}
 
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
-	if (unlikely(delta > tkr->clock->max_cycles))
+	if (unlikely(delta > max)) {
+		timekeeping_overflow_seen = 1;
 		delta = tkr->clock->max_cycles;
+	}
 
 	return delta;
 }
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 09/12] clocksource: Improve clocksource watchdog reporting
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (7 preceding siblings ...)
  2015-03-07  2:49 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
@ 2015-03-07  2:50 ` John Stultz
  2015-03-07  2:50 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:50 UTC (permalink / raw)
  To: lkml; +Cc: John Stultz

The clocksource watchdog reporting has been less helpful
then desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.

Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.

Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e6c752b..51c7b3a 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,13 +142,6 @@ static void __clocksource_unstable(struct clocksource *cs)
 		schedule_work(&watchdog_work);
 }
 
-static void clocksource_unstable(struct clocksource *cs, int64_t delta)
-{
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
-	__clocksource_unstable(cs);
-}
-
 /**
  * clocksource_mark_unstable - mark clocksource unstable via watchdog
  * @cs:		clocksource to be marked unstable
@@ -174,7 +167,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
 static void clocksource_watchdog(unsigned long data)
 {
 	struct clocksource *cs;
-	cycle_t csnow, wdnow, delta;
+	cycle_t csnow, wdnow, cslast, wdlast, delta;
 	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
 
@@ -213,6 +206,8 @@ static void clocksource_watchdog(unsigned long data)
 
 		delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
 		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		wdlast = cs->wd_last; /* save these incase we print them */
+		cslast = cs->cs_last;
 		cs->cs_last = csnow;
 		cs->wd_last = wdnow;
 
@@ -221,7 +216,15 @@ static void clocksource_watchdog(unsigned long data)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
-			clocksource_unstable(cs, cs_nsec - wd_nsec);
+			pr_warn("Watchdog: clocksource %s unstable\n",
+				cs->name);
+			pr_warn("	"
+				"%s wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wdnow, wdlast, watchdog->mask);
+			pr_warn("	"
+				"%s cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, csnow, cslast, cs->mask);
+			__clocksource_unstable(cs);
 			continue;
 		}
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 10/12] clocksource: Mostly kill clocksource_register()
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (8 preceding siblings ...)
  2015-03-07  2:50 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
@ 2015-03-07  2:50 ` John Stultz
  2015-03-07  9:45   ` Ingo Molnar
  2015-03-07  2:50 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
  2015-03-07  2:50 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:50 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

A long running project has been to cleanup remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz().

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register(), and consolidate
much of the internal logic so we don't have duplication.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/s390/kernel/time.c     |  2 +-
 arch/sparc/kernel/time_32.c |  2 +-
 include/linux/clocksource.h | 10 +++++-
 kernel/time/clocksource.c   | 83 +++++++++++++++++++--------------------------
 kernel/time/jiffies.c       |  4 +--
 5 files changed, 47 insertions(+), 54 deletions(-)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 20660dd..6c273cd 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -283,7 +283,7 @@ void __init time_init(void)
 	if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
 		panic("Couldn't request external interrupt 0x1406");
 
-	if (clocksource_register(&clocksource_tod) != 0)
+	if (__clocksource_register(&clocksource_tod) != 0)
 		panic("Could not register TOD clock source");
 
 	/* Enable TOD clock interrupts on the boot cpu. */
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 2f80d23..a31c0c8 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -191,7 +191,7 @@ static __init int setup_timer_cs(void)
 	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
 	                                    timer_cs.shift);
 
-	return clocksource_register(&timer_cs);
+	return __clocksource_register(&timer_cs);
 }
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 63fe52f..c064349 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -179,7 +179,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
 }
 
 
-extern int clocksource_register(struct clocksource*);
 extern int clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
 extern struct clocksource* clocksource_get_next(void);
@@ -204,6 +203,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
 __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
+/*
+ * Dont' call this unless you're a default clocksource
+ * (AKA: jiffies) and absolutely have to.
+ */
+static inline int __clocksource_register(struct clocksource *cs)
+{
+	return __clocksource_register_scale(cs, 1, 0);
+}
+
 static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 {
 	return __clocksource_register_scale(cs, 1, hz);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 51c7b3a..3f24bb3 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -648,38 +648,52 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
 	u64 sec;
+
 	/*
-	 * Calc the maximum number of seconds which we can run before
-	 * wrapping around. For clocksources which have a mask > 32bit
-	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
-	 * amount. That results in a shift value of 24 for a
-	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP.
+	 * Default clocksources are *special* and self-define their mult/shift.
+	 * But, you're not special, so you should specify a freq value.
 	 */
-	sec = cs->mask;
-	do_div(sec, freq);
-	do_div(sec, scale);
-	if (!sec)
-		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
-
-	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
-			       NSEC_PER_SEC / scale, sec * scale);
-
+	if (freq) {
+		/*
+		 * Calc the maximum number of seconds which we can run before
+		 * wrapping around. For clocksources which have a mask > 32bit
+		 * we need to limit the max sleep time to have a good
+		 * conversion precision. 10 minutes is still a reasonable
+		 * amount. That results in a shift value of 24 for a
+		 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
+		 * ~ 0.06ppm granularity for NTP.
+		 */
+		sec = cs->mask;
+		do_div(sec, freq);
+		do_div(sec, scale);
+		if (!sec)
+			sec = 1;
+		else if (sec > 600 && cs->mask > UINT_MAX)
+			sec = 600;
+
+		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
+				       NSEC_PER_SEC / scale, sec * scale);
+	}
 	/*
 	 * Ensure clocksources that have large mults don't overflow
 	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
-	while ((cs->mult + cs->maxadj < cs->mult)
-		|| (cs->mult - cs->maxadj > cs->mult)) {
+	while (freq && ((cs->mult + cs->maxadj < cs->mult)
+		|| (cs->mult - cs->maxadj > cs->mult))) {
 		cs->mult >>= 1;
 		cs->shift--;
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
+	/*
+	 * Only warn for *special* clocksources that self-define
+	 * their mult/shift values and don't specify a freq.
+	 */
+	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
+		"Clocksource %s might overflow on 11%% adjustment\n",
+		cs->name);
+
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
@@ -713,35 +727,6 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
 }
 EXPORT_SYMBOL_GPL(__clocksource_register_scale);
 
-
-/**
- * clocksource_register - Used to install new clocksources
- * @cs:		clocksource to be registered
- *
- * Returns -EBUSY if registration fails, zero otherwise.
- */
-int clocksource_register(struct clocksource *cs)
-{
-	/* calculate max adjustment for given mult/shift */
-	cs->maxadj = clocksource_max_adjustment(cs);
-	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
-		"Clocksource %s might overflow on 11%% adjustment\n",
-		cs->name);
-
-	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask,
-						 &cs->max_cycles);
-
-	mutex_lock(&clocksource_mutex);
-	clocksource_enqueue(cs);
-	clocksource_enqueue_watchdog(cs);
-	clocksource_select();
-	mutex_unlock(&clocksource_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(clocksource_register);
-
 static void __clocksource_change_rating(struct clocksource *cs, int rating)
 {
 	list_del(&cs->list);
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 7e41390..c4bb518 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -95,7 +95,7 @@ EXPORT_SYMBOL(jiffies);
 
 static int __init init_jiffies_clocksource(void)
 {
-	return clocksource_register(&clocksource_jiffies);
+	return __clocksource_register(&clocksource_jiffies);
 }
 
 core_initcall(init_jiffies_clocksource);
@@ -131,6 +131,6 @@ int register_refined_jiffies(long cycles_per_second)
 
 	refined_jiffies.mult = ((u32)nsec_per_tick) << JIFFIES_SHIFT;
 
-	clocksource_register(&refined_jiffies);
+	__clocksource_register(&refined_jiffies);
 	return 0;
 }
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 11/12] sparc: Convert to using clocksource_register_hz()
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (9 preceding siblings ...)
  2015-03-07  2:50 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
@ 2015-03-07  2:50 ` John Stultz
  2015-03-07  9:46   ` Ingo Molnar
  2015-03-07  2:50 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:50 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller

While cleaning up some clocksource code, I noticed the
time_32 impelementation uses the hz2mult helper, but
doesn't use the clocksource_register_hz() method.

I don't believe the sparc clocksource is a default
clocksource, so we shouldn't need to self-define
the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/sparc/kernel/time_32.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
 	.rating	= 100,
 	.read	= timer_cs_read,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.shift	= 2,
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
 	timer_cs_enabled = 1;
-	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-	                                    timer_cs.shift);
-
-	return __clocksource_register(&timer_cs);
+	return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 12/12] clocksource: Add some debug info about clocksources being registered
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
                   ` (10 preceding siblings ...)
  2015-03-07  2:50 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
@ 2015-03-07  2:50 ` John Stultz
  2015-03-07  9:50   ` Ingo Molnar
  11 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-07  2:50 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Print the mask, max_cycles, and max_idle_ns values for clocksources
being registered.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f24bb3..9b75316 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -697,6 +697,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
+
+	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
+			cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
+
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/12] clocksource: Remove clocksource_max_deferment()
  2015-03-07  2:49 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
@ 2015-03-07  9:18   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:18 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> clocksource_max_deferment() doesn't do anything useful
> anymore, so zap it.

Well, it does something useful, it encapsulates the max_deferment 
property of the clocksource:

> -static u64 clocksource_max_deferment(struct clocksource *cs)
> -{
> -	u64 max_nsecs;
> -
> -	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
> -					  cs->mask);
> -	return max_nsecs;
> -}

Which could be written in a shorter form, using:

static u64 clocksource_max_deferment(struct clocksource *cs)
{
	return clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj, cs->mask);
}

Which all allows short forms of:

	cs->max_idle_ns = clocksource_max_deferment(cs);

without writing out all the arguments.

Instead of that, you've introduced:

> +	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask);

> +	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask);

Which in the next patch gets even worse:


>  	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask,
> +						 &cs->max_cycles);

>  	/* calculate max idle time permitted for this clocksource */
>  	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask,
> +						 &cs->max_cycles);

While with the helper function it would still be the same sweet:

	cs->max_idle_ns = clocksource_max_deferment(cs);

So I don't think this cleanup is an improvement ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic
  2015-03-07  2:49 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
@ 2015-03-07  9:20   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:20 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> The previous clocks_calc_max_nsecs had some unecessarily
> complex bit logic to find the max interval that could cause
> multiplication overflows. Since this is not in the hot
> path, just do the divide to make it easier to read.
> 
> The previous implementation also had a subtle issue
> that it avoided overflows into signed 64bit values, where

So here you write the weird '64bit' form, while in the code you write:

> +	 * cyc2ns function without overflowing a 64-bit result.

This repeats in later patches as well. I'd suggest using '64-bit' 
consistently throughout the whole series.

> as the intervals are always unsigned. This resulted in
> overly conservative intervals, which other saftey margins
> were then added to, reducing the intended interval length.

Typo.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins
  2015-03-07  2:49 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
@ 2015-03-07  9:22   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:22 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> The clocksource logic has a number of places where we try to
> include a safety margin. Most of these are 12% safety margins,
> but they are inconsistently applied and sometimes are applied
> on top of each other.
> 
> Additionally, in the previous patch, we corrected an issue
> where we unintentionally in effect created a 50% saftey margin,
> which these 12.5% margins where then added to.
> 
> So to siplify the logic here, this patch removes the various

Typo.

> 12.5% margins, and consolidates adding the margin in one place:
> clocks_calc_max_nsecs().
> 
> Addtionally, Linus prefers a 50% safety margin, as it allows
> bad clock values to be more easily caught. This should really
> have no net effect, due to the corrected issue earlier which
> caused greater then 50% margins to be used w/o issue.
> 
> Cc: Dave Jones <davej@codemonkey.org.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <sboyd@codeaurora.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for sched_clock.c bit)
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/clocksource.c | 26 ++++++++++++--------------
>  kernel/time/sched_clock.c |  4 ++--
>  2 files changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 11323f4..e5d00e6 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -469,6 +469,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
>   * @shift:	cycle to nanosecond divisor (power of two)
>   * @maxadj:	maximum adjustment value to mult (~11%)
>   * @mask:	bitmask for two's complement subtraction of non 64 bit counters
> + *
> + * NOTE: This function includes a saftey margin of 50%, so that bad clock values

Typo.

There's also the same typo in the title of the patch.
 
Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/12] time: Add debugging checks to warn if we see delays
  2015-03-07  2:49 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
@ 2015-03-07  9:22   ` Paul Bolle
  2015-03-07  9:29   ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Paul Bolle @ 2015-03-07  9:22 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Fri, 2015-03-06 at 18:49 -0800, John Stultz wrote:
> +config DEBUG_TIMEKEEPING
> +	bool "Enable extra timekeeping sanity checking"
> +	help
> +	  This option will enable additional timekeeping sanity checks
> +	  which may be helpful when diagnoising issues where timekeeping

diagnoise, verb:
	to diagnose a problem by generating a large amount of data
	see also: printk debugging

> +	  problems are suspected.
> +
> +	  This may include checks in the timekeeping hotpaths, so this
> +	  option may have a performance impact to some workloads.
> +
> +	  If unsure, say N.
> +



Paul Bolle


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/12] time: Add debugging checks to warn if we see delays
  2015-03-07  2:49 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
  2015-03-07  9:22   ` Paul Bolle
@ 2015-03-07  9:29   ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:29 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> Recently there's been some request for better sanity
> checking in the time code, so that its more clear
> when something is going wrong since timekeeping issues
> could manifest in a large number of strange ways with
> various subsystems.
> 
> Thus, this patch adds some extra infrastructure to
> add a check update_wall_time to print warnings if we
> see the call delayed beyond the max_cycles overflow
> point, or beyond the clocksource max_idle_ns value
> which is currently 50% of the overflow point.

Just a changelog style nit, but isn't this easier to read:

  Thus, this patch adds some extra infrastructure to
  add a check to update_wall_time() to print warnings if we
  see the call delayed beyond the 'max_cycles' overflow
  point, or beyond the clocksource 'max_idle_ns' value
  which is currently 50% of the overflow point.

?

To mark functions with parentheses and arguments in quotes, to make 
them mix better with free form English sentences?

> +#ifdef CONFIG_DEBUG_TIMEKEEPING
> +static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
> +{
> +
> +	cycle_t max_cycles = tk->tkr.clock->max_cycles;
> +	const char *name = tk->tkr.clock->name;
> +
> +	if (offset > max_cycles)
> +		printk_deferred("ERROR: cycle offset (%lld) is larger then"
> +			" allowed %s max_cycles (%lld)\n",
> +			offset, name, max_cycles);
> +	else if (offset > (max_cycles >> 1))
> +		printk_deferred("WARNING: cycle offset (%lld) is past"
> +			" the %s 50%% safety margin (%lld)\n",
> +			offset, name, max_cycles>>1);

s/larger then/larger than

Also, please don't break user visible messages on col80 boundaries 
just to pacify checkpatch: ignore checkpatch in these cases.

Also, plase use curly braces on multi-line statements, plus I'd shape 
it like this:

	if () {
	} else {
		if () {
		}
	}

That way the second 'if' condition in the else branch does not get 
lost in the noise of silly line breaks. (At least during email review: 
in an actual editor syntax highlighting helps.)

> +config DEBUG_TIMEKEEPING
> +	bool "Enable extra timekeeping sanity checking"
> +	help
> +	  This option will enable additional timekeeping sanity checks
> +	  which may be helpful when diagnoising issues where timekeeping

Typo.

There's really a disproportionate ratio of typos, considering how many 
iterations this patch-set has been through :-/

Or maybe I'm oversensitive to small details.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
@ 2015-03-07  9:32   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:32 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> When calculating the current delta since the last tick, we
> currently have no hard protections to prevent a multiplciation

Typo.

> overflow from ocurring.

Typo.

> This patch introduces infrastructure to allow a capp that

Typo ...

> +static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> +{
> +	/* calculate the delta since the last update_wall_time */
> +	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
> +								tkr->mask);

Please don't do spurious col80 line breaks that don't improve the end 
result.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/12] time: Try to catch clocksource delta underflows
  2015-03-07  2:49 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
@ 2015-03-07  9:34   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:34 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> In the case where there is a broken clocksource
> where there are multiple actual clocks that
> aren't perfectly aligned, we may see small "negative"
> deltas when we subtract now from cycle_last.

So this is a typical example where putting argument references is 
really helpful. Consider these two variants:

   deltas when we subtract now from cycle_last.

   deltas when we subtract 'now' from 'cycle_last'.

I had to read the first variant three times until I realized that 
'now' is a C variable misleadingly inserted into English text.

> The values are actually negative with respect to the
> clocksource mask value, not necessarily negative
> if cast to a s64, but we can check by checking the
> delta see if it is a small (relative to the mask)
> negative value (again negative relative to the mask).

typo.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-03-07  2:49 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
@ 2015-03-07  9:40   ` Ingo Molnar
  2015-03-09 16:50     ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:40 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> It was suggested that the underflow/overflow protection
> should probably throw some sort of warning out, rather
> then just silently fixing the issue.

Typo.

> So this patch adds some warnings here. The flag variables
> used are not protected by locks, but since we can't print
> from the reading functions, just being able to say we
> saw an issue in the update interval is useful enough,
> and can be slightly racy without real consequnece.

Typo.

> The big complication is that we're only under a read
> seqlock, so the data could shift under us during
> our calcualtion to see if there was a problem. This

Typo.

> patch avoids this issue by nesting another seqlock
> which allows us to snapshot the just required values
> atomically. So we shouldn't see false positives.
> 
> I also added some basic ratelimiting here, since
> on one build machine w/ skewed TSCs it was fairly
> noisy at bootup.

> +#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */

Nit: so in general wereallytrytokeepwordsapart, so I'd suggest a 
name of WARNING_FREQ or so?

>  	cycle_t max_cycles = tk->tkr.clock->max_cycles;
>  	const char *name = tk->tkr.clock->name;
> +	static long last_warning; /* we always hold write on timekeeper lock */

So I'm not sure I ever heard the phrase 'to hold write', this doesn't 
parse for me.

Also, static global variables should really, really not be immersed 
amongst on-stack variables, they are so easy to overlook. Just put 
them in front of the function.

>  
>  	if (offset > max_cycles)
>  		printk_deferred("ERROR: cycle offset (%lld) is larger then"
> @@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
>  		printk_deferred("WARNING: cycle offset (%lld) is past"
>  			" the %s 50%% safety margin (%lld)\n",
>  			offset, name, max_cycles>>1);
> +
> +	if (timekeeping_underflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource underflow observed\n");
> +			last_warning = jiffies;
> +		}
> +		timekeeping_underflow_seen = 0;
> +	}
> +	if (timekeeping_overflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource overflow observed\n");

I think the warning should be more informative. If a distro turns this 
on and a user sees this value, what will he think? Is the kernel still 
OK? What can he do about it?

> +			last_warning = jiffies;
> +		}
> +		timekeeping_overflow_seen = 0;
> +	}
> +
>  }
>  
>  static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>  {
> -	cycle_t cycle_now, delta;
> +	cycle_t now, last, mask, max, delta;
> +	unsigned int seq;
>  
> -	/* read clocksource */
> -	cycle_now = tkr->read(tkr->clock);
> +	/*
> +	 * Since we're called holding a seqlock, the data may shift
> +	 * under us while we're doign the calculation. This can cause

Typo...

> +	 * false positives, since we'd note a problem but throw the
> +	 * results away. So nest  another seqlock here to atomically

Spurious space. I know they are cheap, but still.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/12] clocksource: Mostly kill clocksource_register()
  2015-03-07  2:50 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
@ 2015-03-07  9:45   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:45 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra,
	David S. Miller, Martin Schwidefsky


* John Stultz <john.stultz@linaro.org> wrote:

> A long running project has been to cleanup remaining uses
> of clocksource_register(), replacing it with the simpler
> clocksource_register_khz/hz().

s/.$/ functions.

> However, there are a few cases where we need to self-define
> our mult/shift values, so switch the function to a more
> obviously internal __clocksource_register(), and consolidate

s/(),/() name,

> much of the internal logic so we don't have duplication.

> +	if (freq) {
> +		/*
> +		 * Calc the maximum number of seconds which we can run before
> +		 * wrapping around. For clocksources which have a mask > 32bit
> +		 * we need to limit the max sleep time to have a good
> +		 * conversion precision. 10 minutes is still a reasonable
> +		 * amount. That results in a shift value of 24 for a
> +		 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
> +		 * ~ 0.06ppm granularity for NTP.

s/32bit/32-bit
s/40bit/40-bit

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/12] sparc: Convert to using clocksource_register_hz()
  2015-03-07  2:50 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
@ 2015-03-07  9:46   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:46 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra,
	David S. Miller


* John Stultz <john.stultz@linaro.org> wrote:

> While cleaning up some clocksource code, I noticed the
> time_32 impelementation uses the hz2mult helper, but

typo.

s/hz2mult/clocksource_hz2mult()

> doesn't use the clocksource_register_hz() method.
> 
> I don't believe the sparc clocksource is a default

s/sparc/Sparc

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/12] clocksource: Add some debug info about clocksources being registered
  2015-03-07  2:50 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
@ 2015-03-07  9:50   ` Ingo Molnar
  2015-03-12  3:16     ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:50 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> Print the mask, max_cycles, and max_idle_ns values for clocksources
> being registered.
> 
> Cc: Dave Jones <davej@codemonkey.org.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <sboyd@codeaurora.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/clocksource.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f24bb3..9b75316 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -697,6 +697,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)

So that name should really be something like __clocksource_update_freq_scale(),
right?

>  	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>  						 cs->maxadj, cs->mask,
>  						 &cs->max_cycles);
> +
> +	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
> +			cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
> +

Also, why is this function exported? No in-tree module uses it.

Also, when I read the code, I noticed:

/**
 * __clocksource_updatefreq_scale - Used update clocksource with new freq
 * @cs:         clocksource to be registered
 * @scale:      Scale factor multiplied against freq to get clocksource hz
 * @freq:       clocksource frequency (cycles per second) divided by scale
 *
 * This should only be called from the clocksource->enable() method.
 *
 * This *SHOULD NOT* be called directly! Please use the
 * clocksource_updatefreq_hz() or clocksource_updatefreq_khz helper functions.

One more reason to not export it?

Also, typo in the description.

Plus, one helper function is marked with '()', the other not ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/12] clocksource: Add max_cycles to clocksource structure
  2015-03-07  2:49 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
@ 2015-03-07  9:54   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:54 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -469,11 +469,14 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
>   * @shift:	cycle to nanosecond divisor (power of two)
>   * @maxadj:	maximum adjustment value to mult (~11%)
>   * @mask:	bitmask for two's complement subtraction of non 64 bit counters
> + * @max_cyc:    maximum cycle value before potential overflow (does not include

So this uses spaces instead of tabs, possibly breaking Docbook.

> + *		any saftey margin)

Typo.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-03-07  9:40   ` Ingo Molnar
@ 2015-03-09 16:50     ` John Stultz
  2015-03-10  5:05       ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-03-09 16:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra

On Sat, Mar 7, 2015 at 1:40 AM, Ingo Molnar <mingo@kernel.org> wrote:
...
>
> Typo.
...
>
> Typo.
>
...
>
> Typo.
>
...
>
> Typo...
>
...
>
> Spurious space. I know they are cheap, but still.

And a big D with a circle around it. Back to grade-school with me. :)

Thanks Ingo for the very close review, and apologies for my poor
keyboardmanship (I hope I didn't burn much of your good will here).
I'll work to get these trivial changes integrated along with the more
substantial feedback as well.

thanks
-john

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-03-09 16:50     ` John Stultz
@ 2015-03-10  5:05       ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-03-10  5:05 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> Thanks Ingo for the very close review, and apologies for my poor 
> keyboardmanship (I hope I didn't burn much of your good will here).

No problem. I usually fix typos up when the patch is otherwise good, 
except for Git pulls, where I cannot, so I'm pushing back ...

> I'll work to get these trivial changes integrated along with the 
> more substantial feedback as well.

It's all nice changes otherwise. I'm fairly sure the new sanity checks 
are going to show us interesting things in the future.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/12] clocksource: Add some debug info about clocksources being registered
  2015-03-07  9:50   ` Ingo Molnar
@ 2015-03-12  3:16     ` John Stultz
  0 siblings, 0 replies; 30+ messages in thread
From: John Stultz @ 2015-03-12  3:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra

On Sat, Mar 7, 2015 at 1:50 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> Print the mask, max_cycles, and max_idle_ns values for clocksources
>> being registered.
>>
>> Cc: Dave Jones <davej@codemonkey.org.uk>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Richard Cochran <richardcochran@gmail.com>
>> Cc: Prarit Bhargava <prarit@redhat.com>
>> Cc: Stephen Boyd <sboyd@codeaurora.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>>  kernel/time/clocksource.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index 3f24bb3..9b75316 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -697,6 +697,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>
> So that name should really be something like __clocksource_update_freq_scale(),
> right?

Sure, I can add an extra patch to address that...


>>       cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>>                                                cs->maxadj, cs->mask,
>>                                                &cs->max_cycles);
>> +
>> +     pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
>> +                     cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
>> +
>
> Also, why is this function exported? No in-tree module uses it.

So, its mostly due to the fact that the inline function
__clocksource_updatefreq_hz() in clocksource.h uses it, which may be
used by some clocksources in their enable() hooks. And those
clocksources may be compiled as a module.


> Also, when I read the code, I noticed:
>
> /**
>  * __clocksource_updatefreq_scale - Used update clocksource with new freq
>  * @cs:         clocksource to be registered
>  * @scale:      Scale factor multiplied against freq to get clocksource hz
>  * @freq:       clocksource frequency (cycles per second) divided by scale
>  *
>  * This should only be called from the clocksource->enable() method.
>  *
>  * This *SHOULD NOT* be called directly! Please use the
>  * clocksource_updatefreq_hz() or clocksource_updatefreq_khz helper functions.
>
> One more reason to not export it?
>
> Also, typo in the description.
>
> Plus, one helper function is marked with '()', the other not ...

I'll address that in the new patch as well.

thanks
-john

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
@ 2015-01-23 14:30   ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2015-01-23 14:30 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Thu, Jan 22, 2015 at 04:09:23PM -0800, John Stultz wrote:
>  #ifdef CONFIG_DEBUG_TIMEKEEPING
> +#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */
> +/*
> + * These simple flag variables are managed
> + * without locks, which is racy, but ok since
> + * we don't really care about being super
> + * precise about how many events were seen,
> + * just that a problem was observed.
> + */
> +static int timekeeping_underflow_seen;
> +static int timekeeping_overflow_seen;
> +
>  static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
>  {
>  
>  	cycle_t max_cycles = tk->tkr.clock->max_cycles;
>  	const char *name = tk->tkr.clock->name;
> +	static long last_warning; /* we always hold write on timekeeper lock */
>  
>  	if (offset > max_cycles)
>  		printk_deferred("ERROR: cycle offset (%lld) is larger then"
> @@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
>  		printk_deferred("WARNING: cycle offset (%lld) is past"
>  			" the %s 50%% safety margin (%lld)\n",
>  			offset, name, max_cycles>>1);
> +
> +	if (timekeeping_underflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource underflow observed\n");
> +			last_warning = jiffies;
> +		}
> +		timekeeping_underflow_seen = 0;
> +	}
> +	if (timekeeping_overflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource overflow observed\n");
> +			last_warning = jiffies;
> +		}
> +		timekeeping_overflow_seen = 0;
> +	}
> +
>  }

Ah, ignore my last comment. Excellent!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23 14:30   ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
then just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequnece.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calcualtion to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.

I also added some basic ratelimiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 58 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 568186c..d216b0e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,11 +119,23 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
 	cycle_t max_cycles = tk->tkr.clock->max_cycles;
 	const char *name = tk->tkr.clock->name;
+	static long last_warning; /* we always hold write on timekeeper lock */
 
 	if (offset > max_cycles)
 		printk_deferred("ERROR: cycle offset (%lld) is larger then"
@@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 		printk_deferred("WARNING: cycle offset (%lld) is past"
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
+
+	if (timekeeping_underflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource underflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_underflow_seen = 0;
+	}
+	if (timekeeping_overflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource overflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_overflow_seen = 0;
+	}
+
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t now, last, mask, max, delta;
+	unsigned int seq;
 
-	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	/*
+	 * Since we're called holding a seqlock, the data may shift
+	 * under us while we're doign the calculation. This can cause
+	 * false positives, since we'd note a problem but throw the
+	 * results away. So nest  another seqlock here to atomically
+	 * grab the points we are checking with.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		now = tkr->read(tkr->clock);
+		last = tkr->cycle_last;
+		mask = tkr->mask;
+		max = tkr->clock->max_cycles;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	/* calculate the delta since the last update_wall_time */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = clocksource_delta(now, last, mask);
 
 	/*
 	 * Try to catch underflows by checking if we are seeing small
 	 * mask-relative negative values.
 	 */
-	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+	if (unlikely((~delta & mask) < (mask >> 3))) {
+		timekeeping_underflow_seen = 1;
 		delta = 0;
+	}
 
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
-	if (unlikely(delta > tkr->clock->max_cycles))
+	if (unlikely(delta > max)) {
+		timekeeping_overflow_seen = 1;
 		delta = tkr->clock->max_cycles;
+	}
 
 	return delta;
 }
-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-03-12  3:16 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
2015-03-07  2:49 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
2015-03-07  9:20   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
2015-03-07  9:22   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
2015-03-07  9:18   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
2015-03-07  9:54   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
2015-03-07  9:22   ` Paul Bolle
2015-03-07  9:29   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
2015-03-07  9:32   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
2015-03-07  9:34   ` Ingo Molnar
2015-03-07  2:49 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
2015-03-07  9:40   ` Ingo Molnar
2015-03-09 16:50     ` John Stultz
2015-03-10  5:05       ` Ingo Molnar
2015-03-07  2:50 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
2015-03-07  2:50 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
2015-03-07  9:45   ` Ingo Molnar
2015-03-07  2:50 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
2015-03-07  9:46   ` Ingo Molnar
2015-03-07  2:50 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
2015-03-07  9:50   ` Ingo Molnar
2015-03-12  3:16     ` John Stultz
  -- strict thread matches above, loose matches on Subject: below --
2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
2015-01-23 14:30   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).