LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: john stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Daniel Walker <dwalker@mvista.com>
Subject: Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles
Date: Wed, 16 Jan 2008 18:39:28 -0500	[thread overview]
Message-ID: <20080116233927.GB23895@Krystal> (raw)
In-Reply-To: <1200523867.6127.5.camel@localhost.localdomain>

* john stultz (johnstul@us.ibm.com) wrote:
> 
> On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
> > On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > If you really want an seqlock free algorithm (I _do_ want this for
> > > tracing!) :) maybe going in the RCU direction could help (I refer to my
> > > RCU-based 32-to-64 bits lockless timestamp counter extension, which
> > > could be turned into the clocksource updater).
> > 
> > Yea. After our earlier discussion and talking w/ Steven, I'm taking a
> > swing at this now.  The lock-free method still doesn't apply to the
> > update_wall_time function, but does work fine for the monotonic cycle
> > uses.  I'll send a patch for review as soon as I get things building.
> 
> So here's my first attempt at adding Mathieu's lock-free method to
> Steven's get_monotonic_cycles() interface. 
> 
> Completely un-tested, but it builds, so I figured I'd send it out for
> review.
> 
> I'm not super sure the update or the read doesn't need something
> additional to force a memory access, but as I didn't see anything 
> special in Mathieu's implementation, I'm going to guess this is ok.
> 
> Mathieu, Let me know if this isn't what you're suggesting.
> 
> Signed-off-by: John Stultz <johnstul@us.ibm.com>
> 
> Index: monotonic-cleanup/include/linux/clocksource.h
> ===================================================================
> --- monotonic-cleanup.orig/include/linux/clocksource.h	2008-01-16 12:22:04.000000000 -0800
> +++ monotonic-cleanup/include/linux/clocksource.h	2008-01-16 14:41:31.000000000 -0800
> @@ -87,9 +87,17 @@
>  	 * more than one cache line.
>  	 */
>  	struct {
> -		cycle_t cycle_last, cycle_accumulated, cycle_raw;
> -	} ____cacheline_aligned_in_smp;

Shouldn't the cycle_last and cycle_accumulated by in the array too ?

> +		cycle_t cycle_last, cycle_accumulated;
>  
> +		/* base structure provides lock-free read
> +		 * access to a virtualized 64bit counter
> +		 * Uses RCU-like update.
> +		 */
> +		struct {

We had cycle_raw before, why do we need the following two ?

> +			cycle_t cycle_base_last, cycle_base;

I'm not quite sure why you need both cycle_base_last and cycle_base...

I think I'll need a bit of an explanation of what you are trying to
achieve here to see what to expect from the clock source. Are you trying
to deal with non-synchronized TSCs across CPUs in a way that will
generate a monotonic (sometimes stalling) clock ?

What I am trying to say is : I know you are trying to make a virtual
clock source where time cannot go backward, but what are your
assumptions about the "real" clock source ?

Is the intent to deal with an HPET suddenly reset to 0 or something
like this ?

Basically, I wonder why you have to calculate the current cycle count
from the previous update_wall_time event. Is is because you need to be
consistent when a clocksource change occurs ?


> +		} base[2];
> +		int base_num;
> +	} ____cacheline_aligned_in_smp;
>  	u64 xtime_nsec;
>  	s64 error;
>  
> @@ -175,19 +183,21 @@
>  }
>  
>  /**
> - * clocksource_get_cycles: - Access the clocksource's accumulated cycle value
> + * clocksource_get_basecycles: - get the clocksource's accumulated cycle value
>   * @cs:		pointer to clocksource being read
>   * @now:	current cycle value
>   *
>   * Uses the clocksource to return the current cycle_t value.
>   * NOTE!!!: This is different from clocksource_read, because it
> - * returns the accumulated cycle value! Must hold xtime lock!
> + * returns a 64bit wide accumulated value.
>   */
>  static inline cycle_t
> -clocksource_get_cycles(struct clocksource *cs, cycle_t now)
> +clocksource_get_basecycles(struct clocksource *cs, cycle_t now)
>  {
> -	cycle_t offset = (now - cs->cycle_last) & cs->mask;
> -	offset += cs->cycle_accumulated;

I would disable preemption in clocksource_get_basecycles. We would not
want to be scheduled out while we hold a pointer to the old array
element.

> +	int num = cs->base_num;

Since you deal with base_num in a shared manner (not per cpu), you will
need a smp_read_barrier_depend() here after the cs->base_num read.

You should think about reading the cs->base_num first, and _after_ that
read the real clocksource. Here, the clocksource value is passed as
parameter. It means that the read clocksource may have been read in the
previous RCU window.

> +	cycle_t offset = (now - cs->base[num].cycle_base_last);
> +	offset &= cs->mask;
> +	offset += cs->base[num].cycle_base;
>  	return offset;
>  }
>  
> @@ -197,14 +207,25 @@
>   * @now:	current cycle value
>   *
>   * Used to avoids clocksource hardware overflow by periodically
> - * accumulating the current cycle delta. Must hold xtime write lock!
> + * accumulating the current cycle delta. Uses RCU-like update, but
> + * ***still requires the xtime_lock is held for writing!***
>   */
>  static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
>  {

Why do we still require xtime_lock here ? Can you tell exactly which
contexts this function will be called from (periodical timer interrupt?)
I guess it is called from one and only one CPU periodically.

> -	cycle_t offset = (now - cs->cycle_last) & cs->mask;
> +	/* First update the monotonic base portion.
> +	 * The dual array update method allows for lock-free reading.
> +	 */
> +	int num = !cs->base_num;
> +	cycle_t offset = (now - cs->base[!num].cycle_base_last);

!0 is not necessarily 1. This is why I use cpu_synth->index ? 0 : 1 in
my code. The two previous lines seems buggy. (I did the same mistake in
my first implementation) ;)

> +	offset &= cs->mask;
> +	cs->base[num].cycle_base = cs->base[!num].cycle_base + offset;

Here too.

> +	cs->base[num].cycle_base_last = now;

Since you deal with shared data (in my algo, I use per-cpu data), you
have to add a wmb() before the base_num value update. Only then will you
ensure that other CPUs will see consistent values.

> +	cs->base_num = num;
> +
> +	/* Now update the cycle_accumulated portion */
> +	offset = (now - cs->cycle_last) & cs->mask;

The following two updates are racy. I think they should be in the array
too. We want consistant cycle_raw, cycle_last and cycle_accumulated
values; they should therefore be presented to the reader atomically with
a pointer change.

>  	cs->cycle_last = now;
>  	cs->cycle_accumulated += offset;
> -	cs->cycle_raw += offset;
>  }
>  
>  /**
> Index: monotonic-cleanup/kernel/time/timekeeping.c
> ===================================================================
> --- monotonic-cleanup.orig/kernel/time/timekeeping.c	2008-01-16 12:21:46.000000000 -0800
> +++ monotonic-cleanup/kernel/time/timekeeping.c	2008-01-16 14:15:31.000000000 -0800
> @@ -71,10 +71,12 @@
>   */
>  static inline s64 __get_nsec_offset(void)
>  {
> -	cycle_t cycle_delta;
> +	cycle_t now, cycle_delta;
>  	s64 ns_offset;
>  
> -	cycle_delta = clocksource_get_cycles(clock, clocksource_read(clock));
> +	now = clocksource_read(clock);

Why is there a second level of cycle_last and cycle_accumulated here ?

> +	cycle_delta = (now - clock->cycle_last) & clock->mask;
> +	cycle_delta += clock->cycle_accumulated;
>  	ns_offset = cyc2ns(clock, cycle_delta);
>  
>  	return ns_offset;
> @@ -105,35 +107,7 @@
>  
>  cycle_t notrace get_monotonic_cycles(void)
>  {
> -	cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;
> -
> -	do {
> -		/*
> -		 * cycle_raw and cycle_last can change on
> -		 * another CPU and we need the delta calculation
> -		 * of cycle_now and cycle_last happen atomic, as well
> -		 * as the adding to cycle_raw. We don't need to grab
> -		 * any locks, we just keep trying until get all the
> -		 * calculations together in one state.
> -		 *
> -		 * In fact, we __cant__ grab any locks. This
> -		 * function is called from the latency_tracer which can
> -		 * be called anywhere. To grab any locks (including
> -		 * seq_locks) we risk putting ourselves into a deadlock.
> -		 */
> -		cycle_raw = clock->cycle_raw;
> -		cycle_last = clock->cycle_last;
> -
> -		/* read clocksource: */
> -		cycle_now = clocksource_read(clock);
> -
> -		/* calculate the delta since the last update_wall_time: */
> -		cycle_delta = (cycle_now - cycle_last) & clock->mask;
> -
> -	} while (cycle_raw != clock->cycle_raw ||
> -		 cycle_last != clock->cycle_last);
> -
> -	return cycle_raw + cycle_delta;
> +	return clocksource_get_basecycles(clock, clocksource_read(clock));
>  }
>  
>  unsigned long notrace cycles_to_usecs(cycle_t cycles)
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  parent reply	other threads:[~2008-01-16 23:39 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-09 23:29 [RFC PATCH 00/22 -v2] mcount and latency tracing utility -v2 Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 01/22 -v2] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-10 18:19   ` Jan Kiszka
2008-01-10 19:54     ` Steven Rostedt
2008-01-10 23:02     ` Steven Rostedt
2008-01-10 18:28   ` Sam Ravnborg
2008-01-10 19:10     ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 02/22 -v2] Annotate core code that should not be traced Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 03/22 -v2] x86_64: notrace annotations Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 04/22 -v2] add notrace annotations to vsyscall Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 05/22 -v2] add notrace annotations for NMI routines Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 06/22 -v2] mcount based trace in the form of a header file library Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 07/22 -v2] tracer add debugfs interface Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 08/22 -v2] mcount tracer output file Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 09/22 -v2] mcount tracer show task comm and pid Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 10/22 -v2] Add a symbol only trace output Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 11/22 -v2] Reset the tracer when started Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 12/22 -v2] separate out the percpu date into a percpu struct Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays Steven Rostedt
2008-01-10  0:00   ` john stultz
2008-01-10  0:09     ` Steven Rostedt
2008-01-10 19:54     ` Tony Luck
2008-01-10 20:15       ` Steven Rostedt
2008-01-10 20:41         ` john stultz
2008-01-10 20:29       ` john stultz
2008-01-10 20:42         ` Mathieu Desnoyers
2008-01-10 21:25           ` john stultz
2008-01-10 22:00             ` Mathieu Desnoyers
2008-01-10 22:40               ` Steven Rostedt
2008-01-10 22:51               ` john stultz
2008-01-10 23:05                 ` john stultz
2008-01-10 21:33         ` [RFC PATCH 13/22 -v2] handle accurate time keeping over longdelays Luck, Tony
2008-01-10  0:19   ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays john stultz
2008-01-10  0:25     ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 14/22 -v2] time keeping add cycle_raw for actual incrementation Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 15/22 -v2] initialize the clock source to jiffies clock Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 16/22 -v2] add get_monotonic_cycles Steven Rostedt
2008-01-10  3:28   ` Daniel Walker
2008-01-15 21:46   ` Mathieu Desnoyers
2008-01-15 22:01     ` Steven Rostedt
2008-01-15 22:03       ` Steven Rostedt
2008-01-15 22:08       ` Mathieu Desnoyers
2008-01-16  1:38         ` Steven Rostedt
2008-01-16  3:17           ` Mathieu Desnoyers
2008-01-16 13:17             ` Steven Rostedt
2008-01-16 14:56               ` Mathieu Desnoyers
2008-01-16 15:06                 ` Steven Rostedt
2008-01-16 15:28                   ` Mathieu Desnoyers
2008-01-16 15:58                     ` Steven Rostedt
2008-01-16 17:00                       ` Mathieu Desnoyers
2008-01-16 17:49                         ` Mathieu Desnoyers
2008-01-16 19:43                         ` Steven Rostedt
2008-01-16 20:17                           ` Mathieu Desnoyers
2008-01-16 20:45                             ` Tim Bird
2008-01-16 20:49                             ` Steven Rostedt
2008-01-17 20:08                             ` Steven Rostedt
2008-01-17 20:37                               ` Frank Ch. Eigler
2008-01-17 21:03                                 ` Steven Rostedt
2008-01-18 22:26                                   ` Mathieu Desnoyers
2008-01-18 22:49                                     ` Steven Rostedt
2008-01-18 23:19                                       ` Mathieu Desnoyers
2008-01-19  3:36                                         ` Frank Ch. Eigler
2008-01-19  3:55                                           ` Steven Rostedt
2008-01-19  4:23                                             ` Frank Ch. Eigler
2008-01-19 15:29                                               ` Mathieu Desnoyers
2008-01-19  3:32                                       ` Frank Ch. Eigler
2008-01-16 18:01                       ` Tim Bird
2008-01-16 22:36                 ` john stultz
2008-01-16 22:51                   ` john stultz
2008-01-16 23:33                     ` Steven Rostedt
2008-01-17  2:28                       ` john stultz
2008-01-17  2:40                         ` Mathieu Desnoyers
2008-01-17  2:50                           ` Mathieu Desnoyers
2008-01-17  3:02                             ` Steven Rostedt
2008-01-17  3:21                             ` Paul Mackerras
2008-01-17  3:39                               ` Steven Rostedt
2008-01-17  4:22                                 ` Mathieu Desnoyers
2008-01-17  4:25                                 ` Mathieu Desnoyers
2008-01-17  4:14                               ` Mathieu Desnoyers
2008-01-17 15:22                                 ` Steven Rostedt
2008-01-17 17:46                                 ` Linus Torvalds
2008-01-17  2:51                           ` Steven Rostedt
2008-01-16 23:39                     ` Mathieu Desnoyers [this message]
2008-01-16 23:50                       ` Steven Rostedt
2008-01-17  0:36                         ` Steven Rostedt
2008-01-17  0:33                       ` john stultz
2008-01-17  2:20                         ` Mathieu Desnoyers
2008-01-17  1:03                       ` Linus Torvalds
2008-01-17  1:35                         ` Mathieu Desnoyers
2008-01-17  2:20                       ` john stultz
2008-01-17  2:35                         ` Mathieu Desnoyers
2008-01-09 23:29 ` [RFC PATCH 17/22 -v2] Add timestamps to tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 18/22 -v2] Sort trace by timestamp Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 19/22 -v2] speed up the output of the tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 20/22 -v2] Add latency_trace format tor tracer Steven Rostedt
2008-01-10  3:41   ` Daniel Walker
2008-01-09 23:29 ` [RFC PATCH 21/22 -v2] Split out specific tracing functions Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 22/22 -v2] Trace irq disabled critical timings Steven Rostedt
2008-01-10  3:58   ` Daniel Walker
2008-01-10 14:45     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080116233927.GB23895@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@ghostprotocols.net \
    --cc=akpm@linux-foundation.org \
    --cc=dwalker@mvista.com \
    --cc=fche@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=hch@infradead.org \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    --cc=sam@ravnborg.org \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tim.bird@am.sony.com \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).