From: john stultz <johnstul@us.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Tony Luck <tony.luck@intel.com>,
	bob.picco@hp.com, Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays
Date: Thu, 10 Jan 2008 14:51:57 -0800
Message-ID: <1200005517.30225.75.camel@localhost.localdomain>
In-Reply-To: <20080110220012.GA9508@Krystal>


On Thu, 2008-01-10 at 17:00 -0500, Mathieu Desnoyers wrote:
> * john stultz (johnstul@us.ibm.com) wrote:
> > 
> > On Thu, 2008-01-10 at 15:42 -0500, Mathieu Desnoyers wrote:
> > > I think it's about time I introduce the approach I have taken for LTTng
> > > timestamping. Basically, one of the main issues with the clock sources
> > > is the xtime lock : having a read seqlock nested over a write seqlock is
> > > a really, really bad idea. This can happen with NMIs. Basically, it
> > > would cause a deadlock.
> > > 
> > > What I have done is an RCU algorithm that extends a 32 bits TSC (that's
> > > the case on MIPS, for instance) to 64 bits. The update of the MSBs is
> > > done by a periodical timer (fired often enough to make sure we always
> > > detect the 32 LSBs wrap-around) and the read-side only has to disable
> > > preemption.
> > > 
> > > I use a 2 slots array, each of them keeping, alternatively, the last 64
> > > bits counter value, to implement the RCU algorithm.
> > > 
> > > Since we are discussing time source modification, this is one that I
> > > would really like to see in the Linux kernel : it would provide the kind
> > > of time source needed for function entry/exit tracing and for generic
> > > kernel tracing as well.
> > 
> > Hmm. I know powerpc has had a similar lock-free dual structure method
> > and for just a raw cycles based method you've shown below (or for some
> > of the bits Steven is working on), I think it should be fine.
> > 
> > The concern I've had with this method for general timekeeping, is that
> > I'm not sure it can handle the frequency corrections made by NTP. Since
> > we have to make sure time does not jump backwards, consider this
> > exaggerated situation:
> > 
> > time = base + (now - last)*mult;
> > 
> > So we have two structures:
> > base: 60		base: 180
> > last: 10		last: 30
> > mult: 06		mult: 05
> > 
> > Where the second structure has just been updated lock-free, however just
> > before the atomic pointer switch we were preempted, or somehow delayed,
> > and some time has past.
> > 
> > Now imagine two cpus now race to get the time. Both read the same now
> > value, but get different structure pointer values. (Note: You can create
> > the same race if you reverse the order and grab the pointer first, then
> > the cycle. However I think this example makes it easier to understand).
> > 
> > now = 50
> > cpu1:
> >   60 + (50-10)*6 = 300
> > cpu2:
> >   180 + (50-30)*5 = 280
> > 
> > 
> > Alternatively:
> > now=50: 60 + (50-10)*6 = 300
> > now=51: 180 + (51-30)*5 = 285
> > 
> > Eek. That's not good.
> > 
> > I'm not sure how this can be avoided, but I'd be very interested in
> > hearing ideas! Bounding the issue is a possibility, but then starts to
> > run amok with NO_HZ and -rt deferment.
> > 
> > thanks
> > -john
> > 
> 
> I suggest we try to see the problem differently (and see how far we can
> get with this) :
> 
> Let's suppose we have a 32 bits cycles counter given by the
> architecture. We use the lockless algorithm to extend it to 64 bits : we
> therefore have a 64 bits cycle counter that can be read locklessly.
> 
> Then, from this 64 bits counter (let's call it "now") (which has the
> advantage of never overflowing, or after enough years so nobody
> cares...), we can calculate the current time with :

Hmm. Maybe I'm missing something here. I'm not sure I'm following the
importance of the 64bit extension. 

The clocksource code deals with counters in a range of widths (from
64bit TSC to 24bit ACPI PM). The only requirement there is that we
accumulate often enough that the counter doesn't wrap twice between
accumulations. Currently this is done in update_wall_time(), but the
patch Steven sent that started this thread adds an interim step using
cycle_accumulated, allowing update_wall_time() to be deferred for longer
periods of time.
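
As a rough illustration of that wrap requirement, here is a minimal C
sketch of accumulating a narrow counter with a masked subtraction. The
struct and function names are made up for this example and are not the
kernel's own; only the masked-delta idea is what is being shown.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only; not the kernel's structures. */
struct counter_acc {
	uint64_t mask;        /* (1 << width) - 1, e.g. 0xffffff for 24 bits */
	uint64_t last;        /* raw counter value at the previous accumulation */
	uint64_t accumulated; /* total cycles folded in so far */
};

/*
 * Fold the cycles elapsed since the previous call into acc->accumulated.
 * The masked subtraction yields the right delta across a single wrap,
 * but silently loses a full period if the counter wrapped twice.
 */
static uint64_t accumulate(struct counter_acc *acc, uint64_t raw_now)
{
	uint64_t delta = (raw_now - acc->last) & acc->mask;

	acc->last = raw_now & acc->mask;
	acc->accumulated += delta;
	return acc->accumulated;
}
```

With a 24-bit mask, a previous reading of 0xfffff0 and a new reading of
0x000010 give a delta of 0x20, so one wrap is handled. A second wrap in
the same interval would be indistinguishable from no wrap at all, which
is why the accumulation interval has to be bounded.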

I do see how the method you're describing could be applied to just the
cycle_accumulated management, and maybe that's the whole point?

However my concern is that when we do the frequency adjustment in
update_wall_time(), I'm not sure the method works. Thus we would still
need a lock in there for gettimeofday().

But let's continue...

> time = base + (now - last) * mul
> 
> NTP would adjust time by modifying mul, last and base; base would be
> recalculated from the formula with : base + (now - last) * mul each time
> we modify the clock rate (mul), we also read the current "now" value
> (which is saved as "last"). This would be done system-wide and would be
> kept in a data structure separate from the 2 64 bits slots array.
> Ideally, this NTP correction update could also be done atomically with a
> 2 slots array.
> 
> Whenever we need to read the time, we then have to insure that the "now"
> value we use is consistent with the current NTP time correction. We want
> to eliminate races where we would use value from the wrong NTP "window"
> with a "now" value not belonging to this window. (an NTP window would be
> defined by a tuple of base, last and mul values)
> 
> If "now" is lower than "last", we are using an old timestamp with a
> new copy of the structure and must therefore re-read the "now" value.

Ok, that would avoid one type of error, but in both of my examples in
the last mail, now was greater than last.
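
To make that concrete, here is a small C sketch that replays the numbers
from the earlier example. The struct, the two window variables and the
needs_retry() helper are names invented for this illustration; the
values and the formula come from the example itself. Note the new base
is self-consistent at the update point: 60 + (30-10)*6 = 180.

```c
#include <stdint.h>

/* One (base, last, mult) tuple, as in the example; names are illustrative. */
struct time_window {
	int64_t base;
	int64_t last;
	int64_t mult;
};

static const struct time_window old_window = { .base = 60,  .last = 10, .mult = 6 };
static const struct time_window new_window = { .base = 180, .last = 30, .mult = 5 };

/* time = base + (now - last) * mult */
static int64_t read_time(const struct time_window *w, int64_t now)
{
	return w->base + (now - w->last) * w->mult;
}

/* The proposed check: retry only when now is older than the window's last. */
static int needs_retry(const struct time_window *w, int64_t now)
{
	return now < w->last;
}
```

read_time(&old_window, 50) gives 300, while read_time(&new_window, 50)
gives 280 and read_time(&new_window, 51) gives 285. needs_retry()
returns 0 for every one of those reads, so the check never fires and
time still goes backwards.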

> If, when we are about to return the "time" value calculated, we figure
> out that the current NTP window pointer have changed, we must make sure
> that the time value we are about to return is lower than the new base or
> otherwise time could go backward. If we detect that time is higher than
> the new base, we re-read the "now" value and re-do the calculation.

Again, I'm not sure I see how this resolves the previous example given,
as in that case the update code was delayed in between its reading of
now and the final pointer change.

The issue is that the race isn't just between the readers and the
writer, but that time races against the writer as well. So if you don't
lock the readers out during the write, I'm not sure how you can avoid
the window for inconsistencies.
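
For comparison, a minimal sketch of what locking the readers out could
look like, written as a seqlock-style sequence counter with C11 atomics.
This is not the kernel's seqlock API; every name here is hypothetical,
a single writer is assumed, and the writer is assumed to sample now only
after write_begin(). The reader is written as a single attempt rather
than a spin loop, so the caller decides what to do on failure.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative names only; this is not the kernel's seqlock API. */
struct guarded_window {
	atomic_uint seq;        /* odd while an update is in progress */
	_Atomic int64_t base;
	_Atomic int64_t last;
	_Atomic int64_t mult;
};

static void window_init(struct guarded_window *g,
			int64_t base, int64_t last, int64_t mult)
{
	atomic_init(&g->seq, 0);
	atomic_init(&g->base, base);
	atomic_init(&g->last, last);
	atomic_init(&g->mult, mult);
}

/* Writer: mark the window as being updated (single writer assumed). */
static void write_begin(struct guarded_window *g)
{
	unsigned int s = atomic_load_explicit(&g->seq, memory_order_relaxed);

	atomic_store_explicit(&g->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Writer: store the new tuple, then make the sequence even again. */
static void write_commit(struct guarded_window *g,
			 int64_t base, int64_t last, int64_t mult)
{
	unsigned int s = atomic_load_explicit(&g->seq, memory_order_relaxed);

	atomic_store_explicit(&g->base, base, memory_order_relaxed);
	atomic_store_explicit(&g->last, last, memory_order_relaxed);
	atomic_store_explicit(&g->mult, mult, memory_order_relaxed);
	atomic_store_explicit(&g->seq, s + 1, memory_order_release);
}

/* Reader: one attempt. Returns 1 and sets *out only on a stable snapshot. */
static int try_read_time(struct guarded_window *g, int64_t now, int64_t *out)
{
	unsigned int s0 = atomic_load_explicit(&g->seq, memory_order_acquire);
	int64_t base = atomic_load_explicit(&g->base, memory_order_relaxed);
	int64_t last = atomic_load_explicit(&g->last, memory_order_relaxed);
	int64_t mult = atomic_load_explicit(&g->mult, memory_order_relaxed);
	unsigned int s1;

	atomic_thread_fence(memory_order_acquire);
	s1 = atomic_load_explicit(&g->seq, memory_order_relaxed);

	if ((s0 & 1) || s0 != s1)
		return 0;	/* update in progress or completed meanwhile */
	*out = base + (now - last) * mult;
	return 1;
}
```

Under those assumptions, a reader that runs while the writer is delayed
between write_begin() and write_commit() gets no value at all instead of
a value from the stale window. The cost is the one raised earlier in the
thread: a reader that spins on failure from NMI context, on top of an
interrupted writer, never makes progress.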

thanks
-john


