From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Daniel Walker <dwalker@mvista.com>
Subject: Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles
Date: Wed, 16 Jan 2008 09:56:04 -0500
Message-ID: <20080116145604.GB31329@Krystal>
In-Reply-To: <Pine.LNX.4.58.0801152238130.19680@gandalf.stny.rr.com>

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> [ CC'd Daniel Walker, since he had problems with this code ]
> 
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > I agree with you that I don't see how the compiler could reorder this.
> > So we forget about compiler barriers. Also, the clock source used is a
> > synchronized clock source (get_cycles_sync on x86_64), so it should make
> > sure the TSC is read at the right moment.
> >
> > However, what happens if the clock source is, say, jiffies?
> >
> > In this case, we have:
> >
> > static cycle_t jiffies_read(void)
> > {
> >         return (cycle_t) jiffies;
> > }
> >
> > Which is nothing more than a memory read of
> >
> > extern unsigned long volatile __jiffy_data jiffies;
> 
> Yep, and that's not my concern.
> 

Hrm, I will reply to the rest of this email in a separate mail, but
there is another concern, simpler than memory ordering, that just hit
me:

Suppose CPU A is calling clocksource_accumulate while CPU B is calling
get_monotonic_cycles, and events happen in the following order (because
of preemption or interrupts). To make things worse, assume we are on
32-bit x86, where a write to cycle_t (64 bits) is not atomic:


CPU A                  CPU B

clocksource read
update cycle_mono (1st 32 bits)
                       read cycle_mono
                       read cycle_last
                       clocksource read
                       read cycle_mono
                       read cycle_last
update cycle_mono (2nd 32 bits)
update cycle_last
update cycle_acc

Therefore, we have:
- an inconsistent cycle_monotonic value
- inconsistent cycle_monotonic and cycle_last values.

Or is there something I have missed?

If you really want a seqlock-free algorithm (I _do_ want this for
tracing! :) ), maybe going in the RCU direction could help (I refer to my
RCU-based 32-to-64 bits lockless timestamp counter extension, which
could be turned into the clocksource updater).
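
As a rough illustration of the direction (not the actual timestamp
extension code), here is a two-slot publish sketch using C11 atomics. All
names (struct clock_pub, clock_pub_update, clock_pub_read) are
hypothetical. It assumes a single updater, a full 64-bit counter (no
mask), and that a reader is done with its slot before the updater comes
back around to rewrite it; in the kernel that last guarantee would have to
come from something like disabling preemption around the read.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One consistent (cycle_last, cycle_monotonic) pair. */
struct clock_snapshot {
	uint64_t cycle_last;
	uint64_t cycle_monotonic;
};

/* Hypothetical container: two slots plus the index of the live one. */
struct clock_pub {
	struct clock_snapshot slot[2];
	atomic_uint active;
};

static void clock_pub_init(struct clock_pub *p, uint64_t cycle_last)
{
	p->slot[0].cycle_last = cycle_last;
	p->slot[0].cycle_monotonic = 0;
	p->slot[1] = p->slot[0];
	atomic_init(&p->active, 0);
}

/* Single updater: fill the inactive slot, then publish it. */
static void clock_pub_update(struct clock_pub *p, uint64_t now)
{
	unsigned int cur = atomic_load_explicit(&p->active,
						memory_order_relaxed);
	unsigned int next = cur ^ 1u;
	uint64_t offset = now - p->slot[cur].cycle_last;

	p->slot[next].cycle_last = now;
	p->slot[next].cycle_monotonic = p->slot[cur].cycle_monotonic + offset;
	/* Release: slot contents are visible before the index flips. */
	atomic_store_explicit(&p->active, next, memory_order_release);
}

/* Reader: pick the live slot once and use only that pair. */
static uint64_t clock_pub_read(struct clock_pub *p, uint64_t now)
{
	unsigned int idx = atomic_load_explicit(&p->active,
						memory_order_acquire);
	const struct clock_snapshot *s = &p->slot[idx];

	return s->cycle_monotonic + (now - s->cycle_last);
}
```

The point of the design is that the reader never combines a
cycle_monotonic from one update with a cycle_last from another: both come
from the same slot, and the slot is only switched by a single word-sized
store.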

Mathieu

> >
> > I think it is wrong to assume that reads from clock->cycle_raw and from
> > jiffies will be ordered correctly in SMP. I am tempted to think that
> > ordering memory writes to clock->cycle_raw vs jiffies is also needed in this
> > case (where clock->cycle_raw is updated, or where jiffies is updated).
> >
> > We can fall in the same kind of issue if we read the HPET, which is
> > memory I/O based. It does not seem correct to assume that MMIO vs
> > normal memory reads are ordered. (pointing back to this article :
> > http://lwn.net/Articles/198988/)
> 
> That and the dread memory barrier thread that my head is still spinning
> on.
> 
> Ok, let's take a close look at the code in question. I may be wrong, and if
> so, great, we can fix it.
> 
> We have this in get_monotonic_cycles:
> 
> {
> 	cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
> 	do {
> 		cycle_monotonic = clock->cycle_monotonic;
> 		cycle_last = clock->cycle_last;
> 		cycle_now = clocksource_read(clock);
> 		cycle_delta = (cycle_now - cycle_last) & clock->mask;
> 	} while (cycle_monotonic != clock->cycle_monotonic ||
> 		 cycle_last != clock->cycle_last);
> 	return cycle_monotonic + cycle_delta;
> }
> 
> and this in clocksource.h
> 
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
> 	cycle_t offset = (now - cs->cycle_last) & cs->mask;
> 	cs->cycle_last = now;
> 	cs->cycle_accumulated += offset;
> 	cs->cycle_monotonic += offset;
> }
> 
> now is usually just a clocksource_read() passed in.
> 
> The goal is to have clock_monotonic always return something that is
> greater than what was read the last time.
> 
> Let's make a few assumptions now (for others to shoot them down). One
> thing is that we don't need to worry too much about MMIO, because we are
> doing a read. This means we need the data right now to continue. So the
> read being a function call should keep gcc from moving stuff around, and
> since we are doing an IO read, the order of events should be pretty much
> synchronized, in this order:
> 
>     1. load cycle_last and cycle_monotonic (we don't care which order)*
>     2. read clock source
>     3. calculate delta and while() compare (order doesn't matter)
> 
> * we might care (see below)
> 
> If the above is incorrect, then we need to fix get_monotonic_cycles.
> 
> in clocksource_accumulate, we have:
> 
>   offset = ((now = cs->read()) - cycle_last) & cs->mask;
>   cycle_last = now;
>   cycle_accumulate += offset;
>   cycle_monotonic += offset;
> 
> The order of events here is as follows. Using the same reasoning as
> above, the read must be first and completed because for gcc it's a
> function, and for IO, it needs to return data.
> 
>   1. cs->read
>   2. update cycle_last, cycle_accumulate, cycle_monotonic.
> 
> Can we assume, if the above for get_monotonic_cycles is correct, that
> since we read and compare cycle_last and cycle_monotonic, neither of
> them has changed over the read? If so, we have a snapshot of the
> clocksource_accumulate.
> 
> So the worst thing that I can think of is that cycle_monotonic is updated
> *before* cycle_last:
> 
>    cycle_monotonic += offset;
>      <get_monotonic_cycles run on other CPU>
>    cycle_last = now;
> 
> 
> cycle_last = 5
> cycle_monotonic = 0
> 
> 
>     CPU 0                         CPU 1
>   ----------                  -------------
>  cs->read() = 10
>  offset = 10 - 5 = 5
>  cycle_monotonic = 5
>                             cycle_monotonic = 5
>                             cycle_last = 5
>                             cs->read() = 11
>                             delta = 11 - 5 = 6
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 6 = 11
> 
>   cycle_last = 10
> 
>                             cycle_monotonic = 5
>                             cycle_last = 10
>                             cs->read() = 12
>                             delta = 12 - 10 = 2
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 2 = 7
> 
>                            **** ERROR *****
> 
> So, we *do* need memory barriers.  Looks like cycle_last and
> cycle_monotonic need to be synchronized.
> 
> OK, will this do?
> 
> cycle_t notrace get_monotonic_cycles(void)
> {
>         cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
>         do {
>                 cycle_monotonic = clock->cycle_monotonic;
>                 smp_rmb();
>                 cycle_last = clock->cycle_last;
>                 cycle_now = clocksource_read(clock);
>                 cycle_delta = (cycle_now - cycle_last) & clock->mask;
>         } while (cycle_monotonic != clock->cycle_monotonic ||
>                  cycle_last != clock->cycle_last);
>         return cycle_monotonic + cycle_delta;
> }
> 
> and this in clocksource.h
> 
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
>         cycle_t offset = (now - cs->cycle_last) & cs->mask;
>         cs->cycle_last = now;
>         smp_wmb();
>         cs->cycle_accumulated += offset;
>         cs->cycle_monotonic += offset;
> }
> 
> We may still get to a situation where cycle_monotonic is of the old value
> and cycle_last is of the new value. That would give us a smaller delta
> than we want.
> 
> Let's look at this, with a slightly different situation.
> 
> cycle_last = 5
> cycle_monotonic = 0
> 
> 
>     CPU 0                         CPU 1
>   ----------                  -------------
>  cs->read() = 10
>  offset = 10 - 5 = 5
>  cycle_last = 10
>  cycle_monotonic = 5
> 
>                             cycle_monotonic = 5
>                             cycle_last = 10
>                             cs->read() = 12
>                             delta = 12 - 10 = 2
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 2 = 7
> 
> 
>  cs->read() = 13
>  offset = 13 - 10 = 3
>  cycle_last = 13
> 
>                             cycle_monotonic = 5
>                             cycle_last = 13
>                             cs->read() = 14
>                             delta = 14 - 13 = 1
>                             cycle_monotonic and cycle_last still same
>                             return 5 + 1 = 6
> 
>                         **** ERROR ****
> 
> Crap, looks like we do need stronger locking here :-(
> 
> Hmm, I might as well just use seq_locks, and make sure that tracing
> does not hit them.
> 
> Thanks!
> 
> -- Steve
> 
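
For reference, the second trace quoted above (cycle_last already updated,
cycle_monotonic still stale) can be replayed as straight-line C. This is
only a sketch of the arithmetic: struct clk, reader() and
replay_stale_monotonic() are hypothetical names, the mask is left out
(assumes a full-width counter), and the interleaving is hard-coded rather
than produced by real concurrency.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two clocksource fields involved. */
struct clk {
	uint64_t cycle_last;
	uint64_t cycle_monotonic;
};

/* What get_monotonic_cycles computes from one accepted snapshot. */
static uint64_t reader(const struct clk *c, uint64_t now)
{
	return c->cycle_monotonic + (now - c->cycle_last);
}

/*
 * Replays the quoted CPU 0 / CPU 1 trace.
 * Returns 1 if the second reading is smaller than the first.
 */
static int replay_stale_monotonic(uint64_t *first, uint64_t *second)
{
	struct clk c = { .cycle_last = 5, .cycle_monotonic = 0 };
	uint64_t offset;

	/* CPU 0: complete update with cs->read() = 10. */
	offset = 10 - c.cycle_last;
	c.cycle_last = 10;
	c.cycle_monotonic += offset;

	/* CPU 1: cs->read() = 12. */
	*first = reader(&c, 12);

	/* CPU 0: cs->read() = 13, only cycle_last stored so far. */
	c.cycle_last = 13;

	/* CPU 1: cs->read() = 14, sees new cycle_last, old cycle_monotonic. */
	*second = reader(&c, 14);

	return *second < *first;
}
```

The two readings come out as 7 and then 6, matching the trace: each
snapshot passes the reader's own consistency check, yet the returned time
goes backwards.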

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
