LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	John Stultz <johnstul@us.ibm.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation
Date: Mon, 4 Feb 2008 22:11:01 -0800	[thread overview]
Message-ID: <20080205061101.GD8457@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080204224139.GA10564@Krystal>

On Mon, Feb 04, 2008 at 05:41:40PM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > 
> > On Mon, 4 Feb 2008, Paul E. McKenney wrote:
> > > OK, will see what I can do...
> > >
> > > > On Sat, 2 Feb 2008, Paul E. McKenney wrote:
> > > >
> > > > > Yep, you have dependencies, so something like the following:
> > > > >
> > > > > initial state:
> > > > >
> > > > > 	struct foo {
> > > > > 		int a;
> > > > > 	};
> > > > > 	struct foo x = { 0 };
> > > > > 	struct foo y = { 0 };
> > > > > 	struct foo *global_p = &y;
> > > > > 	/* other variables are appropriately declared auto variables */
> > > > >
> > > > > 	/* No kmalloc() or kfree(), hence no RCU grace periods. */
> > > > > 	/* In the terminology of http://lwn.net/Articles/262464/, we */
> > > > > 	/* are doing only publish-subscribe, nothing else. */
> > > > >
> > > > > writer:
> > > > >
> > > > > 	x.a = 1;
> > > > > 	smp_wmb();  /* or smp_mb() */
> > > > > 	global_p = &x;
> > > > >
> > > > > reader:
> > > > >
> > > > > 	p = global_p;
> > > > > 	ta = p->a;
> > > > >
> > > > > Both Alpha and aggressive compiler optimizations can result in the reader
> > > > > seeing the new value of the pointer (&x) but the old value of the field
> > > > > (0).  Strange but true.  The fix is as follows:
> > > > >
> > > > > reader:
> > > > >
> > > > > 	p = global_p;
> > > > > 	smp_read_barrier_depends();  /* or use rcu_dereference() */
> > > > > 	ta = p->a;
> > > > >
> > > > > So how can this happen?  First note that if smp_read_barrier_depends()
> > > > > was unnecessary in this case, it would be unnecessary in all cases.
> > > > >
> > > > > Second, let's start with the compiler.  Suppose that a highly optimizing
> > > > > compiler notices that in almost all cases, the reader finds p==global_p.
> > > > > Suppose that this compiler also notices that one of the registers (say
> > > > > r1) almost always contains this expected value of global_p, and that
> > > > > cache pressure ensures that an actual load from global_p almost always
> > > > > generates an expensive cache miss.  Such a compiler would be within its
> > > > > rights (as defined by the C standard) to generate code assuming that r1
> > > > > already had the right value, while also generating code to validate this
> > > > > assumption, perhaps as follows:
> > > > >
> > > > > 	r2 = global_p;  /* high latency, other things complete meanwhile */
> > > > > 	ta == r1->a;
> > > > > 	if (r1 != r2)
> > > > > 		ta = r2->a;
> > > > >
> > > > > Now consider the following sequence of events on a superscalar CPU:
> > > >
> > > > I think you missed one step here (causing my confusion). I don't want to
> > > > assume so I'll try to put in the missing step:
> > > >
> > > > 	writer: r1 = p;  /* happens to use r1 to store parameter p */
> > >
> > > You lost me on this one...  The writer has only the following three steps:
> > 
> > You're right. I meant "writer:  r1 = x;"
> > 
> > >
> > > writer:
> > >
> > > 	x.a = 1;
> > > 	smp_wmb();  /* or smp_mb() */
> > > 	global_p = &x;
> > >
> > > Where did the "r1 = p" come from?  For that matter, where did "p" come
> > > from?
> > >
> > > > > 	reader: r2 = global_p; /* issued, has not yet completed. */
> > > > > 	reader: ta = r1->a; /* which gives zero. */
> > > > > 	writer: x.a = 1;
> > > > > 	writer: smp_wmb();
> > > > > 	writer: global_p = &x;
> > > > > 	reader: r2 = global_p; /* this instruction now completes */
> > > > > 	reader: if (r1 != r2) /* and these are equal, so we keep bad ta! */
> > > >
> > > > Is that the case?
> > >
> > > Ah!  Please note that I am doing something unusual here in that I am
> > > working with global variables, as opposed to the normal RCU practice of
> > > dynamically allocating memory.  So "x" is just a global struct, not a
> > > pointer to a struct.
> > >
> > 
> > But lets look at a simple version of my original code anyway ;-)
> > 
> > Writer:
> > 
> > void add_op(struct myops *x) {
> > 	/* x->next may be garbage here */
> > 	x->next = global_p;
> > 	smp_wmb();
> > 	global_p = x;
> > }
> > 
> > Reader:
> > 
> > void read_op(void)
> > {
> > 	struct myops *p = global_p;
> > 
> > 	while (p != NULL) {
> > 		p->func();
> > 		p = next;
> > 		/* if p->next is garbage we crash */
> > 	}
> > }
> > 
> > 
> > Here, we are missing the read_barrier_depends(). Lets look at the Alpha
> > cache issue:
> > 
> > 
> > reader reads the new version of global_p, and then reads the next
> > pointer. But since the next pointer is on a different cacheline than
> > global_p, it may have somehow had that in it's cache still. So it uses the
> > old next pointer which contains the garbage.
> > 
> > Is that correct?
> > 
> > But I will have to admit, that I can't see how an aggressive compiler
> > might have screwed this up. Being that x is a parameter, and the function
> > add_op is not in a header file.
> > 
> 
> Tell me if I am mistakened, but applying Paul's explanation to your
> example would give (I unroll the loop for clarity) :
> 
> Writer:
> 
> void add_op(struct myops *x) {
> 	/* x->next may be garbage here */
> 	x->next = global_p;
> 	smp_wmb();
> 	global_p = x;
> }
> 
> Reader:
> 
> void read_op(void)
> {
> 	struct myops *p = global_p;
> 
>   if (p != NULL) {
> 		p->func();
> 		p = p->next;
>   /*
>    * Suppose the compiler expects that p->next is likely to be equal to
>    * p + sizeof(struct myops), uses r1 to store previous p, r2 to store the
>    * next p and r3 to store the expected value. Let's look at what the
>    * compiler could do for the next loop iteration.
>    */
>   r2 = r1->next   (1)
>   r3 = r1 + sizeof(struct myops)
>   r4 = r3->func   (2)
>   if (r3 == r2 && r3 != NULL)
>     call r4
> 
> 		/* if p->next is garbage we crash */
> 	} else
>     return;
> 
>   if (p != NULL) {
> 		p->func();
> 		p = p->next;
> 		/* if p->next is garbage we crash */
> 	} else
>     return;
>   .....
> }
> 
> In this example, we would be reading the expected "r3->func" (2) before
> reading the real r1->next (1) value if reads are issued out of order.
> 
> Paul, am I correct ? And.. does the specific loop optimization I just
> described actually exist ?

This is indeed another form of value prediction.  Perhaps more common
in scientific applications, but one could imagine it occurring in the
kernel as well.

In some cases, the read from the real r1->next might be deferred until
after the computation so as to overlap the speculative computation with
the memory latency.  Border-line insane, perhaps, but some compiler
folks like this sort of approach...

> Thanks for your enlightenment :)

;-)

						Thanx, Paul

  reply	other threads:[~2008-02-05  6:57 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-30  3:15 [PATCH 00/22 -v7] mcount and latency tracing utility -v7 Steven Rostedt
2008-01-30  3:15 ` [PATCH 01/22 -v7] printk - dont wakeup klogd with interrupts disabled Steven Rostedt
2008-01-30  3:15 ` [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-30  8:46   ` Peter Zijlstra
2008-01-30 13:08     ` Steven Rostedt
2008-01-30 14:09       ` Steven Rostedt
2008-01-30 14:25         ` Peter Zijlstra
2008-02-01 22:34           ` Paul E. McKenney
2008-02-02  1:56             ` Steven Rostedt
2008-02-02 21:41               ` Paul E. McKenney
2008-02-04 17:09                 ` Steven Rostedt
2008-02-04 21:40                   ` Paul E. McKenney
2008-02-04 22:03                     ` Steven Rostedt
2008-02-04 22:41                       ` Mathieu Desnoyers
2008-02-05  6:11                         ` Paul E. McKenney [this message]
2008-02-05  5:13                       ` Paul E. McKenney
2008-01-30 13:21   ` Jan Kiszka
2008-01-30 13:53     ` Steven Rostedt
2008-01-30 14:28       ` Steven Rostedt
2008-01-30  3:15 ` [PATCH 03/22 -v7] Annotate core code that should not be traced Steven Rostedt
2008-01-30  3:15 ` [PATCH 04/22 -v7] x86_64: notrace annotations Steven Rostedt
2008-01-30  3:15 ` [PATCH 05/22 -v7] add notrace annotations to vsyscall Steven Rostedt
2008-01-30  8:49   ` Peter Zijlstra
2008-01-30 13:15     ` Steven Rostedt
2008-01-30  3:15 ` [PATCH 06/22 -v7] handle accurate time keeping over long delays Steven Rostedt
2008-01-30  3:15 ` [PATCH 07/22 -v7] initialize the clock source to jiffies clock Steven Rostedt
2008-01-30  3:15 ` [PATCH 08/22 -v7] add get_monotonic_cycles Steven Rostedt
2008-01-30  3:15 ` [PATCH 09/22 -v7] add notrace annotations to timing events Steven Rostedt
2008-01-30  3:15 ` [PATCH 10/22 -v7] mcount based trace in the form of a header file library Steven Rostedt
2008-01-30  3:15 ` [PATCH 11/22 -v7] Add context switch marker to sched.c Steven Rostedt
2008-01-30  3:15 ` [PATCH 12/22 -v7] Make the task State char-string visible to all Steven Rostedt
2008-01-30  3:15 ` [PATCH 13/22 -v7] Add tracing of context switches Steven Rostedt
2008-01-30  3:15 ` [PATCH 14/22 -v7] Generic command line storage Steven Rostedt
2008-01-30  3:15 ` [PATCH 15/22 -v7] trace generic call to schedule switch Steven Rostedt
2008-01-30  3:15 ` [PATCH 16/22 -v7] Add marker in try_to_wake_up Steven Rostedt
2008-01-30  3:15 ` [PATCH 17/22 -v7] mcount tracer for wakeup latency timings Steven Rostedt
2008-01-30  9:31   ` Peter Zijlstra
2008-01-30 13:18     ` Steven Rostedt
2008-01-30  3:15 ` [PATCH 18/22 -v7] Trace irq disabled critical timings Steven Rostedt
2008-01-30  3:15 ` [PATCH 19/22 -v7] trace preempt off " Steven Rostedt
2008-01-30  9:40   ` Peter Zijlstra
2008-01-30 13:40     ` Steven Rostedt
2008-01-30  3:15 ` [PATCH 20/22 -v7] Add markers to various events Steven Rostedt
2008-01-30  3:15 ` [PATCH 21/22 -v7] Add event tracer Steven Rostedt
2008-01-30  3:15 ` [PATCH 22/22 -v7] Critical latency timings histogram Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080205061101.GD8457@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@ghostprotocols.net \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=fche@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=hch@infradead.org \
    --cc=jan.kiszka@siemens.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    --cc=sam@ravnborg.org \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tim.bird@am.sony.com \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).