LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2 0/2] add support for new persistent memory instructions
Date: Mon, 26 Jan 2015 12:59:29 -0700	[thread overview]
Message-ID: <1422302369.13382.3.camel@theros.lm.intel.com> (raw)
In-Reply-To: <20150124111430.GA10084@pd.tnic>

On Sat, 2015-01-24 at 12:14 +0100, Borislav Petkov wrote:
> On Fri, Jan 23, 2015 at 03:03:41PM -0800, H. Peter Anvin wrote:
> > For the specific case of CLWB, we can use an "m" input rather than a
> > "+m" output, simply because CLWB (or CLFLUSH* used as a standin for CLWB
> > doesn't need to be ordered with respect to loads (whereas CLFLUSH* do).
> 
> Well, we could do something like:
> 
>         volatile struct { char x[64]; } *p = __p;
> 
>         if (static_cpu_has(X86_FEATURE_CLWB))
>                 asm volatile(".byte 0x66,0x0f,0xae,0x30" :: "m" (*p), "a" (p));
>         else
>                 asm volatile(ALTERNATIVE(
>                         ".byte 0x3e; clflush (%[pax])",
>                         ".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
>                         X86_FEATURE_CLFLUSHOPT)
>                         : [p] "+m" (*p)
>                         : [pax] "a" (p));
> 
> which would simplify the alternative macro too.

This is interesting!  I guess I'm confused as to how this solves the ordering
issue, though.  The "m" input vs "+m" output parameter will tell gcc whether
or not the assembly can be reordered at compile time with respect to reads at
that same location, correct?

So if we have an inline function that could either read or write from gcc's
point of view (input vs output parameter, depending on the branch), it seems
like it would be forced to fall back to the most restrictive case (assume it
will write), and not reorder with respect to reads.  If so, you'd end up in
the same place as using "+m" output, only now you've got an additional branch
instead of a 3-way alternative.

Am I misunderstanding this?

> Generated asm looks ok to me (my objdump doesn't know CLWB yet :)):
> 
> 0000000000000aa0 <myclflush>:
>  aa0:   55                      push   %rbp
>  aa1:   48 89 e5                mov    %rsp,%rbp
>  aa4:   eb 0a                   jmp    ab0 <myclflush+0x10>
>  aa6:   48 89 f8                mov    %rdi,%rax
>  aa9:   66 0f ae 30             data16 xsaveopt (%rax)
>  aad:   5d                      pop    %rbp
>  aae:   c3                      retq
>  aaf:   90                      nop
>  ab0:   48 89 f8                mov    %rdi,%rax
>  ab3:   3e 0f ae 38             clflush %ds:(%rax)
>  ab7:   5d                      pop    %rbp
>  ab8:   c3                      retq
> 
> > Should we use an SFENCE as a standin if pcommit is unavailable, in case
> > we end up using CLFLUSHOPT?
> 
> Btw, is PCOMMIT a lightweight SFENCE for this persistent memory aspect
> to make sure stuff has become persistent after executing it? But not all
> stuff like SFENCE so SFENCE is the bigger hammer?

Ah, yep, I definitely need to include an example flow in my commit comments.
:) Here's a snip from my reply to hpa, to save searching:

	Both the flushes (wmb/clflushopt/clflush) and the pcommit are ordered
	by either mfence or sfence.

	An example function that flushes and commits a buffer could look like
	this (based on clflush_cache_range):

	void flush_and_commit_buffer(void *vaddr, unsigned int size)
	{       
		void *vend = vaddr + size - 1;
		
		for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
			clwb(vaddr);
		
		/* Flush any possible final partial cacheline */
		clwb(vend);
		
		/* 
		 * sfence to order clwb/clflushopt/clflush cache flushes
		 * mfence via mb() also works 
		 */
		wmb();

		pcommit();

		/* 
		 * sfence to order pcommit
		 * mfence via mb() also works 
		 */
		wmb();
	}



  reply	other threads:[~2015-01-26 19:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-23 20:40 Ross Zwisler
2015-01-23 20:40 ` [PATCH v2 1/2] x86: Add support for the pcommit instruction Ross Zwisler
2015-01-23 20:40 ` [PATCH v2 2/2] x86: Add support for the clwb instruction Ross Zwisler
2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
2015-01-24 11:14   ` Borislav Petkov
2015-01-26 19:59     ` Ross Zwisler [this message]
2015-01-26 21:34       ` Borislav Petkov
2015-01-26 21:50         ` Ross Zwisler
2015-01-26 22:39           ` Borislav Petkov
2015-01-26 23:14             ` Ross Zwisler
2015-01-26 19:51   ` Ross Zwisler
2015-01-26 20:05     ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1422302369.13382.3.camel@theros.lm.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=h.peter.anvin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=tglx@linutronix.de \
    --subject='Re: [PATCH v2 0/2] add support for new persistent memory instructions' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).