LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/2] add support for new persistent memory instructions
@ 2015-01-23 20:40 Ross Zwisler
  2015-01-23 20:40 ` [PATCH v2 1/2] x86: Add support for the pcommit instruction Ross Zwisler
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ross Zwisler @ 2015-01-23 20:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, H Peter Anvin, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

This patch set adds support for two new persistent memory instructions, pcommit
and clwb.  These instructions were announced in the document "Intel
Architecture Instruction Set Extensions Programming Reference" with reference
number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

These patches apply cleanly to v3.19-rc5.

Changes from v1:

 - This series no longer updates the DRM code or clflush_cache_range to use
   clwb instead of clflushopt, as there was concern over whether clflushopt was
   used in some cases to explicitly evict lines from the processor cache.  I'll
   leave it up to the owners of this code to decide how they want to use clwb.

 - Reworked the clwb patch so that it doesn't use xsaveopt, as xsaveopt wasn't
   included in all GCC versions that the kernel needs to support. The assembly
   is now sort of complex because we need to hard code the clwb instruction to
   use a known register, but we also need to pull in the parameter as a memory
   constraint so that gcc doesn't reorder this instruction around other
   accesses to the same memory location.  Many thanks to hpa and Boris Petkov
   for their help on getting this right.

Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>

Ross Zwisler (2):
  x86: Add support for the pcommit instruction
  x86: Add support for the clwb instruction

 arch/x86/include/asm/cpufeature.h    |  2 ++
 arch/x86/include/asm/special_insns.h | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] x86: Add support for the pcommit instruction
  2015-01-23 20:40 [PATCH v2 0/2] add support for new persistent memory instructions Ross Zwisler
@ 2015-01-23 20:40 ` Ross Zwisler
  2015-01-23 20:40 ` [PATCH v2 2/2] x86: Add support for the clwb instruction Ross Zwisler
  2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
  2 siblings, 0 replies; 12+ messages in thread
From: Ross Zwisler @ 2015-01-23 20:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, H Peter Anvin, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

Add support for the new pcommit instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
---
 arch/x86/include/asm/cpufeature.h    | 1 +
 arch/x86/include/asm/special_insns.h | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb9b258..dfdd689 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX		( 9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP	( 9*32+20) /* Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT	( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT	( 9*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_AVX512PF	( 9*32+26) /* AVX-512 Prefetch */
 #define X86_FEATURE_AVX512ER	( 9*32+27) /* AVX-512 Exponential and Reciprocal */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index e820c08..1709a2e 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
 		       "+m" (*(volatile char __force *)__p));
 }
 
+static inline void pcommit(void)
+{
+	alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
+		    X86_FEATURE_PCOMMIT);
+}
+
 #define nop() asm volatile ("nop")
 
 
-- 
1.9.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] x86: Add support for the clwb instruction
  2015-01-23 20:40 [PATCH v2 0/2] add support for new persistent memory instructions Ross Zwisler
  2015-01-23 20:40 ` [PATCH v2 1/2] x86: Add support for the pcommit instruction Ross Zwisler
@ 2015-01-23 20:40 ` Ross Zwisler
  2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
  2 siblings, 0 replies; 12+ messages in thread
From: Ross Zwisler @ 2015-01-23 20:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, H Peter Anvin, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

Add support for the new clwb instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Regarding the details of how the alternatives assembly is set up, we
need one additional byte at the beginning of the clflush so that we can
flip it into a clflushopt by changing that byte into a 0x66 prefix.  Two
options are to either insert a 1 byte ASM_NOP1, or to add a 1 byte
NOP_DS_PREFIX.  Both have no functional effect with the plain clflush,
but I've been told that executing a clflush + prefix should be faster
than executing a clflush + NOP.

We had to hard code the assembly for clwb because, lacking the ability
to assemble the clwb instruction itself, the next closest thing is to
have an xsaveopt instruction with a 0x66 prefix.  Unfortunately xsaveopt
itself is also relatively new, and isn't included by all the GCC
versions that the kernel needs to support.

I tested this patch with the following versions of GCC:
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
---
 arch/x86/include/asm/cpufeature.h    |  1 +
 arch/x86/include/asm/special_insns.h | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index dfdd689..dc91747 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -222,6 +222,7 @@
 #define X86_FEATURE_SMAP	( 9*32+20) /* Supervisor Mode Access Prevention */
 #define X86_FEATURE_PCOMMIT	( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT	( 9*32+23) /* CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB	( 9*32+24) /* CLWB instruction */
 #define X86_FEATURE_AVX512PF	( 9*32+26) /* AVX-512 Prefetch */
 #define X86_FEATURE_AVX512ER	( 9*32+27) /* AVX-512 Exponential and Reciprocal */
 #define X86_FEATURE_AVX512CD	( 9*32+28) /* AVX-512 Conflict Detection */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 1709a2e..8883cbc 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -199,6 +199,20 @@ static inline void clflushopt(volatile void *__p)
 		       "+m" (*(volatile char __force *)__p));
 }
 
+static inline void clwb(volatile void *__p)
+{
+	volatile struct { char x[64]; } *p = __p;
+
+	asm volatile(ALTERNATIVE_2(
+		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
+		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
+		X86_FEATURE_CLFLUSHOPT,
+		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
+		X86_FEATURE_CLWB)
+		: [p] "+m" (*p)
+		: [pax] "a" (p));
+}
+
 static inline void pcommit(void)
 {
 	alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
-- 
1.9.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-23 20:40 [PATCH v2 0/2] add support for new persistent memory instructions Ross Zwisler
  2015-01-23 20:40 ` [PATCH v2 1/2] x86: Add support for the pcommit instruction Ross Zwisler
  2015-01-23 20:40 ` [PATCH v2 2/2] x86: Add support for the clwb instruction Ross Zwisler
@ 2015-01-23 23:03 ` H. Peter Anvin
  2015-01-24 11:14   ` Borislav Petkov
  2015-01-26 19:51   ` Ross Zwisler
  2 siblings, 2 replies; 12+ messages in thread
From: H. Peter Anvin @ 2015-01-23 23:03 UTC (permalink / raw)
  To: Ross Zwisler, linux-kernel; +Cc: Ingo Molnar, Thomas Gleixner, Borislav Petkov

On 01/23/2015 12:40 PM, Ross Zwisler wrote:
> This patch set adds support for two new persistent memory instructions, pcommit
> and clwb.  These instructions were announced in the document "Intel
> Architecture Instruction Set Extensions Programming Reference" with reference
> number 319433-022.
> 
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> 

Please explain in these patch descriptions what the instructions
actually do.

+	volatile struct { char x[64]; } *p = __p;
+
+	asm volatile(ALTERNATIVE_2(
+		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
+		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
+		X86_FEATURE_CLFLUSHOPT,
+		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
+		X86_FEATURE_CLWB)
+		: [p] "+m" (*p)
+		: [pax] "a" (p));

For the specific case of CLWB, we can use an "m" input rather than a
"+m" output, simply because CLWB (or CLFLUSH* used as a standin for CLWB
doesn't need to be ordered with respect to loads (whereas CLFLUSH* do).

Now, one can argue that for performance reasons we should should still
use "+m" in case we use the CLFLUSH* standin, to avoid flushing a cache
line to memory just to bring it back in.

+static inline void pcommit(void)
+{
+	alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
+		    X86_FEATURE_PCOMMIT);
+}
+

Should we use an SFENCE as a standin if pcommit is unavailable, in case
we end up using CLFLUSHOPT?

	-hpa



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
@ 2015-01-24 11:14   ` Borislav Petkov
  2015-01-26 19:59     ` Ross Zwisler
  2015-01-26 19:51   ` Ross Zwisler
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-01-24 11:14 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ross Zwisler, linux-kernel, Ingo Molnar, Thomas Gleixner

On Fri, Jan 23, 2015 at 03:03:41PM -0800, H. Peter Anvin wrote:
> For the specific case of CLWB, we can use an "m" input rather than a
> "+m" output, simply because CLWB (or CLFLUSH* used as a standin for CLWB
> doesn't need to be ordered with respect to loads (whereas CLFLUSH* do).

Well, we could do something like:

        volatile struct { char x[64]; } *p = __p;

        if (static_cpu_has(X86_FEATURE_CLWB))
                asm volatile(".byte 0x66,0x0f,0xae,0x30" :: "m" (*p), "a" (p));
        else
                asm volatile(ALTERNATIVE(
                        ".byte 0x3e; clflush (%[pax])",
                        ".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
                        X86_FEATURE_CLFLUSHOPT)
                        : [p] "+m" (*p)
                        : [pax] "a" (p));

which would simplify the alternative macro too.

Generated asm looks ok to me (my objdump doesn't know CLWB yet :)):

0000000000000aa0 <myclflush>:
 aa0:   55                      push   %rbp
 aa1:   48 89 e5                mov    %rsp,%rbp
 aa4:   eb 0a                   jmp    ab0 <myclflush+0x10>
 aa6:   48 89 f8                mov    %rdi,%rax
 aa9:   66 0f ae 30             data16 xsaveopt (%rax)
 aad:   5d                      pop    %rbp
 aae:   c3                      retq
 aaf:   90                      nop
 ab0:   48 89 f8                mov    %rdi,%rax
 ab3:   3e 0f ae 38             clflush %ds:(%rax)
 ab7:   5d                      pop    %rbp
 ab8:   c3                      retq

> Should we use an SFENCE as a standin if pcommit is unavailable, in case
> we end up using CLFLUSHOPT?

Btw, is PCOMMIT a lightweight SFENCE for this persistent memory aspect
to make sure stuff has become persistent after executing it? But not all
stuff like SFENCE so SFENCE is the bigger hammer?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
  2015-01-24 11:14   ` Borislav Petkov
@ 2015-01-26 19:51   ` Ross Zwisler
  2015-01-26 20:05     ` H. Peter Anvin
  1 sibling, 1 reply; 12+ messages in thread
From: Ross Zwisler @ 2015-01-26 19:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Borislav Petkov

On Fri, 2015-01-23 at 15:03 -0800, H. Peter Anvin wrote:
> On 01/23/2015 12:40 PM, Ross Zwisler wrote:
> > This patch set adds support for two new persistent memory instructions, pcommit
> > and clwb.  These instructions were announced in the document "Intel
> > Architecture Instruction Set Extensions Programming Reference" with reference
> > number 319433-022.
> > 
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> > 
> 
> Please explain in these patch descriptions what the instructions
> actually do.

Sure, will do.

> +	volatile struct { char x[64]; } *p = __p;
> +
> +	asm volatile(ALTERNATIVE_2(
> +		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
> +		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
> +		X86_FEATURE_CLFLUSHOPT,
> +		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
> +		X86_FEATURE_CLWB)
> +		: [p] "+m" (*p)
> +		: [pax] "a" (p));
> 
> For the specific case of CLWB, we can use an "m" input rather than a
> "+m" output, simply because CLWB (or CLFLUSH* used as a standin for CLWB
> doesn't need to be ordered with respect to loads (whereas CLFLUSH* do).
> 
> Now, one can argue that for performance reasons we should should still
> use "+m" in case we use the CLFLUSH* standin, to avoid flushing a cache
> line to memory just to bring it back in.

Understood, and an interesting point.  It seems like we can be correct using
either, yea?  I guess I'm happy with "+m" output since it's consistent with
clflush and clflushopt, and since we avoid the clflush* then read issue.
Please let me know if you have a preference.

> +static inline void pcommit(void)
> +{
> +	alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> +		    X86_FEATURE_PCOMMIT);
> +}
> +
> 
> Should we use an SFENCE as a standin if pcommit is unavailable, in case
> we end up using CLFLUSHOPT?

Ah, sorry, I really need to include an example flow in my patch descriptions
to make this more clear. :)

Both the flushes (wmb/clflushopt/clflush) and the pcommit are ordered by
either mfence or sfence.

An example function that flushes and commits a buffer could look like this
(based on clflush_cache_range):

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{       
        void *vend = vaddr + size - 1;
        
        for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
                clwb(vaddr);
        
        /* Flush any possible final partial cacheline */
        clwb(vend);
        
        /* 
         * sfence to order clwb/clflushopt/clflush cache flushes
         * mfence via mb() also works 
         */
        wmb();

        pcommit();

        /* 
         * sfence to order pcommit
         * mfence via mb() also works 
         */
        wmb();
}

In this example function I don't begin with a fence because clwb (which may
fall back to clflush/clflushopt) will be ordered with respect to either writes
or reads and writes depending on whether the argument is given as an input or
output parameter.

If the platform doesn't support PCOMMIT, you end up with this:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{       
        void *vend = vaddr + size - 1;
        
        for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
                clwb(vaddr);
        
        /* Flush any possible final partial cacheline */
        clwb(vend);
        
        /* 
         * sfence to order clwb/clflushopt/clflush cache flushes
         * mfence via mb() also works 
         */
        wmb();

        nop(); /* from pcommit(), via alternatives */

        /* 
         * sfence to order pcommit
         * mfence via mb() also works 
         */
        wmb();
}

This is fine, but now you've got two fences in a row.  Another slightly more
messy choice would be to include the fence in the pcommit assembly, so
you either get pcommit + sfence or a pair of NOPs.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-24 11:14   ` Borislav Petkov
@ 2015-01-26 19:59     ` Ross Zwisler
  2015-01-26 21:34       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Ross Zwisler @ 2015-01-26 19:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Thomas Gleixner

On Sat, 2015-01-24 at 12:14 +0100, Borislav Petkov wrote:
> On Fri, Jan 23, 2015 at 03:03:41PM -0800, H. Peter Anvin wrote:
> > For the specific case of CLWB, we can use an "m" input rather than a
> > "+m" output, simply because CLWB (or CLFLUSH* used as a standin for CLWB
> > doesn't need to be ordered with respect to loads (whereas CLFLUSH* do).
> 
> Well, we could do something like:
> 
>         volatile struct { char x[64]; } *p = __p;
> 
>         if (static_cpu_has(X86_FEATURE_CLWB))
>                 asm volatile(".byte 0x66,0x0f,0xae,0x30" :: "m" (*p), "a" (p));
>         else
>                 asm volatile(ALTERNATIVE(
>                         ".byte 0x3e; clflush (%[pax])",
>                         ".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
>                         X86_FEATURE_CLFLUSHOPT)
>                         : [p] "+m" (*p)
>                         : [pax] "a" (p));
> 
> which would simplify the alternative macro too.

This is interesting!  I guess I'm confused as to how this solves the ordering
issue, though.  The "m" input vs "+m" output parameter will tell gcc whether
or not the assembly can be reordered at compile time with respect to reads at
that same location, correct?

So if we have an inline function that could either read or write from gcc's
point of view (input vs output parameter, depending on the branch), it seems
like it would be forced to fall back to the most restrictive case (assume it
will write), and not reorder with respect to reads.  If so, you'd end up in
the same place as using "+m" output, only now you've got an additional branch
instead of a 3-way alternative.

Am I misunderstanding this?

> Generated asm looks ok to me (my objdump doesn't know CLWB yet :)):
> 
> 0000000000000aa0 <myclflush>:
>  aa0:   55                      push   %rbp
>  aa1:   48 89 e5                mov    %rsp,%rbp
>  aa4:   eb 0a                   jmp    ab0 <myclflush+0x10>
>  aa6:   48 89 f8                mov    %rdi,%rax
>  aa9:   66 0f ae 30             data16 xsaveopt (%rax)
>  aad:   5d                      pop    %rbp
>  aae:   c3                      retq
>  aaf:   90                      nop
>  ab0:   48 89 f8                mov    %rdi,%rax
>  ab3:   3e 0f ae 38             clflush %ds:(%rax)
>  ab7:   5d                      pop    %rbp
>  ab8:   c3                      retq
> 
> > Should we use an SFENCE as a standin if pcommit is unavailable, in case
> > we end up using CLFLUSHOPT?
> 
> Btw, is PCOMMIT a lightweight SFENCE for this persistent memory aspect
> to make sure stuff has become persistent after executing it? But not all
> stuff like SFENCE so SFENCE is the bigger hammer?

Ah, yep, I definitely need to include an example flow in my commit comments.
:) Here's a snip from my reply to hpa, to save searching:

	Both the flushes (wmb/clflushopt/clflush) and the pcommit are ordered
	by either mfence or sfence.

	An example function that flushes and commits a buffer could look like
	this (based on clflush_cache_range):

	void flush_and_commit_buffer(void *vaddr, unsigned int size)
	{       
		void *vend = vaddr + size - 1;
		
		for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
			clwb(vaddr);
		
		/* Flush any possible final partial cacheline */
		clwb(vend);
		
		/* 
		 * sfence to order clwb/clflushopt/clflush cache flushes
		 * mfence via mb() also works 
		 */
		wmb();

		pcommit();

		/* 
		 * sfence to order pcommit
		 * mfence via mb() also works 
		 */
		wmb();
	}



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-26 19:51   ` Ross Zwisler
@ 2015-01-26 20:05     ` H. Peter Anvin
  0 siblings, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2015-01-26 20:05 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Borislav Petkov

On 01/26/2015 11:51 AM, Ross Zwisler wrote:
> 
> This is fine, but now you've got two fences in a row.  Another slightly more
> messy choice would be to include the fence in the pcommit assembly, so
> you either get pcommit + sfence or a pair of NOPs.
> 

If that is the required usage pattern, it should be part of the clwb()
inline at least.

	-hpa


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-26 19:59     ` Ross Zwisler
@ 2015-01-26 21:34       ` Borislav Petkov
  2015-01-26 21:50         ` Ross Zwisler
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-01-26 21:34 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Jan 26, 2015 at 12:59:29PM -0700, Ross Zwisler wrote:
> This is interesting!  I guess I'm confused as to how this solves the ordering
> issue, though.  The "m" input vs "+m" output parameter will tell gcc whether
> or not the assembly can be reordered at compile time with respect to reads at
> that same location, correct?
> 
> So if we have an inline function that could either read or write from gcc's
> point of view (input vs output parameter, depending on the branch), it seems
> like it would be forced to fall back to the most restrictive case (assume it
> will write), and not reorder with respect to reads.  If so, you'd end up in
> the same place as using "+m" output, only now you've got an additional branch
> instead of a 3-way alternative.
> 
> Am I misunderstanding this?

No, you're not, that is the right question. I was simply hypothesizing
about how we could do what hpa suggests but I don't have any other ideas
about having an "m" and an "+m" in the same inline asm statement.

My hunch is, the moment we have an "+m", the reordering would be
suppressed and that would not give us the CLWB case where we don't have
to suppress reordering wrt reads.

> Ah, yep, I definitely need to include an example flow in my commit comments.
> :) Here's a snip from my reply to hpa, to save searching:
> 
> 	Both the flushes (wmb/clflushopt/clflush) and the pcommit are ordered
> 	by either mfence or sfence.
> 
> 	An example function that flushes and commits a buffer could look like
> 	this (based on clflush_cache_range):
> 
> 	void flush_and_commit_buffer(void *vaddr, unsigned int size)
> 	{       
> 		void *vend = vaddr + size - 1;
> 		
> 		for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
> 			clwb(vaddr);
> 		
> 		/* Flush any possible final partial cacheline */
> 		clwb(vend);
> 		
> 		/* 
> 		 * sfence to order clwb/clflushopt/clflush cache flushes
> 		 * mfence via mb() also works 
> 		 */
> 		wmb();
> 
> 		pcommit();

Oh, so you need an SFENCE to flush out the preceding in-flight writes
*and* PCOMMIT for the persistent memory ranges. Ok, makes sense, PCOMMIT
deals with the persistent stores.

> 		/* 
> 		 * sfence to order pcommit
> 		 * mfence via mb() also works 
> 		 */
> 		wmb();

Doc says PCOMMIT is not ordered wrt loads and SFENCE too. Don't we want
to be absolutely conservative here and use MFENCE both times?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-26 21:34       ` Borislav Petkov
@ 2015-01-26 21:50         ` Ross Zwisler
  2015-01-26 22:39           ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Ross Zwisler @ 2015-01-26 21:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, 2015-01-26 at 22:34 +0100, Borislav Petkov wrote:
> On Mon, Jan 26, 2015 at 12:59:29PM -0700, Ross Zwisler wrote:
> > 		/* 
> > 		 * sfence to order pcommit
> > 		 * mfence via mb() also works 
> > 		 */
> > 		wmb();
> 
> Doc says PCOMMIT is not ordered wrt loads and SFENCE too. Don't we want
> to be absolutely conservative here and use MFENCE both times?

The code, for easy viewing:
	void write_and_commit_buffer(void *vaddr, unsigned int size)
	{
		void *vend = vaddr + size - 1;

		for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
			clwb(vaddr);

		/* Flush any possible final partial cacheline */
		clwb(vend);

		/* 
		 * sfence to order clwb/clflushopt/clflush cache flushes
		 * mfence via mb() also works 
		 */
		wmb();

		pcommit();

		/* 
		 * sfence to order pcommit
		 * mfence via mb() also works 
		 */
		wmb();
	}

We can use MFENCE, but I don't think we need to.  With SFENCE we will be
ordered with respect to stores, and the flushes and pcommit will be ordered
with respect to one another.  I think you can sprinkle in loads anywhere you
want in that flow and everything will work. The worst that will happen is
that if you've used clflush or clflushopt you'll have to re-fetch something
you just flushed out of the CPU cache hierarchy, but you'll always get correct
data from your load and you'll always pcommit valid data to the DIMM.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-26 21:50         ` Ross Zwisler
@ 2015-01-26 22:39           ` Borislav Petkov
  2015-01-26 23:14             ` Ross Zwisler
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-01-26 22:39 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Jan 26, 2015 at 02:50:07PM -0700, Ross Zwisler wrote:
> We can use MFENCE, but I don't think we need to. With SFENCE we will
> be ordered with respect to stores, and the flushes and pcommit will be
> ordered with respect to one another. I think you can sprinkle in loads
> anywhere you want in that flow and everything will work.


Ok, maybe we should hold down that sequence and this somewhere as a
this-is-how-you-should-do-pcommit-properly explanation or so... or do
you have an actual patch adding write_and_commit_buffer()?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] add support for new persistent memory instructions
  2015-01-26 22:39           ` Borislav Petkov
@ 2015-01-26 23:14             ` Ross Zwisler
  0 siblings, 0 replies; 12+ messages in thread
From: Ross Zwisler @ 2015-01-26 23:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, 2015-01-26 at 23:39 +0100, Borislav Petkov wrote:
> On Mon, Jan 26, 2015 at 02:50:07PM -0700, Ross Zwisler wrote:
> > We can use MFENCE, but I don't think we need to. With SFENCE we will
> > be ordered with respect to stores, and the flushes and pcommit will be
> > ordered with respect to one another. I think you can sprinkle in loads
> > anywhere you want in that flow and everything will work.
> 
> 
> Ok, maybe we should hold down that sequence and this somewhere as a
> this-is-how-you-should-do-pcommit-properly explanation or so... or do
> you have an actual patch adding write_and_commit_buffer()?

I don't have a patch to add and use this yet.  I'll add the flow to the
commit messages for both pcommit and clwb.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-01-26 23:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-23 20:40 [PATCH v2 0/2] add support for new persistent memory instructions Ross Zwisler
2015-01-23 20:40 ` [PATCH v2 1/2] x86: Add support for the pcommit instruction Ross Zwisler
2015-01-23 20:40 ` [PATCH v2 2/2] x86: Add support for the clwb instruction Ross Zwisler
2015-01-23 23:03 ` [PATCH v2 0/2] add support for new persistent memory instructions H. Peter Anvin
2015-01-24 11:14   ` Borislav Petkov
2015-01-26 19:59     ` Ross Zwisler
2015-01-26 21:34       ` Borislav Petkov
2015-01-26 21:50         ` Ross Zwisler
2015-01-26 22:39           ` Borislav Petkov
2015-01-26 23:14             ` Ross Zwisler
2015-01-26 19:51   ` Ross Zwisler
2015-01-26 20:05     ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).