From: Boqun Feng <boqun.feng@gmail.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com,
	catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: Re: [RFC PATCH] locking/atomics/powerpc: Introduce optimized cmpxchg_release() family of APIs for PowerPC
Date: Sat, 5 May 2018 18:26:49 +0800
Message-ID: <20180505102649.t74xclzalkejeb6x@tardis>
In-Reply-To: <20180505100055.yc4upauxo5etq5ud@gmail.com>

Hi Ingo,

On Sat, May 05, 2018 at 12:00:55PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > > So there's no loss in arch flexibility.
> > 
> > BTW., PowerPC for example is already in such a situation, it does not define 
> > atomic_cmpxchg_release(), only the other APIs:
> > 
> > #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
> > #define atomic_cmpxchg_relaxed(v, o, n) \
> > 	cmpxchg_relaxed(&((v)->counter), (o), (n))
> > #define atomic_cmpxchg_acquire(v, o, n) \
> > 	cmpxchg_acquire(&((v)->counter), (o), (n))
> > 
> > Was it really the intention on the PowerPC side that the generic code falls back 
> > to cmpxchg(), i.e.:
> > 
> > #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
> > 
> > Which after macro expansion becomes:
> > 
> > 	smp_mb__before_atomic();
> > 	atomic_cmpxchg_relaxed(v, o, n);
> > 
> > smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 
> > falls back to mb(), which on PowerPC is the 'sync' instruction.
> > 
> > Isn't this an inefficiency bug?
> > 
> > While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 
> > have the following basic structure:
> > 
> > full cmpxchg():
> > 
> > 	PPC_ATOMIC_ENTRY_BARRIER # sync
> > 	ldarx + stdcx
> > 	PPC_ATOMIC_EXIT_BARRIER  # sync
> > 
> > cmpxchg_relaxed():
> > 
> > 	ldarx + stdcx
> > 
> > cmpxchg_acquire():
> > 
> > 	ldarx + stdcx
> > 	PPC_ACQUIRE_BARRIER      # lwsync
> > 
> > The logical extension for cmpxchg_release() would be:
> > 
> > cmpxchg_release():
> > 
> > 	PPC_RELEASE_BARRIER      # lwsync
> > 	ldarx + stdcx
> > 
> > But instead we silently get the generic fallback, which does:
> > 
> > 	smp_mb__before_atomic();
> > 	atomic_cmpxchg_relaxed(v, o, n);
> > 
> > Which maps to:
> > 
> > 	sync
> > 	ldarx + stdcx
> > 
> > Note that it uses a full barrier instead of lwsync (does that stand for 
> > 'lightweight sync'?).
> > 
> > Even if it turns out we need the full barrier, with the overly finegrained 
> > structure of the atomics this detail is totally undocumented and non-obvious.
> 
> The patch below fills in those bits and implements the optimized cmpxchg_release() 
> family of APIs. The end effect should be that cmpxchg_release() will now use 
> 'lwsync' instead of 'sync' on PowerPC, for the following APIs:
> 
>   cmpxchg_release()
>   cmpxchg64_release()
>   atomic_cmpxchg_release()
>   atomic64_cmpxchg_release()
> 
> I based this choice of the release barrier on an existing bitops low level PowerPC 
> method:
> 
>    DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
> 
> This clearly suggests that PPC_RELEASE_BARRIER is in active use and 'lwsync' is 
> the 'release barrier' instruction, if I interpreted that right.
> 

Thanks for looking into this, but as I said in another email:

	https://marc.info/?l=linux-kernel&m=152551511324210&w=2

we actually generate lightweight barriers for the cmpxchg_release()
family.

The reason for the asymmetry between cmpxchg_acquire() and
cmpxchg_release() is that we want to skip the barrier in
cmpxchg_acquire() when the compare fails. Doing the same for
cmpxchg_release() would put a barrier inside the ll/sc loop, which
may be a bad idea.
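
To make the asymmetry concrete, here is a rough sketch in portable C11
atomics rather than the kernel's PowerPC inline asm. The function names
cmpxchg_release_sketch() and cmpxchg_acquire_sketch() are hypothetical
and exist only for illustration. C11 cannot express a fence placed
between the load-reserve and the store-conditional, so the sketch only
shows where the fence sits relative to the compare-and-swap as a whole.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper, not a kernel API: release-ordered compare-and-swap.
 * The fence is issued once, up front, whether or not the compare succeeds,
 * so it never ends up inside the retry loop of the underlying ll/sc pair.
 */
static int cmpxchg_release_sketch(atomic_int *v, int old, int new_val)
{
	atomic_thread_fence(memory_order_release);
	atomic_compare_exchange_strong_explicit(v, &old, new_val,
			memory_order_relaxed, memory_order_relaxed);
	return old;	/* on failure, 'old' now holds the value found in *v */
}

/*
 * Hypothetical helper, not a kernel API: acquire-ordered compare-and-swap.
 * The fence comes after the operation, so a failed compare can skip it.
 */
static int cmpxchg_acquire_sketch(atomic_int *v, int old, int new_val)
{
	if (atomic_compare_exchange_strong_explicit(v, &old, new_val,
			memory_order_relaxed, memory_order_relaxed))
		atomic_thread_fence(memory_order_acquire);
	return old;	/* previous value of *v in both cases */
}
```

In the acquire case the failure path simply returns without a fence. The
release case has no equally cheap way to do that, because the fence has
to come before the store, which is why it stays unconditional here.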

> But I know very little about PowerPC so this might be spectacularly wrong. It's 
> totally untested as well. I'm also pretty sick today so my mental capabilities are 
> significantly reduced ...
> 

Sorry to hear that, hope you get well soon!

Please let me know if you think I should add more documentation to
make this clearer.

Regards,
Boqun

> So not signed off and such.
> 
> Thanks,
> 
> 	Ingo
> 
> ---
>  arch/powerpc/include/asm/atomic.h  |  4 ++
>  arch/powerpc/include/asm/cmpxchg.h | 81 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> index 682b3e6a1e21..f7a6f29acb12 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -213,6 +213,8 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
>  	cmpxchg_relaxed(&((v)->counter), (o), (n))
>  #define atomic_cmpxchg_acquire(v, o, n) \
>  	cmpxchg_acquire(&((v)->counter), (o), (n))
> +#define atomic_cmpxchg_release(v, o, n) \
> +	cmpxchg_release(&((v)->counter), (o), (n))
>  
>  #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
>  #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
> @@ -519,6 +521,8 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
>  	cmpxchg_relaxed(&((v)->counter), (o), (n))
>  #define atomic64_cmpxchg_acquire(v, o, n) \
>  	cmpxchg_acquire(&((v)->counter), (o), (n))
> +#define atomic64_cmpxchg_release(v, o, n) \
> +	cmpxchg_release(&((v)->counter), (o), (n))
>  
>  #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
>  #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
> diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
> index 9b001f1f6b32..6e46310b1833 100644
> --- a/arch/powerpc/include/asm/cmpxchg.h
> +++ b/arch/powerpc/include/asm/cmpxchg.h
> @@ -213,10 +213,12 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
>  CMPXCHG_GEN(u8, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
>  CMPXCHG_GEN(u8, _local, , , "memory");
>  CMPXCHG_GEN(u8, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
> +CMPXCHG_GEN(u8, _release, PPC_RELEASE_BARRIER, , "memory");
>  CMPXCHG_GEN(u8, _relaxed, , , "cc");
>  CMPXCHG_GEN(u16, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
>  CMPXCHG_GEN(u16, _local, , , "memory");
>  CMPXCHG_GEN(u16, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
> +CMPXCHG_GEN(u16, _release, PPC_RELEASE_BARRIER, , "memory");
>  CMPXCHG_GEN(u16, _relaxed, , , "cc");
>  
>  static __always_inline unsigned long
> @@ -314,6 +316,29 @@ __cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
>  	return prev;
>  }
>  
> +static __always_inline unsigned long
> +__cmpxchg_u32_release(u32 *p, unsigned long old, unsigned long new)
> +{
> +	unsigned long prev;
> +
> +	__asm__ __volatile__ (
> +	PPC_RELEASE_BARRIER
> +"1:	lwarx	%0,0,%2		# __cmpxchg_u32_release\n"
> +"	cmpw	0,%0,%3\n"
> +"	bne-	2f\n"
> +	PPC405_ERR77(0, %2)
> +"	stwcx.	%4,0,%2\n"
> +"	bne-	1b\n"
> +	"\n"
> +"2:"
> +	: "=&r" (prev), "+m" (*p)
> +	: "r" (p), "r" (old), "r" (new)
> +	: "cc", "memory");
> +
> +	return prev;
> +}
> +
> +
>  #ifdef CONFIG_PPC64
>  static __always_inline unsigned long
>  __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
> @@ -397,6 +422,27 @@ __cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
>  
>  	return prev;
>  }
> +
> +static __always_inline unsigned long
> +__cmpxchg_u64_release(u64 *p, unsigned long old, unsigned long new)
> +{
> +	unsigned long prev;
> +
> +	__asm__ __volatile__ (
> +	PPC_RELEASE_BARRIER
> +"1:	ldarx	%0,0,%2		# __cmpxchg_u64_release\n"
> +"	cmpd	0,%0,%3\n"
> +"	bne-	2f\n"
> +"	stdcx.	%4,0,%2\n"
> +"	bne-	1b\n"
> +	"\n"
> +"2:"
> +	: "=&r" (prev), "+m" (*p)
> +	: "r" (p), "r" (old), "r" (new)
> +	: "cc", "memory");
> +
> +	return prev;
> +}
>  #endif
>  
>  static __always_inline unsigned long
> @@ -478,6 +524,27 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
>  	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_acquire");
>  	return old;
>  }
> +
> +static __always_inline unsigned long
> +__cmpxchg_release(void *ptr, unsigned long old, unsigned long new,
> +		  unsigned int size)
> +{
> +	switch (size) {
> +	case 1:
> +		return __cmpxchg_u8_release(ptr, old, new);
> +	case 2:
> +		return __cmpxchg_u16_release(ptr, old, new);
> +	case 4:
> +		return __cmpxchg_u32_release(ptr, old, new);
> +#ifdef CONFIG_PPC64
> +	case 8:
> +		return __cmpxchg_u64_release(ptr, old, new);
> +#endif
> +	}
> +	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_release");
> +	return old;
> +}
> +
>  #define cmpxchg(ptr, o, n)						 \
>    ({									 \
>       __typeof__(*(ptr)) _o_ = (o);					 \
> @@ -512,6 +579,15 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
>  			(unsigned long)_o_, (unsigned long)_n_,		\
>  			sizeof(*(ptr)));				\
>  })
> +
> +#define cmpxchg_release(ptr, o, n)					\
> +({									\
> +	__typeof__(*(ptr)) _o_ = (o);					\
> +	__typeof__(*(ptr)) _n_ = (n);					\
> +	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			\
> +			(unsigned long)_o_, (unsigned long)_n_,		\
> +			sizeof(*(ptr)));				\
> +})
>  #ifdef CONFIG_PPC64
>  #define cmpxchg64(ptr, o, n)						\
>    ({									\
> @@ -533,6 +609,11 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
>  	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
>  	cmpxchg_acquire((ptr), (o), (n));				\
>  })
> +#define cmpxchg64_release(ptr, o, n)					\
> +({									\
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> +	cmpxchg_release((ptr), (o), (n));				\
> +})
>  #else
>  #include <asm-generic/cmpxchg-local.h>
>  #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
