LKML Archive on lore.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>
Cc: Mark Rutland <mark.rutland@arm.com>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com, boqun.feng@gmail.com, catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: [RFC PATCH] locking/atomics/powerpc: Introduce optimized cmpxchg_release() family of APIs for PowerPC
Date: Sat, 5 May 2018 12:00:55 +0200
Message-ID: <20180505100055.yc4upauxo5etq5ud@gmail.com>
In-Reply-To: <20180505093829.xfylnedwd5nonhae@gmail.com>

* Ingo Molnar <mingo@kernel.org> wrote:

> > So there's no loss in arch flexibility.
>
> BTW., PowerPC for example is already in such a situation, it does not define
> atomic_cmpxchg_release(), only the other APIs:
>
>   #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
>   #define atomic_cmpxchg_relaxed(v, o, n) \
>   	cmpxchg_relaxed(&((v)->counter), (o), (n))
>   #define atomic_cmpxchg_acquire(v, o, n) \
>   	cmpxchg_acquire(&((v)->counter), (o), (n))
>
> Was it really the intention on the PowerPC side that the generic code falls back
> to cmpxchg(), i.e.:
>
>   # define atomic_cmpxchg_release(...) __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
>
> Which after macro expansion becomes:
>
>   	smp_mb__before_atomic();
>   	atomic_cmpxchg_relaxed(v, o, n);
>
> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which
> falls back to mb(), which on PowerPC is the 'sync' instruction.
>
> Isn't this an inefficiency bug?
>
> While I'm pretty clueless about PowerPC low-level cmpxchg atomics, they appear
> to have the following basic structure:
>
>   full cmpxchg():
>
>   	PPC_ATOMIC_ENTRY_BARRIER # sync
>   	ldarx + stdcx
>   	PPC_ATOMIC_EXIT_BARRIER  # sync
>
>   cmpxchg_relaxed():
>
>   	ldarx + stdcx
>
>   cmpxchg_acquire():
>
>   	ldarx + stdcx
>   	PPC_ACQUIRE_BARRIER      # lwsync
>
> The logical extension for cmpxchg_release() would be:
>
>   cmpxchg_release():
>
>   	PPC_RELEASE_BARRIER      # lwsync
>   	ldarx + stdcx
>
> But instead we silently get the generic fallback, which does:
>
>   	smp_mb__before_atomic();
>   	atomic_cmpxchg_relaxed(v, o, n);
>
> Which maps to:
>
>   	sync
>   	ldarx + stdcx
>
> Note that it uses a full barrier instead of lwsync (which stands for
> 'lightweight sync').
>
> Even if it turns out we need the full barrier, with the overly fine-grained
> structure of the atomics this detail is totally undocumented and non-obvious.

The patch below fills in those bits and implements the optimized
cmpxchg_release() family of APIs. The end effect should be that
cmpxchg_release() will now use 'lwsync' instead of 'sync' on PowerPC, for the
following APIs:

  cmpxchg_release()
  cmpxchg64_release()
  atomic_cmpxchg_release()
  atomic64_cmpxchg_release()

I based this choice of the release barrier on an existing bitops low-level
PowerPC method:

  DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)

This clearly suggests that PPC_RELEASE_BARRIER is in active use and that
'lwsync' is the 'release barrier' instruction, if I interpreted that right.

But I know very little about PowerPC, so this might be spectacularly wrong.
It's totally untested as well. I'm also pretty sick today, so my mental
capabilities are significantly reduced ...

So it's not signed off and such.
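
For reference, the generic fallback machinery that injects that 'sync' lives
in include/linux/atomic.h. The following is a sketch from memory of the
v4.17-era definitions, so the exact layout in the tree may differ slightly:

  #ifndef atomic_cmpxchg_release
  #define atomic_cmpxchg_release(...)					\
  	__atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
  #endif

  /*
   * The release variant is synthesized as a full barrier in front of the
   * relaxed op -- and smp_mb__before_atomic() ends up as mb(), i.e. the
   * 'sync' instruction, on PowerPC:
   */
  #ifndef __atomic_op_release
  #define __atomic_op_release(op, args...)				\
  ({									\
  	smp_mb__before_atomic();					\
  	op##_relaxed(args);						\
  })
  #endif

Once the architecture provides cmpxchg_release() itself, as the patch below
does, this fallback is never instantiated.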
Thanks,

	Ingo

---
 arch/powerpc/include/asm/atomic.h  |  4 ++
 arch/powerpc/include/asm/cmpxchg.h | 81 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 682b3e6a1e21..f7a6f29acb12 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -213,6 +213,8 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 	cmpxchg_relaxed(&((v)->counter), (o), (n))
 #define atomic_cmpxchg_acquire(v, o, n) \
 	cmpxchg_acquire(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_release(v, o, n) \
+	cmpxchg_release(&((v)->counter), (o), (n))
 
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
@@ -519,6 +521,8 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 	cmpxchg_relaxed(&((v)->counter), (o), (n))
 #define atomic64_cmpxchg_acquire(v, o, n) \
 	cmpxchg_acquire(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_release(v, o, n) \
+	cmpxchg_release(&((v)->counter), (o), (n))
 
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 9b001f1f6b32..6e46310b1833 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -213,10 +213,12 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
 CMPXCHG_GEN(u8, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u8, _local, , , "memory");
 CMPXCHG_GEN(u8, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
+CMPXCHG_GEN(u8, _release, PPC_RELEASE_BARRIER, , "memory");
 CMPXCHG_GEN(u8, _relaxed, , , "cc");
 CMPXCHG_GEN(u16, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u16, _local, , , "memory");
 CMPXCHG_GEN(u16, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
+CMPXCHG_GEN(u16, _release, PPC_RELEASE_BARRIER, , "memory");
 CMPXCHG_GEN(u16, _relaxed, , , "cc");
 
 static __always_inline unsigned long
@@ -314,6 +316,29 @@ __cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_release(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+	PPC_RELEASE_BARRIER
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_release\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -397,6 +422,27 @@ __cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
 	return prev;
 }
 
+
+static __always_inline unsigned long
+__cmpxchg_u64_release(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+	PPC_RELEASE_BARRIER
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_release\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 static __always_inline unsigned long
@@ -478,6 +524,27 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_acquire");
 	return old;
 }
+
+static __always_inline unsigned long
+__cmpxchg_release(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 1:
+		return __cmpxchg_u8_release(ptr, old, new);
+	case 2:
+		return __cmpxchg_u16_release(ptr, old, new);
+	case 4:
+		return __cmpxchg_u32_release(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_release(ptr, old, new);
+#endif
+	}
+	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_release");
+	return old;
+}
+
 #define cmpxchg(ptr, o, n)						 \
   ({									 \
 	__typeof__(*(ptr)) _o_ = (o);					 \
@@ -512,6 +579,15 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 		(unsigned long)_o_, (unsigned long)_n_,			 \
 		sizeof(*(ptr)));					 \
 })
+
+#define cmpxchg_release(ptr, o, n)					 \
+({									 \
+	__typeof__(*(ptr)) _o_ = (o);					 \
+	__typeof__(*(ptr)) _n_ = (n);					 \
+	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			 \
+		(unsigned long)_o_, (unsigned long)_n_,			 \
+		sizeof(*(ptr)));					 \
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						 \
 ({									 \
@@ -533,6 +609,11 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				 \
 	cmpxchg_acquire((ptr), (o), (n));				 \
 })
+#define cmpxchg64_release(ptr, o, n)					 \
+({									 \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				 \
+	cmpxchg_release((ptr), (o), (n));				 \
+})
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
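
As a usage illustration (a hypothetical example, not part of the patch): the
point of release ordering is that all stores issued before a successful
cmpxchg_release() become visible before the new value does, e.g. when
publishing a message through a state flag. With the patch, the barrier in
front of the ll/sc loop is the cheaper 'lwsync' rather than 'sync':

  #include <linux/atomic.h>

  /* Hypothetical producer/consumer hand-off, names invented for illustration: */
  #define MSG_FREE	0
  #define MSG_READY	1

  struct msg {
  	int		payload;
  	atomic_t	state;		/* MSG_FREE or MSG_READY */
  };

  static bool msg_publish(struct msg *m, int payload)
  {
  	m->payload = payload;	/* plain store to the payload ... */

  	/*
  	 * ... ordered before the flag flip by the release barrier. A
  	 * consumer doing atomic_cmpxchg_acquire(&m->state, MSG_READY,
  	 * MSG_FREE) then observes the payload that was paired with it.
  	 */
  	return atomic_cmpxchg_release(&m->state, MSG_FREE, MSG_READY) == MSG_FREE;
  }

This is the same release pattern the clear_bits_unlock() bitop cited above
already relies on.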