LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com,
	catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: Re: [PATCH] locking/atomics: Clean up the atomic.h maze of #defines
Date: Sat, 5 May 2018 18:16:09 +0800	[thread overview]
Message-ID: <20180505101609.5wb56j4mspjkokmw@tardis> (raw)
In-Reply-To: <20180505093829.xfylnedwd5nonhae@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5169 bytes --]

On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > So we could do the following simplification on top of that:
> > > > 
> > > >  #ifndef atomic_fetch_dec_relaxed
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))
> > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))
> > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))
> > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))
> > > >  # else
> > > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_release		atomic_fetch_dec
> > > >  # endif
> > > >  #else
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
> > > >  # endif
> > > >  #endif
> > > 
> > > This would disallow an architecture to override just fetch_dec_release for
> > > instance.
> > 
> > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?
> > That's really a small price and makes the place pay the complexity
> > price that does the weirdness...
> > 
> > > I don't think there currently is any architecture that does that, but the
> > > intent was to allow it to override anything and only provide defaults where it
> > > does not.
> > 
> > I'd argue that if a new arch only defines one of these APIs that's probably a bug. 
> > If they absolutely want to do it, they still can - by defining all 3 APIs.
> > 
> > So there's no loss in arch flexibility.
> 
> BTW., PowerPC for example is already in such a situation, it does not define 
> atomic_cmpxchg_release(), only the other APIs:
> 
> #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
> #define atomic_cmpxchg_relaxed(v, o, n) \
> 	cmpxchg_relaxed(&((v)->counter), (o), (n))
> #define atomic_cmpxchg_acquire(v, o, n) \
> 	cmpxchg_acquire(&((v)->counter), (o), (n))
> 
> Was it really the intention on the PowerPC side that the generic code falls back 
> to cmpxchg(), i.e.:
> 
> #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
> 

So ppc has its own definition __atomic_op_release() in
arch/powerpc/include/asm/atomic.h:

	#define __atomic_op_release(op, args...)				\
	({									\
		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
		op##_relaxed(args);						\
	})

, and PPC_RELEASE_BARRIER is lwsync, so we map to

	lwsync();
	atomic_cmpxchg_relaxed(v, o, n);

And the reason, why we don't define atomic_cmpxchg_release() but define
atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no
ordering guarantee if the cmp fails, we did this for
atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because
doing so may introduce a memory barrier inside a ll/sc critical section,
please see the comment before __cmpxchg_u32_acquire() in
arch/powerpc/include/asm/cmpxchg.h:

	/*
	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we
	 * can avoid superfluous barriers if we use assembly code to implement
	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for
	 * cmpxchg_release() because that will result in putting a barrier in the
	 * middle of a ll/sc loop, which is probably a bad idea. For example, this
	 * might cause the conditional store more likely to fail.
	 */

Regards,
Boqun


> Which after macro expansion becomes:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 
> falls back to mb(), which on PowerPC is the 'sync' instruction.
> 
> Isn't this a inefficiency bug?
> 
> While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 
> have the following basic structure:
> 
> full cmpxchg():
> 
> 	PPC_ATOMIC_ENTRY_BARRIER # sync
> 	ldarx + stdcx
> 	PPC_ATOMIC_EXIT_BARRIER  # sync
> 
> cmpxchg_relaxed():
> 
> 	ldarx + stdcx
> 
> cmpxchg_acquire():
> 
> 	ldarx + stdcx
> 	PPC_ACQUIRE_BARRIER      # lwsync
> 
> The logical extension for cmpxchg_release() would be:
> 
> cmpxchg_release():
> 
> 	PPC_RELEASE_BARRIER      # lwsync
> 	ldarx + stdcx
> 
> But instead we silently get the generic fallback, which does:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> Which maps to:
> 
> 	sync
> 	ldarx + stdcx
> 
> Note that it uses a full barrier instead of lwsync (does that stand for 
> 'lightweight sync'?).
> 
> Even if it turns out we need the full barrier, with the overly finegrained 
> structure of the atomics this detail is totally undocumented and non-obvious.
> 
> Thanks,
> 
> 	Ingo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2018-05-05 10:11 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04 17:39 [PATCH 0/6] arm64: add instrumented atomics Mark Rutland
2018-05-04 17:39 ` [PATCH 1/6] locking/atomic, asm-generic: instrument ordering variants Mark Rutland
2018-05-04 18:01   ` Peter Zijlstra
2018-05-04 18:09     ` Mark Rutland
2018-05-04 18:24       ` Peter Zijlstra
2018-05-05  9:12         ` Mark Rutland
2018-05-05  8:11       ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Ingo Molnar
2018-05-05  8:36         ` [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more Ingo Molnar
2018-05-05  8:54           ` [PATCH] locking/atomics: Combine the atomic_andnot() and atomic64_andnot() API definitions Ingo Molnar
2018-05-06 12:15             ` [tip:locking/core] " tip-bot for Ingo Molnar
2018-05-06 14:15             ` [PATCH] " Andrea Parri
2018-05-06 12:14           ` [tip:locking/core] locking/atomics: Simplify the op definitions in atomic.h some more tip-bot for Ingo Molnar
2018-05-09  7:33             ` Peter Zijlstra
2018-05-09 13:03               ` Will Deacon
2018-05-15  8:54                 ` Ingo Molnar
2018-05-15  8:35               ` Ingo Molnar
2018-05-15 11:41                 ` Peter Zijlstra
2018-05-15 12:13                   ` Peter Zijlstra
2018-05-15 15:43                   ` Mark Rutland
2018-05-15 17:10                     ` Peter Zijlstra
2018-05-15 17:53                       ` Mark Rutland
2018-05-15 18:11                         ` Peter Zijlstra
2018-05-15 18:15                           ` Peter Zijlstra
2018-05-15 18:52                             ` Linus Torvalds
2018-05-15 19:39                               ` Peter Zijlstra
2018-05-21 17:12                           ` Mark Rutland
2018-05-06 14:12           ` [PATCH] " Andrea Parri
2018-05-06 14:57             ` Ingo Molnar
2018-05-07  9:54               ` Andrea Parri
2018-05-18 18:43               ` Palmer Dabbelt
2018-05-05  8:47         ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Peter Zijlstra
2018-05-05  9:04           ` Ingo Molnar
2018-05-05  9:24             ` Peter Zijlstra
2018-05-05  9:38             ` Ingo Molnar
2018-05-05 10:00               ` [RFC PATCH] locking/atomics/powerpc: Introduce optimized cmpxchg_release() family of APIs for PowerPC Ingo Molnar
2018-05-05 10:26                 ` Boqun Feng
2018-05-06  1:56                 ` Benjamin Herrenschmidt
2018-05-05 10:16               ` Boqun Feng [this message]
2018-05-05 10:35                 ` [RFC PATCH] locking/atomics/powerpc: Clarify why the cmpxchg_relaxed() family of APIs falls back to full cmpxchg() Ingo Molnar
2018-05-05 11:28                   ` Boqun Feng
2018-05-05 13:27                     ` [PATCH] locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs Ingo Molnar
2018-05-05 14:03                       ` Boqun Feng
2018-05-06 12:11                         ` Ingo Molnar
2018-05-07  1:04                           ` Boqun Feng
2018-05-07  6:50                             ` Ingo Molnar
2018-05-06 12:13                     ` [tip:locking/core] " tip-bot for Boqun Feng
2018-05-07 13:31                       ` [PATCH v2] " Boqun Feng
2018-05-05  9:05           ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Dmitry Vyukov
2018-05-05  9:32             ` Peter Zijlstra
2018-05-07  6:43               ` [RFC PATCH] locking/atomics/x86/64: Clean up and fix details of <asm/atomic64_64.h> Ingo Molnar
2018-05-05  9:09           ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Ingo Molnar
2018-05-05  9:29             ` Peter Zijlstra
2018-05-05 10:48               ` [PATCH] locking/atomics: Shorten the __atomic_op() defines to __op() Ingo Molnar
2018-05-05 10:59                 ` Ingo Molnar
2018-05-06 12:15                 ` [tip:locking/core] " tip-bot for Ingo Molnar
2018-05-06 12:14         ` [tip:locking/core] locking/atomics: Clean up the atomic.h maze of #defines tip-bot for Ingo Molnar
2018-05-04 17:39 ` [PATCH 2/6] locking/atomic, asm-generic: instrument atomic*andnot*() Mark Rutland
2018-05-04 17:39 ` [PATCH 3/6] arm64: use <linux/atomic.h> for cmpxchg Mark Rutland
2018-05-04 17:39 ` [PATCH 4/6] arm64: fix assembly constraints " Mark Rutland
2018-05-04 17:39 ` [PATCH 5/6] arm64: use instrumented atomics Mark Rutland
2018-05-04 17:39 ` [PATCH 6/6] arm64: instrument smp_{load_acquire,store_release} Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180505101609.5wb56j4mspjkokmw@tardis \
    --to=boqun.feng@gmail.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=catalin.marinas@arm.com \
    --cc=dvyukov@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).