LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* spinlocks -- why are releases inlined and acquires are not?
@ 2008-04-01 0:08 Jiri Kosina
2008-04-01 0:40 ` Jiri Kosina
0 siblings, 1 reply; 9+ messages in thread
From: Jiri Kosina @ 2008-04-01 0:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
Hi,
include/linux/spinlock.h shows:
#define spin_lock_irq(lock) _spin_lock_irq(lock)
unconditionally, i.e. irrespectible of config options, we always (on SMP)
call kernel/spinlock.c:_spin_lock_irq(), which is even not inlined.
Contrary to that, unlocks are written as one would expect, i.e:
#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
!defined(CONFIG_SMP)
# define spin_unlock_irq(lock) _spin_unlock_irq(lock)
#else
# define spin_unlock_irq(lock) \
do { \
__raw_spin_unlock(&(lock)->raw_lock); \
__release(lock); \
local_irq_enable(); \
} while (0)
and __raw_spin_unlock() is of course properly inlined.
What is the reason for this asymetry? Shouldn't the acquiring functions be
implemented in the very same way? Or at least, shouldn't all the
__lockfunc functions be inlined?
Thanks,
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 0:08 spinlocks -- why are releases inlined and acquires are not? Jiri Kosina
@ 2008-04-01 0:40 ` Jiri Kosina
2008-04-01 8:33 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Jiri Kosina @ 2008-04-01 0:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
On Tue, 1 Apr 2008, Jiri Kosina wrote:
> What is the reason for this asymetry? Shouldn't the acquiring functions
> be implemented in the very same way? Or at least, shouldn't all the
> __lockfunc functions be inlined?
i.e. is there any particular reason why we don't have something like the
patch below (implemented for all the lock variants of course, this is
just to demonstrate what I mean)?
spinlocks: inline spin_lock() in non-debug cases
inline spin_lock() in the non-debug case.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 576a5f7..30922ba 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -178,7 +178,12 @@ do { \
#define read_trylock(lock) __cond_lock(lock, _read_trylock(lock))
#define write_trylock(lock) __cond_lock(lock, _write_trylock(lock))
+#if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
+ !defined(CONFIG_SMP)
#define spin_lock(lock) _spin_lock(lock)
+#else
+#define spin_lock(lock) \
+ do { __raw_spin_lock(&(lock)->raw_lock); __acquire(lock); } while (0)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 0:40 ` Jiri Kosina
@ 2008-04-01 8:33 ` Ingo Molnar
2008-04-01 8:38 ` Jiri Kosina
0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2008-04-01 8:33 UTC (permalink / raw)
To: Jiri Kosina; +Cc: linux-kernel
* Jiri Kosina <jkosina@suse.cz> wrote:
> > What is the reason for this asymetry? Shouldn't the acquiring
> > functions be implemented in the very same way? Or at least,
> > shouldn't all the __lockfunc functions be inlined?
>
> i.e. is there any particular reason why we don't have something like
> the patch below (implemented for all the lock variants of course, this
> is just to demonstrate what I mean)?
IIRC the main reason we decided to uninline them was image size. So i'd
suggest for you to check how this change impacts vmlinux size (on both
64-bit and 32-bit), a typical distro config (or allyesconfig with lock
debugging disabled).
If you do the test on x86.git/latest you'll also have the
CONFIG_OPTIMIZE_INLINING=y and CONFIG_CC_OPTIMIZE_FOR_SIZE=y combination
as well, which generates the most compact x86 kernel image ever.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 8:33 ` Ingo Molnar
@ 2008-04-01 8:38 ` Jiri Kosina
2008-04-01 8:43 ` Ingo Molnar
2008-04-01 11:12 ` Andi Kleen
0 siblings, 2 replies; 9+ messages in thread
From: Jiri Kosina @ 2008-04-01 8:38 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
On Tue, 1 Apr 2008, Ingo Molnar wrote:
> > > What is the reason for this asymetry? Shouldn't the acquiring
> > > functions be implemented in the very same way? Or at least,
> > > shouldn't all the __lockfunc functions be inlined?
> > i.e. is there any particular reason why we don't have something like
> > the patch below (implemented for all the lock variants of course, this
> > is just to demonstrate what I mean)?
> IIRC the main reason we decided to uninline them was image size. So i'd
> suggest for you to check how this change impacts vmlinux size (on both
> 64-bit and 32-bit), a typical distro config (or allyesconfig with lock
> debugging disabled). If you do the test on x86.git/latest you'll also
> have the CONFIG_OPTIMIZE_INLINING=y and CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> combination as well, which generates the most compact x86 kernel image
> ever.
In fact we have received report from one of our users that he is seeing
approximately 15% performance degradation of mmap() when spinlocks are not
inlined. I am going to do some performance measurements myself shortly, as
it seems quite strange, but while at it, I have noticed the aforementioned
asymetry in spinlock.h, so I just wanted to know if there is any
particular reason behind that.
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 8:38 ` Jiri Kosina
@ 2008-04-01 8:43 ` Ingo Molnar
2008-04-01 8:44 ` Jiri Kosina
2008-04-01 11:12 ` Andi Kleen
1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2008-04-01 8:43 UTC (permalink / raw)
To: Jiri Kosina; +Cc: linux-kernel
* Jiri Kosina <jkosina@suse.cz> wrote:
> In fact we have received report from one of our users that he is
> seeing approximately 15% performance degradation of mmap() when
> spinlocks are not inlined. I am going to do some performance
> measurements myself shortly, as it seems quite strange, but while at
> it, I have noticed the aforementioned asymetry in spinlock.h, so I
> just wanted to know if there is any particular reason behind that.
inlining decisions almost never have effects of that order of magnitude
- especially on new CPUs, so that 15% looks quite suspicious to me. If
it's real then an easy-to-run testcase would be nice.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 8:43 ` Ingo Molnar
@ 2008-04-01 8:44 ` Jiri Kosina
0 siblings, 0 replies; 9+ messages in thread
From: Jiri Kosina @ 2008-04-01 8:44 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
On Tue, 1 Apr 2008, Ingo Molnar wrote:
> > In fact we have received report from one of our users that he is
> > seeing approximately 15% performance degradation of mmap() when
> > spinlocks are not inlined. I am going to do some performance
> > measurements myself shortly, as it seems quite strange, but while at
> > it, I have noticed the aforementioned asymetry in spinlock.h, so I
> > just wanted to know if there is any particular reason behind that.
> inlining decisions almost never have effects of that order of magnitude
> - especially on new CPUs, so that 15% looks quite suspicious to me.
Definitely, that's why I am a little bit suspicious. Thanks,
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 8:38 ` Jiri Kosina
2008-04-01 8:43 ` Ingo Molnar
@ 2008-04-01 11:12 ` Andi Kleen
2008-04-01 12:40 ` Jiri Kosina
1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-01 11:12 UTC (permalink / raw)
To: Jiri Kosina; +Cc: Ingo Molnar, linux-kernel
Jiri Kosina <jkosina@suse.cz> writes:
>
> In fact we have received report from one of our users that he is seeing
> approximately 15% performance degradation of mmap() when spinlocks are not
> inlined. I am going to do some performance measurements myself shortly, as
> it seems quite strange, but while at it, I have noticed the aforementioned
> asymetry in spinlock.h, so I just wanted to know if there is any
> particular reason behind that.
At some point -- but that was before queued locks -- I noticed that
for i386 spin unlocks the call sequence for the sub function is
actually larger in code than the actual spin unlock operation and for
x86-64 it was about the same. That was not even counting any negative
register allocation effects the call has on the caller. Spinlocks
don't clobber a lot of registers, but the compiler doesn't know that
when calling the function so it has to assume all ABI callee clobbered
are gone.
I didn't do anything back then because at this point Ingo was
reorganizing the spinlock code hourly[1] for his lockdep etc. merge and
wanted to wait for it to settle down and then it dropped from
the radar.
Anyways without queued spinlocks that has probably changed again,
might be still worth rechecking.
-Andi
[1] ok I'm exaggerating...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 11:12 ` Andi Kleen
@ 2008-04-01 12:40 ` Jiri Kosina
2008-04-01 13:30 ` Andi Kleen
0 siblings, 1 reply; 9+ messages in thread
From: Jiri Kosina @ 2008-04-01 12:40 UTC (permalink / raw)
To: Andi Kleen; +Cc: Ingo Molnar, linux-kernel
On Tue, 1 Apr 2008, Andi Kleen wrote:
> At some point -- but that was before queued locks -- I noticed that for
> i386 spin unlocks the call sequence for the sub function is actually
> larger in code than the actual spin unlock operation and for x86-64 it
> was about the same.
spin unlocks seem to be properly inlined anyway, so that should be fine.
My concern here is the non-inlining of spin locks, for which I don't think
your argument above is also valid, right?
Thanks,
--
Jiri Kosina
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: spinlocks -- why are releases inlined and acquires are not?
2008-04-01 12:40 ` Jiri Kosina
@ 2008-04-01 13:30 ` Andi Kleen
0 siblings, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2008-04-01 13:30 UTC (permalink / raw)
To: Jiri Kosina; +Cc: Andi Kleen, Ingo Molnar, linux-kernel
On Tue, Apr 01, 2008 at 02:40:49PM +0200, Jiri Kosina wrote:
> On Tue, 1 Apr 2008, Andi Kleen wrote:
>
> > At some point -- but that was before queued locks -- I noticed that for
> > i386 spin unlocks the call sequence for the sub function is actually
> > larger in code than the actual spin unlock operation and for x86-64 it
> > was about the same.
>
> spin unlocks seem to be properly inlined anyway, so that should be fine.
> My concern here is the non-inlining of spin locks, for which I don't think
> your argument above is also valid, right?
When I did the original numbers (again before queued locks) the
call was slightly shorter than the actual spin lock operation.
This was not counting register allocator effects though which
are difficult to measure generally.
-Andi
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-04-01 13:27 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-01 0:08 spinlocks -- why are releases inlined and acquires are not? Jiri Kosina
2008-04-01 0:40 ` Jiri Kosina
2008-04-01 8:33 ` Ingo Molnar
2008-04-01 8:38 ` Jiri Kosina
2008-04-01 8:43 ` Ingo Molnar
2008-04-01 8:44 ` Jiri Kosina
2008-04-01 11:12 ` Andi Kleen
2008-04-01 12:40 ` Jiri Kosina
2008-04-01 13:30 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).