LKML Archive on lore.kernel.org
* [PATCH 0/6] Clean up ticketlock implementation
@ 2011-01-24 23:41 Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Hi all,
This series cleans up the x86 ticketlock implementation by converting
a large proportion of it to C. This eliminates the need for having
separate implementations for "large" (NR_CPUS >= 256) and "small"
(NR_CPUS < 256) ticket locks.
This also lays the groundwork for future changes to the ticketlock
implementation.
Of course, the big question when converting from assembler to C is
what the compiler will do to the code. In general, the results are
very similar.
For example, the original hand-coded small-ticket ticket_lock is:
movl $256, %eax
lock xadd %ax,(%rdi)
1: cmp %ah,%al
je 2f
pause
mov (%rdi),%al
jmp 1b
2:
The C version, compiled by gcc 4.5.1 is:
movl $256, %eax
lock; xaddw %ax, (%rdi)
movzbl %ah, %edx
.L3: cmpb %dl, %al
je .L2
rep; nop
movb (%rdi), %al # lock_1(D)->D.5949.tickets.head, inc$head
jmp .L3 #
.L2:
So very similar, except the compiler misses directly comparing
%ah to %al.
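For reference, the C that gcc is compiling here is essentially the
small-ticket spin loop introduced in patch 2 of this series (sketched
here for convenience, not a new implementation):

	register union {
		struct __raw_tickets tickets;
		unsigned short slock;
	} inc = { .slock = 1 << TICKET_SHIFT };

	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");

	for (;;) {
		if (inc.tickets.head == inc.tickets.tail)
			goto out;
		cpu_relax();
		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
	}
out:	barrier();	/* make sure nothing creeps before the lock is taken */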
With big tickets, which is what distros are typically compiled with,
the results are:
hand-coded:
movl $65536, %eax #, inc
lock; xaddl %eax, (%rdi) # inc, lock_2(D)->slock
movzwl %ax, %edx # inc, tmp
shrl $16, %eax # inc
1: cmpl %eax, %edx # inc, tmp
je 2f
rep ; nop
movzwl (%rdi), %edx # lock_2(D)->slock, tmp
jmp 1b
2:
Compiled C:
movl $65536, %eax #, tickets
lock; xaddl %eax, (%rdi) # tickets, lock_1(D)->D.5952.tickets
movl %eax, %edx # tickets,
shrl $16, %edx #,
.L3: cmpw %dx, %ax # tickets$tail, inc$head
je .L2 #,
rep; nop
movw (%rdi), %ax # lock_1(D)->D.5952.tickets.head, inc$head
jmp .L3 #
.L2:
In this case the code is pretty much identical except for slight
variations in where the 32-bit values are truncated to 16 bits.
So overall, I think this change will have negligible performance
impact.
Thanks,
J
Jeremy Fitzhardinge (6):
x86/ticketlock: clean up types and accessors
x86/ticketlock: convert spin loop to C
x86/ticketlock: Use C for __ticket_spin_unlock
x86/ticketlock: make large and small ticket versions of spin_lock the
same
x86/ticketlock: make __ticket_spin_lock common
x86/ticketlock: make __ticket_spin_trylock common
arch/x86/include/asm/spinlock.h | 146 ++++++++++++---------------------
arch/x86/include/asm/spinlock_types.h | 22 +++++-
2 files changed, 73 insertions(+), 95 deletions(-)
--
1.7.3.4
* [PATCH 1/6] x86/ticketlock: clean up types and accessors
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 2/6] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
A few cleanups to the way spinlocks are defined and accessed:
- define __ticket_t which is the size of a spinlock ticket (ie, enough
bits to hold all the cpus)
- Define struct arch_spinlock as a union containing plain slock and
the head and tail tickets
- Use head and tail to implement some of the spinlock predicates.
- Make all ticket variables unsigned.
- Use TICKET_SHIFT to form constants
Most of this will be used in later patches.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 24 ++++++++++--------------
arch/x86/include/asm/spinlock_types.h | 20 ++++++++++++++++++--
2 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
* much between them in performance though, especially as locks are out of line.
*/
#if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- short inc = 0x0100;
+ unsigned short inc = 1 << TICKET_SHIFT;
asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
- int tmp, new;
+ unsigned int tmp, new;
asm volatile("movzwl %2, %0\n\t"
"cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
: "memory", "cc");
}
#else
-#define TICKET_SHIFT 16
-
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- int inc = 0x00010000;
- int tmp;
+ unsigned inc = 1 << TICKET_SHIFT;
+ unsigned tmp;
asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
"movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
- int tmp;
- int new;
+ unsigned tmp;
+ unsigned new;
asm volatile("movl %2,%0\n\t"
"movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
{
- int tmp = ACCESS_ONCE(lock->slock);
+ struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
- return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+ return !!(tmp.tail ^ tmp.head);
}
static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
{
- int tmp = ACCESS_ONCE(lock->slock);
+ struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
- return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+ return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
}
#ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
# error "please don't include this file directly"
#endif
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8 __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT (sizeof(__ticket_t) * 8)
+#define TICKET_MASK ((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
typedef struct arch_spinlock {
- unsigned int slock;
+ union {
+ unsigned int slock;
+ struct __raw_tickets {
+ __ticket_t head, tail;
+ } tickets;
+ };
} arch_spinlock_t;
-#define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } }
typedef struct {
unsigned int lock;
--
1.7.3.4
* [PATCH 2/6] x86/ticketlock: convert spin loop to C
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.
For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.
We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 58 +++++++++++++++++++-------------------
1 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
#if (NR_CPUS < 256)
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- unsigned short inc = 1 << TICKET_SHIFT;
-
- asm volatile (
- LOCK_PREFIX "xaddw %w0, %1\n"
- "1:\t"
- "cmpb %h0, %b0\n\t"
- "je 2f\n\t"
- "rep ; nop\n\t"
- "movb %1, %b0\n\t"
- /* don't need lfence here, because loads are in-order */
- "jmp 1b\n"
- "2:"
- : "+Q" (inc), "+m" (lock->slock)
- :
- : "memory", "cc");
+ register union {
+ struct __raw_tickets tickets;
+ unsigned short slock;
+ } inc = { .slock = 1 << TICKET_SHIFT };
+
+ asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+ for (;;) {
+ if (inc.tickets.head == inc.tickets.tail)
+ goto out;
+ cpu_relax();
+ inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+ }
+out: barrier(); /* make sure nothing creeps before the lock is taken */
}
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
unsigned inc = 1 << TICKET_SHIFT;
- unsigned tmp;
+ __ticket_t tmp;
- asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
- "movzwl %w0, %2\n\t"
- "shrl $16, %0\n\t"
- "1:\t"
- "cmpl %0, %2\n\t"
- "je 2f\n\t"
- "rep ; nop\n\t"
- "movzwl %1, %2\n\t"
- /* don't need lfence here, because loads are in-order */
- "jmp 1b\n"
- "2:"
- : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
- :
- : "memory", "cc");
+ asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+ : "+r" (inc), "+m" (lock->slock)
+ : : "memory", "cc");
+
+ tmp = inc;
+ inc >>= TICKET_SHIFT;
+
+ for (;;) {
+ if ((__ticket_t)inc == tmp)
+ goto out;
+ cpu_relax();
+ tmp = ACCESS_ONCE(lock->tickets.head);
+ }
+out: barrier(); /* make sure nothing creeps before the lock is taken */
}
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
--
1.7.3.4
* [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 2/6] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-25 1:13 ` Nick Piggin
2011-01-24 23:41 ` [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If we don't need to use a locked inc for unlock, then implement it in C.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 32 +++++++++++++++++---------------
1 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..0170ba9 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
* On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
* (PPro errata 66, 92)
*/
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+ if (sizeof(lock->tickets.head) == sizeof(u8))
+ asm (LOCK_PREFIX "incb %0"
+ : "+m" (lock->tickets.head) : : "memory");
+ else
+ asm (LOCK_PREFIX "incw %0"
+ : "+m" (lock->tickets.head) : : "memory");
+
+}
#else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+ lock->tickets.head++;
+}
#endif
/*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
}
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
- asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
- : "+m" (lock->slock)
- :
- : "memory", "cc");
-}
#else
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
@@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
}
+#endif
static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
- asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
- : "+m" (lock->slock)
- :
- : "memory", "cc");
+ __ticket_unlock_release(lock);
+ barrier(); /* prevent reordering into locked region */
}
-#endif
static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
{
--
1.7.3.4
* [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
` (2 preceding siblings ...)
2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
` (2 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make the bulk of __ticket_spin_lock look identical for large and small
number of cpus.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 23 ++++++++---------------
1 files changed, 8 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0170ba9..1b81809 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
#if (NR_CPUS < 256)
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- register union {
- struct __raw_tickets tickets;
- unsigned short slock;
- } inc = { .slock = 1 << TICKET_SHIFT };
+ register struct __raw_tickets inc = { .tail = 1 };
asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+ : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
for (;;) {
- if (inc.tickets.head == inc.tickets.tail)
+ if (inc.head == inc.tail)
goto out;
cpu_relax();
- inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+ inc.head = ACCESS_ONCE(lock->tickets.head);
}
out: barrier(); /* make sure nothing creeps before the lock is taken */
}
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
#else
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
- unsigned inc = 1 << TICKET_SHIFT;
- __ticket_t tmp;
+ register struct __raw_tickets inc = { .tail = 1 };
asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
- : "+r" (inc), "+m" (lock->slock)
+ : "+r" (inc), "+m" (lock->tickets)
: : "memory", "cc");
- tmp = inc;
- inc >>= TICKET_SHIFT;
-
for (;;) {
- if ((__ticket_t)inc == tmp)
+ if (inc.head == inc.tail)
goto out;
cpu_relax();
- tmp = ACCESS_ONCE(lock->tickets.head);
+ inc.head = ACCESS_ONCE(lock->tickets.head);
}
out: barrier(); /* make sure nothing creeps before the lock is taken */
}
--
1.7.3.4
* [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
` (3 preceding siblings ...)
2011-01-24 23:41 ` [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-01-25 1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 42 ++++++++++++++++++--------------------
1 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1b81809..f722f96 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
* save some instructions and make the code more elegant. There really isn't
* much between them in performance though, especially as locks are out of line.
*/
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
{
- register struct __raw_tickets inc = { .tail = 1 };
+ register struct __raw_tickets tickets = { .tail = 1 };
+
+ if (sizeof(lock->tickets.head) == sizeof(u8))
+ asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+r" (tickets), "+m" (lock->tickets)
+ : : "memory", "cc");
+ else
+ asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+ : "+r" (tickets), "+m" (lock->tickets)
+ : : "memory", "cc");
- asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+ return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+ register struct __raw_tickets inc;
+
+ inc = __ticket_spin_claim(lock);
for (;;) {
if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
out: barrier(); /* make sure nothing creeps before the lock is taken */
}
+#if (NR_CPUS < 256)
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
}
#else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
- register struct __raw_tickets inc = { .tail = 1 };
-
- asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
- : "+r" (inc), "+m" (lock->tickets)
- : : "memory", "cc");
-
- for (;;) {
- if (inc.head == inc.tail)
- goto out;
- cpu_relax();
- inc.head = ACCESS_ONCE(lock->tickets.head);
- }
-out: barrier(); /* make sure nothing creeps before the lock is taken */
-}
-
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
unsigned tmp;
--
1.7.3.4
* [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
` (4 preceding siblings ...)
2011-01-24 23:41 ` [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
2011-01-25 1:16 ` Nick Piggin
2011-01-25 1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
6 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make trylock code common regardless of ticket size.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/include/asm/spinlock.h | 49 +++++++--------------------------
arch/x86/include/asm/spinlock_types.h | 6 +++-
2 files changed, 14 insertions(+), 41 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f722f96..3afb1a7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
out: barrier(); /* make sure nothing creeps before the lock is taken */
}
-#if (NR_CPUS < 256)
static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
- unsigned int tmp, new;
-
- asm volatile("movzwl %2, %0\n\t"
- "cmpb %h0,%b0\n\t"
- "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
- "jne 1f\n\t"
- LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
- "1:"
- "sete %b1\n\t"
- "movzbl %b1,%0\n\t"
- : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
- :
- : "memory", "cc");
-
- return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
- unsigned tmp;
- unsigned new;
-
- asm volatile("movl %2,%0\n\t"
- "movl %0,%1\n\t"
- "roll $16, %0\n\t"
- "cmpl %0,%1\n\t"
- "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
- "jne 1f\n\t"
- LOCK_PREFIX "cmpxchgl %1,%2\n\t"
- "1:"
- "sete %b1\n\t"
- "movzbl %b1,%0\n\t"
- : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
- :
- : "memory", "cc");
-
- return tmp;
+ arch_spinlock_t old, new;
+
+ old.tickets = ACCESS_ONCE(lock->tickets);
+ if (old.tickets.head != old.tickets.tail)
+ return 0;
+
+ new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+ /* cmpxchg is a full barrier, so nothing can move before it */
+ return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
}
-#endif
static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
#if (CONFIG_NR_CPUS < 256)
typedef u8 __ticket_t;
+typedef u16 __ticketpair_t;
#else
typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
#endif
#define TICKET_SHIFT (sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
typedef struct arch_spinlock {
union {
- unsigned int slock;
+ __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} arch_spinlock_t;
-#define __ARCH_SPIN_LOCK_UNLOCKED { { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
typedef struct {
unsigned int lock;
--
1.7.3.4
* Re: [PATCH 0/6] Clean up ticketlock implementation
2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
` (5 preceding siblings ...)
2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-01-25 1:08 ` Nick Piggin
2011-01-31 21:46 ` Jeremy Fitzhardinge
6 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25 1:08 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Hi all,
>
> This series cleans up the x86 ticketlock implementation by converting
> a large proportion of it to C. This eliminates the need for having
> separate implementations for "large" (NR_CPUS >= 256) and "small"
> (NR_CPUS < 256) ticket locks.
>
> This also lays the groundwork for future changes to the ticketlock
> implementation.
>
> Of course, the big question when converting from assembler to C is
> what the compiler will do to the code. In general, the results are
> very similar.
>
> For example, the original hand-coded small-ticket ticket_lock is:
> movl $256, %eax
> lock xadd %ax,(%rdi)
> 1: cmp %ah,%al
> je 2f
> pause
> mov (%rdi),%al
> jmp 1b
> 2:
>
> The C version, compiled by gcc 4.5.1 is:
> movl $256, %eax
> lock; xaddw %ax, (%rdi)
> movzbl %ah, %edx
> .L3: cmpb %dl, %al
> je .L2
> rep; nop
> movb (%rdi), %al # lock_1(D)->D.5949.tickets.head, inc$head
> jmp .L3 #
> .L2:
>
> So very similar, except the compiler misses directly comparing
> %ah to %al.
Oh :(
Have you filed a bug with gcc?
* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-01-25 1:13 ` Nick Piggin
2011-01-25 1:42 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25 1:13 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> If we don't need to use a locked inc for unlock, then implement it in C.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
> arch/x86/include/asm/spinlock.h | 32 +++++++++++++++++---------------
> 1 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index f48a6e3..0170ba9 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -33,9 +33,21 @@
> * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
> * (PPro errata 66, 92)
> */
> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
> +{
> + if (sizeof(lock->tickets.head) == sizeof(u8))
> + asm (LOCK_PREFIX "incb %0"
> + : "+m" (lock->tickets.head) : : "memory");
> + else
> + asm (LOCK_PREFIX "incw %0"
> + : "+m" (lock->tickets.head) : : "memory");
> +
> +}
> #else
> -# define UNLOCK_LOCK_PREFIX
> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
> +{
> + lock->tickets.head++;
> +}
> #endif
>
> /*
> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>
> return tmp;
> }
> -
> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
> -{
> - asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
> - : "+m" (lock->slock)
> - :
> - : "memory", "cc");
> -}
> #else
> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
> {
> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>
> return tmp;
> }
> +#endif
>
> static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
> {
> - asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
> - : "+m" (lock->slock)
> - :
> - : "memory", "cc");
> + __ticket_unlock_release(lock);
> + barrier(); /* prevent reordering into locked region */
> }
> -#endif
The barrier is wrong.
What makes me a tiny bit uneasy is that gcc is allowed to implement
this any way it wishes. OK there may be a NULL intersection of possible
valid assembly which is a buggy unlock... but relying on gcc to implement
lock primitives is scary. Does this really help in a way that can't be done
with the assembly versions?
* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-01-25 1:16 ` Nick Piggin
2011-01-25 1:42 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25 1:16 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Make trylock code common regardless of ticket size.
What's the asm for this look like?
* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
2011-01-25 1:16 ` Nick Piggin
@ 2011-01-25 1:42 ` Jeremy Fitzhardinge
2011-01-25 1:58 ` Nick Piggin
0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-25 1:42 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On 01/24/2011 05:16 PM, Nick Piggin wrote:
> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> Make trylock code common regardless of ticket size.
> What's the asm for this look like?
Asm:
movzwl (%rdi), %eax # lock_1(D)->slock, tmp
cmpb %ah,%al # tmp
leal 0x100(%rax), %edx # tmp, new
jne 1f
lock; cmpxchgw %dx,(%rdi) # new, lock_1(D)->slock
1: sete %dl # new
movzbl %dl,%eax # new, tmp
C:
movw (%rdi), %dx # lock_2(D)->D.5949.tickets, old
xorl %eax, %eax # D.13954
movzbl %dh, %ecx # old, tmp70
cmpb %dl, %cl # old, tmp70
jne .L5 #,
leal 256(%rdx), %ecx #, D.13956
movl %edx, %eax # old, __ret
lock; cmpxchgw %cx,(%rdi) # D.13956,* lock
cmpw %dx, %ax # old, __ret
sete %al #, D.13954
movzbl %al, %eax # D.13954, D.13954
.L5:
The C version can't take advantage of the fact that the cmpxchg directly
sets the flags, so it ends up re-comparing the old and swapped-out
values to set the return. And it doesn't re-use the same sete to set
the return value in the quick failed-to-acquire path.
It might be worth having a generic cmpxchg() variant which returns a
succeed/fail flag rather than the fetched value, to avoid comparison in
this case - since many (most?) cmpxchg() callers end up doing that
comparison.
How performance critical is trylock? I guess the ones in fs/dcache.c
are the ones looming large in your mind.
J
* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
2011-01-25 1:13 ` Nick Piggin
@ 2011-01-25 1:42 ` Jeremy Fitzhardinge
2011-01-25 1:49 ` Nick Piggin
0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-25 1:42 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On 01/24/2011 05:13 PM, Nick Piggin wrote:
> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> If we don't need to use a locked inc for unlock, then implement it in C.
>>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> ---
>> arch/x86/include/asm/spinlock.h | 32 +++++++++++++++++---------------
>> 1 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>> index f48a6e3..0170ba9 100644
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -33,9 +33,21 @@
>> * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>> * (PPro errata 66, 92)
>> */
>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> + if (sizeof(lock->tickets.head) == sizeof(u8))
>> + asm (LOCK_PREFIX "incb %0"
>> + : "+m" (lock->tickets.head) : : "memory");
>> + else
>> + asm (LOCK_PREFIX "incw %0"
>> + : "+m" (lock->tickets.head) : : "memory");
>> +
>> +}
>> #else
>> -# define UNLOCK_LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> + lock->tickets.head++;
>> +}
>> #endif
>>
>> /*
>> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>
>> return tmp;
>> }
>> -
>> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>> -{
>> - asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
>> - : "+m" (lock->slock)
>> - :
>> - : "memory", "cc");
>> -}
>> #else
>> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>> {
>> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>
>> return tmp;
>> }
>> +#endif
>>
>> static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>> {
>> - asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
>> - : "+m" (lock->slock)
>> - :
>> - : "memory", "cc");
>> + __ticket_unlock_release(lock);
>> + barrier(); /* prevent reordering into locked region */
>> }
>> -#endif
> The barrier is wrong.
In what way? Do you think it should be on the other side?
> What makes me a tiny bit uneasy is that gcc is allowed to implement
> this any way it wishes. OK there may be a NULL intersection of possible
> valid assembly which is a buggy unlock... but relying on gcc to implement
> lock primitives is scary. Does this really help in a way that can't be done
> with the assembly versions?
We rely on C/gcc for plenty of other subtle ordering things. Spinlocks
aren't particularly special in this regard.
J
* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
2011-01-25 1:42 ` Jeremy Fitzhardinge
@ 2011-01-25 1:49 ` Nick Piggin
0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2011-01-25 1:49 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On Tue, Jan 25, 2011 at 12:42 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 01/24/2011 05:13 PM, Nick Piggin wrote:
>> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>
>>> If we don't need to use a locked inc for unlock, then implement it in C.
>>>
>>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>> ---
>>> arch/x86/include/asm/spinlock.h | 32 +++++++++++++++++---------------
>>> 1 files changed, 17 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>>> index f48a6e3..0170ba9 100644
>>> --- a/arch/x86/include/asm/spinlock.h
>>> +++ b/arch/x86/include/asm/spinlock.h
>>> @@ -33,9 +33,21 @@
>>> * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>>> * (PPro errata 66, 92)
>>> */
>>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>>> +{
>>> + if (sizeof(lock->tickets.head) == sizeof(u8))
>>> + asm (LOCK_PREFIX "incb %0"
>>> + : "+m" (lock->tickets.head) : : "memory");
>>> + else
>>> + asm (LOCK_PREFIX "incw %0"
>>> + : "+m" (lock->tickets.head) : : "memory");
>>> +
>>> +}
>>> #else
>>> -# define UNLOCK_LOCK_PREFIX
>>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>>> +{
>>> + lock->tickets.head++;
>>> +}
>>> #endif
>>>
>>> /*
>>> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>>
>>> return tmp;
>>> }
>>> -
>>> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>>> -{
>>> - asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
>>> - : "+m" (lock->slock)
>>> - :
>>> - : "memory", "cc");
>>> -}
>>> #else
>>> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>> {
>>> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>>
>>> return tmp;
>>> }
>>> +#endif
>>>
>>> static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>>> {
>>> - asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
>>> - : "+m" (lock->slock)
>>> - :
>>> - : "memory", "cc");
>>> + __ticket_unlock_release(lock);
>>> + barrier(); /* prevent reordering into locked region */
>>> }
>>> -#endif
>> The barrier is wrong.
>
> In what way? Do you think it should be on the other side?
Well the other side is where the locked region is :)
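Presumably that means moving the compiler barrier to before the
releasing store, i.e. something along these lines (a sketch of the
suggestion only, not code from the series):

	static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
	{
		barrier();	/* keep locked-region accesses before the release */
		__ticket_unlock_release(lock);
	}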
>> What makes me a tiny bit uneasy is that gcc is allowed to implement
>> this any way it wishes. OK there may be a NULL intersection of possible
>> valid assembly which is a buggy unlock... but relying on gcc to implement
>> lock primitives is scary. Does this really help in a way that can't be done
>> with the assembly versions?
>
> We rely on C/gcc for plenty of other subtle ordering things. Spinlocks
> aren't particularly special in this regard.
Well probably not orderings, but we do rely on it to do atomic
stores and loads to <= sizeof(long) data types, unfortunately.
We also rely on it not to speculatively store into data on the not-taken
side of branches and things like that.
I guess it's OK, but it makes me cringe a little bit to see unlock just
do head++
* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
2011-01-25 1:42 ` Jeremy Fitzhardinge
@ 2011-01-25 1:58 ` Nick Piggin
2011-01-27 23:53 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25 1:58 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On Tue, Jan 25, 2011 at 12:42 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 01/24/2011 05:16 PM, Nick Piggin wrote:
>> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>
>>> Make trylock code common regardless of ticket size.
>> What's the asm for this look like?
>
> Asm:
>
> movzwl (%rdi), %eax # lock_1(D)->slock, tmp
> cmpb %ah,%al # tmp
> leal 0x100(%rax), %edx # tmp, new
> jne 1f
> lock; cmpxchgw %dx,(%rdi) # new, lock_1(D)->slock
> 1: sete %dl # new
> movzbl %dl,%eax # new, tmp
>
>
>
> C:
>
> movw (%rdi), %dx # lock_2(D)->D.5949.tickets, old
> xorl %eax, %eax # D.13954
> movzbl %dh, %ecx # old, tmp70
> cmpb %dl, %cl # old, tmp70
> jne .L5 #,
> leal 256(%rdx), %ecx #, D.13956
> movl %edx, %eax # old, __ret
> lock; cmpxchgw %cx,(%rdi) # D.13956,* lock
> cmpw %dx, %ax # old, __ret
> sete %al #, D.13954
> movzbl %al, %eax # D.13954, D.13954
> .L5:
>
>
> The C version can't take advantage of the fact that the cmpxchg directly
> sets the flags, so it ends up re-comparing the old and swapped-out
> values to set the return. And it doesn't re-use the same sete to set
> the return value in the quick failed-to-acquire path.
Hm.
> It might be worth having a generic cmpxchg() variant which returns a
> succeed/fail flag rather than the fetched value, to avoid comparison in
> this case - since many (most?) cmpxchg() callers end up doing that
> comparison.
>
> How performance critical is trylock? I guess the ones in fs/dcache.c
> are the ones looming large in your mind.
Well they are on the reclaim/free path rather than the _hottest_
paths, but yes they are performance critical.
* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
2011-01-25 1:58 ` Nick Piggin
@ 2011-01-27 23:53 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-27 23:53 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On 01/24/2011 05:58 PM, Nick Piggin wrote:
>>
>> The C version can't take advantage of the fact that the cmpxchg directly
>> sets the flags, so it ends up re-comparing the old and swapped-out
>> values to set the return. And it doesn't re-use the same sete to set
>> the return value in the quick failed-to-acquire path.
> Hm.
Adding a "cmpxchg_flag" which does its own sete and returns a boolean
"success" flag, the whole thing goes to:
__ticket_spin_trylock:
movw (%rdi), %ax # lock_2(D)->D.5950.tickets, old
xorl %edx, %edx # D.13949
movzbl %ah, %ecx # old, tmp69
cmpb %al, %cl # old, tmp69
jne .L5 #,
leal 256(%rax), %edx #, D.13951
lock; cmpxchgw %dx,(%rdi); sete %al # D.13951,* lock, __ret
movzbl %al, %edx # __ret, D.13949
.L5:
movl %edx, %eax # D.13949,
ret
The eax/edx shuffle is a bit unfortunate, but I can't see it hurting very much.
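For the 8-bit ticket case (u16 head/tail pair), such a cmpxchg_flag
could be a small wrapper along these lines -- a sketch only, not an
existing kernel interface:

	static __always_inline bool cmpxchg_flag_u16(volatile u16 *ptr, u16 old, u16 new)
	{
		bool success;

		/* ZF from cmpxchg says directly whether the swap happened */
		asm volatile(LOCK_PREFIX "cmpxchgw %3, %1\n\t"
			     "sete %0"
			     : "=q" (success), "+m" (*ptr), "+a" (old)
			     : "r" (new)
			     : "memory", "cc");
		return success;
	}

and the trylock tail would collapse to a single call, e.g.:

	return cmpxchg_flag_u16(&lock->head_tail, old.head_tail, new.head_tail);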
>> It might be worth having a generic cmpxchg() variant which returns a
>> succeed/fail flag rather than the fetched value, to avoid comparison in
>> this case - since many (most?) cmpxchg() callers end up doing that
>> comparison.
>>
>> How performance critical is trylock? I guess the ones in fs/dcache.c
>> are the ones looming large in your mind.
> Well they are on on the reclaim/free path rather than the _hottest_
> paths, but yes they are performance critical.
I think that code looks pretty reasonable.
J
* Re: [PATCH 0/6] Clean up ticketlock implementation
2011-01-25 1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
@ 2011-01-31 21:46 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-31 21:46 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
Jeremy Fitzhardinge
On 01/24/2011 05:08 PM, Nick Piggin wrote:
>> So very similar, except the compiler misses directly comparing
>> %ah to %al.
> Oh :(
>
> Have you filed a bug with gcc?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47556
J