LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/10] local_t : adding and standardising atomic primitives
@ 2006-12-21  0:15 Mathieu Desnoyers
  2006-12-21  0:20 ` [PATCH 1/10] local_t : architecture agnostic Mathieu Desnoyers
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:15 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

These patches extend and standardise local_t operations on each architectures,
allowing a rich set of atomic operations to be done on per-cpu data with
minimal performance impact. On some architectures, there seems to be no
difference between the SMP and UP operation (same memory barriers, same
LOCking), local.h simply includes asm-generic/local.h, which removes duplicated
code.

These patches applies on 2.6.20-rc1-git7.
It depends on the patch "atomic.h : standardising atomic primitives"

Signed-off-by : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/10] local_t : architecture agnostic
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
@ 2006-12-21  0:20 ` Mathieu Desnoyers
  2006-12-21  0:21 ` [PATCH 2/10] local_t : alpha Mathieu Desnoyers
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:20 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

This is the architecture agnostic local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -33,6 +33,19 @@ #define local_dec(l)	atomic_long_dec(&(l
 #define local_add(i,l)	atomic_long_add((i),(&(l)->a))
 #define local_sub(i,l)	atomic_long_sub((i),(&(l)->a))
 
+#define local_sub_and_test(i, l) atomic_long_sub_and_test((i), (&(l)->a))
+#define local_dec_and_test(l) atomic_long_dec_and_test(&(l)->a)
+#define local_inc_and_test(l) atomic_long_inc_and_test(&(l)->a)
+#define local_add_negative(i, l) atomic_long_add_negative((i), (&(l)->a))
+#define local_add_return(i, l) atomic_long_add_return((i), (&(l)->a))
+#define local_sub_return(i, l) atomic_long_sub_return((i), (&(l)->a))
+#define local_inc_return(l) atomic_long_inc_return(&(l)->a)
+
+#define local_cmpxchg(l, old, new) atomic_long_cmpxchg((&(l)->a), (old), (new))
+#define local_xchg(l, new) atomic_long_xchg((&(l)->a), (new))
+#define local_add_unless(l, a, u) atomic_long_add_unless((&(l)->a), (a), (u))
+#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
+
 /* Non-atomic variants, ie. preemption disabled and won't be touched
  * in interrupt, etc.  Some archs can optimize this case well. */
 #define __local_inc(l)		local_set((l), local_read(l) + 1)
@@ -44,19 +57,19 @@ #define __local_sub(i,l)	local_set((l), 
  * much more efficient than these naive implementations.  Note they take
  * a variable (eg. mystruct.foo), not an address.
  */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
+#define cpu_local_read(l)	local_read(&__get_cpu_var(l))
+#define cpu_local_set(l, i)	local_set(&__get_cpu_var(l), (i))
+#define cpu_local_inc(l)	local_inc(&__get_cpu_var(l))
+#define cpu_local_dec(l)	local_dec(&__get_cpu_var(l))
+#define cpu_local_add(i, l)	local_add((i), &__get_cpu_var(l))
+#define cpu_local_sub(i, l)	local_sub((i), &__get_cpu_var(l))
 
 /* Non-atomic increments, ie. preemption disabled and won't be touched
  * in interrupt, etc.  Some archs can optimize this case well.
  */
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
+#define __cpu_local_inc(l)	__local_inc(&__get_cpu_var(l))
+#define __cpu_local_dec(l)	__local_dec(&__get_cpu_var(l))
+#define __cpu_local_add(i, l)	__local_add((i), &__get_cpu_var(l))
+#define __cpu_local_sub(i, l)	__local_sub((i), &__get_cpu_var(l))
 
 #endif /* _ASM_GENERIC_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/10] local_t : alpha
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
  2006-12-21  0:20 ` [PATCH 1/10] local_t : architecture agnostic Mathieu Desnoyers
@ 2006-12-21  0:21 ` Mathieu Desnoyers
  2006-12-21  0:22 ` [PATCH 3/10] local_t : i386 Mathieu Desnoyers
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:21 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

Alpha architecture local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-alpha/system.h
+++ b/include/asm-alpha/system.h
@@ -443,6 +443,111 @@ #define xchg(ptr,x)							     \
      (__typeof__(*(ptr))) __xchg((ptr), (unsigned long)_x_, sizeof(*(ptr))); \
   })
 
+static inline unsigned long
+__xchg_u8_local(volatile char *m, unsigned long val)
+{
+	unsigned long ret, tmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%4,7,%3\n"
+	"	insbl	%1,%4,%1\n"
+	"1:	ldq_l	%2,0(%3)\n"
+	"	extbl	%2,%4,%0\n"
+	"	mskbl	%2,%4,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%3)\n"
+	"	beq	%2,2f\n"
+	".subsection 2\n"
+	"2:	br	1b\n"
+	".previous"
+	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
+	: "r" ((long)m), "1" (val) : "memory");
+
+	return ret;
+}
+
+static inline unsigned long
+__xchg_u16_local(volatile short *m, unsigned long val)
+{
+	unsigned long ret, tmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%4,7,%3\n"
+	"	inswl	%1,%4,%1\n"
+	"1:	ldq_l	%2,0(%3)\n"
+	"	extwl	%2,%4,%0\n"
+	"	mskwl	%2,%4,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%3)\n"
+	"	beq	%2,2f\n"
+	".subsection 2\n"
+	"2:	br	1b\n"
+	".previous"
+	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
+	: "r" ((long)m), "1" (val) : "memory");
+
+	return ret;
+}
+
+static inline unsigned long
+__xchg_u32_local(volatile int *m, unsigned long val)
+{
+	unsigned long dummy;
+
+	__asm__ __volatile__(
+	"1:	ldl_l %0,%4\n"
+	"	bis $31,%3,%1\n"
+	"	stl_c %1,%2\n"
+	"	beq %1,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	: "=&r" (val), "=&r" (dummy), "=m" (*m)
+	: "rI" (val), "m" (*m) : "memory");
+
+	return val;
+}
+
+static inline unsigned long
+__xchg_u64_local(volatile long *m, unsigned long val)
+{
+	unsigned long dummy;
+
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%4\n"
+	"	bis $31,%3,%1\n"
+	"	stq_c %1,%2\n"
+	"	beq %1,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	: "=&r" (val), "=&r" (dummy), "=m" (*m)
+	: "rI" (val), "m" (*m) : "memory");
+
+	return val;
+}
+
+#define __xchg_local(ptr, x, size) \
+({ \
+	unsigned long __xchg__res; \
+	volatile void *__xchg__ptr = (ptr); \
+	switch (size) { \
+		case 1: __xchg__res = __xchg_u8_local(__xchg__ptr, x); break; \
+		case 2: __xchg__res = __xchg_u16_local(__xchg__ptr, x); break; \
+		case 4: __xchg__res = __xchg_u32_local(__xchg__ptr, x); break; \
+		case 8: __xchg__res = __xchg_u64_local(__xchg__ptr, x); break; \
+		default: __xchg_called_with_bad_pointer(); __xchg__res = x; \
+	} \
+	__xchg__res; \
+})
+
+#define xchg_local(ptr,x)						     \
+  ({									     \
+     __typeof__(*(ptr)) _x_ = (x);					     \
+     (__typeof__(*(ptr))) __xchg_local((ptr), (unsigned long)_x_,	     \
+     		sizeof(*(ptr))); \
+  })
+
 #define tas(ptr) (xchg((ptr),1))
 
 
@@ -596,6 +701,128 @@ #define cmpxchg(ptr,o,n)						 \
 				    (unsigned long)_n_, sizeof(*(ptr))); \
   })
 
+static inline unsigned long
+__cmpxchg_u8_local(volatile char *m, long old, long new)
+{
+	unsigned long prev, tmp, cmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%5,7,%4\n"
+	"	insbl	%1,%5,%1\n"
+	"1:	ldq_l	%2,0(%4)\n"
+	"	extbl	%2,%5,%0\n"
+	"	cmpeq	%0,%6,%3\n"
+	"	beq	%3,2f\n"
+	"	mskbl	%2,%5,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%4)\n"
+	"	beq	%2,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br	1b\n"
+	".previous"
+	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
+	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+__cmpxchg_u16_local(volatile short *m, long old, long new)
+{
+	unsigned long prev, tmp, cmp, addr64;
+
+	__asm__ __volatile__(
+	"	andnot	%5,7,%4\n"
+	"	inswl	%1,%5,%1\n"
+	"1:	ldq_l	%2,0(%4)\n"
+	"	extwl	%2,%5,%0\n"
+	"	cmpeq	%0,%6,%3\n"
+	"	beq	%3,2f\n"
+	"	mskwl	%2,%5,%2\n"
+	"	or	%1,%2,%2\n"
+	"	stq_c	%2,0(%4)\n"
+	"	beq	%2,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br	1b\n"
+	".previous"
+	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
+	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+__cmpxchg_u32_local(volatile int *m, int old, int new)
+{
+	unsigned long prev, cmp;
+
+	__asm__ __volatile__(
+	"1:	ldl_l %0,%5\n"
+	"	cmpeq %0,%3,%1\n"
+	"	beq %1,2f\n"
+	"	mov %4,%1\n"
+	"	stl_c %1,%2\n"
+	"	beq %1,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br 1b\n"
+	".previous"
+	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
+	: "r"((long) old), "r"(new), "m"(*m) : "memory");
+
+	return prev;
+}
+
+static inline unsigned long
+__cmpxchg_u64_local(volatile long *m, unsigned long old, unsigned long new)
+{
+	unsigned long prev, cmp;
+
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%5\n"
+	"	cmpeq %0,%3,%1\n"
+	"	beq %1,2f\n"
+	"	mov %4,%1\n"
+	"	stq_c %1,%2\n"
+	"	beq %1,3f\n"
+	"2:\n"
+	".subsection 2\n"
+	"3:	br 1b\n"
+	".previous"
+	: "=&r"(prev), "=&r"(cmp), "=m"(*m)
+	: "r"((long) old), "r"(new), "m"(*m) : "memory");
+
+	return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
+		int size)
+{
+	switch (size) {
+		case 1:
+			return __cmpxchg_u8_local(ptr, old, new);
+		case 2:
+			return __cmpxchg_u16_local(ptr, old, new);
+		case 4:
+			return __cmpxchg_u32_local(ptr, old, new);
+		case 8:
+			return __cmpxchg_u64_local(ptr, old, new);
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+#define cmpxchg_local(ptr,o,n)						 \
+  ({									 \
+     __typeof__(*(ptr)) _o_ = (o);					 \
+     __typeof__(*(ptr)) _n_ = (n);					 \
+     (__typeof__(*(ptr))) __cmpxchg_local((ptr), (unsigned long)_o_,	 \
+				    (unsigned long)_n_, sizeof(*(ptr))); \
+  })
+
 #endif /* __ASSEMBLY__ */
 
 #define arch_align_stack(x) (x)
--- a/include/asm-alpha/local.h
+++ b/include/asm-alpha/local.h
@@ -4,37 +4,115 @@ #define _ALPHA_LOCAL_H
 #include <linux/percpu.h>
 #include <asm/atomic.h>
 
-typedef atomic64_t local_t;
+typedef struct
+{
+	atomic_long_t a;
+} local_t;
 
-#define LOCAL_INIT(i)	ATOMIC64_INIT(i)
-#define local_read(v)	atomic64_read(v)
-#define local_set(v,i)	atomic64_set(v,i)
+#define LOCAL_INIT(i)	{ ATOMIC_LONG_INIT(i) }
+#define local_read(l)	atomic_long_read(&(l)->a)
+#define local_set(l,i)	atomic_long_set(&(l)->a, (i))
+#define local_inc(l)	atomic_long_inc(&(l)->a)
+#define local_dec(l)	atomic_long_dec(&(l)->a)
+#define local_add(i,l)	atomic_long_add((i),(&(l)->a))
+#define local_sub(i,l)	atomic_long_sub((i),(&(l)->a))
 
-#define local_inc(v)	atomic64_inc(v)
-#define local_dec(v)	atomic64_dec(v)
-#define local_add(i, v)	atomic64_add(i, v)
-#define local_sub(i, v)	atomic64_sub(i, v)
+static __inline__ long local_add_return(long i, local_t * l)
+{
+	long temp, result;
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%1\n"
+	"	addq %0,%3,%2\n"
+	"	addq %0,%3,%0\n"
+	"	stq_c %0,%1\n"
+	"	beq %0,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	:"=&r" (temp), "=m" (l->a.counter), "=&r" (result)
+	:"Ir" (i), "m" (l->a.counter) : "memory");
+	return result;
+}
 
-#define __local_inc(v)		((v)->counter++)
-#define __local_dec(v)		((v)->counter++)
-#define __local_add(i,v)	((v)->counter+=(i))
-#define __local_sub(i,v)	((v)->counter-=(i))
+static __inline__ long local_sub_return(long i, local_t * v)
+{
+	long temp, result;
+	__asm__ __volatile__(
+	"1:	ldq_l %0,%1\n"
+	"	subq %0,%3,%2\n"
+	"	subq %0,%3,%0\n"
+	"	stq_c %0,%1\n"
+	"	beq %0,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	:"=&r" (temp), "=m" (l->a.counter), "=&r" (result)
+	:"Ir" (i), "m" (l->a.counter) : "memory");
+	return result;
+}
+
+#define local_cmpxchg(l, old, new) \
+	((long)cmpxchg_local(&((l)->a.counter), old, new))
+#define local_xchg(l, new) (xchg_local(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+#define local_add_unless(l, a, u)				\
+({								\
+	long c, old;						\
+	c = local_read(l);					\
+	for (;;) {						\
+		if (unlikely(c == (u)))				\
+			break;					\
+		old = local_cmpxchg((l), c, c + (a));	\
+		if (likely(old == c))				\
+			break;					\
+		c = old;					\
+	}							\
+	c != (u);						\
+})
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
+#define local_add_negative(a, l) (local_add_return((a), (l)) < 0)
+
+#define local_dec_return(l) local_sub_return(1,(l))
+
+#define local_inc_return(l) local_add_return(1,(l))
+
+#define local_sub_and_test(i,l) (local_sub_return((i), (l)) == 0)
+
+#define local_inc_and_test(l) (local_add_return(1, (l)) == 0)
+
+#define local_dec_and_test(l) (local_sub_return(1, (l)) == 0)
+
+/* Verify if faster than atomic ops */
+#define __local_inc(l)		((l)->a.counter++)
+#define __local_dec(l)		((l)->a.counter++)
+#define __local_add(i,l)	((l)->a.counter+=(i))
+#define __local_sub(i,l)	((l)->a.counter-=(i))
 
 /* Use these for per-cpu local_t variables: on some archs they are
  * much more efficient than these naive implementations.  Note they take
  * a variable, not an address.
  */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
-
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
-
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
+#define cpu_local_read(l)	local_read(&__get_cpu_var(l))
+#define cpu_local_set(l, i)	local_set(&__get_cpu_var(l), (i))
+
+#define cpu_local_inc(l)	local_inc(&__get_cpu_var(l))
+#define cpu_local_dec(l)	local_dec(&__get_cpu_var(l))
+#define cpu_local_add(i, l)	local_add((i), &__get_cpu_var(l))
+#define cpu_local_sub(i, l)	local_sub((i), &__get_cpu_var(l))
+
+#define __cpu_local_inc(l)	__local_inc(&__get_cpu_var(l))
+#define __cpu_local_dec(l)	__local_dec(&__get_cpu_var(l))
+#define __cpu_local_add(i, l)	__local_add((i), &__get_cpu_var(l))
+#define __cpu_local_sub(i, l)	__local_sub((i), &__get_cpu_var(l))
 
 #endif /* _ALPHA_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 3/10] local_t : i386
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
  2006-12-21  0:20 ` [PATCH 1/10] local_t : architecture agnostic Mathieu Desnoyers
  2006-12-21  0:21 ` [PATCH 2/10] local_t : alpha Mathieu Desnoyers
@ 2006-12-21  0:22 ` Mathieu Desnoyers
  2006-12-21 19:44   ` [Ltt-dev] [PATCH 3/10] local_t : i386, local_add_return fix Mathieu Desnoyers
  2006-12-21  0:23 ` [PATCH 4/10] local_t : ia64 Mathieu Desnoyers
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:22 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

i386 architecture local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-i386/system.h
+++ b/include/asm-i386/system.h
@@ -274,6 +274,9 @@ #define cmpxchg(ptr,o,n)\
 #define sync_cmpxchg(ptr,o,n)\
 	((__typeof__(*(ptr)))__sync_cmpxchg((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
+#define cmpxchg_local(ptr,o,n)\
+	((__typeof__(*(ptr)))__cmpxchg_local((ptr),(unsigned long)(o),\
+					(unsigned long)(n),sizeof(*(ptr))))
 #endif
 
 static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
@@ -336,6 +339,33 @@ static inline unsigned long __sync_cmpxc
 	return old;
 }
 
+static inline unsigned long __cmpxchg_local(volatile void *ptr,
+			unsigned long old, unsigned long new, int size)
+{
+	unsigned long prev;
+	switch (size) {
+	case 1:
+		__asm__ __volatile__("cmpxchgb %b1,%2"
+				     : "=a"(prev)
+				     : "q"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	case 2:
+		__asm__ __volatile__("cmpxchgw %w1,%2"
+				     : "=a"(prev)
+				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	case 4:
+		__asm__ __volatile__("cmpxchgl %1,%2"
+				     : "=a"(prev)
+				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	}
+	return old;
+}
+
 #ifndef CONFIG_X86_CMPXCHG
 /*
  * Building a kernel capable running on 80386. It may be necessary to
@@ -372,6 +402,17 @@ ({									\
 					(unsigned long)(n), sizeof(*(ptr))); \
 	__ret;								\
 })
+#define cmpxchg_local(ptr,o,n)						\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	if (likely(boot_cpu_data.x86 > 3))				\
+		__ret = __cmpxchg_local((ptr), (unsigned long)(o),	\
+					(unsigned long)(n), sizeof(*(ptr))); \
+	else								\
+		__ret = cmpxchg_386((ptr), (unsigned long)(o),		\
+					(unsigned long)(n), sizeof(*(ptr))); \
+	__ret;								\
+})
 #endif
 
 #ifdef CONFIG_X86_CMPXCHG64
@@ -390,10 +431,26 @@ static inline unsigned long long __cmpxc
 	return prev;
 }
 
+static inline unsigned long long __cmpxchg64_local(volatile void *ptr,
+			unsigned long long old, unsigned long long new)
+{
+	unsigned long long prev;
+	__asm__ __volatile__("cmpxchg8b %3"
+			     : "=A"(prev)
+			     : "b"((unsigned long)new),
+			       "c"((unsigned long)(new >> 32)),
+			       "m"(*__xg(ptr)),
+			       "0"(old)
+			     : "memory");
+	return prev;
+}
+
 #define cmpxchg64(ptr,o,n)\
 	((__typeof__(*(ptr)))__cmpxchg64((ptr),(unsigned long long)(o),\
 					(unsigned long long)(n)))
-
+#define cmpxchg64_local(ptr,o,n)\
+	((__typeof__(*(ptr)))__cmpxchg64_local((ptr),(unsigned long long)(o),\
+					(unsigned long long)(n)))
 #endif
     
 /*
--- a/include/asm-i386/local.h
+++ b/include/asm-i386/local.h
@@ -2,47 +2,198 @@ #ifndef _ARCH_I386_LOCAL_H
 #define _ARCH_I386_LOCAL_H
 
 #include <linux/percpu.h>
+#include <asm/system.h>
+#include <asm/atomic.h>
 
 typedef struct
 {
-	volatile long counter;
+	atomic_long_t a;
 } local_t;
 
-#define LOCAL_INIT(i)	{ (i) }
+#define LOCAL_INIT(i)	{ ATOMIC_LONG_INIT(i) }
 
-#define local_read(v)	((v)->counter)
-#define local_set(v,i)	(((v)->counter) = (i))
+#define local_read(l)	atomic_long_read(&(l)->a)
+#define local_set(l,i)	atomic_long_set(&(l)->a, (i))
 
-static __inline__ void local_inc(local_t *v)
+static __inline__ void local_inc(local_t *l)
 {
 	__asm__ __volatile__(
 		"incl %0"
-		:"+m" (v->counter));
+		:"+m" (l->a.counter));
 }
 
-static __inline__ void local_dec(local_t *v)
+static __inline__ void local_dec(local_t *l)
 {
 	__asm__ __volatile__(
 		"decl %0"
-		:"+m" (v->counter));
+		:"+m" (l->a.counter));
 }
 
-static __inline__ void local_add(long i, local_t *v)
+static __inline__ void local_add(long i, local_t *l)
 {
 	__asm__ __volatile__(
 		"addl %1,%0"
-		:"+m" (v->counter)
+		:"+m" (l->a.counter)
 		:"ir" (i));
 }
 
-static __inline__ void local_sub(long i, local_t *v)
+static __inline__ void local_sub(long i, local_t *l)
 {
 	__asm__ __volatile__(
 		"subl %1,%0"
-		:"+m" (v->counter)
+		:"+m" (l->a.counter)
 		:"ir" (i));
 }
 
+/**
+ * local_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @l: pointer of type local_t
+ * 
+ * Atomically subtracts @i from @l and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+static __inline__ int local_sub_and_test(long i, local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"subl %2,%0; sete %1"
+		:"+m" (l->a.counter), "=qm" (c)
+		:"ir" (i) : "memory");
+	return c;
+}
+
+/**
+ * local_dec_and_test - decrement and test
+ * @l: pointer of type local_t
+ * 
+ * Atomically decrements @l by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */ 
+static __inline__ int local_dec_and_test(local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"decl %0; sete %1"
+		:"+m" (l->a.counter), "=qm" (c)
+		: : "memory");
+	return c != 0;
+}
+
+/**
+ * local_inc_and_test - increment and test 
+ * @l: pointer of type local_t
+ * 
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */ 
+static __inline__ int local_inc_and_test(local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"incl %0; sete %1"
+		:"+m" (l->a.counter), "=qm" (c)
+		: : "memory");
+	return c != 0;
+}
+
+/**
+ * local_add_negative - add and test if negative
+ * @l: pointer of type local_t
+ * @i: integer value to add
+ * 
+ * Atomically adds @i to @l and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */ 
+static __inline__ int local_add_negative(long i, local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"addl %2,%0; sets %1"
+		:"+m" (l->a.counter), "=qm" (c)
+		:"ir" (i) : "memory");
+	return c;
+}
+
+/**
+ * local_add_return - add and return
+ * @l: pointer of type local_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @l and returns @i + @l
+ */
+static __inline__ long local_add_return(long i, local_t *l)
+{
+	long __i;
+#ifdef CONFIG_M386
+	unsigned long flags;
+	if(unlikely(boot_cpu_data.x86==3))
+		goto no_xadd;
+#endif
+	/* Modern 486+ processor */
+	__i = i;
+	__asm__ __volatile__(
+		"xaddl %0, %1;"
+		:"=r"(i)
+		:"m"(l->a.counter), "0"(i));
+	return i + __i;
+
+#ifdef CONFIG_M386
+no_xadd: /* Legacy 386 processor */
+	local_irq_save(flags);
+	__i = local_read(l);
+	local_set(l, i + __i);
+	local_irq_restore(flags);
+	return i + __i;
+#endif
+}
+
+static __inline__ long local_sub_return(long i, local_t *l)
+{
+	return local_add_return(-i,l);
+}
+
+#define local_inc_return(l)  (local_add_return(1,l))
+#define local_dec_return(l)  (local_sub_return(1,l))
+
+#define local_cmpxchg(l, o, n) \
+	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+/* Always has a lock prefix anyway */
+#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+#define local_add_unless(l, a, u)				\
+({								\
+	long c, old;						\
+	c = local_read(l);					\
+	for (;;) {						\
+		if (unlikely(c == (u)))				\
+			break;					\
+		old = local_cmpxchg((l), c, c + (a));	\
+		if (likely(old == c))				\
+			break;					\
+		c = old;					\
+	}							\
+	c != (u);						\
+})
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
 /* On x86, these are no better than the atomic variants. */
 #define __local_inc(l)		local_inc(l)
 #define __local_dec(l)		local_dec(l)
@@ -56,27 +207,27 @@ #define __local_sub(i,l)	local_sub((i),(
 
 /* Need to disable preemption for the cpu local counters otherwise we could
    still access a variable of a previous CPU in a non atomic way. */
-#define cpu_local_wrap_v(v)	 	\
+#define cpu_local_wrap_v(l)	 	\
 	({ local_t res__;		\
 	   preempt_disable(); 		\
-	   res__ = (v);			\
+	   res__ = (l);			\
 	   preempt_enable();		\
 	   res__; })
-#define cpu_local_wrap(v)		\
+#define cpu_local_wrap(l)		\
 	({ preempt_disable();		\
-	   v;				\
+	   l;				\
 	   preempt_enable(); })		\
 
-#define cpu_local_read(v)    cpu_local_wrap_v(local_read(&__get_cpu_var(v)))
-#define cpu_local_set(v, i)  cpu_local_wrap(local_set(&__get_cpu_var(v), (i)))
-#define cpu_local_inc(v)     cpu_local_wrap(local_inc(&__get_cpu_var(v)))
-#define cpu_local_dec(v)     cpu_local_wrap(local_dec(&__get_cpu_var(v)))
-#define cpu_local_add(i, v)  cpu_local_wrap(local_add((i), &__get_cpu_var(v)))
-#define cpu_local_sub(i, v)  cpu_local_wrap(local_sub((i), &__get_cpu_var(v)))
-
-#define __cpu_local_inc(v)	cpu_local_inc(v)
-#define __cpu_local_dec(v)	cpu_local_dec(v)
-#define __cpu_local_add(i, v)	cpu_local_add((i), (v))
-#define __cpu_local_sub(i, v)	cpu_local_sub((i), (v))
+#define cpu_local_read(l)    cpu_local_wrap_v(local_read(&__get_cpu_var(l)))
+#define cpu_local_set(l, i)  cpu_local_wrap(local_set(&__get_cpu_var(l), (i)))
+#define cpu_local_inc(l)     cpu_local_wrap(local_inc(&__get_cpu_var(l)))
+#define cpu_local_dec(l)     cpu_local_wrap(local_dec(&__get_cpu_var(l)))
+#define cpu_local_add(i, l)  cpu_local_wrap(local_add((i), &__get_cpu_var(l)))
+#define cpu_local_sub(i, l)  cpu_local_wrap(local_sub((i), &__get_cpu_var(l)))
+
+#define __cpu_local_inc(l)	cpu_local_inc(l)
+#define __cpu_local_dec(l)	cpu_local_dec(l)
+#define __cpu_local_add(i, l)	cpu_local_add((i), (l))
+#define __cpu_local_sub(i, l)	cpu_local_sub((i), (l))
 
 #endif /* _ARCH_I386_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 4/10] local_t : ia64
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2006-12-21  0:22 ` [PATCH 3/10] local_t : i386 Mathieu Desnoyers
@ 2006-12-21  0:23 ` Mathieu Desnoyers
  2006-12-21  0:25 ` [PATCH 5/10] " Mathieu Desnoyers
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:23 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

ia64 architecture local_t cleanup : use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-ia64/local.h
+++ b/include/asm-ia64/local.h
@@ -1,50 +1 @@
-#ifndef _ASM_IA64_LOCAL_H
-#define _ASM_IA64_LOCAL_H
-
-/*
- * Copyright (C) 2003 Hewlett-Packard Co
- *	David Mosberger-Tang <davidm@hpl.hp.com>
- */
-
-#include <linux/percpu.h>
-
-typedef struct {
-	atomic64_t val;
-} local_t;
-
-#define LOCAL_INIT(i)	((local_t) { { (i) } })
-#define local_read(l)	atomic64_read(&(l)->val)
-#define local_set(l, i)	atomic64_set(&(l)->val, i)
-#define local_inc(l)	atomic64_inc(&(l)->val)
-#define local_dec(l)	atomic64_dec(&(l)->val)
-#define local_add(i, l)	atomic64_add((i), &(l)->val)
-#define local_sub(i, l)	atomic64_sub((i), &(l)->val)
-
-/* Non-atomic variants, i.e., preemption disabled and won't be touched in interrupt, etc.  */
-
-#define __local_inc(l)		(++(l)->val.counter)
-#define __local_dec(l)		(--(l)->val.counter)
-#define __local_add(i,l)	((l)->val.counter += (i))
-#define __local_sub(i,l)	((l)->val.counter -= (i))
-
-/*
- * Use these for per-cpu local_t variables.  Note they take a variable (eg. mystruct.foo),
- * not an address.
- */
-#define cpu_local_read(v)	local_read(&__ia64_per_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__ia64_per_cpu_var(v), (i))
-#define cpu_local_inc(v)	local_inc(&__ia64_per_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__ia64_per_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__ia64_per_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__ia64_per_cpu_var(v))
-
-/*
- * Non-atomic increments, i.e., preemption disabled and won't be touched in interrupt,
- * etc.
- */
-#define __cpu_local_inc(v)	__local_inc(&__ia64_per_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__ia64_per_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__ia64_per_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__ia64_per_cpu_var(v))
-
-#endif /* _ASM_IA64_LOCAL_H */
+#include <asm-generic/local.h>

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 5/10] local_t : ia64
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2006-12-21  0:23 ` [PATCH 4/10] local_t : ia64 Mathieu Desnoyers
@ 2006-12-21  0:25 ` Mathieu Desnoyers
  2006-12-21 14:04   ` [Ltt-dev] [PATCH 5/10] local_t : MIPS Mathieu Desnoyers
  2006-12-21  0:25 ` [PATCH 6/10] local_t : parisc Mathieu Desnoyers
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:25 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, Linux-MIPS
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

ia64 architecture local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-mips/system.h
+++ b/include/asm-mips/system.h
@@ -253,6 +253,58 @@ static inline unsigned long __cmpxchg_u3
 	return retval;
 }
 
+static inline unsigned long __cmpxchg_u32_local(volatile int * m,
+	unsigned long old, unsigned long new)
+{
+	__u32 retval;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		__asm__ __volatile__(
+		"	.set	push					\n"
+		"	.set	noat					\n"
+		"	.set	mips3					\n"
+		"1:	ll	%0, %2			# __cmpxchg_u32	\n"
+		"	bne	%0, %z3, 2f				\n"
+		"	.set	mips0					\n"
+		"	move	$1, %z4					\n"
+		"	.set	mips3					\n"
+		"	sc	$1, %1					\n"
+		"	beqzl	$1, 1b					\n"
+		"2:							\n"
+		"	.set	pop					\n"
+		: "=&r" (retval), "=R" (*m)
+		: "R" (*m), "Jr" (old), "Jr" (new)
+		: "memory");
+	} else if (cpu_has_llsc) {
+		__asm__ __volatile__(
+		"	.set	push					\n"
+		"	.set	noat					\n"
+		"	.set	mips3					\n"
+		"1:	ll	%0, %2			# __cmpxchg_u32	\n"
+		"	bne	%0, %z3, 2f				\n"
+		"	.set	mips0					\n"
+		"	move	$1, %z4					\n"
+		"	.set	mips3					\n"
+		"	sc	$1, %1					\n"
+		"	beqz	$1, 1b					\n"
+		"2:							\n"
+		"	.set	pop					\n"
+		: "=&r" (retval), "=R" (*m)
+		: "R" (*m), "Jr" (old), "Jr" (new)
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		retval = *m;
+		if (retval == old)
+			*m = new;
+		local_irq_restore(flags);	/* implies memory barrier  */
+	}
+
+	return retval;
+}
+
 #ifdef CONFIG_64BIT
 static inline unsigned long __cmpxchg_u64(volatile int * m, unsigned long old,
 	unsigned long new)
@@ -303,10 +355,62 @@ static inline unsigned long __cmpxchg_u6
 
 	return retval;
 }
+
+static inline unsigned long __cmpxchg_u64_local(volatile int * m,
+	unsigned long old, unsigned long new)
+{
+	__u64 retval;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		__asm__ __volatile__(
+		"	.set	push					\n"
+		"	.set	noat					\n"
+		"	.set	mips3					\n"
+		"1:	lld	%0, %2			# __cmpxchg_u64	\n"
+		"	bne	%0, %z3, 2f				\n"
+		"	move	$1, %z4					\n"
+		"	scd	$1, %1					\n"
+		"	beqzl	$1, 1b					\n"
+		"2:							\n"
+		"	.set	pop					\n"
+		: "=&r" (retval), "=R" (*m)
+		: "R" (*m), "Jr" (old), "Jr" (new)
+		: "memory");
+	} else if (cpu_has_llsc) {
+		__asm__ __volatile__(
+		"	.set	push					\n"
+		"	.set	noat					\n"
+		"	.set	mips3					\n"
+		"1:	lld	%0, %2			# __cmpxchg_u64	\n"
+		"	bne	%0, %z3, 2f				\n"
+		"	move	$1, %z4					\n"
+		"	scd	$1, %1					\n"
+		"	beqz	$1, 1b					\n"
+		"2:							\n"
+		"	.set	pop					\n"
+		: "=&r" (retval), "=R" (*m)
+		: "R" (*m), "Jr" (old), "Jr" (new)
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		retval = *m;
+		if (retval == old)
+			*m = new;
+		local_irq_restore(flags);	/* implies memory barrier  */
+	}
+
+	return retval;
+}
+
 #else
 extern unsigned long __cmpxchg_u64_unsupported_on_32bit_kernels(
 	volatile int * m, unsigned long old, unsigned long new);
 #define __cmpxchg_u64 __cmpxchg_u64_unsupported_on_32bit_kernels
+extern unsigned long __cmpxchg_u64_local_unsupported_on_32bit_kernels(
+	volatile int * m, unsigned long old, unsigned long new);
+#define __cmpxchg_u64_local __cmpxchg_u64_local_unsupported_on_32bit_kernels
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -326,7 +430,26 @@ static inline unsigned long __cmpxchg(vo
 	return old;
 }
 
-#define cmpxchg(ptr,old,new) ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
+static inline unsigned long __cmpxchg_local(volatile void * ptr,
+	unsigned long old, unsigned long new, int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_local(ptr, old, new);
+	case 8:
+		return __cmpxchg_u64_local(ptr, old, new);
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
+#define cmpxchg(ptr,old,new) \
+	((__typeof__(*(ptr)))__cmpxchg((ptr), \
+		(unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,old,new) \
+	((__typeof__(*(ptr)))__cmpxchg_local((ptr), \
+		(unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
 
 extern void set_handler (unsigned long offset, void *addr, unsigned long len);
 extern void set_uncached_handler (unsigned long offset, void *addr, unsigned long len);
--- a/include/asm-mips/local.h
+++ b/include/asm-mips/local.h
@@ -1,60 +1,527 @@
-#ifndef _ASM_LOCAL_H
-#define _ASM_LOCAL_H
+#ifndef _ARCH_POWERPC_LOCAL_H
+#define _ARCH_POWERPC_LOCAL_H
 
 #include <linux/percpu.h>
 #include <asm/atomic.h>
 
-#ifdef CONFIG_32BIT
+typedef struct
+{
+	local_long_t a;
+} local_t;
 
-typedef atomic_t local_t;
+#define LOCAL_INIT(i)	{ local_LONG_INIT(i) }
 
-#define LOCAL_INIT(i)	ATOMIC_INIT(i)
-#define local_read(v)	atomic_read(v)
-#define local_set(v,i)	atomic_set(v,i)
+#define local_read(l)	local_long_read(&(l)->a)
+#define local_set(l,i)	local_long_set(&(l)->a, (i))
 
-#define local_inc(v)	atomic_inc(v)
-#define local_dec(v)	atomic_dec(v)
-#define local_add(i, v)	atomic_add(i, v)
-#define local_sub(i, v)	atomic_sub(i, v)
+#define local_add(i,l)	local_long_add((i),(&(l)->a))
+#define local_sub(i,l)	local_long_sub((i),(&(l)->a))
+#define local_inc(l)	local_long_inc(&(l)->a)
+#define local_dec(l)	local_long_dec(&(l)->a)
 
-#endif
 
-#ifdef CONFIG_64BIT
+#ifndef CONFIG_64BITS
 
-typedef atomic64_t local_t;
+/*
+ * Same as above, but return the result value
+ */
+static __inline__ int local_add_return(int i, local_t * l)
+{
+	unsigned long result;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_add_return	\n"
+		"	addu	%0, %1, %3				\n"
+		"	sc	%0, %2					\n"
+		"	beqzl	%0, 1b					\n"
+		"	addu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_add_return	\n"
+		"	addu	%0, %1, %3				\n"
+		"	sc	%0, %2					\n"
+		"	beqz	%0, 1b					\n"
+		"	addu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result += i;
+		&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
+
+	return result;
+}
+
+static __inline__ int local_sub_return(int i, local_t * l)
+{
+	unsigned long result;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_sub_return	\n"
+		"	subu	%0, %1, %3				\n"
+		"	sc	%0, %2					\n"
+		"	beqzl	%0, 1b					\n"
+		"	subu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_sub_return	\n"
+		"	subu	%0, %1, %3				\n"
+		"	sc	%0, %2					\n"
+		"	beqz	%0, 1b					\n"
+		"	subu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result -= i;
+		&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
+
+	return result;
+}
+
+/*
+ * local_sub_if_positive - conditionally subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @l: pointer of type local_t
+ *
+ * Atomically test @l and subtract @i if @l is greater or equal than @i.
+ * The function returns the old value of @l minus @i.
+ */
+static __inline__ int local_sub_if_positive(int i, local_t * l)
+{
+	unsigned long result;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_sub_if_positive\n"
+		"	subu	%0, %1, %3				\n"
+		"	bltz	%0, 1f					\n"
+		"	sc	%0, %2					\n"
+		"	.set	noreorder				\n"
+		"	beqzl	%0, 1b					\n"
+		"	 subu	%0, %1, %3				\n"
+		"	.set	reorder					\n"
+		"1:							\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	ll	%1, %2		# local_sub_if_positive\n"
+		"	subu	%0, %1, %3				\n"
+		"	bltz	%0, 1f					\n"
+		"	sc	%0, %2					\n"
+		"	.set	noreorder				\n"
+		"	beqz	%0, 1b					\n"
+		"	 subu	%0, %1, %3				\n"
+		"	.set	reorder					\n"
+		"1:							\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result -= i;
+		if (result >= 0)
+			&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
+
+	return result;
+}
+
+#define local_cmpxchg(l, o, n) \
+	((long)cmpxchg(&((l)->a.counter), (o), (n)))
+#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+#define local_add_unless(l, a, u)				\
+({								\
+	long c, old;						\
+	c = local_read(l);					\
+	while (c != (u) && (old = local_cmpxchg((l), c, c + (a))) != c) \
+		c = old;					\
+	c != (u);						\
+})
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
+#define local_dec_return(l) local_sub_return(1,(l))
+#define local_inc_return(l) local_add_return(1,(l))
+
+/*
+ * local_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @l: pointer of type local_t
+ *
+ * Atomically subtracts @i from @l and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_sub_and_test(i,l) (local_sub_return((i), (l)) == 0)
+
+/*
+ * local_inc_and_test - increment and test
+ * @l: pointer of type local_t
+ *
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_inc_and_test(l) (local_inc_return(l) == 0)
+
+/*
+ * local_dec_and_test - decrement by 1 and test
+ * @l: pointer of type local_t
+ *
+ * Atomically decrements @l by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+#define local_dec_and_test(l) (local_sub_return(1, (l)) == 0)
+
+/*
+ * local_dec_if_positive - decrement by 1 if old value positive
+ * @l: pointer of type local_t
+ */
+#define local_dec_if_positive(l)	local_sub_if_positive(1, l)
+
+/*
+ * local_add_negative - add and test if negative
+ * @l: pointer of type local_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @l and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+#define local_add_negative(i,l) (local_add_return(i, (l)) < 0)
+
+#else /* CONFIG_64BITS */
 
-#define LOCAL_INIT(i)	ATOMIC64_INIT(i)
-#define local_read(v)	atomic64_read(v)
-#define local_set(v,i)	atomic64_set(v,i)
+/*
+ * Same as above, but return the result value
+ */
+static __inline__ long local_add_return(long i, local_t * l)
+{
+	unsigned long result;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_add_return	\n"
+		"	addu	%0, %1, %3				\n"
+		"	scd	%0, %2					\n"
+		"	beqzl	%0, 1b					\n"
+		"	addu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_add_return	\n"
+		"	addu	%0, %1, %3				\n"
+		"	scd	%0, %2					\n"
+		"	beqz	%0, 1b					\n"
+		"	addu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result += i;
+		&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
 
-#define local_inc(v)	atomic64_inc(v)
-#define local_dec(v)	atomic64_dec(v)
-#define local_add(i, v)	atomic64_add(i, v)
-#define local_sub(i, v)	atomic64_sub(i, v)
+	return result;
+}
 
-#endif
+static __inline__ long local_sub_return(long i, local_t * l)
+{
+	unsigned long result;
 
-#define __local_inc(v)		((v)->counter++)
-#define __local_dec(v)		((v)->counter--)
-#define __local_add(i,v)	((v)->counter+=(i))
-#define __local_sub(i,v)	((v)->counter-=(i))
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_sub_return	\n"
+		"	subu	%0, %1, %3				\n"
+		"	scd	%0, %2					\n"
+		"	beqzl	%0, 1b					\n"
+		"	subu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_sub_return	\n"
+		"	subu	%0, %1, %3				\n"
+		"	scd	%0, %2					\n"
+		"	beqz	%0, 1b					\n"
+		"	subu	%0, %1, %3				\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result -= i;
+		&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
+
+	return result;
+}
 
 /*
- * Use these for per-cpu local_t variables: on some archs they are
+ * local_sub_if_positive - conditionally subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @l: pointer of type local_t
+ *
+ * Atomically test @l and subtract @i if @l is greater or equal than @i.
+ * The function returns the old value of @l minus @i.
+ */
+static __inline__ long local_sub_if_positive(long i, local_t * l)
+{
+	unsigned long result;
+
+	if (cpu_has_llsc && R10000_LLSC_WAR) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_sub_if_positive\n"
+		"	dsubu	%0, %1, %3				\n"
+		"	bltz	%0, 1f					\n"
+		"	scd	%0, %2					\n"
+		"	.set	noreorder				\n"
+		"	beqzl	%0, 1b					\n"
+		"	 dsubu	%0, %1, %3				\n"
+		"	.set	reorder					\n"
+		"1:							\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else if (cpu_has_llsc) {
+		unsigned long temp;
+
+		__asm__ __volatile__(
+		"	.set	mips3					\n"
+		"1:	lld	%1, %2		# local_sub_if_positive\n"
+		"	dsubu	%0, %1, %3				\n"
+		"	bltz	%0, 1f					\n"
+		"	scd	%0, %2					\n"
+		"	.set	noreorder				\n"
+		"	beqz	%0, 1b					\n"
+		"	 dsubu	%0, %1, %3				\n"
+		"	.set	reorder					\n"
+		"1:							\n"
+		"	.set	mips0					\n"
+		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
+		: "Ir" (i), "m" (&(l->a.counter))
+		: "memory");
+	} else {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		result = &(l->a.counter);
+		result -= i;
+		if (result >= 0)
+			&(l->a.counter) = result;
+		local_irq_restore(flags);
+	}
+
+	return result;
+}
+
+
+#define local_cmpxchg(l, o, n) \
+	((long)cmpxchg(&((l)->a.counter), (o), (n)))
+#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+#define local_add_unless(l, a, u)				\
+({								\
+	long c, old;						\
+	c = local_read(l);					\
+	while (c != (u) && (old = local_cmpxchg((l), c, c + (a))) != c) \
+		c = old;					\
+	c != (u);						\
+})
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
+#define local_dec_return(l) local_sub_return(1,(l))
+#define local_inc_return(l) local_add_return(1,(l))
+
+/*
+ * local_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @l: pointer of type local_t
+ *
+ * Atomically subtracts @i from @l and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_sub_and_test(i,l) (local_sub_return((i), (l)) == 0)
+
+/*
+ * local_inc_and_test - increment and test
+ * @l: pointer of type local_t
+ *
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_inc_and_test(l) (local_inc_return(l) == 0)
+
+/*
+ * local_dec_and_test - decrement by 1 and test
+ * @l: pointer of type local_t
+ *
+ * Atomically decrements @l by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+#define local_dec_and_test(l) (local_sub_return(1, (l)) == 0)
+
+/*
+ * local_dec_if_positive - decrement by 1 if old value positive
+ * @l: pointer of type local_t
+ */
+#define local_dec_if_positive(l)	local_sub_if_positive(1, l)
+
+/*
+ * local_add_negative - add and test if negative
+ * @l: pointer of type local_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @l and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+#define local_add_negative(i,l) (local_add_return(i, (l)) < 0)
+
+#endif /* !CONFIG_64BITS */
+
+
+/* Use these for per-cpu local_t variables: on some archs they are
  * much more efficient than these naive implementations.  Note they take
  * a variable, not an address.
+ *
+ * This could be done better if we moved the per cpu data directly
+ * after GS.
  */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
 
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
+#define __local_inc(l)		((l)->a.counter++)
+#define __local_dec(l)		((l)->a.counter++)
+#define __local_add(i,l)	((l)->a.counter+=(i))
+#define __local_sub(i,l)	((l)->a.counter-=(i))
+
+/* Need to disable preemption for the cpu local counters otherwise we could
+   still access a variable of a previous CPU in a non atomic way. */
+#define cpu_local_wrap_v(l)	 	\
+	({ local_t res__;		\
+	   preempt_disable(); 		\
+	   res__ = (l);			\
+	   preempt_enable();		\
+	   res__; })
+#define cpu_local_wrap(l)		\
+	({ preempt_disable();		\
+	   l;				\
+	   preempt_enable(); })		\
+
+#define cpu_local_read(l)    cpu_local_wrap_v(local_read(&__get_cpu_var(l)))
+#define cpu_local_set(l, i)  cpu_local_wrap(local_set(&__get_cpu_var(l), (i)))
+#define cpu_local_inc(l)     cpu_local_wrap(local_inc(&__get_cpu_var(l)))
+#define cpu_local_dec(l)     cpu_local_wrap(local_dec(&__get_cpu_var(l)))
+#define cpu_local_add(i, l)  cpu_local_wrap(local_add((i), &__get_cpu_var(l)))
+#define cpu_local_sub(i, l)  cpu_local_wrap(local_sub((i), &__get_cpu_var(l)))
 
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
+#define __cpu_local_inc(l)	cpu_local_inc(l)
+#define __cpu_local_dec(l)	cpu_local_dec(l)
+#define __cpu_local_add(i, l)	cpu_local_add((i), (l))
+#define __cpu_local_sub(i, l)	cpu_local_sub((i), (l))
 
-#endif /* _ASM_LOCAL_H */
+#endif /* _ARCH_POWERPC_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 6/10] local_t : parisc
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2006-12-21  0:25 ` [PATCH 5/10] " Mathieu Desnoyers
@ 2006-12-21  0:25 ` Mathieu Desnoyers
  2006-12-21  0:27 ` [PATCH 7/10] local_t : powerpc Mathieu Desnoyers
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:25 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

parisc architecture local_t cleanup : use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-parisc/local.h
+++ b/include/asm-parisc/local.h
@@ -1,40 +1 @@
-#ifndef _ARCH_PARISC_LOCAL_H
-#define _ARCH_PARISC_LOCAL_H
-
-#include <linux/percpu.h>
-#include <asm/atomic.h>
-
-typedef atomic_long_t local_t;
-
-#define LOCAL_INIT(i)	ATOMIC_LONG_INIT(i)
-#define local_read(v)	atomic_long_read(v)
-#define local_set(v,i)	atomic_long_set(v,i)
-
-#define local_inc(v)	atomic_long_inc(v)
-#define local_dec(v)	atomic_long_dec(v)
-#define local_add(i, v)	atomic_long_add(i, v)
-#define local_sub(i, v)	atomic_long_sub(i, v)
-
-#define __local_inc(v)		((v)->counter++)
-#define __local_dec(v)		((v)->counter--)
-#define __local_add(i,v)	((v)->counter+=(i))
-#define __local_sub(i,v)	((v)->counter-=(i))
-
-/* Use these for per-cpu local_t variables: on some archs they are
- * much more efficient than these naive implementations.  Note they take
- * a variable, not an address.
- */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
-
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
-
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
-
-#endif /* _ARCH_PARISC_LOCAL_H */
+#include <asm-generic/local.h>

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 7/10] local_t : powerpc
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2006-12-21  0:25 ` [PATCH 6/10] local_t : parisc Mathieu Desnoyers
@ 2006-12-21  0:27 ` Mathieu Desnoyers
  2006-12-21  3:34   ` [Ltt-dev] " Mathieu Desnoyers
  2007-01-24  9:08   ` Paul Mackerras
  2006-12-21  0:27 ` [PATCH 8/10] local_t : s390 Mathieu Desnoyers
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:27 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, paulus
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh,
	Thomas Gleixner, linuxppc-dev

PowerPC local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-powerpc/system.h
+++ b/include/asm-powerpc/system.h
@@ -226,6 +226,29 @@ __xchg_u32(volatile void *p, unsigned lo
 	return prev;
 }
 
+/*
+ * Atomic exchange
+ *
+ * Changes the memory location '*ptr' to be val and returns
+ * the previous value stored there.
+ */
+static __inline__ unsigned long
+__xchg_u32_local(volatile void *p, unsigned long val)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%2 \n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%3,0,%2 \n\
+	bne-	1b"
+	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
+	: "r" (p), "r" (val)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __inline__ unsigned long
 __xchg_u64(volatile void *p, unsigned long val)
@@ -245,6 +268,23 @@ __xchg_u64(volatile void *p, unsigned lo
 
 	return prev;
 }
+
+static __inline__ unsigned long
+__xchg_u64_local(volatile void *p, unsigned long val)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%2 \n"
+	PPC405_ERR77(0,%2)
+"	stdcx.	%3,0,%2 \n\
+	bne-	1b"
+	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
+	: "r" (p), "r" (val)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /*
@@ -268,12 +308,33 @@ #endif
 	return x;
 }
 
+static __inline__ unsigned long
+__xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __xchg_u32_local(ptr, x);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __xchg_u64_local(ptr, x);
+#endif
+	}
+	__xchg_called_with_bad_pointer();
+	return x;
+}
 #define xchg(ptr,x)							     \
   ({									     \
      __typeof__(*(ptr)) _x_ = (x);					     \
      (__typeof__(*(ptr))) __xchg((ptr), (unsigned long)_x_, sizeof(*(ptr))); \
   })
 
+#define xchg_local(ptr,x)						     \
+  ({									     \
+     __typeof__(*(ptr)) _x_ = (x);					     \
+     (__typeof__(*(ptr))) __xchg_local((ptr),				     \
+     		(unsigned long)_x_, sizeof(*(ptr))); 			     \
+  })
+
 #define tas(ptr) (xchg((ptr),1))
 
 /*
@@ -305,6 +366,28 @@ __cmpxchg_u32(volatile unsigned int *p, 
 	return prev;
 }
 
+static __inline__ unsigned long
+__cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
+			unsigned long new)
+{
+	unsigned int prev;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32\n\
+	cmpw	0,%0,%3\n\
+	bne-	2f\n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%4,0,%2\n\
+	bne-	1b"
+	"\n\
+2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __inline__ unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -327,6 +410,27 @@ __cmpxchg_u64(volatile unsigned long *p,
 
 	return prev;
 }
+
+static __inline__ unsigned long
+__cmpxchg_u64_local(volatile unsigned long *p, unsigned long old,
+			unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64\n\
+	cmpd	0,%0,%3\n\
+	bne-	2f\n\
+	stdcx.	%4,0,%2\n\
+	bne-	1b"
+	"\n\
+2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 /* This function doesn't exist, so you'll get a linker error
@@ -349,6 +453,22 @@ #endif
 	return old;
 }
 
+static __inline__ unsigned long
+__cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
+	  unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_u32_local(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_local(ptr, old, new);
+#endif
+	}
+	__cmpxchg_called_with_bad_pointer();
+	return old;
+}
+
 #define cmpxchg(ptr,o,n)						 \
   ({									 \
      __typeof__(*(ptr)) _o_ = (o);					 \
@@ -357,6 +477,15 @@ #define cmpxchg(ptr,o,n)						 \
 				    (unsigned long)_n_, sizeof(*(ptr))); \
   })
 
+
+#define cmpxchg_local(ptr,o,n)						 \
+  ({									 \
+     __typeof__(*(ptr)) _o_ = (o);					 \
+     __typeof__(*(ptr)) _n_ = (n);					 \
+     (__typeof__(*(ptr))) __cmpxchg_local((ptr), (unsigned long)_o_,	 \
+				    (unsigned long)_n_, sizeof(*(ptr))); \
+  })
+
 #ifdef CONFIG_PPC64
 /*
  * We handle most unaligned accesses in hardware. On the other hand 
--- a/include/asm-powerpc/local.h
+++ b/include/asm-powerpc/local.h
@@ -1 +1,345 @@
-#include <asm-generic/local.h>
+#ifndef _ARCH_POWERPC_LOCAL_H
+#define _ARCH_POWERPC_LOCAL_H
+
+#include <linux/percpu.h>
+#include <asm/atomic.h>
+
+typedef struct
+{
+	atomic_long_t a;
+} local_t;
+
+#define LOCAL_INIT(i)	{ ATOMIC_LONG_INIT(i) }
+
+#define local_read(l)	atomic_long_read(&(l)->a)
+#define local_set(l,i)	atomic_long_set(&(l)->a, (i))
+
+#define local_add(i,l)	atomic_long_add((i),(&(l)->a))
+#define local_sub(i,l)	atomic_long_sub((i),(&(l)->a))
+#define local_inc(l)	atomic_long_inc(&(l)->a)
+#define local_dec(l)	atomic_long_dec(&(l)->a)
+
+#ifndef __powerpc64__
+
+static __inline__ int local_add_return(int a, local_t *l)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%2		# local_add_return\n\
+	add	%0,%1,%0\n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%0,0,%2 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (a), "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#define local_add_negative(a, l)	(local_add_return((a), (l)) < 0)
+
+static __inline__ int local_sub_return(int a, local_t *l)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%2		# local_sub_return\n\
+	subf	%0,%1,%0\n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%0,0,%2 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (a), "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+static __inline__ int local_inc_return(local_t *l)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1		# local_inc_return\n\
+	addic	%0,%0,1\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%0,0,%1 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+/*
+ * local_inc_and_test - increment and test
+ * @l: pointer of type local_t
+ *
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_inc_and_test(l) (local_inc_return(l) == 0)
+
+static __inline__ int local_dec_return(local_t *l)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1		# local_dec_return\n\
+	addic	%0,%0,-1\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%0,0,%1\n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#define local_cmpxchg(l, o, n) \
+	((long)cmpxchg(&((l)->a.counter), (o), (n)))
+#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+static __inline__ int local_add_unless(local_t *l, int a, int u)
+{
+	int t;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%1		# local_add_unless\n\
+	cmpw	0,%0,%3 \n\
+	beq-	2f \n\
+	add	%0,%2,%0 \n"
+	PPC405_ERR77(0,%2)
+"	stwcx.	%0,0,%1 \n\
+	bne-	1b \n"
+"	subf	%0,%2,%0 \n\
+2:"
+	: "=&r" (t)
+	: "r" (&(l->a.counter)), "r" (a), "r" (u)
+	: "cc", "memory");
+
+	return t != u;
+}
+
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
+#define local_sub_and_test(a, l)	(local_sub_return((a), (l)) == 0)
+#define local_dec_and_test(l)		(local_dec_return((l)) == 0)
+
+/*
+ * Atomically test *l and decrement if it is greater than 0.
+ * The function returns the old value of *l minus 1.
+ */
+static __inline__ int local_dec_if_positive(local_t *l)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1		# local_dec_if_positive\n\
+	addic.	%0,%0,-1\n\
+	blt-	2f\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%0,0,%1\n\
+	bne-	1b"
+	"\n\
+2:"	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#else /* __powerpc64__ */
+
+static __inline__ long local_add_return(long a, local_t *l)
+{
+	long t;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%2		# local_add_return\n\
+	add	%0,%1,%0\n\
+	stdcx.	%0,0,%2 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (a), "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#define local_add_negative(a, l)	(local_add_return((a), (l)) < 0)
+
+static __inline__ long local_sub_return(long a, local_t *l)
+{
+	long t;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%2		# local_sub_return\n\
+	subf	%0,%1,%0\n\
+	stdcx.	%0,0,%2 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (a), "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+static __inline__ long local_inc_return(local_t *l)
+{
+	long t;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%1		# local_inc_return\n\
+	addic	%0,%0,1\n\
+	stdcx.	%0,0,%1 \n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+/*
+ * local_inc_and_test - increment and test
+ * @l: pointer of type local_t
+ *
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#define local_inc_and_test(l) (local_inc_return(l) == 0)
+
+static __inline__ long local_dec_return(local_t *l)
+{
+	long t;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%1		# local_dec_return\n\
+	addic	%0,%0,-1\n\
+	stdcx.	%0,0,%1\n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#define local_sub_and_test(a, l)	(local_sub_return((a), (l)) == 0)
+#define local_dec_and_test(l)	(local_dec_return((l)) == 0)
+
+/*
+ * Atomically test *l and decrement if it is greater than 0.
+ * The function returns the old value of *l minus 1.
+ */
+static __inline__ long local_dec_if_positive(local_t *l)
+{
+	long t;
+
+	__asm__ __volatile__(
+"1:	ldarx	%0,0,%1		# local_dec_if_positive\n\
+	addic.	%0,%0,-1\n\
+	blt-	2f\n\
+	stdcx.	%0,0,%1\n\
+	bne-	1b"
+	"\n\
+2:"	: "=&r" (t)
+	: "r" (&(l->a.counter))
+	: "cc", "memory");
+
+	return t;
+}
+
+#define local_cmpxchg(l, o, n) \
+	((__typeof__((l)->a.counter))cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_xchg(l, new) (xchg_local(&((l)->a.counter), new))
+
+/**
+ * local_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+static __inline__ int local_add_unless(local_t *l, long a, long u)
+{
+	long t;
+
+	__asm__ __volatile__ (
+"1:	ldarx	%0,0,%1		# local_add_unless\n\
+	cmpd	0,%0,%3 \n\
+	beq-	2f \n\
+	add	%0,%2,%0 \n"
+	PPC405_ERR77(0,%2)
+"	stdcx.	%0,0,%1 \n\
+	bne-	1b \n"
+"	subf	%0,%2,%0 \n\
+2:"
+	: "=&r" (t)
+	: "r" (&(l->a.counter)), "r" (a), "r" (u)
+	: "cc", "memory");
+
+	return t != u;
+}
+
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
+#endif /* !__powerpc64__ */
+
+/* Use these for per-cpu local_t variables: on some archs they are
+ * much more efficient than these naive implementations.  Note they take
+ * a variable, not an address.
+ *
+ * This could be done better if we moved the per cpu data directly
+ * after GS.
+ */
+
+#define __local_inc(l)		((l)->a.counter++)
+#define __local_dec(l)		((l)->a.counter++)
+#define __local_add(i,l)	((l)->a.counter+=(i))
+#define __local_sub(i,l)	((l)->a.counter-=(i))
+
+/* Need to disable preemption for the cpu local counters otherwise we could
+   still access a variable of a previous CPU in a non atomic way. */
+#define cpu_local_wrap_v(l)	 	\
+	({ local_t res__;		\
+	   preempt_disable(); 		\
+	   res__ = (l);			\
+	   preempt_enable();		\
+	   res__; })
+#define cpu_local_wrap(l)		\
+	({ preempt_disable();		\
+	   l;				\
+	   preempt_enable(); })		\
+
+#define cpu_local_read(l)    cpu_local_wrap_v(local_read(&__get_cpu_var(l)))
+#define cpu_local_set(l, i)  cpu_local_wrap(local_set(&__get_cpu_var(l), (i)))
+#define cpu_local_inc(l)     cpu_local_wrap(local_inc(&__get_cpu_var(l)))
+#define cpu_local_dec(l)     cpu_local_wrap(local_dec(&__get_cpu_var(l)))
+#define cpu_local_add(i, l)  cpu_local_wrap(local_add((i), &__get_cpu_var(l)))
+#define cpu_local_sub(i, l)  cpu_local_wrap(local_sub((i), &__get_cpu_var(l)))
+
+#define __cpu_local_inc(l)	cpu_local_inc(l)
+#define __cpu_local_dec(l)	cpu_local_dec(l)
+#define __cpu_local_add(i, l)	cpu_local_add((i), (l))
+#define __cpu_local_sub(i, l)	cpu_local_sub((i), (l))
+
+#endif /* _ARCH_POWERPC_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 8/10] local_t : s390
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2006-12-21  0:27 ` [PATCH 7/10] local_t : powerpc Mathieu Desnoyers
@ 2006-12-21  0:27 ` Mathieu Desnoyers
  2006-12-21  0:28 ` [PATCH 9/10] local_t : sparc64 Mathieu Desnoyers
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:27 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

s390 architecture local_t cleanup : use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-s390/local.h
+++ b/include/asm-s390/local.h
@@ -1,58 +1 @@
-#ifndef _ASM_LOCAL_H
-#define _ASM_LOCAL_H
-
-#include <linux/percpu.h>
-#include <asm/atomic.h>
-
-#ifndef __s390x__
-
-typedef atomic_t local_t;
-
-#define LOCAL_INIT(i)	ATOMIC_INIT(i)
-#define local_read(v)	atomic_read(v)
-#define local_set(v,i)	atomic_set(v,i)
-
-#define local_inc(v)	atomic_inc(v)
-#define local_dec(v)	atomic_dec(v)
-#define local_add(i, v)	atomic_add(i, v)
-#define local_sub(i, v)	atomic_sub(i, v)
-
-#else
-
-typedef atomic64_t local_t;
-
-#define LOCAL_INIT(i)	ATOMIC64_INIT(i)
-#define local_read(v)	atomic64_read(v)
-#define local_set(v,i)	atomic64_set(v,i)
-
-#define local_inc(v)	atomic64_inc(v)
-#define local_dec(v)	atomic64_dec(v)
-#define local_add(i, v)	atomic64_add(i, v)
-#define local_sub(i, v)	atomic64_sub(i, v)
-
-#endif
-
-#define __local_inc(v)		((v)->counter++)
-#define __local_dec(v)		((v)->counter--)
-#define __local_add(i,v)	((v)->counter+=(i))
-#define __local_sub(i,v)	((v)->counter-=(i))
-
-/*
- * Use these for per-cpu local_t variables: on some archs they are
- * much more efficient than these naive implementations.  Note they take
- * a variable, not an address.
- */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
-
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
-
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
-
-#endif /* _ASM_LOCAL_H */
+#include <asm-generic/local.h>

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 9/10] local_t : sparc64
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2006-12-21  0:27 ` [PATCH 8/10] local_t : s390 Mathieu Desnoyers
@ 2006-12-21  0:28 ` Mathieu Desnoyers
  2006-12-21  0:29 ` [PATCH 10/10] local_t : x86_64 Mathieu Desnoyers
  2006-12-23  9:33 ` [PATCH 0/10] local_t : adding and standardising atomic primitives Pavel Machek
  10 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:28 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

sparc64 local_t cleanup : simply use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-sparc64/local.h
+++ b/include/asm-sparc64/local.h
@@ -1,40 +1 @@
-#ifndef _ARCH_SPARC64_LOCAL_H
-#define _ARCH_SPARC64_LOCAL_H
-
-#include <linux/percpu.h>
-#include <asm/atomic.h>
-
-typedef atomic64_t local_t;
-
-#define LOCAL_INIT(i)	ATOMIC64_INIT(i)
-#define local_read(v)	atomic64_read(v)
-#define local_set(v,i)	atomic64_set(v,i)
-
-#define local_inc(v)	atomic64_inc(v)
-#define local_dec(v)	atomic64_dec(v)
-#define local_add(i, v)	atomic64_add(i, v)
-#define local_sub(i, v)	atomic64_sub(i, v)
-
-#define __local_inc(v)		((v)->counter++)
-#define __local_dec(v)		((v)->counter--)
-#define __local_add(i,v)	((v)->counter+=(i))
-#define __local_sub(i,v)	((v)->counter-=(i))
-
-/* Use these for per-cpu local_t variables: on some archs they are
- * much more efficient than these naive implementations.  Note they take
- * a variable, not an address.
- */
-#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
-#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
-
-#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
-#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
-#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
-#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
-
-#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
-#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
-#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
-#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
-
-#endif /* _ARCH_SPARC64_LOCAL_H */
+#include <asm-generic/local.h>

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 10/10] local_t : x86_64
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2006-12-21  0:28 ` [PATCH 9/10] local_t : sparc64 Mathieu Desnoyers
@ 2006-12-21  0:29 ` Mathieu Desnoyers
  2006-12-21 19:46   ` [Ltt-dev] [PATCH 10/10] local_t : x86_64 : local_add_return Mathieu Desnoyers
  2006-12-23  9:33 ` [PATCH 0/10] local_t : adding and standardising atomic primitives Pavel Machek
  10 siblings, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  0:29 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: ltt-dev, systemtap, Douglas Niehaus, Martin J. Bligh, Thomas Gleixner

x86_64 architecture local_t extension.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-x86_64/system.h
+++ b/include/asm-x86_64/system.h
@@ -209,9 +209,45 @@ static inline unsigned long __cmpxchg(vo
 	return old;
 }
 
+static inline unsigned long __cmpxchg_local(volatile void *ptr,
+			unsigned long old, unsigned long new, int size)
+{
+	unsigned long prev;
+	switch (size) {
+	case 1:
+		__asm__ __volatile__("cmpxchgb %b1,%2"
+				     : "=a"(prev)
+				     : "q"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	case 2:
+		__asm__ __volatile__("cmpxchgw %w1,%2"
+				     : "=a"(prev)
+				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	case 4:
+		__asm__ __volatile__("cmpxchgl %k1,%2"
+				     : "=a"(prev)
+				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	case 8:
+		__asm__ __volatile__("cmpxchgq %1,%2"
+				     : "=a"(prev)
+				     : "r"(new), "m"(*__xg(ptr)), "0"(old)
+				     : "memory");
+		return prev;
+	}
+	return old;
+}
+
 #define cmpxchg(ptr,o,n)\
 	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
+#define cmpxchg_local(ptr,o,n)\
+	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
+					(unsigned long)(n),sizeof(*(ptr))))
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
--- a/include/asm-x86_64/local.h
+++ b/include/asm-x86_64/local.h
@@ -2,49 +2,183 @@ #ifndef _ARCH_X8664_LOCAL_H
 #define _ARCH_X8664_LOCAL_H
 
 #include <linux/percpu.h>
+#include <asm/atomic.h>
 
 typedef struct
 {
-	volatile long counter;
+	atomic_long_t a;
 } local_t;
 
-#define LOCAL_INIT(i)	{ (i) }
+#define LOCAL_INIT(i)	{ ATOMIC_LONG_INIT(i) }
 
-#define local_read(v)	((v)->counter)
-#define local_set(v,i)	(((v)->counter) = (i))
+#define local_read(l)	atomic_long_read(&(l)->a)
+#define local_set(l,i)	atomic_long_set(&(l)->a, (i))
 
-static inline void local_inc(local_t *v)
+static inline void local_inc(local_t *l)
 {
 	__asm__ __volatile__(
 		"incq %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"=m" (l->a.counter)
+		:"m" (l->a.counter));
 }
 
-static inline void local_dec(local_t *v)
+static inline void local_dec(local_t *l)
 {
 	__asm__ __volatile__(
 		"decq %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"=m" (l->a.counter)
+		:"m" (l->a.counter));
 }
 
-static inline void local_add(long i, local_t *v)
+static inline void local_add(long i, local_t *l)
 {
 	__asm__ __volatile__(
 		"addq %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"=m" (l->a.counter)
+		:"ir" (i), "m" (l->a.counter));
 }
 
-static inline void local_sub(long i, local_t *v)
+static inline void local_sub(long i, local_t *l)
 {
 	__asm__ __volatile__(
 		"subq %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"=m" (l->a.counter)
+		:"ir" (i), "m" (l->a.counter));
 }
 
+/**
+ * local_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @l: pointer to type local_t
+ *
+ * Atomically subtracts @i from @l and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+static __inline__ int local_sub_and_test(long i, local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"subq %2,%0; sete %1"
+		:"=m" (l->a.counter), "=qm" (c)
+		:"ir" (i), "m" (l->a.counter) : "memory");
+	return c;
+}
+
+/**
+ * local_dec_and_test - decrement and test
+ * @l: pointer to type local_t
+ *
+ * Atomically decrements @l by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+static __inline__ int local_dec_and_test(local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"decq %0; sete %1"
+		:"=m" (l->a.counter), "=qm" (c)
+		:"m" (l->a.counter) : "memory");
+	return c != 0;
+}
+
+/**
+ * local_inc_and_test - increment and test
+ * @l: pointer to type local_t
+ *
+ * Atomically increments @l by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+static __inline__ int local_inc_and_test(local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"incq %0; sete %1"
+		:"=m" (l->a.counter), "=qm" (c)
+		:"m" (l->a.counter) : "memory");
+	return c != 0;
+}
+
+/**
+ * local_add_negative - add and test if negative
+ * @i: integer value to add
+ * @l: pointer to type local_t
+ *
+ * Atomically adds @i to @l and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+static __inline__ int local_add_negative(long i, local_t *l)
+{
+	unsigned char c;
+
+	__asm__ __volatile__(
+		"addq %2,%0; sets %1"
+		:"=m" (l->a.counter), "=qm" (c)
+		:"ir" (i), "m" (l->a.counter) : "memory");
+	return c;
+}
+
+/**
+ * local_add_return - add and return
+ * @i: integer value to add
+ * @l: pointer to type local_t
+ *
+ * Atomically adds @i to @l and returns @i + @l
+ */
+static __inline__ long local_add_return(long i, local_t *l)
+{
+	long __i = i;
+	__asm__ __volatile__(
+		"xaddq %0, %1;"
+		:"=r"(i)
+		:"m"(l->a.counter), "0"(i));
+	return i + __i;
+}
+
+static __inline__ long local_sub_return(long i, local_t *l)
+{
+	return local_add_return(-i,l);
+}
+
+#define local_inc_return(l)  (local_add_return(1,l))
+#define local_dec_return(l)  (local_sub_return(1,l))
+
+#define local_cmpxchg(l, o, n) \
+	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+/* Always has a lock prefix anyway */
+#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
+
+/**
+ * atomic_up_add_unless - add unless the number is a given value
+ * @l: pointer of type local_t
+ * @a: the amount to add to l...
+ * @u: ...unless l is equal to u.
+ *
+ * Atomically adds @a to @l, so long as it was not @u.
+ * Returns non-zero if @l was not @u, and zero otherwise.
+ */
+#define local_add_unless(l, a, u)				\
+({								\
+	long c, old;						\
+	c = local_read(l);					\
+	for (;;) {						\
+		if (unlikely(c == (u)))				\
+			break;					\
+		old = local_cmpxchg((l), c, c + (a));	\
+		if (likely(old == c))				\
+			break;					\
+		c = old;					\
+	}							\
+	c != (u);						\
+})
+#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+
 /* On x86-64 these are better than the atomic variants on SMP kernels
    because they dont use a lock prefix. */
 #define __local_inc(l)		local_inc(l)
@@ -62,27 +196,27 @@ #define __local_sub(i,l)	local_sub((i),(
 
 /* Need to disable preemption for the cpu local counters otherwise we could
    still access a variable of a previous CPU in a non atomic way. */
-#define cpu_local_wrap_v(v)	 	\
+#define cpu_local_wrap_v(l)	 	\
 	({ local_t res__;		\
 	   preempt_disable(); 		\
-	   res__ = (v);			\
+	   res__ = (l);			\
 	   preempt_enable();		\
 	   res__; })
-#define cpu_local_wrap(v)		\
+#define cpu_local_wrap(l)		\
 	({ preempt_disable();		\
-	   v;				\
+	   l;				\
 	   preempt_enable(); })		\
 
-#define cpu_local_read(v)    cpu_local_wrap_v(local_read(&__get_cpu_var(v)))
-#define cpu_local_set(v, i)  cpu_local_wrap(local_set(&__get_cpu_var(v), (i)))
-#define cpu_local_inc(v)     cpu_local_wrap(local_inc(&__get_cpu_var(v)))
-#define cpu_local_dec(v)     cpu_local_wrap(local_dec(&__get_cpu_var(v)))
-#define cpu_local_add(i, v)  cpu_local_wrap(local_add((i), &__get_cpu_var(v)))
-#define cpu_local_sub(i, v)  cpu_local_wrap(local_sub((i), &__get_cpu_var(v)))
+#define cpu_local_read(l)    cpu_local_wrap_v(local_read(&__get_cpu_var(l)))
+#define cpu_local_set(l, i)  cpu_local_wrap(local_set(&__get_cpu_var(l), (i)))
+#define cpu_local_inc(l)     cpu_local_wrap(local_inc(&__get_cpu_var(l)))
+#define cpu_local_dec(l)     cpu_local_wrap(local_dec(&__get_cpu_var(l)))
+#define cpu_local_add(i, l)  cpu_local_wrap(local_add((i), &__get_cpu_var(l)))
+#define cpu_local_sub(i, l)  cpu_local_wrap(local_sub((i), &__get_cpu_var(l)))
 
-#define __cpu_local_inc(v)	cpu_local_inc(v)
-#define __cpu_local_dec(v)	cpu_local_dec(v)
-#define __cpu_local_add(i, v)	cpu_local_add((i), (v))
-#define __cpu_local_sub(i, v)	cpu_local_sub((i), (v))
+#define __cpu_local_inc(l)	cpu_local_inc(l)
+#define __cpu_local_dec(l)	cpu_local_dec(l)
+#define __cpu_local_add(i, l)	cpu_local_add((i), (l))
+#define __cpu_local_sub(i, l)	cpu_local_sub((i), (l))
 
-#endif /* _ARCH_I386_LOCAL_H */
+#endif /* _ARCH_X8664_LOCAL_H */

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Ltt-dev] [PATCH 7/10] local_t : powerpc
  2006-12-21  0:27 ` [PATCH 7/10] local_t : powerpc Mathieu Desnoyers
@ 2006-12-21  3:34   ` Mathieu Desnoyers
  2007-01-24  9:08   ` Paul Mackerras
  1 sibling, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21  3:34 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, paulus
  Cc: Martin J. Bligh, linuxppc-dev, Douglas Niehaus, ltt-dev,
	systemtap, Thomas Gleixner

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> --- a/include/asm-powerpc/local.h
> +++ b/include/asm-powerpc/local.h
> +/**
> + * local_add_unless - add unless the number is a given value
> + * @l: pointer of type local_t
> + * @a: the amount to add to l...
> + * @u: ...unless l is equal to u.
> + *
> + * Atomically adds @a to @l, so long as it was not @u.
> + * Returns non-zero if @l was not @u, and zero otherwise.
> + */
> +static __inline__ int local_add_unless(local_t *l, long a, long u)
> +{
> +	long t;
> +
> +	__asm__ __volatile__ (
> +"1:	ldarx	%0,0,%1		# local_add_unless\n\
> +	cmpd	0,%0,%3 \n\
> +	beq-	2f \n\
> +	add	%0,%2,%0 \n"
> +	PPC405_ERR77(0,%2)

Sorry, the previous line is unnecessary : PPC405 is a 32 bits arch errata and
this code is compiled on 64 bits arch only.

> +"	stdcx.	%0,0,%1 \n\
> +	bne-	1b \n"
> +"	subf	%0,%2,%0 \n\
> +2:"
> +	: "=&r" (t)
> +	: "r" (&(l->a.counter)), "r" (a), "r" (u)
> +	: "cc", "memory");
> +
> +	return t != u;
> +}
> +

Mathieu

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Ltt-dev] [PATCH 5/10] local_t : MIPS
  2006-12-21  0:25 ` [PATCH 5/10] " Mathieu Desnoyers
@ 2006-12-21 14:04   ` Mathieu Desnoyers
  0 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21 14:04 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, Linux-MIPS
  Cc: Douglas Niehaus, Martin J. Bligh, ltt-dev, Thomas Gleixner, systemtap

Sorry, I meant MIPS.

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> ia64 architecture local_t extension.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> 
> --- a/include/asm-mips/system.h
> +++ b/include/asm-mips/system.h
> @@ -253,6 +253,58 @@ static inline unsigned long __cmpxchg_u3
>  	return retval;
>  }
>  
> +static inline unsigned long __cmpxchg_u32_local(volatile int * m,
> +	unsigned long old, unsigned long new)
> +{
> +	__u32 retval;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		__asm__ __volatile__(
> +		"	.set	push					\n"
> +		"	.set	noat					\n"
> +		"	.set	mips3					\n"
> +		"1:	ll	%0, %2			# __cmpxchg_u32	\n"
> +		"	bne	%0, %z3, 2f				\n"
> +		"	.set	mips0					\n"
> +		"	move	$1, %z4					\n"
> +		"	.set	mips3					\n"
> +		"	sc	$1, %1					\n"
> +		"	beqzl	$1, 1b					\n"
> +		"2:							\n"
> +		"	.set	pop					\n"
> +		: "=&r" (retval), "=R" (*m)
> +		: "R" (*m), "Jr" (old), "Jr" (new)
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		__asm__ __volatile__(
> +		"	.set	push					\n"
> +		"	.set	noat					\n"
> +		"	.set	mips3					\n"
> +		"1:	ll	%0, %2			# __cmpxchg_u32	\n"
> +		"	bne	%0, %z3, 2f				\n"
> +		"	.set	mips0					\n"
> +		"	move	$1, %z4					\n"
> +		"	.set	mips3					\n"
> +		"	sc	$1, %1					\n"
> +		"	beqz	$1, 1b					\n"
> +		"2:							\n"
> +		"	.set	pop					\n"
> +		: "=&r" (retval), "=R" (*m)
> +		: "R" (*m), "Jr" (old), "Jr" (new)
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		retval = *m;
> +		if (retval == old)
> +			*m = new;
> +		local_irq_restore(flags);	/* implies memory barrier  */
> +	}
> +
> +	return retval;
> +}
> +
>  #ifdef CONFIG_64BIT
>  static inline unsigned long __cmpxchg_u64(volatile int * m, unsigned long old,
>  	unsigned long new)
> @@ -303,10 +355,62 @@ static inline unsigned long __cmpxchg_u6
>  
>  	return retval;
>  }
> +
> +static inline unsigned long __cmpxchg_u64_local(volatile int * m,
> +	unsigned long old, unsigned long new)
> +{
> +	__u64 retval;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		__asm__ __volatile__(
> +		"	.set	push					\n"
> +		"	.set	noat					\n"
> +		"	.set	mips3					\n"
> +		"1:	lld	%0, %2			# __cmpxchg_u64	\n"
> +		"	bne	%0, %z3, 2f				\n"
> +		"	move	$1, %z4					\n"
> +		"	scd	$1, %1					\n"
> +		"	beqzl	$1, 1b					\n"
> +		"2:							\n"
> +		"	.set	pop					\n"
> +		: "=&r" (retval), "=R" (*m)
> +		: "R" (*m), "Jr" (old), "Jr" (new)
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		__asm__ __volatile__(
> +		"	.set	push					\n"
> +		"	.set	noat					\n"
> +		"	.set	mips3					\n"
> +		"1:	lld	%0, %2			# __cmpxchg_u64	\n"
> +		"	bne	%0, %z3, 2f				\n"
> +		"	move	$1, %z4					\n"
> +		"	scd	$1, %1					\n"
> +		"	beqz	$1, 1b					\n"
> +		"2:							\n"
> +		"	.set	pop					\n"
> +		: "=&r" (retval), "=R" (*m)
> +		: "R" (*m), "Jr" (old), "Jr" (new)
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		retval = *m;
> +		if (retval == old)
> +			*m = new;
> +		local_irq_restore(flags);	/* implies memory barrier  */
> +	}
> +
> +	return retval;
> +}
> +
>  #else
>  extern unsigned long __cmpxchg_u64_unsupported_on_32bit_kernels(
>  	volatile int * m, unsigned long old, unsigned long new);
>  #define __cmpxchg_u64 __cmpxchg_u64_unsupported_on_32bit_kernels
> +extern unsigned long __cmpxchg_u64_local_unsupported_on_32bit_kernels(
> +	volatile int * m, unsigned long old, unsigned long new);
> +#define __cmpxchg_u64_local __cmpxchg_u64_local_unsupported_on_32bit_kernels
>  #endif
>  
>  /* This function doesn't exist, so you'll get a linker error
> @@ -326,7 +430,26 @@ static inline unsigned long __cmpxchg(vo
>  	return old;
>  }
>  
> -#define cmpxchg(ptr,old,new) ((__typeof__(*(ptr)))__cmpxchg((ptr), (unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
> +static inline unsigned long __cmpxchg_local(volatile void * ptr,
> +	unsigned long old, unsigned long new, int size)
> +{
> +	switch (size) {
> +	case 4:
> +		return __cmpxchg_u32_local(ptr, old, new);
> +	case 8:
> +		return __cmpxchg_u64_local(ptr, old, new);
> +	}
> +	__cmpxchg_called_with_bad_pointer();
> +	return old;
> +}
> +
> +#define cmpxchg(ptr,old,new) \
> +	((__typeof__(*(ptr)))__cmpxchg((ptr), \
> +		(unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
> +
> +#define cmpxchg_local(ptr,old,new) \
> +	((__typeof__(*(ptr)))__cmpxchg_local((ptr), \
> +		(unsigned long)(old), (unsigned long)(new),sizeof(*(ptr))))
>  
>  extern void set_handler (unsigned long offset, void *addr, unsigned long len);
>  extern void set_uncached_handler (unsigned long offset, void *addr, unsigned long len);
> --- a/include/asm-mips/local.h
> +++ b/include/asm-mips/local.h
> @@ -1,60 +1,527 @@
> -#ifndef _ASM_LOCAL_H
> -#define _ASM_LOCAL_H
> +#ifndef _ARCH_POWERPC_LOCAL_H
> +#define _ARCH_POWERPC_LOCAL_H
>  
>  #include <linux/percpu.h>
>  #include <asm/atomic.h>
>  
> -#ifdef CONFIG_32BIT
> +typedef struct
> +{
> +	local_long_t a;
> +} local_t;
>  
> -typedef atomic_t local_t;
> +#define LOCAL_INIT(i)	{ local_LONG_INIT(i) }
>  
> -#define LOCAL_INIT(i)	ATOMIC_INIT(i)
> -#define local_read(v)	atomic_read(v)
> -#define local_set(v,i)	atomic_set(v,i)
> +#define local_read(l)	local_long_read(&(l)->a)
> +#define local_set(l,i)	local_long_set(&(l)->a, (i))
>  
> -#define local_inc(v)	atomic_inc(v)
> -#define local_dec(v)	atomic_dec(v)
> -#define local_add(i, v)	atomic_add(i, v)
> -#define local_sub(i, v)	atomic_sub(i, v)
> +#define local_add(i,l)	local_long_add((i),(&(l)->a))
> +#define local_sub(i,l)	local_long_sub((i),(&(l)->a))
> +#define local_inc(l)	local_long_inc(&(l)->a)
> +#define local_dec(l)	local_long_dec(&(l)->a)
>  
> -#endif
>  
> -#ifdef CONFIG_64BIT
> +#ifndef CONFIG_64BITS
>  
> -typedef atomic64_t local_t;
> +/*
> + * Same as above, but return the result value
> + */
> +static __inline__ int local_add_return(int i, local_t * l)
> +{
> +	unsigned long result;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_add_return	\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	sc	%0, %2					\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_add_return	\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	sc	%0, %2					\n"
> +		"	beqz	%0, 1b					\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result += i;
> +		&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
> +
> +	return result;
> +}
> +
> +static __inline__ int local_sub_return(int i, local_t * l)
> +{
> +	unsigned long result;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_sub_return	\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	sc	%0, %2					\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_sub_return	\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	sc	%0, %2					\n"
> +		"	beqz	%0, 1b					\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result -= i;
> +		&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
> +
> +	return result;
> +}
> +
> +/*
> + * local_sub_if_positive - conditionally subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @l: pointer of type local_t
> + *
> + * Atomically test @l and subtract @i if @l is greater or equal than @i.
> + * The function returns the old value of @l minus @i.
> + */
> +static __inline__ int local_sub_if_positive(int i, local_t * l)
> +{
> +	unsigned long result;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_sub_if_positive\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	bltz	%0, 1f					\n"
> +		"	sc	%0, %2					\n"
> +		"	.set	noreorder				\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	 subu	%0, %1, %3				\n"
> +		"	.set	reorder					\n"
> +		"1:							\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	ll	%1, %2		# local_sub_if_positive\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	bltz	%0, 1f					\n"
> +		"	sc	%0, %2					\n"
> +		"	.set	noreorder				\n"
> +		"	beqz	%0, 1b					\n"
> +		"	 subu	%0, %1, %3				\n"
> +		"	.set	reorder					\n"
> +		"1:							\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result -= i;
> +		if (result >= 0)
> +			&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
> +
> +	return result;
> +}
> +
> +#define local_cmpxchg(l, o, n) \
> +	((long)cmpxchg(&((l)->a.counter), (o), (n)))
> +#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
> +
> +/**
> + * local_add_unless - add unless the number is a given value
> + * @l: pointer of type local_t
> + * @a: the amount to add to l...
> + * @u: ...unless l is equal to u.
> + *
> + * Atomically adds @a to @l, so long as it was not @u.
> + * Returns non-zero if @l was not @u, and zero otherwise.
> + */
> +#define local_add_unless(l, a, u)				\
> +({								\
> +	long c, old;						\
> +	c = local_read(l);					\
> +	while (c != (u) && (old = local_cmpxchg((l), c, c + (a))) != c) \
> +		c = old;					\
> +	c != (u);						\
> +})
> +#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
> +
> +#define local_dec_return(l) local_sub_return(1,(l))
> +#define local_inc_return(l) local_add_return(1,(l))
> +
> +/*
> + * local_sub_and_test - subtract value from variable and test result
> + * @i: integer value to subtract
> + * @l: pointer of type local_t
> + *
> + * Atomically subtracts @i from @l and returns
> + * true if the result is zero, or false for all
> + * other cases.
> + */
> +#define local_sub_and_test(i,l) (local_sub_return((i), (l)) == 0)
> +
> +/*
> + * local_inc_and_test - increment and test
> + * @l: pointer of type local_t
> + *
> + * Atomically increments @l by 1
> + * and returns true if the result is zero, or false for all
> + * other cases.
> + */
> +#define local_inc_and_test(l) (local_inc_return(l) == 0)
> +
> +/*
> + * local_dec_and_test - decrement by 1 and test
> + * @l: pointer of type local_t
> + *
> + * Atomically decrements @l by 1 and
> + * returns true if the result is 0, or false for all other
> + * cases.
> + */
> +#define local_dec_and_test(l) (local_sub_return(1, (l)) == 0)
> +
> +/*
> + * local_dec_if_positive - decrement by 1 if old value positive
> + * @l: pointer of type local_t
> + */
> +#define local_dec_if_positive(l)	local_sub_if_positive(1, l)
> +
> +/*
> + * local_add_negative - add and test if negative
> + * @l: pointer of type local_t
> + * @i: integer value to add
> + *
> + * Atomically adds @i to @l and returns true
> + * if the result is negative, or false when
> + * result is greater than or equal to zero.
> + */
> +#define local_add_negative(i,l) (local_add_return(i, (l)) < 0)
> +
> +#else /* CONFIG_64BITS */
>  
> -#define LOCAL_INIT(i)	ATOMIC64_INIT(i)
> -#define local_read(v)	atomic64_read(v)
> -#define local_set(v,i)	atomic64_set(v,i)
> +/*
> + * Same as above, but return the result value
> + */
> +static __inline__ long local_add_return(long i, local_t * l)
> +{
> +	unsigned long result;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_add_return	\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	scd	%0, %2					\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_add_return	\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	scd	%0, %2					\n"
> +		"	beqz	%0, 1b					\n"
> +		"	addu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result += i;
> +		&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
>  
> -#define local_inc(v)	atomic64_inc(v)
> -#define local_dec(v)	atomic64_dec(v)
> -#define local_add(i, v)	atomic64_add(i, v)
> -#define local_sub(i, v)	atomic64_sub(i, v)
> +	return result;
> +}
>  
> -#endif
> +static __inline__ long local_sub_return(long i, local_t * l)
> +{
> +	unsigned long result;
>  
> -#define __local_inc(v)		((v)->counter++)
> -#define __local_dec(v)		((v)->counter--)
> -#define __local_add(i,v)	((v)->counter+=(i))
> -#define __local_sub(i,v)	((v)->counter-=(i))
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_sub_return	\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	scd	%0, %2					\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_sub_return	\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	scd	%0, %2					\n"
> +		"	beqz	%0, 1b					\n"
> +		"	subu	%0, %1, %3				\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result -= i;
> +		&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
> +
> +	return result;
> +}
>  
>  /*
> - * Use these for per-cpu local_t variables: on some archs they are
> + * local_sub_if_positive - conditionally subtract integer from atomic variable
> + * @i: integer value to subtract
> + * @l: pointer of type local_t
> + *
> + * Atomically test @l and subtract @i if @l is greater or equal than @i.
> + * The function returns the old value of @l minus @i.
> + */
> +static __inline__ long local_sub_if_positive(long i, local_t * l)
> +{
> +	unsigned long result;
> +
> +	if (cpu_has_llsc && R10000_LLSC_WAR) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_sub_if_positive\n"
> +		"	dsubu	%0, %1, %3				\n"
> +		"	bltz	%0, 1f					\n"
> +		"	scd	%0, %2					\n"
> +		"	.set	noreorder				\n"
> +		"	beqzl	%0, 1b					\n"
> +		"	 dsubu	%0, %1, %3				\n"
> +		"	.set	reorder					\n"
> +		"1:							\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else if (cpu_has_llsc) {
> +		unsigned long temp;
> +
> +		__asm__ __volatile__(
> +		"	.set	mips3					\n"
> +		"1:	lld	%1, %2		# local_sub_if_positive\n"
> +		"	dsubu	%0, %1, %3				\n"
> +		"	bltz	%0, 1f					\n"
> +		"	scd	%0, %2					\n"
> +		"	.set	noreorder				\n"
> +		"	beqz	%0, 1b					\n"
> +		"	 dsubu	%0, %1, %3				\n"
> +		"	.set	reorder					\n"
> +		"1:							\n"
> +		"	.set	mips0					\n"
> +		: "=&r" (result), "=&r" (temp), "=m" (&(l->a.counter))
> +		: "Ir" (i), "m" (&(l->a.counter))
> +		: "memory");
> +	} else {
> +		unsigned long flags;
> +
> +		local_irq_save(flags);
> +		result = &(l->a.counter);
> +		result -= i;
> +		if (result >= 0)
> +			&(l->a.counter) = result;
> +		local_irq_restore(flags);
> +	}
> +
> +	return result;
> +}
> +
> +
> +#define local_cmpxchg(l, o, n) \
> +	((long)cmpxchg(&((l)->a.counter), (o), (n)))
> +#define local_xchg(l, new) (xchg(&((l)->a.counter), new))
> +
> +/**
> + * local_add_unless - add unless the number is a given value
> + * @l: pointer of type local_t
> + * @a: the amount to add to l...
> + * @u: ...unless l is equal to u.
> + *
> + * Atomically adds @a to @l, so long as it was not @u.
> + * Returns non-zero if @l was not @u, and zero otherwise.
> + */
> +#define local_add_unless(l, a, u)				\
> +({								\
> +	long c, old;						\
> +	c = local_read(l);					\
> +	while (c != (u) && (old = local_cmpxchg((l), c, c + (a))) != c) \
> +		c = old;					\
> +	c != (u);						\
> +})
> +#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
> +
> +#define local_dec_return(l) local_sub_return(1,(l))
> +#define local_inc_return(l) local_add_return(1,(l))
> +
> +/*
> + * local_sub_and_test - subtract value from variable and test result
> + * @i: integer value to subtract
> + * @l: pointer of type local_t
> + *
> + * Atomically subtracts @i from @l and returns
> + * true if the result is zero, or false for all
> + * other cases.
> + */
> +#define local_sub_and_test(i,l) (local_sub_return((i), (l)) == 0)
> +
> +/*
> + * local_inc_and_test - increment and test
> + * @l: pointer of type local_t
> + *
> + * Atomically increments @l by 1
> + * and returns true if the result is zero, or false for all
> + * other cases.
> + */
> +#define local_inc_and_test(l) (local_inc_return(l) == 0)
> +
> +/*
> + * local_dec_and_test - decrement by 1 and test
> + * @l: pointer of type local_t
> + *
> + * Atomically decrements @l by 1 and
> + * returns true if the result is 0, or false for all other
> + * cases.
> + */
> +#define local_dec_and_test(l) (local_sub_return(1, (l)) == 0)
> +
> +/*
> + * local_dec_if_positive - decrement by 1 if old value positive
> + * @l: pointer of type local_t
> + */
> +#define local_dec_if_positive(l)	local_sub_if_positive(1, l)
> +
> +/*
> + * local_add_negative - add and test if negative
> + * @l: pointer of type local_t
> + * @i: integer value to add
> + *
> + * Atomically adds @i to @l and returns true
> + * if the result is negative, or false when
> + * result is greater than or equal to zero.
> + */
> +#define local_add_negative(i,l) (local_add_return(i, (l)) < 0)
> +
> +#endif /* !CONFIG_64BITS */
> +
> +
> +/* Use these for per-cpu local_t variables: on some archs they are
>   * much more efficient than these naive implementations.  Note they take
>   * a variable, not an address.
> + *
> + * This could be done better if we moved the per cpu data directly
> + * after GS.
>   */
> -#define cpu_local_read(v)	local_read(&__get_cpu_var(v))
> -#define cpu_local_set(v, i)	local_set(&__get_cpu_var(v), (i))
>  
> -#define cpu_local_inc(v)	local_inc(&__get_cpu_var(v))
> -#define cpu_local_dec(v)	local_dec(&__get_cpu_var(v))
> -#define cpu_local_add(i, v)	local_add((i), &__get_cpu_var(v))
> -#define cpu_local_sub(i, v)	local_sub((i), &__get_cpu_var(v))
> +#define __local_inc(l)		((l)->a.counter++)
> +#define __local_dec(l)		((l)->a.counter++)
> +#define __local_add(i,l)	((l)->a.counter+=(i))
> +#define __local_sub(i,l)	((l)->a.counter-=(i))
> +
> +/* Need to disable preemption for the cpu local counters otherwise we could
> +   still access a variable of a previous CPU in a non atomic way. */
> +#define cpu_local_wrap_v(l)	 	\
> +	({ local_t res__;		\
> +	   preempt_disable(); 		\
> +	   res__ = (l);			\
> +	   preempt_enable();		\
> +	   res__; })
> +#define cpu_local_wrap(l)		\
> +	({ preempt_disable();		\
> +	   l;				\
> +	   preempt_enable(); })		\
> +
> +#define cpu_local_read(l)    cpu_local_wrap_v(local_read(&__get_cpu_var(l)))
> +#define cpu_local_set(l, i)  cpu_local_wrap(local_set(&__get_cpu_var(l), (i)))
> +#define cpu_local_inc(l)     cpu_local_wrap(local_inc(&__get_cpu_var(l)))
> +#define cpu_local_dec(l)     cpu_local_wrap(local_dec(&__get_cpu_var(l)))
> +#define cpu_local_add(i, l)  cpu_local_wrap(local_add((i), &__get_cpu_var(l)))
> +#define cpu_local_sub(i, l)  cpu_local_wrap(local_sub((i), &__get_cpu_var(l)))
>  
> -#define __cpu_local_inc(v)	__local_inc(&__get_cpu_var(v))
> -#define __cpu_local_dec(v)	__local_dec(&__get_cpu_var(v))
> -#define __cpu_local_add(i, v)	__local_add((i), &__get_cpu_var(v))
> -#define __cpu_local_sub(i, v)	__local_sub((i), &__get_cpu_var(v))
> +#define __cpu_local_inc(l)	cpu_local_inc(l)
> +#define __cpu_local_dec(l)	cpu_local_dec(l)
> +#define __cpu_local_add(i, l)	cpu_local_add((i), (l))
> +#define __cpu_local_sub(i, l)	cpu_local_sub((i), (l))
>  
> -#endif /* _ASM_LOCAL_H */
> +#endif /* _ARCH_POWERPC_LOCAL_H */
> 

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Ltt-dev] [PATCH 3/10] local_t : i386, local_add_return fix
  2006-12-21  0:22 ` [PATCH 3/10] local_t : i386 Mathieu Desnoyers
@ 2006-12-21 19:44   ` Mathieu Desnoyers
  0 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21 19:44 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: Douglas Niehaus, Martin J. Bligh, ltt-dev, Thomas Gleixner, systemtap

local_add_return fix for non volatile local_t on i386.

local_add_return should act like the new atomic_add_return considering the
removal of volatile from atomic_t.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

--- a/include/asm-i386/local.h
+++ b/include/asm-i386/local.h
@@ -142,8 +142,8 @@ #endif
 	__i = i;
 	__asm__ __volatile__(
 		"xaddl %0, %1;"
-		:"=r"(i)
-		:"m"(l->a.counter), "0"(i));
+		:"+r" (i), "+m" (l->a.counter)
+		: : "memory");
 	return i + __i;
 
 #ifdef CONFIG_M386
-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Ltt-dev] [PATCH 10/10] local_t : x86_64 : local_add_return
  2006-12-21  0:29 ` [PATCH 10/10] local_t : x86_64 Mathieu Desnoyers
@ 2006-12-21 19:46   ` Mathieu Desnoyers
  0 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2006-12-21 19:46 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig
  Cc: Douglas Niehaus, Martin J. Bligh, ltt-dev, Thomas Gleixner, systemtap

local_add_return should also deal with the removed volatile from local_t.

Inspired from atomic_t modifications : it must use the local_t both as input and
output.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>


--- a/include/asm-x86_64/local.h
+++ b/include/asm-x86_64/local.h
@@ -136,8 +136,8 @@ static __inline__ long local_add_return(
 	long __i = i;
 	__asm__ __volatile__(
 		"xaddq %0, %1;"
-		:"=r"(i)
-		:"m"(l->a.counter), "0"(i));
+		:"+r" (i), "+m" (l->a.counter)
+		: : "memory");
 	return i + __i;
 }
 
-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/10] local_t : adding and standardising atomic primitives
  2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2006-12-21  0:29 ` [PATCH 10/10] local_t : x86_64 Mathieu Desnoyers
@ 2006-12-23  9:33 ` Pavel Machek
  2007-01-09  3:14   ` [PATCH] local_t : Documentation Mathieu Desnoyers
  10 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2006-12-23  9:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

Hi!

> These patches extend and standardise local_t operations on each architectures,
> allowing a rich set of atomic operations to be done on per-cpu data with
> minimal performance impact. On some architectures, there seems to be no
> difference between the SMP and UP operation (same memory barriers, same
> LOCking), local.h simply includes asm-generic/local.h, which removes duplicated
> code.

Could you provide some Documentation/? Knowing when local_t can be
used is kind-of important.
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] local_t : Documentation
  2006-12-23  9:33 ` [PATCH 0/10] local_t : adding and standardising atomic primitives Pavel Machek
@ 2007-01-09  3:14   ` Mathieu Desnoyers
  2007-01-09 21:01     ` Andrew Morton
  2007-01-09 22:41     ` Pavel Machek
  0 siblings, 2 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-09  3:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

* Pavel Machek (pavel@suse.cz) wrote:
> Hi!
> 
> > These patches extend and standardise local_t operations on each architectures,
> > allowing a rich set of atomic operations to be done on per-cpu data with
> > minimal performance impact. On some architectures, there seems to be no
> > difference between the SMP and UP operation (same memory barriers, same
> > LOCking), local.h simply includes asm-generic/local.h, which removes duplicated
> > code.
> 
> Could you provide some Documentation/? Knowing when local_t can be
> used is kind-of important.
> 							Pavel

Hi Pavel,

Thanks for this appropriate comment. I totally agree that there is a need for
documentation about how local_t variables should be used. Here is the patch
that adds Documentation/local_ops.txt. Comments are welcome.

Regards,

Mathieu


diff --git a/Documentation/local_ops.txt b/Documentation/local_ops.txt
new file mode 100644
index 0000000..dfeec94
--- /dev/null
+++ b/Documentation/local_ops.txt
@@ -0,0 +1,148 @@
+	     Semantics and Behavior of Local Atomic Operations
+
+			    Mathieu Desnoyers
+
+
+	This document explains the purpose of the local atomic operations, how
+to implement them for any given architecture and shows how they can be used
+properly. It also stresses on the precautions that must be taken when reading
+those local variables across CPUs when the order of memory writes matters.
+
+
+
+* Purpose of local atomic operations
+
+Local atomic operations are meant to provide fast and highly reentrant per CPU
+counters. They minimize the performance cost of standard atomic operations by
+removing the LOCK prefix and memory barriers normally required to synchronize
+across CPUs.
+
+Having fast per CPU atomic counters is interesting in many cases : it does not
+require disabling interrupts to protect from interrupt handlers and it permits
+coherent counters in NMI handlers. It is especially useful for tracing purposes
+and for various performance monitoring counters.
+
+
+* Implementation for a given architecture
+
+It can be done by slightly modifying the standard atomic operations : only
+their UP variant must be kept. It typically means removing LOCK prefix (on
+i386 and x86_64) and any SMP sychronization barrier. If the architecture does
+not have a different behavior between SMP and UP, including asm-generic/local.h
+in your archtecture's local.h is sufficient.
+
+
+* How to use local atomic operations
+
+#include <linux/percpu.h>
+#include <asm/local.h>
+
+static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
+
+
+* Counting
+
+In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
+operations : it makes sure that preemption is disabled around write access to
+the per cpu variable. For instance :
+
+	local_inc(&get_cpu_var(counters));
+	put_cpu_var(counters);
+
+If you are already in a preemption-safe context, you can directly use 
+__get_cpu_var() instead.
+
+	local_inc(&__get_cpu_var(counters));
+
+
+
+* Reading the counters
+
+Those local counters can be read from foreign CPUs to sum the count. Note that
+the data seen by local_read across CPUs must be considered to be out of order
+relatively to other memory writes happening on the CPU that owns the data.
+
+	long sum = 0;
+	for_each_online_cpu(cpu)
+		sum += local_read(&per_cpu(counters, cpu));
+
+If you want to use a remote local_read to synchronize access to a resource
+between CPUs, explicit smp_wmb() and smp_rmb() memory barriers must be used
+respectively on the writer and the reader CPUs. It would be the case if you use
+the local_t variable as a counter of bytes written in a buffer : there should
+be a smp_wmb() between the buffer write and the counter increment and also a
+smp_rmb() between the counter read and the buffer read.
+
+
+Here is a sample module which implements a basic per cpu counter using local.h.
+
+--- BEGIN ---
+/* test-local.c
+ *
+ * Sample module for local.h usage.
+ */
+
+
+#include <asm/local.h>
+#include <linux/module.h>
+#include <linux/timer.h>
+
+static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
+
+static struct timer_list test_timer;
+
+/* IPI called on each CPU. */
+static void test_each(void *info)
+{
+	/* Increment the counter from a non preemptible context */
+	printk("Increment on cpu %d\n", smp_processor_id());
+	local_inc(&__get_cpu_var(counters));
+
+	/* This is what incrementing the variable would look like within a
+	 * preemptible context (it disables preemption) :
+	 *
+	 * local_inc(&get_cpu_var(counters));
+	 * put_cpu_var(counters);
+	 */
+}
+
+static void do_test_timer(unsigned long data)
+{
+	int cpu;
+
+	/* Increment the counters */
+	on_each_cpu(test_each, NULL, 0, 1);
+	/* Read all the counters */
+	printk("Counters read from CPU %d\n", smp_processor_id());
+	for_each_online_cpu(cpu) {
+		printk("Read : CPU %d, count %ld\n", cpu,
+			local_read(&per_cpu(counters, cpu)));
+	}
+	del_timer(&test_timer);
+	test_timer.expires = jiffies + 1000;
+	add_timer(&test_timer);
+}
+
+static int __init test_init(void)
+{
+	/* initialize the timer that will increment the counter */
+	init_timer(&test_timer);
+	test_timer.function = do_test_timer;
+	test_timer.expires = jiffies + 1;
+	add_timer(&test_timer);
+
+	return 0;
+}
+
+static void __exit test_exit(void)
+{
+	del_timer_sync(&test_timer);
+}
+
+module_init(test_init);
+module_exit(test_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Local Atomic Ops");
+--- END ---
-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation
  2007-01-09  3:14   ` [PATCH] local_t : Documentation Mathieu Desnoyers
@ 2007-01-09 21:01     ` Andrew Morton
  2007-01-09 22:06       ` Mathieu Desnoyers
  2007-01-09 22:38       ` Pavel Machek
  2007-01-09 22:41     ` Pavel Machek
  1 sibling, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2007-01-09 21:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Pavel Machek, linux-kernel, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

On Mon, 8 Jan 2007 22:14:46 -0500
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> +* How to use local atomic operations
> +
> +#include <linux/percpu.h>
> +#include <asm/local.h>
> +
> +static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
> +
> +
> +* Counting
> +
> +In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
> +operations : it makes sure that preemption is disabled around write access to
> +the per cpu variable. For instance :
> +
> +	local_inc(&get_cpu_var(counters));
> +	put_cpu_var(counters);

Confused.  The whole point behind local_t is that we can do
atomic-wrt-interrupts inc and dec on them.

Consequently, as atomic-wrt-interrupts means atomic-wrt-preemption, there
is no need to do a preempt_disable() around local_inc() and local_dec().


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation
  2007-01-09 21:01     ` Andrew Morton
@ 2007-01-09 22:06       ` Mathieu Desnoyers
  2007-01-09 22:11         ` Andrew Morton
  2007-01-09 22:38       ` Pavel Machek
  1 sibling, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-09 22:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pavel Machek, linux-kernel, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

* Andrew Morton (akpm@osdl.org) wrote:
> On Mon, 8 Jan 2007 22:14:46 -0500
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > +* How to use local atomic operations
> > +
> > +#include <linux/percpu.h>
> > +#include <asm/local.h>
> > +
> > +static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
> > +
> > +
> > +* Counting
> > +
> > +In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
> > +operations : it makes sure that preemption is disabled around write access to
> > +the per cpu variable. For instance :
> > +
> > +	local_inc(&get_cpu_var(counters));
> > +	put_cpu_var(counters);
> 
> Confused.  The whole point behind local_t is that we can do
> atomic-wrt-interrupts inc and dec on them.
> 
> Consequently, as atomic-wrt-interrupts means atomic-wrt-preemption, there
> is no need to do a preempt_disable() around local_inc() and local_dec().
> 

Hi Andrew,

Not exactly : the increment operation is atomic, but not the selection of the
local variable. local_inc(&__get_cpu_var()) implies the following sequence 
of operations :

1 - Get the variable copy corresponding to the currently running CPU.
2 - atomically increment the variable.

It would be wrong to be scheduled on another CPU between 1 and 2, because the
atomic increment should only be done by the CPU "owner" of the local variable,
as the local atomic increment is not atomic wrt other CPUs.

Mathieu

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation
  2007-01-09 22:06       ` Mathieu Desnoyers
@ 2007-01-09 22:11         ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2007-01-09 22:11 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Pavel Machek, linux-kernel, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

On Tue, 9 Jan 2007 17:06:16 -0500
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Andrew Morton (akpm@osdl.org) wrote:
> > On Mon, 8 Jan 2007 22:14:46 -0500
> > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > > +* How to use local atomic operations
> > > +
> > > +#include <linux/percpu.h>
> > > +#include <asm/local.h>
> > > +
> > > +static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
> > > +
> > > +
> > > +* Counting
> > > +
> > > +In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
> > > +operations : it makes sure that preemption is disabled around write access to
> > > +the per cpu variable. For instance :
> > > +
> > > +	local_inc(&get_cpu_var(counters));
> > > +	put_cpu_var(counters);
> > 
> > Confused.  The whole point behind local_t is that we can do
> > atomic-wrt-interrupts inc and dec on them.
> > 
> > Consequently, as atomic-wrt-interrupts means atomic-wrt-preemption, there
> > is no need to do a preempt_disable() around local_inc() and local_dec().
> > 
> 
> Hi Andrew,
> 
> Not exactly : the increment operation is atomic, but not the selection of the
> local variable. local_inc(&__get_cpu_var()) implies the following sequence 
> of operations :
> 
> 1 - Get the variable copy corresponding to the currently running CPU.
> 2 - atomically increment the variable.
> 
> It would be wrong to be scheduled on another CPU between 1 and 2, because the
> atomic increment should only be done by the CPU "owner" of the local variable,
> as the local atomic increment is not atomic wrt other CPUs.
> 

doh.  I knew that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation
  2007-01-09 21:01     ` Andrew Morton
  2007-01-09 22:06       ` Mathieu Desnoyers
@ 2007-01-09 22:38       ` Pavel Machek
  1 sibling, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2007-01-09 22:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

On Tue 2007-01-09 13:01:10, Andrew Morton wrote:
> On Mon, 8 Jan 2007 22:14:46 -0500
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > +* How to use local atomic operations
> > +
> > +#include <linux/percpu.h>
> > +#include <asm/local.h>
> > +
> > +static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
> > +
> > +
> > +* Counting
> > +
> > +In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
> > +operations : it makes sure that preemption is disabled around write access to
> > +the per cpu variable. For instance :
> > +
> > +	local_inc(&get_cpu_var(counters));
> > +	put_cpu_var(counters);
> 
> Confused.  The whole point behind local_t is that we can do
> atomic-wrt-interrupts inc and dec on them.

Could we get this short of two line description into the Doc/ file? It
talks about how to implement them, mentions LOCK prefixes unlikely to
be present on non-i386, but does not tell me what they guarantee...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation
  2007-01-09  3:14   ` [PATCH] local_t : Documentation Mathieu Desnoyers
  2007-01-09 21:01     ` Andrew Morton
@ 2007-01-09 22:41     ` Pavel Machek
  2007-01-09 23:21       ` [PATCH] local_t : Documentation - update Mathieu Desnoyers
  1 sibling, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2007-01-09 22:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

Hi!

> > > These patches extend and standardise local_t operations on each architectures,
> > > allowing a rich set of atomic operations to be done on per-cpu data with
> > > minimal performance impact. On some architectures, there seems to be no
> > > difference between the SMP and UP operation (same memory barriers, same
> > > LOCking), local.h simply includes asm-generic/local.h, which removes duplicated
> > > code.
> > 
> > Could you provide some Documentation/? Knowing when local_t can be
> > used is kind-of important.
> 
> Hi Pavel,
> 
> Thanks for this appropriate comment. I totally agree that there is a need for
> documentation about how local_t variables should be used. Here is the patch
> that adds Documentation/local_ops.txt. Comments are welcome.

AFAICT this fails to mention... Is local_t as big as int? As big as
long? Or perhaps smaller because high bits may be needed for locking?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation - update
  2007-01-09 22:41     ` Pavel Machek
@ 2007-01-09 23:21       ` Mathieu Desnoyers
  2007-01-09 23:45         ` Pavel Machek
  0 siblings, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-09 23:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

* Pavel Machek (pavel@ucw.cz) wrote:
> Hi!
> 
> AFAICT this fails to mention... Is local_t as big as int? As big as
> long? Or perhaps smaller because high bits may be needed for locking?
> 
> 									Pavel
> 

Hi Pavel,

Here is an update that adds the information you mentionned in this reply and the
one to Andrew. Thanks for the comments.

Mathieu


index dfeec94..bd854b3 100644
--- a/Documentation/local_ops.txt
+++ b/Documentation/local_ops.txt
@@ -22,6 +22,13 @@ require disabling interrupts to protect from interrupt handlers and it permits
 coherent counters in NMI handlers. It is especially useful for tracing purposes
 and for various performance monitoring counters.
 
+Local atomic operations only guarantee variable modification atomicity wrt the
+CPU which owns the data. Therefore, care must taken to make sure that only one
+CPU writes to the local_t data. This is done by using per cpu data and making
+sure that we modify it from within a preemption safe context. It is however
+permitted to read local_t data from any CPU : it will then appear to be written
+out of order wrt other memory writes on the owner CPU.
+
 
 * Implementation for a given architecture
 
@@ -31,6 +38,12 @@ i386 and x86_64) and any SMP sychronization barrier. If the architecture does
 not have a different behavior between SMP and UP, including asm-generic/local.h
 in your archtecture's local.h is sufficient.
 
+The local_t type is defined as an opaque signed long by embedding an
+atomic_long_t inside a structure. This is made so a cast from this type to a
+long fails. The definition looks like :
+
+typedef struct { atomic_long_t a; } local_t;
+
 
 * How to use local atomic operations
 
@@ -42,6 +55,8 @@ static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
 
 * Counting
 
+Counting is done on all the bits of a signed long.
+
 In preemptible context, use get_cpu_var() and put_cpu_var() around local atomic
 operations : it makes sure that preemption is disabled around write access to
 the per cpu variable. For instance :
-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation - update
  2007-01-09 23:21       ` [PATCH] local_t : Documentation - update Mathieu Desnoyers
@ 2007-01-09 23:45         ` Pavel Machek
  2007-01-10  0:39           ` Mathieu Desnoyers
  0 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2007-01-09 23:45 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

Hi!

> > AFAICT this fails to mention... Is local_t as big as int? As big as
> > long? Or perhaps smaller because high bits may be needed for locking?
> 
> Hi Pavel,
> 
> Here is an update that adds the information you mentionned in this reply and the
> one to Andrew. Thanks for the comments.
> 
> Mathieu
> 
> 
> index dfeec94..bd854b3 100644
> --- a/Documentation/local_ops.txt
> +++ b/Documentation/local_ops.txt
> @@ -22,6 +22,13 @@ require disabling interrupts to protect from interrupt handlers and it permits
>  coherent counters in NMI handlers. It is especially useful for tracing purposes
>  and for various performance monitoring counters.
>  
> +Local atomic operations only guarantee variable modification atomicity wrt the
> +CPU which owns the data. Therefore, care must taken to make sure that only one
> +CPU writes to the local_t data. This is done by using per cpu data and making
> +sure that we modify it from within a preemption safe context. It is however
> +permitted to read local_t data from any CPU : it will then appear to be written
> +out of order wrt other memory writes on the owner CPU.

So it is "one cpu may write, other cpus may read", and as big as
long. Are you sure obscure architectures (sparc?) can implement this
in useful way? ... maybe yes, unless obscure architecture exists where
second other cpu can see garbage data when first cpu writes into long
...?


								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] local_t : Documentation - update
  2007-01-09 23:45         ` Pavel Machek
@ 2007-01-10  0:39           ` Mathieu Desnoyers
  2007-01-10  1:06             ` [Ltt-dev] " Mathieu Desnoyers
  0 siblings, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-10  0:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, ltt-dev, systemtap, Douglas Niehaus,
	Martin J. Bligh, Thomas Gleixner

* Pavel Machek (pavel@ucw.cz) wrote:
> > index dfeec94..bd854b3 100644
> > --- a/Documentation/local_ops.txt
> > +++ b/Documentation/local_ops.txt
> > @@ -22,6 +22,13 @@ require disabling interrupts to protect from interrupt handlers and it permits
> >  coherent counters in NMI handlers. It is especially useful for tracing purposes
> >  and for various performance monitoring counters.
> >  
> > +Local atomic operations only guarantee variable modification atomicity wrt the
> > +CPU which owns the data. Therefore, care must taken to make sure that only one
> > +CPU writes to the local_t data. This is done by using per cpu data and making
> > +sure that we modify it from within a preemption safe context. It is however
> > +permitted to read local_t data from any CPU : it will then appear to be written
> > +out of order wrt other memory writes on the owner CPU.
> 
> So it is "one cpu may write, other cpus may read", and as big as
> long. Are you sure obscure architectures (sparc?) can implement this
> in useful way? ... maybe yes, unless obscure architecture exists where
> second other cpu can see garbage data when first cpu writes into long
> ...?
> 
> 

Sparc64 uses a memory barrier around the atomic operations in the SMP case
(see arch/sparc64/lib/atomic.S). The same is true for sparc. As I am not a sparc
expert, I left the asm-generic default behavior, but I think it should be safe
to implement local.S code derived from atomic.S to optimize the speed of the
local_t operations on sparc and sparc64. Can anyone confirm this ?

I don't know any architecture where an aligned memory access (read or write)
to a pointer type is not atomic. Size of longs are either 32 or 64 bits, but
always smaller than the pointer size (LLP64 has 32 bits longs, LP64 has 64
bits longs, ILP64 has 64 bits longs).

Mathieu

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Ltt-dev] [PATCH] local_t : Documentation - update
  2007-01-10  0:39           ` Mathieu Desnoyers
@ 2007-01-10  1:06             ` Mathieu Desnoyers
  0 siblings, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-10  1:06 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Greg Kroah-Hartman, linux-kernel, Martin J. Bligh,
	Christoph Hellwig, Douglas Niehaus, Ingo Molnar, ltt-dev,
	systemtap, Thomas Gleixner

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > So it is "one cpu may write, other cpus may read", and as big as
> > long. Are you sure obscure architectures (sparc?) can implement this
> > in useful way? ... maybe yes, unless obscure architecture exists where
> > second other cpu can see garbage data when first cpu writes into long
> > ...?
> > 
> > 
> 
> Sparc64 uses a memory barrier around the atomic operations in the SMP case
> (see arch/sparc64/lib/atomic.S). The same is true for sparc. As I am not a sparc
> expert, I left the asm-generic default behavior, but I think it should be safe
> to implement local.S code derived from atomic.S to optimize the speed of the
> local_t operations on sparc and sparc64. Can anyone confirm this ?
> 

Sorry for the self reply.. looking at arch/sparc/lib/atomic32.c tells me that
local.h could use its own version that would only disable interrupts without
taking any hashed spinlock.

sparc64 seems to be a saner architecture providing atomic operations wrt the
local CPU. A barrier-free version of arch/sparc64/lib/atomic.S would improve
performance.

Mathieu

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/10] local_t : powerpc
  2006-12-21  0:27 ` [PATCH 7/10] local_t : powerpc Mathieu Desnoyers
  2006-12-21  3:34   ` [Ltt-dev] " Mathieu Desnoyers
@ 2007-01-24  9:08   ` Paul Mackerras
  2007-01-24 10:43     ` Gabriel Paubert
  2007-01-24 17:00     ` Mathieu Desnoyers
  1 sibling, 2 replies; 29+ messages in thread
From: Paul Mackerras @ 2007-01-24  9:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, Martin J. Bligh, linuxppc-dev,
	Douglas Niehaus, ltt-dev, systemtap, Thomas Gleixner

Mathieu Desnoyers writes:

> +static __inline__ int local_dec_if_positive(local_t *l)
> +{
> +	int t;
> +
> +	__asm__ __volatile__(
> +"1:	lwarx	%0,0,%1		# local_dec_if_positive\n\
> +	addic.	%0,%0,-1\n\
> +	blt-	2f\n"
> +	PPC405_ERR77(0,%1)
> +"	stwcx.	%0,0,%1\n\
> +	bne-	1b"

This has the same bugs that we fixed recently in atomic_dec_if_positive;
first, on 64-bit machines, the lwarx will zero-extend the word loaded
from memory, and so the result of the addic will be negative only if
the word was originally 0.  Secondly, even on 32-bit machines,
0x80000000 will be considered positive since decrementing it gives
0x7fffffff, which is positive.

> +/* Use these for per-cpu local_t variables: on some archs they are
> + * much more efficient than these naive implementations.  Note they take
> + * a variable, not an address.
> + *
> + * This could be done better if we moved the per cpu data directly
> + * after GS.
> + */

What's "GS"?  Does this comment really apply on powerpc?

Paul.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/10] local_t : powerpc
  2007-01-24  9:08   ` Paul Mackerras
@ 2007-01-24 10:43     ` Gabriel Paubert
  2007-01-24 17:00     ` Mathieu Desnoyers
  1 sibling, 0 replies; 29+ messages in thread
From: Gabriel Paubert @ 2007-01-24 10:43 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Mathieu Desnoyers, Andrew Morton, Greg Kroah-Hartman,
	linux-kernel, Martin J. Bligh, Christoph Hellwig, linuxppc-dev,
	Douglas Niehaus, Ingo Molnar, ltt-dev, systemtap,
	Thomas Gleixner

On Wed, Jan 24, 2007 at 08:08:12PM +1100, Paul Mackerras wrote:
> Mathieu Desnoyers writes:
> 
> > +static __inline__ int local_dec_if_positive(local_t *l)
> > +{
> > +	int t;
> > +
> > +	__asm__ __volatile__(
> > +"1:	lwarx	%0,0,%1		# local_dec_if_positive\n\
> > +	addic.	%0,%0,-1\n\
> > +	blt-	2f\n"
> > +	PPC405_ERR77(0,%1)
> > +"	stwcx.	%0,0,%1\n\
> > +	bne-	1b"
> 
> This has the same bugs that we fixed recently in atomic_dec_if_positive;
> first, on 64-bit machines, the lwarx will zero-extend the word loaded
> from memory, and so the result of the addic will be negative only if
> the word was originally 0.  Secondly, even on 32-bit machines,
> 0x80000000 will be considered positive since decrementing it gives
> 0x7fffffff, which is positive.
> 
> > +/* Use these for per-cpu local_t variables: on some archs they are
> > + * much more efficient than these naive implementations.  Note they take
> > + * a variable, not an address.
> > + *
> > + * This could be done better if we moved the per cpu data directly
> > + * after GS.
> > + */
> 
> What's "GS"?  Does this comment really apply on powerpc?
> 

1) It's an (application visible) i386/x86_64 segment register used
   to make memory addressing more confusing.
2) Because of 1), obviously not ;-)

	Gabriel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/10] local_t : powerpc
  2007-01-24  9:08   ` Paul Mackerras
  2007-01-24 10:43     ` Gabriel Paubert
@ 2007-01-24 17:00     ` Mathieu Desnoyers
  1 sibling, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2007-01-24 17:00 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linux-kernel, Andrew Morton, Ingo Molnar, Greg Kroah-Hartman,
	Christoph Hellwig, Martin J. Bligh, linuxppc-dev,
	Douglas Niehaus, ltt-dev, systemtap, Thomas Gleixner

* Paul Mackerras (paulus@samba.org) wrote:
> Mathieu Desnoyers writes:
> 
> > +static __inline__ int local_dec_if_positive(local_t *l)
> > +{
> > +	int t;
> > +
> > +	__asm__ __volatile__(
> > +"1:	lwarx	%0,0,%1		# local_dec_if_positive\n\
> > +	addic.	%0,%0,-1\n\
> > +	blt-	2f\n"
> > +	PPC405_ERR77(0,%1)
> > +"	stwcx.	%0,0,%1\n\
> > +	bne-	1b"
> 
> This has the same bugs that we fixed recently in atomic_dec_if_positive;
> first, on 64-bit machines, the lwarx will zero-extend the word loaded
> from memory, and so the result of the addic will be negative only if
> the word was originally 0.  Secondly, even on 32-bit machines,
> 0x80000000 will be considered positive since decrementing it gives
> 0x7fffffff, which is positive.
> 
Hi Paul,

Thanks, will fix.

Mathieu

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-01-24 17:05 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-21  0:15 [PATCH 0/10] local_t : adding and standardising atomic primitives Mathieu Desnoyers
2006-12-21  0:20 ` [PATCH 1/10] local_t : architecture agnostic Mathieu Desnoyers
2006-12-21  0:21 ` [PATCH 2/10] local_t : alpha Mathieu Desnoyers
2006-12-21  0:22 ` [PATCH 3/10] local_t : i386 Mathieu Desnoyers
2006-12-21 19:44   ` [Ltt-dev] [PATCH 3/10] local_t : i386, local_add_return fix Mathieu Desnoyers
2006-12-21  0:23 ` [PATCH 4/10] local_t : ia64 Mathieu Desnoyers
2006-12-21  0:25 ` [PATCH 5/10] " Mathieu Desnoyers
2006-12-21 14:04   ` [Ltt-dev] [PATCH 5/10] local_t : MIPS Mathieu Desnoyers
2006-12-21  0:25 ` [PATCH 6/10] local_t : parisc Mathieu Desnoyers
2006-12-21  0:27 ` [PATCH 7/10] local_t : powerpc Mathieu Desnoyers
2006-12-21  3:34   ` [Ltt-dev] " Mathieu Desnoyers
2007-01-24  9:08   ` Paul Mackerras
2007-01-24 10:43     ` Gabriel Paubert
2007-01-24 17:00     ` Mathieu Desnoyers
2006-12-21  0:27 ` [PATCH 8/10] local_t : s390 Mathieu Desnoyers
2006-12-21  0:28 ` [PATCH 9/10] local_t : sparc64 Mathieu Desnoyers
2006-12-21  0:29 ` [PATCH 10/10] local_t : x86_64 Mathieu Desnoyers
2006-12-21 19:46   ` [Ltt-dev] [PATCH 10/10] local_t : x86_64 : local_add_return Mathieu Desnoyers
2006-12-23  9:33 ` [PATCH 0/10] local_t : adding and standardising atomic primitives Pavel Machek
2007-01-09  3:14   ` [PATCH] local_t : Documentation Mathieu Desnoyers
2007-01-09 21:01     ` Andrew Morton
2007-01-09 22:06       ` Mathieu Desnoyers
2007-01-09 22:11         ` Andrew Morton
2007-01-09 22:38       ` Pavel Machek
2007-01-09 22:41     ` Pavel Machek
2007-01-09 23:21       ` [PATCH] local_t : Documentation - update Mathieu Desnoyers
2007-01-09 23:45         ` Pavel Machek
2007-01-10  0:39           ` Mathieu Desnoyers
2007-01-10  1:06             ` [Ltt-dev] " Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).