LKML Archive on lore.kernel.org
* [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()"
@ 2018-04-10  6:34 Christophe Leroy
  2018-05-16 23:10 ` Paul Mackerras
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Christophe Leroy @ 2018-04-10  6:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Scott Wood
  Cc: linux-kernel, linuxppc-dev, Shile Zhang

This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe.

That commit was pointless: csum_add() sums two 32-bit values, so the
sum is at most 0x1fffffffe. Adding the upper part (1) to the lower
part (0xfffffffe) then gives 0xffffffff, which does not carry.
No smaller sum carries either.
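
The bound argued above can be checked with a small user-space C sketch.
The names fold_once() and check_single_fold_bound() are hypothetical, and
plain uint64_t stands in for the kernel's u64; this only illustrates the
arithmetic and is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one end-around-carry fold of a 64-bit sum,
 * kept in 64 bits so that any carry out of bit 31 stays visible. */
static uint64_t fold_once(uint64_t sum)
{
	return (sum & 0xffffffffULL) + (sum >> 32);
}

/* Check the worst case from the text and the smallest sum that carries. */
static void check_single_fold_bound(void)
{
	/* Largest possible sum of two 32-bit values. */
	uint64_t max_sum = (uint64_t)0xffffffffu + 0xffffffffu;

	assert(max_sum == 0x1fffffffeULL);
	/* Upper part (1) + lower part (0xfffffffe): still fits in 32 bits. */
	assert(fold_once(max_sum) == 0xffffffffULL);
	/* Smallest sum with a carry: upper part 1, lower part 0. */
	assert(fold_once(0x100000000ULL) == 1);
}
```

Since the upper part of the sum is never more than 1 and the lower part
is then never more than 0xfffffffe, a second fold would have nothing
left to add.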

Besides being useless, that commit also defeats the whole purpose of
having an arch-specific inline csum_add(), because the resulting code
is even worse than what the generic implementation of csum_add()
produces:

0000000000000240 <.csum_add>:
 240:	38 00 ff ff 	li      r0,-1
 244:	7c 84 1a 14 	add     r4,r4,r3
 248:	78 00 00 20 	clrldi  r0,r0,32
 24c:	78 89 00 22 	rldicl  r9,r4,32,32
 250:	7c 80 00 38 	and     r0,r4,r0
 254:	7c 09 02 14 	add     r0,r9,r0
 258:	78 09 00 22 	rldicl  r9,r0,32,32
 25c:	7c 00 4a 14 	add     r0,r0,r9
 260:	78 03 00 20 	clrldi  r3,r0,32
 264:	4e 80 00 20 	blr

In comparison, the generic implementation of csum_add() gives:

0000000000000290 <.csum_add>:
 290:	7c 63 22 14 	add     r3,r3,r4
 294:	7f 83 20 40 	cmplw   cr7,r3,r4
 298:	7c 10 10 26 	mfocrf  r0,1
 29c:	54 00 ef fe 	rlwinm  r0,r0,29,31,31
 2a0:	7c 60 1a 14 	add     r3,r0,r3
 2a4:	78 63 00 20 	clrldi  r3,r3,32
 2a8:	4e 80 00 20 	blr

And the PPC64 implementation restored by this revert gives:

0000000000000240 <.csum_add>:
 240:	7c 84 1a 14 	add     r4,r4,r3
 244:	78 80 00 22 	rldicl  r0,r4,32,32
 248:	7c 80 22 14 	add     r4,r0,r4
 24c:	78 83 00 20 	clrldi  r3,r4,32
 250:	4e 80 00 20 	blr
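
For reference, the three variants compared above can be sketched in
user-space C roughly as follows. This is an illustration under
assumptions, not the kernel source: uint32_t/uint64_t stand in for
__wsum/u64, the function names are hypothetical, and fold_twice() is
modeled on what the first listing appears to do (two fold steps).

```c
#include <assert.h>
#include <stdint.h>

/* Double fold, assumed shape of the from64to32() helper. */
static uint32_t fold_twice(uint64_t x)
{
	x = (x & 0xffffffffULL) + (x >> 32);
	x = (x & 0xffffffffULL) + (x >> 32);
	return (uint32_t)x;
}

/* Variant introduced by the commit being reverted. */
static uint32_t csum_add_double_fold(uint32_t csum, uint32_t addend)
{
	return fold_twice((uint64_t)csum + addend);
}

/* Generic variant: 32-bit add, then add the carry back in. */
static uint32_t csum_add_generic(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Variant restored by this patch: a single fold. */
static uint32_t csum_add_single_fold(uint32_t csum, uint32_t addend)
{
	uint64_t res = (uint64_t)csum + addend;

	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Assert that all three variants agree for one pair of inputs. */
static void check_variants_agree(uint32_t a, uint32_t b)
{
	uint32_t want = csum_add_generic(a, b);

	assert(csum_add_double_fold(a, b) == want);
	assert(csum_add_single_fold(a, b) == want);
}
```

All three compute the same end-around-carry sum; they differ only in
how much code the compiler has to emit for them.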

Fixes: 6ad966d7303b7 ("powerpc/64: Fix checksum folding in csum_add()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/checksum.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 842124b199b5..4e63787dc3be 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -112,7 +112,7 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
 
 #ifdef __powerpc64__
 	res += (__force u64)addend;
-	return (__force __wsum) from64to32(res);
+	return (__force __wsum)((u32)res + (res >> 32));
 #else
 	asm("addc %0,%0,%1;"
 	    "addze %0,%0;"
-- 
2.13.3


* Re: [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()"
  2018-04-10  6:34 [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()" Christophe Leroy
@ 2018-05-16 23:10 ` Paul Mackerras
  2018-05-17 14:38 ` Segher Boessenkool
  2018-05-21 10:01 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Paul Mackerras @ 2018-05-16 23:10 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Michael Ellerman, Scott Wood,
	linux-kernel, linuxppc-dev, Shile Zhang

On Tue, Apr 10, 2018 at 08:34:37AM +0200, Christophe Leroy wrote:
> This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe.
> 
> That commit was pointless: csum_add() sums two 32-bit values, so the
> sum is at most 0x1fffffffe. Adding the upper part (1) to the lower
> part (0xfffffffe) then gives 0xffffffff, which does not carry.
> No smaller sum carries either.
>
> Besides being useless, that commit also defeats the whole purpose of
> having an arch-specific inline csum_add(), because the resulting code
> is even worse than what the generic implementation of csum_add()
> produces:
> 
> 0000000000000240 <.csum_add>:
>  240:	38 00 ff ff 	li      r0,-1
>  244:	7c 84 1a 14 	add     r4,r4,r3
>  248:	78 00 00 20 	clrldi  r0,r0,32
>  24c:	78 89 00 22 	rldicl  r9,r4,32,32
>  250:	7c 80 00 38 	and     r0,r4,r0
>  254:	7c 09 02 14 	add     r0,r9,r0
>  258:	78 09 00 22 	rldicl  r9,r0,32,32
>  25c:	7c 00 4a 14 	add     r0,r0,r9
>  260:	78 03 00 20 	clrldi  r3,r0,32
>  264:	4e 80 00 20 	blr
> 
> In comparison, the generic implementation of csum_add() gives:
> 
> 0000000000000290 <.csum_add>:
>  290:	7c 63 22 14 	add     r3,r3,r4
>  294:	7f 83 20 40 	cmplw   cr7,r3,r4
>  298:	7c 10 10 26 	mfocrf  r0,1
>  29c:	54 00 ef fe 	rlwinm  r0,r0,29,31,31
>  2a0:	7c 60 1a 14 	add     r3,r0,r3
>  2a4:	78 63 00 20 	clrldi  r3,r3,32
>  2a8:	4e 80 00 20 	blr
> 
> And the reverted implementation for PPC64 gives:
> 
> 0000000000000240 <.csum_add>:
>  240:	7c 84 1a 14 	add     r4,r4,r3
>  244:	78 80 00 22 	rldicl  r0,r4,32,32
>  248:	7c 80 22 14 	add     r4,r0,r4
>  24c:	78 83 00 20 	clrldi  r3,r4,32
>  250:	4e 80 00 20 	blr
> 
> Fixes: 6ad966d7303b7 ("powerpc/64: Fix checksum folding in csum_add()")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Seems I was right first time... :)

Acked-by: Paul Mackerras <paulus@ozlabs.org>


* Re: [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()"
  2018-04-10  6:34 [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()" Christophe Leroy
  2018-05-16 23:10 ` Paul Mackerras
@ 2018-05-17 14:38 ` Segher Boessenkool
  2018-05-21 10:01 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2018-05-17 14:38 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Scott Wood, Shile Zhang, linuxppc-dev, linux-kernel

On Tue, Apr 10, 2018 at 08:34:37AM +0200, Christophe Leroy wrote:
> This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe.
> 
> That commit was pointless: csum_add() sums two 32-bit values, so the
> sum is at most 0x1fffffffe. Adding the upper part (1) to the lower
> part (0xfffffffe) then gives 0xffffffff, which does not carry.
> No smaller sum carries either.
>
> Besides being useless, that commit also defeats the whole purpose of
> having an arch-specific inline csum_add(), because the resulting code
> is even worse than what the generic implementation of csum_add()
> produces:

:-)

> And the reverted implementation for PPC64 gives:
> 
> 0000000000000240 <.csum_add>:
>  240:	7c 84 1a 14 	add     r4,r4,r3
>  244:	78 80 00 22 	rldicl  r0,r4,32,32
>  248:	7c 80 22 14 	add     r4,r0,r4
>  24c:	78 83 00 20 	clrldi  r3,r4,32
>  250:	4e 80 00 20 	blr

If you really, really, *really* want to optimise this you could
make it:

	rldimi r3,r3,32,0
	rldimi r4,r4,32,0
	add r3,r3,r4
	srdi r3,r3,32
	blr

which is the same size, but has a shorter critical path length.  Very
analogous to how you fold 64->32.
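
In C terms, the idea behind that sequence can be sketched as below. This
is an illustration only: dup32() and csum_add_dup() are hypothetical
names, uint32_t stands in for __wsum, and nothing here claims a compiler
would emit exactly the instructions above for it.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the low 32 bits of x into the high 32 bits as well. */
static uint64_t dup32(uint32_t x)
{
	return ((uint64_t)x << 32) | x;
}

static uint32_t csum_add_dup(uint32_t csum, uint32_t addend)
{
	/*
	 * The low halves add normally and their carry-out lands in
	 * bit 32, so the high half ends up as csum + addend + carry
	 * (mod 2^32). The carry out of bit 63 is simply dropped.
	 */
	uint64_t sum = dup32(csum) + dup32(addend);

	return (uint32_t)(sum >> 32);
}

/* Spot-check the carry and no-carry cases. */
static void check_csum_add_dup(void)
{
	assert(csum_add_dup(0xffffffffu, 1u) == 1u);
	assert(csum_add_dup(0xffffffffu, 0xffffffffu) == 0xffffffffu);
	assert(csum_add_dup(0x12345678u, 0x9abcdef0u) == 0xacf13568u);
}
```

The two duplications do not depend on each other, so the dependent chain
is duplicate, add, shift, rather than add, rotate, add, clear.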


Segher


* Re: Revert "powerpc/64: Fix checksum folding in csum_add()"
  2018-04-10  6:34 [PATCH] Revert "powerpc/64: Fix checksum folding in csum_add()" Christophe Leroy
  2018-05-16 23:10 ` Paul Mackerras
  2018-05-17 14:38 ` Segher Boessenkool
@ 2018-05-21 10:01 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2018-05-21 10:01 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, Scott Wood
  Cc: Shile Zhang, linuxppc-dev, linux-kernel

On Tue, 2018-04-10 at 06:34:37 UTC, Christophe Leroy wrote:
> This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe.
> 
> That commit was pointless: csum_add() sums two 32-bit values, so the
> sum is at most 0x1fffffffe. Adding the upper part (1) to the lower
> part (0xfffffffe) then gives 0xffffffff, which does not carry.
> No smaller sum carries either.
>
> Besides being useless, that commit also defeats the whole purpose of
> having an arch-specific inline csum_add(), because the resulting code
> is even worse than what the generic implementation of csum_add()
> produces:
> 
> 0000000000000240 <.csum_add>:
>  240:	38 00 ff ff 	li      r0,-1
>  244:	7c 84 1a 14 	add     r4,r4,r3
>  248:	78 00 00 20 	clrldi  r0,r0,32
>  24c:	78 89 00 22 	rldicl  r9,r4,32,32
>  250:	7c 80 00 38 	and     r0,r4,r0
>  254:	7c 09 02 14 	add     r0,r9,r0
>  258:	78 09 00 22 	rldicl  r9,r0,32,32
>  25c:	7c 00 4a 14 	add     r0,r0,r9
>  260:	78 03 00 20 	clrldi  r3,r0,32
>  264:	4e 80 00 20 	blr
> 
> In comparison, the generic implementation of csum_add() gives:
> 
> 0000000000000290 <.csum_add>:
>  290:	7c 63 22 14 	add     r3,r3,r4
>  294:	7f 83 20 40 	cmplw   cr7,r3,r4
>  298:	7c 10 10 26 	mfocrf  r0,1
>  29c:	54 00 ef fe 	rlwinm  r0,r0,29,31,31
>  2a0:	7c 60 1a 14 	add     r3,r0,r3
>  2a4:	78 63 00 20 	clrldi  r3,r3,32
>  2a8:	4e 80 00 20 	blr
> 
> And the reverted implementation for PPC64 gives:
> 
> 0000000000000240 <.csum_add>:
>  240:	7c 84 1a 14 	add     r4,r4,r3
>  244:	78 80 00 22 	rldicl  r0,r4,32,32
>  248:	7c 80 22 14 	add     r4,r0,r4
>  24c:	78 83 00 20 	clrldi  r3,r4,32
>  250:	4e 80 00 20 	blr
> 
> Fixes: 6ad966d7303b7 ("powerpc/64: Fix checksum folding in csum_add()")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Acked-by: Paul Mackerras <paulus@ozlabs.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/96f391cf40ee5c9201cc7b55abe390

cheers

