LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2] powerpc/32: Add support for out-of-line static calls
@ 2021-08-31 13:12 Christophe Leroy
2021-08-31 14:00 ` Peter Zijlstra
0 siblings, 1 reply; 3+ messages in thread
From: Christophe Leroy @ 2021-08-31 13:12 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linux-kernel, linuxppc-dev, Peter Zijlstra, Josh Poimboeuf,
Jason Baron, Steven Rostedt, Ard Biesheuvel
Add support for out-of-line static calls on PPC32. This change
improve performance of calls to global function pointers by
using direct calls instead of indirect calls.
The trampoline is initialy populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.
In order to cater with parallele execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all time.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that the target address is stored
just after the 'bctr' and loaded from there with a single instruction.
Then, depending on the target distance, arch_static_call_transform()
will either replace the first instruction by a direct 'bl <target>' or
'nop' in order to have the trampoline fall through the long jump
sequence.
Performancewise the long jump sequence is probably not better than
the indirect calls set by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: With most
configurations the kernel size is far below 32 Mbytes so only
modules may happen to be too far. And even modules are likely to
be close enough as they are allocated below the kernel core and
as close as possible of the kernel text.
static_call selftest is running successfully with this change.
With this patch, __do_irq() has the following sequence to trace
irq entries:
c0004a00 <__SCT__tp_func_irq_entry>:
c0004a00: 48 00 00 e0 b c0004ae0 <__traceiter_irq_entry>
c0004a04: 3d 80 c0 00 lis r12,-16384
c0004a08: 81 8c 4a 14 lwz r12,18964(r12)
c0004a0c: 7d 89 03 a6 mtctr r12
c0004a10: 4e 80 04 20 bctr
c0004a14: 00 00 00 00 .long 0x0
c0004a18: 60 00 00 00 nop
c0004a1c: 60 00 00 00 nop
...
c0005654 <__do_irq>:
...
c0005664: 7c 7f 1b 78 mr r31,r3
...
c00056a0: 81 22 00 00 lwz r9,0(r2)
c00056a4: 39 29 00 01 addi r9,r9,1
c00056a8: 91 22 00 00 stw r9,0(r2)
c00056ac: 3d 20 c0 af lis r9,-16209
c00056b0: 81 29 74 cc lwz r9,29900(r9)
c00056b4: 2c 09 00 00 cmpwi r9,0
c00056b8: 41 82 00 10 beq c00056c8 <__do_irq+0x74>
c00056bc: 80 69 00 04 lwz r3,4(r9)
c00056c0: 7f e4 fb 78 mr r4,r31
c00056c4: 4b ff f3 3d bl c0004a00 <__SCT__tp_func_irq_entry>
Before this patch, __do_irq() was doing the following to trace irq
entries:
c0005700 <__do_irq>:
...
c0005710: 7c 7e 1b 78 mr r30,r3
...
c000574c: 93 e1 00 0c stw r31,12(r1)
c0005750: 81 22 00 00 lwz r9,0(r2)
c0005754: 39 29 00 01 addi r9,r9,1
c0005758: 91 22 00 00 stw r9,0(r2)
c000575c: 3d 20 c0 af lis r9,-16209
c0005760: 83 e9 f4 cc lwz r31,-2868(r9)
c0005764: 2c 1f 00 00 cmpwi r31,0
c0005768: 41 82 00 24 beq c000578c <__do_irq+0x8c>
c000576c: 81 3f 00 00 lwz r9,0(r31)
c0005770: 80 7f 00 04 lwz r3,4(r31)
c0005774: 7d 29 03 a6 mtctr r9
c0005778: 7f c4 f3 78 mr r4,r30
c000577c: 4e 80 04 21 bctrl
c0005780: 85 3f 00 0c lwzu r9,12(r31)
c0005784: 2c 09 00 00 cmpwi r9,0
c0005788: 40 82 ff e4 bne c000576c <__do_irq+0x6c>
Behind the fact of now using a direct 'bl' instead of a
'load/mtctr/bctr' sequence, we can also see that we get one less
register on the stack.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Use indirect load in long jump sequence to cater with parallele execution and preemption.
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/static_call.h | 25 ++++++++++++++++++
arch/powerpc/kernel/Makefile | 2 +-
arch/powerpc/kernel/static_call.c | 36 ++++++++++++++++++++++++++
4 files changed, 63 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/include/asm/static_call.h
create mode 100644 arch/powerpc/kernel/static_call.c
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36b72d972568..a0fe69d8ec83 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -247,6 +247,7 @@ config PPC
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
+ select HAVE_STATIC_CALL if PPC32
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
new file mode 100644
index 000000000000..2402c6d32439
--- /dev/null
+++ b/arch/powerpc/include/asm/static_call.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STATIC_CALL_H
+#define _ASM_POWERPC_STATIC_CALL_H
+
+#define __POWERPC_SCT(name, inst) \
+ asm(".pushsection .text, \"ax\" \n" \
+ ".align 5 \n" \
+ ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+ STATIC_CALL_TRAMP_STR(name) ": \n" \
+ inst " \n" \
+ " lis 12,1f@ha \n" \
+ " lwz 12,1f@l(12) \n" \
+ " mtctr 12 \n" \
+ " bctr \n" \
+ "1: .long 0 \n" \
+ " nop \n" \
+ " nop \n" \
+ ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
+ ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
+ ".popsection \n")
+
+#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) __POWERPC_SCT(name, "b " #func)
+#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) __POWERPC_SCT(name, "blr")
+
+#endif /* _ASM_POWERPC_STATIC_CALL_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 7be36c1e1db6..0e3640e14eb1 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -106,7 +106,7 @@ extra-y += vmlinux.lds
obj-$(CONFIG_RELOCATABLE) += reloc_$(BITS).o
-obj-$(CONFIG_PPC32) += entry_32.o setup_32.o early_32.o
+obj-$(CONFIG_PPC32) += entry_32.o setup_32.o early_32.o static_call.o
obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_BOOTX_TEXT) += btext.o
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
new file mode 100644
index 000000000000..e5e78205ccb4
--- /dev/null
+++ b/arch/powerpc/kernel/static_call.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/memory.h>
+#include <linux/static_call.h>
+
+#include <asm/code-patching.h>
+
+void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
+{
+ int err;
+ unsigned long target = (long)func;
+ bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
+
+ if (!tramp)
+ return;
+
+ mutex_lock(&text_mutex);
+
+ if (func && !is_short) {
+ err = patch_instruction(tramp + 20, ppc_inst(target));
+ if (err)
+ goto out;
+ }
+
+ if (!func)
+ err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
+ else if (is_short)
+ err = patch_branch(tramp, target, 0);
+ else
+ err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
+out:
+ mutex_unlock(&text_mutex);
+
+ if (err)
+ panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
+}
+EXPORT_SYMBOL_GPL(arch_static_call_transform);
--
2.25.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] powerpc/32: Add support for out-of-line static calls
2021-08-31 13:12 [PATCH v2] powerpc/32: Add support for out-of-line static calls Christophe Leroy
@ 2021-08-31 14:00 ` Peter Zijlstra
2022-03-11 14:56 ` Christophe Leroy
0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2021-08-31 14:00 UTC (permalink / raw)
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
linux-kernel, linuxppc-dev, Josh Poimboeuf, Jason Baron,
Steven Rostedt, Ard Biesheuvel
On Tue, Aug 31, 2021 at 01:12:26PM +0000, Christophe Leroy wrote:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 36b72d972568..a0fe69d8ec83 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -247,6 +247,7 @@ config PPC
> select HAVE_SOFTIRQ_ON_OWN_STACK
> select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
> select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> + select HAVE_STATIC_CALL if PPC32
> select HAVE_SYSCALL_TRACEPOINTS
> select HAVE_VIRT_CPU_ACCOUNTING
> select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> new file mode 100644
> index 000000000000..2402c6d32439
> --- /dev/null
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_STATIC_CALL_H
> +#define _ASM_POWERPC_STATIC_CALL_H
> +
> +#define __POWERPC_SCT(name, inst) \
> + asm(".pushsection .text, \"ax\" \n" \
> + ".align 5 \n" \
> + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> + STATIC_CALL_TRAMP_STR(name) ": \n" \
> + inst " \n" \
> + " lis 12,1f@ha \n" \
> + " lwz 12,1f@l(12) \n" \
> + " mtctr 12 \n" \
> + " bctr \n" \
> + "1: .long 0 \n" \
> + " nop \n" \
> + " nop \n" \
> + ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
> + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> + ".popsection \n")
> +
> +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) __POWERPC_SCT(name, "b " #func)
> +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) __POWERPC_SCT(name, "blr")
> +
> +#endif /* _ASM_POWERPC_STATIC_CALL_H */
> diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> new file mode 100644
> index 000000000000..e5e78205ccb4
> --- /dev/null
> +++ b/arch/powerpc/kernel/static_call.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/memory.h>
> +#include <linux/static_call.h>
> +
> +#include <asm/code-patching.h>
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> +{
> + int err;
> + unsigned long target = (long)func;
> + bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
> +
> + if (!tramp)
> + return;
> +
> + mutex_lock(&text_mutex);
> +
> + if (func && !is_short) {
> + err = patch_instruction(tramp + 20, ppc_inst(target));
> + if (err)
> + goto out;
> + }
> +
> + if (!func)
> + err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> + else if (is_short)
> + err = patch_branch(tramp, target, 0);
> + else
> + err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
> +out:
> + mutex_unlock(&text_mutex);
> +
> + if (err)
> + panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
> +}
> +EXPORT_SYMBOL_GPL(arch_static_call_transform);
Yes, this should work nicely!
Since you have the two nop's at the end, you could frob in an
optimization for __static_call_return0 without too much issue.
Replace the two nops with (excuse my ppc asm):
li r3, 0
blr
and augment arch_static_call_transform() with something like:
if (func == &__static_call_return0)
err = patch_branch(tramp, tramp+24, 0);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] powerpc/32: Add support for out-of-line static calls
2021-08-31 14:00 ` Peter Zijlstra
@ 2022-03-11 14:56 ` Christophe Leroy
0 siblings, 0 replies; 3+ messages in thread
From: Christophe Leroy @ 2022-03-11 14:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
linux-kernel, linuxppc-dev, Josh Poimboeuf, Jason Baron,
Steven Rostedt, Ard Biesheuvel
Hi Peter,
Le 31/08/2021 à 16:00, Peter Zijlstra a écrit :
> On Tue, Aug 31, 2021 at 01:12:26PM +0000, Christophe Leroy wrote:
>
> Yes, this should work nicely!
>
> Since you have the two nop's at the end, you could frob in an
> optimization for __static_call_return0 without too much issue.
>
> Replace the two nops with (excuse my ppc asm):
>
> li r3, 0
> blr
>
> and augment arch_static_call_transform() with something like:
>
> if (func == &__static_call_return0)
> err = patch_branch(tramp, tramp+24, 0);
I just discovered that we likely have an issue with the implementation
of that RET0 static call.
Looking at System.map I have:
c0004fc0 t __static_call_return0
c0011518 t __static_call_return0
c00d8160 t __static_call_return0
So when we do:
if (func == &__static_call_return0)
It is unlikely that we'll get the expected one.
I see __static_call_return0 is defined as 'static inline' in
include/linux/static_call.h
Any reason for not having it as a single global symbol instead ?
Thanks
Christophe
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-03-11 14:57 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31 13:12 [PATCH v2] powerpc/32: Add support for out-of-line static calls Christophe Leroy
2021-08-31 14:00 ` Peter Zijlstra
2022-03-11 14:56 ` Christophe Leroy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).