LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
@ 2019-04-27 10:06 Nicolai Stange
  2019-04-27 10:06 ` [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member Nicolai Stange
                   ` (3 more replies)
  0 siblings, 4 replies; 64+ messages in thread
From: Nicolai Stange @ 2019-04-27 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Nicolai Stange

Hi,

this series is the result of the discussion to the RFC patch found at [1].

The goal is to make x86' ftrace_int3_handler() not to simply skip over
the trapping instruction as this is problematic in the context of
the live patching consistency model. For details, c.f. the commit message
of [3/4] ("x86/ftrace: make ftrace_int3_handler() not to skip fops
invocation").

Everything is based on v5.1-rc6, please let me know in case you want me to
rebase on somehing else.

For x86_64, the live patching selftest added in [4/4] succeeds with this
series applied and fails without it. On 32 bits I only compile-tested.

checkpatch reports warnings about
- an overlong line in assembly -- I chose to ignore that
- MAINTAINERS perhaps needing updates due to the new files
  arch/x86/kernel/ftrace_int3_stubs.S and
  tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh.
  As the existing arch/x86/kernel/ftrace_{32,64}.S haven't got an
  explicit entry either, this one is probably Ok? The selftest
  definitely is.

Changes to the RFC patch:
- s/trampoline/stub/ to avoid confusion with the ftrace_ops' trampolines,
- use a fixed size stack kept in struct thread_info for passing the
  (adjusted) ->ip values from ftrace_int3_handler() to the stubs,
- provide one stub for each of the two possible jump targets and hardcode
  those,
- add the live patching selftest.

Thanks,

Nicolai

Nicolai Stange (4):
  x86/thread_info: introduce ->ftrace_int3_stack member
  ftrace: drop 'static' qualifier from ftrace_ops_list_func()
  x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  selftests/livepatch: add "ftrace a live patched function" test

 arch/x86/include/asm/thread_info.h                 | 11 +++
 arch/x86/kernel/Makefile                           |  1 +
 arch/x86/kernel/asm-offsets.c                      |  8 +++
 arch/x86/kernel/ftrace.c                           | 79 +++++++++++++++++++---
 arch/x86/kernel/ftrace_int3_stubs.S                | 61 +++++++++++++++++
 kernel/trace/ftrace.c                              |  8 +--
 tools/testing/selftests/livepatch/Makefile         |  3 +-
 .../livepatch/test-livepatch-vs-ftrace.sh          | 44 ++++++++++++
 8 files changed, 199 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/kernel/ftrace_int3_stubs.S
 create mode 100755 tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh

-- 
2.13.7


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-27 10:06 [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
@ 2019-04-27 10:06 ` Nicolai Stange
  2019-04-28 17:41   ` Andy Lutomirski
  2019-04-27 10:06 ` [PATCH 2/4] ftrace: drop 'static' qualifier from ftrace_ops_list_func() Nicolai Stange
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 64+ messages in thread
From: Nicolai Stange @ 2019-04-27 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Nicolai Stange

Before actually rewriting an insn, x86' DYNAMIC_FTRACE  implementation
places an int3 breakpoint on it. Currently, ftrace_int3_handler() simply
treats the insn in question as nop and advances %rip past it. An upcoming
patch will improve this by making the int3 trap handler emulate the call
insn. To this end, ftrace_int3_handler() will be made to change its iret
frame's ->ip to some stub which will then mimic the function call in the
original context.

Somehow the trapping ->ip address will have to get communicated from
ftrace_int3_handler() to these stubs though. Note that at any given point
in time, there can be at most four such call insn emulations pending:
namely at most one per "process", "irq", "softirq" and "nmi" context.

Introduce struct ftrace_int3_stack providing four entries for storing
the instruction pointer.

In principle, it could be made per-cpu, but this would require making
ftrace_int3_handler() to return with preemption disabled and to enable it
from those emulation stubs again only after the stack's top entry has been
consumed. I've been told that this would "break a lot of norms" and that
making this stack part of struct thread_info instead would be less fragile.
Follow this advice and add a struct ftrace_int3_stack instance to x86's
struct thread_info. Note that these stacks will get only rarely accessed
(only during ftrace's code modifications) and thus, cache line dirtying
won't have any significant impact on the neighbouring fields.

Initialization will take place implicitly through INIT_THREAD_INFO as per
the rules for missing elements in initializers. The memcpy() in
arch_dup_task_struct() will propagate the initial state properly, because
it's always run in process context and won't ever see a non-zero ->depth
value.

Finally, add the necessary bits to asm-offsets for making struct
ftrace_int3_stack accessible from assembly.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/include/asm/thread_info.h | 11 +++++++++++
 arch/x86/kernel/asm-offsets.c      |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e0eccbcb8447..83434a88cfbb 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -56,6 +56,17 @@ struct task_struct;
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	u32			status;		/* thread synchronous flags */
+#ifdef CONFIG_DYNAMIC_FTRACE
+	struct ftrace_int3_stack {
+		int depth;
+		/*
+		 * There can be at most one slot in use per context,
+		 * i.e. at most one for "normal", "irq", "softirq" and
+		 * "nmi" each.
+		 */
+		unsigned long slots[4];
+	} ftrace_int3_stack;
+#endif
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 168543d077d7..ca6ee24a0c6e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -105,4 +105,12 @@ static void __used common(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+	BLANK();
+	OFFSET(TASK_TI_ftrace_int3_depth, task_struct,
+	       thread_info.ftrace_int3_stack.depth);
+	OFFSET(TASK_TI_ftrace_int3_slots, task_struct,
+	       thread_info.ftrace_int3_stack.slots);
+#endif
 }
-- 
2.13.7


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 2/4] ftrace: drop 'static' qualifier from ftrace_ops_list_func()
  2019-04-27 10:06 [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
  2019-04-27 10:06 ` [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member Nicolai Stange
@ 2019-04-27 10:06 ` Nicolai Stange
  2019-04-27 10:06 ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
  2019-04-27 10:06 ` [PATCH 4/4] selftests/livepatch: add "ftrace a live patched function" test Nicolai Stange
  3 siblings, 0 replies; 64+ messages in thread
From: Nicolai Stange @ 2019-04-27 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Nicolai Stange

With an upcoming patch improving x86' ftrace_int3_handler() not to simply
skip over the insn being updated, ftrace_ops_list_func() will have to get
referenced from arch/x86 code. Drop its 'static' qualifier.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 kernel/trace/ftrace.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b920358dd8f7..ed3c20811d9a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -125,8 +125,8 @@ ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
 struct ftrace_ops global_ops;
 
 #if ARCH_SUPPORTS_FTRACE_OPS
-static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
-				 struct ftrace_ops *op, struct pt_regs *regs);
+void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
+			  struct ftrace_ops *op, struct pt_regs *regs);
 #else
 /* See comment below, where ftrace_ops_list_func is defined */
 static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip);
@@ -6302,8 +6302,8 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
  * set the ARCH_SUPPORTS_FTRACE_OPS.
  */
 #if ARCH_SUPPORTS_FTRACE_OPS
-static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
-				 struct ftrace_ops *op, struct pt_regs *regs)
+void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
+			  struct ftrace_ops *op, struct pt_regs *regs)
 {
 	__ftrace_ops_list_func(ip, parent_ip, NULL, regs);
 }
-- 
2.13.7


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-27 10:06 [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
  2019-04-27 10:06 ` [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member Nicolai Stange
  2019-04-27 10:06 ` [PATCH 2/4] ftrace: drop 'static' qualifier from ftrace_ops_list_func() Nicolai Stange
@ 2019-04-27 10:06 ` Nicolai Stange
  2019-04-27 10:26   ` Peter Zijlstra
  2019-04-27 10:06 ` [PATCH 4/4] selftests/livepatch: add "ftrace a live patched function" test Nicolai Stange
  3 siblings, 1 reply; 64+ messages in thread
From: Nicolai Stange @ 2019-04-27 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Nicolai Stange

With dynamic ftrace, ftrace patches call sites on x86 in a three steps:
1. Put a breakpoint at the to be patched location,
2. update call site and
3. finally remove the breakpoint again.

Note that the breakpoint ftrace_int3_handler() simply makes execution to
skip over the to be patched instruction.

This patching happens in the following circumstances:
a.) the global ftrace_trace_function changes and the call sites at
    ftrace_call and ftrace_regs_call get patched,
b.) an ftrace_ops' ->func changes and the call site in its trampoline gets
    patched,
c.) graph tracing gets enabled/disabled and the jump site at
    ftrace_graph_call gets patched,
d.) a mcount site gets converted from nop -> call, call -> nop, or
    call -> call.

The latter case, i.e. a mcount call getting redirected, is possible in e.g.
a transition from trampolined to not trampolined upon a user enabling
function tracing on a live patched function.

The ftrace_int3_handler() simply skipping over the updated insn is quite
problematic in the context of live patching, because it means that a live
patched function gets temporarily reverted to its unpatched original and
this breaks the live patching consistency model. But even without live
patching, it is desirable to avoid missing traces when making changes to
the tracing setup.

Make ftrace_int3_handler not to skip over the fops invocation, but modify
the interrupted control flow to issue a call as needed.

Case c.) from the list above can be ignored, because a jmp instruction gets
changed to a nop or vice versa.

The remaining cases a.), b.) and d.) all involve call instructions. For a.)
and b.), the call always goes to some ftrace_func_t and the generic
ftrace_ops_list_func() impementation will be able to demultiplex and do the
right thing. For case d.), the call target is either of ftrace_caller(),
ftrace_regs_caller() or some ftrace_ops' trampoline. Because providing the
register state won't cause any harm for !FTRACE_OPS_FL_SAVE_REGS
ftrace_ops, ftrace_regs_caller() would be a suitable target capable of
handling any case.

ftrace_int3_handler()'s context is different from the interrupted call
instruction's one, obviously. In order to be able to emulate the call
within the original context, make ftrace_int3_handler() set its iret
frame's ->ip to some helper stub. Upon return from the trap, this stub will
then mimic the call by pushing the the return address onto the stack and
issuing a jmp to the target address. As describe above, the jmp target
will be either of ftrace_ops_list_func() or ftrace_regs_caller(). Provide
one such stub implementation for each of the two cases.

Finally, the desired return address, which is derived from the trapping
->ip, must get passed from ftrace_int3_handler() to these stubs. Make
ftrace_int3_handler() push it onto the ftrace_int3_stack introduced by an
earlier patch and let the stubs consume it. Be careful to use proper
compiler barriers such that nested int3 handling from e.g. irqs won't
clobber entries owned by outer instances.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 arch/x86/kernel/Makefile            |  1 +
 arch/x86/kernel/ftrace.c            | 79 +++++++++++++++++++++++++++++++------
 arch/x86/kernel/ftrace_int3_stubs.S | 61 ++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/kernel/ftrace_int3_stubs.S

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 00b7e27bc2b7..0b63ae02b1f3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -93,6 +93,7 @@ obj-$(CONFIG_LIVEPATCH)	+= livepatch.o
 obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace_$(BITS).o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
+obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace_int3_stubs.o
 obj-$(CONFIG_X86_TSC)		+= trace_clock.o
 obj-$(CONFIG_KEXEC_CORE)	+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)	+= relocate_kernel_$(BITS).o crash.o
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..917494f35cba 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -280,25 +280,84 @@ static nokprobe_inline int is_ftrace_caller(unsigned long ip)
 	return 0;
 }
 
-/*
- * A breakpoint was added to the code address we are about to
- * modify, and this is the handle that will just skip over it.
- * We are either changing a nop into a trace call, or a trace
- * call to a nop. While the change is taking place, we treat
- * it just like it was a nop.
- */
+extern void ftrace_graph_call(void);
+
+asmlinkage void ftrace_int3_stub_regs_caller(void);
+asmlinkage void ftrace_int3_stub_list_func(void);
+
+/* A breakpoint was added to the code address we are about to modify. */
 int ftrace_int3_handler(struct pt_regs *regs)
 {
 	unsigned long ip;
+	bool is_ftrace_location;
+	struct ftrace_int3_stack *stack;
+	int slot;
 
 	if (WARN_ON_ONCE(!regs))
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
+	is_ftrace_location = ftrace_location(ip);
+	if (!is_ftrace_location && !is_ftrace_caller(ip))
 		return 0;
 
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	ip += MCOUNT_INSN_SIZE;
+
+	if (!is_ftrace_location &&
+	    ftrace_update_func == (unsigned long)ftrace_graph_call) {
+		/*
+		 * The insn at ftrace_graph_call is being changed from a
+		 * nop to a jmp or vice versa. Treat it as a nop and
+		 * skip over it.
+		 */
+		regs->ip = ip;
+		return 1;
+	}
+
+	/*
+	 * The insn having the breakpoint on it is either some mcount
+	 * call site or one of ftrace_call, ftrace_regs_call and their
+	 * equivalents within some trampoline. The currently pending
+	 * transition is known to turn the insn from a nop to a call,
+	 * from a call to a nop or to change the target address of an
+	 * existing call. We're going to emulate a call to the most
+	 * generic implementation capable of handling any possible
+	 * configuration. For the mcount sites that would be
+	 * ftrace_regs_caller() and for the remaining calls, which all
+	 * have got some ftrace_func_t target, ftrace_ops_list_func()
+	 * will do the right thing.
+	 *
+	 * However, the call insn can't get emulated from this trap
+	 * handler here. Rewrite the iret frame's ->ip value to one of
+	 * the ftrace_int3_stub instances which will then setup
+	 * everything in the original context. The address following
+	 * the current insn will be passed to the stub via the
+	 * ftrace_int3_stack.
+	 */
+	stack = &current_thread_info()->ftrace_int3_stack;
+	if (WARN_ON_ONCE(stack->depth >= 4)) {
+		/*
+		 * This should not happen as at most one stack slot is
+		 * required per the contexts "normal", "irq",
+		 * "softirq" and "nmi" each. However, be conservative
+		 * and treat it like a nop.
+		 */
+		regs->ip = ip;
+		return 1;
+	}
+
+	/*
+	 * Make sure interrupts will see the incremented ->depth value
+	 * before writing the stack entry.
+	 */
+	slot = stack->depth;
+	WRITE_ONCE(stack->depth, slot + 1);
+	WRITE_ONCE(stack->slots[slot], ip);
+
+	if (is_ftrace_location)
+		regs->ip = (unsigned long)ftrace_int3_stub_regs_caller;
+	else
+		regs->ip = (unsigned long)ftrace_int3_stub_list_func;
 
 	return 1;
 }
@@ -949,8 +1008,6 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-extern void ftrace_graph_call(void);
-
 static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 {
 	return ftrace_text_replace(0xe9, ip, addr);
diff --git a/arch/x86/kernel/ftrace_int3_stubs.S b/arch/x86/kernel/ftrace_int3_stubs.S
new file mode 100644
index 000000000000..ef5f580450bb
--- /dev/null
+++ b/arch/x86/kernel/ftrace_int3_stubs.S
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2019 SUSE Linux GmbH */
+
+#include <asm/asm.h>
+#include <asm/percpu.h>
+#include <asm/unwind_hints.h>
+#include <asm/asm-offsets.h>
+#include <linux/linkage.h>
+
+#ifdef CONFIG_X86_64
+#define WORD_SIZE 8
+#else
+#define WORD_SIZE 4
+#endif
+
+.macro ftrace_int3_stub call_target
+	/*
+	 * We got here from ftrace_int3_handler() because a breakpoint
+	 * on a call insn currently being modified has been
+	 * hit. ftrace_int3_handler() can't emulate the function call
+	 * directly, because it's running at a different position on
+	 * the stack, obviously. Hence it sets the regs->ip to this
+	 * stub so that we end up here upon the iret from the int3
+	 * handler. The stack is now in its original state and we can
+	 * emulate the function call insn by pushing the return
+	 * address onto the stack and jumping to the call target. The
+	 * desired return address has been put onto the ftrace_int3_stack
+	 * kept within struct thread_info.
+	 */
+	UNWIND_HINT_EMPTY
+	/* Reserve room for the emulated call's return address. */
+	sub	$WORD_SIZE, %_ASM_SP
+	/*
+	 * Pop the return address from ftrace_int3_ip_stack and write
+	 * it to the location just reserved. Be careful to retrieve
+	 * the address before decrementing ->depth in order to protect
+	 * against nested contexts clobbering it.
+	 */
+	push	%_ASM_AX
+	push	%_ASM_CX
+	push	%_ASM_DX
+	mov	PER_CPU_VAR(current_task), %_ASM_AX
+	mov	TASK_TI_ftrace_int3_depth(%_ASM_AX), %_ASM_CX
+	dec	%_ASM_CX
+	mov	TASK_TI_ftrace_int3_slots(%_ASM_AX, %_ASM_CX, WORD_SIZE), %_ASM_DX
+	mov	%_ASM_CX, TASK_TI_ftrace_int3_depth(%_ASM_AX)
+	mov	%_ASM_DX, 3*WORD_SIZE(%_ASM_SP)
+	pop	%_ASM_DX
+	pop	%_ASM_CX
+	pop	%_ASM_AX
+	/* Finally, transfer control to the target function. */
+	jmp	\call_target
+.endm
+
+ENTRY(ftrace_int3_stub_regs_caller)
+	ftrace_int3_stub ftrace_regs_caller
+END(ftrace_int3_stub_regs_caller)
+
+ENTRY(ftrace_int3_stub_list_func)
+	ftrace_int3_stub ftrace_ops_list_func
+END(ftrace_int3_stub_list_func)
-- 
2.13.7


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 4/4] selftests/livepatch: add "ftrace a live patched function" test
  2019-04-27 10:06 [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
                   ` (2 preceding siblings ...)
  2019-04-27 10:06 ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
@ 2019-04-27 10:06 ` Nicolai Stange
  3 siblings, 0 replies; 64+ messages in thread
From: Nicolai Stange @ 2019-04-27 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Nicolai Stange

There had been an issue with interactions between tracing and live patching
due to how x86' CONFIG_DYNAMIC_FTRACE used to handle the breakpoints at the
updated instructions from its ftrace_int3_handler().

More specifically, starting to trace a live patched function caused a short
period in time where the live patching redirection became ineffective. In
particular, the guarantees from the consistency model couldn't be held up
in this situation.

Implement a testcase for verifying that a function's live patch replacement
is kept effective when enabling tracing on it.

Reuse the existing 'test_klp_livepatch' live patch module which patches
cmdline_proc_show(), the handler for /proc/cmdline.

Let the testcase in a loop
- apply this live patch,
- launch a background shell job enabling tracing on that function
- while continuously verifying that the contents of /proc/cmdline still
  match what would be expected when the live patch is applied.

Signed-off-by: Nicolai Stange <nstange@suse.de>
---
 tools/testing/selftests/livepatch/Makefile         |  3 +-
 .../livepatch/test-livepatch-vs-ftrace.sh          | 44 ++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh

diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile
index af4aee79bebb..bfa5353f6d17 100644
--- a/tools/testing/selftests/livepatch/Makefile
+++ b/tools/testing/selftests/livepatch/Makefile
@@ -3,6 +3,7 @@
 TEST_GEN_PROGS := \
 	test-livepatch.sh \
 	test-callbacks.sh \
-	test-shadow-vars.sh
+	test-shadow-vars.sh \
+	test-livepatch-vs-ftrace.sh
 
 include ../lib.mk
diff --git a/tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh b/tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh
new file mode 100755
index 000000000000..5c982ec56373
--- /dev/null
+++ b/tools/testing/selftests/livepatch/test-livepatch-vs-ftrace.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2019 SUSE Linux GmbH
+
+. $(dirname $0)/functions.sh
+
+set -e
+
+MOD_LIVEPATCH=test_klp_livepatch
+
+# TEST: ftrace a live patched function
+# - load a livepatch that modifies the output from /proc/cmdline
+# - install a function tracer at the live patched function
+# - verify that the function is still patched by reading /proc/cmdline
+# - unload the livepatch and make sure the patch was removed
+
+echo -n "TEST: ftrace a live patched function ... "
+dmesg -C
+
+for i in $(seq 1 3); do
+	load_lp $MOD_LIVEPATCH
+
+	( echo cmdline_proc_show > /sys/kernel/debug/tracing/set_ftrace_filter;
+	  echo function > /sys/kernel/debug/tracing/current_tracer ) &
+
+	for j in $(seq 1 200); do
+		if [[ "$(cat /proc/cmdline)" !=				\
+			"$MOD_LIVEPATCH: this has been live patched" ]] ; then
+			echo -e "FAIL\n\n"
+			die "livepatch kselftest(s) failed"
+		fi
+	done
+
+	wait %1
+
+	echo nop > /sys/kernel/debug/tracing/current_tracer
+	echo > /sys/kernel/debug/tracing/set_ftrace_filter
+
+	disable_lp $MOD_LIVEPATCH
+	unload_lp $MOD_LIVEPATCH
+done
+
+echo "ok"
+exit 0
-- 
2.13.7


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-27 10:06 ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
@ 2019-04-27 10:26   ` Peter Zijlstra
  2019-04-28 17:38     ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-27 10:26 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Steven Rostedt, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest

On Sat, Apr 27, 2019 at 12:06:38PM +0200, Nicolai Stange wrote:
> ftrace_int3_handler()'s context is different from the interrupted call
> instruction's one, obviously. In order to be able to emulate the call
> within the original context, make ftrace_int3_handler() set its iret
> frame's ->ip to some helper stub. Upon return from the trap, this stub will
> then mimic the call by pushing the the return address onto the stack and
> issuing a jmp to the target address. As describe above, the jmp target
> will be either of ftrace_ops_list_func() or ftrace_regs_caller(). Provide
> one such stub implementation for each of the two cases.

Yuck; I'd much rather we get that static_call() stuff sorted such that
text_poke() and poke_int3_handler() can do CALL emulation.

Given all the back and forth, I think the solution where we shift
pt_regs a bit to allow the emulated PUSH is a viable solution; eg. I
think we collectively hated it least.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-27 10:26   ` Peter Zijlstra
@ 2019-04-28 17:38     ` Steven Rostedt
  2019-04-29 18:06       ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-28 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest,
	Linus Torvalds


[ Added Linus ]

On Sat, 27 Apr 2019 12:26:57 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Apr 27, 2019 at 12:06:38PM +0200, Nicolai Stange wrote:
> > ftrace_int3_handler()'s context is different from the interrupted call
> > instruction's one, obviously. In order to be able to emulate the call
> > within the original context, make ftrace_int3_handler() set its iret
> > frame's ->ip to some helper stub. Upon return from the trap, this stub will
> > then mimic the call by pushing the the return address onto the stack and
> > issuing a jmp to the target address. As describe above, the jmp target
> > will be either of ftrace_ops_list_func() or ftrace_regs_caller(). Provide
> > one such stub implementation for each of the two cases.  
> 
> Yuck; I'd much rather we get that static_call() stuff sorted such that
> text_poke() and poke_int3_handler() can do CALL emulation.
> 
> Given all the back and forth, I think the solution where we shift
> pt_regs a bit to allow the emulated PUSH is a viable solution; eg. I
> think we collectively hated it least.

Note, this use case is slightly different than the static calls but has
the same issue.  That issue is that we need a way to simulate a "call"
function from int3, and there's no good way to handle that case right
now.

The static call needs to modify the call sight but make the call still
work from int3 context.

This use case is similar, but the issue is with a "bug" in the function
tracing implementation, that has become an issue with live kernel
patching which is built on top of it.

The issue is this:

For optimization reasons, if there's only a single user of a function
it gets its own trampoline that sets up the call to its callback and
calls that callback directly:


<function being traced>
  call custom trampoline


<custom trampoline>
   save regs to call C code
   call custom_callback
   restore regs
   ret

If more than one callback is attached to that function, then we need to
call the iterator that loops through all registered callbacks of the
function tracer and checks their hashes to see if they want to trace
this function or not.

<function being traced>
  call iterator_trampoline

<iterator_trampoline>
  save regs to call C code
  call iterator
  restore regs
  ret

What happens in that transition? We add an "int3" at the function being
traced, and change:

  call custom_trampoline

to

  call iterator_trampoline

But if the function being traced is executed during this transition,
the int3 just makes it act like a nop and returns pass the call.

For tracing this is a bug because we just missed a function that should
be traced. For live kernel patching, this could be more devastating
because the original "buggy" patched function is called, and this could
cause subtle problems.

If we can add a slot to the int3 handler, that we can use to emulate a
call, (perhaps add another slot to pt_regs structure, call it link
register)

It would make it easier to solve both this and the static call problems.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-27 10:06 ` [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member Nicolai Stange
@ 2019-04-28 17:41   ` Andy Lutomirski
  2019-04-28 17:51     ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-28 17:41 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Steven Rostedt, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest


> On Apr 27, 2019, at 3:06 AM, Nicolai Stange <nstange@suse.de> wrote:
> 
> Before actually rewriting an insn, x86' DYNAMIC_FTRACE  implementation
> places an int3 breakpoint on it. Currently, ftrace_int3_handler() simply
> treats the insn in question as nop and advances %rip past it.

How does this not crash all the time?

> An upcoming
> patch will improve this by making the int3 trap handler emulate the call
> insn. To this end, ftrace_int3_handler() will be made to change its iret
> frame's ->ip to some stub which will then mimic the function call in the
> original context.
> 
> Somehow the trapping ->ip address will have to get communicated from
> ftrace_int3_handler() to these stubs though.

Cute.  But this should get “ftrace” removed from the name, since it’s potentially more generally useful.  Josh wanted something like this for static_call.

> Note that at any given point
> in time, there can be at most four such call insn emulations pending:
> namely at most one per "process", "irq", "softirq" and "nmi" context.
> 

That’s quite an assumption. I think your list should also contain exception, exceptions nested inside that exception, and machine check, at the very least.  I’m also wondering why irq and softirq are treated separately.

All this makes me think that one of the other solutions we came up with last time we discussed this might be better.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 17:41   ` Andy Lutomirski
@ 2019-04-28 17:51     ` Steven Rostedt
  2019-04-28 18:08       ` Andy Lutomirski
  2019-04-28 21:22       ` Nicolai Stange
  0 siblings, 2 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-28 17:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest

On Sun, 28 Apr 2019 10:41:10 -0700
Andy Lutomirski <luto@amacapital.net> wrote:


> > Note that at any given point
> > in time, there can be at most four such call insn emulations pending:
> > namely at most one per "process", "irq", "softirq" and "nmi" context.
> >   
> 
> That’s quite an assumption. I think your list should also contain
> exception, exceptions nested inside that exception, and machine
> check, at the very least.  I’m also wondering why irq and softirq are
> treated separately.

4 has usually been the context count we choose. But I guess in theory,
if we get exceptions then I could potentially be more.

As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
are enabled while softirqs are running. When sofirqs run, softirqs are
disabled to prevent nested softirqs. But interrupts are enabled again,
and another interrupt may come in while a softirq is executing.

> 
> All this makes me think that one of the other solutions we came up
> with last time we discussed this might be better.

+100

Perhaps adding another slot into pt_regs that gets used by int3 to
store a slot to emulate a call on return?

-- Steve


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 17:51     ` Steven Rostedt
@ 2019-04-28 18:08       ` Andy Lutomirski
  2019-04-28 19:43         ` Steven Rostedt
  2019-04-28 21:22       ` Nicolai Stange
  1 sibling, 1 reply; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-28 18:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest



> On Apr 28, 2019, at 10:51 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Sun, 28 Apr 2019 10:41:10 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
>>> Note that at any given point
>>> in time, there can be at most four such call insn emulations pending:
>>> namely at most one per "process", "irq", "softirq" and "nmi" context.
>>> 
>> 
>> That’s quite an assumption. I think your list should also contain
>> exception, exceptions nested inside that exception, and machine
>> check, at the very least.  I’m also wondering why irq and softirq are
>> treated separately.
> 
> 4 has usually been the context count we choose. But I guess in theory,
> if we get exceptions then I could potentially be more.
> 
> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
> are enabled while softirqs are running. When sofirqs run, softirqs are
> disabled to prevent nested softirqs. But interrupts are enabled again,
> and another interrupt may come in while a softirq is executing.
> 
>> 
>> All this makes me think that one of the other solutions we came up
>> with last time we discussed this might be better.
> 
> +100
> 
> Perhaps adding another slot into pt_regs that gets used by int3 to
> store a slot to emulate a call on return?
> 
> 

That’s not totally nuts, although finding pt_regs isn’t entirely trivial.

I still think I prefer an approach where we just emulate the call directly.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 18:08       ` Andy Lutomirski
@ 2019-04-28 19:43         ` Steven Rostedt
  2019-04-28 20:56           ` Andy Lutomirski
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-28 19:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest

On Sun, 28 Apr 2019 11:08:34 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> > 
> > Perhaps adding another slot into pt_regs that gets used by int3 to
> > store a slot to emulate a call on return?
> > 
> >   
> 
> That’s not totally nuts, although finding pt_regs isn’t entirely trivial.

I meant on the int3 handler (which stores the pt_regs).

> 
> I still think I prefer an approach where we just emulate the call directly.

Then, on the return of int3, if there's anything in that slot, then we
could possibly shift the exception handler frame (that was added by the
hardware), insert the slot data into the top of the stack, and then
call iret (which the int3 handler, would add the return ip to be the
function being called), which would in essence emulate the call directly.

I believe the complexity comes from the exception frame added by the
hardware is where we need to put the return of the call for the
emulation.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 19:43         ` Steven Rostedt
@ 2019-04-28 20:56           ` Andy Lutomirski
  0 siblings, 0 replies; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-28 20:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest


> On Apr 28, 2019, at 12:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Sun, 28 Apr 2019 11:08:34 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
>>> 
>>> Perhaps adding another slot into pt_regs that gets used by int3 to
>>> store a slot to emulate a call on return?
>>> 
>>> 
>> 
>> That’s not totally nuts, although finding pt_regs isn’t entirely trivial.
> 
> I meant on the int3 handler (which stores the pt_regs).

But that’s below the stub’s RSP, so it’s toast if another interrupt happens. Or am I misunderstanding you?

> 
>> 
>> I still think I prefer an approach where we just emulate the call directly.
> 
> Then, on the return of int3, if there's anything in that slot, then we
> could possibly shift the exception handler frame (that was added by the
> hardware), insert the slot data into the top of the stack, and then
> call iret (which the int3 handler, would add the return ip to be the
> function being called), which would in essence emulate the call directly.

Oh, I get it.

I liked Josh’s old proposal of unconditionally shifting the #BP frame 8 bytes better. It will be interesting when kernel shadow stacks are thrown in the mix, but that’s a problem for another day.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 17:51     ` Steven Rostedt
  2019-04-28 18:08       ` Andy Lutomirski
@ 2019-04-28 21:22       ` Nicolai Stange
  2019-04-28 23:27         ` Andy Lutomirski
  1 sibling, 1 reply; 64+ messages in thread
From: Nicolai Stange @ 2019-04-28 21:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest

Steven Rostedt <rostedt@goodmis.org> writes:

> On Sun, 28 Apr 2019 10:41:10 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>> > Note that at any given point
>> > in time, there can be at most four such call insn emulations pending:
>> > namely at most one per "process", "irq", "softirq" and "nmi" context.
>> >   
>> 
>> That’s quite an assumption. I think your list should also contain
>> exception, exceptions nested inside that exception, and machine
>> check, at the very least.  I’m also wondering why irq and softirq are
>> treated separately.

You're right, I missed the machine check case.


> 4 has usually been the context count we choose. But I guess in theory,
> if we get exceptions then I could potentially be more.

After having seen the static_call discussion, I'm in no way defending
this limited approach here, but out of curiosity: can the code between
the push onto the stack from ftrace_int3_handler() and the subsequent
pop from the stub actually trigger an (non-#MC) exception? There's an
iret inbetween, but that can fault only when returning to user space,
correct?


> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
> are enabled while softirqs are running. When sofirqs run, softirqs are
> disabled to prevent nested softirqs. But interrupts are enabled again,
> and another interrupt may come in while a softirq is executing.
>

Thanks,

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member
  2019-04-28 21:22       ` Nicolai Stange
@ 2019-04-28 23:27         ` Andy Lutomirski
  0 siblings, 0 replies; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-28 23:27 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Steven Rostedt, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, linux-kernel, live-patching, linux-kselftest


> On Apr 28, 2019, at 2:22 PM, Nicolai Stange <nstange@suse.de> wrote:
> 
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
>> On Sun, 28 Apr 2019 10:41:10 -0700
>> Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> 
>>>> Note that at any given point
>>>> in time, there can be at most four such call insn emulations pending:
>>>> namely at most one per "process", "irq", "softirq" and "nmi" context.
>>>> 
>>> 
>>> That’s quite an assumption. I think your list should also contain
>>> exception, exceptions nested inside that exception, and machine
>>> check, at the very least.  I’m also wondering why irq and softirq are
>>> treated separately.
> 
> You're right, I missed the machine check case.
> 
> 
>> 4 has usually been the context count we choose. But I guess in theory,
>> if we get exceptions then I could potentially be more.
> 
> After having seen the static_call discussion, I'm in no way defending
> this limited approach here, but out of curiosity: can the code between
> the push onto the stack from ftrace_int3_handler() and the subsequent
> pop from the stub actually trigger an (non-#MC) exception? There's an
> iret inbetween, but that can fault only when returning to user space,
> correct?

IRET doesn’t do any fancy masking, so #DB, NMI and regular IRQs should all be possible.

> 
> 
>> As for irq vs softirq, an interrupt can preempt a softirq. Interrupts
>> are enabled while softirqs are running. When sofirqs run, softirqs are
>> disabled to prevent nested softirqs. But interrupts are enabled again,
>> and another interrupt may come in while a softirq is executing.
>> 
> 
> Thanks,
> 
> Nicolai
> 
> 
> -- 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-28 17:38     ` Steven Rostedt
@ 2019-04-29 18:06       ` Linus Torvalds
  2019-04-29 18:22         ` Linus Torvalds
  2019-04-29 18:52         ` Steven Rostedt
  0 siblings, 2 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 18:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 2496 bytes --]

On Sun, Apr 28, 2019 at 10:38 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> For optimization reasons, if there's only a single user of a function
> it gets its own trampoline that sets up the call to its callback and
> calls that callback directly:

So this is the same issue as the static calls, and it has exactly the
same solution.

Which I already outlined once, and nobody wrote the code for.

So here's a COMPLETELY UNTESTED patch that only works (_if_ it works) for

 (a) 64-bit

 (b) SMP

but that's just because I've hardcoded the percpu segment handling.

It does *not* emulate the "call" in the BP handler itself, instead if
replace the %ip (the same way all the other BP handlers replace the
%ip) with a code sequence that just does

        push %gs:bp_call_return
        jmp *%gs:bp_call_target

after having filled in those per-cpu things.

The reason they are percpu is that after the %ip has been changed, the
target CPU goes its merry way, and doesn't wait for the text--poke
semaphore serialization. But since we have interrupts disabled on that
CPU, we know that *another* text poke won't be coming around and
changing the values.

THIS IS ENTIRELY UNTESTED! I've built it, and it at least seems to
build, although with warnings

  arch/x86/kernel/alternative.o: warning: objtool:
emulate_call_irqoff()+0x9: indirect jump found in RETPOLINE build
  arch/x86/kernel/alternative.o: warning: objtool:
emulate_call_irqon()+0x8: indirect jump found in RETPOLINE build
  arch/x86/kernel/alternative.o: warning: objtool:
emulate_call_irqoff()+0x9: sibling call from callable instruction with
modified stack frame
  arch/x86/kernel/alternative.o: warning: objtool:
emulate_call_irqon()+0x8: sibling call from callable instruction with
modified stack frame

that will need the appropriate "ignore this case" annotations that I didn't do.

Do I expect it to work? No. I'm sure there's some silly mistake here,
but the point of the patch is to show it as an example, so that it can
actually be tested.

With this, it should be possible (under the text rewriting lock) to do

        replace_call(callsite, newcallopcode, callsize, calltargettarget);

to do the static rewriting of the call at "callsite" to have the new
call target.

And again. Untested. But doesn't need any special code in the entry
path, and the concept is simple even if there are probably stupid bugs
just because it's entirely untested.

Oh, and did I mention that I didn't test this?

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2788 bytes --]

 arch/x86/kernel/alternative.c | 54 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 9a79c7808f9c..92b59958cff3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -739,7 +739,11 @@ static void do_sync_core(void *info)
 }
 
 static bool bp_patching_in_progress;
-static void *bp_int3_handler, *bp_int3_addr;
+static void *bp_int3_handler_irqoff, *bp_int3_handler_irqon, *bp_int3_addr;
+static void *bp_int3_call_target, *bp_int3_call_return;
+
+static DEFINE_PER_CPU(void *, bp_call_return);
+static DEFINE_PER_CPU(void *, bp_call_target);
 
 int poke_int3_handler(struct pt_regs *regs)
 {
@@ -762,7 +766,22 @@ int poke_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	/* set up the specified breakpoint handler */
-	regs->ip = (unsigned long) bp_int3_handler;
+	regs->ip = (unsigned long) bp_int3_handler_irqon;
+
+	/*
+	 * If we want an irqoff irq3 handler, and interrupts were
+	 * on, we turn them off and use the special irqoff handler
+	 * instead.
+	 */
+	if (bp_int3_handler_irqoff) {
+		this_cpu_write(bp_call_target, bp_int3_call_target);
+		this_cpu_write(bp_call_return, bp_int3_call_return);
+
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) bp_int3_handler_irqoff;
+		}
+	}
 
 	return 1;
 }
@@ -792,7 +811,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 {
 	unsigned char int3 = 0xcc;
 
-	bp_int3_handler = handler;
+	bp_int3_handler_irqon = handler;
 	bp_int3_addr = (u8 *)addr + sizeof(int3);
 	bp_patching_in_progress = true;
 
@@ -830,7 +849,36 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 	 * the writing of the new instruction.
 	 */
 	bp_patching_in_progress = false;
+	bp_int3_handler_irqoff = NULL;
 
 	return addr;
 }
 
+extern asmlinkage void emulate_call_irqon(void);
+extern asmlinkage void emulate_call_irqoff(void);
+
+asm(
+	".text\n"
+	".global emulate_call_irqoff\n"
+	".type emulate_call_irqoff, @function\n"
+	"emulate_call_irqoff:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"sti\n\t"
+		"jmp *%gs:bp_call_target\n"
+	".size emulate_call_irqoff, .-emulate_call_irqoff\n"
+
+	".global emulate_call_irqon\n"
+	".type emulate_call_irqon, @function\n"
+	"emulate_call_irqon:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"jmp *%gs:bp_call_target\n"
+	".size emulate_call_irqon, .-emulate_call_irqon\n"
+	".previous\n");
+
+void replace_call(void *addr, const void *opcode, size_t len, void *target)
+{
+	bp_int3_call_target = target;
+	bp_int3_call_return = addr + len;
+	bp_int3_handler_irqoff = emulate_call_irqoff;
+	text_poke_bp(addr, opcode, len, emulate_call_irqon);
+}

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 18:06       ` Linus Torvalds
@ 2019-04-29 18:22         ` Linus Torvalds
  2019-04-29 18:42           ` Andy Lutomirski
  2019-04-29 18:52         ` Steven Rostedt
  1 sibling, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 18:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 11:06 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> It does *not* emulate the "call" in the BP handler itself, instead if
> replace the %ip (the same way all the other BP handlers replace the
> %ip) with a code sequence that just does
>
>         push %gs:bp_call_return
>         jmp *%gs:bp_call_target
>
> after having filled in those per-cpu things.

Note that if you read the patch, you'll see that my explanation
glossed over the "what if an interrupt happens" part. Which is handled
by having two handlers, one for "interrupts were already disabled" and
one for "interrupts were enabled, so I disabled them before entering
the handler".

The second handler does the same push/jmp sequence, but has a "sti"
before the jmp. Because of the one-instruction sti shadow, interrupts
won't actually be enabled until after the jmp instruction has
completed, and thus the "push/jmp" is atomic wrt regular interrupts.

It's not safe wrt NMI, of course, but since NMI won't be rescheduling,
and since any SMP IPI won't be punching through that sequence anyway,
it's still atomic wrt _another_ text_poke() attempt coming in and
re-using the bp_call_return/tyarget slots.

So yeah, it's not "one-liner" trivial, but it's not like it's
complicated either, and it actually matches the existing "call this
code to emulate the replaced instruction". So I'd much rather have a
couple of tens of lines of code here that still acts pretty much
exactly like all the other rewriting does, rather than play subtle
games with the entry stack frame.

Finally: there might be other situations where you want to have this
kind of "pseudo-atomic" replacement sequence, so I think while it's a
hack specific to emulating a "call" instruction, I don't think it is
conceptually limited to just that case.

                 Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 18:22         ` Linus Torvalds
@ 2019-04-29 18:42           ` Andy Lutomirski
       [not found]             ` <CAHk-=whtt4K2f0KPtG-4Pykh3FK8UBOjD8jhXCUKB5nWDj_YRA@mail.gmail.com>
  0 siblings, 1 reply; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-29 18:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Andy Lutomirski, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 11:29 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Apr 29, 2019 at 11:06 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >
> > It does *not* emulate the "call" in the BP handler itself, instead if
> > replace the %ip (the same way all the other BP handlers replace the
> > %ip) with a code sequence that just does
> >
> >         push %gs:bp_call_return
> >         jmp *%gs:bp_call_target
> >
> > after having filled in those per-cpu things.
>
> Note that if you read the patch, you'll see that my explanation
> glossed over the "what if an interrupt happens" part. Which is handled
> by having two handlers, one for "interrupts were already disabled" and
> one for "interrupts were enabled, so I disabled them before entering
> the handler".

This is quite a cute solution.

>
> The second handler does the same push/jmp sequence, but has a "sti"
> before the jmp. Because of the one-instruction sti shadow, interrupts
> won't actually be enabled until after the jmp instruction has
> completed, and thus the "push/jmp" is atomic wrt regular interrupts.
>
> It's not safe wrt NMI, of course, but since NMI won't be rescheduling,
> and since any SMP IPI won't be punching through that sequence anyway,
> it's still atomic wrt _another_ text_poke() attempt coming in and
> re-using the bp_call_return/tyarget slots.

I'm less than 100% convinced about this argument.  Sure, an NMI right
there won't cause a problem.  But an NMI followed by an interrupt will
kill us if preemption is on.  I can think of three solutions:

1. Assume that all CPUs (and all relevant hypervisors!) either mask
NMIs in the STI shadow or return from NMIs with interrupts masked for
one instruction.  Ditto for MCE.  This seems too optimistic.

2. Put a fixup in the NMI handler and MCE handler: if the return
address is one of these magic jumps, clear IF and back up IP by one
byte.  This should *work*, but it's a bit ugly.

3. Use a different magic sequence:

push %gs:bp_call_return
int3

and have the int3 handler adjust regs->ip and regs->flags as appropriate.

I think I like #3 the best, even though it's twice as slow.

FWIW, kernel shadow stack patches will show up eventually, and none of
these approaches are compatible as is.  Perhaps the actual sequence
should be this, instead:

bp_int3_fixup_irqsoff:
  call 1f
1:
  int3

bp_int3_fixup_irqson:
  call 1f
1:
  int3

and the int3 handler will update the conventional return address *and*
the shadow return address.  Linus, what do you think about this
variant?

Finally, with my maintainer hat on: if anyone actually wants to merge
this thing, I want to see a test case, exercised regularly (every boot
in configured, perhaps) that actually *runs* this code.  Hitting it in
practice will be rare, and I want bugs to be caught.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 18:06       ` Linus Torvalds
  2019-04-29 18:22         ` Linus Torvalds
@ 2019-04-29 18:52         ` Steven Rostedt
       [not found]           ` <CAHk-=wjm93jLtVxTX4HZs6K4k1Wqh3ujjmapqaYtcibVk_YnzQ@mail.gmail.com>
  2019-04-30 10:46           ` Peter Zijlstra
  1 sibling, 2 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-29 18:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, 29 Apr 2019 11:06:58 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> +void replace_call(void *addr, const void *opcode, size_t len, void *target)
> +{
> +	bp_int3_call_target = target;
> +	bp_int3_call_return = addr + len;
> +	bp_int3_handler_irqoff = emulate_call_irqoff;
> +	text_poke_bp(addr, opcode, len, emulate_call_irqon);
> +}

Note, the function tracer does not use text poke. It does it in batch
mode. It can update over 40,000 calls in one go:

	add int3 breakpoint to all 40,000 call sites.
	sync()
	change the last four bytes of each of those call sites
	sync()
	remove int3 from the 40,000 call site with new call.

It's a bit more intrusive than the static call updates we were
discussing before.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
       [not found]             ` <CAHk-=whtt4K2f0KPtG-4Pykh3FK8UBOjD8jhXCUKB5nWDj_YRA@mail.gmail.com>
@ 2019-04-29 18:56               ` Andy Lutomirski
       [not found]                 ` <CAHk-=wgewK4eFhF3=0RNtk1KQjMANFH6oDE=8m=84RExn2gxhw@mail.gmail.com>
  2019-04-29 22:06                 ` Linus Torvalds
  0 siblings, 2 replies; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-29 18:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 11:53 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
>
> On Mon, Apr 29, 2019, 11:42 Andy Lutomirski <luto@kernel.org> wrote:
>>
>>
>> I'm less than 100% convinced about this argument.  Sure, an NMI right
>> there won't cause a problem.  But an NMI followed by an interrupt will
>> kill us if preemption is on.  I can think of three solutions:
>
>
> No, because either the sti shadow disables nmi too (that's the case on some CPUs at least) or the iret from nmi does.
>
> Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
>

Is this documented somewhere?  And do you actually believe that this
is true under KVM, Hyper-V, etc?  As I recall, Andrew Cooper dug in to
the way that VMX dealt with this stuff and concluded that the SDM was
blatantly wrong in many cases, which leads me to believe that Xen
HVM/PVH is the *only* hypervisor that gets it right.

Steven's point about batched updates is quite valid, though.  My
personal favorite solution to this whole mess is to rework the whole
thing so that the int3 handler simply returns and retries and to
replace the sync_core() broadcast with an SMI broadcast.  I don't know
whether this will actually work on real CPUs and on VMs and whether
it's going to crash various BIOSes out there.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
       [not found]           ` <CAHk-=wjm93jLtVxTX4HZs6K4k1Wqh3ujjmapqaYtcibVk_YnzQ@mail.gmail.com>
@ 2019-04-29 19:07             ` Steven Rostedt
  2019-04-29 20:06               ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-29 19:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, 29 Apr 2019 11:59:04 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I really don't care. Just do what I suggested, and if you have numbers to
> show problems, then maybe I'll care.
> 

Are you suggesting that I rewrite the code to do it one function at a
time? This has always been batch mode. This is not something new. The
function tracer has been around longer than the text poke code.

> Right now you're just making excuses for this. I described the solution
> months ago, now I've written a patch, if that's not good enough then we can
> just skip this all entirely.
> 
> Honestly, if you need to rewrite tens of thousands of calls, maybe you're
> doing something wrong?
> 

 # cd /sys/kernel/debug/tracing
 # cat available_filter_functions | wc -l
 45856
 # cat enabled_functions | wc -l
 0
 # echo function > current_tracer
 # cat enabled_functions | wc -l
 45856

There, I just enabled 45,856 function call sites in one shot!

How else do you want to update them? Every function in the kernel has a
nop, that turns into a call to the ftrace_handler, if I add another
user of that code, it will change each one as well.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
       [not found]                   ` <CAHk-=whay7eN6+2gZjY-ybRbkbcqAmgrLwwszzHx8ws3c=S-MA@mail.gmail.com>
@ 2019-04-29 19:24                     ` Andy Lutomirski
  2019-04-29 20:07                       ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-29 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 12:13 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
>
> On Mon, Apr 29, 2019, 12:02 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>
>>
>> If nmi were to break it, it would be a cpu bug.
>
>
> Side note: we *already* depend on sti shadow working in other parts of the kernel, namely sti->iret.
>

Where?  STI; IRET would be nuts.

Before:

commit 4214a16b02971c60960afd675d03544e109e0d75
Author: Andy Lutomirski <luto@kernel.org>
Date:   Thu Apr 2 17:12:12 2015 -0700

    x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER

we did sti; sysxit, but, when we discussed this, I don't recall anyone
speaking up in favor of the safely of the old code.

Not to mention that the crash we'll get if we get an NMI and a
rescheduling interrupt in this path will be very, very hard to debug.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 19:07             ` Steven Rostedt
@ 2019-04-29 20:06               ` Linus Torvalds
  2019-04-29 20:20                 ` Linus Torvalds
  2019-04-29 20:30                 ` Steven Rostedt
  0 siblings, 2 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 20:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 12:07 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Are you suggesting that I rewrite the code to do it one function at a
> time? This has always been batch mode. This is not something new. The
> function tracer has been around longer than the text poke code.

Only do the 'call' instructions one at a time. Why would you change
_existing_ code?

                 Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 19:24                     ` Andy Lutomirski
@ 2019-04-29 20:07                       ` Linus Torvalds
  2019-04-30 13:56                         ` Peter Zijlstra
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 20:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 12:24 PM Andy Lutomirski <luto@kernel.org> wrote:
> > Side note: we *already* depend on sti shadow working in other parts of the kernel, namely sti->iret.
>
> Where?  STI; IRET would be nuts.

Sorry, not 'sti;iret' but 'sti;sysexit'

> before commit 4214a16b02971c60960afd675d03544e109e0d75
>     x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER
>
> we did sti; sysxit, but, when we discussed this, I don't recall anyone
> speaking up in favor of the safely of the old code.

We still have that sti sysexit in the 32-bit code.

                     Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
       [not found]                 ` <CAHk-=wgewK4eFhF3=0RNtk1KQjMANFH6oDE=8m=84RExn2gxhw@mail.gmail.com>
       [not found]                   ` <CAHk-=whay7eN6+2gZjY-ybRbkbcqAmgrLwwszzHx8ws3c=S-MA@mail.gmail.com>
@ 2019-04-29 20:16                   ` Linus Torvalds
  2019-04-29 22:08                     ` Sean Christopherson
  1 sibling, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 20:16 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Steven Rostedt, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 12:02 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If nmi were to break it, it would be a cpu bug. I'm pretty sure I've
> seen the "shadow stops even nmi" documented for some uarch, but as
> mentioned it's not necessarily the only way to guarantee the shadow.

In fact, the documentation is simply the official Intel instruction
docs for "STI":

    The IF flag and the STI and CLI instructions do not prohibit the
    generation of exceptions and NMI interrupts. NMI interrupts (and
    SMIs) may be blocked for one macroinstruction following an STI.

note the "may be blocked". As mentioned, that's just one option for
not having NMI break the STI shadow guarantee, but it's clearly one
that Intel has done at times, and clearly even documents as having
done so.

There is absolutely no question that the sti shadow is real, and that
people have depended on it for _decades_. It would be a horrible
errata if the shadow can just be made to go away by randomly getting
an NMI or SMI.

              Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 20:06               ` Linus Torvalds
@ 2019-04-29 20:20                 ` Linus Torvalds
  2019-04-29 20:30                 ` Steven Rostedt
  1 sibling, 0 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 20:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 1:06 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Only do the 'call' instructions one at a time. Why would you change
> _existing_ code?

Side note: if you want to, you can easily batch up rewriting 'call'
instructions to the same target using the exact same code. You just
need to change the int3 handler case to calculate the
bp_int3_call_return from the fixed one-time address to use sopmething
like

     this_cpu_write(bp_call_return, int3_address-1+bp_int3_call_size);

instead (and you'd need to also teach the function that there's not
just a single int3 live at a time)

                Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 20:06               ` Linus Torvalds
  2019-04-29 20:20                 ` Linus Torvalds
@ 2019-04-29 20:30                 ` Steven Rostedt
  2019-04-29 21:38                   ` Linus Torvalds
  1 sibling, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-29 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, 29 Apr 2019 13:06:17 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 29, 2019 at 12:07 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Are you suggesting that I rewrite the code to do it one function at a
> > time? This has always been batch mode. This is not something new. The
> > function tracer has been around longer than the text poke code.  
> 
> Only do the 'call' instructions one at a time. Why would you change
> _existing_ code?

The function tracing is a call instruction.

On boot:

<function_X>:
	nop
	blah
	blah

After a callback to function tracing is called:

<function_X>
	call custom_trampoline
	blah
	blah


If we have two functions to that function added:

<function_X>
	call iterator_trampoline
	blah
	blah

The update from "call custom_trampoline" to "call iterator_trampoline"
is where we have an issue.

We could make this a special case where we do this one at a time, but
currently the code is all the same looking at tables to determine to
what to do. Which is one of three:

 1) change nop to call function
 2) change call function to nop
 3) update call function to another call function

#3 is where we have an issue. But if we want this to be different, we
would need to change the code significantly, and know that we are only
updating calls to calls. Which would take a bit of accounting to see if
that's the change that is being made.

This thread started about that #3 operation causing a call to be missed
because we turn it into a nop while we make the transition, where in
reality it needs to be a call to one of the two functions in the
transition.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 20:30                 ` Steven Rostedt
@ 2019-04-29 21:38                   ` Linus Torvalds
  2019-04-29 22:07                     ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 21:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 1:30 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The update from "call custom_trampoline" to "call iterator_trampoline"
> is where we have an issue.

So it has never worked. Just tell people that they have two chocies:

 - you do the careful rewriting, which takes more time

 - you do it by rewriting as nop and then back, which is what
historically has been done, and that is fast and simple, because
there's no need to be careful.

Really. I find your complaints completely incomprehensible. You've
never rewritten call instructions atomically before, and now you
complain about it being slightly more expensive to do it when I give
you the code? Yes it damn well will be slightly more expensive. Deal
with it.

Btw, once again - I several months ago also gave a suggestion on how
it could be done batch-mode by having lots of those small stubs and
just generating them dynamically.

You never wrote that code *EITHER*. It's been *months*.

So now I've written the non-batch-mode code for you, and you just
*complain* about it?

I'm done with this discussion. I'm totally fed up.

                 Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 18:56               ` Andy Lutomirski
       [not found]                 ` <CAHk-=wgewK4eFhF3=0RNtk1KQjMANFH6oDE=8m=84RExn2gxhw@mail.gmail.com>
@ 2019-04-29 22:06                 ` Linus Torvalds
  2019-04-30 11:18                   ` Peter Zijlstra
  1 sibling, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 22:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 11:57 AM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
>
> Is this documented somewhere?

Btw, if you really don't trust the sti shadow despite it going all the
way back to the 8086, then you could instead make the irqoff code do

        push %gs:bp_call_return
        push %gs:bp_call_target
        sti
        ret

which just keeps interrupts explicitly disabled over the whole use of
the percpu data.

The actual "ret" instruction doesn't matter, it's not going to change
in this model (where the code isn't dynamically generated or changed).
So I claim that it will still be protected by the sti shadow, but when
written that way it doesn't actually matter, and you could reschedule
immediately after the sti (getting an interrupt there might make the
stack frame look odd, but it doesn't really affect anything else)

                  Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 21:38                   ` Linus Torvalds
@ 2019-04-29 22:07                     ` Steven Rostedt
  2019-04-30  9:24                       ` Nicolai Stange
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-29 22:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, 29 Apr 2019 14:38:35 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Apr 29, 2019 at 1:30 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > The update from "call custom_trampoline" to "call iterator_trampoline"
> > is where we have an issue.  
> 
> So it has never worked. Just tell people that they have two chocies:

The custom call to iterator translation never worked. One option is to
always call the iterator, and just take the hit. There's another
solution to to make permanent updates that would handle the live
patching case, but not for the tracing case. It is more critical for
live patching than tracing (missed traced function is not as critical
as running buggy code unexpectedly). I could look to work on that
instead.

> 
>  - you do the careful rewriting, which takes more time

Which would be the method I said making the call to call a special case.

> 
>  - you do it by rewriting as nop and then back, which is what
> historically has been done, and that is fast and simple, because
> there's no need to be careful.

Except in the live kernel patching case where you just put back the
buggy function.

> 
> Really. I find your complaints completely incomprehensible. You've
> never rewritten call instructions atomically before, and now you
> complain about it being slightly more expensive to do it when I give
> you the code? Yes it damn well will be slightly more expensive. Deal
> with it.

I wasn't complaining about the expense of doing it that way. I was
stating that it would take more time to get it right as it as it would
require a different implementation for the call to call case.


> 
> Btw, once again - I several months ago also gave a suggestion on how
> it could be done batch-mode by having lots of those small stubs and
> just generating them dynamically.
> 
> You never wrote that code *EITHER*. It's been *months*.

Note, I wasn't the one writing the code back then either. I posted a
proof of concept for a similar topic (trace events) for the purpose of
bringing back the performance lost due to the retpolines (something
like 8% hit). Josh took the initiative to do that work, but I don't
remember ever getting to a consensus on a solution to that. Yes you had
given us ideas, but I remember people (like Andy) having concerns with
it. But because it was just an optimization change, and Josh had other
things to work on, that work just stalled.

But I just found out that the function tracing code has the same issue,
but this can cause real problems with live kernel patching. This is why
this patch set started.

> 
> So now I've written the non-batch-mode code for you, and you just
> *complain* about it?

Because this is a different story. The trace event code is updated one
at a time (there's patches out there to do it in batch). But this is
not trace events. This is function tracing which only does batching.


> 
> I'm done with this discussion. I'm totally fed up.
> 

Note, I wasn't writing this code anyway as I didn't have the time to. I
was helping others who took the initiative to do this work. But I guess
they didn't have time to move it forward either.

For the live kernel patching case, I'll start working on the permanent
changes, where once a function entry is changed, it can never be put
back. Then we wont have an issue with the live kernel patching case,
but only for tracing. But the worse thing there is you miss tracing
functions while an update is being made.

If Nicolai has time, perhaps he can try out the method you suggest and
see if we can move this forward.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 20:16                   ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Linus Torvalds
@ 2019-04-29 22:08                     ` Sean Christopherson
  2019-04-29 22:22                       ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2019-04-29 22:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 01:16:10PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 12:02 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > If nmi were to break it, it would be a cpu bug. I'm pretty sure I've
> > seen the "shadow stops even nmi" documented for some uarch, but as
> > mentioned it's not necessarily the only way to guarantee the shadow.
> 
> In fact, the documentation is simply the official Intel instruction
> docs for "STI":
> 
>     The IF flag and the STI and CLI instructions do not prohibit the
>     generation of exceptions and NMI interrupts. NMI interrupts (and
>     SMIs) may be blocked for one macroinstruction following an STI.
> 
> note the "may be blocked". As mentioned, that's just one option for
> not having NMI break the STI shadow guarantee, but it's clearly one
> that Intel has done at times, and clearly even documents as having
> done so.
> 
> There is absolutely no question that the sti shadow is real, and that
> people have depended on it for _decades_. It would be a horrible
> errata if the shadow can just be made to go away by randomly getting
> an NMI or SMI.

FWIW, Lakemont (Quark) doesn't block NMI/SMI in the STI shadow, but I'm
not sure that counters the "horrible errata" statement ;-).  SMI+RSM saves
and restores STI blocking in that case, but AFAICT NMI has no such
protection and will effectively break the shadow on its IRET.

All other (modern) Intel uArchs block NMI in the shadow and also enforce
STI_BLOCKING==0 when injecting an NMI via VM-Enter, i.e. prevent a VMM
from breaking the shadow so long as the VMM preserves the shadow info.

KVM is generally ok with respect to STI blocking, but ancient versions
didn't migrate STI blocking and there's currently a hole where
single-stepping a guest (from host userspace) could drop STI_BLOCKING
if a different VM-Exit occurs between the single-step #DB VM-Exit and the
instruction in the shadow.  Though "don't do that" may be a reasonable
answer in that case.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 22:08                     ` Sean Christopherson
@ 2019-04-29 22:22                       ` Linus Torvalds
  2019-04-30  0:08                         ` Sean Christopherson
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-29 22:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 3:08 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> FWIW, Lakemont (Quark) doesn't block NMI/SMI in the STI shadow, but I'm
> not sure that counters the "horrible errata" statement ;-).  SMI+RSM saves
> and restores STI blocking in that case, but AFAICT NMI has no such
> protection and will effectively break the shadow on its IRET.

Ugh. I can't say I care deeply about Quark (ie never seemed to go
anywhere), but it's odd. I thought it was based on a Pentium core (or
i486+?). Are you saying those didn't do it either?

I have this dim memory about talking about this with some (AMD?)
engineer, and having an alternative approach for the sti shadow wrt
NMI - basically not checking interrupts in the instruction you return
to with 'iret'. I don't think it was even conditional on the "iret
from NMI", I think it was basically any iret also did the sti shadow
thing.

But I can find no actual paper to back that up, so this may be me just
making sh*t up.

> KVM is generally ok with respect to STI blocking, but ancient versions
> didn't migrate STI blocking and there's currently a hole where
> single-stepping a guest (from host userspace) could drop STI_BLOCKING
> if a different VM-Exit occurs between the single-step #DB VM-Exit and the
> instruction in the shadow.  Though "don't do that" may be a reasonable
> answer in that case.

I thought the sti shadow blocked the single-step exception too? I know
"mov->ss" does block debug interrupts too.

Or are you saying that it's some "single step by emulation" that just
miss setting the STI_BLOCKING flag?

                   Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 22:22                       ` Linus Torvalds
@ 2019-04-30  0:08                         ` Sean Christopherson
  2019-04-30  0:45                           ` Sean Christopherson
  0 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2019-04-30  0:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 03:22:09PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 3:08 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > FWIW, Lakemont (Quark) doesn't block NMI/SMI in the STI shadow, but I'm
> > not sure that counters the "horrible errata" statement ;-).  SMI+RSM saves
> > and restores STI blocking in that case, but AFAICT NMI has no such
> > protection and will effectively break the shadow on its IRET.
> 
> Ugh. I can't say I care deeply about Quark (ie never seemed to go
> anywhere), but it's odd. I thought it was based on a Pentium core (or
> i486+?). Are you saying those didn't do it either?

It's 486 based, but either way I suspect the answer is "yes".  IIRC,
Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
was based on P54C, though I'm struggling to recall exactly what the
Larrabee weirdness was.

> I have this dim memory about talking about this with some (AMD?)
> engineer, and having an alternative approach for the sti shadow wrt
> NMI - basically not checking interrupts in the instruction you return
> to with 'iret'. I don't think it was even conditional on the "iret
> from NMI", I think it was basically any iret also did the sti shadow
> thing.
> 
> But I can find no actual paper to back that up, so this may be me just
> making sh*t up.

If Intel CPUs ever did anything like that on IRET it's long gone.

> > KVM is generally ok with respect to STI blocking, but ancient versions
> > didn't migrate STI blocking and there's currently a hole where
> > single-stepping a guest (from host userspace) could drop STI_BLOCKING
> > if a different VM-Exit occurs between the single-step #DB VM-Exit and the
> > instruction in the shadow.  Though "don't do that" may be a reasonable
> > answer in that case.
> 
> I thought the sti shadow blocked the single-step exception too? I know
> "mov->ss" does block debug interrupts too.

{MOV,POP}SS blocks #DBs, STI does not.

> Or are you saying that it's some "single step by emulation" that just
> miss setting the STI_BLOCKING flag?

This is the case I was talking about for KVM.  KVM supports single-stepping
the guest from userpace and uses EFLAGS.TF to do so (since it works on both
Intel and AMD).  VMX has a consistency check that fails VM-Entry if
STI_BLOCKING=1, EFLAGS.TF==1, IA32_DEBUGCTL.BTF=0 and there isn't a pending
single-step #DB, and so KVM clears STI_BLOCKING immediately before entering
the guest when single-stepping the guest.  If a VM-Exit occurs immediately
after VM-Entry, e.g. due to hardware interrupt, then KVM will see
STI_BLOCKING=0 when processing guest events in its run loop and will inject
any pending interrupts.

I *think* the KVM behavior can be fixed, e.g. I'm not entirely sure why KVM
takes this approach instead of setting PENDING_DBG.BS, but that's probably
a moot point.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30  0:08                         ` Sean Christopherson
@ 2019-04-30  0:45                           ` Sean Christopherson
  2019-04-30  2:26                             ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2019-04-30  0:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> On Mon, Apr 29, 2019 at 03:22:09PM -0700, Linus Torvalds wrote:
> > On Mon, Apr 29, 2019 at 3:08 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > FWIW, Lakemont (Quark) doesn't block NMI/SMI in the STI shadow, but I'm
> > > not sure that counters the "horrible errata" statement ;-).  SMI+RSM saves
> > > and restores STI blocking in that case, but AFAICT NMI has no such
> > > protection and will effectively break the shadow on its IRET.
> > 
> > Ugh. I can't say I care deeply about Quark (ie never seemed to go
> > anywhere), but it's odd. I thought it was based on a Pentium core (or
> > i486+?). Are you saying those didn't do it either?
> 
> It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> was based on P54C, though I'm struggling to recall exactly what the
> Larrabee weirdness was.

Aha!  Found an ancient comment that explicitly states P5 does not block
NMI/SMI in the STI shadow, while P6 does block NMI/SMI.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30  0:45                           ` Sean Christopherson
@ 2019-04-30  2:26                             ` Linus Torvalds
  2019-04-30 10:40                               ` Peter Zijlstra
  2019-04-30 11:17                               ` Jiri Kosina
  0 siblings, 2 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-04-30  2:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andrew Lutomirski, Steven Rostedt, Peter Zijlstra,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 5:45 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> >
> > It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> > Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> > was based on P54C, though I'm struggling to recall exactly what the
> > Larrabee weirdness was.
>
> Aha!  Found an ancient comment that explicitly states P5 does not block
> NMI/SMI in the STI shadow, while P6 does block NMI/SMI.

Ok, so the STI shadow really wouldn't be reliable on those machines. Scary.

Of course, the good news is that hopefully nobody has them any more,
and if they do, they presumably don't use fancy NMI profiling etc, so
any actual NMI's are probably relegated purely to largely rare and
effectively fatal errors anyway (ie memory parity errors).

                     Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 22:07                     ` Steven Rostedt
@ 2019-04-30  9:24                       ` Nicolai Stange
  0 siblings, 0 replies; 64+ messages in thread
From: Nicolai Stange @ 2019-04-30  9:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Andy Lutomirski, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 29 Apr 2019 14:38:35 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Mon, Apr 29, 2019 at 1:30 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > The update from "call custom_trampoline" to "call iterator_trampoline"
>> > is where we have an issue.  
>> 
>> So it has never worked. Just tell people that they have two chocies:
>
> The custom call to iterator translation never worked. One option is to
> always call the iterator, and just take the hit. There's another
> solution to to make permanent updates that would handle the live
> patching case, but not for the tracing case. It is more critical for
> live patching than tracing (missed traced function is not as critical
> as running buggy code unexpectedly). I could look to work on that
> instead.

Making the live patching case permanent would disable tracing of live
patched functions though?

For unconditionally calling the iterator: I don't have numbers, but
would expect that always searching through something like 50-70 live
patching ftrace_ops from some hot path will be noticeable.


> If Nicolai has time, perhaps he can try out the method you suggest and
> see if we can move this forward.

You mean making ftrace handle call->call transitions one by one in
non-batch mode, right? Sure, I can do that.

Another question though: is there anything that prevents us from calling
ftrace_ops_list_func() directly from ftrace_int3_handler()? We don't
have parent_ip, but if that is used for reporting only, maybe setting it
to some ftrace_is_in_int3_and_doesnt_now_the_parent() will work?
Graph tracing entries would still be lost, but well...

Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30  2:26                             ` Linus Torvalds
@ 2019-04-30 10:40                               ` Peter Zijlstra
  2019-04-30 11:17                               ` Jiri Kosina
  1 sibling, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-30 10:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sean Christopherson, Andrew Lutomirski, Steven Rostedt,
	Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 07:26:02PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 5:45 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> > >
> > > It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> > > Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> > > was based on P54C, though I'm struggling to recall exactly what the
> > > Larrabee weirdness was.
> >
> > Aha!  Found an ancient comment that explicitly states P5 does not block
> > NMI/SMI in the STI shadow, while P6 does block NMI/SMI.
> 
> Ok, so the STI shadow really wouldn't be reliable on those machines. Scary.
> 
> Of course, the good news is that hopefully nobody has them any more,
> and if they do, they presumably don't use fancy NMI profiling etc, so
> any actual NMI's are probably relegated purely to largely rare and
> effectively fatal errors anyway (ie memory parity errors).

We do have KNC perf support, if that chip has 'issues'...

Outside of that, we only do perf from P6 onwards. With P4 support being
in dubious shape, because it's massively weird and 'nobody' still has
those machines.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 18:52         ` Steven Rostedt
       [not found]           ` <CAHk-=wjm93jLtVxTX4HZs6K4k1Wqh3ujjmapqaYtcibVk_YnzQ@mail.gmail.com>
@ 2019-04-30 10:46           ` Peter Zijlstra
  2019-04-30 13:44             ` Steven Rostedt
  1 sibling, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-30 10:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 02:52:50PM -0400, Steven Rostedt wrote:
> On Mon, 29 Apr 2019 11:06:58 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > +void replace_call(void *addr, const void *opcode, size_t len, void *target)
> > +{
> > +	bp_int3_call_target = target;
> > +	bp_int3_call_return = addr + len;
> > +	bp_int3_handler_irqoff = emulate_call_irqoff;
> > +	text_poke_bp(addr, opcode, len, emulate_call_irqon);
> > +}
> 
> Note, the function tracer does not use text poke. It does it in batch
> mode. It can update over 40,000 calls in one go:

Note that Daniel is adding batch stuff to text_poke(), because
jump_label/static_key stuffs also end up wanting to change more than one
site at a time and IPI spraying the machine for every single instance is
hurting.

So ideally ftrace would start to use the 'normal' code when that
happens.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30  2:26                             ` Linus Torvalds
  2019-04-30 10:40                               ` Peter Zijlstra
@ 2019-04-30 11:17                               ` Jiri Kosina
  1 sibling, 0 replies; 64+ messages in thread
From: Jiri Kosina @ 2019-04-30 11:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sean Christopherson, Andrew Lutomirski, Steven Rostedt,
	Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Mon, 29 Apr 2019, Linus Torvalds wrote:

> > > It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> > > Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> > > was based on P54C, though I'm struggling to recall exactly what the
> > > Larrabee weirdness was.
> >
> > Aha!  Found an ancient comment that explicitly states P5 does not block
> > NMI/SMI in the STI shadow, while P6 does block NMI/SMI.
> 
> Ok, so the STI shadow really wouldn't be reliable on those machines. Scary.
> 
> Of course, the good news is that hopefully nobody has them any more, and 
> if they do, they presumably don't use fancy NMI profiling etc, so any 
> actual NMI's are probably relegated purely to largely rare and 
> effectively fatal errors anyway (ie memory parity errors).

FWIW, if that thing has local apic (I have no idea, I've never seen Quark 
myself), then NMIs are used to trigger all-cpu backtrace as well. Which 
still can be done in situations where the kernel is then expected to 
continue running undisrupted (softlockup, sysrq, hung task detector, ...).

Nothing to really worry about in the particular case of this HW perhaps, 
sure.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 22:06                 ` Linus Torvalds
@ 2019-04-30 11:18                   ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-30 11:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Steven Rostedt, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 03:06:30PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 11:57 AM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > Otherwise you could never trust the whole sti shadow thing - and it very much is part of the architecture.
> >
> > Is this documented somewhere?
> 
> Btw, if you really don't trust the sti shadow despite it going all the
> way back to the 8086, then you could instead make the irqoff code do
> 
>         push %gs:bp_call_return
>         push %gs:bp_call_target
>         sti
>         ret

This variant cures the RETPOLINE complaint; due to there not actually
being an indirect jump anymore. And it cures the sibling call complaint,
but trades it for "return with modified stack frame".

Something like so is clean:

+extern asmlinkage void emulate_call_irqon(void);
+extern asmlinkage void emulate_call_irqoff(void);
+
+asm(
+	".text\n"
+	".global emulate_call_irqoff\n"
+	".type emulate_call_irqoff, @function\n"
+	"emulate_call_irqoff:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"push %gs:bp_call_target\n\t"
+		"sti\n\t"
+		"ret\n"
+	".size emulate_call_irqoff, .-emulate_call_irqoff\n"
+
+	".global emulate_call_irqon\n"
+	".type emulate_call_irqon, @function\n"
+	"emulate_call_irqon:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"push %gs:bp_call_target\n\t"
+		"ret\n"
+	".size emulate_call_irqon, .-emulate_call_irqon\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(emulate_call_irqon);

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 10:46           ` Peter Zijlstra
@ 2019-04-30 13:44             ` Steven Rostedt
  2019-04-30 14:20               ` Peter Zijlstra
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 12:46:48 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Apr 29, 2019 at 02:52:50PM -0400, Steven Rostedt wrote:
> > On Mon, 29 Apr 2019 11:06:58 -0700
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >   
> > > +void replace_call(void *addr, const void *opcode, size_t len, void *target)
> > > +{
> > > +	bp_int3_call_target = target;
> > > +	bp_int3_call_return = addr + len;
> > > +	bp_int3_handler_irqoff = emulate_call_irqoff;
> > > +	text_poke_bp(addr, opcode, len, emulate_call_irqon);
> > > +}  
> > 
> > Note, the function tracer does not use text poke. It does it in batch
> > mode. It can update over 40,000 calls in one go:  
> 
> Note that Daniel is adding batch stuff to text_poke(), because
> jump_label/static_key stuffs also end up wanting to change more than one
> site at a time and IPI spraying the machine for every single instance is
> hurting.
> 
> So ideally ftrace would start to use the 'normal' code when that
> happens.

Sure, but that's a completely different issue. We would need to solve
the 'emulate' call for batch mode first.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-29 20:07                       ` Linus Torvalds
@ 2019-04-30 13:56                         ` Peter Zijlstra
  2019-04-30 16:06                           ` Linus Torvalds
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-30 13:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Steven Rostedt, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Mon, Apr 29, 2019 at 01:07:33PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019 at 12:24 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > Side note: we *already* depend on sti shadow working in other parts of the kernel, namely sti->iret.
> >
> > Where?  STI; IRET would be nuts.
> 
> Sorry, not 'sti;iret' but 'sti;sysexit'
> 
> > before commit 4214a16b02971c60960afd675d03544e109e0d75
> >     x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER
> >
> > we did sti; sysxit, but, when we discussed this, I don't recall anyone
> > speaking up in favor of the safely of the old code.
> 
> We still have that sti sysexit in the 32-bit code.

We also have both: "STI; HLT" and "STI; MWAIT" where we rely on the STI
shadow. Getting an NMI in between shouldn't hurt too much, but if that
in turn can lead to an actual interrupt happening, we're up some creek
without no paddle.

Most moden systems don't use either anymore though. As
mwait_idle_with_hints() relies on MWAIT ECX[0]=1 to allow MWAIT to wake
from pending interrupts.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 13:44             ` Steven Rostedt
@ 2019-04-30 14:20               ` Peter Zijlstra
  2019-04-30 14:36                 ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2019-04-30 14:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 09:44:45AM -0400, Steven Rostedt wrote:
> On Tue, 30 Apr 2019 12:46:48 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Apr 29, 2019 at 02:52:50PM -0400, Steven Rostedt wrote:
> > > On Mon, 29 Apr 2019 11:06:58 -0700
> > > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > >   
> > > > +void replace_call(void *addr, const void *opcode, size_t len, void *target)
> > > > +{
> > > > +	bp_int3_call_target = target;
> > > > +	bp_int3_call_return = addr + len;
> > > > +	bp_int3_handler_irqoff = emulate_call_irqoff;
> > > > +	text_poke_bp(addr, opcode, len, emulate_call_irqon);
> > > > +}  
> > > 
> > > Note, the function tracer does not use text poke. It does it in batch
> > > mode. It can update over 40,000 calls in one go:  
> > 
> > Note that Daniel is adding batch stuff to text_poke(), because
> > jump_label/static_key stuffs also end up wanting to change more than one
> > site at a time and IPI spraying the machine for every single instance is
> > hurting.
> > 
> > So ideally ftrace would start to use the 'normal' code when that
> > happens.
> 
> Sure, but that's a completely different issue. We would need to solve
> the 'emulate' call for batch mode first.

I don't see a problem there; when INT3 happens; you bsearch() the batch
array to find the one you hit, you frob that into the percpu variables
and do the thing. Or am I loosing the plot again?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 14:20               ` Peter Zijlstra
@ 2019-04-30 14:36                 ` Steven Rostedt
  0 siblings, 0 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 14:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Nicolai Stange, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen,
	Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski,
	Joerg Roedel, Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 16:20:47 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > 
> > Sure, but that's a completely different issue. We would need to solve
> > the 'emulate' call for batch mode first.  
> 
> I don't see a problem there; when INT3 happens; you bsearch() the batch
> array to find the one you hit, you frob that into the percpu variables
> and do the thing. Or am I loosing the plot again?

I may need to tweak Linus's patch slightly, but I may be able to get it
to work with the batch mode.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 13:56                         ` Peter Zijlstra
@ 2019-04-30 16:06                           ` Linus Torvalds
  2019-04-30 16:33                             ` Andy Lutomirski
  0 siblings, 1 reply; 64+ messages in thread
From: Linus Torvalds @ 2019-04-30 16:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Steven Rostedt, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 6:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Apr 29, 2019 at 01:07:33PM -0700, Linus Torvalds wrote:
> >
> > We still have that sti sysexit in the 32-bit code.
>
> We also have both: "STI; HLT" and "STI; MWAIT" where we rely on the STI
> shadow.

I guess the good news is that in all cases we really only ever protect
against a very unlikely race, and if the race happens it's not
actually fatal.

Yes, if we get an NMI and then an interrupt in between the "st;hlt" we
might wait for the next interrupt and get a (potentially fairly
horrible) latency issue. I guess that with maximal luck it might be a
one-shot timer and not get re-armed, but it sounds very very very
unlikely.

Googling around, I actually find a patch from Avi Kivity from back in
2010 for this exact issue, apparently because kvm got this case wrong
and somebody hit it. The patch never made it upstream exactly because
kvm could be fixed and people decided that most real hardware didn't
have the issue in the first place.

In the discussion I found, Peter Anvin tried to get confirmation from
AMD engineers about this too, but I don't see any resolution.

Realistically, I don't think you can hit the problem in practice. The
only way to hit that incredibly small race of "one instruction, *both*
NMI and interrupts" is to have a lot of interrupts going all at the
same time, but that will also then solve the latency problem, so the
very act of triggering it will also fix it.

I don't see any case where it's really bad. The "sti sysexit" race is
similar, just about latency of user space signal reporting (and
perhaps any pending TIF_WORK_xyz flags).

So maybe we don't care deeply about the sti shadow. It's a potential
latecy problem when broken, but not a huge issue. And for the
instruction rewriting hack, moving to "push+sti+ret" also makes a lost
sti shadow just a "possibly odd stack frame visibility" issue rather
than anything deeply fatal.

We can probably just write it off as "some old CPU's (and a smattering
or very rare and not relevant new ones) have potential but unlikely
latency issues because of a historical CPU mis-design - don't do perf
on them".

                Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 16:06                           ` Linus Torvalds
@ 2019-04-30 16:33                             ` Andy Lutomirski
  2019-04-30 17:03                               ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Andy Lutomirski @ 2019-04-30 16:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Andy Lutomirski, Steven Rostedt, Nicolai Stange,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 9:06 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Apr 30, 2019 at 6:56 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
>
> Realistically, I don't think you can hit the problem in practice. The
> only way to hit that incredibly small race of "one instruction, *both*
> NMI and interrupts" is to have a lot of interrupts going all at the
> same time, but that will also then solve the latency problem, so the
> very act of triggering it will also fix it.
>
> I don't see any case where it's really bad. The "sti sysexit" race is
> similar, just about latency of user space signal reporting (and
> perhaps any pending TIF_WORK_xyz flags).

In the worst case, it actually kills the machine.  Last time I tracked
a bug like this down, I think the issue was that we got preempted
after the last TIF_ check, entered a VM, exited, context switched
back, and switched to user mode without noticing that there was a
ending KVM user return notifier.  This left us with bogus CPU state
and the machine exploded.

Linus, can I ask you to reconsider your opposition to Josh's other
approach of just shifting the stack on int3 entry?  I agree that it's
ugly, but the ugliness is easily manageable and fairly self-contained.
We add a little bit of complication to the entry asm (but it's not
like it's unprecedented -- the entry asm does all kinds of stack
rearrangement due to IST and PTI crap already), and we add an
int3_emulate_call(struct pt_regs *regs, unsigned long target) helper
that has appropriate assertions that the stack is okay and emulates
the call.  And that's it.

In contrast, your approach involves multiple asm trampolines, hash
tables, batching complications, and sti shadows.

As an additional argument, with the stack-shifting approach, it runs
on *every int3 from kernel mode*.  This means that we can do something
like this:

static bool int3_emulate_call_okay(struct pt_regs *regs)
{
    unsigned long available_stack = regs->sp - (unsigned long);
    return available_stack >= sizeof(long);
}

void do_int3(...) {
{
  WARN_ON_ONCE(!user_mode(regs) && !int3_emulate_call_okay(regs));
  ...;
}

static void int3_emulate_call(struct pt_regs *regs, unsigned long target)
{
  BUG_ON(user_mode(regs) || !int3_emulate_call_okey(regs));
  regs->sp -= sizeof(unsigned long);
  *(unsigned long *)regs->sp = target;
  /* CET SHSTK fixup goes here */
}

Obviously the CET SHSTK fixup might be rather nasty, but I suspect
it's a solvable problem.

A major benefit of this is that the entry asm nastiness will get
exercised all the time, and, if we screw it up, the warning will fire.
This is the basic principle behind why the entry stuff *works* these
days.  I've put a lot of effort into making sure that running kernels
with CONFIG_DEBUG_ENTRY and running the selftests actually exercises
the nasty cases.

--Andy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 16:33                             ` Andy Lutomirski
@ 2019-04-30 17:03                               ` Steven Rostedt
  2019-04-30 17:20                                 ` Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 17:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 09:33:51 -0700
Andy Lutomirski <luto@kernel.org> wrote:


> Linus, can I ask you to reconsider your opposition to Josh's other
> approach of just shifting the stack on int3 entry?  I agree that it's
> ugly, but the ugliness is easily manageable and fairly self-contained.
> We add a little bit of complication to the entry asm (but it's not
> like it's unprecedented -- the entry asm does all kinds of stack
> rearrangement due to IST and PTI crap already), and we add an
> int3_emulate_call(struct pt_regs *regs, unsigned long target) helper
> that has appropriate assertions that the stack is okay and emulates
> the call.  And that's it.

I also prefer Josh's stack shift solution, as I personally believe
that's a cleaner solution. But I went ahead and implemented Linus's
version to get it working for ftrace. Here's the code, and it survived
some preliminary tests.

There's three places that use the update code. One is the start of
every function call (yes, I counted that as one, and that case is
determined by: ftrace_location(ip)). The other is the trampoline itself
has an update. That could also be converted to a text poke, but for now
its here as it was written before text poke existed. The third place is
actually a jump (to the function graph code). But that can be safely
skipped if we are converting it, as it only goes from jump to nop, or
nop to jump.

The trampolines reflect this. Also, as NMI code is traced by ftrace, I
had to duplicate the trampolines for the nmi case (but only for the
interrupts disabled case as NMIs don't have interrupts enabled).

-- Steve

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..bf320bf791dd 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,6 +17,7 @@
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
+#include <linux/frame.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -232,6 +233,9 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 
 static unsigned long ftrace_update_func;
 
+/* Used within inline asm below */
+unsigned long ftrace_update_func_call;
+
 static int update_ftrace_func(unsigned long ip, void *new)
 {
 	unsigned char old[MCOUNT_INSN_SIZE];
@@ -259,6 +263,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -280,6 +286,70 @@ static nokprobe_inline int is_ftrace_caller(unsigned long ip)
 	return 0;
 }
 
+extern asmlinkage void ftrace_emulate_call_irqon(void);
+extern asmlinkage void ftrace_emulate_call_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_nmi(void);
+extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_update_irqon(void);
+extern asmlinkage void ftrace_emulate_call_update_nmi(void);
+
+static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
+static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);
+
+asm(
+	".text\n"
+	".global ftrace_emulate_call_irqoff\n"
+	".type ftrace_emulate_call_irqoff, @function\n"
+	"ftrace_emulate_call_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
+
+	".global ftrace_emulate_call_irqon\n"
+	".type ftrace_emulate_call_irqon, @function\n"
+	"ftrace_emulate_call_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqon, .-ftrace_emulate_call_irqon\n"
+
+	".global ftrace_emulate_call_nmi\n"
+	".type ftrace_emulate_call_nmi, @function\n"
+	"ftrace_emulate_call_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_nmi, .-ftrace_emulate_call_nmi\n"
+
+	".global ftrace_emulate_call_update_irqoff\n"
+	".type ftrace_emulate_call_update_irqoff, @function\n"
+	"ftrace_emulate_call_update_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqoff, .-ftrace_emulate_call_update_irqoff\n"
+
+	".global ftrace_emulate_call_update_irqon\n"
+	".type ftrace_emulate_call_update_irqon, @function\n"
+	"ftrace_emulate_call_update_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqon, .-ftrace_emulate_call_update_irqon\n"
+
+	".global ftrace_emulate_call_update_nmi\n"
+	".type ftrace_emulate_call_update_nmi, @function\n"
+	"ftrace_emulate_call_update_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_nmi, .-ftrace_emulate_call_update_nmi\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_nmi);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_nmi);
+
 /*
  * A breakpoint was added to the code address we are about to
  * modify, and this is the handle that will just skip over it.
@@ -295,10 +365,40 @@ int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
+	if (ftrace_location(ip)) {
+		if (in_nmi()) {
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
+		}
+	} else if (is_ftrace_caller(ip)) {
+		/* If it's a jump, just need to skip it */
+		if (!ftrace_update_func_call) {
+			regs->ip += MCOUNT_INSN_SIZE -1;
+			return 1;
+		}
+		if (in_nmi()) {
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_update_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqon;
+		}
+	} else {
 		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	}
 
 	return 1;
 }
@@ -859,6 +959,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +1062,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
  2019-04-30 17:03                               ` Steven Rostedt
@ 2019-04-30 17:20                                 ` Steven Rostedt
  2019-04-30 17:49                                   ` [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler Steven Rostedt
  0 siblings, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 17:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 13:03:59 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I also prefer Josh's stack shift solution, as I personally believe
> that's a cleaner solution. But I went ahead and implemented Linus's
> version to get it working for ftrace. Here's the code, and it survived
> some preliminary tests.

Well it past the second level of tests.

If this is the way we are going, I could add comments to the code and
apply it to my queue and run it through the rest of my test suite and
make it ready for the merge window. I may add a stable tag to it to go
back to where live kernel patching was added, as it fixes a potential
bug there.

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 17:20                                 ` Steven Rostedt
@ 2019-04-30 17:49                                   ` Steven Rostedt
  2019-04-30 18:33                                     ` Linus Torvalds
  2019-04-30 21:53                                     ` [RFC][PATCH v2] " Steven Rostedt
  0 siblings, 2 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 17:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK


From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>


Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer started tracing the same function that was patched, the
conversion of the fentry call site during the translation of going from
calling the live kernel patch trampoline to the iterator trampoline, would
have as slight window where it didn't call anything.

As live kernel patching depends on ftrace to always call its code (to
prevent the function being traced from being called, as it will redirect
it). This small window would allow the old buggy function to be called, and
this can cause undesirable results.

Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3],
Linus suggested using per CPU data along with special trampolines[4] to emulate
the calls.

Linus's solution was for text poke (which was mostly what the static_call
code did), but as ftrace has its own mechanism, it required doing its own
thing.

Having ftrace use its own per CPU data and having its own set of specialized
trampolines solves the issue of missed calls that live kernel patching
suffers.

[1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
[2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
[3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
[4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c | 146 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 143 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..835277043348 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,6 +17,7 @@
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
+#include <linux/frame.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -232,6 +233,9 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 
 static unsigned long ftrace_update_func;
 
+/* Used within inline asm below */
+unsigned long ftrace_update_func_call;
+
 static int update_ftrace_func(unsigned long ip, void *new)
 {
 	unsigned char old[MCOUNT_INSN_SIZE];
@@ -259,6 +263,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -280,6 +286,103 @@ static nokprobe_inline int is_ftrace_caller(unsigned long ip)
 	return 0;
 }
 
+/*
+ * We need to handle the "call func1" -> "call func2" case.
+ * Just skipping the call is not sufficient as it will be like
+ * changing to "nop" first and then updating the call. But some
+ * users of ftrace require calls never to be missed.
+ *
+ * To emulate the call while converting the call site with a breakpoint,
+ * some trampolines are used along with per CPU buffers.
+ * There are three trampolines for the call sites and three trampolines
+ * for the updating of the call in ftrace trampoline. The three
+ * trampolines are:
+ *
+ * 1) Interrupts are enabled when the breakpoint is hit
+ * 2) Interrupts are disabled when the breakpoint is hit
+ * 3) The breakpoint was hit in an NMI
+ *
+ * As per CPU data is used, interrupts must be disabled to prevent them
+ * from corrupting the data. A separate NMI trampoline is used for the
+ * NMI case. If interrupts are already disabled, then the return path
+ * of where the breakpoint was hit (saved in the per CPU data) is pushed
+ * on the stack and then a jump to either the ftrace_caller (which will
+ * loop through all registered ftrace_ops handlers depending on the ip
+ * address), or if its a ftrace trampoline call update, it will call
+ * ftrace_update_func_call which will hold the call that should be
+ * called.
+ */
+extern asmlinkage void ftrace_emulate_call_irqon(void);
+extern asmlinkage void ftrace_emulate_call_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_nmi(void);
+extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_update_irqon(void);
+extern asmlinkage void ftrace_emulate_call_update_nmi(void);
+
+static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
+static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);
+
+asm(
+	".text\n"
+
+	/* Trampoline for function update with interrupts enabled */
+	".global ftrace_emulate_call_irqoff\n"
+	".type ftrace_emulate_call_irqoff, @function\n"
+	"ftrace_emulate_call_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
+
+	/* Trampoline for function update with interrupts disabled*/
+	".global ftrace_emulate_call_irqon\n"
+	".type ftrace_emulate_call_irqon, @function\n"
+	"ftrace_emulate_call_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_irqon, .-ftrace_emulate_call_irqon\n"
+
+	/* Trampoline for function update in an NMI */
+	".global ftrace_emulate_call_nmi\n"
+	".type ftrace_emulate_call_nmi, @function\n"
+	"ftrace_emulate_call_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp ftrace_caller\n"
+	".size ftrace_emulate_call_nmi, .-ftrace_emulate_call_nmi\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts enabled */
+	".global ftrace_emulate_call_update_irqoff\n"
+	".type ftrace_emulate_call_update_irqoff, @function\n"
+	"ftrace_emulate_call_update_irqoff:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"sti\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqoff, .-ftrace_emulate_call_update_irqoff\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts disabled */
+	".global ftrace_emulate_call_update_irqon\n"
+	".type ftrace_emulate_call_update_irqon, @function\n"
+	"ftrace_emulate_call_update_irqon:\n\t"
+		"push %gs:ftrace_bp_call_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_irqon, .-ftrace_emulate_call_update_irqon\n"
+
+	/* Trampoline for ftrace trampoline call update in an NMI */
+	".global ftrace_emulate_call_update_nmi\n"
+	".type ftrace_emulate_call_update_nmi, @function\n"
+	"ftrace_emulate_call_update_nmi:\n\t"
+		"push %gs:ftrace_bp_call_nmi_return\n\t"
+		"jmp *ftrace_update_func_call\n"
+	".size ftrace_emulate_call_update_nmi, .-ftrace_emulate_call_update_nmi\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_nmi);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_nmi);
+
 /*
  * A breakpoint was added to the code address we are about to
  * modify, and this is the handle that will just skip over it.
@@ -295,10 +398,44 @@ int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
+	if (ftrace_location(ip)) {
+		/* A breakpoint at the beginning of the function was hit */
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
+		}
+	} else if (is_ftrace_caller(ip)) {
+		/* An ftrace trampoline is being updated */
+		if (!ftrace_update_func_call) {
+			/* If it's a jump, just need to skip it */
+			regs->ip += MCOUNT_INSN_SIZE -1;
+			return 1;
+		}
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_update_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqoff;
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqon;
+		}
+	} else {
 		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	}
 
 	return 1;
 }
@@ -859,6 +996,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +1099,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 17:49                                   ` [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler Steven Rostedt
@ 2019-04-30 18:33                                     ` Linus Torvalds
  2019-04-30 19:00                                       ` Steven Rostedt
                                                         ` (2 more replies)
  2019-04-30 21:53                                     ` [RFC][PATCH v2] " Steven Rostedt
  1 sibling, 3 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-04-30 18:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 10:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> +
> +asm(
> +       ".text\n"
> +
> +       /* Trampoline for function update with interrupts enabled */
> +       ".global ftrace_emulate_call_irqoff\n"
> +       ".type ftrace_emulate_call_irqoff, @function\n"
> +       "ftrace_emulate_call_irqoff:\n\t"
> +               "push %gs:ftrace_bp_call_return\n\t"

Well, as mentioned in my original suggestion, this won't work on
32-bit, or on UP. They have different models for per-cpu data (32-bti
uses %fs, and UP doesn't use a segment override at all).

Maybe we just don't care about UP at all for this code, of course.

And maybe we can make the decision to also make 32-bit just not use
this either - so maybe the code is ok per se, just needs to make sure
it never triggers for the cases that it's not written for..

> +       "ftrace_emulate_call_update_irqoff:\n\t"
> +               "push %gs:ftrace_bp_call_return\n\t"
> +               "sti\n\t"
> +               "jmp *ftrace_update_func_call\n"

.. and this should then use the "push push sti ret" model instead.

Plus get updated for objtool complaints.

Anyway, since Andy really likes the entry code change, can we have
that patch in parallel and judge the difference that way? Iirc, that
was x86-64 specific too.

                           Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 18:33                                     ` Linus Torvalds
@ 2019-04-30 19:00                                       ` Steven Rostedt
  2019-04-30 21:08                                       ` Steven Rostedt
  2019-05-01 13:11                                       ` Peter Zijlstra
  2 siblings, 0 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 11:33:21 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Apr 30, 2019 at 10:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > +
> > +asm(
> > +       ".text\n"
> > +
> > +       /* Trampoline for function update with interrupts enabled */
> > +       ".global ftrace_emulate_call_irqoff\n"
> > +       ".type ftrace_emulate_call_irqoff, @function\n"
> > +       "ftrace_emulate_call_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"  
> 
> Well, as mentioned in my original suggestion, this won't work on
> 32-bit, or on UP. They have different models for per-cpu data (32-bti
> uses %fs, and UP doesn't use a segment override at all).

Ah, yeah, I forgot about 32-bit. I could easily make this use fs as
well, and for UP, just use a static variable.

> 
> Maybe we just don't care about UP at all for this code, of course.
> 
> And maybe we can make the decision to also make 32-bit just not use
> this either - so maybe the code is ok per se, just needs to make sure
> it never triggers for the cases that it's not written for..
> 
> > +       "ftrace_emulate_call_update_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"
> > +               "sti\n\t"
> > +               "jmp *ftrace_update_func_call\n"  
> 
> .. and this should then use the "push push sti ret" model instead.
> 
> Plus get updated for objtool complaints.

Yeah, I see that now. Somehow it disappeared when I looked for it after
making some other changes. I can update it.

> 
> Anyway, since Andy really likes the entry code change, can we have
> that patch in parallel and judge the difference that way? Iirc, that
> was x86-64 specific too.

Note, I don't think live kernel patching supports 32 bit anyway, so
that may not be an issue.

Josh,

When you come back to the office, can you look into that method?

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 18:33                                     ` Linus Torvalds
  2019-04-30 19:00                                       ` Steven Rostedt
@ 2019-04-30 21:08                                       ` Steven Rostedt
  2019-05-01 13:11                                       ` Peter Zijlstra
  2 siblings, 0 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 21:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 11:33:21 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > +       "ftrace_emulate_call_update_irqoff:\n\t"
> > +               "push %gs:ftrace_bp_call_return\n\t"
> > +               "sti\n\t"
> > +               "jmp *ftrace_update_func_call\n"  
> 
> .. and this should then use the "push push sti ret" model instead.
> 
> Plus get updated for objtool complaints.

And unfortunately, this blows up on lockdep. Lockdep notices that the
return from the breakpoint handler has interrupts enabled, and will not
enable them in its shadow irqs disabled variable. But then we enabled
them in the trampoline, without telling lockdep and we trigger
something likes this:

------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 2 PID: 0 at kernel/time/tick-sched.c:979 tick_nohz_idle_enter+0x44/0x8c
Modules linked in:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.1.0-rc3-test+ #123
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: f0 05 00 00 00 75 26 83 b8 c4 05 00 00 00 75 1d 80 3d 5f 0f 43 c1 00 75 14 68 72 74 16 c1 c6 05 5f 0f 43 c1 01 e8 33 d7 f8 ff <0f> 0b 58 fa e8 4e 2c 04 00 bb e0 36 6b c1 64 03 1d 28 81 56 c1 8b
EAX: 0000001c EBX: ee769f84 ECX: 00000000 EDX: 00000006
ESI: 00000000 EDI: 00000002 EBP: ee769f50 ESP: ee769f48
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210292
CR0: 80050033 CR2: 00000000 CR3: 016c4000 CR4: 001406f0
Call Trace:
 do_idle+0x2a/0x1fc
 cpu_startup_entry+0x1e/0x20
 start_secondary+0x1d3/0x1ec
 startup_32_smp+0x164/0x168


I have to fool lockdep with the following:

		if (regs->flags & X86_EFLAGS_IF) {
			regs->flags &= ~X86_EFLAGS_IF;
			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
			/* Tell lockdep here we are enabling interrupts */
			trace_hardirqs_on();
		} else {
			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
		}

-- Steve

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [RFC][PATCH v2] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 17:49                                   ` [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler Steven Rostedt
  2019-04-30 18:33                                     ` Linus Torvalds
@ 2019-04-30 21:53                                     ` Steven Rostedt
  2019-05-01  1:35                                       ` Steven Rostedt
  2019-05-01  8:26                                       ` Nicolai Stange
  1 sibling, 2 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-04-30 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer started tracing the same function that was patched, the
conversion of the fentry call site during the translation of going from
calling the live kernel patch trampoline to the iterator trampoline, would
have as slight window where it didn't call anything.

As live kernel patching depends on ftrace to always call its code (to
prevent the function being traced from being called, as it will redirect
it). This small window would allow the old buggy function to be called, and
this can cause undesirable results.

Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3],
Linus suggested using per CPU data along with special trampolines[4] to emulate
the calls.

Linus's solution was for text poke (which was mostly what the static_call
code did), but as ftrace has its own mechanism, it required doing its own
thing.

Having ftrace use its own per CPU data and having its own set of specialized
trampolines solves the issue of missed calls that live kernel patching
suffers.

[1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
[2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
[3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
[4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

Changes since v1:

  - Use "push push ret" instead of indirect jumps (Linus)
  - Handle 32 bit as well as non SMP
  - Fool lockdep into thinking interrupts are enabled


 arch/x86/kernel/ftrace.c | 175 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 170 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..9160f5cc3b6d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,6 +17,7 @@
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
+#include <linux/frame.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -232,6 +233,9 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 
 static unsigned long ftrace_update_func;
 
+/* Used within inline asm below */
+unsigned long ftrace_update_func_call;
+
 static int update_ftrace_func(unsigned long ip, void *new)
 {
 	unsigned char old[MCOUNT_INSN_SIZE];
@@ -259,6 +263,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -280,6 +286,125 @@ static nokprobe_inline int is_ftrace_caller(unsigned long ip)
 	return 0;
 }
 
+/*
+ * We need to handle the "call func1" -> "call func2" case.
+ * Just skipping the call is not sufficient as it will be like
+ * changing to "nop" first and then updating the call. But some
+ * users of ftrace require calls never to be missed.
+ *
+ * To emulate the call while converting the call site with a breakpoint,
+ * some trampolines are used along with per CPU buffers.
+ * There are three trampolines for the call sites and three trampolines
+ * for the updating of the call in ftrace trampoline. The three
+ * trampolines are:
+ *
+ * 1) Interrupts are enabled when the breakpoint is hit
+ * 2) Interrupts are disabled when the breakpoint is hit
+ * 3) The breakpoint was hit in an NMI
+ *
+ * As per CPU data is used, interrupts must be disabled to prevent them
+ * from corrupting the data. A separate NMI trampoline is used for the
+ * NMI case. If interrupts are already disabled, then the return path
+ * of where the breakpoint was hit (saved in the per CPU data) is pushed
+ * on the stack and then a jump to either the ftrace_caller (which will
+ * loop through all registered ftrace_ops handlers depending on the ip
+ * address), or if its a ftrace trampoline call update, it will call
+ * ftrace_update_func_call which will hold the call that should be
+ * called.
+ */
+extern asmlinkage void ftrace_emulate_call_irqon(void);
+extern asmlinkage void ftrace_emulate_call_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_nmi(void);
+extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
+extern asmlinkage void ftrace_emulate_call_update_irqon(void);
+extern asmlinkage void ftrace_emulate_call_update_nmi(void);
+
+static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
+static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);
+
+#ifdef CONFIG_SMP
+#ifdef CONFIG_X86_64
+# define BP_CALL_RETURN		"%gs:ftrace_bp_call_return"
+# define BP_CALL_NMI_RETURN	"%gs:ftrace_bp_call_nmi_return"
+#else
+# define BP_CALL_RETURN		"%fs:ftrace_bp_call_return"
+# define BP_CALL_NMI_RETURN	"%fs:ftrace_bp_call_nmi_return"
+#endif
+#else /* SMP */
+# define BP_CALL_RETURN		"ftrace_bp_call_return"
+# define BP_CALL_NMI_RETURN	"ftrace_bp_call_nmi_return"
+#endif
+
+/* To hold the ftrace_caller address to push on the stack */
+void *ftrace_caller_func = (void *)ftrace_caller;
+
+asm(
+	".text\n"
+
+	/* Trampoline for function update with interrupts enabled */
+	".global ftrace_emulate_call_irqoff\n"
+	".type ftrace_emulate_call_irqoff, @function\n"
+	"ftrace_emulate_call_irqoff:\n\t"
+		"push "BP_CALL_RETURN"\n\t"
+		"push ftrace_caller_func\n"
+		"sti\n\t"
+		"ret\n\t"
+	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
+
+	/* Trampoline for function update with interrupts disabled*/
+	".global ftrace_emulate_call_irqon\n"
+	".type ftrace_emulate_call_irqon, @function\n"
+	"ftrace_emulate_call_irqon:\n\t"
+		"push "BP_CALL_RETURN"\n\t"
+		"push ftrace_caller_func\n"
+		"ret\n\t"
+	".size ftrace_emulate_call_irqon, .-ftrace_emulate_call_irqon\n"
+
+	/* Trampoline for function update in an NMI */
+	".global ftrace_emulate_call_nmi\n"
+	".type ftrace_emulate_call_nmi, @function\n"
+	"ftrace_emulate_call_nmi:\n\t"
+		"push "BP_CALL_NMI_RETURN"\n\t"
+		"push ftrace_caller_func\n"
+		"ret\n\t"
+	".size ftrace_emulate_call_nmi, .-ftrace_emulate_call_nmi\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts enabled */
+	".global ftrace_emulate_call_update_irqoff\n"
+	".type ftrace_emulate_call_update_irqoff, @function\n"
+	"ftrace_emulate_call_update_irqoff:\n\t"
+		"push "BP_CALL_RETURN"\n\t"
+		"push ftrace_update_func_call\n"
+		"sti\n\t"
+		"ret\n\t"
+	".size ftrace_emulate_call_update_irqoff, .-ftrace_emulate_call_update_irqoff\n"
+
+	/* Trampoline for ftrace trampoline call update with interrupts disabled */
+	".global ftrace_emulate_call_update_irqon\n"
+	".type ftrace_emulate_call_update_irqon, @function\n"
+	"ftrace_emulate_call_update_irqon:\n\t"
+		"push "BP_CALL_RETURN"\n\t"
+		"push ftrace_update_func_call\n"
+		"ret\n\t"
+	".size ftrace_emulate_call_update_irqon, .-ftrace_emulate_call_update_irqon\n"
+
+	/* Trampoline for ftrace trampoline call update in an NMI */
+	".global ftrace_emulate_call_update_nmi\n"
+	".type ftrace_emulate_call_update_nmi, @function\n"
+	"ftrace_emulate_call_update_nmi:\n\t"
+		"push "BP_CALL_NMI_RETURN"\n\t"
+		"push ftrace_update_func_call\n"
+		"ret\n\t"
+	".size ftrace_emulate_call_update_nmi, .-ftrace_emulate_call_update_nmi\n"
+	".previous\n");
+
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_nmi);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqoff);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_irqon);
+STACK_FRAME_NON_STANDARD(ftrace_emulate_call_update_nmi);
+
 /*
  * A breakpoint was added to the code address we are about to
  * modify, and this is the handle that will just skip over it.
@@ -295,12 +420,49 @@ int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
-		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	if (ftrace_location(ip)) {
+		/* A breakpoint at the beginning of the function was hit */
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
+			/* Tell lockdep here we are enabling interrupts */
+			trace_hardirqs_on();
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
+		}
+		return 1;
+	} else if (is_ftrace_caller(ip)) {
+		/* An ftrace trampoline is being updated */
+		if (!ftrace_update_func_call) {
+			/* If it's a jump, just need to skip it */
+			regs->ip += MCOUNT_INSN_SIZE -1;
+			return 1;
+		}
+		if (in_nmi()) {
+			/* NMIs have their own trampoline */
+			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
+			regs->ip = (unsigned long) ftrace_emulate_call_update_nmi;
+			return 1;
+		}
+		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqoff;
+			trace_hardirqs_on();
+		} else {
+			regs->ip = (unsigned long) ftrace_emulate_call_update_irqon;
+		}
+		return 1;
+	}
 
-	return 1;
+	return 0;
 }
 NOKPROBE_SYMBOL(ftrace_int3_handler);
 
@@ -859,6 +1021,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +1124,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);
-- 
2.20.1



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH v2] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 21:53                                     ` [RFC][PATCH v2] " Steven Rostedt
@ 2019-05-01  1:35                                       ` Steven Rostedt
  2019-05-01  1:58                                         ` Linus Torvalds
  2019-05-01  8:26                                       ` Nicolai Stange
  1 sibling, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-05-01  1:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, 30 Apr 2019 17:53:34 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> +	if (ftrace_location(ip)) {
> +		/* A breakpoint at the beginning of the function was hit */
> +		if (in_nmi()) {
> +			/* NMIs have their own trampoline */
> +			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
> +			regs->ip = (unsigned long) ftrace_emulate_call_nmi;
> +			return 1;
> +		}
> +		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
> +		if (regs->flags & X86_EFLAGS_IF) {
> +			regs->flags &= ~X86_EFLAGS_IF;
> +			regs->ip = (unsigned long) ftrace_emulate_call_irqoff;
> +			/* Tell lockdep here we are enabling interrupts */
> +			trace_hardirqs_on();

This isn't good enough. The return from interrupt does call lockdep
saying interrupts are disabled. Need to add the lockdep tracking in the
asm as well.

Probably easier to move it from inline asm to ftrace_X.S and use the
lockdep TRACE_ON/OFF macros.

-- Steve




> +		} else {
> +			regs->ip = (unsigned long) ftrace_emulate_call_irqon;
> +		}
> +		return 1;
> +	} else if (is_ftrace_caller(ip)) {
> +		/* An ftrace trampoline is being updated */
> +		if (!ftrace_update_func_call) {
> +			/* If it's a jump, just need to skip it */
> +			regs->ip += MCOUNT_INSN_SIZE -1;
> +			return 1;
> +		}
> +		if (in_nmi()) {
> +			/* NMIs have their own trampoline */
> +			this_cpu_write(ftrace_bp_call_nmi_return, (void *)ip + MCOUNT_INSN_SIZE);
> +			regs->ip = (unsigned long) ftrace_emulate_call_update_nmi;
> +			return 1;
> +		}
> +		this_cpu_write(ftrace_bp_call_return, (void *)ip + MCOUNT_INSN_SIZE);
> +		if (regs->flags & X86_EFLAGS_IF) {
> +			regs->flags &= ~X86_EFLAGS_IF;
> +			regs->ip = (unsigned long) ftrace_emulate_call_update_irqoff;
> +			trace_hardirqs_on();
> +		} else {
> +			regs->ip = (unsigned long) ftrace_emulate_call_update_irqon;
> +		}
> +		return 1;
> +	}
>  
> -	return 1;
> +	return 0;
>  }

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH v2] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01  1:35                                       ` Steven Rostedt
@ 2019-05-01  1:58                                         ` Linus Torvalds
  0 siblings, 0 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-05-01  1:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Peter Zijlstra, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 6:35 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> Probably easier to move it from inline asm to ftrace_X.S and use the
> lockdep TRACE_ON/OFF macros.

Yeah, that should clean up the percpu stuff too since we have helper
macros for it for *.S files anyway.

I only did the asm() in C because it made the "look, something like
this" patch simpler to test (and it made it easy to check the
generated asm file). Not because it was a good idea ;)

                 Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH v2] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 21:53                                     ` [RFC][PATCH v2] " Steven Rostedt
  2019-05-01  1:35                                       ` Steven Rostedt
@ 2019-05-01  8:26                                       ` Nicolai Stange
  2019-05-01 13:22                                         ` Steven Rostedt
  1 sibling, 1 reply; 64+ messages in thread
From: Nicolai Stange @ 2019-05-01  8:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Andy Lutomirski, Peter Zijlstra, Nicolai Stange,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

Hi Steve,

many thanks for moving this forward!


Steven Rostedt <rostedt@goodmis.org> writes:

>
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index ef49517f6bb2..9160f5cc3b6d 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -17,6 +17,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/ftrace.h>
>  #include <linux/percpu.h>
> +#include <linux/frame.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/init.h>
> @@ -232,6 +233,9 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>  
>  static unsigned long ftrace_update_func;
>  
> +/* Used within inline asm below */
> +unsigned long ftrace_update_func_call;
> +
>  static int update_ftrace_func(unsigned long ip, void *new)
>  {
>  	unsigned char old[MCOUNT_INSN_SIZE];
> @@ -259,6 +263,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>  	unsigned char *new;
>  	int ret;
>  
> +	ftrace_update_func_call = (unsigned long)func;
> +
>  	new = ftrace_call_replace(ip, (unsigned long)func);
>  	ret = update_ftrace_func(ip, new);
>  
> @@ -280,6 +286,125 @@ static nokprobe_inline int is_ftrace_caller(unsigned long ip)
>  	return 0;
>  }
>  
> +/*
> + * We need to handle the "call func1" -> "call func2" case.
> + * Just skipping the call is not sufficient as it will be like
> + * changing to "nop" first and then updating the call. But some
> + * users of ftrace require calls never to be missed.
> + *
> + * To emulate the call while converting the call site with a breakpoint,
> + * some trampolines are used along with per CPU buffers.
> + * There are three trampolines for the call sites and three trampolines
> + * for the updating of the call in ftrace trampoline. The three
> + * trampolines are:
> + *
> + * 1) Interrupts are enabled when the breakpoint is hit
> + * 2) Interrupts are disabled when the breakpoint is hit
> + * 3) The breakpoint was hit in an NMI
> + *
> + * As per CPU data is used, interrupts must be disabled to prevent them
> + * from corrupting the data. A separate NMI trampoline is used for the
> + * NMI case. If interrupts are already disabled, then the return path
> + * of where the breakpoint was hit (saved in the per CPU data) is pushed
> + * on the stack and then a jump to either the ftrace_caller (which will
> + * loop through all registered ftrace_ops handlers depending on the ip
> + * address), or if its a ftrace trampoline call update, it will call
> + * ftrace_update_func_call which will hold the call that should be
> + * called.
> + */
> +extern asmlinkage void ftrace_emulate_call_irqon(void);
> +extern asmlinkage void ftrace_emulate_call_irqoff(void);
> +extern asmlinkage void ftrace_emulate_call_nmi(void);
> +extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
> +extern asmlinkage void ftrace_emulate_call_update_irqon(void);
> +extern asmlinkage void ftrace_emulate_call_update_nmi(void);
> +
> +static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
> +static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);

Andy mentioned #DB and #MC exceptions here:
https://lkml.kernel.org/r/C55DED25-C60D-4731-9A6B-92BDA8771766@amacapital.net

I think that #DB won't be possible, provided the trampolines below get
tagged as NOKPROBE (do_int3() and ftrace_int3_handler() already have
it).

It's highly theoretic, but tracing do_machine_check() could clobber
ftrace_bp_call_return or ftrace_bp_call_nmi_return?


> +#ifdef CONFIG_SMP
> +#ifdef CONFIG_X86_64
> +# define BP_CALL_RETURN		"%gs:ftrace_bp_call_return"
> +# define BP_CALL_NMI_RETURN	"%gs:ftrace_bp_call_nmi_return"
> +#else
> +# define BP_CALL_RETURN		"%fs:ftrace_bp_call_return"
> +# define BP_CALL_NMI_RETURN	"%fs:ftrace_bp_call_nmi_return"
> +#endif
> +#else /* SMP */
> +# define BP_CALL_RETURN		"ftrace_bp_call_return"
> +# define BP_CALL_NMI_RETURN	"ftrace_bp_call_nmi_return"
> +#endif
> +
> +/* To hold the ftrace_caller address to push on the stack */
> +void *ftrace_caller_func = (void *)ftrace_caller;

The live patching ftrace_ops need ftrace_regs_caller.


> +
> +asm(
> +	".text\n"
> +
> +	/* Trampoline for function update with interrupts enabled */
> +	".global ftrace_emulate_call_irqoff\n"
> +	".type ftrace_emulate_call_irqoff, @function\n"
> +	"ftrace_emulate_call_irqoff:\n\t"
> +		"push "BP_CALL_RETURN"\n\t"
> +		"push ftrace_caller_func\n"
> +		"sti\n\t"
> +		"ret\n\t"
> +	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
> +
> +	/* Trampoline for function update with interrupts disabled*/
> +	".global ftrace_emulate_call_irqon\n"

The naming is perhaps a bit confusing, i.e. "update with interrupts
disabled" vs. "irqon"... How about swapping irqoff<->irqon?

Thanks,

Nicolai


> +	".type ftrace_emulate_call_irqon, @function\n"
> +	"ftrace_emulate_call_irqon:\n\t"
> +		"push "BP_CALL_RETURN"\n\t"
> +		"push ftrace_caller_func\n"
> +		"ret\n\t"
> +	".size ftrace_emulate_call_irqon, .-ftrace_emulate_call_irqon\n"
> +
> +	/* Trampoline for function update in an NMI */
> +	".global ftrace_emulate_call_nmi\n"
> +	".type ftrace_emulate_call_nmi, @function\n"
> +	"ftrace_emulate_call_nmi:\n\t"
> +		"push "BP_CALL_NMI_RETURN"\n\t"
> +		"push ftrace_caller_func\n"
> +		"ret\n\t"
> +	".size ftrace_emulate_call_nmi, .-ftrace_emulate_call_nmi\n"
> +

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-04-30 18:33                                     ` Linus Torvalds
  2019-04-30 19:00                                       ` Steven Rostedt
  2019-04-30 21:08                                       ` Steven Rostedt
@ 2019-05-01 13:11                                       ` Peter Zijlstra
  2019-05-01 18:58                                         ` Steven Rostedt
  2019-05-01 19:03                                         ` Linus Torvalds
  2 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-05-01 13:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Apr 30, 2019 at 11:33:21AM -0700, Linus Torvalds wrote:
> Anyway, since Andy really likes the entry code change, can we have
> that patch in parallel and judge the difference that way? Iirc, that
> was x86-64 specific too.

Here goes, compile tested only...

It obviously needs a self-test, but that shoulnd't be too hard to
arrange.

---
 arch/x86/entry/entry_32.S            |  7 +++++++
 arch/x86/entry/entry_64.S            | 14 ++++++++++++--
 arch/x86/include/asm/text-patching.h | 20 ++++++++++++++++++++
 arch/x86/kernel/ftrace.c             | 24 +++++++++++++++++++-----
 4 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7b23431be5cb..d246302085a3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1479,6 +1479,13 @@ ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jnz	.Lfrom_usermode_no_gap
+	.rept 6
+	pushl	5*4(%esp)
+	.endr
+.Lfrom_usermode_no_gap:
+
 	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 20e45d9b4e15..268cd9affe04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -878,7 +878,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -898,6 +898,16 @@ ENTRY(\sym)
 	jnz	.Lfrom_usermode_switch_stack_\@
 	.endif
 
+	.if \create_gap == 1
+	testb	$3, CS-ORIG_RAX(%rsp)
+	jnz	.Lfrom_usermode_no_gap_\@
+	.rept 6
+	pushq	5*8(%rsp)
+	.endr
+	UNWIND_HINT_IRET_REGS offset=8
+.Lfrom_usermode_no_gap_\@:
+	.endif
+
 	.if \paranoid
 	call	paranoid_entry
 	.else
@@ -1129,7 +1139,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
 #endif /* CONFIG_HYPERV */
 
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
-idtentry int3			do_int3			has_error_code=0
+idtentry int3			do_int3			has_error_code=0	create_gap=1
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..ba275b6292db 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -39,4 +39,24 @@ extern int poke_int3_handler(struct pt_regs *regs);
 extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
 extern int after_bootmem;
 
+static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
+{
+	regs->sp -= sizeof(unsigned long);
+	*(unsigned long *)regs->sp = val;
+}
+
+static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+{
+	regs->ip = ip;
+}
+
+#define INT3_INSN_SIZE 1
+#define CALL_INSN_SIZE 5
+
+static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
+{
+	int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+	int3_emulate_jmp(regs, func);
+}
+
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..90d319687d7e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,7 @@
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
+#include <asm/text-patching.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
@@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 }
 
 static unsigned long ftrace_update_func;
+static unsigned long ftrace_update_func_call;
 
 static int update_ftrace_func(unsigned long ip, void *new)
 {
@@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
 
@@ -295,12 +299,19 @@ int ftrace_int3_handler(struct pt_regs *regs)
 		return 0;
 
 	ip = regs->ip - 1;
-	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
-		return 0;
-
-	regs->ip += MCOUNT_INSN_SIZE - 1;
+	if (ftrace_location(ip)) {
+		int3_emulate_call(regs, ftrace_update_func_call);
+		return 1;
+	} else if (is_ftrace_caller(ip)) {
+		if (!ftrace_update_func_call) {
+			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+			return 1;
+		}
+		int3_emulate_call(regs, ftrace_update_func_call);
+		return 1;
+	}
 
-	return 1;
+	return 0;
 }
 NOKPROBE_SYMBOL(ftrace_int3_handler);
 
@@ -859,6 +870,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	func = ftrace_ops_get_func(ops);
 
+	ftrace_update_func_call = (unsigned long)func;
+
 	/* Do a safe modify in case the trampoline is executing */
 	new = ftrace_call_replace(ip, (unsigned long)func);
 	ret = update_ftrace_func(ip, new);
@@ -960,6 +973,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
 {
 	unsigned char *new;
 
+	ftrace_update_func_call = 0UL;
 	new = ftrace_jmp_replace(ip, (unsigned long)func);
 
 	return update_ftrace_func(ip, new);

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH v2] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01  8:26                                       ` Nicolai Stange
@ 2019-05-01 13:22                                         ` Steven Rostedt
  0 siblings, 0 replies; 64+ messages in thread
From: Steven Rostedt @ 2019-05-01 13:22 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Linus Torvalds, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, 01 May 2019 10:26:32 +0200
Nicolai Stange <nstange@suse.de> wrote:

> > +extern asmlinkage void ftrace_emulate_call_irqon(void);
> > +extern asmlinkage void ftrace_emulate_call_irqoff(void);
> > +extern asmlinkage void ftrace_emulate_call_nmi(void);
> > +extern asmlinkage void ftrace_emulate_call_update_irqoff(void);
> > +extern asmlinkage void ftrace_emulate_call_update_irqon(void);
> > +extern asmlinkage void ftrace_emulate_call_update_nmi(void);
> > +
> > +static DEFINE_PER_CPU(void *, ftrace_bp_call_return);
> > +static DEFINE_PER_CPU(void *, ftrace_bp_call_nmi_return);  
> 
> Andy mentioned #DB and #MC exceptions here:
> https://lkml.kernel.org/r/C55DED25-C60D-4731-9A6B-92BDA8771766@amacapital.net
> 
> I think that #DB won't be possible, provided the trampolines below get
> tagged as NOKPROBE (do_int3() and ftrace_int3_handler() already have
> it).
> 
> It's highly theoretic, but tracing do_machine_check() could clobber
> ftrace_bp_call_return or ftrace_bp_call_nmi_return?

Probably shouldn't trace do_machine_check() then ;-)

> 
> 
> > +#ifdef CONFIG_SMP
> > +#ifdef CONFIG_X86_64
> > +# define BP_CALL_RETURN		"%gs:ftrace_bp_call_return"
> > +# define BP_CALL_NMI_RETURN	"%gs:ftrace_bp_call_nmi_return"
> > +#else
> > +# define BP_CALL_RETURN		"%fs:ftrace_bp_call_return"
> > +# define BP_CALL_NMI_RETURN	"%fs:ftrace_bp_call_nmi_return"
> > +#endif
> > +#else /* SMP */
> > +# define BP_CALL_RETURN		"ftrace_bp_call_return"
> > +# define BP_CALL_NMI_RETURN	"ftrace_bp_call_nmi_return"
> > +#endif
> > +
> > +/* To hold the ftrace_caller address to push on the stack */
> > +void *ftrace_caller_func = (void *)ftrace_caller;  
> 
> The live patching ftrace_ops need ftrace_regs_caller.

Ah, you're right. Luckily ftrace_regs_caller is a superset of
ftrace_caller. That is, those only needing ftrace_caller can do fine
with ftrace_regs_caller (but not vice versa).

Easy enough to fix.

> 
> 
> > +
> > +asm(
> > +	".text\n"
> > +
> > +	/* Trampoline for function update with interrupts enabled */
> > +	".global ftrace_emulate_call_irqoff\n"
> > +	".type ftrace_emulate_call_irqoff, @function\n"
> > +	"ftrace_emulate_call_irqoff:\n\t"
> > +		"push "BP_CALL_RETURN"\n\t"
> > +		"push ftrace_caller_func\n"
> > +		"sti\n\t"
> > +		"ret\n\t"
> > +	".size ftrace_emulate_call_irqoff, .-ftrace_emulate_call_irqoff\n"
> > +
> > +	/* Trampoline for function update with interrupts disabled*/
> > +	".global ftrace_emulate_call_irqon\n"  
> 
> The naming is perhaps a bit confusing, i.e. "update with interrupts
> disabled" vs. "irqon"... How about swapping irqoff<->irqon?

I just used the terminology Linus used. It is confusing. Perhaps just
call it ftrace_emulate_call (for non sti case) and
ftrace_emulate_call_sti for the sti case. That should remove the
confusion.

-- Steve



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 13:11                                       ` Peter Zijlstra
@ 2019-05-01 18:58                                         ` Steven Rostedt
  2019-05-01 19:03                                           ` Peter Zijlstra
  2019-05-01 19:03                                         ` Linus Torvalds
  1 sibling, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-05-01 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, 1 May 2019 15:11:17 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Apr 30, 2019 at 11:33:21AM -0700, Linus Torvalds wrote:
> > Anyway, since Andy really likes the entry code change, can we have
> > that patch in parallel and judge the difference that way? Iirc, that
> > was x86-64 specific too.  
> 
> Here goes, compile tested only...
> 
> It obviously needs a self-test, but that shoulnd't be too hard to
> arrange.
> 

I was able to get it applied (with slight tweaking) but it then
crashed. But that was due to incorrect updates in the
ftrace_int3_handler().

> ---
>  arch/x86/entry/entry_32.S            |  7 +++++++
>  arch/x86/entry/entry_64.S            | 14 ++++++++++++--
>  arch/x86/include/asm/text-patching.h | 20 ++++++++++++++++++++
>  arch/x86/kernel/ftrace.c             | 24 +++++++++++++++++++-----
>  4 files changed, 58 insertions(+), 7 deletions(-)


>  #endif /* _ASM_X86_TEXT_PATCHING_H */
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index ef49517f6bb2..90d319687d7e 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -29,6 +29,7 @@
>  #include <asm/kprobes.h>
>  #include <asm/ftrace.h>
>  #include <asm/nops.h>
> +#include <asm/text-patching.h>
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  
> @@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec,
> unsigned long old_addr, }
>  
>  static unsigned long ftrace_update_func;
> +static unsigned long ftrace_update_func_call;
>  
>  static int update_ftrace_func(unsigned long ip, void *new)
>  {
> @@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>  	unsigned char *new;
>  	int ret;
>  
> +	ftrace_update_func_call = (unsigned long)func;
> +
>  	new = ftrace_call_replace(ip, (unsigned long)func);
>  	ret = update_ftrace_func(ip, new);
>  
> @@ -295,12 +299,19 @@ int ftrace_int3_handler(struct pt_regs *regs)
>  		return 0;
>  
>  	ip = regs->ip - 1;
> -	if (!ftrace_location(ip) && !is_ftrace_caller(ip))
> -		return 0;
> -
> -	regs->ip += MCOUNT_INSN_SIZE - 1;
> +	if (ftrace_location(ip)) {
> +		int3_emulate_call(regs, ftrace_update_func_call);

Should be:

		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);

> +		return 1;
> +	} else if (is_ftrace_caller(ip)) {
> +		if (!ftrace_update_func_call) {
> +			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);

I see what you did here, but I think:

			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);

looks better. But that said, we could in the beginning do:

	ip = regs->ip - INT3_INSN_SIZE;

instead of

	ip = regs->ip - 1;

I made these updates and posted them to Linus.

-- Steve


> +			return 1;
> +		}
> +		int3_emulate_call(regs, ftrace_update_func_call);
> +		return 1;
> +	}
>  
> -	return 1;
> +	return 0;
>  }
>  NOKPROBE_SYMBOL(ftrace_int3_handler);
>  
> @@ -859,6 +870,8 @@ void arch_ftrace_update_trampoline(struct
> ftrace_ops *ops) 
>  	func = ftrace_ops_get_func(ops);
>  
> +	ftrace_update_func_call = (unsigned long)func;
> +
>  	/* Do a safe modify in case the trampoline is executing */
>  	new = ftrace_call_replace(ip, (unsigned long)func);
>  	ret = update_ftrace_func(ip, new);
> @@ -960,6 +973,7 @@ static int ftrace_mod_jmp(unsigned long ip, void
> *func) {
>  	unsigned char *new;
>  
> +	ftrace_update_func_call = 0UL;
>  	new = ftrace_jmp_replace(ip, (unsigned long)func);
>  
>  	return update_ftrace_func(ip, new);

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 18:58                                         ` Steven Rostedt
@ 2019-05-01 19:03                                           ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-05-01 19:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, May 01, 2019 at 02:58:24PM -0400, Steven Rostedt wrote:
> > +	if (ftrace_location(ip)) {
> > +		int3_emulate_call(regs, ftrace_update_func_call);
> 
> Should be:
> 
> 		int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);

Ah, I lost the plot a little there.

> > +		return 1;
> > +	} else if (is_ftrace_caller(ip)) {
> > +		if (!ftrace_update_func_call) {
> > +			int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
> 
> I see what you did here, but I think:
> 
> 			int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
> 
> looks better. But that said, we could in the beginning do:
> 
> 	ip = regs->ip - INT3_INSN_SIZE;
> 
> instead of
> 
> 	ip = regs->ip - 1;
> 
> I made these updates and posted them to Linus.

I was actually considering:

static inline void int3_emulate_nop(struct pt_regs *regs, unsigned long size)
{
	int3_emulate_jmp(regs, regs->ip - INT3_INSN_SIZE + size);
}

And then the above becomes:

	int3_emulate_nop(regs, CALL_INSN_SIZE);

Hmm?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 13:11                                       ` Peter Zijlstra
  2019-05-01 18:58                                         ` Steven Rostedt
@ 2019-05-01 19:03                                         ` Linus Torvalds
  2019-05-01 19:13                                           ` Peter Zijlstra
  2019-05-01 19:13                                           ` Steven Rostedt
  1 sibling, 2 replies; 64+ messages in thread
From: Linus Torvalds @ 2019-05-01 19:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Here goes, compile tested only...

Ugh, two different threads. This has the same bug (same source) as the
one Steven posted:

> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1479,6 +1479,13 @@ ENTRY(int3)
>         ASM_CLAC
>         pushl   $-1                             # mark this as an int
>
> +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> +       jnz     .Lfrom_usermode_no_gap
> +       .rept 6
> +       pushl   5*4(%esp)
> +       .endr
> +.Lfrom_usermode_no_gap:

This will corrupt things horribly if you still use vm86 mode. Checking
CS RPL is simply not correct.

                 Linus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 19:03                                         ` Linus Torvalds
@ 2019-05-01 19:13                                           ` Peter Zijlstra
  2019-05-01 19:13                                           ` Steven Rostedt
  1 sibling, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-05-01 19:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, May 01, 2019 at 12:03:52PM -0700, Linus Torvalds wrote:
> On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Here goes, compile tested only...
> 
> Ugh, two different threads. This has the same bug (same source) as the
> one Steven posted:

This is what Steve started from; lets continue in the other thread.

> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1479,6 +1479,13 @@ ENTRY(int3)
> >         ASM_CLAC
> >         pushl   $-1                             # mark this as an int
> >
> > +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> > +       jnz     .Lfrom_usermode_no_gap
> > +       .rept 6
> > +       pushl   5*4(%esp)
> > +       .endr
> > +.Lfrom_usermode_no_gap:
> 
> This will corrupt things horribly if you still use vm86 mode. Checking
> CS RPL is simply not correct.

I'll go fix; I never really understood that vm86 crud and I cobbled this
32bit thing together based on the 64bit version (that Josh did a while
ago).

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 19:03                                         ` Linus Torvalds
  2019-05-01 19:13                                           ` Peter Zijlstra
@ 2019-05-01 19:13                                           ` Steven Rostedt
  2019-05-01 19:33                                             ` Jiri Kosina
  1 sibling, 1 reply; 64+ messages in thread
From: Steven Rostedt @ 2019-05-01 19:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Andy Lutomirski, Nicolai Stange, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior,
	Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain,
	Masahiro Yamada, Joerg Roedel, Linux List Kernel Mailing,
	live-patching, open list:KERNEL SELFTEST FRAMEWORK

On Wed, 1 May 2019 12:03:52 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, May 1, 2019 at 6:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Here goes, compile tested only...  
> 
> Ugh, two different threads. This has the same bug (same source) as the
> one Steven posted:
> 
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1479,6 +1479,13 @@ ENTRY(int3)
> >         ASM_CLAC
> >         pushl   $-1                             # mark this as an int
> >
> > +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> > +       jnz     .Lfrom_usermode_no_gap
> > +       .rept 6
> > +       pushl   5*4(%esp)
> > +       .endr
> > +.Lfrom_usermode_no_gap:  
> 
> This will corrupt things horribly if you still use vm86 mode. Checking
> CS RPL is simply not correct.

I never tested the 32 bit version of this. And we could just not
implement it (I don't think there's live kernel patching for it
either).

But this doesn't make it any worse than my version, because under the
full testing of my patch with the trampolines, I would easily crash the
32 bit version. That was one reason I made my last patch only support 64
bit.

Under light load, 32 bit works, but when I stress it (running perf and
ftrace together) it blows up. Could be an NMI issue.

-- Steve


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 19:13                                           ` Steven Rostedt
@ 2019-05-01 19:33                                             ` Jiri Kosina
  2019-05-01 19:41                                               ` Peter Zijlstra
  0 siblings, 1 reply; 64+ messages in thread
From: Jiri Kosina @ 2019-05-01 19:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Peter Zijlstra, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Wed, 1 May 2019, Steven Rostedt wrote:

> I never tested the 32 bit version of this. And we could just not
> implement it (I don't think there's live kernel patching for it
> either).

That's correct, there is no livepatching on x86_32 (and no plans for 
it). CONFIG_LIVEPATCH is not available for 32bit builds at all.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler
  2019-05-01 19:33                                             ` Jiri Kosina
@ 2019-05-01 19:41                                               ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2019-05-01 19:41 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, Linus Torvalds, Andy Lutomirski, Nicolai Stange,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Josh Poimboeuf, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk,
	Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross,
	Nick Desaulniers, Nayna Jain, Masahiro Yamada, Joerg Roedel,
	Linux List Kernel Mailing, live-patching,
	open list:KERNEL SELFTEST FRAMEWORK

On Wed, May 01, 2019 at 09:33:28PM +0200, Jiri Kosina wrote:
> On Wed, 1 May 2019, Steven Rostedt wrote:
> 
> > I never tested the 32 bit version of this. And we could just not
> > implement it (I don't think there's live kernel patching for it
> > either).
> 
> That's correct, there is no livepatching on x86_32 (and no plans for 
> it). CONFIG_LIVEPATCH is not available for 32bit builds at all.

We still want this for static_call(), even on 32bit.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2019-05-01 19:42 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-27 10:06 [PATCH 0/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
2019-04-27 10:06 ` [PATCH 1/4] x86/thread_info: introduce ->ftrace_int3_stack member Nicolai Stange
2019-04-28 17:41   ` Andy Lutomirski
2019-04-28 17:51     ` Steven Rostedt
2019-04-28 18:08       ` Andy Lutomirski
2019-04-28 19:43         ` Steven Rostedt
2019-04-28 20:56           ` Andy Lutomirski
2019-04-28 21:22       ` Nicolai Stange
2019-04-28 23:27         ` Andy Lutomirski
2019-04-27 10:06 ` [PATCH 2/4] ftrace: drop 'static' qualifier from ftrace_ops_list_func() Nicolai Stange
2019-04-27 10:06 ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Nicolai Stange
2019-04-27 10:26   ` Peter Zijlstra
2019-04-28 17:38     ` Steven Rostedt
2019-04-29 18:06       ` Linus Torvalds
2019-04-29 18:22         ` Linus Torvalds
2019-04-29 18:42           ` Andy Lutomirski
     [not found]             ` <CAHk-=whtt4K2f0KPtG-4Pykh3FK8UBOjD8jhXCUKB5nWDj_YRA@mail.gmail.com>
2019-04-29 18:56               ` Andy Lutomirski
     [not found]                 ` <CAHk-=wgewK4eFhF3=0RNtk1KQjMANFH6oDE=8m=84RExn2gxhw@mail.gmail.com>
     [not found]                   ` <CAHk-=whay7eN6+2gZjY-ybRbkbcqAmgrLwwszzHx8ws3c=S-MA@mail.gmail.com>
2019-04-29 19:24                     ` Andy Lutomirski
2019-04-29 20:07                       ` Linus Torvalds
2019-04-30 13:56                         ` Peter Zijlstra
2019-04-30 16:06                           ` Linus Torvalds
2019-04-30 16:33                             ` Andy Lutomirski
2019-04-30 17:03                               ` Steven Rostedt
2019-04-30 17:20                                 ` Steven Rostedt
2019-04-30 17:49                                   ` [RFC][PATCH] ftrace/x86: Emulate call function while updating in breakpoint handler Steven Rostedt
2019-04-30 18:33                                     ` Linus Torvalds
2019-04-30 19:00                                       ` Steven Rostedt
2019-04-30 21:08                                       ` Steven Rostedt
2019-05-01 13:11                                       ` Peter Zijlstra
2019-05-01 18:58                                         ` Steven Rostedt
2019-05-01 19:03                                           ` Peter Zijlstra
2019-05-01 19:03                                         ` Linus Torvalds
2019-05-01 19:13                                           ` Peter Zijlstra
2019-05-01 19:13                                           ` Steven Rostedt
2019-05-01 19:33                                             ` Jiri Kosina
2019-05-01 19:41                                               ` Peter Zijlstra
2019-04-30 21:53                                     ` [RFC][PATCH v2] " Steven Rostedt
2019-05-01  1:35                                       ` Steven Rostedt
2019-05-01  1:58                                         ` Linus Torvalds
2019-05-01  8:26                                       ` Nicolai Stange
2019-05-01 13:22                                         ` Steven Rostedt
2019-04-29 20:16                   ` [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation Linus Torvalds
2019-04-29 22:08                     ` Sean Christopherson
2019-04-29 22:22                       ` Linus Torvalds
2019-04-30  0:08                         ` Sean Christopherson
2019-04-30  0:45                           ` Sean Christopherson
2019-04-30  2:26                             ` Linus Torvalds
2019-04-30 10:40                               ` Peter Zijlstra
2019-04-30 11:17                               ` Jiri Kosina
2019-04-29 22:06                 ` Linus Torvalds
2019-04-30 11:18                   ` Peter Zijlstra
2019-04-29 18:52         ` Steven Rostedt
     [not found]           ` <CAHk-=wjm93jLtVxTX4HZs6K4k1Wqh3ujjmapqaYtcibVk_YnzQ@mail.gmail.com>
2019-04-29 19:07             ` Steven Rostedt
2019-04-29 20:06               ` Linus Torvalds
2019-04-29 20:20                 ` Linus Torvalds
2019-04-29 20:30                 ` Steven Rostedt
2019-04-29 21:38                   ` Linus Torvalds
2019-04-29 22:07                     ` Steven Rostedt
2019-04-30  9:24                       ` Nicolai Stange
2019-04-30 10:46           ` Peter Zijlstra
2019-04-30 13:44             ` Steven Rostedt
2019-04-30 14:20               ` Peter Zijlstra
2019-04-30 14:36                 ` Steven Rostedt
2019-04-27 10:06 ` [PATCH 4/4] selftests/livepatch: add "ftrace a live patched function" test Nicolai Stange

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).