LKML Archive on lore.kernel.org
* [PATCH V4 00/13] perf/x86: Add perf text poke events
@ 2020-03-04  9:06 Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 01/13] perf: Add perf text poke event Adrian Hunter
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Hi

Here are patches to add a text poke event to record changes to kernel text
(i.e. self-modifying code) in order to support tracers like Intel PT
decoding through jump labels, kprobes and ftrace trampolines.

The first 8 patches make the kernel changes and the subsequent patches are
tools changes.

The next 4 patches add support for updating perf tools' data cache
with the changed bytes.

The last patch is an Intel PT-specific tools change.

Patches also here:

	git://git.infradead.org/users/ahunter/linux-perf.git text_poke


Changes in V4:

  kprobes: Add symbols for kprobe insn pages

	Change "module name" from kprobe to __builtin__kprobes
	Added comment about "module name" use

  ftrace: Add symbols for ftrace trampolines
	
	Change "module name" from ftrace to __builtin__ftrace
	Move calls of ftrace_add_trampoline_to_kallsyms() and
	ftrace_remove_trampoline_from_kallsyms() into
	kernel/trace/ftrace.c
	Added comment about "module name" use

  ftrace: Add perf ksymbol events for ftrace trampolines

	Move changes into kernel/trace/ftrace.c

  ftrace: Add perf text poke events for ftrace trampolines

	Move changes into kernel/trace/ftrace.c

Changes in V3:

  perf: Add perf text poke event

	To prevent a warning, cast the pointer to (unsigned long), not (u64)

  kprobes: Add symbols for kprobe insn pages

	Expand commit message
	Remove unneeded declarations of kprobe_cache_get_kallsym() and arch_kprobe_get_kallsym() when !CONFIG_KPROBES

  ftrace: Add symbols for ftrace trampolines

	Expand commit message
	Make ftrace_get_trampoline_kallsym() static

Changes in V2:

  perf: Add perf text poke event

	Separate out x86 changes
	The text poke event now has old len and new len
	Revised commit message

  perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers

	New patch containing x86 changes from original first patch

  kprobes: Add symbols for kprobe insn pages
  kprobes: Add perf ksymbol events for kprobe insn pages
  perf/x86: Add perf text poke events for kprobes
  ftrace: Add symbols for ftrace trampolines
  ftrace: Add perf ksymbol events for ftrace trampolines
  ftrace: Add perf text poke events for ftrace trampolines
  perf kcore_copy: Fix module map when there are no modules loaded
  perf evlist: Disable 'immediate' events last

	New patches

  perf tools: Add support for PERF_RECORD_TEXT_POKE

	The text poke event now has old len and new len
	Also select ksymbol events with text poke events

  perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL

	New patch

  perf intel-pt: Add support for text poke events

	The text poke event now has old len and new len
	Allow for the address not having a map yet


Changes since RFC:

  Dropped 'flags' from the new event.  The consensus seemed to be that text
  pokes should employ a scheme similar to x86's INT3 method instead.

  Dropped tools patches that were already applied.


Example:

  For jump labels, the kernel needs
	CONFIG_JUMP_LABEL=y
  and an easy-to-flip jump label is in sysctl_schedstats(), which also needs
	CONFIG_SCHEDSTATS=y
	CONFIG_PROC_SYSCTL=y
	CONFIG_SCHED_DEBUG=y

  Also note that 'sudo perf record' is put into the background, which, as
  written, needs sudo credential caching (otherwise the background task will
  stop while awaiting the sudo password), hence the initial 'sudo echo' to
  cache the credentials.

Before:

  $ sudo echo
  $ sudo perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
  [1] 1640
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  1
  $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo kill 1640
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 16.635 MB perf.data.before ]
  $ perf script -i perf.data.before --itrace=e >/dev/null
  Warning:
  1946 instruction trace errors

After:

  $ sudo echo
  $ sudo perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
  [1] 1882
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  1
  $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo kill 1882
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 10.893 MB perf.data.after ]
  $ perf script -i perf.data.after --itrace=e
  $


Adrian Hunter (13):
      perf: Add perf text poke event
      perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
      kprobes: Add symbols for kprobe insn pages
      kprobes: Add perf ksymbol events for kprobe insn pages
      perf/x86: Add perf text poke events for kprobes
      ftrace: Add symbols for ftrace trampolines
      ftrace: Add perf ksymbol events for ftrace trampolines
      ftrace: Add perf text poke events for ftrace trampolines
      perf kcore_copy: Fix module map when there are no modules loaded
      perf evlist: Disable 'immediate' events last
      perf tools: Add support for PERF_RECORD_TEXT_POKE
      perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
      perf intel-pt: Add support for text poke events

 arch/x86/include/asm/kprobes.h            |   4 ++
 arch/x86/include/asm/text-patching.h      |   2 +
 arch/x86/kernel/alternative.c             |  70 +++++++++++++++++----
 arch/x86/kernel/kprobes/core.c            |   7 +++
 arch/x86/kernel/kprobes/opt.c             |  18 +++++-
 include/linux/ftrace.h                    |  12 ++--
 include/linux/kprobes.h                   |  15 +++++
 include/linux/perf_event.h                |   8 +++
 include/uapi/linux/perf_event.h           |  26 +++++++-
 kernel/events/core.c                      |  90 +++++++++++++++++++++++++-
 kernel/kallsyms.c                         |  42 +++++++++++--
 kernel/kprobes.c                          |  57 +++++++++++++++++
 kernel/trace/ftrace.c                     | 101 +++++++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h     |  26 +++++++-
 tools/lib/perf/include/perf/event.h       |   9 +++
 tools/perf/arch/x86/util/intel-pt.c       |   4 ++
 tools/perf/builtin-record.c               |  45 +++++++++++++
 tools/perf/util/dso.c                     |   3 +
 tools/perf/util/dso.h                     |   1 +
 tools/perf/util/event.c                   |  40 ++++++++++++
 tools/perf/util/event.h                   |   5 ++
 tools/perf/util/evlist.c                  |  31 ++++++---
 tools/perf/util/evlist.h                  |   1 +
 tools/perf/util/evsel.c                   |   7 ++-
 tools/perf/util/intel-pt.c                |  75 ++++++++++++++++++++++
 tools/perf/util/machine.c                 |  49 +++++++++++++++
 tools/perf/util/machine.h                 |   3 +
 tools/perf/util/map.c                     |   5 ++
 tools/perf/util/map.h                     |   3 +-
 tools/perf/util/perf_event_attr_fprintf.c |   1 +
 tools/perf/util/record.c                  |  10 +++
 tools/perf/util/record.h                  |   1 +
 tools/perf/util/session.c                 |  23 +++++++
 tools/perf/util/symbol-elf.c              |   7 +++
 tools/perf/util/symbol.c                  |   1 +
 tools/perf/util/tool.h                    |   3 +-
 36 files changed, 765 insertions(+), 40 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V4 01/13] perf: Add perf text poke event
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.

A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.

The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently, no scheme based on one or
more timestamps is sufficient:

	CPU0				CPU1
 0
 1	write insn A
 2					execute insn A
 3	sync-I$
 4

Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.

To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.

 1) overwrite the instruction with a breakpoint and sync I$

This guarantees that code flow will never hit the target
instruction again, on any CPU (or rather, it will cause an
exception).

 2) issue the TEXT_POKE event

 3) overwrite the breakpoint with the new instruction and sync I$

Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.

So by guarding the TEXT_POKE event with an exception on either side,
we can tell, without doubt, which instruction another CPU will
have observed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/perf_event.h      |  8 +++
 include/uapi/linux/perf_event.h | 21 +++++++-
 kernel/events/core.c            | 90 ++++++++++++++++++++++++++++++++-
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..e2b110e418fa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1212,6 +1212,9 @@ extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
 extern void perf_event_namespaces(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
+extern void perf_event_text_poke(const void *addr,
+				 const void *old_bytes, size_t old_len,
+				 const void *new_bytes, size_t new_len);
 
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -1462,6 +1465,11 @@ static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
 static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
+static inline void perf_event_text_poke(const void *addr,
+					const void *old_bytes,
+					size_t old_len,
+					const void *new_bytes,
+					size_t new_len)			{ }
 static inline void perf_event_init(void)				{ }
 static inline int  perf_swevent_get_recursion_context(void)		{ return -1; }
 static inline void perf_swevent_put_recursion_context(int rctx)		{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..bae9e9d2d897 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				text_poke      :  1, /* include text poke events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,24 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * Records changes to kernel text i.e. self-modified code. 'old_len' is
+	 * the number of old bytes, 'new_len' is the number of new bytes. Either
+	 * 'old_len' or 'new_len' may be zero to indicate, for example, the
+	 * addition or removal of a trampoline. 'bytes' contains the old bytes
+	 * followed immediately by the new bytes.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				addr;
+	 *	u16				old_len;
+	 *	u16				new_len;
+	 *	u8				bytes[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_TEXT_POKE			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2173c23c25b4..b0f6b3292b8f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_text_poke_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4397,7 +4398,7 @@ static bool is_sb_event(struct perf_event *event)
 	if (attr->mmap || attr->mmap_data || attr->mmap2 ||
 	    attr->comm || attr->comm_exec ||
 	    attr->task || attr->ksymbol ||
-	    attr->context_switch ||
+	    attr->context_switch || attr->text_poke ||
 	    attr->bpf_event)
 		return true;
 	return false;
@@ -4471,6 +4472,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_ksymbol_events);
 	if (event->attr.bpf_event)
 		atomic_dec(&nr_bpf_events);
+	if (event->attr.text_poke)
+		atomic_dec(&nr_text_poke_events);
 
 	if (dec) {
 		if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -8309,6 +8312,89 @@ void perf_event_bpf_event(struct bpf_prog *prog,
 	perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
 }
 
+struct perf_text_poke_event {
+	const void		*old_bytes;
+	const void		*new_bytes;
+	size_t			pad;
+	u16			old_len;
+	u16			new_len;
+
+	struct {
+		struct perf_event_header	header;
+
+		u64				addr;
+	} event_id;
+};
+
+static int perf_event_text_poke_match(struct perf_event *event)
+{
+	return event->attr.text_poke;
+}
+
+static void perf_event_text_poke_output(struct perf_event *event, void *data)
+{
+	struct perf_text_poke_event *text_poke_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u64 padding = 0;
+	int ret;
+
+	if (!perf_event_text_poke_match(event))
+		return;
+
+	perf_event_header__init_id(&text_poke_event->event_id.header, &sample, event);
+
+	ret = perf_output_begin(&handle, event, text_poke_event->event_id.header.size);
+	if (ret)
+		return;
+
+	perf_output_put(&handle, text_poke_event->event_id);
+	perf_output_put(&handle, text_poke_event->old_len);
+	perf_output_put(&handle, text_poke_event->new_len);
+
+	__output_copy(&handle, text_poke_event->old_bytes, text_poke_event->old_len);
+	__output_copy(&handle, text_poke_event->new_bytes, text_poke_event->new_len);
+
+	if (text_poke_event->pad)
+		__output_copy(&handle, &padding, text_poke_event->pad);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+void perf_event_text_poke(const void *addr, const void *old_bytes,
+			  size_t old_len, const void *new_bytes, size_t new_len)
+{
+	struct perf_text_poke_event text_poke_event;
+	size_t tot, pad;
+
+	if (!atomic_read(&nr_text_poke_events))
+		return;
+
+	tot  = sizeof(text_poke_event.old_len) + old_len;
+	tot += sizeof(text_poke_event.new_len) + new_len;
+	pad  = ALIGN(tot, sizeof(u64)) - tot;
+
+	text_poke_event = (struct perf_text_poke_event){
+		.old_bytes    = old_bytes,
+		.new_bytes    = new_bytes,
+		.pad          = pad,
+		.old_len      = old_len,
+		.new_len      = new_len,
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_TEXT_POKE,
+				.misc = PERF_RECORD_MISC_KERNEL,
+				.size = sizeof(text_poke_event.event_id) + tot + pad,
+			},
+			.addr = (unsigned long)addr,
+		},
+	};
+
+	perf_iterate_sb(perf_event_text_poke_output, &text_poke_event, NULL);
+}
+
 void perf_event_itrace_started(struct perf_event *event)
 {
 	event->attach_state |= PERF_ATTACH_ITRACE;
@@ -10623,6 +10709,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_ksymbol_events);
 	if (event->attr.bpf_event)
 		atomic_inc(&nr_bpf_events);
+	if (event->attr.text_poke)
+		atomic_inc(&nr_text_poke_events);
 
 	if (inc) {
 		/*
-- 
2.17.1



* [PATCH V4 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 01/13] perf: Add perf text poke event Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Add support for the perf text poke event for text_poke_bp_batch() callers,
which includes jump labels. See the comments for more details.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/alternative.c | 37 ++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 34360ca301a2..737e7a842f85 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -3,6 +3,7 @@
 
 #include <linux/module.h>
 #include <linux/sched.h>
+#include <linux/perf_event.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/stringify.h>
@@ -946,6 +947,7 @@ struct text_poke_loc {
 	s32 rel32;
 	u8 opcode;
 	const u8 text[POKE_MAX_OPCODE_SIZE];
+	u8 old;
 };
 
 struct bp_patching_desc {
@@ -1114,8 +1116,10 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	/*
 	 * First step: add a int3 trap to the address that will be patched.
 	 */
-	for (i = 0; i < nr_entries; i++)
+	for (i = 0; i < nr_entries; i++) {
+		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
 		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
+	}
 
 	text_poke_sync();
 
@@ -1123,14 +1127,45 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * Second step: update all but the first byte of the patched range.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
+		u8 old[POKE_MAX_OPCODE_SIZE] = { tp[i].old, };
 		int len = text_opcode_size(tp[i].opcode);
 
 		if (len - INT3_INSN_SIZE > 0) {
+			memcpy(old + INT3_INSN_SIZE,
+			       text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
+			       len - INT3_INSN_SIZE);
 			text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
 				  (const char *)tp[i].text + INT3_INSN_SIZE,
 				  len - INT3_INSN_SIZE);
 			do_sync++;
 		}
+
+		/*
+		 * Emit a perf event to record the text poke, primarily to
+		 * support Intel PT decoding which must walk the executable code
+		 * to reconstruct the trace. The flow up to here is:
+		 *   - write INT3 byte
+		 *   - IPI-SYNC
+		 *   - write instruction tail
+		 * At this point the actual control flow will be through the
+		 * INT3 and handler and not hit the old or new instruction.
+		 * Intel PT outputs FUP/TIP packets for the INT3, so the flow
+		 * can still be decoded. Subsequently:
+		 *   - emit RECORD_TEXT_POKE with the new instruction
+		 *   - IPI-SYNC
+		 *   - write first byte
+		 *   - IPI-SYNC
+		 * So before the text poke event timestamp, the decoder will see
+		 * either the old instruction flow or FUP/TIP of INT3. After the
+		 * text poke event timestamp, the decoder will see either the
+		 * new instruction flow or FUP/TIP of INT3. Thus decoders can
+		 * use the timestamp as the point at which to modify the
+		 * executable code.
+		 * The old instruction is recorded so that the event can be
+		 * processed forwards or backwards.
+		 */
+		perf_event_text_poke(text_poke_addr(&tp[i]), old, len,
+				     tp[i].text, len);
 	}
 
 	if (do_sync) {
-- 
2.17.1



* [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 01/13] perf: Add perf text poke event Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-05  5:58   ` Masami Hiramatsu
  2020-03-24 12:31   ` Peter Zijlstra
  2020-03-04  9:06 ` [PATCH V4 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobes' purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Note: kprobe insn pages are not used if ftrace is configured. To see the
effect of this patch, the kernel must be configured with:

	# CONFIG_FUNCTION_TRACER is not set
	CONFIG_KPROBES=y

and for optimised kprobes:

	CONFIG_OPTPROBES=y

Example on x86:

	# perf probe __schedule
	Added new event:
	  probe:__schedule     (on __schedule)
	# cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
	ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
	ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]

Note: This patch adds "__builtin__kprobes" as a module name in
/proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
though "__builtin__kprobes" is not a module.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/kprobes.h | 15 ++++++++++++++
 kernel/kallsyms.c       | 37 +++++++++++++++++++++++++++++----
 kernel/kprobes.c        | 45 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..62d682f47b5e 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -242,6 +242,7 @@ struct kprobe_insn_cache {
 	struct mutex mutex;
 	void *(*alloc)(void);	/* allocate insn page */
 	void (*free)(void *);	/* free insn page */
+	const char *sym;	/* symbol for insn pages */
 	struct list_head pages; /* list of kprobe_insn_page */
 	size_t insn_size;	/* size of instruction slot */
 	int nr_garbage;
@@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
 {									\
 	return __is_insn_slot_addr(&kprobe_##__name##_slots, addr);	\
 }
+#define KPROBE_INSN_PAGE_SYM		"kprobe_insn_page"
+#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
 #else /* __ARCH_WANT_KPROBES_INSN_SLOT */
 #define DEFINE_INSN_CACHE_OPS(__name)					\
 static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
@@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
 void *alloc_insn_page(void);
 void free_insn_page(void *page);
 
+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+		       char *sym);
+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+			     unsigned long *value, char *type, char *sym);
+
+int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+			    char *type, char *sym);
 #else /* !CONFIG_KPROBES: */
 
 static inline int kprobes_built_in(void)
@@ -435,6 +445,11 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
 {
 	return true;
 }
+static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
+				     char *type, char *sym)
+{
+	return -ERANGE;
+}
 #endif /* CONFIG_KPROBES */
 static inline int disable_kretprobe(struct kretprobe *rp)
 {
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..4a93511e6243 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/kprobes.h>
 #include <linux/compiler.h>
 
 /*
@@ -438,6 +439,7 @@ struct kallsym_iter {
 	loff_t pos_arch_end;
 	loff_t pos_mod_end;
 	loff_t pos_ftrace_mod_end;
+	loff_t pos_bpf_end;
 	unsigned long value;
 	unsigned int nameoff; /* If iterating in core kernel symbols. */
 	char type;
@@ -497,11 +499,33 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
 
 static int get_ksymbol_bpf(struct kallsym_iter *iter)
 {
+	int ret;
+
 	strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
 	iter->exported = 0;
-	return bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
-			       &iter->value, &iter->type,
-			       iter->name) < 0 ? 0 : 1;
+	ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
+			      &iter->value, &iter->type,
+			      iter->name);
+	if (ret < 0) {
+		iter->pos_bpf_end = iter->pos;
+		return 0;
+	}
+
+	return 1;
+}
+
+/*
+ * This uses "__builtin__kprobes" as a module name for symbols for pages
+ * allocated for kprobes' purposes, even though "__builtin__kprobes" is not a
+ * module.
+ */
+static int get_ksymbol_kprobe(struct kallsym_iter *iter)
+{
+	strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
+	iter->exported = 0;
+	return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
+				  &iter->value, &iter->type,
+				  iter->name) < 0 ? 0 : 1;
 }
 
 /* Returns space to next name. */
@@ -528,6 +552,7 @@ static void reset_iter(struct kallsym_iter *iter, loff_t new_pos)
 		iter->pos_arch_end = 0;
 		iter->pos_mod_end = 0;
 		iter->pos_ftrace_mod_end = 0;
+		iter->pos_bpf_end = 0;
 	}
 }
 
@@ -552,7 +577,11 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
 	    get_ksymbol_ftrace_mod(iter))
 		return 1;
 
-	return get_ksymbol_bpf(iter);
+	if ((!iter->pos_bpf_end || iter->pos_bpf_end > pos) &&
+	    get_ksymbol_bpf(iter))
+		return 1;
+
+	return get_ksymbol_kprobe(iter);
 }
 
 /* Returns false if pos at or past end of file. */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..229d1b596690 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -118,6 +118,7 @@ struct kprobe_insn_cache kprobe_insn_slots = {
 	.mutex = __MUTEX_INITIALIZER(kprobe_insn_slots.mutex),
 	.alloc = alloc_insn_page,
 	.free = free_insn_page,
+	.sym = KPROBE_INSN_PAGE_SYM,
 	.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
 	.insn_size = MAX_INSN_SIZE,
 	.nr_garbage = 0,
@@ -296,6 +297,7 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
 	.mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex),
 	.alloc = alloc_insn_page,
 	.free = free_insn_page,
+	.sym = KPROBE_OPTINSN_PAGE_SYM,
 	.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
 	/* .insn_size is initialized later */
 	.nr_garbage = 0,
@@ -2179,6 +2181,49 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
 	return 0;
 }
 
+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+			     unsigned long *value, char *type, char *sym)
+{
+	struct kprobe_insn_page *kip;
+	int ret = -ERANGE;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(kip, &c->pages, list) {
+		if ((*symnum)--)
+			continue;
+		strlcpy(sym, c->sym, KSYM_NAME_LEN);
+		*type = 't';
+		*value = (unsigned long)kip->insns;
+		ret = 0;
+		break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+				   char *type, char *sym)
+{
+	return -ERANGE;
+}
+
+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+		       char *sym)
+{
+#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
+	if (!kprobe_cache_get_kallsym(&kprobe_insn_slots, &symnum, value, type, sym))
+		return 0;
+#ifdef CONFIG_OPTPROBES
+	if (!kprobe_cache_get_kallsym(&kprobe_optinsn_slots, &symnum, value, type, sym))
+		return 0;
+#endif
+#endif
+	if (!arch_kprobe_get_kallsym(&symnum, value, type, sym))
+		return 0;
+	return -ERANGE;
+}
+
 int __init __weak arch_populate_kprobe_blacklist(void)
 {
 	return 0;
-- 
2.17.1



* [PATCH V4 04/13] kprobes: Add perf ksymbol events for kprobe insn pages
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (2 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobes' purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/uapi/linux/perf_event.h |  5 +++++
 kernel/kprobes.c                | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index bae9e9d2d897..9b38ac04c110 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1031,6 +1031,11 @@ enum perf_event_type {
 enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_UNKNOWN	= 0,
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
+	/*
+	 * Out of line code such as kprobe-replaced instructions or optimized
+	 * kprobes.
+	 */
+	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
 };
 
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 229d1b596690..f880eb2189c0 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include <linux/ftrace.h>
 #include <linux/cpu.h>
 #include <linux/jump_label.h>
+#include <linux/perf_event.h>
 
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
@@ -184,6 +185,10 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	kip->cache = c;
 	list_add_rcu(&kip->list, &c->pages);
 	slot = kip->insns;
+
+	/* Record the perf ksymbol register event after adding the page */
+	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)kip->insns,
+			   PAGE_SIZE, false, c->sym);
 out:
 	mutex_unlock(&c->mutex);
 	return slot;
@@ -202,6 +207,13 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
 		 * next time somebody inserts a probe.
 		 */
 		if (!list_is_singular(&kip->list)) {
+			/*
+			 * Record perf ksymbol unregister event before removing
+			 * the page.
+			 */
+			perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+					   (u64)kip->insns, PAGE_SIZE, true,
+					   kip->cache->sym);
 			list_del_rcu(&kip->list);
 			synchronize_rcu();
 			kip->cache->free(kip->insns);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (3 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-24 12:21   ` Peter Zijlstra
  2020-03-04  9:06 ` [PATCH V4 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Add perf text poke events for kprobes. That includes:

 - the replaced instruction(s) which are executed out-of-line
   i.e. arch_copy_kprobe() and arch_remove_kprobe()

 - optimised kprobe function
   i.e. arch_prepare_optimized_kprobe() and
        __arch_remove_optimized_kprobe()

 - optimised kprobe
   i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/include/asm/kprobes.h       |  4 ++++
 arch/x86/include/asm/text-patching.h |  2 ++
 arch/x86/kernel/alternative.c        | 35 +++++++++++++++++-----------
 arch/x86/kernel/kprobes/core.c       |  7 ++++++
 arch/x86/kernel/kprobes/opt.c        | 18 +++++++++++++-
 5 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 95b1f053bd96..542ce120a54d 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,11 +65,15 @@ struct arch_specific_insn {
 	 */
 	bool boostable;
 	bool if_modifier;
+	/* Number of bytes of text poked */
+	int tp_len;
 };
 
 struct arch_optimized_insn {
 	/* copy of the original instructions */
 	kprobe_opcode_t copied_insn[DISP32_SIZE];
+	/* Number of bytes of text poked */
+	int tp_len;
 	/* detour code buffer */
 	kprobe_opcode_t *insn;
 	/* the size of instructions copied to detour code buffer */
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 67315fa3956a..13bb51a7789c 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -45,6 +45,8 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
+extern void __text_poke_bp(void *addr, const void *opcode, size_t len,
+			   const void *emulate, const u8 *oldptr);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
 
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 737e7a842f85..c8cfc97abc9e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1075,6 +1075,7 @@ static int tp_vec_nr;
  * text_poke_bp_batch() -- update instructions on live kernel on SMP
  * @tp:			vector of instructions to patch
  * @nr_entries:		number of entries in the vector
+ * @oldptr:		pointer to original old insn byte
  *
  * Modify multi-byte instruction by using int3 breakpoint on SMP.
  * We completely avoid stop_machine() here, and achieve the
@@ -1092,7 +1093,8 @@ static int tp_vec_nr;
  *		  replacing opcode
  *	- sync cores
  */
-static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
+static void text_poke_bp_batch(struct text_poke_loc *tp,
+			       unsigned int nr_entries, const u8 *oldptr)
 {
 	struct bp_patching_desc desc = {
 		.vec = tp,
@@ -1117,7 +1119,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * First step: add a int3 trap to the address that will be patched.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
+		tp[i].old = oldptr ? *oldptr : *(u8 *)text_poke_addr(&tp[i]);
 		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
 	}
 
@@ -1274,7 +1276,7 @@ static bool tp_order_fail(void *addr)
 static void text_poke_flush(void *addr)
 {
 	if (tp_vec_nr == TP_VEC_MAX || tp_order_fail(addr)) {
-		text_poke_bp_batch(tp_vec, tp_vec_nr);
+		text_poke_bp_batch(tp_vec, tp_vec_nr, NULL);
 		tp_vec_nr = 0;
 	}
 }
@@ -1299,6 +1301,20 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
 	text_poke_loc_init(tp, addr, opcode, len, emulate);
 }
 
+void __ref __text_poke_bp(void *addr, const void *opcode, size_t len,
+			  const void *emulate, const u8 *oldptr)
+{
+	struct text_poke_loc tp;
+
+	if (unlikely(system_state == SYSTEM_BOOTING)) {
+		text_poke_early(addr, opcode, len);
+		return;
+	}
+
+	text_poke_loc_init(&tp, addr, opcode, len, emulate);
+	text_poke_bp_batch(&tp, 1, oldptr);
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:	address to patch
@@ -1310,15 +1326,8 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
  * dynamically allocated memory. This function should be used when it is
  * not possible to allocate memory.
  */
-void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
+void __ref text_poke_bp(void *addr, const void *opcode, size_t len,
+			const void *emulate)
 {
-	struct text_poke_loc tp;
-
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
-	text_poke_loc_init(&tp, addr, opcode, len, emulate);
-	text_poke_bp_batch(&tp, 1);
+	return __text_poke_bp(addr, opcode, len, emulate, NULL);
 }
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 579d30e91a36..12ea05d923ec 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
 #include <linux/hardirq.h>
 #include <linux/preempt.h>
 #include <linux/sched/debug.h>
+#include <linux/perf_event.h>
 #include <linux/extable.h>
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
@@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
 
+	p->ainsn.tp_len = len;
+	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
 	/* OK, write back the instruction(s) into ROX insn buffer */
 	text_poke(p->ainsn.insn, buf, len);
 
@@ -514,6 +518,9 @@ void arch_disarm_kprobe(struct kprobe *p)
 void arch_remove_kprobe(struct kprobe *p)
 {
 	if (p->ainsn.insn) {
+		/* Record the perf event before freeing the slot */
+		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+				     p->ainsn.tp_len, NULL, 0);
 		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
 		p->ainsn.insn = NULL;
 	}
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 3f45b5c43a71..0f0b84b3f4b9 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -6,6 +6,7 @@
  * Copyright (C) Hitachi Ltd., 2012
  */
 #include <linux/kprobes.h>
+#include <linux/perf_event.h>
 #include <linux/ptrace.h>
 #include <linux/string.h>
 #include <linux/slab.h>
@@ -332,6 +333,10 @@ static
 void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
 {
 	if (op->optinsn.insn) {
+		/* Record the perf event before freeing the slot */
+		if (dirty)
+			perf_event_text_poke(op->optinsn.insn, op->optinsn.insn,
+					     op->optinsn.tp_len, NULL, 0);
 		free_optinsn_slot(op->optinsn.insn, dirty);
 		op->optinsn.insn = NULL;
 		op->optinsn.size = 0;
@@ -401,6 +406,9 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
 			   (u8 *)op->kp.addr + op->optinsn.size);
 	len += JMP32_INSN_SIZE;
 
+	op->optinsn.tp_len = len;
+	perf_event_text_poke(slot, NULL, 0, buf, len);
+
 	/* We have to use text_poke() for instruction buffer because it is RO */
 	text_poke(slot, buf, len);
 	ret = 0;
@@ -439,7 +447,8 @@ void arch_optimize_kprobes(struct list_head *oplist)
 		insn_buff[0] = JMP32_INSN_OPCODE;
 		*(s32 *)(&insn_buff[1]) = rel;
 
-		text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL);
+		__text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL,
+			       &op->kp.opcode);
 
 		list_del_init(&op->list);
 	}
@@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
+	u8 old[POKE_MAX_OPCODE_SIZE];
+	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
+	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
+
+	memcpy(old, op->kp.addr, len);
 	arch_arm_kprobe(&op->kp);
 	text_poke(op->kp.addr + INT3_INSN_SIZE,
 		  op->optinsn.copied_insn, DISP32_SIZE);
+	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);
+	perf_event_text_poke(op->kp.addr, old, len, new, len);
 	text_poke_sync();
 }
 
-- 
2.17.1



* [PATCH V4 06/13] ftrace: Add symbols for ftrace trampolines
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (4 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols created for them. Add such
symbols, making them visible via /proc/kallsyms.

Example on x86 with CONFIG_DYNAMIC_FTRACE=y

	# echo function > /sys/kernel/debug/tracing/current_tracer
	# cat /proc/kallsyms | grep '\[__builtin__ftrace\]'
	ffffffffc0238000 t ftrace_trampoline    [__builtin__ftrace]

Note: This patch adds "__builtin__ftrace" as a module name in /proc/kallsyms for
symbols for pages allocated for ftrace's purposes, even though "__builtin__ftrace"
is not a module.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/ftrace.h | 12 ++++---
 kernel/kallsyms.c      |  5 +++
 kernel/trace/ftrace.c  | 77 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index db95244a62d4..ea726ad1fa83 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -58,9 +58,6 @@ struct ftrace_direct_func;
 const char *
 ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
 		   unsigned long *off, char **modname, char *sym);
-int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
-			   char *type, char *name,
-			   char *module_name, int *exported);
 #else
 static inline const char *
 ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
@@ -68,6 +65,13 @@ ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
 {
 	return NULL;
 }
+#endif
+
+#if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_DYNAMIC_FTRACE)
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+			   char *type, char *name,
+			   char *module_name, int *exported);
+#else
 static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 					 char *type, char *name,
 					 char *module_name, int *exported)
@@ -76,7 +80,6 @@ static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *val
 }
 #endif
 
-
 #ifdef CONFIG_FUNCTION_TRACER
 
 extern int ftrace_enabled;
@@ -207,6 +210,7 @@ struct ftrace_ops {
 	struct ftrace_ops_hash		old_hash;
 	unsigned long			trampoline;
 	unsigned long			trampoline_size;
+	struct list_head		list;
 #endif
 };
 
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 4a93511e6243..24638586a39e 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -483,6 +483,11 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
 	return 1;
 }
 
+/*
+ * ftrace_mod_get_kallsym() may also get symbols for pages allocated for ftrace
+ * purposes. In that case "__builtin__ftrace" is used as a module name, even
+ * though "__builtin__ftrace" is not a module.
+ */
 static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
 {
 	int ret = ftrace_mod_get_kallsym(iter->pos - iter->pos_mod_end,
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9bf1f2cd515e..aa3149fd1fc2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2768,6 +2768,38 @@ void __weak arch_ftrace_trampoline_free(struct ftrace_ops *ops)
 {
 }
 
+/* List of trace_ops that have allocated trampolines */
+static LIST_HEAD(ftrace_ops_trampoline_list);
+
+static void ftrace_add_trampoline_to_kallsyms(struct ftrace_ops *ops)
+{
+	lockdep_assert_held(&ftrace_lock);
+	list_add_rcu(&ops->list, &ftrace_ops_trampoline_list);
+}
+
+static void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops)
+{
+	lockdep_assert_held(&ftrace_lock);
+	list_del_rcu(&ops->list);
+}
+
+/*
+ * "__builtin__ftrace" is used as a module name in /proc/kallsyms for symbols
+ * for pages allocated for ftrace purposes, even though "__builtin__ftrace" is
+ * not a module.
+ */
+#define FTRACE_TRAMPOLINE_MOD "__builtin__ftrace"
+#define FTRACE_TRAMPOLINE_SYM "ftrace_trampoline"
+
+static void ftrace_trampoline_free(struct ftrace_ops *ops)
+{
+	if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
+	    ops->trampoline)
+		ftrace_remove_trampoline_from_kallsyms(ops);
+
+	arch_ftrace_trampoline_free(ops);
+}
+
 static void ftrace_startup_enable(int command)
 {
 	if (saved_ftrace_func != ftrace_trace_function) {
@@ -2938,7 +2970,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
 			synchronize_rcu_tasks();
 
  free_ops:
-		arch_ftrace_trampoline_free(ops);
+		ftrace_trampoline_free(ops);
 	}
 
 	return 0;
@@ -6174,6 +6206,27 @@ struct ftrace_mod_map {
 	unsigned int		num_funcs;
 };
 
+static int ftrace_get_trampoline_kallsym(unsigned int symnum,
+					 unsigned long *value, char *type,
+					 char *name, char *module_name,
+					 int *exported)
+{
+	struct ftrace_ops *op;
+
+	list_for_each_entry_rcu(op, &ftrace_ops_trampoline_list, list) {
+		if (!op->trampoline || symnum--)
+			continue;
+		*value = op->trampoline;
+		*type = 't';
+		strlcpy(name, FTRACE_TRAMPOLINE_SYM, KSYM_NAME_LEN);
+		strlcpy(module_name, FTRACE_TRAMPOLINE_MOD, MODULE_NAME_LEN);
+		*exported = 0;
+		return 0;
+	}
+
+	return -ERANGE;
+}
+
 #ifdef CONFIG_MODULES
 
 #define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)
@@ -6510,6 +6563,7 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 {
 	struct ftrace_mod_map *mod_map;
 	struct ftrace_mod_func *mod_func;
+	int ret;
 
 	preempt_disable();
 	list_for_each_entry_rcu(mod_map, &ftrace_mod_maps, list) {
@@ -6536,8 +6590,10 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 		WARN_ON(1);
 		break;
 	}
+	ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+					    module_name, exported);
 	preempt_enable();
-	return -ERANGE;
+	return ret;
 }
 
 #else
@@ -6549,6 +6605,18 @@ allocate_ftrace_mod_map(struct module *mod,
 {
 	return NULL;
 }
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+			   char *type, char *name, char *module_name,
+			   int *exported)
+{
+	int ret;
+
+	preempt_disable();
+	ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+					    module_name, exported);
+	preempt_enable();
+	return ret;
+}
 #endif /* CONFIG_MODULES */
 
 struct ftrace_init_func {
@@ -6729,7 +6797,12 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops)
 {
+	unsigned long trampoline = ops->trampoline;
+
 	arch_ftrace_update_trampoline(ops);
+	if (ops->trampoline && ops->trampoline != trampoline &&
+	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+		ftrace_add_trampoline_to_kallsyms(ops);
 }
 
 void ftrace_init_trace_array(struct trace_array *tr)
-- 
2.17.1



* [PATCH V4 07/13] ftrace: Add perf ksymbol events for ftrace trampolines
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (5 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 08/13] ftrace: Add perf text poke " Adrian Hunter
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols created for them. Add such
symbols, making them visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/uapi/linux/perf_event.h |  2 +-
 kernel/trace/ftrace.c           | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9b38ac04c110..f80ce2e9f8b9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1033,7 +1033,7 @@ enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
 	/*
 	 * Out of line code such as kprobe-replaced instructions or optimized
-	 * kprobes.
+	 * kprobes or ftrace trampolines.
 	 */
 	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index aa3149fd1fc2..fda0c9144642 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2794,8 +2794,13 @@ static void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops)
 static void ftrace_trampoline_free(struct ftrace_ops *ops)
 {
 	if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
-	    ops->trampoline)
+	    ops->trampoline) {
+		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+				   ops->trampoline, ops->trampoline_size,
+				   true, FTRACE_TRAMPOLINE_SYM);
+		/* Remove from kallsyms after the perf events */
 		ftrace_remove_trampoline_from_kallsyms(ops);
+	}
 
 	arch_ftrace_trampoline_free(ops);
 }
@@ -6801,8 +6806,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)
 
 	arch_ftrace_update_trampoline(ops);
 	if (ops->trampoline && ops->trampoline != trampoline &&
-	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP)) {
+		/* Add to kallsyms before the perf events */
 		ftrace_add_trampoline_to_kallsyms(ops);
+		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+				   ops->trampoline, ops->trampoline_size, false,
+				   FTRACE_TRAMPOLINE_SYM);
+	}
 }
 
 void ftrace_init_trace_array(struct trace_array *tr)
-- 
2.17.1



* [PATCH V4 08/13] ftrace: Add perf text poke events for ftrace trampolines
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (6 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-04-01 10:09   ` Peter Zijlstra
  2020-03-04  9:06 ` [PATCH V4 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Add perf text poke events for ftrace trampolines when created and when
freed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 kernel/trace/ftrace.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fda0c9144642..75f9aac31fec 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2795,6 +2795,13 @@ static void ftrace_trampoline_free(struct ftrace_ops *ops)
 {
 	if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
 	    ops->trampoline) {
+		/*
+		 * Record the text poke event before the ksymbol unregister
+		 * event.
+		 */
+		perf_event_text_poke((void *)ops->trampoline,
+				     (void *)ops->trampoline,
+				     ops->trampoline_size, NULL, 0);
 		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
 				   ops->trampoline, ops->trampoline_size,
 				   true, FTRACE_TRAMPOLINE_SYM);
@@ -6812,6 +6819,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)
 		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
 				   ops->trampoline, ops->trampoline_size, false,
 				   FTRACE_TRAMPOLINE_SYM);
+		/*
+		 * Record the perf text poke event after the ksymbol register
+		 * event.
+		 */
+		perf_event_text_poke((void *)ops->trampoline, NULL, 0,
+				     (void *)ops->trampoline,
+				     ops->trampoline_size);
 	}
 }
 
-- 
2.17.1



* [PATCH V4 09/13] perf kcore_copy: Fix module map when there are no modules loaded
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (7 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 08/13] ftrace: Add perf text poke " Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

In the absence of any modules, no "modules" map is created, but there are
still other executable pages to map, due to eBPF JIT, kprobes or ftrace.
Map them by recognizing that the first "module" symbol is not necessarily
from a module, and adjusting the map accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/symbol-elf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 1965aefccb02..3df77b56e28a 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1452,6 +1452,7 @@ struct kcore_copy_info {
 	u64 first_symbol;
 	u64 last_symbol;
 	u64 first_module;
+	u64 first_module_symbol;
 	u64 last_module_symbol;
 	size_t phnum;
 	struct list_head phdrs;
@@ -1528,6 +1529,8 @@ static int kcore_copy__process_kallsyms(void *arg, const char *name, char type,
 		return 0;
 
 	if (strchr(name, '[')) {
+		if (!kci->first_module_symbol || start < kci->first_module_symbol)
+			kci->first_module_symbol = start;
 		if (start > kci->last_module_symbol)
 			kci->last_module_symbol = start;
 		return 0;
@@ -1725,6 +1728,10 @@ static int kcore_copy__calc_maps(struct kcore_copy_info *kci, const char *dir,
 		kci->etext += page_size;
 	}
 
+	if (kci->first_module_symbol &&
+	    (!kci->first_module || kci->first_module_symbol < kci->first_module))
+		kci->first_module = kci->first_module_symbol;
+
 	kci->first_module = round_down(kci->first_module, page_size);
 
 	if (kci->last_module_symbol) {
-- 
2.17.1



* [PATCH V4 10/13] perf evlist: Disable 'immediate' events last
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (8 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Events marked as 'immediate' are started before other events to ensure
that there is context at the start of the main tracing events. Context is
likewise needed at the end of tracing, so disable 'immediate' events after
the other events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1548237b6558..06c4586065cc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -379,22 +379,33 @@ void evlist__disable(struct evlist *evlist)
 {
 	struct evsel *pos;
 	struct affinity affinity;
-	int cpu, i;
+	int cpu, i, imm = 0;
+	bool has_imm = false;
 
 	if (affinity__setup(&affinity) < 0)
 		return;
 
-	evlist__for_each_cpu(evlist, i, cpu) {
-		affinity__set(&affinity, cpu);
-
-		evlist__for_each_entry(evlist, pos) {
-			if (evsel__cpu_iter_skip(pos, cpu))
-				continue;
-			if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
-				continue;
-			evsel__disable_cpu(pos, pos->cpu_iter - 1);
+	/* Disable 'immediate' events last */
+	for (imm = 0; imm <= 1; imm++) {
+		evlist__for_each_cpu(evlist, i, cpu) {
+			affinity__set(&affinity, cpu);
+
+			evlist__for_each_entry(evlist, pos) {
+				if (evsel__cpu_iter_skip(pos, cpu))
+					continue;
+				if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+					continue;
+				if (pos->immediate)
+					has_imm = true;
+				if (pos->immediate != imm)
+					continue;
+				evsel__disable_cpu(pos, pos->cpu_iter - 1);
+			}
 		}
+		if (!has_imm)
+			break;
 	}
+
 	affinity__cleanup(&affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
-- 
2.17.1



* [PATCH V4 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (9 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event is
processed, the kernel dso data cache is updated with the poked bytes.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/include/uapi/linux/perf_event.h     | 21 ++++++++++-
 tools/lib/perf/include/perf/event.h       |  9 +++++
 tools/perf/builtin-record.c               | 45 +++++++++++++++++++++++
 tools/perf/util/event.c                   | 40 ++++++++++++++++++++
 tools/perf/util/event.h                   |  5 +++
 tools/perf/util/evlist.h                  |  1 +
 tools/perf/util/evsel.c                   |  7 +++-
 tools/perf/util/machine.c                 | 43 ++++++++++++++++++++++
 tools/perf/util/machine.h                 |  3 ++
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 tools/perf/util/record.c                  | 10 +++++
 tools/perf/util/record.h                  |  1 +
 tools/perf/util/session.c                 | 23 ++++++++++++
 tools/perf/util/tool.h                    |  3 +-
 14 files changed, 209 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794d3105..bae9e9d2d897 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				text_poke      :  1, /* include text poke events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,24 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * Records changes to kernel text i.e. self-modified code. 'old_len' is
+	 * the number of old bytes, 'new_len' is the number of new bytes. Either
+	 * 'old_len' or 'new_len' may be zero to indicate, for example, the
+	 * addition or removal of a trampoline. 'bytes' contains the old bytes
+	 * followed immediately by the new bytes.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				addr;
+	 *	u16				old_len;
+	 *	u16				new_len;
+	 *	u8				bytes[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_TEXT_POKE			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 18106899cb4e..e3bc9afff150 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,14 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_text_poke_event {
+	struct perf_event_header header;
+	__u64			addr;
+	__u16			old_len;
+	__u16			new_len;
+	__u8			bytes[];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -360,6 +368,7 @@ union perf_event {
 	struct perf_record_sample		sample;
 	struct perf_record_bpf_event		bpf;
 	struct perf_record_ksymbol		ksymbol;
+	struct perf_record_text_poke_event	text_poke;
 	struct perf_record_header_attr		attr;
 	struct perf_record_event_update		event_update;
 	struct perf_record_header_event_type	event_type;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..538e3622547e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -723,6 +723,43 @@ static int record__auxtrace_init(struct record *rec __maybe_unused)
 
 #endif
 
+static int record__config_text_poke(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	int err;
+
+	/* Nothing to do if text poke is already configured */
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.text_poke)
+			return 0;
+	}
+
+	err = parse_events(evlist, "dummy:u", NULL);
+	if (err)
+		return err;
+
+	evsel = evlist__last(evlist);
+
+	evsel->core.attr.freq = 0;
+	evsel->core.attr.sample_period = 1;
+	evsel->core.attr.text_poke = 1;
+	evsel->core.attr.ksymbol = 1;
+
+	evsel->core.system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	/* Text poke must be collected on all CPUs */
+	perf_cpu_map__put(evsel->core.own_cpus);
+	evsel->core.own_cpus = perf_cpu_map__new(NULL);
+	perf_cpu_map__put(evsel->core.cpus);
+	evsel->core.cpus = perf_cpu_map__get(evsel->core.own_cpus);
+
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	return 0;
+}
+
 static bool record__kcore_readable(struct machine *machine)
 {
 	char kcore[PATH_MAX];
@@ -2610,6 +2647,14 @@ int cmd_record(int argc, const char **argv)
 	if (rec->opts.full_auxtrace)
 		rec->buildid_all = true;
 
+	if (rec->opts.text_poke) {
+		err = record__config_text_poke(rec->evlist);
+		if (err) {
+			pr_err("record__config_text_poke failed, error %d\n", err);
+			goto out;
+		}
+	}
+
 	if (record_opts__config(&rec->opts)) {
 		err = -EINVAL;
 		goto out;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff516a2..5a5baa4edde5 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@
 #include "stat.h"
 #include "session.h"
 #include "bpf-event.h"
+#include "print_binary.h"
 #include "tool.h"
 #include "../perf.h"
 
@@ -54,6 +55,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_TEXT_POKE]			= "TEXT_POKE",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -252,6 +254,14 @@ int perf_event__process_bpf(struct perf_tool *tool __maybe_unused,
 	return machine__process_bpf(machine, event, sample);
 }
 
+int perf_event__process_text_poke(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine)
+{
+	return machine__process_text_poke(machine, event, sample);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64 "]: %c %s\n",
@@ -398,6 +408,33 @@ size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp)
 		       event->bpf.type, event->bpf.flags, event->bpf.id);
 }
 
+static int text_poke_printer(enum binary_printer_ops op, unsigned int val,
+			     void *extra __maybe_unused, FILE *fp)
+{
+	if (op == BINARY_PRINT_NUM_DATA)
+		return fprintf(fp, " %02x", val);
+	if (op == BINARY_PRINT_LINE_END)
+		return fprintf(fp, "\n");
+	return 0;
+}
+
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp)
+{
+	size_t ret = fprintf(fp, " addr %#" PRI_lx64 " old len %u new len %u\n",
+			     event->text_poke.addr,
+			     event->text_poke.old_len,
+			     event->text_poke.new_len);
+
+	ret += fprintf(fp, "     old bytes:");
+	ret += binary__fprintf(event->text_poke.bytes, event->text_poke.old_len,
+			       16, text_poke_printer, NULL, fp);
+	ret += fprintf(fp, "     new bytes:");
+	ret += binary__fprintf(event->text_poke.bytes + event->text_poke.old_len,
+			       event->text_poke.new_len, 16, text_poke_printer,
+			       NULL, fp);
+	return ret;
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -439,6 +476,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_BPF_EVENT:
 		ret += perf_event__fprintf_bpf(event, fp);
 		break;
+	case PERF_RECORD_TEXT_POKE:
+		ret += perf_event__fprintf_text_poke(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 85223159737c..cbe98e22527f 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -345,6 +345,10 @@ int perf_event__process_bpf(struct perf_tool *tool,
 			    union perf_event *event,
 			    struct perf_sample *sample,
 			    struct machine *machine);
+int perf_event__process_text_poke(struct perf_tool *tool,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine);
 int perf_event__process(struct perf_tool *tool,
 			union perf_event *event,
 			struct perf_sample *sample,
@@ -378,6 +382,7 @@ size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f5bd5c386df1..93e9b35e53b0 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -175,6 +175,7 @@ struct callchain_param;
 void perf_evlist__set_id_pos(struct evlist *evlist);
 bool perf_can_sample_identifier(void);
 bool perf_can_record_switch_events(void);
+bool perf_can_record_text_poke_events(void);
 bool perf_can_record_cpu_wide(void);
 bool perf_can_aux_sample(void);
 void perf_evlist__config(struct evlist *evlist, struct record_opts *opts,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c8dc4450884c..acaa82a0787d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1096,7 +1096,12 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	attr->mmap  = track;
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
-	attr->ksymbol = track && !perf_missing_features.ksymbol;
+	/*
+	 * ksymbol is tracked separately with text poke because it needs to be
+	 * system wide and enabled immediately.
+	 */
+	if (!opts->text_poke)
+		attr->ksymbol = track && !perf_missing_features.ksymbol;
 	attr->bpf_event = track && !opts->no_bpf_event && !perf_missing_features.bpf;
 
 	if (opts->record_namespaces)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8c5410315e8..c2825aaf3927 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -770,6 +770,47 @@ int machine__process_ksymbol(struct machine *machine __maybe_unused,
 	return machine__process_ksymbol_register(machine, event, sample);
 }
 
+int machine__process_text_poke(struct machine *machine, union perf_event *event,
+			       struct perf_sample *sample __maybe_unused)
+{
+	struct map *map = maps__find(&machine->kmaps, event->text_poke.addr);
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+	if (dump_trace)
+		perf_event__fprintf_text_poke(event, stdout);
+
+	if (!event->text_poke.new_len)
+		return 0;
+
+	if (cpumode != PERF_RECORD_MISC_KERNEL) {
+		pr_debug("%s: unsupported cpumode - ignoring\n", __func__);
+		return 0;
+	}
+
+	if (map && map->dso) {
+		u8 *new_bytes = event->text_poke.bytes + event->text_poke.old_len;
+		int ret;
+
+		/*
+		 * Kernel maps might be changed when loading symbols so loading
+		 * must be done prior to using kernel maps.
+		 */
+		map__load(map);
+		ret = dso__data_write_cache_addr(map->dso, map, machine,
+						 event->text_poke.addr,
+						 new_bytes,
+						 event->text_poke.new_len);
+		if (ret != event->text_poke.new_len)
+			pr_debug("Failed to write kernel text poke at %#" PRI_lx64 "\n",
+				 event->text_poke.addr);
+	} else {
+		pr_debug("Failed to find kernel text poke address map for %#" PRI_lx64 "\n",
+			 event->text_poke.addr);
+	}
+
+	return 0;
+}
+
 static struct map *machine__addnew_module_map(struct machine *machine, u64 start,
 					      const char *filename)
 {
@@ -1901,6 +1942,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_ksymbol(machine, event, sample); break;
 	case PERF_RECORD_BPF_EVENT:
 		ret = machine__process_bpf(machine, event, sample); break;
+	case PERF_RECORD_TEXT_POKE:
+		ret = machine__process_text_poke(machine, event, sample); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930eca89..0bb086acb7ed 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -135,6 +135,9 @@ int machine__process_mmap2_event(struct machine *machine, union perf_event *even
 int machine__process_ksymbol(struct machine *machine,
 			     union perf_event *event,
 			     struct perf_sample *sample);
+int machine__process_text_poke(struct machine *machine,
+			       union perf_event *event,
+			       struct perf_sample *sample);
 int machine__process_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 651203126c71..4c6c700335a6 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -144,6 +144,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(aux_watermark, p_unsigned);
 	PRINT_ATTRf(sample_max_stack, p_unsigned);
 	PRINT_ATTRf(aux_sample_size, p_unsigned);
+	PRINT_ATTRf(text_poke, p_unsigned);
 
 	return ret;
 }
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 7def66168503..207ba2a65008 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -97,6 +97,11 @@ static void perf_probe_context_switch(struct evsel *evsel)
 	evsel->core.attr.context_switch = 1;
 }
 
+static void perf_probe_text_poke(struct evsel *evsel)
+{
+	evsel->core.attr.text_poke = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -112,6 +117,11 @@ bool perf_can_record_switch_events(void)
 	return perf_probe_api(perf_probe_context_switch);
 }
 
+bool perf_can_record_text_poke_events(void)
+{
+	return perf_probe_api(perf_probe_text_poke);
+}
+
 bool perf_can_record_cpu_wide(void)
 {
 	struct perf_event_attr attr = {
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2ad383..127b4ca11fd3 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -46,6 +46,7 @@ struct record_opts {
 	bool	      sample_id;
 	bool	      no_bpf_event;
 	bool	      kcore;
+	bool	      text_poke;
 	unsigned int  freq;
 	unsigned int  mmap_pages;
 	unsigned int  auxtrace_mmap_pages;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25b23e3..20bfdb08264f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -489,6 +489,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->ksymbol = perf_event__process_ksymbol;
 	if (tool->bpf == NULL)
 		tool->bpf = perf_event__process_bpf;
+	if (tool->text_poke == NULL)
+		tool->text_poke = perf_event__process_text_poke;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -658,6 +660,24 @@ static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
 		swap_sample_id_all(event, &event->context_switch + 1);
 }
 
+static void perf_event__text_poke_swap(union perf_event *event, bool sample_id_all)
+{
+	event->text_poke.addr    = bswap_64(event->text_poke.addr);
+	event->text_poke.old_len = bswap_16(event->text_poke.old_len);
+	event->text_poke.new_len = bswap_16(event->text_poke.new_len);
+
+	if (sample_id_all) {
+		size_t len = sizeof(event->text_poke.old_len) +
+			     sizeof(event->text_poke.new_len) +
+			     event->text_poke.old_len +
+			     event->text_poke.new_len;
+		void *data = &event->text_poke.old_len;
+
+		data += PERF_ALIGN(len, sizeof(u64));
+		swap_sample_id_all(event, data);
+	}
+}
+
 static void perf_event__throttle_swap(union perf_event *event,
 				      bool sample_id_all)
 {
@@ -931,6 +951,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_SWITCH]		  = perf_event__switch_swap,
 	[PERF_RECORD_SWITCH_CPU_WIDE]	  = perf_event__switch_swap,
 	[PERF_RECORD_NAMESPACES]	  = perf_event__namespaces_swap,
+	[PERF_RECORD_TEXT_POKE]		  = perf_event__text_poke_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1470,6 +1491,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->ksymbol(tool, event, sample, machine);
 	case PERF_RECORD_BPF_EVENT:
 		return tool->bpf(tool, event, sample, machine);
+	case PERF_RECORD_TEXT_POKE:
+		return tool->text_poke(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8de..006182923bbe 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -56,7 +56,8 @@ struct perf_tool {
 			throttle,
 			unthrottle,
 			ksymbol,
-			bpf;
+			bpf,
+			text_poke;
 
 	event_attr_op	attr;
 	event_attr_op	event_update;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V4 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (10 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-04  9:06 ` [PATCH V4 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
  2020-03-16  7:07 ` [PATCH V4 00/13] perf/x86: Add perf " Adrian Hunter
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map backed
only by memory, which will be populated as necessary by text poke events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/include/uapi/linux/perf_event.h | 5 +++++
 tools/perf/util/dso.c                 | 3 +++
 tools/perf/util/dso.h                 | 1 +
 tools/perf/util/machine.c             | 6 ++++++
 tools/perf/util/map.c                 | 5 +++++
 tools/perf/util/map.h                 | 3 ++-
 tools/perf/util/symbol.c              | 1 +
 7 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index bae9e9d2d897..f80ce2e9f8b9 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1031,6 +1031,11 @@ enum perf_event_type {
 enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_UNKNOWN	= 0,
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
+	/*
+	 * Out of line code such as kprobe-replaced instructions or optimized
+	 * kprobes or ftrace trampolines.
+	 */
+	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
 };
 
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 91f21239608b..9eb50ff6da96 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -191,6 +191,7 @@ int dso__read_binary_type_filename(const struct dso *dso,
 	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
 	case DSO_BINARY_TYPE__JAVA_JIT:
 	case DSO_BINARY_TYPE__BPF_PROG_INFO:
+	case DSO_BINARY_TYPE__OOL:
 	case DSO_BINARY_TYPE__NOT_FOUND:
 		ret = -1;
 		break;
@@ -881,6 +882,8 @@ static struct dso_cache *dso_cache__populate(struct dso *dso,
 
 	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
 		*ret = bpf_read(dso, cache_offset, cache->data);
+	else if (dso->binary_type == DSO_BINARY_TYPE__OOL)
+		*ret = DSO__DATA_CACHE_SIZE;
 	else
 		*ret = file_read(dso, machine, cache_offset, cache->data);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 2db64b79617a..b482453bc3d1 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -40,6 +40,7 @@ enum dso_binary_type {
 	DSO_BINARY_TYPE__GUEST_KCORE,
 	DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO,
 	DSO_BINARY_TYPE__BPF_PROG_INFO,
+	DSO_BINARY_TYPE__OOL,
 	DSO_BINARY_TYPE__NOT_FOUND,
 };
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c2825aaf3927..c7bb1aa81f3e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -730,6 +730,12 @@ static int machine__process_ksymbol_register(struct machine *machine,
 		if (!map)
 			return -ENOMEM;
 
+		if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
+			map->dso->binary_type = DSO_BINARY_TYPE__OOL;
+			map->dso->data.file_size = event->ksymbol.len;
+			dso__set_loaded(map->dso);
+		}
+
 		map->start = event->ksymbol.addr;
 		map->end = map->start + event->ksymbol.len;
 		maps__insert(&machine->kmaps, map);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f67960bedebb..4ec8cfd3967e 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -267,6 +267,11 @@ bool __map__is_bpf_prog(const struct map *map)
 	return name && (strstr(name, "bpf_prog_") == name);
 }
 
+bool __map__is_ool(const struct map *map)
+{
+	return map->dso && map->dso->binary_type == DSO_BINARY_TYPE__OOL;
+}
+
 bool map__has_symbols(const struct map *map)
 {
 	return dso__has_symbols(map->dso);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 067036e8970c..9e312ae2d656 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -147,11 +147,12 @@ int map__set_kallsyms_ref_reloc_sym(struct map *map, const char *symbol_name,
 bool __map__is_kernel(const struct map *map);
 bool __map__is_extra_kernel_map(const struct map *map);
 bool __map__is_bpf_prog(const struct map *map);
+bool __map__is_ool(const struct map *map);
 
 static inline bool __map__is_kmodule(const struct map *map)
 {
 	return !__map__is_kernel(map) && !__map__is_extra_kernel_map(map) &&
-	       !__map__is_bpf_prog(map);
+	       !__map__is_bpf_prog(map) && !__map__is_ool(map);
 }
 
 bool map__has_symbols(const struct map *map);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3b379b1296f1..deb7f9bee3af 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1537,6 +1537,7 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod,
 		return true;
 
 	case DSO_BINARY_TYPE__BPF_PROG_INFO:
+	case DSO_BINARY_TYPE__OOL:
 	case DSO_BINARY_TYPE__NOT_FOUND:
 	default:
 		return false;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V4 13/13] perf intel-pt: Add support for text poke events
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (11 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
@ 2020-03-04  9:06 ` Adrian Hunter
  2020-03-16  7:07 ` [PATCH V4 00/13] perf/x86: Add perf " Adrian Hunter
  13 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-04  9:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

Select text poke events when available and the kernel is being traced.
Process text poke events to invalidate entries in Intel PT's instruction
cache.

Example:

  The example requires kernel config:
    CONFIG_PROC_SYSCTL=y
    CONFIG_SCHED_DEBUG=y
    CONFIG_SCHEDSTATS=y

  Before:

    # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
    # cat /proc/sys/kernel/sched_schedstats
    0
    # echo 1 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    1
    # echo 0 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    0
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 3.341 MB perf.data.before ]
    [1]+  Terminated                 perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
    # perf script -i perf.data.before --itrace=e >/dev/null
    Warning:
    474 instruction trace errors

  After:

    # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
    # cat /proc/sys/kernel/sched_schedstats
    0
    # echo 1 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    1
    # echo 0 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    0
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 2.646 MB perf.data.after ]
    [1]+  Terminated                 perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
    # perf script -i perf.data.after --itrace=e >/dev/null

Example:

  The example requires kernel config:
    # CONFIG_FUNCTION_TRACER is not set

  Before:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __schedule
    Added new event:
      probe:__schedule     (on __schedule)

    You can now use it in all perf tools, such as:

            perf record -e probe:__schedule -aR sleep 1

    # perf record -e probe:__schedule -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
    # perf probe -d probe:__schedule
    Removed event: probe:__schedule
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 41.268 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    Warning:
    207 instruction trace errors

  After:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __schedule
    Added new event:
      probe:__schedule     (on __schedule)

    You can now use it in all perf tools, such as:

        perf record -e probe:__schedule -aR sleep 1

    # perf record -e probe:__schedule -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
    # perf probe -d probe:__schedule
    Removed event: probe:__schedule
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 39.978 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
    6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
    6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
    6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
    6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
    6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
    7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
    6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
    6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0

Example:

  The example requires kernel config:
    CONFIG_FUNCTION_TRACER=y

  Before:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __kmalloc
    Added new event:
      probe:__kmalloc      (on __kmalloc)

    You can now use it in all perf tools, such as:

        perf record -e probe:__kmalloc -aR sleep 1

    # perf record -e probe:__kmalloc -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
    # perf probe -d probe:__kmalloc
    Removed event: probe:__kmalloc
    # kill %1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 43.850 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    Warning:
    8 instruction trace errors

  After:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __kmalloc
    Added new event:
      probe:__kmalloc      (on __kmalloc)

    You can now use it in all perf tools, such as:

            perf record -e probe:__kmalloc -aR sleep 1

    # perf record -e probe:__kmalloc -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
    # perf probe -d probe:__kmalloc
    Removed event: probe:__kmalloc
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 41.442 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
    5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
    5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
    5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
    5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
    7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c |  4 ++
 tools/perf/util/intel-pt.c          | 75 +++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 20df442fdf36..ec75445ef7cf 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -827,6 +827,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 		}
 	}
 
+	if (have_timing_info && !intel_pt_evsel->core.attr.exclude_kernel &&
+	    perf_can_record_text_poke_events() && perf_can_record_cpu_wide())
+		opts->text_poke = true;
+
 	if (intel_pt_evsel) {
 		/*
 		 * To obtain the auxtrace buffer file descriptor, the auxtrace
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 33cf8928cf05..64768312c994 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -514,6 +514,17 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
 	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
 }
 
+static void intel_pt_cache_invalidate(struct dso *dso, struct machine *machine,
+				      u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return;
+
+	auxtrace_cache__remove(dso->auxtrace_cache, offset);
+}
+
 static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
 {
 	return ip >= pt->kernel_start ?
@@ -2592,6 +2603,67 @@ static int intel_pt_process_itrace_start(struct intel_pt *pt,
 					event->itrace_start.tid);
 }
 
+static int intel_pt_find_map(struct thread *thread, u8 cpumode, u64 addr,
+			     struct addr_location *al)
+{
+	if (!al->map || addr < al->map->start || addr >= al->map->end) {
+		if (!thread__find_map(thread, cpumode, addr, al))
+			return -1;
+	}
+
+	return 0;
+}
+
+/* Invalidate all instruction cache entries that overlap the text poke */
+static int intel_pt_text_poke(struct intel_pt *pt, union perf_event *event)
+{
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	u64 addr = event->text_poke.addr + event->text_poke.new_len - 1;
+	/* Assume the text poke begins in a basic block of no more than 4096 bytes */
+	int cnt = 4096 + event->text_poke.new_len;
+	struct thread *thread = pt->unknown_thread;
+	struct addr_location al = { .map = NULL };
+	struct machine *machine = pt->machine;
+	struct intel_pt_cache_entry *e;
+	u64 offset;
+
+	if (!event->text_poke.new_len)
+		return 0;
+
+	for (; cnt; cnt--, addr--) {
+		if (intel_pt_find_map(thread, cpumode, addr, &al)) {
+			if (addr < event->text_poke.addr)
+				return 0;
+			continue;
+		}
+
+		if (!al.map->dso || !al.map->dso->auxtrace_cache)
+			continue;
+
+		offset = al.map->map_ip(al.map, addr);
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+		if (!e)
+			continue;
+
+		if (addr + e->byte_cnt + e->length <= event->text_poke.addr) {
+			/*
+			 * No overlap. Working backwards there cannot be another
+			 * basic block that overlaps the text poke if there is a
+			 * branch instruction before the text poke address.
+			 */
+			if (e->branch != INTEL_PT_BR_NO_BRANCH)
+				return 0;
+		} else {
+			intel_pt_cache_invalidate(al.map->dso, machine, offset);
+			intel_pt_log("Invalidated instruction cache for %s at %#"PRIx64"\n",
+				     al.map->dso->long_name, addr);
+		}
+	}
+
+	return 0;
+}
+
 static int intel_pt_process_event(struct perf_session *session,
 				  union perf_event *event,
 				  struct perf_sample *sample,
@@ -2653,6 +2725,9 @@ static int intel_pt_process_event(struct perf_session *session,
 		 event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		err = intel_pt_context_switch(pt, event, sample);
 
+	if (!err && event->header.type == PERF_RECORD_TEXT_POKE)
+		err = intel_pt_text_poke(pt, event);
+
 	intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
 		     event->header.type, sample->cpu, sample->time, timestamp);
 	intel_pt_log_event(event);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-04  9:06 ` [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
@ 2020-03-05  5:58   ` Masami Hiramatsu
  2020-03-05  6:10     ` Alexei Starovoitov
  2020-03-24 12:31   ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2020-03-05  5:58 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Masami Hiramatsu, Steven Rostedt,
	Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On Wed,  4 Mar 2020 11:06:23 +0200
Adrian Hunter <adrian.hunter@intel.com> wrote:

> Symbols are needed for tools to describe instruction addresses. Pages
> allocated for kprobe's purposes need symbols to be created for them.
> Add such symbols to be visible via /proc/kallsyms.
> 
> Note: kprobe insn pages are not used if ftrace is configured. To see the
> effect of this patch, the kernel must be configured with:
> 
> 	# CONFIG_FUNCTION_TRACER is not set
> 	CONFIG_KPROBES=y
> 
> and for optimised kprobes:
> 
> 	CONFIG_OPTPROBES=y
> 
> Example on x86:
> 
> 	# perf probe __schedule
> 	Added new event:
> 	  probe:__schedule     (on __schedule)
> 	# cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
> 	ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
> 	ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]
> 
> Note: This patch adds "__builtin__kprobes" as a module name in
> /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
> though "__builtin__kprobes" is not a module.

Looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

BTW, would you also make a patch to change [bpf] to [__builtin__bpf]?

Thanks,

> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  include/linux/kprobes.h | 15 ++++++++++++++
>  kernel/kallsyms.c       | 37 +++++++++++++++++++++++++++++----
>  kernel/kprobes.c        | 45 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 93 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 04bdaf01112c..62d682f47b5e 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -242,6 +242,7 @@ struct kprobe_insn_cache {
>  	struct mutex mutex;
>  	void *(*alloc)(void);	/* allocate insn page */
>  	void (*free)(void *);	/* free insn page */
> +	const char *sym;	/* symbol for insn pages */
>  	struct list_head pages; /* list of kprobe_insn_page */
>  	size_t insn_size;	/* size of instruction slot */
>  	int nr_garbage;
> @@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
>  {									\
>  	return __is_insn_slot_addr(&kprobe_##__name##_slots, addr);	\
>  }
> +#define KPROBE_INSN_PAGE_SYM		"kprobe_insn_page"
> +#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
>  #else /* __ARCH_WANT_KPROBES_INSN_SLOT */
>  #define DEFINE_INSN_CACHE_OPS(__name)					\
>  static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
> @@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
>  void *alloc_insn_page(void);
>  void free_insn_page(void *page);
>  
> +int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> +		       char *sym);
> +int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
> +			     unsigned long *value, char *type, char *sym);
> +
> +int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> +			    char *type, char *sym);
>  #else /* !CONFIG_KPROBES: */
>  
>  static inline int kprobes_built_in(void)
> @@ -435,6 +445,11 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
>  {
>  	return true;
>  }
> +static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
> +				     char *type, char *sym)
> +{
> +	return -ERANGE;
> +}
>  #endif /* CONFIG_KPROBES */
>  static inline int disable_kretprobe(struct kretprobe *rp)
>  {
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 136ce049c4ad..4a93511e6243 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -24,6 +24,7 @@
>  #include <linux/slab.h>
>  #include <linux/filter.h>
>  #include <linux/ftrace.h>
> +#include <linux/kprobes.h>
>  #include <linux/compiler.h>
>  
>  /*
> @@ -438,6 +439,7 @@ struct kallsym_iter {
>  	loff_t pos_arch_end;
>  	loff_t pos_mod_end;
>  	loff_t pos_ftrace_mod_end;
> +	loff_t pos_bpf_end;
>  	unsigned long value;
>  	unsigned int nameoff; /* If iterating in core kernel symbols. */
>  	char type;
> @@ -497,11 +499,33 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
>  
>  static int get_ksymbol_bpf(struct kallsym_iter *iter)
>  {
> +	int ret;
> +
>  	strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
>  	iter->exported = 0;
> -	return bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
> -			       &iter->value, &iter->type,
> -			       iter->name) < 0 ? 0 : 1;
> +	ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
> +			      &iter->value, &iter->type,
> +			      iter->name);
> +	if (ret < 0) {
> +		iter->pos_bpf_end = iter->pos;
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
> +/*
> + * This uses "__builtin__kprobes" as a module name for symbols for pages
> + * allocated for kprobes' purposes, even though "__builtin__kprobes" is not a
> + * module.
> + */
> +static int get_ksymbol_kprobe(struct kallsym_iter *iter)
> +{
> +	strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
> +	iter->exported = 0;
> +	return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
> +				  &iter->value, &iter->type,
> +				  iter->name) < 0 ? 0 : 1;
>  }
>  
>  /* Returns space to next name. */
> @@ -528,6 +552,7 @@ static void reset_iter(struct kallsym_iter *iter, loff_t new_pos)
>  		iter->pos_arch_end = 0;
>  		iter->pos_mod_end = 0;
>  		iter->pos_ftrace_mod_end = 0;
> +		iter->pos_bpf_end = 0;
>  	}
>  }
>  
> @@ -552,7 +577,11 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
>  	    get_ksymbol_ftrace_mod(iter))
>  		return 1;
>  
> -	return get_ksymbol_bpf(iter);
> +	if ((!iter->pos_bpf_end || iter->pos_bpf_end > pos) &&
> +	    get_ksymbol_bpf(iter))
> +		return 1;
> +
> +	return get_ksymbol_kprobe(iter);
>  }
>  
>  /* Returns false if pos at or past end of file. */
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..229d1b596690 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -118,6 +118,7 @@ struct kprobe_insn_cache kprobe_insn_slots = {
>  	.mutex = __MUTEX_INITIALIZER(kprobe_insn_slots.mutex),
>  	.alloc = alloc_insn_page,
>  	.free = free_insn_page,
> +	.sym = KPROBE_INSN_PAGE_SYM,
>  	.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
>  	.insn_size = MAX_INSN_SIZE,
>  	.nr_garbage = 0,
> @@ -296,6 +297,7 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
>  	.mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex),
>  	.alloc = alloc_insn_page,
>  	.free = free_insn_page,
> +	.sym = KPROBE_OPTINSN_PAGE_SYM,
>  	.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
>  	/* .insn_size is initialized later */
>  	.nr_garbage = 0,
> @@ -2179,6 +2181,49 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
>  	return 0;
>  }
>  
> +int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
> +			     unsigned long *value, char *type, char *sym)
> +{
> +	struct kprobe_insn_page *kip;
> +	int ret = -ERANGE;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(kip, &c->pages, list) {
> +		if ((*symnum)--)
> +			continue;
> +		strlcpy(sym, c->sym, KSYM_NAME_LEN);
> +		*type = 't';
> +		*value = (unsigned long)kip->insns;
> +		ret = 0;
> +		break;
> +	}
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> +				   char *type, char *sym)
> +{
> +	return -ERANGE;
> +}
> +
> +int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> +		       char *sym)
> +{
> +#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
> +	if (!kprobe_cache_get_kallsym(&kprobe_insn_slots, &symnum, value, type, sym))
> +		return 0;
> +#ifdef CONFIG_OPTPROBES
> +	if (!kprobe_cache_get_kallsym(&kprobe_optinsn_slots, &symnum, value, type, sym))
> +		return 0;
> +#endif
> +#endif
> +	if (!arch_kprobe_get_kallsym(&symnum, value, type, sym))
> +		return 0;
> +	return -ERANGE;
> +}
> +
>  int __init __weak arch_populate_kprobe_blacklist(void)
>  {
>  	return 0;
> -- 
> 2.17.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-05  5:58   ` Masami Hiramatsu
@ 2020-03-05  6:10     ` Alexei Starovoitov
  2020-03-05  9:04       ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2020-03-05  6:10 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Borislav Petkov, H . Peter Anvin, X86 ML, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, LKML

On Wed, Mar 4, 2020 at 10:01 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> >       # perf probe __schedule
> >       Added new event:
> >         probe:__schedule     (on __schedule)
> >       # cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
> >       ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
> >       ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]
> >
> > Note: This patch adds "__builtin__kprobes" as a module name in
> > /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
> > though "__builtin__kprobes" is not a module.
>
> Looks good to me.
>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
>
> BTW, would you also make a patch to change [bpf] to [__builtin__bpf]?

Please do not.
There is nothing 'builtin' about bpf.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-05  6:10     ` Alexei Starovoitov
@ 2020-03-05  9:04       ` Masami Hiramatsu
  0 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2020-03-05  9:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Adrian Hunter, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Borislav Petkov, H . Peter Anvin, X86 ML, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, LKML

On Wed, 4 Mar 2020 22:10:10 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Wed, Mar 4, 2020 at 10:01 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > >       # perf probe __schedule
> > >       Added new event:
> > >         probe:__schedule     (on __schedule)
> > >       # cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
> > >       ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
> > >       ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]
> > >
> > > Note: This patch adds "__builtin__kprobes" as a module name in
> > > /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
> > > though "__builtin__kprobes" is not a module.
> >
> > Looks good to me.
> >
> > Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> >
> > BTW, would you also make a patch to change [bpf] to [__builtin__bpf]?
> 
> Please do not.
> There is nothing 'builtin' about bpf.

Hmm, so, would we reject a bpf.ko module from being loaded?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: [PATCH V4 00/13] perf/x86: Add perf text poke events
  2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (12 preceding siblings ...)
  2020-03-04  9:06 ` [PATCH V4 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
@ 2020-03-16  7:07 ` Adrian Hunter
  2020-03-24  9:29   ` Adrian Hunter
  13 siblings, 1 reply; 30+ messages in thread
From: Adrian Hunter @ 2020-03-16  7:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 4/03/20 11:06 am, Adrian Hunter wrote:
> Hi
> 
> Here are patches to add a text poke event to record changes to kernel text
> (i.e. self-modifying code) in order to support tracers like Intel PT
> decoding through jump labels, kprobes and ftrace trampolines.
> 
> The first 8 patches make the kernel changes and the subsequent patches are
> tools changes.
> 
> The next 4 patches add support for updating perf tools' data cache
> with the changed bytes.
> 
> The last patch is an Intel PT specific tools change.
> 
> Patches also here:
> 
> 	git://git.infradead.org/users/ahunter/linux-perf.git text_poke

Any comments?

> 
> 
> Changes in V4
> 
>   kprobes: Add symbols for kprobe insn pages
> 
> 	Change "module name" from kprobe to __builtin__kprobes
> 	Added comment about "module name" use
> 
>   ftrace: Add symbols for ftrace trampolines
> 	
> 	Change "module name" from ftrace to __builtin__ftrace
> 	Move calls of ftrace_add_trampoline_to_kallsyms() and
> 	ftrace_remove_trampoline_from_kallsyms() into
> 	kernel/trace/ftrace.c
> 	Added comment about "module name" use
> 
>   ftrace: Add perf ksymbol events for ftrace trampolines
> 
> 	Move changes into kernel/trace/ftrace.c
> 
>   ftrace: Add perf text poke events for ftrace trampolines
> 
> 	Move changes into kernel/trace/ftrace.c
> 
> Changes in V3:
> 
>   perf: Add perf text poke event
> 
> 	To prevent warning, cast pointer to (unsigned long) not (u64)
> 
>   kprobes: Add symbols for kprobe insn pages
> 
> 	Expand commit message
> 	Remove unneeded declarations of kprobe_cache_get_kallsym() and arch_kprobe_get_kallsym() when !CONFIG_KPROBES
> 
>   ftrace: Add symbols for ftrace trampolines
> 
> 	Expand commit message
> 	Make ftrace_get_trampoline_kallsym() static
> 
> Changes in V2:
> 
>   perf: Add perf text poke event
> 
> 	Separate out x86 changes
> 	The text poke event now has old len and new len
> 	Revised commit message
> 
>   perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
> 
> 	New patch containing x86 changes from original first patch
> 
>   kprobes: Add symbols for kprobe insn pages
>   kprobes: Add perf ksymbol events for kprobe insn pages
>   perf/x86: Add perf text poke events for kprobes
>   ftrace: Add symbols for ftrace trampolines
>   ftrace: Add perf ksymbol events for ftrace trampolines
>   ftrace: Add perf text poke events for ftrace trampolines
>   perf kcore_copy: Fix module map when there are no modules loaded
>   perf evlist: Disable 'immediate' events last
> 
> 	New patches
> 
>   perf tools: Add support for PERF_RECORD_TEXT_POKE
> 
> 	The text poke event now has old len and new len
> 	Also select ksymbol events with text poke events
> 
>   perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
> 
> 	New patch
> 
>   perf intel-pt: Add support for text poke events
> 
> 	The text poke event now has old len and new len
> 	Allow for the address not having a map yet
> 
> 
> Changes since RFC:
> 
>   Dropped 'flags' from the new event.  The consensus seemed to be that text
>   pokes should employ a scheme similar to x86's INT3 method instead.
> 
>   dropped tools patches already applied.
> 
> 
> Example:
> 
>   For jump labels, the kernel needs
> 	CONFIG_JUMP_LABEL=y
>   and also an easy to flip jump label is in sysctl_schedstats() which needs
> 	CONFIG_SCHEDSTATS=y
> 	CONFIG_PROC_SYSCTL=y
> 	CONFIG_SCHED_DEBUG=y
> 
>   Also note the 'sudo perf record' is put into the background which, as
>   written, needs sudo credential caching (otherwise the background task
>   will stop awaiting the sudo password), hence the 'sudo echo' to start.
> 
> Before:
> 
>   $ sudo echo
>   $ sudo perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
>   [1] 1640
>   $ cat /proc/sys/kernel/sched_schedstats
>   0
>   $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
>   $ cat /proc/sys/kernel/sched_schedstats
>   1
>   $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
>   $ cat /proc/sys/kernel/sched_schedstats
>   0
>   $ sudo kill 1640
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 16.635 MB perf.data.before ]
>   $ perf script -i perf.data.before --itrace=e >/dev/null
>   Warning:
>   1946 instruction trace errors
> 
> After:
> 
>   $ sudo echo
>   $ sudo perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>   [1] 1882
>   $ cat /proc/sys/kernel/sched_schedstats
>   0
>   $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
>   $ cat /proc/sys/kernel/sched_schedstats
>   1
>   $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
>   $ cat /proc/sys/kernel/sched_schedstats
>   0
>   $ sudo kill 1882
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 10.893 MB perf.data.after ]
>   $ perf script -i perf.data.after --itrace=e
>   $
> 
> 
> Adrian Hunter (13):
>       perf: Add perf text poke event
>       perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
>       kprobes: Add symbols for kprobe insn pages
>       kprobes: Add perf ksymbol events for kprobe insn pages
>       perf/x86: Add perf text poke events for kprobes
>       ftrace: Add symbols for ftrace trampolines
>       ftrace: Add perf ksymbol events for ftrace trampolines
>       ftrace: Add perf text poke events for ftrace trampolines
>       perf kcore_copy: Fix module map when there are no modules loaded
>       perf evlist: Disable 'immediate' events last
>       perf tools: Add support for PERF_RECORD_TEXT_POKE
>       perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
>       perf intel-pt: Add support for text poke events
> 
>  arch/x86/include/asm/kprobes.h            |   4 ++
>  arch/x86/include/asm/text-patching.h      |   2 +
>  arch/x86/kernel/alternative.c             |  70 +++++++++++++++++----
>  arch/x86/kernel/kprobes/core.c            |   7 +++
>  arch/x86/kernel/kprobes/opt.c             |  18 +++++-
>  include/linux/ftrace.h                    |  12 ++--
>  include/linux/kprobes.h                   |  15 +++++
>  include/linux/perf_event.h                |   8 +++
>  include/uapi/linux/perf_event.h           |  26 +++++++-
>  kernel/events/core.c                      |  90 +++++++++++++++++++++++++-
>  kernel/kallsyms.c                         |  42 +++++++++++--
>  kernel/kprobes.c                          |  57 +++++++++++++++++
>  kernel/trace/ftrace.c                     | 101 +++++++++++++++++++++++++++++-
>  tools/include/uapi/linux/perf_event.h     |  26 +++++++-
>  tools/lib/perf/include/perf/event.h       |   9 +++
>  tools/perf/arch/x86/util/intel-pt.c       |   4 ++
>  tools/perf/builtin-record.c               |  45 +++++++++++++
>  tools/perf/util/dso.c                     |   3 +
>  tools/perf/util/dso.h                     |   1 +
>  tools/perf/util/event.c                   |  40 ++++++++++++
>  tools/perf/util/event.h                   |   5 ++
>  tools/perf/util/evlist.c                  |  31 ++++++---
>  tools/perf/util/evlist.h                  |   1 +
>  tools/perf/util/evsel.c                   |   7 ++-
>  tools/perf/util/intel-pt.c                |  75 ++++++++++++++++++++++
>  tools/perf/util/machine.c                 |  49 +++++++++++++++
>  tools/perf/util/machine.h                 |   3 +
>  tools/perf/util/map.c                     |   5 ++
>  tools/perf/util/map.h                     |   3 +-
>  tools/perf/util/perf_event_attr_fprintf.c |   1 +
>  tools/perf/util/record.c                  |  10 +++
>  tools/perf/util/record.h                  |   1 +
>  tools/perf/util/session.c                 |  23 +++++++
>  tools/perf/util/symbol-elf.c              |   7 +++
>  tools/perf/util/symbol.c                  |   1 +
>  tools/perf/util/tool.h                    |   3 +-
>  36 files changed, 765 insertions(+), 40 deletions(-)
> 
> 
> Regards
> Adrian
> 


* Re: [PATCH V4 00/13] perf/x86: Add perf text poke events
  2020-03-16  7:07 ` [PATCH V4 00/13] perf/x86: Add perf " Adrian Hunter
@ 2020-03-24  9:29   ` Adrian Hunter
  0 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-24  9:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 16/03/20 9:07 am, Adrian Hunter wrote:
> On 4/03/20 11:06 am, Adrian Hunter wrote:
>> Hi
>>
>> Here are patches to add a text poke event to record changes to kernel text
>> (i.e. self-modifying code) in order to support tracers like Intel PT
>> decoding through jump labels, kprobes and ftrace trampolines.
>>
>> The first 8 patches make the kernel changes and the subsequent patches are
>> tools changes.
>>
>> The next 4 patches add support for updating perf tools' data cache
>> with the changed bytes.
>>
>> The last patch is an Intel PT specific tools change.
>>
>> Patches also here:
>>
>> 	git://git.infradead.org/users/ahunter/linux-perf.git text_poke
> 
> Any comments?

Peter, do you have any comments on the first 2 patches?  They are pretty
much what we have already discussed.

* Re: [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-04  9:06 ` [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
@ 2020-03-24 12:21   ` Peter Zijlstra
  2020-03-26  1:58     ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2020-03-24 12:21 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Wed, Mar 04, 2020 at 11:06:25AM +0200, Adrian Hunter wrote:

> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> index 67315fa3956a..13bb51a7789c 100644
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -45,6 +45,8 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern void text_poke_sync(void);
>  extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
> +extern void __text_poke_bp(void *addr, const void *opcode, size_t len,
> +			   const void *emulate, const u8 *oldptr);
>  extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
>  
>  extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 737e7a842f85..c8cfc97abc9e 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1075,6 +1075,7 @@ static int tp_vec_nr;
>   * text_poke_bp_batch() -- update instructions on live kernel on SMP
>   * @tp:			vector of instructions to patch
>   * @nr_entries:		number of entries in the vector
> + * @oldptr:		pointer to original old insn byte
>   *
>   * Modify multi-byte instruction by using int3 breakpoint on SMP.
>   * We completely avoid stop_machine() here, and achieve the
> @@ -1092,7 +1093,8 @@ static int tp_vec_nr;
>   *		  replacing opcode
>   *	- sync cores
>   */
> -static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
> +static void text_poke_bp_batch(struct text_poke_loc *tp,
> +			       unsigned int nr_entries, const u8 *oldptr)
>  {
>  	struct bp_patching_desc desc = {
>  		.vec = tp,
> @@ -1117,7 +1119,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
>  	 * First step: add a int3 trap to the address that will be patched.
>  	 */
>  	for (i = 0; i < nr_entries; i++) {
> -		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
> +		tp[i].old = oldptr ? *oldptr : *(u8 *)text_poke_addr(&tp[i]);
>  		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
>  	}
>  
> @@ -1274,7 +1276,7 @@ static bool tp_order_fail(void *addr)
>  static void text_poke_flush(void *addr)
>  {
>  	if (tp_vec_nr == TP_VEC_MAX || tp_order_fail(addr)) {
> -		text_poke_bp_batch(tp_vec, tp_vec_nr);
> +		text_poke_bp_batch(tp_vec, tp_vec_nr, NULL);
>  		tp_vec_nr = 0;
>  	}
>  }
> @@ -1299,6 +1301,20 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
>  	text_poke_loc_init(tp, addr, opcode, len, emulate);
>  }
>  
> +void __ref __text_poke_bp(void *addr, const void *opcode, size_t len,
> +			  const void *emulate, const u8 *oldptr)
> +{
> +	struct text_poke_loc tp;
> +
> +	if (unlikely(system_state == SYSTEM_BOOTING)) {
> +		text_poke_early(addr, opcode, len);
> +		return;
> +	}
> +
> +	text_poke_loc_init(&tp, addr, opcode, len, emulate);
> +	text_poke_bp_batch(&tp, 1, oldptr);
> +}
> +
>  /**
>   * text_poke_bp() -- update instructions on live kernel on SMP
>   * @addr:	address to patch
> @@ -1310,15 +1326,8 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
>   * dynamically allocated memory. This function should be used when it is
>   * not possible to allocate memory.
>   */
> -void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
> +void __ref text_poke_bp(void *addr, const void *opcode, size_t len,
> +			const void *emulate)
>  {
> -	struct text_poke_loc tp;
> -
> -	if (unlikely(system_state == SYSTEM_BOOTING)) {
> -		text_poke_early(addr, opcode, len);
> -		return;
> -	}
> -
> -	text_poke_loc_init(&tp, addr, opcode, len, emulate);
> -	text_poke_bp_batch(&tp, 1);
> +	return __text_poke_bp(addr, opcode, len, emulate, NULL);
>  }

So all of that ^, is needed for this v ??


> @@ -439,7 +447,8 @@ void arch_optimize_kprobes(struct list_head *oplist)
>  		insn_buff[0] = JMP32_INSN_OPCODE;
>  		*(s32 *)(&insn_buff[1]) = rel;
>  
> -		text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL);
> +		__text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL,
> +			       &op->kp.opcode);
>  
>  		list_del_init(&op->list);
>  	}

That seems 'unfortunate'...

We optimize only after having already installed a regular probe, that
is, what we're actually doing here is replacing INT3 with a JMP.d32. But
the above will make it appear as if we're replacing the original text
with a JMP.d32. Which doesn't make sense, since we've already poked an
INT3 there and that poke will have had a corresponding
perf_event_text_poke(), right? (except you didn't, see below)

At this point we'll already have constructed the optprobe trampoline,
which contains however much of the original instruction (in whole) as
will be overwritten by our 5 byte JMP.d32. And IIUC, we'll have a
perf_event_text_poke() event for the whole of that already -- except I
can't find that in the patches (again, see below).

> @@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
>   */
>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>  {
> +	u8 old[POKE_MAX_OPCODE_SIZE];
> +	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
> +	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
> +
> +	memcpy(old, op->kp.addr, len);
>  	arch_arm_kprobe(&op->kp);
>  	text_poke(op->kp.addr + INT3_INSN_SIZE,
>  		  op->optinsn.copied_insn, DISP32_SIZE);
> +	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);

And then this is 'wrong' too. You've not written the original
instruction, you've just written an INT3.

> +	perf_event_text_poke(op->kp.addr, old, len, new, len);
>  	text_poke_sync();
>  }


So how about something like the below, with it you'll get 6 text_poke
events:

1:  old0 -> INT3

  // kprobe active

2:  NULL -> optprobe_trampoline
3:  INT3,old1,old2,old3,old4 -> JMP32

  // optprobe active

4:  JMP32 -> INT3,old1,old2,old3,old4
5:  optprobe_trampoline -> NULL

  // kprobe active

6:  INT3 -> old0



Masami, did I get this all right?


---
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -502,12 +502,18 @@ int arch_prepare_kprobe(struct kprobe *p
 
 void arch_arm_kprobe(struct kprobe *p)
 {
-	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
+	u8 int3 = INT3_INSN_OPCODE;
+
+	text_poke(p->addr, &int3, 1);
 	text_poke_sync();
+	perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
 }
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
+	u8 int3 = INT3_INSN_OPCODE;
+
+	perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
 	text_poke(p->addr, &p->opcode, 1);
 	text_poke_sync();
 }
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -356,8 +356,14 @@ int arch_within_optimized_kprobe(struct
 static
 void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
 {
-	if (op->optinsn.insn) {
-		free_optinsn_slot(op->optinsn.insn, dirty);
+	u8 *slot = op->optinsn.insn;
+	if (slot) {
+		int len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE;
+
+		if (dirty)
+			perf_event_text_poke(slot, slot, len, NULL, 0);
+
+		free_optinsn_slot(slot, dirty);
 		op->optinsn.insn = NULL;
 		op->optinsn.size = 0;
 	}
@@ -429,7 +435,9 @@ int arch_prepare_optimized_kprobe(struct
 	len += JMP32_INSN_SIZE;
 
 	/* We have to use text_poke() for instruction buffer because it is RO */
+	perf_event_text_poke(slot, NULL, 0, buf, len);
 	text_poke(slot, buf, len);
+
 	ret = 0;
 out:
 	kfree(buf);
@@ -481,10 +489,23 @@ void arch_optimize_kprobes(struct list_h
  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
-	arch_arm_kprobe(&op->kp);
-	text_poke(op->kp.addr + INT3_INSN_SIZE,
-		  op->optinsn.copied_insn, DISP32_SIZE);
+	u8 new[JMP32_INSN_SIZE] = { INT3_INSN_OPCODE, };
+	u8 old[JMP32_INSN_SIZE];
+	u8 *addr = op->kp.addr;
+
+	memcpy(old, op->kp.addr, JMP32_INSN_SIZE);
+	memcpy(new + INT3_INSN_SIZE,
+	       op->optinsn.copied_insn + INT3_INSN_SIZE,
+	       JMP32_INSN_SIZE - INT3_INSN_SIZE);
+
+	text_poke(addr, new, INT3_INSN_SIZE);
+	text_poke_sync();
+	text_poke(addr + INT3_INSN_SIZE,
+		  new + INT3_INSN_SIZE,
+		  JMP32_INSN_SIZE - INT3_INSN_SIZE);
 	text_poke_sync();
+
+	perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
 }
 
 /*

* Re: [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-04  9:06 ` [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
  2020-03-05  5:58   ` Masami Hiramatsu
@ 2020-03-24 12:31   ` Peter Zijlstra
  2020-03-24 12:54     ` Adrian Hunter
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2020-03-24 12:31 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Wed, Mar 04, 2020 at 11:06:23AM +0200, Adrian Hunter wrote:
> Symbols are needed for tools to describe instruction addresses. Pages
> allocated for kprobe's purposes need symbols to be created for them.
> Add such symbols to be visible via /proc/kallsyms.
> 
> Note: kprobe insn pages are not used if ftrace is configured. To see the
> effect of this patch, the kernel must be configured with:
> 
> 	# CONFIG_FUNCTION_TRACER is not set
> 	CONFIG_KPROBES=y
> 
> and for optimised kprobes:
> 
> 	CONFIG_OPTPROBES=y
> 
> Example on x86:
> 
> 	# perf probe __schedule
> 	Added new event:
> 	  probe:__schedule     (on __schedule)
> 	# cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
> 	ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
> 	ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]
> 

I'm confused; why are you iterating pages and not slots? A 'page' is not
a symbol; pages contain text, sometimes.

If you iterate slots you can even get them a proper name; something
like:

	optinsn-sym+xxx [__builtin__kprobes]

	insn-sym+xxx [__builtin__kprobes]

* Re: [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages
  2020-03-24 12:31   ` Peter Zijlstra
@ 2020-03-24 12:54     ` Adrian Hunter
  0 siblings, 0 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-24 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 24/03/20 2:31 pm, Peter Zijlstra wrote:
> On Wed, Mar 04, 2020 at 11:06:23AM +0200, Adrian Hunter wrote:
>> Symbols are needed for tools to describe instruction addresses. Pages
>> allocated for kprobe's purposes need symbols to be created for them.
>> Add such symbols to be visible via /proc/kallsyms.
>>
>> Note: kprobe insn pages are not used if ftrace is configured. To see the
>> effect of this patch, the kernel must be configured with:
>>
>> 	# CONFIG_FUNCTION_TRACER is not set
>> 	CONFIG_KPROBES=y
>>
>> and for optimised kprobes:
>>
>> 	CONFIG_OPTPROBES=y
>>
>> Example on x86:
>>
>> 	# perf probe __schedule
>> 	Added new event:
>> 	  probe:__schedule     (on __schedule)
>> 	# cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
>> 	ffffffffc00d4000 t kprobe_insn_page     [__builtin__kprobes]
>> 	ffffffffc00d6000 t kprobe_optinsn_page  [__builtin__kprobes]
>>
> 
> I'm confused; why are you iterating pages and not slots? A 'page' is not
> a symbol; pages contain text, sometimes.

A symbol for each slot is not necessary, and it doesn't look like the slots
can be walked without taking a mutex, which is a problem for kallsyms.

> 
> If you iterate slots you can even get them a proper name; something
> like:
> 
> 	optinsn-sym+xxx [__builtin__kprobes]
> 
> 	insn-sym+xxx [__builtin__kprobes]
> 

Addresses resolve to the previous symbol plus an offset, so it seems to me
that the only difference in what you are proposing is the symbol name, which
can be changed if you like.
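For reference, the "previous symbol plus an offset" rule can be sketched in
user-space C; the symbol table and lookup helper below are illustrative
inventions, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table, sorted by start address (values invented). */
struct sym {
	uint64_t addr;
	const char *name;
};

static const struct sym syms[] = {
	{ 0xffffffffc00d4000ULL, "kprobe_insn_page" },
	{ 0xffffffffc00d6000ULL, "kprobe_optinsn_page" },
};

/*
 * Resolve an address to the last symbol whose start address is <= addr,
 * returning the remainder as the offset, the way kallsyms-style lookup
 * reports "symbol+0x<offset>".
 */
static const struct sym *resolve(uint64_t addr, uint64_t *off)
{
	const struct sym *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (syms[i].addr <= addr)
			best = &syms[i];
	}
	if (best)
		*off = addr - best->addr;
	return best;
}
```

Under this rule, any slot inside kprobe_insn_page already resolves to
kprobe_insn_page plus an offset, which is Adrian's point: per-slot symbols
would change only the reported name, not the resolution.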


* Re: [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-24 12:21   ` Peter Zijlstra
@ 2020-03-26  1:58     ` Masami Hiramatsu
  2020-03-26  7:42       ` Adrian Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2020-03-26  1:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Ingo Molnar, Masami Hiramatsu, Steven Rostedt,
	Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On Tue, 24 Mar 2020 13:21:50 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> We optimize only after having already installed a regular probe, that
> is, what we're actually doing here is replacing INT3 with a JMP.d32. But
> the above will make it appear as if we're replacing the original text
> with a JMP.d32. Which doesn't make sense, since we've already poked an
> INT3 there and that poke will have had a corresponding
> perf_event_text_poke(), right? (except you didn't, see below)
> 
> At this point we'll already have constructed the optprobe trampoline,
> which contains however much of the original instruction (in whole) as
> will be overwritten by our 5 byte JMP.d32. And IIUC, we'll have a
> perf_event_text_poke() event for the whole of that already -- except I
> can't find that in the patches (again, see below).

Thanks, Peter, for pointing it out.

> 
> > @@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
> >   */
> >  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
> >  {
> > +	u8 old[POKE_MAX_OPCODE_SIZE];
> > +	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
> > +	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
> > +
> > +	memcpy(old, op->kp.addr, len);
> >  	arch_arm_kprobe(&op->kp);
> >  	text_poke(op->kp.addr + INT3_INSN_SIZE,
> >  		  op->optinsn.copied_insn, DISP32_SIZE);
> > +	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);
> 
> And then this is 'wrong' too. You've not written the original
> instruction, you've just written an INT3.
> 
> > +	perf_event_text_poke(op->kp.addr, old, len, new, len);
> >  	text_poke_sync();
> >  }
> 
> 
> So how about something like the below, with it you'll get 6 text_poke
> events:
> 
> 1:  old0 -> INT3
> 
>   // kprobe active
> 
> 2:  NULL -> optprobe_trampoline
> 3:  INT3,old1,old2,old3,old4 -> JMP32
> 
>   // optprobe active
> 
> 4:  JMP32 -> INT3,old1,old2,old3,old4
> 5:  optprobe_trampoline -> NULL
> 
>   // kprobe active
> 
> 6:  INT3 -> old0
> 
> 
> 
> Masami, did I get this all right?

Yes, you understand correctly. And there is also the boosted kprobe,
which runs probe.ainsn.insn directly and jumps back to the old place.
I guess it will also disturb Intel PT.

0:  NULL -> probe.ainsn.insn (if ainsn.boostable && !kp.post_handler)

> 1:  old0 -> INT3
> 
  // boosted kprobe active
> 
> 2:  NULL -> optprobe_trampoline
> 3:  INT3,old1,old2,old3,old4 -> JMP32
> 
>   // optprobe active
> 
> 4:  JMP32 -> INT3,old1,old2,old3,old4

   // optprobe disabled and kprobe active (this sometimes goes back to 3)

> 5:  optprobe_trampoline -> NULL
> 
  // boosted kprobe active
> 
> 6:  INT3 -> old0

7:  probe.ainsn.insn -> NULL (if ainsn.boostable && !kp.post_handler)

So you'll get 8 events at most.
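A rough user-space sketch (not kernel code; the macro values mirror x86's
1-byte INT3 and 5-byte JMP.d32) of the byte transition at step 4 above,
where the JMP32 is unwound back to INT3 followed by the original tail bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE	0xCC	/* x86 breakpoint opcode */
#define INT3_SIZE	1
#define JMP32_SIZE	5	/* 5-byte JMP.d32 */

/*
 * Build the "new" bytes for step 4 (JMP32 -> INT3,old1,old2,old3,old4):
 * the first byte becomes INT3 and the remaining four bytes are restored
 * from the copied tail of the original instruction(s).
 */
static void build_unoptimize_bytes(uint8_t out[JMP32_SIZE],
				   const uint8_t tail[JMP32_SIZE - INT3_SIZE])
{
	out[0] = INT3_OPCODE;
	memcpy(out + INT3_SIZE, tail, JMP32_SIZE - INT3_SIZE);
}
```

Publishing this as one perf_event_text_poke() with the JMP32 bytes as "old"
and this buffer as "new" gives a decoder a single consistent 5-byte
transition instead of two partial ones.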

Adrian, would you also need to trace the buffer which is used for
single stepping? If so, as you did, we need to trace p->ainsn.insn
always.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-26  1:58     ` Masami Hiramatsu
@ 2020-03-26  7:42       ` Adrian Hunter
  2020-03-27  8:36         ` [PATCH V5 " Adrian Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: Adrian Hunter @ 2020-03-26  7:42 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On 26/03/20 3:58 am, Masami Hiramatsu wrote:
> On Tue, 24 Mar 2020 13:21:50 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> We optimize only after having already installed a regular probe, that
>> is, what we're actually doing here is replacing INT3 with a JMP.d32. But
>> the above will make it appear as if we're replacing the original text
>> with a JMP.d32. Which doesn't make sense, since we've already poked an
>> INT3 there and that poke will have had a corresponding
>> perf_event_text_poke(), right? (except you didn't, see below)
>>
>> At this point we'll already have constructed the optprobe trampoline,
>> which contains however much of the original instruction (in whole) as
>> will be overwritten by our 5 byte JMP.d32. And IIUC, we'll have a
>> perf_event_text_poke() event for the whole of that already -- except I
>> can't find that in the patches (again, see below).
> 
> Thanks, Peter, for pointing it out.
> 
>>
>>> @@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
>>>   */
>>>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>>>  {
>>> +	u8 old[POKE_MAX_OPCODE_SIZE];
>>> +	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
>>> +	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
>>> +
>>> +	memcpy(old, op->kp.addr, len);
>>>  	arch_arm_kprobe(&op->kp);
>>>  	text_poke(op->kp.addr + INT3_INSN_SIZE,
>>>  		  op->optinsn.copied_insn, DISP32_SIZE);
>>> +	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);
>>
>> And then this is 'wrong' too. You've not written the original
>> instruction, you've just written an INT3.
>>
>>> +	perf_event_text_poke(op->kp.addr, old, len, new, len);
>>>  	text_poke_sync();
>>>  }
>>
>>
>> So how about something like the below, with it you'll get 6 text_poke
>> events:
>>
>> 1:  old0 -> INT3
>>
>>   // kprobe active
>>
>> 2:  NULL -> optprobe_trampoline
>> 3:  INT3,old1,old2,old3,old4 -> JMP32
>>
>>   // optprobe active
>>
>> 4:  JMP32 -> INT3,old1,old2,old3,old4
>> 5:  optprobe_trampoline -> NULL
>>
>>   // kprobe active
>>
>> 6:  INT3 -> old0
>>
>>
>>
>> Masami, did I get this all right?
> 
> Yes, you understand correctly. And there is also the boosted kprobe,
> which runs probe.ainsn.insn directly and jumps back to the old place.
> I guess it will also disturb Intel PT.
> 
> 0:  NULL -> probe.ainsn.insn (if ainsn.boostable && !kp.post_handler)
> 
>> 1:  old0 -> INT3
>>
>   // boosted kprobe active
>>
>> 2:  NULL -> optprobe_trampoline
>> 3:  INT3,old1,old2,old3,old4 -> JMP32
>>
>>   // optprobe active
>>
>> 4:  JMP32 -> INT3,old1,old2,old3,old4
> 
>    // optprobe disabled and kprobe active (this sometimes goes back to 3)
> 
>> 5:  optprobe_trampoline -> NULL
>>
>   // boosted kprobe active
>>
>> 6:  INT3 -> old0
> 
> 7:  probe.ainsn.insn -> NULL (if ainsn.boostable && !kp.post_handler)
> 
> So you'll get 8 events at most.
> 
> Adrian, would you also need to trace the buffer which is used for
> single stepping? If so, as you did, we need to trace p->ainsn.insn
> always.

Peter's simplification (thanks for that, I will test it) didn't look
at that aspect, but it was covered in the original patch in the chunk
below.  That will be included in the next version also.

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 579d30e91a36..12ea05d923ec 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
 #include <linux/hardirq.h>
 #include <linux/preempt.h>
 #include <linux/sched/debug.h>
+#include <linux/perf_event.h>
 #include <linux/extable.h>
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
@@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
 
+	p->ainsn.tp_len = len;
+	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
 	/* OK, write back the instruction(s) into ROX insn buffer */
 	text_poke(p->ainsn.insn, buf, len);
 
@@ -514,6 +518,9 @@ void arch_disarm_kprobe(struct kprobe *p)
 void arch_remove_kprobe(struct kprobe *p)
 {
 	if (p->ainsn.insn) {
+		/* Record the perf event before freeing the slot */
+		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+				     p->ainsn.tp_len, NULL, 0);
 		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
 		p->ainsn.insn = NULL;
 	}




* [PATCH V5 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-26  7:42       ` Adrian Hunter
@ 2020-03-27  8:36         ` Adrian Hunter
  2020-03-31 23:44           ` Masami Hiramatsu
  2020-04-01 10:13           ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Adrian Hunter @ 2020-03-27  8:36 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Add perf text poke events for kprobes. That includes:

 - the replaced instruction(s) which are executed out-of-line
   i.e. arch_copy_kprobe() and arch_remove_kprobe()

 - optimised kprobe function
   i.e. arch_prepare_optimized_kprobe() and
      __arch_remove_optimized_kprobe()

 - optimised kprobe
   i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---


Changes in V5:

	Simplify optimized kprobes events (Peter)


 arch/x86/include/asm/kprobes.h |  2 ++
 arch/x86/kernel/kprobes/core.c | 15 +++++++++++++-
 arch/x86/kernel/kprobes/opt.c  | 38 +++++++++++++++++++++++++++++-----
 3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 95b1f053bd96..ee669cdb5709 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,6 +65,8 @@ struct arch_specific_insn {
 	 */
 	bool boostable;
 	bool if_modifier;
+	/* Number of bytes of text poked */
+	int tp_len;
 };
 
 struct arch_optimized_insn {
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 579d30e91a36..8513594bfed1 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
 #include <linux/hardirq.h>
 #include <linux/preempt.h>
 #include <linux/sched/debug.h>
+#include <linux/perf_event.h>
 #include <linux/extable.h>
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
@@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
 
+	p->ainsn.tp_len = len;
+	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
 	/* OK, write back the instruction(s) into ROX insn buffer */
 	text_poke(p->ainsn.insn, buf, len);
 
@@ -501,12 +505,18 @@ int arch_prepare_kprobe(struct kprobe *p)
 
 void arch_arm_kprobe(struct kprobe *p)
 {
-	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
+	u8 int3 = INT3_INSN_OPCODE;
+
+	text_poke(p->addr, &int3, 1);
 	text_poke_sync();
+	perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
 }
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
+	u8 int3 = INT3_INSN_OPCODE;
+
+	perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
 	text_poke(p->addr, &p->opcode, 1);
 	text_poke_sync();
 }
@@ -514,6 +524,9 @@ void arch_disarm_kprobe(struct kprobe *p)
 void arch_remove_kprobe(struct kprobe *p)
 {
 	if (p->ainsn.insn) {
+		/* Record the perf event before freeing the slot */
+		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+				     p->ainsn.tp_len, NULL, 0);
 		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
 		p->ainsn.insn = NULL;
 	}
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 3f45b5c43a71..b1072c47b595 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -6,6 +6,7 @@
  * Copyright (C) Hitachi Ltd., 2012
  */
 #include <linux/kprobes.h>
+#include <linux/perf_event.h>
 #include <linux/ptrace.h>
 #include <linux/string.h>
 #include <linux/slab.h>
@@ -331,8 +332,15 @@ int arch_within_optimized_kprobe(struct optimized_kprobe *op,
 static
 void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
 {
-	if (op->optinsn.insn) {
-		free_optinsn_slot(op->optinsn.insn, dirty);
+	u8 *slot = op->optinsn.insn;
+	if (slot) {
+		int len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE;
+
+		/* Record the perf event before freeing the slot */
+		if (dirty)
+			perf_event_text_poke(slot, slot, len, NULL, 0);
+
+		free_optinsn_slot(slot, dirty);
 		op->optinsn.insn = NULL;
 		op->optinsn.size = 0;
 	}
@@ -401,8 +409,15 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
 			   (u8 *)op->kp.addr + op->optinsn.size);
 	len += JMP32_INSN_SIZE;
 
+	/*
+	 * Note	len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE is also
+	 * used in __arch_remove_optimized_kprobe().
+	 */
+
 	/* We have to use text_poke() for instruction buffer because it is RO */
+	perf_event_text_poke(slot, NULL, 0, buf, len);
 	text_poke(slot, buf, len);
+
 	ret = 0;
 out:
 	kfree(buf);
@@ -454,10 +469,23 @@ void arch_optimize_kprobes(struct list_head *oplist)
  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
-	arch_arm_kprobe(&op->kp);
-	text_poke(op->kp.addr + INT3_INSN_SIZE,
-		  op->optinsn.copied_insn, DISP32_SIZE);
+	u8 new[JMP32_INSN_SIZE] = { INT3_INSN_OPCODE, };
+	u8 old[JMP32_INSN_SIZE];
+	u8 *addr = op->kp.addr;
+
+	memcpy(old, op->kp.addr, JMP32_INSN_SIZE);
+	memcpy(new + INT3_INSN_SIZE,
+	       op->optinsn.copied_insn,
+	       JMP32_INSN_SIZE - INT3_INSN_SIZE);
+
+	text_poke(addr, new, INT3_INSN_SIZE);
 	text_poke_sync();
+	text_poke(addr + INT3_INSN_SIZE,
+		  new + INT3_INSN_SIZE,
+		  JMP32_INSN_SIZE - INT3_INSN_SIZE);
+	text_poke_sync();
+
+	perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
 }
 
 /*
-- 
2.17.1



* Re: [PATCH V5 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-27  8:36         ` [PATCH V5 " Adrian Hunter
@ 2020-03-31 23:44           ` Masami Hiramatsu
  2020-04-01 10:13           ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2020-03-31 23:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Fri, 27 Mar 2020 10:36:09 +0200
Adrian Hunter <adrian.hunter@intel.com> wrote:

> Add perf text poke events for kprobes. That includes:
> 
>  - the replaced instruction(s) which are executed out-of-line
>    i.e. arch_copy_kprobe() and arch_remove_kprobe()
> 
>  - optimised kprobe function
>    i.e. arch_prepare_optimized_kprobe() and
>       __arch_remove_optimized_kprobe()
> 
>  - optimised kprobe
>    i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

This looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you!

> ---
> 
> 
> Changes in V5:
> 
> 	Simplify optimized kprobes events (Peter)
> 
> 
>  arch/x86/include/asm/kprobes.h |  2 ++
>  arch/x86/kernel/kprobes/core.c | 15 +++++++++++++-
>  arch/x86/kernel/kprobes/opt.c  | 38 +++++++++++++++++++++++++++++-----
>  3 files changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
> index 95b1f053bd96..ee669cdb5709 100644
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -65,6 +65,8 @@ struct arch_specific_insn {
>  	 */
>  	bool boostable;
>  	bool if_modifier;
> +	/* Number of bytes of text poked */
> +	int tp_len;
>  };
>  
>  struct arch_optimized_insn {
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 579d30e91a36..8513594bfed1 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -33,6 +33,7 @@
>  #include <linux/hardirq.h>
>  #include <linux/preempt.h>
>  #include <linux/sched/debug.h>
> +#include <linux/perf_event.h>
>  #include <linux/extable.h>
>  #include <linux/kdebug.h>
>  #include <linux/kallsyms.h>
> @@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
>  	/* Also, displacement change doesn't affect the first byte */
>  	p->opcode = buf[0];
>  
> +	p->ainsn.tp_len = len;
> +	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
> +
>  	/* OK, write back the instruction(s) into ROX insn buffer */
>  	text_poke(p->ainsn.insn, buf, len);
>  
> @@ -501,12 +505,18 @@ int arch_prepare_kprobe(struct kprobe *p)
>  
>  void arch_arm_kprobe(struct kprobe *p)
>  {
> -	text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
> +	u8 int3 = INT3_INSN_OPCODE;
> +
> +	text_poke(p->addr, &int3, 1);
>  	text_poke_sync();
> +	perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
>  }
>  
>  void arch_disarm_kprobe(struct kprobe *p)
>  {
> +	u8 int3 = INT3_INSN_OPCODE;
> +
> +	perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
>  	text_poke(p->addr, &p->opcode, 1);
>  	text_poke_sync();
>  }
> @@ -514,6 +524,9 @@ void arch_disarm_kprobe(struct kprobe *p)
>  void arch_remove_kprobe(struct kprobe *p)
>  {
>  	if (p->ainsn.insn) {
> +		/* Record the perf event before freeing the slot */
> +		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
> +				     p->ainsn.tp_len, NULL, 0);
>  		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
>  		p->ainsn.insn = NULL;
>  	}
> diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> index 3f45b5c43a71..b1072c47b595 100644
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -6,6 +6,7 @@
>   * Copyright (C) Hitachi Ltd., 2012
>   */
>  #include <linux/kprobes.h>
> +#include <linux/perf_event.h>
>  #include <linux/ptrace.h>
>  #include <linux/string.h>
>  #include <linux/slab.h>
> @@ -331,8 +332,15 @@ int arch_within_optimized_kprobe(struct optimized_kprobe *op,
>  static
>  void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
>  {
> -	if (op->optinsn.insn) {
> -		free_optinsn_slot(op->optinsn.insn, dirty);
> +	u8 *slot = op->optinsn.insn;
> +	if (slot) {
> +		int len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE;
> +
> +		/* Record the perf event before freeing the slot */
> +		if (dirty)
> +			perf_event_text_poke(slot, slot, len, NULL, 0);
> +
> +		free_optinsn_slot(slot, dirty);
>  		op->optinsn.insn = NULL;
>  		op->optinsn.size = 0;
>  	}
> @@ -401,8 +409,15 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
>  			   (u8 *)op->kp.addr + op->optinsn.size);
>  	len += JMP32_INSN_SIZE;
>  
> +	/*
> +	 * Note	len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE is also
> +	 * used in __arch_remove_optimized_kprobe().
> +	 */
> +
>  	/* We have to use text_poke() for instruction buffer because it is RO */
> +	perf_event_text_poke(slot, NULL, 0, buf, len);
>  	text_poke(slot, buf, len);
> +
>  	ret = 0;
>  out:
>  	kfree(buf);
> @@ -454,10 +469,23 @@ void arch_optimize_kprobes(struct list_head *oplist)
>   */
>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>  {
> -	arch_arm_kprobe(&op->kp);
> -	text_poke(op->kp.addr + INT3_INSN_SIZE,
> -		  op->optinsn.copied_insn, DISP32_SIZE);
> +	u8 new[JMP32_INSN_SIZE] = { INT3_INSN_OPCODE, };
> +	u8 old[JMP32_INSN_SIZE];
> +	u8 *addr = op->kp.addr;
> +
> +	memcpy(old, op->kp.addr, JMP32_INSN_SIZE);
> +	memcpy(new + INT3_INSN_SIZE,
> +	       op->optinsn.copied_insn,
> +	       JMP32_INSN_SIZE - INT3_INSN_SIZE);
> +
> +	text_poke(addr, new, INT3_INSN_SIZE);
>  	text_poke_sync();
> +	text_poke(addr + INT3_INSN_SIZE,
> +		  new + INT3_INSN_SIZE,
> +		  JMP32_INSN_SIZE - INT3_INSN_SIZE);
> +	text_poke_sync();
> +
> +	perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
>  }
>  
>  /*
> -- 
> 2.17.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH V4 08/13] ftrace: Add perf text poke events for ftrace trampolines
  2020-03-04  9:06 ` [PATCH V4 08/13] ftrace: Add perf text poke " Adrian Hunter
@ 2020-04-01 10:09   ` Peter Zijlstra
  2020-04-01 10:42     ` Adrian Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-01 10:09 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Wed, Mar 04, 2020 at 11:06:28AM +0200, Adrian Hunter wrote:
> Add perf text poke events for ftrace trampolines when created and when
> freed.

If I'm not mistaken that ends up like so:

static void ftrace_update_trampoline(struct ftrace_ops *ops)
{
+	unsigned long trampoline = ops->trampoline;
+
	arch_ftrace_update_trampoline(ops);
+	if (ops->trampoline && ops->trampoline != trampoline &&
+	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP)) {
+		/* Add to kallsyms before the perf events */
+		ftrace_add_trampoline_to_kallsyms(ops);
+		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+				   ops->trampoline, ops->trampoline_size, false,
+				   FTRACE_TRAMPOLINE_SYM);
+		/*
+		 * Record the perf text poke event after the ksymbol register
+		 * event.
+		 */
+		perf_event_text_poke((void *)ops->trampoline, NULL, 0,
+				     (void *)ops->trampoline,
+				     ops->trampoline_size);
	}
}

And afaict, that is wrong.

The thing is; arch_ftrace_update_trampoline() can actually *update* an
existing trampoline, as per the name. Yes it also creates a trampoline
if there isn't one already, but if there already is one, it will modify
it in-place.

I see the appeal of having this event in generic code; but I'm thinking
you'll need the update even in arch code anyway, at which point it'd
probably be easier to do all of this in arch code.
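The create-versus-update distinction above can be modeled in a few lines of
user-space C (the struct and function names here are invented for
illustration, not the real ftrace API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of ftrace_ops; only the field relevant here. */
struct model_ops {
	uintptr_t trampoline;	/* 0 means no trampoline yet */
};

/* Stand-in for arch_ftrace_update_trampoline(): allocate or patch. */
static void model_arch_update(struct model_ops *ops, uintptr_t addr)
{
	ops->trampoline = addr;
}

/*
 * Mirrors the pointer-changed check in the generic code: returns true only
 * when a new trampoline appeared, i.e. when the ksymbol + text_poke pair
 * would be emitted. An in-place update keeps the same pointer and returns
 * false, so it emits nothing from the generic path.
 */
static bool update_emits_create_events(struct model_ops *ops, uintptr_t addr)
{
	uintptr_t before = ops->trampoline;

	model_arch_update(ops, addr);
	return ops->trampoline && ops->trampoline != before;
}
```

A freshly allocated trampoline fires the pair once; a later update that
patches the same buffer in place does not, so in-place updates must be
covered by events from the common text-poking code instead.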



* Re: [PATCH V5 05/13] perf/x86: Add perf text poke events for kprobes
  2020-03-27  8:36         ` [PATCH V5 " Adrian Hunter
  2020-03-31 23:44           ` Masami Hiramatsu
@ 2020-04-01 10:13           ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-01 10:13 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Masami Hiramatsu, Ingo Molnar, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Fri, Mar 27, 2020 at 10:36:09AM +0200, Adrian Hunter wrote:
> Add perf text poke events for kprobes. That includes:
> 
>  - the replaced instruction(s) which are executed out-of-line
>    i.e. arch_copy_kprobe() and arch_remove_kprobe()
> 
>  - optimised kprobe function
>    i.e. arch_prepare_optimized_kprobe() and
>       __arch_remove_optimized_kprobe()
> 
>  - optimised kprobe
>    i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Looks good, but we had those nice graphs illustrating how the various
events connect; I'm thinking they would be nice to have in the
Changelog, perhaps even in a document somewhere.

There are, after all, 8 events here that all interplay.



* Re: [PATCH V4 08/13] ftrace: Add perf text poke events for ftrace trampolines
  2020-04-01 10:09   ` Peter Zijlstra
@ 2020-04-01 10:42     ` Adrian Hunter
  2020-04-01 11:14       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Adrian Hunter @ 2020-04-01 10:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 1/04/20 1:09 pm, Peter Zijlstra wrote:
> On Wed, Mar 04, 2020 at 11:06:28AM +0200, Adrian Hunter wrote:
>> Add perf text poke events for ftrace trampolines when created and when
>> freed.
> 
> If I'm not mistaken that ends up like so:
> 
> static void ftrace_update_trampoline(struct ftrace_ops *ops)
> {
> +	unsigned long trampoline = ops->trampoline;
> +
> 	arch_ftrace_update_trampoline(ops);
> +	if (ops->trampoline && ops->trampoline != trampoline &&
> +	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP)) {
> +		/* Add to kallsyms before the perf events */
> +		ftrace_add_trampoline_to_kallsyms(ops);
> +		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
> +				   ops->trampoline, ops->trampoline_size, false,
> +				   FTRACE_TRAMPOLINE_SYM);
> +		/*
> +		 * Record the perf text poke event after the ksymbol register
> +		 * event.
> +		 */
> +		perf_event_text_poke((void *)ops->trampoline, NULL, 0,
> +				     (void *)ops->trampoline,
> +				     ops->trampoline_size);
> 	}
> }
> 
> And afaict, that is wrong.
> 
> The thing is; arch_ftrace_update_trampoline() can actually *update* an
> existing trampoline, as per the name. Yes it also creates a trampoline
> if there isn't one already, but if there already is one, it will modify
> it in-place.
> 
> I see the appeal of having this event in generic code; but I'm thinking
> you'll need the update even in arch code anyway, at which point it'd
> probably be easier to do all of this in arch code.

For x86, we use text_poke_bp() for updates, which already emits text_poke
events via text_poke_bp_batch().

It might be reasonable to assume other architectures will also need to put
updates through a common text poker which will take care of text_poke events.

The V3 patch had it in arch code, but Steven Rostedt asked why it couldn't
be in ftrace_update_trampoline(), so I moved it.


* Re: [PATCH V4 08/13] ftrace: Add perf text poke events for ftrace trampolines
  2020-04-01 10:42     ` Adrian Hunter
@ 2020-04-01 11:14       ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-01 11:14 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Masami Hiramatsu, Steven Rostedt, Borislav Petkov,
	H . Peter Anvin, x86, Mark Rutland, Alexander Shishkin,
	Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Wed, Apr 01, 2020 at 01:42:50PM +0300, Adrian Hunter wrote:
> On 1/04/20 1:09 pm, Peter Zijlstra wrote:
> > On Wed, Mar 04, 2020 at 11:06:28AM +0200, Adrian Hunter wrote:
> >> Add perf text poke events for ftrace trampolines when created and when
> >> freed.
> > 
> > If I'm not mistaken that ends up like so:
> > 
> > static void ftrace_update_trampoline(struct ftrace_ops *ops)
> > {
> > +	unsigned long trampoline = ops->trampoline;
> > +
> > 	arch_ftrace_update_trampoline(ops);
> > +	if (ops->trampoline && ops->trampoline != trampoline &&
> > +	    (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP)) {
> > +		/* Add to kallsyms before the perf events */
> > +		ftrace_add_trampoline_to_kallsyms(ops);
> > +		perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
> > +				   ops->trampoline, ops->trampoline_size, false,
> > +				   FTRACE_TRAMPOLINE_SYM);
> > +		/*
> > +		 * Record the perf text poke event after the ksymbol register
> > +		 * event.
> > +		 */
> > +		perf_event_text_poke((void *)ops->trampoline, NULL, 0,
> > +				     (void *)ops->trampoline,
> > +				     ops->trampoline_size);
> > 	}
> > }
> > 
> > And afaict, that is wrong.
> > 
> > The thing is; arch_ftrace_update_trampoline() can actually *update* an
> > existing trampoline, as per the name. Yes it also creates a trampoline
> > if there isn't one already, but if there already is one, it will modify
> > it in-place.
> > 
> > I see the appeal of having this event in generic code; but I'm thinking
> > you'll need the update even in arch code anyway, at which point it'd
> > probably be easier to do all of this in arch code.
> 
> For x86, we use text_poke_bp() for updates, which already emits text_poke
> events via text_poke_bp_batch().

Ah, indeed! Damn, I even wrote that code :/

> It might be reasonable to assume other architectures will also need to put
> updates through a common text poker which will take care of text_poke events.

You'd better look, I recently rewrote the x86/ftrace code to be 'sane'
and not re-implement all the text poking stuff itself.

But I suppose that any arch adding support for this can fix that up if
needed.


end of thread, other threads:[~2020-04-01 11:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-04  9:06 [PATCH V4 00/13] perf/x86: Add perf text poke events Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 01/13] perf: Add perf text poke event Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
2020-03-05  5:58   ` Masami Hiramatsu
2020-03-05  6:10     ` Alexei Starovoitov
2020-03-05  9:04       ` Masami Hiramatsu
2020-03-24 12:31   ` Peter Zijlstra
2020-03-24 12:54     ` Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
2020-03-24 12:21   ` Peter Zijlstra
2020-03-26  1:58     ` Masami Hiramatsu
2020-03-26  7:42       ` Adrian Hunter
2020-03-27  8:36         ` [PATCH V5 " Adrian Hunter
2020-03-31 23:44           ` Masami Hiramatsu
2020-04-01 10:13           ` Peter Zijlstra
2020-03-04  9:06 ` [PATCH V4 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 08/13] ftrace: Add perf text poke " Adrian Hunter
2020-04-01 10:09   ` Peter Zijlstra
2020-04-01 10:42     ` Adrian Hunter
2020-04-01 11:14       ` Peter Zijlstra
2020-03-04  9:06 ` [PATCH V4 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
2020-03-04  9:06 ` [PATCH V4 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
2020-03-16  7:07 ` [PATCH V4 00/13] perf/x86: Add perf " Adrian Hunter
2020-03-24  9:29   ` Adrian Hunter
