LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH V2 00/13] perf/x86: Add perf text poke events
@ 2020-02-12 12:49 Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 01/13] perf: Add perf text poke event Adrian Hunter
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Hi

Here are patches to add a text poke event to record changes to kernel text
(i.e. self-modifying code) in order to support tracers like Intel PT
decoding through jump labels, kprobes and ftrace trampolines.

The first 8 patches make the kernel changes and the subsequent patches are
tools changes.

The next 4 patches add support for updating perf tools' data cache
with the changed bytes.

The last patch is an Intel PT specific tools change.


Changes in V2:

  perf: Add perf text poke event

	Separate out x86 changes
	The text poke event now has old len and new len
	Revised commit message

  perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers

	New patch containing x86 changes from original first patch

  kprobes: Add symbols for kprobe insn pages
  kprobes: Add perf ksymbol events for kprobe insn pages
  perf/x86: Add perf text poke events for kprobes
  ftrace: Add symbols for ftrace trampolines
  ftrace: Add perf ksymbol events for ftrace trampolines
  ftrace: Add perf text poke events for ftrace trampolines
  perf kcore_copy: Fix module map when there are no modules loaded
  perf evlist: Disable 'immediate' events last

	New patches

  perf tools: Add support for PERF_RECORD_TEXT_POKE

	The text poke event now has old len and new len
	Also select ksymbol events with text poke events

  perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL

	New patch

  perf intel-pt: Add support for text poke events

	The text poke event now has old len and new len
	Allow for the address not having a map yet


Changes since RFC:

  Dropped 'flags' from the new event.  The consensus seemed to be that text
  pokes should employ a scheme similar to x86's INT3 method instead.

  dropped tools patches already applied.


Example:

  For jump labels, the kernel needs
	CONFIG_JUMP_LABEL=y
  and also an easy to flip jump label is in sysctl_schedstats() which needs
	CONFIG_SCHEDSTATS=y
	CONFIG_PROC_SYSCTL=y
	CONFIG_SCHED_DEBUG=y

  Also note the 'sudo perf record' is put into the background which, as
  written, needs sudo credential caching (otherwise the background task
  will stop awaiting the sudo password), hence the 'sudo echo' to start.

Before:

  $ sudo echo
  $ sudo perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
  [1] 1640
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  1
  $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo kill 1640
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 16.635 MB perf.data.before ]
  $ perf script -i perf.data.before --itrace=e >/dev/null
  Warning:
  1946 instruction trace errors

After:

  $ sudo echo
  $ sudo perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
  [1] 1882
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  1
  $ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
  $ cat /proc/sys/kernel/sched_schedstats
  0
  $ sudo kill 1882
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 10.893 MB perf.data.after ]
  $ perf script -i perf.data.after --itrace=e
  $


Adrian Hunter (13):
      perf: Add perf text poke event
      perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
      kprobes: Add symbols for kprobe insn pages
      kprobes: Add perf ksymbol events for kprobe insn pages
      perf/x86: Add perf text poke events for kprobes
      ftrace: Add symbols for ftrace trampolines
      ftrace: Add perf ksymbol events for ftrace trampolines
      ftrace: Add perf text poke events for ftrace trampolines
      perf kcore_copy: Fix module map when there are no modules loaded
      perf evlist: Disable 'immediate' events last
      perf tools: Add support for PERF_RECORD_TEXT_POKE
      perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
      perf intel-pt: Add support for text poke events

 arch/x86/include/asm/kprobes.h            |  4 ++
 arch/x86/include/asm/text-patching.h      |  2 +
 arch/x86/kernel/alternative.c             | 70 +++++++++++++++++++-----
 arch/x86/kernel/ftrace.c                  | 37 ++++++++-----
 arch/x86/kernel/kprobes/core.c            |  7 +++
 arch/x86/kernel/kprobes/opt.c             | 18 ++++++-
 include/linux/ftrace.h                    | 15 ++++--
 include/linux/kprobes.h                   | 28 ++++++++++
 include/linux/perf_event.h                |  8 +++
 include/uapi/linux/perf_event.h           | 26 ++++++++-
 kernel/events/core.c                      | 90 ++++++++++++++++++++++++++++++-
 kernel/kallsyms.c                         | 32 +++++++++--
 kernel/kprobes.c                          | 57 ++++++++++++++++++++
 kernel/trace/ftrace.c                     | 52 +++++++++++++++++-
 tools/include/uapi/linux/perf_event.h     | 26 ++++++++-
 tools/lib/perf/include/perf/event.h       |  9 ++++
 tools/perf/arch/x86/util/intel-pt.c       |  4 ++
 tools/perf/builtin-record.c               | 45 ++++++++++++++++
 tools/perf/util/dso.c                     |  3 ++
 tools/perf/util/dso.h                     |  1 +
 tools/perf/util/event.c                   | 40 ++++++++++++++
 tools/perf/util/event.h                   |  5 ++
 tools/perf/util/evlist.c                  | 31 +++++++----
 tools/perf/util/evlist.h                  |  1 +
 tools/perf/util/evsel.c                   |  7 ++-
 tools/perf/util/intel-pt.c                | 75 ++++++++++++++++++++++++++
 tools/perf/util/machine.c                 | 49 +++++++++++++++++
 tools/perf/util/machine.h                 |  3 ++
 tools/perf/util/map.c                     |  5 ++
 tools/perf/util/map.h                     |  3 +-
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 tools/perf/util/record.c                  | 10 ++++
 tools/perf/util/record.h                  |  1 +
 tools/perf/util/session.c                 | 23 ++++++++
 tools/perf/util/symbol-elf.c              |  7 +++
 tools/perf/util/symbol.c                  |  1 +
 tools/perf/util/tool.h                    |  3 +-
 37 files changed, 748 insertions(+), 51 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 01/13] perf: Add perf text poke event
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.

A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.

The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently any solution with one or
multiple timestamps is not sufficient:

	CPU0				CPU1
 0
 1	write insn A
 2					execute insn A
 3	sync-I$
 4

Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.

To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.

 1) overwrite the instruction with a breakpoint and sync I$

This guarantees that that code flow will never hit the target
instruction anymore, on any CPU (or rather, it will cause an
exception).

 2) issue the TEXT_POKE event

 3) overwrite the breakpoint with the new instruction and sync I$

Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.

So by guarding the TEXT_POKE event with an exception on either side;
we can now tell, without doubt, which instruction another CPU will
have observed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/perf_event.h      |  8 +++
 include/uapi/linux/perf_event.h | 21 +++++++-
 kernel/events/core.c            | 90 ++++++++++++++++++++++++++++++++-
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..e2b110e418fa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1212,6 +1212,9 @@ extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
 extern void perf_event_namespaces(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
+extern void perf_event_text_poke(const void *addr,
+				 const void *old_bytes, size_t old_len,
+				 const void *new_bytes, size_t new_len);
 
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -1462,6 +1465,11 @@ static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
 static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
+static inline void perf_event_text_poke(const void *addr,
+					const void *old_bytes,
+					size_t old_len,
+					const void *new_bytes,
+					size_t new_len)			{ }
 static inline void perf_event_init(void)				{ }
 static inline int  perf_swevent_get_recursion_context(void)		{ return -1; }
 static inline void perf_swevent_put_recursion_context(int rctx)		{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..bae9e9d2d897 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				text_poke      :  1, /* include text poke events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,24 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * Records changes to kernel text i.e. self-modified code. 'old_len' is
+	 * the number of old bytes, 'new_len' is the number of new bytes. Either
+	 * 'old_len' or 'new_len' may be zero to indicate, for example, the
+	 * addition or removal of a trampoline. 'bytes' contains the old bytes
+	 * followed immediately by the new bytes.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				addr;
+	 *	u16				old_len;
+	 *	u16				new_len;
+	 *	u8				bytes[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_TEXT_POKE			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2173c23c25b4..47fca7a4ac7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_text_poke_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4397,7 +4398,7 @@ static bool is_sb_event(struct perf_event *event)
 	if (attr->mmap || attr->mmap_data || attr->mmap2 ||
 	    attr->comm || attr->comm_exec ||
 	    attr->task || attr->ksymbol ||
-	    attr->context_switch ||
+	    attr->context_switch || attr->text_poke ||
 	    attr->bpf_event)
 		return true;
 	return false;
@@ -4471,6 +4472,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_ksymbol_events);
 	if (event->attr.bpf_event)
 		atomic_dec(&nr_bpf_events);
+	if (event->attr.text_poke)
+		atomic_dec(&nr_text_poke_events);
 
 	if (dec) {
 		if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -8309,6 +8312,89 @@ void perf_event_bpf_event(struct bpf_prog *prog,
 	perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
 }
 
+struct perf_text_poke_event {
+	const void		*old_bytes;
+	const void		*new_bytes;
+	size_t			pad;
+	u16			old_len;
+	u16			new_len;
+
+	struct {
+		struct perf_event_header	header;
+
+		u64				addr;
+	} event_id;
+};
+
+static int perf_event_text_poke_match(struct perf_event *event)
+{
+	return event->attr.text_poke;
+}
+
+static void perf_event_text_poke_output(struct perf_event *event, void *data)
+{
+	struct perf_text_poke_event *text_poke_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u64 padding = 0;
+	int ret;
+
+	if (!perf_event_text_poke_match(event))
+		return;
+
+	perf_event_header__init_id(&text_poke_event->event_id.header, &sample, event);
+
+	ret = perf_output_begin(&handle, event, text_poke_event->event_id.header.size);
+	if (ret)
+		return;
+
+	perf_output_put(&handle, text_poke_event->event_id);
+	perf_output_put(&handle, text_poke_event->old_len);
+	perf_output_put(&handle, text_poke_event->new_len);
+
+	__output_copy(&handle, text_poke_event->old_bytes, text_poke_event->old_len);
+	__output_copy(&handle, text_poke_event->new_bytes, text_poke_event->new_len);
+
+	if (text_poke_event->pad)
+		__output_copy(&handle, &padding, text_poke_event->pad);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+void perf_event_text_poke(const void *addr, const void *old_bytes,
+			  size_t old_len, const void *new_bytes, size_t new_len)
+{
+	struct perf_text_poke_event text_poke_event;
+	size_t tot, pad;
+
+	if (!atomic_read(&nr_text_poke_events))
+		return;
+
+	tot  = sizeof(text_poke_event.old_len) + old_len;
+	tot += sizeof(text_poke_event.new_len) + new_len;
+	pad  = ALIGN(tot, sizeof(u64)) - tot;
+
+	text_poke_event = (struct perf_text_poke_event){
+		.old_bytes    = old_bytes,
+		.new_bytes    = new_bytes,
+		.pad          = pad,
+		.old_len      = old_len,
+		.new_len      = new_len,
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_TEXT_POKE,
+				.misc = PERF_RECORD_MISC_KERNEL,
+				.size = sizeof(text_poke_event.event_id) + tot + pad,
+			},
+			.addr = (u64)addr,
+		},
+	};
+
+	perf_iterate_sb(perf_event_text_poke_output, &text_poke_event, NULL);
+}
+
 void perf_event_itrace_started(struct perf_event *event)
 {
 	event->attach_state |= PERF_ATTACH_ITRACE;
@@ -10623,6 +10709,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_ksymbol_events);
 	if (event->attr.bpf_event)
 		atomic_inc(&nr_bpf_events);
+	if (event->attr.text_poke)
+		atomic_inc(&nr_text_poke_events);
 
 	if (inc) {
 		/*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 01/13] perf: Add perf text poke event Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Add support for perf text poke event for text_poke_bp_batch() callers. That
includes jump labels. See comments for more details.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/alternative.c | 37 ++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 34360ca301a2..737e7a842f85 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -3,6 +3,7 @@
 
 #include <linux/module.h>
 #include <linux/sched.h>
+#include <linux/perf_event.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/stringify.h>
@@ -946,6 +947,7 @@ struct text_poke_loc {
 	s32 rel32;
 	u8 opcode;
 	const u8 text[POKE_MAX_OPCODE_SIZE];
+	u8 old;
 };
 
 struct bp_patching_desc {
@@ -1114,8 +1116,10 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	/*
 	 * First step: add a int3 trap to the address that will be patched.
 	 */
-	for (i = 0; i < nr_entries; i++)
+	for (i = 0; i < nr_entries; i++) {
+		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
 		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
+	}
 
 	text_poke_sync();
 
@@ -1123,14 +1127,45 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * Second step: update all but the first byte of the patched range.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
+		u8 old[POKE_MAX_OPCODE_SIZE] = { tp[i].old, };
 		int len = text_opcode_size(tp[i].opcode);
 
 		if (len - INT3_INSN_SIZE > 0) {
+			memcpy(old + INT3_INSN_SIZE,
+			       text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
+			       len - INT3_INSN_SIZE);
 			text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
 				  (const char *)tp[i].text + INT3_INSN_SIZE,
 				  len - INT3_INSN_SIZE);
 			do_sync++;
 		}
+
+		/*
+		 * Emit a perf event to record the text poke, primarily to
+		 * support Intel PT decoding which must walk the executable code
+		 * to reconstruct the trace. The flow up to here is:
+		 *   - write INT3 byte
+		 *   - IPI-SYNC
+		 *   - write instruction tail
+		 * At this point the actual control flow will be through the
+		 * INT3 and handler and not hit the old or new instruction.
+		 * Intel PT outputs FUP/TIP packets for the INT3, so the flow
+		 * can still be decoded. Subsequently:
+		 *   - emit RECORD_TEXT_POKE with the new instruction
+		 *   - IPI-SYNC
+		 *   - write first byte
+		 *   - IPI-SYNC
+		 * So before the text poke event timestamp, the decoder will see
+		 * either the old instruction flow or FUP/TIP of INT3. After the
+		 * text poke event timestamp, the decoder will see either the
+		 * new instruction flow or FUP/TIP of INT3. Thus decoders can
+		 * use the timestamp as the point at which to modify the
+		 * executable code.
+		 * The old instruction is recorded so that the event can be
+		 * processed forwards or backwards.
+		 */
+		perf_event_text_poke(text_poke_addr(&tp[i]), old, len,
+				     tp[i].text, len);
 	}
 
 	if (do_sync) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 01/13] perf: Add perf text poke event Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-26 15:21   ` Jiri Olsa
  2020-02-26 15:22   ` Jiri Olsa
  2020-02-12 12:49 ` [PATCH V2 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/kprobes.h | 28 +++++++++++++++++++++++++
 kernel/kallsyms.c       | 32 +++++++++++++++++++++++++----
 kernel/kprobes.c        | 45 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..6613682f3961 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -242,6 +242,7 @@ struct kprobe_insn_cache {
 	struct mutex mutex;
 	void *(*alloc)(void);	/* allocate insn page */
 	void (*free)(void *);	/* free insn page */
+	const char *sym;	/* symbol for insn pages */
 	struct list_head pages; /* list of kprobe_insn_page */
 	size_t insn_size;	/* size of instruction slot */
 	int nr_garbage;
@@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
 {									\
 	return __is_insn_slot_addr(&kprobe_##__name##_slots, addr);	\
 }
+#define KPROBE_INSN_PAGE_SYM		"kprobe_insn_page"
+#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
 #else /* __ARCH_WANT_KPROBES_INSN_SLOT */
 #define DEFINE_INSN_CACHE_OPS(__name)					\
 static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
@@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
 void *alloc_insn_page(void);
 void free_insn_page(void *page);
 
+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+		       char *sym);
+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+			     unsigned long *value, char *type, char *sym);
+
+int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+			    char *type, char *sym);
 #else /* !CONFIG_KPROBES: */
 
 static inline int kprobes_built_in(void)
@@ -435,6 +445,24 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
 {
 	return true;
 }
+static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
+				     char *type, char *sym)
+{
+	return 0;
+}
+static inline int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c,
+					   unsigned int *symnum,
+					   unsigned long *value, char *type,
+					   char *sym)
+{
+	return 0;
+}
+static inline int arch_kprobe_get_kallsym(unsigned int *symnum,
+					  unsigned long *value, char *type,
+					  char *sym)
+{
+	return 0;
+}
 #endif /* CONFIG_KPROBES */
 static inline int disable_kretprobe(struct kretprobe *rp)
 {
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..fd5579171087 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/kprobes.h>
 #include <linux/compiler.h>
 
 /*
@@ -438,6 +439,7 @@ struct kallsym_iter {
 	loff_t pos_arch_end;
 	loff_t pos_mod_end;
 	loff_t pos_ftrace_mod_end;
+	loff_t pos_bpf_end;
 	unsigned long value;
 	unsigned int nameoff; /* If iterating in core kernel symbols. */
 	char type;
@@ -497,11 +499,28 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
 
 static int get_ksymbol_bpf(struct kallsym_iter *iter)
 {
+	int ret;
+
 	strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
 	iter->exported = 0;
-	return bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
-			       &iter->value, &iter->type,
-			       iter->name) < 0 ? 0 : 1;
+	ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
+			      &iter->value, &iter->type,
+			      iter->name);
+	if (ret < 0) {
+		iter->pos_bpf_end = iter->pos;
+		return 0;
+	}
+
+	return 1;
+}
+
+static int get_ksymbol_kprobe(struct kallsym_iter *iter)
+{
+	strlcpy(iter->module_name, "kprobe", MODULE_NAME_LEN);
+	iter->exported = 0;
+	return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
+				  &iter->value, &iter->type,
+				  iter->name) < 0 ? 0 : 1;
 }
 
 /* Returns space to next name. */
@@ -528,6 +547,7 @@ static void reset_iter(struct kallsym_iter *iter, loff_t new_pos)
 		iter->pos_arch_end = 0;
 		iter->pos_mod_end = 0;
 		iter->pos_ftrace_mod_end = 0;
+		iter->pos_bpf_end = 0;
 	}
 }
 
@@ -552,7 +572,11 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
 	    get_ksymbol_ftrace_mod(iter))
 		return 1;
 
-	return get_ksymbol_bpf(iter);
+	if ((!iter->pos_bpf_end || iter->pos_bpf_end > pos) &&
+	    get_ksymbol_bpf(iter))
+		return 1;
+
+	return get_ksymbol_kprobe(iter);
 }
 
 /* Returns false if pos at or past end of file. */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..229d1b596690 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -118,6 +118,7 @@ struct kprobe_insn_cache kprobe_insn_slots = {
 	.mutex = __MUTEX_INITIALIZER(kprobe_insn_slots.mutex),
 	.alloc = alloc_insn_page,
 	.free = free_insn_page,
+	.sym = KPROBE_INSN_PAGE_SYM,
 	.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
 	.insn_size = MAX_INSN_SIZE,
 	.nr_garbage = 0,
@@ -296,6 +297,7 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
 	.mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex),
 	.alloc = alloc_insn_page,
 	.free = free_insn_page,
+	.sym = KPROBE_OPTINSN_PAGE_SYM,
 	.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
 	/* .insn_size is initialized later */
 	.nr_garbage = 0,
@@ -2179,6 +2181,49 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
 	return 0;
 }
 
+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+			     unsigned long *value, char *type, char *sym)
+{
+	struct kprobe_insn_page *kip;
+	int ret = -ERANGE;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(kip, &c->pages, list) {
+		if ((*symnum)--)
+			continue;
+		strlcpy(sym, c->sym, KSYM_NAME_LEN);
+		*type = 't';
+		*value = (unsigned long)kip->insns;
+		ret = 0;
+		break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+				   char *type, char *sym)
+{
+	return -ERANGE;
+}
+
+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+		       char *sym)
+{
+#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
+	if (!kprobe_cache_get_kallsym(&kprobe_insn_slots, &symnum, value, type, sym))
+		return 0;
+#ifdef CONFIG_OPTPROBES
+	if (!kprobe_cache_get_kallsym(&kprobe_optinsn_slots, &symnum, value, type, sym))
+		return 0;
+#endif
+#endif
+	if (!arch_kprobe_get_kallsym(&symnum, value, type, sym))
+		return 0;
+	return -ERANGE;
+}
+
 int __init __weak arch_populate_kprobe_blacklist(void)
 {
 	return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 04/13] kprobes: Add perf ksymbol events for kprobe insn pages
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (2 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/uapi/linux/perf_event.h |  5 +++++
 kernel/kprobes.c                | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index bae9e9d2d897..9b38ac04c110 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1031,6 +1031,11 @@ enum perf_event_type {
 enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_UNKNOWN	= 0,
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
+	/*
+	 * Out of line code such as kprobe-replaced instructions or optimized
+	 * kprobes.
+	 */
+	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
 };
 
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 229d1b596690..f880eb2189c0 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include <linux/ftrace.h>
 #include <linux/cpu.h>
 #include <linux/jump_label.h>
+#include <linux/perf_event.h>
 
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
@@ -184,6 +185,10 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	kip->cache = c;
 	list_add_rcu(&kip->list, &c->pages);
 	slot = kip->insns;
+
+	/* Record the perf ksymbol register event after adding the page */
+	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)kip->insns,
+			   PAGE_SIZE, false, c->sym);
 out:
 	mutex_unlock(&c->mutex);
 	return slot;
@@ -202,6 +207,13 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
 		 * next time somebody inserts a probe.
 		 */
 		if (!list_is_singular(&kip->list)) {
+			/*
+			 * Record perf ksymbol unregister event before removing
+			 * the page.
+			 */
+			perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+					   (u64)kip->insns, PAGE_SIZE, true,
+					   kip->cache->sym);
 			list_del_rcu(&kip->list);
 			synchronize_rcu();
 			kip->cache->free(kip->insns);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 05/13] perf/x86: Add perf text poke events for kprobes
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (3 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Add perf text poke events for kprobes. That includes:

 - the replaced instruction(s) which are executed out-of-line
   i.e. arch_copy_kprobe() and arch_remove_kprobe()

 - optimised kprobe function
   i.e. arch_prepare_optimized_kprobe() and
      __arch_remove_optimized_kprobe()

 - optimised kprobe
   i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/include/asm/kprobes.h       |  4 ++++
 arch/x86/include/asm/text-patching.h |  2 ++
 arch/x86/kernel/alternative.c        | 35 +++++++++++++++++-----------
 arch/x86/kernel/kprobes/core.c       |  7 ++++++
 arch/x86/kernel/kprobes/opt.c        | 18 +++++++++++++-
 5 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 95b1f053bd96..542ce120a54d 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,11 +65,15 @@ struct arch_specific_insn {
 	 */
 	bool boostable;
 	bool if_modifier;
+	/* Number of bytes of text poked */
+	int tp_len;
 };
 
 struct arch_optimized_insn {
 	/* copy of the original instructions */
 	kprobe_opcode_t copied_insn[DISP32_SIZE];
+	/* Number of bytes of text poked */
+	int tp_len;
 	/* detour code buffer */
 	kprobe_opcode_t *insn;
 	/* the size of instructions copied to detour code buffer */
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 67315fa3956a..13bb51a7789c 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -45,6 +45,8 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
+extern void __text_poke_bp(void *addr, const void *opcode, size_t len,
+			   const void *emulate, const u8 *oldptr);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate);
 
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 737e7a842f85..c8cfc97abc9e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1075,6 +1075,7 @@ static int tp_vec_nr;
  * text_poke_bp_batch() -- update instructions on live kernel on SMP
  * @tp:			vector of instructions to patch
  * @nr_entries:		number of entries in the vector
+ * @oldptr:		pointer to original old insn byte
  *
  * Modify multi-byte instruction by using int3 breakpoint on SMP.
  * We completely avoid stop_machine() here, and achieve the
@@ -1092,7 +1093,8 @@ static int tp_vec_nr;
  *		  replacing opcode
  *	- sync cores
  */
-static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
+static void text_poke_bp_batch(struct text_poke_loc *tp,
+			       unsigned int nr_entries, const u8 *oldptr)
 {
 	struct bp_patching_desc desc = {
 		.vec = tp,
@@ -1117,7 +1119,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * First step: add a int3 trap to the address that will be patched.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
+		tp[i].old = oldptr ? *oldptr : *(u8 *)text_poke_addr(&tp[i]);
 		text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
 	}
 
@@ -1274,7 +1276,7 @@ static bool tp_order_fail(void *addr)
 static void text_poke_flush(void *addr)
 {
 	if (tp_vec_nr == TP_VEC_MAX || tp_order_fail(addr)) {
-		text_poke_bp_batch(tp_vec, tp_vec_nr);
+		text_poke_bp_batch(tp_vec, tp_vec_nr, NULL);
 		tp_vec_nr = 0;
 	}
 }
@@ -1299,6 +1301,20 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
 	text_poke_loc_init(tp, addr, opcode, len, emulate);
 }
 
+void __ref __text_poke_bp(void *addr, const void *opcode, size_t len,
+			  const void *emulate, const u8 *oldptr)
+{
+	struct text_poke_loc tp;
+
+	if (unlikely(system_state == SYSTEM_BOOTING)) {
+		text_poke_early(addr, opcode, len);
+		return;
+	}
+
+	text_poke_loc_init(&tp, addr, opcode, len, emulate);
+	text_poke_bp_batch(&tp, 1, oldptr);
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:	address to patch
@@ -1310,15 +1326,8 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
  * dynamically allocated memory. This function should be used when it is
  * not possible to allocate memory.
  */
-void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate)
+void __ref text_poke_bp(void *addr, const void *opcode, size_t len,
+			const void *emulate)
 {
-	struct text_poke_loc tp;
-
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
-	text_poke_loc_init(&tp, addr, opcode, len, emulate);
-	text_poke_bp_batch(&tp, 1);
+	return __text_poke_bp(addr, opcode, len, emulate, NULL);
 }
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 579d30e91a36..12ea05d923ec 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
 #include <linux/hardirq.h>
 #include <linux/preempt.h>
 #include <linux/sched/debug.h>
+#include <linux/perf_event.h>
 #include <linux/extable.h>
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
@@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
 
+	p->ainsn.tp_len = len;
+	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
 	/* OK, write back the instruction(s) into ROX insn buffer */
 	text_poke(p->ainsn.insn, buf, len);
 
@@ -514,6 +518,9 @@ void arch_disarm_kprobe(struct kprobe *p)
 void arch_remove_kprobe(struct kprobe *p)
 {
 	if (p->ainsn.insn) {
+		/* Record the perf event before freeing the slot */
+		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+				     p->ainsn.tp_len, NULL, 0);
 		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
 		p->ainsn.insn = NULL;
 	}
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 3f45b5c43a71..0f0b84b3f4b9 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -6,6 +6,7 @@
  * Copyright (C) Hitachi Ltd., 2012
  */
 #include <linux/kprobes.h>
+#include <linux/perf_event.h>
 #include <linux/ptrace.h>
 #include <linux/string.h>
 #include <linux/slab.h>
@@ -332,6 +333,10 @@ static
 void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
 {
 	if (op->optinsn.insn) {
+		/* Record the perf event before freeing the slot */
+		if (dirty)
+			perf_event_text_poke(op->optinsn.insn, op->optinsn.insn,
+					     op->optinsn.tp_len, NULL, 0);
 		free_optinsn_slot(op->optinsn.insn, dirty);
 		op->optinsn.insn = NULL;
 		op->optinsn.size = 0;
@@ -401,6 +406,9 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
 			   (u8 *)op->kp.addr + op->optinsn.size);
 	len += JMP32_INSN_SIZE;
 
+	op->optinsn.tp_len = len;
+	perf_event_text_poke(slot, NULL, 0, buf, len);
+
 	/* We have to use text_poke() for instruction buffer because it is RO */
 	text_poke(slot, buf, len);
 	ret = 0;
@@ -439,7 +447,8 @@ void arch_optimize_kprobes(struct list_head *oplist)
 		insn_buff[0] = JMP32_INSN_OPCODE;
 		*(s32 *)(&insn_buff[1]) = rel;
 
-		text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL);
+		__text_poke_bp(op->kp.addr, insn_buff, JMP32_INSN_SIZE, NULL,
+			       &op->kp.opcode);
 
 		list_del_init(&op->list);
 	}
@@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
  */
 void arch_unoptimize_kprobe(struct optimized_kprobe *op)
 {
+	u8 old[POKE_MAX_OPCODE_SIZE];
+	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
+	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
+
+	memcpy(old, op->kp.addr, len);
 	arch_arm_kprobe(&op->kp);
 	text_poke(op->kp.addr + INT3_INSN_SIZE,
 		  op->optinsn.copied_insn, DISP32_SIZE);
+	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);
+	perf_event_text_poke(op->kp.addr, old, len, new, len);
 	text_poke_sync();
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 06/13] ftrace: Add symbols for ftrace trampolines
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (4 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/ftrace.c | 26 ++++++++++----------
 include/linux/ftrace.h   | 15 ++++++++----
 kernel/trace/ftrace.c    | 52 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 108ee96f8b66..76920ce85b9c 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -307,9 +307,9 @@ union ftrace_op_code_union {
 
 #define RET_SIZE		1
 
-static unsigned long
-create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
+static void create_trampoline(struct ftrace_ops *ops)
 {
+	unsigned int tramp_size;
 	unsigned long start_offset;
 	unsigned long end_offset;
 	unsigned long op_offset;
@@ -347,10 +347,10 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 */
 	trampoline = alloc_tramp(size + RET_SIZE + sizeof(void *));
 	if (!trampoline)
-		return 0;
+		return;
 
-	*tramp_size = size + RET_SIZE + sizeof(void *);
-	npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
+	tramp_size = size + RET_SIZE + sizeof(void *);
+	npages = DIV_ROUND_UP(tramp_size, PAGE_SIZE);
 
 	/* Copy ftrace_caller onto the trampoline memory */
 	ret = probe_kernel_read(trampoline, (void *)start_offset, size);
@@ -403,15 +403,18 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	/* ALLOC_TRAMP flags lets us know we created it */
 	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
+	ops->trampoline = (unsigned long)trampoline;
+	ops->trampoline_size = tramp_size;
+
+	ftrace_add_trampoline_to_kallsyms(ops);
 
 	set_vm_flush_reset_perms(trampoline);
 
 	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
-	return (unsigned long)trampoline;
+	return;
 fail:
 	tramp_free(trampoline);
-	return 0;
 }
 
 static unsigned long calc_trampoline_call_offset(bool save_regs)
@@ -435,14 +438,10 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 	ftrace_func_t func;
 	unsigned long offset;
 	unsigned long ip;
-	unsigned int size;
 	const char *new;
 
 	if (!ops->trampoline) {
-		ops->trampoline = create_trampoline(ops, &size);
-		if (!ops->trampoline)
-			return;
-		ops->trampoline_size = size;
+		create_trampoline(ops);
 		return;
 	}
 
@@ -535,6 +534,9 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
 	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
 		return;
 
+	ftrace_remove_trampoline_from_kallsyms(ops);
+	synchronize_rcu();
+
 	tramp_free((void *)ops->trampoline);
 	ops->trampoline = 0;
 }
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index db95244a62d4..70dea4785443 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -58,9 +58,6 @@ struct ftrace_direct_func;
 const char *
 ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
 		   unsigned long *off, char **modname, char *sym);
-int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
-			   char *type, char *name,
-			   char *module_name, int *exported);
 #else
 static inline const char *
 ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
@@ -68,6 +65,13 @@ ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
 {
 	return NULL;
 }
+#endif
+
+#if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_DYNAMIC_FTRACE)
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+			   char *type, char *name,
+			   char *module_name, int *exported);
+#else
 static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 					 char *type, char *name,
 					 char *module_name, int *exported)
@@ -76,7 +80,6 @@ static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *val
 }
 #endif
 
-
 #ifdef CONFIG_FUNCTION_TRACER
 
 extern int ftrace_enabled;
@@ -179,6 +182,9 @@ struct ftrace_ops_hash {
 
 void ftrace_free_init_mem(void);
 void ftrace_free_mem(struct module *mod, void *start, void *end);
+#define FTRACE_TRAMPOLINE_SYM "ftrace_trampoline"
+void ftrace_add_trampoline_to_kallsyms(struct ftrace_ops *ops);
+void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops);
 #else
 static inline void ftrace_free_init_mem(void) { }
 static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { }
@@ -207,6 +213,7 @@ struct ftrace_ops {
 	struct ftrace_ops_hash		old_hash;
 	unsigned long			trampoline;
 	unsigned long			trampoline_size;
+	struct list_head		list;
 #endif
 };
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9bf1f2cd515e..697ab4759f0c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6174,6 +6174,41 @@ struct ftrace_mod_map {
 	unsigned int		num_funcs;
 };
 
+/* List of trace_ops that have allocated trampolines */
+static LIST_HEAD(ftrace_ops_trampoline_list);
+
+void ftrace_add_trampoline_to_kallsyms(struct ftrace_ops *ops)
+{
+	lockdep_assert_held(&ftrace_lock);
+	list_add_rcu(&ops->list, &ftrace_ops_trampoline_list);
+}
+
+void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops)
+{
+	lockdep_assert_held(&ftrace_lock);
+	list_del_rcu(&ops->list);
+}
+
+int ftrace_get_trampoline_kallsym(unsigned int symnum, unsigned long *value,
+				  char *type, char *name, char *module_name,
+				  int *exported)
+{
+	struct ftrace_ops *op;
+
+	list_for_each_entry_rcu(op, &ftrace_ops_trampoline_list, list) {
+		if (!op->trampoline || symnum--)
+			continue;
+		*value = op->trampoline;
+		*type = 't';
+		strlcpy(name, FTRACE_TRAMPOLINE_SYM, KSYM_NAME_LEN);
+		strlcpy(module_name, "ftrace", MODULE_NAME_LEN);
+		*exported = 0;
+		return 0;
+	}
+
+	return -ERANGE;
+}
+
 #ifdef CONFIG_MODULES
 
 #define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)
@@ -6510,6 +6545,7 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 {
 	struct ftrace_mod_map *mod_map;
 	struct ftrace_mod_func *mod_func;
+	int ret;
 
 	preempt_disable();
 	list_for_each_entry_rcu(mod_map, &ftrace_mod_maps, list) {
@@ -6536,8 +6572,10 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
 		WARN_ON(1);
 		break;
 	}
+	ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+					    module_name, exported);
 	preempt_enable();
-	return -ERANGE;
+	return ret;
 }
 
 #else
@@ -6549,6 +6587,18 @@ allocate_ftrace_mod_map(struct module *mod,
 {
 	return NULL;
 }
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+			   char *type, char *name, char *module_name,
+			   int *exported)
+{
+	int ret;
+
+	preempt_disable();
+	ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+					    module_name, exported);
+	preempt_enable();
+	return ret;
+}
 #endif /* CONFIG_MODULES */
 
 struct ftrace_init_func {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 07/13] ftrace: Add perf ksymbol events for ftrace trampolines
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (5 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 08/13] ftrace: Add perf text poke " Adrian Hunter
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/ftrace.c        | 6 ++++++
 include/uapi/linux/perf_event.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 76920ce85b9c..34e1dd495ce4 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -406,7 +406,10 @@ static void create_trampoline(struct ftrace_ops *ops)
 	ops->trampoline = (unsigned long)trampoline;
 	ops->trampoline_size = tramp_size;
 
+	/* Add to kallsyms before the perf events */
 	ftrace_add_trampoline_to_kallsyms(ops);
+	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)trampoline,
+			   tramp_size, false, FTRACE_TRAMPOLINE_SYM);
 
 	set_vm_flush_reset_perms(trampoline);
 
@@ -534,6 +537,9 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
 	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
 		return;
 
+	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)ops->trampoline,
+			   ops->trampoline_size, true, FTRACE_TRAMPOLINE_SYM);
+	/* Remove from kallsyms after the perf events */
 	ftrace_remove_trampoline_from_kallsyms(ops);
 	synchronize_rcu();
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9b38ac04c110..f80ce2e9f8b9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1033,7 +1033,7 @@ enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
 	/*
 	 * Out of line code such as kprobe-replaced instructions or optimized
-	 * kprobes.
+	 * kprobes or ftrace trampolines.
 	 */
 	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 08/13] ftrace: Add perf text poke events for ftrace trampolines
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (6 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Add perf text poke events for ftrace trampolines when created and when
freed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/kernel/ftrace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 34e1dd495ce4..cec2ffae9e60 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -410,6 +410,8 @@ static void create_trampoline(struct ftrace_ops *ops)
 	ftrace_add_trampoline_to_kallsyms(ops);
 	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)trampoline,
 			   tramp_size, false, FTRACE_TRAMPOLINE_SYM);
+	/* Record the perf text poke event after the ksymbol register event */
+	perf_event_text_poke(trampoline, NULL, 0, trampoline, tramp_size);
 
 	set_vm_flush_reset_perms(trampoline);
 
@@ -537,6 +539,9 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
 	if (!ops || !(ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
 		return;
 
+	/* Record the text poke event before the ksymbol unregister event */
+	perf_event_text_poke((void *)ops->trampoline, (void *)ops->trampoline,
+			     ops->trampoline_size, NULL, 0);
 	perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)ops->trampoline,
 			   ops->trampoline_size, true, FTRACE_TRAMPOLINE_SYM);
 	/* Remove from kallsyms after the perf events */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 09/13] perf kcore_copy: Fix module map when there are no modules loaded
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (7 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 08/13] ftrace: Add perf text poke " Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

In the absence of any modules, no "modules" map is created, but there are
other executable pages to map, due to eBPF JIT, kprobe or ftrace. Map them
by recognizing that the first "module" symbol is not necessarily from a
module, and adjust the map accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/symbol-elf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 1965aefccb02..3df77b56e28a 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1452,6 +1452,7 @@ struct kcore_copy_info {
 	u64 first_symbol;
 	u64 last_symbol;
 	u64 first_module;
+	u64 first_module_symbol;
 	u64 last_module_symbol;
 	size_t phnum;
 	struct list_head phdrs;
@@ -1528,6 +1529,8 @@ static int kcore_copy__process_kallsyms(void *arg, const char *name, char type,
 		return 0;
 
 	if (strchr(name, '[')) {
+		if (!kci->first_module_symbol || start < kci->first_module_symbol)
+			kci->first_module_symbol = start;
 		if (start > kci->last_module_symbol)
 			kci->last_module_symbol = start;
 		return 0;
@@ -1725,6 +1728,10 @@ static int kcore_copy__calc_maps(struct kcore_copy_info *kci, const char *dir,
 		kci->etext += page_size;
 	}
 
+	if (kci->first_module_symbol &&
+	    (!kci->first_module || kci->first_module_symbol < kci->first_module))
+		kci->first_module = kci->first_module_symbol;
+
 	kci->first_module = round_down(kci->first_module, page_size);
 
 	if (kci->last_module_symbol) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 10/13] perf evlist: Disable 'immediate' events last
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (8 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Events marked as 'immediate' are started before other events to ensure
that there is context at the start of the main tracing events. The same
is true at the end of tracing, so disable 'immediate' events after other
events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1548237b6558..06c4586065cc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -379,22 +379,33 @@ void evlist__disable(struct evlist *evlist)
 {
 	struct evsel *pos;
 	struct affinity affinity;
-	int cpu, i;
+	int cpu, i, imm = 0;
+	bool has_imm = false;
 
 	if (affinity__setup(&affinity) < 0)
 		return;
 
-	evlist__for_each_cpu(evlist, i, cpu) {
-		affinity__set(&affinity, cpu);
-
-		evlist__for_each_entry(evlist, pos) {
-			if (evsel__cpu_iter_skip(pos, cpu))
-				continue;
-			if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
-				continue;
-			evsel__disable_cpu(pos, pos->cpu_iter - 1);
+	/* Disable 'immediate' events last */
+	for (imm = 0; imm <= 1; imm++) {
+		evlist__for_each_cpu(evlist, i, cpu) {
+			affinity__set(&affinity, cpu);
+
+			evlist__for_each_entry(evlist, pos) {
+				if (evsel__cpu_iter_skip(pos, cpu))
+					continue;
+				if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+					continue;
+				if (pos->immediate)
+					has_imm = true;
+				if (pos->immediate != imm)
+					continue;
+				evsel__disable_cpu(pos, pos->cpu_iter - 1);
+			}
 		}
+		if (!has_imm)
+			break;
 	}
+
 	affinity__cleanup(&affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (9 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event is
processed, then the kernel dso data cache is updated with the poked bytes.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/include/uapi/linux/perf_event.h     | 21 ++++++++++-
 tools/lib/perf/include/perf/event.h       |  9 +++++
 tools/perf/builtin-record.c               | 45 +++++++++++++++++++++++
 tools/perf/util/event.c                   | 40 ++++++++++++++++++++
 tools/perf/util/event.h                   |  5 +++
 tools/perf/util/evlist.h                  |  1 +
 tools/perf/util/evsel.c                   |  7 +++-
 tools/perf/util/machine.c                 | 43 ++++++++++++++++++++++
 tools/perf/util/machine.h                 |  3 ++
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 tools/perf/util/record.c                  | 10 +++++
 tools/perf/util/record.h                  |  1 +
 tools/perf/util/session.c                 | 23 ++++++++++++
 tools/perf/util/tool.h                    |  3 +-
 14 files changed, 209 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794d3105..bae9e9d2d897 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				text_poke      :  1, /* include text poke events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,24 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * Records changes to kernel text i.e. self-modified code. 'old_len' is
+	 * the number of old bytes, 'new_len' is the number of new bytes. Either
+	 * 'old_len' or 'new_len' may be zero to indicate, for example, the
+	 * addition or removal of a trampoline. 'bytes' contains the old bytes
+	 * followed immediately by the new bytes.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				addr;
+	 *	u16				old_len;
+	 *	u16				new_len;
+	 *	u8				bytes[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_TEXT_POKE			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 18106899cb4e..e3bc9afff150 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,14 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_text_poke_event {
+	struct perf_event_header header;
+	__u64			addr;
+	__u16			old_len;
+	__u16			new_len;
+	__u8			bytes[];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -360,6 +368,7 @@ union perf_event {
 	struct perf_record_sample		sample;
 	struct perf_record_bpf_event		bpf;
 	struct perf_record_ksymbol		ksymbol;
+	struct perf_record_text_poke_event	text_poke;
 	struct perf_record_header_attr		attr;
 	struct perf_record_event_update		event_update;
 	struct perf_record_header_event_type	event_type;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..538e3622547e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -723,6 +723,43 @@ static int record__auxtrace_init(struct record *rec __maybe_unused)
 
 #endif
 
+static int record__config_text_poke(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	int err;
+
+	/* Nothing to do if text poke is already configured */
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.text_poke)
+			return 0;
+	}
+
+	err = parse_events(evlist, "dummy:u", NULL);
+	if (err)
+		return err;
+
+	evsel = evlist__last(evlist);
+
+	evsel->core.attr.freq = 0;
+	evsel->core.attr.sample_period = 1;
+	evsel->core.attr.text_poke = 1;
+	evsel->core.attr.ksymbol = 1;
+
+	evsel->core.system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	/* Text poke must be collected on all CPUs */
+	perf_cpu_map__put(evsel->core.own_cpus);
+	evsel->core.own_cpus = perf_cpu_map__new(NULL);
+	perf_cpu_map__put(evsel->core.cpus);
+	evsel->core.cpus = perf_cpu_map__get(evsel->core.own_cpus);
+
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	return 0;
+}
+
 static bool record__kcore_readable(struct machine *machine)
 {
 	char kcore[PATH_MAX];
@@ -2610,6 +2647,14 @@ int cmd_record(int argc, const char **argv)
 	if (rec->opts.full_auxtrace)
 		rec->buildid_all = true;
 
+	if (rec->opts.text_poke) {
+		err = record__config_text_poke(rec->evlist);
+		if (err) {
+			pr_err("record__config_text_poke failed, error %d\n", err);
+			goto out;
+		}
+	}
+
 	if (record_opts__config(&rec->opts)) {
 		err = -EINVAL;
 		goto out;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff516a2..5a5baa4edde5 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@
 #include "stat.h"
 #include "session.h"
 #include "bpf-event.h"
+#include "print_binary.h"
 #include "tool.h"
 #include "../perf.h"
 
@@ -54,6 +55,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_TEXT_POKE]			= "TEXT_POKE",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -252,6 +254,14 @@ int perf_event__process_bpf(struct perf_tool *tool __maybe_unused,
 	return machine__process_bpf(machine, event, sample);
 }
 
+int perf_event__process_text_poke(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine)
+{
+	return machine__process_text_poke(machine, event, sample);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64 "]: %c %s\n",
@@ -398,6 +408,33 @@ size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp)
 		       event->bpf.type, event->bpf.flags, event->bpf.id);
 }
 
+static int text_poke_printer(enum binary_printer_ops op, unsigned int val,
+			     void *extra __maybe_unused, FILE *fp)
+{
+	if (op == BINARY_PRINT_NUM_DATA)
+		return fprintf(fp, " %02x", val);
+	if (op == BINARY_PRINT_LINE_END)
+		return fprintf(fp, "\n");
+	return 0;
+}
+
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp)
+{
+	size_t ret = fprintf(fp, " addr %#" PRI_lx64 " old len %u new len %u\n",
+			     event->text_poke.addr,
+			     event->text_poke.old_len,
+			     event->text_poke.new_len);
+
+	ret += fprintf(fp, "     old bytes:");
+	ret += binary__fprintf(event->text_poke.bytes, event->text_poke.old_len,
+			       16, text_poke_printer, NULL, fp);
+	ret += fprintf(fp, "     new bytes:");
+	ret += binary__fprintf(event->text_poke.bytes + event->text_poke.old_len,
+			       event->text_poke.new_len, 16, text_poke_printer,
+			       NULL, fp);
+	return ret;
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -439,6 +476,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_BPF_EVENT:
 		ret += perf_event__fprintf_bpf(event, fp);
 		break;
+	case PERF_RECORD_TEXT_POKE:
+		ret += perf_event__fprintf_text_poke(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 85223159737c..cbe98e22527f 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -345,6 +345,10 @@ int perf_event__process_bpf(struct perf_tool *tool,
 			    union perf_event *event,
 			    struct perf_sample *sample,
 			    struct machine *machine);
+int perf_event__process_text_poke(struct perf_tool *tool,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine);
 int perf_event__process(struct perf_tool *tool,
 			union perf_event *event,
 			struct perf_sample *sample,
@@ -378,6 +382,7 @@ size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f5bd5c386df1..93e9b35e53b0 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -175,6 +175,7 @@ struct callchain_param;
 void perf_evlist__set_id_pos(struct evlist *evlist);
 bool perf_can_sample_identifier(void);
 bool perf_can_record_switch_events(void);
+bool perf_can_record_text_poke_events(void);
 bool perf_can_record_cpu_wide(void);
 bool perf_can_aux_sample(void);
 void perf_evlist__config(struct evlist *evlist, struct record_opts *opts,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c8dc4450884c..acaa82a0787d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1096,7 +1096,12 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	attr->mmap  = track;
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
-	attr->ksymbol = track && !perf_missing_features.ksymbol;
+	/*
+	 * ksymbol is tracked separately with text poke because it needs to be
+	 * system wide and enabled immediately.
+	 */
+	if (!opts->text_poke)
+		attr->ksymbol = track && !perf_missing_features.ksymbol;
 	attr->bpf_event = track && !opts->no_bpf_event && !perf_missing_features.bpf;
 
 	if (opts->record_namespaces)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8c5410315e8..c2825aaf3927 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -770,6 +770,47 @@ int machine__process_ksymbol(struct machine *machine __maybe_unused,
 	return machine__process_ksymbol_register(machine, event, sample);
 }
 
+int machine__process_text_poke(struct machine *machine, union perf_event *event,
+			       struct perf_sample *sample __maybe_unused)
+{
+	struct map *map = maps__find(&machine->kmaps, event->text_poke.addr);
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+	if (dump_trace)
+		perf_event__fprintf_text_poke(event, stdout);
+
+	if (!event->text_poke.new_len)
+		return 0;
+
+	if (cpumode != PERF_RECORD_MISC_KERNEL) {
+		pr_debug("%s: unsupported cpumode - ignoring\n", __func__);
+		return 0;
+	}
+
+	if (map && map->dso) {
+		u8 *new_bytes = event->text_poke.bytes + event->text_poke.old_len;
+		int ret;
+
+		/*
+		 * Kernel maps might be changed when loading symbols so loading
+		 * must be done prior to using kernel maps.
+		 */
+		map__load(map);
+		ret = dso__data_write_cache_addr(map->dso, map, machine,
+						 event->text_poke.addr,
+						 new_bytes,
+						 event->text_poke.new_len);
+		if (ret != event->text_poke.new_len)
+			pr_debug("Failed to write kernel text poke at %#" PRI_lx64 "\n",
+				 event->text_poke.addr);
+	} else {
+		pr_debug("Failed to find kernel text poke address map for %#" PRI_lx64 "\n",
+			 event->text_poke.addr);
+	}
+
+	return 0;
+}
+
 static struct map *machine__addnew_module_map(struct machine *machine, u64 start,
 					      const char *filename)
 {
@@ -1901,6 +1942,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_ksymbol(machine, event, sample); break;
 	case PERF_RECORD_BPF_EVENT:
 		ret = machine__process_bpf(machine, event, sample); break;
+	case PERF_RECORD_TEXT_POKE:
+		ret = machine__process_text_poke(machine, event, sample); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930eca89..0bb086acb7ed 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -135,6 +135,9 @@ int machine__process_mmap2_event(struct machine *machine, union perf_event *even
 int machine__process_ksymbol(struct machine *machine,
 			     union perf_event *event,
 			     struct perf_sample *sample);
+int machine__process_text_poke(struct machine *machine,
+			       union perf_event *event,
+			       struct perf_sample *sample);
 int machine__process_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 651203126c71..4c6c700335a6 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -144,6 +144,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(aux_watermark, p_unsigned);
 	PRINT_ATTRf(sample_max_stack, p_unsigned);
 	PRINT_ATTRf(aux_sample_size, p_unsigned);
+	PRINT_ATTRf(text_poke, p_unsigned);
 
 	return ret;
 }
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 7def66168503..207ba2a65008 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -97,6 +97,11 @@ static void perf_probe_context_switch(struct evsel *evsel)
 	evsel->core.attr.context_switch = 1;
 }
 
+static void perf_probe_text_poke(struct evsel *evsel)
+{
+	evsel->core.attr.text_poke = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -112,6 +117,11 @@ bool perf_can_record_switch_events(void)
 	return perf_probe_api(perf_probe_context_switch);
 }
 
+bool perf_can_record_text_poke_events(void)
+{
+	return perf_probe_api(perf_probe_text_poke);
+}
+
 bool perf_can_record_cpu_wide(void)
 {
 	struct perf_event_attr attr = {
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2ad383..127b4ca11fd3 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -46,6 +46,7 @@ struct record_opts {
 	bool	      sample_id;
 	bool	      no_bpf_event;
 	bool	      kcore;
+	bool	      text_poke;
 	unsigned int  freq;
 	unsigned int  mmap_pages;
 	unsigned int  auxtrace_mmap_pages;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25b23e3..20bfdb08264f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -489,6 +489,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->ksymbol = perf_event__process_ksymbol;
 	if (tool->bpf == NULL)
 		tool->bpf = perf_event__process_bpf;
+	if (tool->text_poke == NULL)
+		tool->text_poke = perf_event__process_text_poke;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -658,6 +660,24 @@ static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
 		swap_sample_id_all(event, &event->context_switch + 1);
 }
 
+static void perf_event__text_poke_swap(union perf_event *event, bool sample_id_all)
+{
+	event->text_poke.addr    = bswap_64(event->text_poke.addr);
+	event->text_poke.old_len = bswap_16(event->text_poke.old_len);
+	event->text_poke.new_len = bswap_16(event->text_poke.new_len);
+
+	if (sample_id_all) {
+		size_t len = sizeof(event->text_poke.old_len) +
+			     sizeof(event->text_poke.new_len) +
+			     event->text_poke.old_len +
+			     event->text_poke.new_len;
+		void *data = &event->text_poke.old_len;
+
+		data += PERF_ALIGN(len, sizeof(u64));
+		swap_sample_id_all(event, data);
+	}
+}
+
 static void perf_event__throttle_swap(union perf_event *event,
 				      bool sample_id_all)
 {
@@ -931,6 +951,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_SWITCH]		  = perf_event__switch_swap,
 	[PERF_RECORD_SWITCH_CPU_WIDE]	  = perf_event__switch_swap,
 	[PERF_RECORD_NAMESPACES]	  = perf_event__namespaces_swap,
+	[PERF_RECORD_TEXT_POKE]		  = perf_event__text_poke_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1470,6 +1491,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->ksymbol(tool, event, sample, machine);
 	case PERF_RECORD_BPF_EVENT:
 		return tool->bpf(tool, event, sample, machine);
+	case PERF_RECORD_TEXT_POKE:
+		return tool->text_poke(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8de..006182923bbe 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -56,7 +56,8 @@ struct perf_tool {
 			throttle,
 			unthrottle,
 			ksymbol,
-			bpf;
+			bpf,
+			text_poke;
 
 	event_attr_op	attr;
 	event_attr_op	event_update;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (10 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-12 12:49 ` [PATCH V2 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
  2020-02-24 13:50 ` [PATCH V2 00/13] perf/x86: Add perf " Adrian Hunter
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map backed
only by memory, which will populated as necessary by text poke events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/include/uapi/linux/perf_event.h | 5 +++++
 tools/perf/util/dso.c                 | 3 +++
 tools/perf/util/dso.h                 | 1 +
 tools/perf/util/machine.c             | 6 ++++++
 tools/perf/util/map.c                 | 5 +++++
 tools/perf/util/map.h                 | 3 ++-
 tools/perf/util/symbol.c              | 1 +
 7 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index bae9e9d2d897..f80ce2e9f8b9 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1031,6 +1031,11 @@ enum perf_event_type {
 enum perf_record_ksymbol_type {
 	PERF_RECORD_KSYMBOL_TYPE_UNKNOWN	= 0,
 	PERF_RECORD_KSYMBOL_TYPE_BPF		= 1,
+	/*
+	 * Out of line code such as kprobe-replaced instructions or optimized
+	 * kprobes or ftrace trampolines.
+	 */
+	PERF_RECORD_KSYMBOL_TYPE_OOL		= 2,
 	PERF_RECORD_KSYMBOL_TYPE_MAX		/* non-ABI */
 };
 
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 91f21239608b..9eb50ff6da96 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -191,6 +191,7 @@ int dso__read_binary_type_filename(const struct dso *dso,
 	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
 	case DSO_BINARY_TYPE__JAVA_JIT:
 	case DSO_BINARY_TYPE__BPF_PROG_INFO:
+	case DSO_BINARY_TYPE__OOL:
 	case DSO_BINARY_TYPE__NOT_FOUND:
 		ret = -1;
 		break;
@@ -881,6 +882,8 @@ static struct dso_cache *dso_cache__populate(struct dso *dso,
 
 	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
 		*ret = bpf_read(dso, cache_offset, cache->data);
+	else if (dso->binary_type == DSO_BINARY_TYPE__OOL)
+		*ret = DSO__DATA_CACHE_SIZE;
 	else
 		*ret = file_read(dso, machine, cache_offset, cache->data);
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 2db64b79617a..b482453bc3d1 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -40,6 +40,7 @@ enum dso_binary_type {
 	DSO_BINARY_TYPE__GUEST_KCORE,
 	DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO,
 	DSO_BINARY_TYPE__BPF_PROG_INFO,
+	DSO_BINARY_TYPE__OOL,
 	DSO_BINARY_TYPE__NOT_FOUND,
 };
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c2825aaf3927..c7bb1aa81f3e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -730,6 +730,12 @@ static int machine__process_ksymbol_register(struct machine *machine,
 		if (!map)
 			return -ENOMEM;
 
+		if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
+			map->dso->binary_type = DSO_BINARY_TYPE__OOL;
+			map->dso->data.file_size = event->ksymbol.len;
+			dso__set_loaded(map->dso);
+		}
+
 		map->start = event->ksymbol.addr;
 		map->end = map->start + event->ksymbol.len;
 		maps__insert(&machine->kmaps, map);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f67960bedebb..4ec8cfd3967e 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -267,6 +267,11 @@ bool __map__is_bpf_prog(const struct map *map)
 	return name && (strstr(name, "bpf_prog_") == name);
 }
 
+bool __map__is_ool(const struct map *map)
+{
+	return map->dso && map->dso->binary_type == DSO_BINARY_TYPE__OOL;
+}
+
 bool map__has_symbols(const struct map *map)
 {
 	return dso__has_symbols(map->dso);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 067036e8970c..9e312ae2d656 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -147,11 +147,12 @@ int map__set_kallsyms_ref_reloc_sym(struct map *map, const char *symbol_name,
 bool __map__is_kernel(const struct map *map);
 bool __map__is_extra_kernel_map(const struct map *map);
 bool __map__is_bpf_prog(const struct map *map);
+bool __map__is_ool(const struct map *map);
 
 static inline bool __map__is_kmodule(const struct map *map)
 {
 	return !__map__is_kernel(map) && !__map__is_extra_kernel_map(map) &&
-	       !__map__is_bpf_prog(map);
+	       !__map__is_bpf_prog(map) && !__map__is_ool(map);
 }
 
 bool map__has_symbols(const struct map *map);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3b379b1296f1..deb7f9bee3af 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1537,6 +1537,7 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod,
 		return true;
 
 	case DSO_BINARY_TYPE__BPF_PROG_INFO:
+	case DSO_BINARY_TYPE__OOL:
 	case DSO_BINARY_TYPE__NOT_FOUND:
 	default:
 		return false;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 13/13] perf intel-pt: Add support for text poke events
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (11 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
@ 2020-02-12 12:49 ` Adrian Hunter
  2020-02-24 13:50 ` [PATCH V2 00/13] perf/x86: Add perf " Adrian Hunter
  13 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-12 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Select text poke events when available and the kernel is being traced.
Process text poke events to invalidate entries in Intel PT's instruction
cache.

Example:

  The example requires kernel config:
    CONFIG_PROC_SYSCTL=y
    CONFIG_SCHED_DEBUG=y
    CONFIG_SCHEDSTATS=y

  Before:

    # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
    # cat /proc/sys/kernel/sched_schedstats
    0
    # echo 1 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    1
    # echo 0 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    0
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 3.341 MB perf.data.before ]
    [1]+  Terminated                 perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
    # perf script -i perf.data.before --itrace=e >/dev/null
    Warning:
    474 instruction trace errors

  After:

    # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
    # cat /proc/sys/kernel/sched_schedstats
    0
    # echo 1 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    1
    # echo 0 > /proc/sys/kernel/sched_schedstats
    # cat /proc/sys/kernel/sched_schedstats
    0
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 2.646 MB perf.data.after ]
    [1]+  Terminated                 perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
    # perf script -i perf.data.after --itrace=e >/dev/null

Example:

  The example requires kernel config:
    # CONFIG_FUNCTION_TRACER is not set

  Before:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __schedule
    Added new event:
      probe:__schedule     (on __schedule)

    You can now use it in all perf tools, such as:

            perf record -e probe:__schedule -aR sleep 1

    # perf record -e probe:__schedule -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
    # perf probe -d probe:__schedule
    Removed event: probe:__schedule
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 41.268 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    Warning:
    207 instruction trace errors

  After:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __schedule
    Added new event:
      probe:__schedule     (on __schedule)

    You can now use it in all perf tools, such as:

        perf record -e probe:__schedule -aR sleep 1

    # perf record -e probe:__schedule -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
    # perf probe -d probe:__schedule
    Removed event: probe:__schedule
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 39.978 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
    6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
    6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
    6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
    6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
    6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
    7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
    6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
    6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0

Example:

  The example requires kernel config:
    CONFIG_FUNCTION_TRACER=y

  Before:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __kmalloc
    Added new event:
      probe:__kmalloc      (on __kmalloc)

    You can now use it in all perf tools, such as:

        perf record -e probe:__kmalloc -aR sleep 1

    # perf record -e probe:__kmalloc -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
    # perf probe -d probe:__kmalloc
    Removed event: probe:__kmalloc
    # kill %1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 43.850 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    Warning:
    8 instruction trace errors

  After:
    # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
    # perf probe __kmalloc
    Added new event:
      probe:__kmalloc      (on __kmalloc)

    You can now use it in all perf tools, such as:

            perf record -e probe:__kmalloc -aR sleep 1

    # perf record -e probe:__kmalloc -aR sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
    # perf probe -d probe:__kmalloc
    Removed event: probe:__kmalloc
    # kill %1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 41.442 MB t1 ]
    [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
    # perf script -i t1 --itrace=e >/dev/null
    # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
    5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
    5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
    5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
    5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
    7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
    7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
    7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c |  4 ++
 tools/perf/util/intel-pt.c          | 75 +++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 20df442fdf36..ec75445ef7cf 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -827,6 +827,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 		}
 	}
 
+	if (have_timing_info && !intel_pt_evsel->core.attr.exclude_kernel &&
+	    perf_can_record_text_poke_events() && perf_can_record_cpu_wide())
+		opts->text_poke = true;
+
 	if (intel_pt_evsel) {
 		/*
 		 * To obtain the auxtrace buffer file descriptor, the auxtrace
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 33cf8928cf05..64768312c994 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -514,6 +514,17 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
 	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
 }
 
+static void intel_pt_cache_invalidate(struct dso *dso, struct machine *machine,
+				      u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return;
+
+	auxtrace_cache__remove(dso->auxtrace_cache, offset);
+}
+
 static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
 {
 	return ip >= pt->kernel_start ?
@@ -2592,6 +2603,67 @@ static int intel_pt_process_itrace_start(struct intel_pt *pt,
 					event->itrace_start.tid);
 }
 
+static int intel_pt_find_map(struct thread *thread, u8 cpumode, u64 addr,
+			     struct addr_location *al)
+{
+	if (!al->map || addr < al->map->start || addr >= al->map->end) {
+		if (!thread__find_map(thread, cpumode, addr, al))
+			return -1;
+	}
+
+	return 0;
+}
+
+/* Invalidate all instruction cache entries that overlap the text poke */
+static int intel_pt_text_poke(struct intel_pt *pt, union perf_event *event)
+{
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	u64 addr = event->text_poke.addr + event->text_poke.new_len - 1;
+	/* Assume text poke begins in a basic block no more than 4096 bytes */
+	int cnt = 4096 + event->text_poke.new_len;
+	struct thread *thread = pt->unknown_thread;
+	struct addr_location al = { .map = NULL };
+	struct machine *machine = pt->machine;
+	struct intel_pt_cache_entry *e;
+	u64 offset;
+
+	if (!event->text_poke.new_len)
+		return 0;
+
+	for (; cnt; cnt--, addr--) {
+		if (intel_pt_find_map(thread, cpumode, addr, &al)) {
+			if (addr < event->text_poke.addr)
+				return 0;
+			continue;
+		}
+
+		if (!al.map->dso || !al.map->dso->auxtrace_cache)
+			continue;
+
+		offset = al.map->map_ip(al.map, addr);
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+		if (!e)
+			continue;
+
+		if (addr + e->byte_cnt + e->length <= event->text_poke.addr) {
+			/*
+			 * No overlap. Working backwards there cannot be another
+			 * basic block that overlaps the text poke if there is a
+			 * branch instruction before the text poke address.
+			 */
+			if (e->branch != INTEL_PT_BR_NO_BRANCH)
+				return 0;
+		} else {
+			intel_pt_cache_invalidate(al.map->dso, machine, offset);
+			intel_pt_log("Invalidated instruction cache for %s at %#"PRIx64"\n",
+				     al.map->dso->long_name, addr);
+		}
+	}
+
+	return 0;
+}
+
 static int intel_pt_process_event(struct perf_session *session,
 				  union perf_event *event,
 				  struct perf_sample *sample,
@@ -2653,6 +2725,9 @@ static int intel_pt_process_event(struct perf_session *session,
 		 event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		err = intel_pt_context_switch(pt, event, sample);
 
+	if (!err && event->header.type == PERF_RECORD_TEXT_POKE)
+		err = intel_pt_text_poke(pt, event);
+
 	intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
 		     event->header.type, sample->cpu, sample->time, timestamp);
 	intel_pt_log_event(event);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 00/13] perf/x86: Add perf text poke events
  2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
                   ` (12 preceding siblings ...)
  2020-02-12 12:49 ` [PATCH V2 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
@ 2020-02-24 13:50 ` Adrian Hunter
  2020-02-24 14:03   ` Jiri Olsa
  13 siblings, 1 reply; 20+ messages in thread
From: Adrian Hunter @ 2020-02-24 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, H . Peter Anvin, x86, Mark Rutland,
	Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On 12/02/20 2:49 pm, Adrian Hunter wrote:
> Hi
> 
> Here are patches to add a text poke event to record changes to kernel text
> (i.e. self-modifying code) in order to support tracers like Intel PT
> decoding through jump labels, kprobes and ftrace trampolines.
> 
> The first 8 patches make the kernel changes and the subsequent patches are
> tools changes.
> 
> The next 4 patches add support for updating perf tools' data cache
> with the changed bytes.
> 
> The last patch is an Intel PT specific tools change.
> 
> 
> Changes in V2:
> 
>   perf: Add perf text poke event
> 
> 	Separate out x86 changes
> 	The text poke event now has old len and new len
> 	Revised commit message
> 
>   perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
> 
> 	New patch containing x86 changes from original first patch

Any comments?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 00/13] perf/x86: Add perf text poke events
  2020-02-24 13:50 ` [PATCH V2 00/13] perf/x86: Add perf " Adrian Hunter
@ 2020-02-24 14:03   ` Jiri Olsa
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Olsa @ 2020-02-24 14:03 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, linux-kernel

On Mon, Feb 24, 2020 at 03:50:13PM +0200, Adrian Hunter wrote:
> On 12/02/20 2:49 pm, Adrian Hunter wrote:
> > Hi
> > 
> > Here are patches to add a text poke event to record changes to kernel text
> > (i.e. self-modifying code) in order to support tracers like Intel PT
> > decoding through jump labels, kprobes and ftrace trampolines.
> > 
> > The first 8 patches make the kernel changes and the subsequent patches are
> > tools changes.
> > 
> > The next 4 patches add support for updating perf tools' data cache
> > with the changed bytes.
> > 
> > The last patch is an Intel PT specific tools change.
> > 
> > 
> > Changes in V2:
> > 
> >   perf: Add perf text poke event
> > 
> > 	Separate out x86 changes
> > 	The text poke event now has old len and new len
> > 	Revised commit message
> > 
> >   perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
> > 
> > 	New patch containing x86 changes from original first patch
> 
> Any comments?
> 

hi,
I plan to look on it hopefuly today or tomorrow

jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages
  2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
@ 2020-02-26 15:21   ` Jiri Olsa
  2020-02-28 14:02     ` Adrian Hunter
  2020-02-26 15:22   ` Jiri Olsa
  1 sibling, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2020-02-26 15:21 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, linux-kernel

On Wed, Feb 12, 2020 at 02:49:39PM +0200, Adrian Hunter wrote:
> Symbols are needed for tools to describe instruction addresses. Pages
> allocated for kprobe's purposes need symbols to be created for them.
> Add such symbols to be visible via /proc/kallsyms.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

SNIP

> @@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
>  {									\
>  	return __is_insn_slot_addr(&kprobe_##__name##_slots, addr);	\
>  }
> +#define KPROBE_INSN_PAGE_SYM		"kprobe_insn_page"
> +#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
>  #else /* __ARCH_WANT_KPROBES_INSN_SLOT */
>  #define DEFINE_INSN_CACHE_OPS(__name)					\
>  static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
> @@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
>  void *alloc_insn_page(void);
>  void free_insn_page(void *page);
>  
> +int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> +		       char *sym);
> +int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
> +			     unsigned long *value, char *type, char *sym);
> +
> +int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> +			    char *type, char *sym);
>  #else /* !CONFIG_KPROBES: */
>  
>  static inline int kprobes_built_in(void)
> @@ -435,6 +445,24 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
>  {
>  	return true;
>  }
> +static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
> +				     char *type, char *sym)
> +{
> +	return 0;
> +}
> +static inline int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c,
> +					   unsigned int *symnum,
> +					   unsigned long *value, char *type,
> +					   char *sym)
> +{
> +	return 0;
> +}
> +static inline int arch_kprobe_get_kallsym(unsigned int *symnum,
> +					  unsigned long *value, char *type,
> +					  char *sym)
> +{
> +	return 0;
> +}

there's another arch_kprobe_get_kallsym marked as __weak,
is above function superfluous?

jirka

SNIP

> +int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
> +				   char *type, char *sym)
> +{
> +	return -ERANGE;
> +}
> +
> +int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> +		       char *sym)
> +{
> +#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
> +	if (!kprobe_cache_get_kallsym(&kprobe_insn_slots, &symnum, value, type, sym))
> +		return 0;
> +#ifdef CONFIG_OPTPROBES
> +	if (!kprobe_cache_get_kallsym(&kprobe_optinsn_slots, &symnum, value, type, sym))
> +		return 0;
> +#endif
> +#endif
> +	if (!arch_kprobe_get_kallsym(&symnum, value, type, sym))
> +		return 0;
> +	return -ERANGE;
> +}
> +
>  int __init __weak arch_populate_kprobe_blacklist(void)
>  {
>  	return 0;
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages
  2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
  2020-02-26 15:21   ` Jiri Olsa
@ 2020-02-26 15:22   ` Jiri Olsa
  2020-02-27  7:13     ` Adrian Hunter
  1 sibling, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2020-02-26 15:22 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, linux-kernel

On Wed, Feb 12, 2020 at 02:49:39PM +0200, Adrian Hunter wrote:
> Symbols are needed for tools to describe instruction addresses. Pages
> allocated for kprobe's purposes need symbols to be created for them.
> Add such symbols to be visible via /proc/kallsyms.

I can't see kprobes in /proc/kallsyms, I tried making some with
perf probe and some of bcc-tools.. I'm greping for [kprobe]

could you put to changelog soem example /proc/kallsyms output?

jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages
  2020-02-26 15:22   ` Jiri Olsa
@ 2020-02-27  7:13     ` Adrian Hunter
  0 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-27  7:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, linux-kernel

On 26/02/20 5:22 pm, Jiri Olsa wrote:
> On Wed, Feb 12, 2020 at 02:49:39PM +0200, Adrian Hunter wrote:
>> Symbols are needed for tools to describe instruction addresses. Pages
>> allocated for kprobe's purposes need symbols to be created for them.
>> Add such symbols to be visible via /proc/kallsyms.
> 
> I can't see kprobes in /proc/kallsyms, I tried making some with
> perf probe and some of bcc-tools.. I'm greping for [kprobe]
> 
> could you put to changelog soem example /proc/kallsyms output?

Will do.  See also the examples in patch 13.

For kprobes you need:

	# CONFIG_FUNCTION_TRACER is not set

otherwise the kprobes are done by ftrace.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages
  2020-02-26 15:21   ` Jiri Olsa
@ 2020-02-28 14:02     ` Adrian Hunter
  0 siblings, 0 replies; 20+ messages in thread
From: Adrian Hunter @ 2020-02-28 14:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	x86, Mark Rutland, Alexander Shishkin, Mathieu Poirier, Leo Yan,
	Arnaldo Carvalho de Melo, linux-kernel

On 26/02/20 5:21 pm, Jiri Olsa wrote:
> On Wed, Feb 12, 2020 at 02:49:39PM +0200, Adrian Hunter wrote:
>> Symbols are needed for tools to describe instruction addresses. Pages
>> allocated for kprobe's purposes need symbols to be created for them.
>> Add such symbols to be visible via /proc/kallsyms.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> SNIP
> 
>> @@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
>>  {									\
>>  	return __is_insn_slot_addr(&kprobe_##__name##_slots, addr);	\
>>  }
>> +#define KPROBE_INSN_PAGE_SYM		"kprobe_insn_page"
>> +#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
>>  #else /* __ARCH_WANT_KPROBES_INSN_SLOT */
>>  #define DEFINE_INSN_CACHE_OPS(__name)					\
>>  static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
>> @@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
>>  void *alloc_insn_page(void);
>>  void free_insn_page(void *page);
>>  
>> +int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
>> +		       char *sym);
>> +int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
>> +			     unsigned long *value, char *type, char *sym);
>> +
>> +int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
>> +			    char *type, char *sym);
>>  #else /* !CONFIG_KPROBES: */
>>  
>>  static inline int kprobes_built_in(void)
>> @@ -435,6 +445,24 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
>>  {
>>  	return true;
>>  }
>> +static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
>> +				     char *type, char *sym)
>> +{
>> +	return 0;
>> +}
>> +static inline int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c,
>> +					   unsigned int *symnum,
>> +					   unsigned long *value, char *type,
>> +					   char *sym)
>> +{
>> +	return 0;
>> +}
>> +static inline int arch_kprobe_get_kallsym(unsigned int *symnum,
>> +					  unsigned long *value, char *type,
>> +					  char *sym)
>> +{
>> +	return 0;
>> +}
> 
> there's another arch_kprobe_get_kallsym marked as __weak,
> is above function superfluous?

Yes, it is removed in V3.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-02-28 14:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 01/13] perf: Add perf text poke event Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
2020-02-26 15:21   ` Jiri Olsa
2020-02-28 14:02     ` Adrian Hunter
2020-02-26 15:22   ` Jiri Olsa
2020-02-27  7:13     ` Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 08/13] ftrace: Add perf text poke " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
2020-02-24 13:50 ` [PATCH V2 00/13] perf/x86: Add perf " Adrian Hunter
2020-02-24 14:03   ` Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).